
Automating SCOM maintenance mode for agents assigned to a Gateway when their Gateway is unavailable


Quick download:  SCOM.GatewayMaintenanceMode (github.com)

 

This is an often-requested solution: how do we stop alert storms from agents that appear unavailable because of a network outage, rather than a real server-down issue?

This post covers a Management Pack-based solution to that problem.

This MP was part of the SCOMAThon HackaSCOM 2021 challenge.  You can watch the video presentation of this management pack here:

https://youtu.be/md21GGxAzUo?t=1556

Challenge:

When a Gateway is not available:

  • Suppress alerts from all agents that report to that Gateway
  • Trigger maintenance mode for these agents
  • Remove maintenance mode when the GW is available again

Components:

  • Monitor availability for each Gateway with ping
  • Recovery to START maintenance mode
  • Recovery to STOP maintenance mode
  • Group for each Gateway
  • Groups are populated with agents assigned to that Gateway
  • Groups are placed into and out of Maintenance mode
  • Views for Alerts, State, Group memberships
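
The challenge and components above boil down to a small state machine per Gateway. The Python sketch below only illustrates that logic; it is not how the MP is implemented (the MP uses a monitor with two recoveries). The names `GatewayWatcher`, `start_mm`, and `stop_mm` are hypothetical, and the callbacks stand in for whatever actually starts or stops maintenance mode on the Gateway's group.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class GatewayWatcher:
    """Tracks gateway health and toggles maintenance mode on state changes only."""

    start_mm: Callable[[str], None]  # hypothetical: put the gateway's group into MM
    stop_mm: Callable[[str], None]   # hypothetical: take the gateway's group out of MM
    healthy: Dict[str, bool] = field(default_factory=dict)

    def report(self, gateway: str, reachable: bool) -> None:
        # A gateway we have not seen before is assumed healthy until a probe fails.
        was_healthy = self.healthy.get(gateway, True)
        self.healthy[gateway] = reachable
        if was_healthy and not reachable:
            self.start_mm(gateway)   # healthy -> unavailable
        elif not was_healthy and reachable:
            self.stop_mm(gateway)    # unavailable -> healthy


if __name__ == "__main__":
    events: List[str] = []
    watcher = GatewayWatcher(
        start_mm=lambda gw: events.append(f"start:{gw}"),
        stop_mm=lambda gw: events.append(f"stop:{gw}"),
    )
    for state in (True, False, False, True):
        watcher.report("GW01", state)
    print(events)
```

Acting only on transitions means repeated failed probes do not restart maintenance mode over and over, which mirrors how a recovery fires on a monitor state change rather than on every sample.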

 

You will need to customize this MP to test it in your lab environment.  It will not work out of the box as it is customized for my lab server names.

 

Out of the box the views look like this:

image

image

image

image

 

There is a monitor that pings each Gateway in your environment:

image
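
As a rough illustration of what a ping-based availability check does, here is a minimal Python sketch. It is not the MP's monitor: the retry count, the timeout, and the function names `ping_once` and `gateway_reachable` are assumptions made for the example. It shells out to the operating system's `ping` command and treats the Gateway as reachable if any attempt succeeds.

```python
import platform
import subprocess
from typing import Callable


def ping_once(host: str, timeout_s: int = 2) -> bool:
    """Send one ICMP echo request using the operating system's ping command."""
    if platform.system() == "Windows":
        # -n = count, -w = timeout in milliseconds
        cmd = ["ping", "-n", "1", "-w", str(timeout_s * 1000), host]
    else:
        # Linux: -c = count, -W = timeout in seconds (other platforms may differ)
        cmd = ["ping", "-c", "1", "-W", str(timeout_s), host]
    try:
        result = subprocess.run(cmd, capture_output=True, timeout=timeout_s + 5)
    except (subprocess.TimeoutExpired, OSError):
        return False
    # Caveat: on Windows, ping can exit 0 for a "Destination host unreachable"
    # reply, so a production check may also need to inspect the output.
    return result.returncode == 0


def gateway_reachable(
    host: str,
    attempts: int = 3,
    probe: Callable[[str], bool] = ping_once,
) -> bool:
    """Treat the gateway as reachable if any of `attempts` probes succeeds."""
    return any(probe(host) for _ in range(attempts))
```

Requiring several failed probes before declaring the Gateway unavailable helps avoid putting a whole group into maintenance mode because of a single dropped packet.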

 

If we detect the GW is unavailable, a recovery on this monitor puts the group containing this Gateway’s agents into maintenance mode:

image

image

 

You can see above the Group is in Maintenance mode.  Below we will see the agents in that group in MM, with a special comment:

image

image

 

When the network outage is resolved, another recovery runs to end maintenance mode:

image

 

What’s next?

The biggest issue with this solution is that it requires the customer to edit the XML to customize the MP for their environment.  It needs to be made more dynamic to accommodate any environment.

image
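
One way to picture "more dynamic" is to derive each Gateway's group membership from data instead of hard-coded server names. The Python sketch below assumes you already have a list of (agent, primary gateway) pairs, for instance exported from your management group; the function name `group_agents_by_gateway` and the sample server names are hypothetical. It is an illustration of the idea, not part of the MP.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_agents_by_gateway(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Build one agent list per gateway from (agent, gateway) pairs."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for agent, gateway in pairs:
        # Normalize case so "GW01" and "gw01" land in the same group.
        groups[gateway.lower()].append(agent)
    return {gw: sorted(agents) for gw, agents in groups.items()}


if __name__ == "__main__":
    # Hypothetical sample data; replace with an export from your environment.
    sample = [
        ("web01.contoso.lab", "GW01.contoso.lab"),
        ("sql01.contoso.lab", "gw01.contoso.lab"),
        ("app01.fabrikam.lab", "GW02.fabrikam.lab"),
    ]
    for gw, agents in group_agents_by_gateway(sample).items():
        print(gw, agents)
```

With membership computed this way, adding a new Gateway would not require editing names by hand; the groups follow whatever the agent-to-Gateway assignments currently are.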

 


16 Comments

    • Kevin Holman

      It should be useable. You might have to adjust the minimum version requirements in the manifest, I don’t know. I don’t test with SCOM 2012 as that goes out of support this year.

      Keep in mind this is an example MP, and very “version 1.0”.

  1. Pingback:Automating Maintenance Mode for Computers Behind a Gateway: SCOM Management Pack by Kevin Holman - SCOMathon

  2. Saiyad Rahim

    Hi Kevin,

    What can be modified here to put the Gateway and/or its connected agents into Maintenance Mode manually during Windows patching on the Gateway, which always requires a reboot of the Gateway and causes an alert storm?

  3. Pingback:Top 5 SCOM community recommendations: January SCOMathon Newsletter - SCOMathon

  4. Marlon

    Hi Kevin,

    How can I customize the MP to target GW in our environment? I tried to rename the SCOM group but the discovery is not running. Thanks.

  5. David Sjölund

    Awesome idea Kevin!
    I was wondering: we always deploy at least two GWs for HA/failover for each untrusted domain.
    If one GW goes down, the agents fail over to the other, so triggering Maintenance Mode for a single GW failure is not correct.
    Would it be possible to create Resource Pools for each untrusted domain and use the Resource Pool Availability monitor as the source for Maintenance Mode instead?
    Or do you have a better idea? 🙂

    • Kevin Holman

      In a case like this, I’d rather ping the site router. That would be a better indicator that the “site is down”.

  6. Johan

    Nice solution Kevin!
    I’m thinking about the following: what happens to servers that were already in Maintenance Mode and that you don’t want monitored again when the gateway becomes healthy?

    • Kevin Holman

      They should stay in MM. Newer versions of SCOM were enhanced so that the MM end date for objects is retained if they were already in MM for an extended period.

  7. Jukka-Pekka Grohn

    I have tried to understand how I can add my GW servers to this management pack, but I can’t see how :(. Would it be possible to create a fragment for MP Author Pro / Studio for a new GW server? I also hope you will add documentation to GitHub on how to edit the MP file to add new GW servers (I have many of them).

    • Pieter van Blommestein

      Hi Jukka-Pekka,
      I see your question was not answered.

      @Kevin, will you be so kind as to provide some guidance here? I have 17 Gateways servers and need to add each separately, I assume. How do I create this?
      Or is there a way to pick this up through the system?
      I am using SCOM 2019 UR4, waiting for UR6 to come out and upgrade to UR5.
      I am also planning to upgrade the system to SCOM 2022

  8. Pieter van Blommestein

    @Kevin, will you be so kind as to provide some guidance here? I have 17 Gateways servers and need to add each separately, I assume. How do I create this?
    Or is there a way to pick this up through the system?
    I am using SCOM 2019 UR4, waiting for UR6 to come out and upgrade to UR5.
    I am also planning to upgrade the system to SCOM 2022

  9. Pieter van Blommestein

    Hi Kevin,
    This is an awesome MP but I need to know where I change the XML for my environment.
    Will you be so kind as to provide some guidance here? I have 17 Gateways servers and need to add each separately. How do I create or change this?

    Kind regards.
    Pieter

  10. Dean Ravenscroft

    Hi Kevin,

    There may be an issue that started in 2022 UR1 that can affect this process significantly (seconds to hours). It has to do with a change to a Stored Procedure. I have sent the full details and what was discussed with Microsoft, and what we did to resolve it via your Contact Me option, but it was kind of lengthy so not sure if I went over any character limit. Can you let me know if you didn’t receive it?

  11. Pieter van Blommestein

    Hi Kevin,
    This is an awesome MP but I need to know where I change the XML for my environment.
    Will you be so kind as to provide some guidance here? I have 17 Gateways servers and need to add each separately. How do I create or change this? Please can you respond to my query?

    Kind regards.
    Pieter
