
Automating SCOM maintenance mode for agents assigned to a Gateway when their Gateway is unavailable

image

Quick download:  SCOM.GatewayMaintenanceMode (github.com)

 

This is a frequently requested solution: how do we stop alert storms from agents that appear unavailable because of a network outage, rather than a real server-down issue?

This post covers a Management Pack based solution for that problem.

This MP was part of the SCOMAThon HackaSCOM 2021 challenge.  You can watch the video presentation of this management pack here:

https://youtu.be/md21GGxAzUo?t=1556

Challenge:

When a Gateway is not available:

  • Suppress alerts from all agents that report to that Gateway
  • Trigger maintenance mode for these agents
  • Remove maintenance mode when the GW is available again

Components:

  • Monitor availability for each Gateway with ping
  • Recovery to START maintenance mode
  • Recovery to STOP maintenance mode
  • Group for each Gateway
  • Groups are populated with agents assigned to that Gateway
  • Groups are placed into and out of Maintenance mode
  • Views for Alerts, State, Group memberships
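
The challenge and components above boil down to a small state machine per Gateway. As a rough sketch only (written in Python rather than the MP's own XML and scripts, and using hypothetical names such as `Action`, `next_action`, and `plan_recoveries` that do not come from the MP), the decision the two recoveries implement could be modeled like this:

```python
from enum import Enum


class Action(Enum):
    """Recovery to run for a Gateway's agent group (hypothetical names)."""
    NONE = "none"
    START_MM = "start maintenance mode"
    STOP_MM = "stop maintenance mode"


def next_action(was_reachable: bool, is_reachable: bool) -> Action:
    """Map a change in Gateway ping state to the recovery that should run."""
    if was_reachable and not is_reachable:
        return Action.START_MM  # Gateway just went away: suppress agent alerts
    if not was_reachable and is_reachable:
        return Action.STOP_MM   # Gateway is back: resume normal monitoring
    return Action.NONE          # no state change, nothing to do


def plan_recoveries(ping_results, initially_reachable=True):
    """Return the recoveries triggered by a sequence of ping results."""
    actions = []
    previous = initially_reachable
    for current in ping_results:
        action = next_action(previous, current)
        if action is not Action.NONE:
            actions.append(action)
        previous = current
    return actions
```

For example, `plan_recoveries([True, False, False, True])` yields one start followed by one stop: maintenance mode is only started or ended when the ping state changes, not on every failed or successful ping.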

 

You will need to customize this MP to test it in your lab environment.  It will not work out of the box as it is customized for my lab server names.

 

Out of the box the views look like this:

image

image

image

image

 

There is a monitor that pings each Gateway in your environment:

image
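
To illustrate the idea behind a ping-based availability check, here is a minimal sketch in Python. It is not how the MP implements its monitor; the function names and the host name in the usage note are hypothetical, and treating exit code 0 as "reachable" is a simplification.

```python
import platform
import subprocess
from typing import List, Optional


def build_ping_command(host: str, count: int = 1,
                       system: Optional[str] = None) -> List[str]:
    """Build a ping command line; Windows uses -n for count, others use -c."""
    system = system or platform.system()
    count_flag = "-n" if system == "Windows" else "-c"
    return ["ping", count_flag, str(count), host]


def is_reachable(host: str, timeout_s: float = 5.0) -> bool:
    """Return True if the system ping command exits with code 0."""
    try:
        result = subprocess.run(
            build_ping_command(host),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        # Ping hung or is not installed: treat the Gateway as unreachable.
        return False
    return result.returncode == 0
```

A call such as `is_reachable("gw01.example.com")` (a made-up host name) would return a simple healthy or unhealthy signal, which is the kind of two-state result the monitor needs to drive its recoveries.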

 

If the monitor detects that the GW is unavailable, a recovery on the monitor places the corresponding group of that Gateway’s agents into maintenance mode:

image

image

 

Above, you can see that the group is in maintenance mode.  Below, the agents in that group are shown in maintenance mode (MM), with a special comment:

image

image

 

When the network outage is resolved, another recovery runs to end maintenance mode:

image

 

What’s next?

The biggest issue with this solution is that it requires the customer to edit the XML to customize the MP for their environment.  It needs to be made more dynamic in nature to accommodate any environment.

image
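
One way to picture a more dynamic approach is to derive the per-Gateway groups from data instead of hard-coded server names. The Python sketch below is an illustration only, not the author's planned design: it assumes you can already obtain (agent, gateway) pairs from your environment, and the function name and server names are hypothetical.

```python
from collections import defaultdict


def group_agents_by_gateway(assignments):
    """Build {gateway: [agents]} from (agent, gateway) pairs.

    `assignments` is assumed to come from your own inventory of which
    agent reports to which Gateway; nothing here queries SCOM itself.
    """
    groups = defaultdict(list)
    for agent, gateway in assignments:
        groups[gateway].append(agent)
    # Sort members so the resulting groups are stable between runs.
    return {gateway: sorted(agents) for gateway, agents in groups.items()}


# Hypothetical lab data, for illustration only.
example = [
    ("web02.lab.local", "gw01.lab.local"),
    ("web01.lab.local", "gw01.lab.local"),
    ("sql01.lab.local", "gw02.lab.local"),
]
groups = group_agents_by_gateway(example)
```

With this shape, adding a Gateway or moving an agent changes only the input data, not the group definitions.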

 


4 Comments

    • Kevin Holman

      It should be useable. You might have to adjust the minimum version requirements in the manifest, I don’t know. I don’t test with SCOM 2012 as that goes out of support this year.

      Keep in mind this is an example MP, and very “version 1.0”.

  1. Pingback:Automating Maintenance Mode for Computers Behind a Gateway: SCOM Management Pack by Kevin Holman - SCOMathon

  2. Saiyad Rahim

    Hi Kevin,

    What can be modified here to put the Gateway and/or its connected agents into Maintenance Mode manually during Windows patching on the Gateway, which always requires a reboot of the Gateway and causes an alert storm?
