SCOM Agent Initiated Maintenance mode with SCCM Maintenance Windows

Quick Download:  https://gallery.technet.microsoft.com/SCOM-Maintenance-Mode-3563f160

 

This MP was an idea from Jason Daggett.  His original work is at: https://gallery.technet.microsoft.com/SCCM-Service-Window-to-12a6c8a5

I have simply modified his work in a few ways, which I will explain below.

This solution includes a Management Pack, a PowerShell script, and the MP guide.  The Management Pack primarily contains a rule that looks for a specific event ID 9999, with special parameters, which is used to trigger maintenance mode from the management servers.  When a special event 9999 is found, the write action response runs a PowerShell script on the parent management server, placing the agent into maintenance mode.  You can use this to trigger maintenance mode at any time from any number of tools, as long as they can create this event.  This is better than the older SCOM 2016 method of using a PowerShell cmdlet and the registry, which wasn’t really a good idea due to polling latency.  As for the newer solution in SCOM 2019, its default event does not provide enough data in my opinion, so I prefer this solution over it.

The PowerShell script sample I included just creates the special Maintenance Mode trigger event.  It provides an example of how to incorporate this into your own custom scripts, such as a script your patching system runs in advance of any maintenance work, or one run on demand by a user.

The second part of the MP is a rule that queries WMI on the agent, assuming you are using SCCM with maintenance windows.  If maintenance windows are found and meet our criteria, the rule writes that same event 9999 as above, which triggers the standard SCOM maintenance mode.

First – the script: 

This creates the event 9999 in the Operations Manager event log.

The event contains specific parameters, which give the management server side script the data it needs for maintenance mode:

Param 1 is the entire description, which is what you see in the event log.

Param 2 is the duration.

Param 3 is the reason.

Param 4 is the comment.

Param 5 is the account that created the event (triggering MM).

Param 6 is the local timestamp on the agent when the event was created.

Param 7 is the name of the computer that will be placed into MM.

The script simply takes “Duration” as a parameter.  However, if no duration is given, then it will prompt the user for the duration for maintenance.  The other settings are hard coded in the script, but could be modified.
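To make the parameter order concrete, here is a minimal sketch of how the seven values could be assembled before being written to the event.  It is not the shipped script (which is PowerShell); it is a small Python model of the payload only.  The function name, the description wording, the timestamp format, the default reason and comment, and the assumption that duration is expressed in minutes are all mine, not taken from the MP.

```python
from datetime import datetime


def build_mm_event_params(duration_minutes, account, computer_name,
                          reason="PlannedOther",
                          comment="Maintenance mode requested by event 9999",
                          now=None):
    """Return the seven event parameters in the order listed above.

    Hypothetical helper: names, defaults, and formats are assumptions.
    """
    duration = int(duration_minutes)
    if duration <= 0:
        raise ValueError("duration must be a positive number of minutes")

    # Assumed timestamp format; the real script may format this differently.
    timestamp = (now or datetime.now()).strftime("%Y-%m-%d %H:%M:%S")

    # Assumed description text; the real event wording may differ.
    description = (
        f"Maintenance mode requested for {computer_name} "
        f"for {duration} minutes by {account}."
    )

    return [
        description,      # Param 1: full description shown in the event log
        str(duration),    # Param 2: duration
        reason,           # Param 3: reason
        comment,          # Param 4: comment
        account,          # Param 5: account that created the event
        timestamp,        # Param 6: local timestamp on the agent
        computer_name,    # Param 7: computer to place into maintenance mode
    ]


if __name__ == "__main__":
    for index, value in enumerate(
            build_mm_event_params(60, "CONTOSO\\patchsvc",
                                  "server01.contoso.com"), start=1):
        print(f"Param {index}: {value}")
```

The point of the sketch is only that position matters: the management server side script reads the values by parameter number, so any tool that writes the event has to supply them in this order.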

Next: The Management Pack

The MP has two primary rules.  One rule watches all agents for our special 9999 event and triggers SCOM Maintenance Mode.  The other rule monitors for SCCM Maintenance Windows on clients, and triggers the event when a matching maintenance window is found.  There are some settings you will need to configure on this second rule.  First, the interval: this rule targets ALL agents and runs every 10 minutes.  It is DISABLED by default, so you can enable it for specific agents for testing.  Before you enable it for all your SCOM agents, make sure ALL SCOM agents have a SCCM client and that you really do want SCCM Maintenance Windows to trigger maintenance mode everywhere.

You should be VERY careful with this rule.  Many customers have poor governance on their maintenance windows, and will have set some up to run VERY frequently, even when they don’t “do” anything.  You should not use this solution if that is your case.  You don’t want huge numbers of your agents put into maintenance mode (and hence not monitored) when nothing is actually happening and there is no need for maintenance.  Large numbers of agents constantly going into and out of maintenance mode also put a heavy load on SCOM.  You might consider modifying the script to only trigger on VERY SPECIFIC maintenance windows you have set up for server patching.  You can check what you have by looking at the events this script creates, which list all the maintenance windows found on the agent.
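One way to restrict the rule to specific patching windows is an allow-list of window IDs.  The sketch below models that idea in Python rather than in the rule's PowerShell.  The `id` field and the function name are hypothetical; use whatever identifier your own event output shows for each window.

```python
def select_patch_windows(windows, allowed_ids):
    """Keep only the windows whose ID is on the allow-list.

    `windows` is a list of dicts with an "id" key (hypothetical shape).
    """
    allowed = set(allowed_ids)
    return [window for window in windows if window["id"] in allowed]


if __name__ == "__main__":
    found = [
        {"id": "patch-window-a"},   # made-up IDs for illustration
        {"id": "always-on-window"},
    ]
    print(select_patch_windows(found, ["patch-window-a"]))
```

An allow-list fails safe: a new window that somebody creates in SCCM does nothing in SCOM until you explicitly add its ID.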

The things you can configure are in the Parameters section of the XML/Rule configuration:

<Parameters>
  <Parameter>
    <Name>ComputerName</Name>
    <Value>$Target/Host/Property[Type="Windows!Microsoft.Windows.Computer"]/PrincipalName$</Value>
  </Parameter>
  <Parameter>
    <Name>MGName</Name>
    <Value>$Target/ManagementGroup/Name$</Value>
  </Parameter>
  <Parameter>
    <Name>MinDurationMinutes</Name>
    <Value>10</Value>
  </Parameter>
  <Parameter>
    <Name>MaxDurationMinutes</Name>
    <Value>1440</Value>
  </Parameter>
  <Parameter>
    <Name>TriggerAdvanceMinutes</Name>
    <Value>15</Value>
  </Parameter>             
</Parameters>

 

MinDurationMinutes is the minimum calculated duration of a SCCM Maintenance Window that is allowed to trigger SCOM Maintenance Mode.  I have seen customers with zero duration, or 5 minute duration SCCM maintenance windows, for which it would NOT make sense to even attempt SCOM maintenance mode.

MaxDurationMinutes is the maximum calculated duration of a SCCM Maintenance Window that is allowed to trigger SCOM Maintenance Mode.  By default, any Maintenance Window longer than 24 hours is ignored, and you can/should adjust this to your environment.

TriggerAdvanceMinutes is how far to “look ahead” towards the next SCCM Maintenance Window start time, to trigger SCOM Maintenance Mode in advance.  We don’t want to wait until the client is already INSIDE a SCCM Maintenance Window to start SCOM maintenance mode, because the process of getting an agent into maintenance mode might take a while (up to 15 minutes depending on your environment, load, size, etc.).  So by default SCOM maintenance mode starts anytime a SCCM Service Window will be starting in the next 15 minutes.
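The three settings combine into a single yes/no decision per window.  Here is a minimal sketch of that decision, written in Python to model the logic rather than reproduce the rule's PowerShell.  The function name is hypothetical, the inclusive comparisons at the boundaries are an assumption, and so is the choice to ignore windows that have already started.

```python
from datetime import datetime, timedelta


def should_trigger_mm(window_start, window_end, now,
                      min_duration_minutes=10,
                      max_duration_minutes=1440,
                      trigger_advance_minutes=15):
    """Decide whether a maintenance window should trigger event 9999.

    Hypothetical model of the checks described above, not the MP's code.
    """
    duration = (window_end - window_start).total_seconds() / 60

    # Ignore windows that are too short or too long to be real patching.
    if duration < min_duration_minutes or duration > max_duration_minutes:
        return False

    # Trigger only when the window starts within the look-ahead period.
    # Assumption: a window that has already started does not trigger.
    minutes_until_start = (window_start - now).total_seconds() / 60
    return 0 <= minutes_until_start <= trigger_advance_minutes


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 1, 50)
    start = datetime(2024, 1, 1, 2, 0)
    print(should_trigger_mm(start, start + timedelta(hours=4), now))
    print(should_trigger_mm(start, start + timedelta(minutes=5), now))
```

With the defaults, a four hour window starting in 10 minutes qualifies, while a 5 minute window or a window starting 30 minutes from now does not.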

The downside:

This solution might not scale well in really large environments with more than a couple thousand agents.  You need to evaluate and test.  Since we run a script write action on the Management Server for each agent, a management server with 1000 agents assigned could potentially run 1000 scripts at close to the same time, which might overwhelm it.  It would be abnormal to have over 1000 agents all go into maintenance mode at the exact same time under normal circumstances, and if you need that, you generally should use SCOM groups and place those groups into scheduled maintenance mode using the built-in Scheduled Maintenance feature in SCOM.  That said, the script on the management server is logged well (event ID 7777), and it will show you just how often this is happening in your environment.  You can customize the logging if needed; it is fairly verbose out of the box.

Additionally, if you don’t work closely with your SCCM team, they could create maintenance windows in SCCM that wreak havoc on your SCOM environment, flooding you with state changes from excessive maintenance mode triggering even when nothing is happening.  This can be very harmful to SCOM, so be careful.

8 Comments

  1. Rick Bywalski

    I just deployed this last week and so far it is working great. I am a little concerned about how well it will scale when we have a really big patch group running. It would be nice if I could exclude a few collections from being picked up by it.

    • Kevin Holman

      You can!

      One way – is to create groups in SCOM based on Collections in SCCM, and just schedule those groups for MM, instead of using this solution. I have a fragment for creating and populating groups based on a SQL query.

      Secondly – you could edit the PowerShell script and add more criteria for what to bring into the $ServiceWindows object, or script it to ignore specific maintenance windows with a specific ID, duration, etc.

      • Rick Bywalski

        I will have to look into it. The team had set up a few 24×7 windows on specific collections, which is causing a few minor issues at the moment. If I could exclude those collections that would be ideal. I like the MP overall so far, as when we schedule our regular maintenance it makes automatic in SCOM what was a manual process in our current monitoring tool. I will look for your fragment and see if I can make it work. My other thought was to remove the 24×7 windows and teach them to check the box on the deployments they are for, to run the deployment outside a maintenance window. They don’t require a reboot; they just set it up so that the machine would install once it was put in that collection.

        • Kevin Holman

          Look at my new MP as well for this – it has a maxwindow threshold. If your big ones are always really long, and your important patching windows are not, then this can be used as a solution to ignore super-long maintenance windows.

          • Rick Bywalski

            I installed your version of this in my test environment and it did not put anything into maintenance mode no matter how long the maintenance window was. I tested with a 24 hour window first since that is my pain point at the moment. Then I modified the window to be a 6 hour window, and the machine I had in the collection in SCCM did not enter maintenance mode even after I waited for about 40 minutes. I will do some more testing today, but I would love to get your version of this working, as it would solve all my biggest issues without my needing to modify how people have set up SCCM currently.

  2. Kevin Holman

    Rick – keep in mind, mine is disabled by default, and you need to enable it for a group, or across the board if putting in a small test environment. Additionally, it logs verbosely, so look at an agent and check the event logs for events.

  3. Rick Bywalski

    One thought I had after getting this working, I found out to my shock that I have machines out there that have SCOM but not SCCM which causes a warning error in the console. So rather than enable the rule globally could I create a discovery rule using MP Studio for machines that have SCCM agent then use that to populate a group then enable the rule for the machines in that group?

    • Kevin Holman

      Absolutely – that’s the intent. However, if you have servers not covered by SCCM, you might want to fix that. 🙂 I have a fragment for registry monitor, that will easily tell you which SCOM agents are missing the SCCM client.
