SCOM Agent Initiated Maintenance mode with SCCM Maintenance Windows


Quick Download:  https://github.com/thekevinholman/MaintenanceModeFromSCCMWindow

This MP was an idea from Jason Daggett.  I have simply modified his work in a few ways, which I will explain below.

This solution includes a Management Pack, a PowerShell script, and the MP guide.  The Management Pack primarily contains a rule that looks for a specific event, ID 9999, with special parameters, which is used to trigger maintenance mode on the management servers.  When this special event 9999 is found, the write action response runs a PowerShell script on the parent management server, placing the agent into maintenance mode.  You can use this to trigger maintenance mode at any time from any number of tools, as long as they can create this event.  I consider this a better solution than the older SCOM 2016 method of using a PowerShell cmdlet and the registry, which wasn’t really a good idea due to polling latency.  I also prefer it to the newer solution in SCOM 2019, because in my opinion the default event there does not provide enough data.

The PowerShell script sample I included just creates the special Maintenance Mode trigger event.  It provides an example of how to incorporate this into your own custom scripts, such as a script you might have your patching system run in advance of any maintenance work, or one that could be run on demand by a user.

The second part of the MP is a rule that queries WMI on the agent, assuming you are using SCCM with maintenance windows.  If maintenance windows are found and meet our criteria, then the output will be that same event 9999 as above, which will trigger the standard SCOM maintenance mode.

First – the script: 

This creates the event 9999 in the Operations Manager event log:

[Screenshot: event 9999 in the Operations Manager event log]

The event contains specific parameters which are required to give the management server-side script the data needed for maintenance mode:

[Screenshot: the parameters of event 9999]

Param 1 is the entire description and what you see in the event log.

Param 2 is the duration

Param 3 is the reason

Param 4 is the comment

Param 5 is the account that created the event (triggering MM)

Param 6 is the local timestamp on the agent when the event was created

Param 7 is the computername that will be placed into MM

The script simply takes “Duration” as a parameter.  However, if no duration is given, then it will prompt the user for the duration for maintenance.  The other settings are hard coded in the script, but could be modified.
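To make the parameter layout concrete, here is a minimal sketch of how a custom script might assemble the seven parameters in the order listed above and write them as event 9999. This is not the script shipped with the MP. The function names, the default reason and comment text, and the event source `Health Service Script` are assumptions made for illustration; check the included script for the values the rule actually expects.

```powershell
# Hypothetical helper: builds the seven event parameters in the documented order.
function New-MMEventParameters {
    param(
        [Parameter(Mandatory)] [int] $DurationMinutes,
        [string]   $Reason       = 'PlannedOther',                     # assumed default
        [string]   $Comment      = 'Maintenance mode requested by script', # assumed default
        [string]   $Account      = [System.Environment]::UserName,
        [datetime] $Timestamp    = (Get-Date),
        [string]   $ComputerName = [System.Net.Dns]::GetHostName()
    )

    # Param 1: the full description shown in the event log (wording is illustrative).
    $description = "Maintenance mode requested for $ComputerName for $DurationMinutes minutes."

    @(
        $description,                 # Param 1: description
        [string]$DurationMinutes,     # Param 2: duration
        $Reason,                      # Param 3: reason
        $Comment,                     # Param 4: comment
        $Account,                     # Param 5: account that created the event
        $Timestamp.ToString('s'),     # Param 6: local timestamp on the agent
        $ComputerName                 # Param 7: computer to place into MM
    )
}

# Hypothetical helper: writes the parameters as event 9999 (Windows only).
# Assumes the 'Health Service Script' source is registered for this log.
function Write-MMTriggerEvent {
    param([Parameter(Mandatory)] [string[]] $EventParameters)

    $log = New-Object -TypeName System.Diagnostics.EventLog -ArgumentList 'Operations Manager'
    $log.Source = 'Health Service Script'
    $instance = New-Object -TypeName System.Diagnostics.EventInstance `
        -ArgumentList 9999, 0, ([System.Diagnostics.EventLogEntryType]::Information)
    $log.WriteEvent($instance, [object[]]$EventParameters)
}
```

On an agent, a patching script could then call `Write-MMTriggerEvent -EventParameters (New-MMEventParameters -DurationMinutes 60)` shortly before maintenance begins.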

Next: The Management Pack

The MP has two primary rules.  One rule watches all agents for the special 9999 event – and triggers SCOM Maintenance Mode.  The other rule monitors for SCCM Maintenance Windows on clients, and writes the special 9999 event when a matching maintenance window is found.  There are some settings you will need to configure on this rule.  First, the interval.  This rule runs on ALL agents, every 10 minutes.  It is DISABLED by default, so you can enable it for specific agents for testing.  Only enable it for all your SCOM agents if ALL of them have a SCCM client and you want SCCM Maintenance Windows to trigger maintenance mode on all of them.

You should be VERY careful with this rule.  Many customers have poor governance on their maintenance windows, and will have set some up to run VERY frequently, even when they don’t “do” anything.  You should not use this solution if that is your case.  You don’t want huge numbers of your agents to get put into maintenance mode (and hence no monitoring) when nothing is actually happening and there is no need for maintenance.  This also creates a large load on SCOM for a large number of agents going into and out of maintenance mode all the time on a frequent basis.  You might consider modifying the script to only trigger on VERY SPECIFIC maintenance windows you have set up for server patching.  You can examine this by looking at the events this script creates, which will output all your maintenance windows:

[Screenshot: event listing the SCCM maintenance windows found on the agent]
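One way to limit triggering to VERY SPECIFIC maintenance windows is an allow-list of window IDs. The following is a minimal sketch, assuming each maintenance window object returned by the WMI query exposes an `ID` property. The function name and the sample IDs are hypothetical and are not part of the MP's script.

```powershell
# Hypothetical filter: keep only the service windows whose ID is on an allow-list.
function Select-AllowedServiceWindow {
    param(
        [Parameter(Mandatory)] [object[]] $ServiceWindows,
        [Parameter(Mandatory)] [string[]] $AllowedIds
    )

    # -contains performs a case-insensitive match of the window ID against the list.
    $ServiceWindows | Where-Object { $AllowedIds -contains $_.ID }
}

# Example with made-up window objects standing in for the WMI results.
$sampleWindows = @(
    [pscustomobject]@{ ID = 'patch-window-1'; Duration = 14400 },
    [pscustomobject]@{ ID = 'always-open';    Duration = 86400 }
)
$patchWindows = @(Select-AllowedServiceWindow -ServiceWindows $sampleWindows -AllowedIds 'patch-window-1')
```

With this approach, a window that is not explicitly listed never produces the 9999 event, so a newly created SCCM window cannot put agents into maintenance mode until someone adds its ID.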

The things you can configure are in the Parameters section of the XML/Rule configuration:

<Parameters>
  <Parameter>
    <Name>ComputerName</Name>
    <Value>$Target/Host/Property[Type="Windows!Microsoft.Windows.Computer"]/PrincipalName$</Value>
  </Parameter>
  <Parameter>
    <Name>MGName</Name>
    <Value>$Target/ManagementGroup/Name$</Value>
  </Parameter>
  <Parameter>
    <Name>MinDurationMinutes</Name>
    <Value>10</Value>
  </Parameter>
  <Parameter>
    <Name>MaxDurationMinutes</Name>
    <Value>1440</Value>
  </Parameter>
  <Parameter>
    <Name>TriggerAdvanceMinutes</Name>
    <Value>15</Value>
  </Parameter>             
</Parameters>

MinDurationMinutes is the minimum calculated duration a SCCM Maintenance Window must have in order to trigger SCOM Maintenance Mode.  I have seen customers with zero duration, or 5 minute duration SCCM maintenance windows, for which it would NOT make sense to even attempt SCOM maintenance mode.

MaxDurationMinutes is the maximum calculated duration a SCCM Maintenance Window may have and still trigger SCOM Maintenance Mode.  By default, any Maintenance Window longer than 24 hours is ignored, and you can/should adjust this to your environment.

TriggerAdvanceMinutes is how far to “look ahead” towards the next SCCM Maintenance Window start time, to trigger SCOM Maintenance Mode in advance.  We don’t want to wait until the client is already INSIDE a SCCM Maintenance window to start SCOM maintenance mode.  This is because the process to get SCOM into Maintenance mode might take a while (up to 15 minutes depending on your environment, load, size, etc) so I default to start SCOM maintenance mode anytime a SCCM Service Window will be starting in the next 15 minutes.
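The way these three settings interact can be sketched as a single check. This is an illustration of the logic described above, not the code from the MP. The function name is hypothetical, and it assumes that only windows which have not yet started are considered and that the duration limits are inclusive.

```powershell
# Hypothetical check: should this SCCM maintenance window trigger SCOM maintenance mode now?
function Test-ServiceWindowTrigger {
    param(
        [Parameter(Mandatory)] [datetime] $StartTime,
        [Parameter(Mandatory)] [datetime] $EndTime,
        [datetime] $Now                   = (Get-Date),
        [int]      $MinDurationMinutes    = 10,
        [int]      $MaxDurationMinutes    = 1440,
        [int]      $TriggerAdvanceMinutes = 15
    )

    # Ignore windows that are too short or too long to be real patching windows.
    $durationMinutes = ($EndTime - $StartTime).TotalMinutes
    if ($durationMinutes -lt $MinDurationMinutes -or $durationMinutes -gt $MaxDurationMinutes) {
        return $false
    }

    # Trigger only when the window starts within the look-ahead period.
    $minutesUntilStart = ($StartTime - $Now).TotalMinutes
    return ($minutesUntilStart -ge 0 -and $minutesUntilStart -le $TriggerAdvanceMinutes)
}
```

For example, with the defaults, a four-hour window that starts in 10 minutes qualifies, while the same window starting in an hour does not, and neither does a 5 minute window or a 48 hour window.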

The downside:

This solution might not scale well in really large environments with more than a couple thousand agents.  You need to evaluate and test.  Since we run a script write action on the Management Server for each agent, a management server with 1000 agents assigned could potentially run 1000 scripts all close to the same time, which might overwhelm it.  It would be abnormal to have over 1000 agents all go into maintenance mode at the exact same time under normal circumstances, and if you need that, you generally should use SCOM groups and place those groups into scheduled maintenance mode using the built-in Scheduled Maintenance in SCOM.  That said, the script on the management server is logged well (event ID 7777) and it will show you just how often this is happening in your environment.  You can customize the logging if needed; it is fairly verbose out of the box.

Additionally, if you don’t work closely with your SCCM team, they could create maintenance windows in SCCM that wreak havoc on your SCOM environment, flooding you with state changes from excessive maintenance mode triggering even when nothing is happening.  This can be very harmful to SCOM, so be careful.

29 Comments

  1. Rick Bywalski

    I just deployed this last week and so far it is working great. I am a little concerned about how well it will scale when we have a really big patch group running. It would be nice if I could exclude a few collections from being picked up by it.

    • Kevin Holman

      You can!

      One way – is to create groups in SCOM based on Collections in SCCM, and just schedule those groups for MM, instead of using this solution. I have a fragment for creating and populating groups based on a SQL query.

      Secondly – you could edit the powershell script – and add more criteria for what to bring into the $ServiceWindows object….. or script it to ignore specific maintenance windows with a specific ID, duration, etc.

      • Rick Bywalski

        I will have to look into it. The team had set up a few 24×7 windows on specific collections, which is causing a few minor issues at the moment. If I could exclude those collections that would be ideal. I like the MP overall so far, as when we schedule our regular maintenance it makes automatic in SCOM what was a manual process in our current monitoring tool. I will look for your fragment and see if I can make it work. My other thought was to remove the 24×7 windows and teach them to check the box on the deployments they are for, to run the deployment outside a maintenance window. They don’t require a reboot; they just set it up so that the machine would install once it was put in that collection.

        • Kevin Holman

          Look at my new MP as well for this – it has a maxwindow threshold. If your big ones are always really long, and your important patching windows are not, then this can be used as a solution to ignore super-long maintenance windows.

          • Rick Bywalski

            I installed your version of this in my test environment and it did not put anything into maintenance mode no matter how long the maintenance window was. I tested with a 24 hour window first since that is my pain point at the moment. Then I modified the window to be a 6 hour window and the machine I had in the collection in SCCM did not enter maintenance mode even after I waited for about 40 minutes. I will do some more testing today but I would love to get your version of this working as it would solve all my biggest issues, without my needing to modify how people have set up SCCM currently.

  2. Kevin Holman

    Rick – keep in mind, mine is disabled by default, and you need to enable it for a group, or across the board if putting in a small test environment. Additionally, it logs verbosely, so look at an agent and check the event logs for events.

  3. Rick Bywalski

    One thought I had after getting this working, I found out to my shock that I have machines out there that have SCOM but not SCCM which causes a warning error in the console. So rather than enable the rule globally could I create a discovery rule using MP Studio for machines that have SCCM agent then use that to populate a group then enable the rule for the machines in that group?

    • Kevin Holman

      Absolutely – that’s the intent. However, if you have servers not covered by SCCM, you might want to fix that. 🙂 I have a fragment for registry monitor, that will easily tell you which SCOM agents are missing the SCCM client.

  4. Nico Weytens

    I created something similar by having an MP pick up a custom 999 event a few mins before doing a reboot and create an informational alert for it.
    Our connector would pick up this alert and set this agent in maintenance mode through PowerShell.
    Our largest patch group is about 800 servers, and I never observed performance issues with that. Indeed, for larger environments it could become an issue.

    But since SCOM 2016 has scheduled maintenance mode, we abandoned our previous approach… For our use case the scheduled MM seems straight-forward and a more robust mechanism.

    • kevinholman

      I agree. However I have customers with patch windows, but they don’t do patching every window. Others have global servers and grouping the time zones and patch groups gets tedious or unreliable for them.

  5. Rick Bywalski

    Is there a way to detect if a machine is removed from a collection with a maintenance window and remove it from maintenance mode in SCOM.

    • Kevin Holman

      That would be REALLY hard. Because once the agent is in MM, we no longer run workflows to take any actions. So even if we DID have a workflow running that was not in maintenance mode, and it could detect this change, it would likely not be possible to take any action. You’d almost need to change the way you do your solution, having the MS query the SCCM collections directly.

  6. Prabh

    How do I change the rule that triggers an event on the client side via the .mp to run once a day, or for example every 30 minutes rather than every 5 minutes?

      • Prabh

        Every five minutes an event gets logged in the client’s eventvwr. That can be too many events for the client’s liking. Is there a way to control this?

        • Kevin Holman

          Absolutely. Edit the script so it doesn’t log the events.

          So then I ask, what is the purpose of the OpsMgr event log, if not to log OpsMgr events?

          🙂

  7. Jonas Lenntun

    We did a lot of performance testing with a similar function that also saves objects already in maintenance mode on the server. For example websites or databases.

    One key finding was to avoid the SCOM PowerShell modules to manage Maintenance Mode and work with the SDK instead when running the script. We saw a dramatic decrease in the execution time of the script, which meant we could scale much more.

    • Kevin Holman

      Hi Jonas,

      This solution actually bypasses the built in SCOM agent side MM, and uses SDK commands run on the MS. Have you improved upon how this script works? If so I’d love to know what you did.

      • Jonas Lenntun

        We are doing the same, but we had to remove all OperationsManager PowerShell module references and only load the parts we needed for the script to run. This was a major improvement.

        These are the parts we load.

        [System.Reflection.Assembly]::LoadWithPartialName("Microsoft.EnterpriseManagement.OperationsManager.Common") | Out-Null
        [System.Reflection.Assembly]::LoadWithPartialName("Microsoft.EnterpriseManagement.OperationsManager") | Out-Null
        [System.Reflection.Assembly]::LoadWithPartialName('Microsoft.EnterpriseManagement.Core') | Out-Null
        [System.Reflection.Assembly]::LoadWithPartialName('Microsoft.EnterpriseManagement.Runtime') | Out-Null

        And then continue with the script like this.
        $MgmtServer = $Env:COMPUTERNAME
        $MGConnSetting = New-Object Microsoft.EnterpriseManagement.ManagementGroupConnectionSettings($MgmtServer)
        $MG = New-Object Microsoft.EnterpriseManagement.ManagementGroup($MGConnSetting)
        $Criteria="Microsoft.SystemCenter.Agent"
        $ClassCriteria = New-Object Microsoft.EnterpriseManagement.Configuration.MonitoringClassCriteria("Name='$Criteria'")
        $AgentClass = $MG.GetMonitoringClasses($ClassCriteria)[0]
        $Criteria=$ComputerDisplayName
        $MonitoringObjectCriteria = New-Object Microsoft.EnterpriseManagement.Monitoring.MonitoringObjectCriteria("DisplayName='$Criteria'",$AgentClass)
        $Agent = $MG.GetMonitoringObjects($MonitoringObjectCriteria)

        etc etc….

          • Santosh

            I would just like to know about the script provided by Jonas that uses the SDK to place a server in MM instead of the PowerShell module.
            Do we have the full script for that? I saw that after some lines it ended with etc etc..

  8. Robert

    I had initially implemented Jason Daggett’s solution and hadn’t noticed an issue I am now seeing due to the more complete logging in this solution.
    We have 1 group of servers in its own OU in AD that are XenApp “spinup” servers. These servers have the SCCM agent removed after being “cloned” from a master image but still use SCOM for their monitoring. You mentioned that the PowerShell was easy enough to modify but I am wondering a couple of things before I attempt that again.

    1. Would the code change go near the start of the “Begin MAIN script section” where you determine whether SCCM is installed? (I like the idea of the SCOM warning for a server without SCCM normally, but I need to exclude the spinups).
    I thought of creating a dynamic group in SCOM (it would only contain the servers in that 1 OU in AD that are the spinup servers) and checking if the current system is in that group; if so, then don’t bother with maintenance mode. However, my initial attempt at referencing that dynamic group via PowerShell wasn’t successful.

    2. I then thought about doing an AD lookup and compare whether the current server was in the 1 OU. I am not sure this is the right approach either as every server would be doing an AD lookup every time the script runs.

    Any thoughts and suggestions on the best way to implement this in my situation would be more than welcome.

    Thanks for all the solutions and help that you have provided to this community over the years.
    Regards,
    Robert

    • Kevin Holman

      I would create a dynamic group of Windows Computers where OU=foo, then use this group to disable this MM workflow.

      Problem solved.

      However, it still might trigger once (and throw the error about no SCCM) so you could alternatively match on a specific naming scheme, or do the LDAP lookup. I probably would not like a LDAP lookup triggering over and over. Or, you could make the rule to trigger alerts on missing SCCM agent, a Repeated event detection rule, and only throw an alert after 5 events. Lots of ways to skin this cat.

  9. Robert

    Just for clarification by “disable this MM workflow” do you mean via an override and not in the actual Powershell code?

  10. David Smith

    Thanks for this, it works great. I modified the XML to allow override parameters for the SCCMServiceWindow rule. The issue I have is I incorporated the ProbeAction under the DataSourceModuleType to run the script for SCCMServiceWindow. The rule that uses this data source wants a WriteActions section. As a workaround I set up a ScriptWriteAction that runs a dummy script. Is there a better way to do this?

  11. Rick Bywalski

    Still troubleshooting an issue, but the last round of machines we patched are not coming back out of maintenance mode even though the patching window has expired. I have tried manually removing them and they go back into maintenance mode again. The comments say this pack is what put them in MM.

    • Andy Perry

      Hi Rick,

      Just thinking out loud here so forgive me if this is not the case.

      In your last round of patching, were your Management Servers included in that round? I am wondering if the MS’s have been put into Maintenance Mode, meaning that the workflows wouldn’t work to take the agents out?

      Andrew
