
Using a recovery in OpsMgr – Basic

This is a simple overview of using a recovery for a custom monitor in OpsMgr.

Let's say we create a simple service monitor in OpsMgr… for this example – I will use the Print Spooler service:

Create a new unit monitor, and choose Windows Services – Basic Service Monitor:

image

Choose an appropriate management pack to save it to… such as a Base OS custom rule MP you create.

Give it a name – such as “Check Windows Spooler Service” and choose a valid target, such as “Windows Server”

image

Browse the service name – and pick the Print Spooler (Spooler):

image

Accept defaults for health, and let it create an alert, or not – depending on your requirements.

Once the monitor is created…. open it up in the Authoring tab of the Ops console.  Choose the “Diagnostic and Recovery” tab.

Under “Configure Recovery Tasks” add a recovery for Critical Health State.  Choose “Run Command” and click Next.

Give the recovery a name…. such as “Restart service” and click Next.

For the command line settings… we need to provide the full path to the file we want to run.  For a simple service restart – we can use the “NET” command, as in “NET START (servicename)”.  For the path – specify only the executable itself, with no command line switches…. such as:  “%windir%\system32\net.exe”

Under “Parameters” – this is where we will add the command line switches…. such as “start spooler” in this case:

image

Click “Create”, then click OK.
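To make the split between the path and the parameters concrete, here is a minimal Python sketch of how the two fields combine into one command. It is only an illustration and not how the OpsMgr agent actually launches the process; `build_command` and the `env` mapping are hypothetical names made up for this example.

```python
import re


def build_command(full_path, parameters, env):
    """Combine the 'Full path to file' and 'Parameters' fields into an argument list.

    env is a plain dict of lower-case variable names to values (an assumption
    for this sketch, so the example does not depend on the real environment).
    """
    # Expand %NAME% style variables; names missing from env are left untouched.
    expanded = re.sub(
        r"%(\w+)%",
        lambda match: env.get(match.group(1).lower(), match.group(0)),
        full_path,
    )
    # Simple whitespace split: enough for "start spooler", no quote handling.
    return [expanded] + parameters.split()


command = build_command(
    r"%windir%\system32\net.exe",
    "start spooler",
    {"windir": r"C:\Windows"},
)
print(command)
```

The executable stays in the first element and every switch becomes its own argument, which is why the path field should hold only the path.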

Now – pick a managed agent – and stop the Spooler service.  This will create a state change for the monitor.  If you told the monitor to alert – it will also create an alert at this time.  As soon as the state change occurs, our recovery will run…. which should restart the service.

Check the system event log to view the activity.  I got the following two events:

Event Type:    Information
Event Source:    Service Control Manager
Event Category:    None
Event ID:    7036
Date:        3/26/2008
Time:        1:24:44 AM
User:        N/A
Computer:    OMTERM
Description:
The Print Spooler service entered the stopped state.

Event Type:    Information
Event Source:    Service Control Manager
Event Category:    None
Event ID:    7036
Date:        3/26/2008
Time:        1:25:04 AM
User:        N/A
Computer:    OMTERM
Description:
The Print Spooler service entered the running state.

So the service was down for about 20 seconds…. the time it took the monitor to detect the unhealthy state, and then run the recovery to restart the service.
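If you want to check that arithmetic against your own events, a small Python sketch like the one below works on the Date and Time fields as they appear above. The function name `downtime_seconds` and the format string are assumptions for this example, and the format assumes US-style dates with AM/PM as shown in these two events.

```python
from datetime import datetime

# Matches "3/26/2008 1:24:44 AM" as shown in the two 7036 events above.
EVENT_TIME_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def downtime_seconds(stopped_at, running_at):
    """Return the seconds between the 'stopped' event and the 'running' event."""
    stopped = datetime.strptime(stopped_at, EVENT_TIME_FORMAT)
    running = datetime.strptime(running_at, EVENT_TIME_FORMAT)
    return (running - stopped).total_seconds()


print(downtime_seconds("3/26/2008 1:24:44 AM", "3/26/2008 1:25:04 AM"))  # 20.0
```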

Open Health Explorer for the computer object for the test machine, and find the “Check Windows Spooler Service” monitor we created.  It should show up as healthy… if the recovery worked.  Select this monitor, and then click the “State Change Events” tab.  We should see that the last logged state change shows the service is currently running.  Find the “Service is Not running” state change just below the current one…. and in the details pane – we should be able to see the recovery output, where the recovery task ran automatically and logged its output:

image

So what if we want a more advanced recovery?  Perhaps we have a service that just doesn’t always start reliably on the first try.  Perhaps we want to try to start the service three times over a 3 minute period, and THEN create the alert?   This can be done…. but it has to be done using a custom script that provides this logic, and that then either creates the alert itself, or creates an event that a separate rule will alert on.
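As a rough sketch of that retry logic (written in Python for readability; a real recovery script would more likely be VBScript or PowerShell), something like the following could work. Everything named here is hypothetical: `recover_service`, its parameters, and the two helper functions. The defaults shell out to `net start` and `sc query`, and assume the `sc query` output contains the word RUNNING when the service is up.

```python
import subprocess
import time


def _net_start(service):
    # Same command the basic recovery above uses: NET START <servicename>.
    subprocess.run(["net", "start", service], check=False)


def _sc_is_running(service):
    # Assumption: the STATE line of "sc query" contains RUNNING when the service is up.
    result = subprocess.run(["sc", "query", service],
                            capture_output=True, text=True, check=False)
    return "RUNNING" in result.stdout


def recover_service(service, attempts=3, wait_seconds=60,
                    start=_net_start, is_running=_sc_is_running,
                    sleep=time.sleep):
    """Try to start a service up to `attempts` times, waiting between tries.

    Returns True if the service ends up running, False otherwise.
    With the defaults this is three tries spread over about three minutes.
    """
    for _ in range(attempts):
        if is_running(service):
            return True
        start(service)
        sleep(wait_seconds)
    return is_running(service)
```

A recovery script built this way would call `recover_service("spooler")` and, only if it returns False, write the event that the alert rule is looking for. The start, check, and sleep steps are passed in as functions so the logic can be exercised without touching a real service.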

23 Comments

  1. Gurunath Reddy

    Hi Kevin,
    Need your help here-
    I want to run a recovery task that automatically logs in through RDP with the service account to an application server when it is logged out or restarted for any reason. I have a script which executes remotely and logs in to that server successfully. But when I add it as a recovery task, it shows as succeeded when it actually did not. I suspect this is because the recovery task is being run on the application server directly, since the monitor is targeted to it. Is there a way to execute the recovery task on a server other than the monitoring target?

    Thank you
    guru

  2. Dave

    the one thing I’ve never been able to understand is why the diagnostic and Recovery tab is missing from a log monitor. Even on scom 2019 the tab is still missing. Is there an easy way to add Diagnostic and Recovery tasks for a log monitor? Do we know if perhaps the tab will be added in 2019 UR1?

    • Kevin Holman

      Wow, I had no idea that was the case.

      Nothing will get into a UR, without a customer raising a bug escalation/case. Have you opened a case requesting this? Or opened a uservoice?

      Which example of a “log monitor” are you using?

  3. DAVID YANEZ

    using a simple event detection, timer reset. have not opened a case yet or a uservoice. essentially looking to monitor a sccm log file when patches fail to download….and when the criteria is met run a recovery task.

    • Kevin Holman

      You cannot. Alerts are sent on the statechange. Recoveries are triggered on the statechange. If you need data output from the recovery, then drop events using a recovery script and generate a new alert from those events, or develop a composite datasource that includes monitoring AND recovery actions in the datasource, before the statechange. Alerts can only use what is in the statechange context output.

  4. Vishal Shetty

    Thanks for this prompt response Kevin. The same answer applies for the Diagnostic task as well, right? Or is there a way to transfer the output of a diagnostic task to the alert description using a property bag?

  5. Amit Kumar GUpta

    Hello All,
    I want to restart the Health service from the SCOM console itself whenever it goes into a “stopping, starting or stopped” state. I know it can be done from SCOM. Could you please help me with how to do that? Is a script needed for that? If yes, then where do we put the script in SCOM? Please help me with that.

  6. curtiss

    i object to the scom community’s use of the word “restart” when describing a recovery that *START*s a service that has already stopped. i don’t “restart” a laptop when it’s powered off, do i? 🙂 i want to *restart a service when it hits a CPU threshold; that is “stop it and start it”. any examples of this? from a command line i can do “net stop spooler & net start spooler” but scom doesn’t seem to like this. do i have to create a recovery on the CPU monitor that NET STOPs the service and another recovery on the availability monitor that NET STARTs the stopped service? seems like that would be prone to delay.

    • Kevin Holman

      1. This is an odd rant. I give your rant 2 out of 5 stars. Your rant needs a restart. 🙂

      2. You can do net start & net stop, but I dont like using it….. it doesn’t handle errors. I recommend looking at my service monitoring fragments with recoveries and adapt the script. The first thing I do is get the current running state, you could easily modify it to handle the stop, add sleeps and checks like I do after startup attempt.

    • Curtiss

      full path to file:
      c:\windows\system32\windowspowershell\v1.0\powershell.exe

      Parameters:
      -command "&{restart-service $Target/Property[Type="MicrosoftSystemCenterNTServiceLibrary!Microsoft.SystemCenter.NTService"]/ServiceName$}"

      working directory:

      c:\windows\system32\windowspowershell\v1.0\

    • Kevin Holman

      No, not easily. The alert is generated by the StateChange on a monitor. The diagnostics and recoveries are also triggered by the statechange.

      The only way to get this data into the alert, would be to have some mechanism where the recovery runs, and outputs this data as context into something like an event. Then a rule could be written to look for that event, then run a script writeaction on the management server to find the previous alert, and modify something like the custom fields, adding the data.

      The best practice, if you do not like an alert and wish to have more context inside the alert, is to re-write the monitor or rule to capture the additional data in the datasource, and output that data to the alert description.

  7. Joe P

    Hey Kevin, in working with fragments and building MP’s when we put recovery tasks in them we do not see them show up under properties. Is this expected behavior as part of the MP or should the recovery task show up in the properties.

  8. NVKumar

    Hello Kevin,

    Please provide guidance on how to attach this diagnostic output which can be seen in SCOM console can be send to SCOM Email notification as well along with alert notification?

    • Kevin Holman

      You cant. Alerts are already generated when diagnostics are run. You’d need to re-write the monitor to include diagnostics and outputs in the initial datasource.

  9. Manoj S

    Hi Kevin

    Will there be any performance impact on the Management Servers if we have around 15-20 Recovery Tasks configured? Also, can I use Invoke-Command to make a script run on the Management Server itself?
