
Using a recovery in OpsMgr – Basic

This is a simple overview of using a recovery for a custom monitor in OpsMgr.

Let's say we create a simple service monitor in OpsMgr. For this example, I will use the Print Spooler service.

Create a new unit monitor, and choose Windows Services > Basic Service Monitor.


Choose an appropriate management pack to save it to, such as a custom Base OS rule MP that you create.

Give it a name, such as “Check Windows Spooler Service”, and choose a valid target, such as “Windows Server”.


Browse for the service name and pick Print Spooler (Spooler).


Accept the defaults for health, and choose whether or not to generate an alert, depending on your requirements.

Once the monitor is created, open its properties in the Authoring tab of the Operations Console and choose the “Diagnostic and Recovery” tab.

Under “Configure Recovery Tasks”, add a recovery for the Critical health state. Choose “Run Command” and click Next.

Give the recovery a name, such as “Restart service”, and click Next.

For the command line settings, we need to provide the path to the file we want to run. For a simple service restart, we can use the NET command, as in “NET START (servicename)”. In the path field, specify only the executable itself, without any command line switches, such as “%windir%\system32\net.exe”.

The “Parameters” field is where we add the command line switches, in this case “start spooler”.
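To make the split between the two fields concrete, here is a small illustrative sketch of how the path and the Parameters string combine into one command line. It is an assumption-laden model, not how the OpsMgr agent actually parses these fields: the helper names `expand_windows_vars` and `build_recovery_command` are hypothetical, the `%NAME%` expansion is done against a caller-supplied environment dictionary, and parameters are split naively on whitespace (quoted arguments are not handled).

```python
import re


def expand_windows_vars(path: str, env: dict[str, str]) -> str:
    """Hypothetical helper: expand %NAME% references case-insensitively.

    Unknown variables are left untouched, e.g. "%unknown%" stays as-is.
    """
    lowered = {key.lower(): value for key, value in env.items()}
    return re.sub(
        r"%([^%]+)%",
        lambda m: lowered.get(m.group(1).lower(), m.group(0)),
        path,
    )


def build_recovery_command(path: str, parameters: str, env: dict[str, str]) -> list[str]:
    """Hypothetical helper: combine the path field and the Parameters field.

    The path holds only the executable; the switches live in Parameters.
    Naive whitespace split: quoted arguments are not supported in this sketch.
    """
    return [expand_windows_vars(path, env)] + parameters.split()


if __name__ == "__main__":
    cmd = build_recovery_command(
        r"%windir%\system32\net.exe", "start spooler", {"WINDIR": r"C:\Windows"}
    )
    print(cmd)  # ['C:\\Windows\\system32\\net.exe', 'start', 'spooler']
```

On a Windows machine, a list like this could be passed to `subprocess.run(cmd)` to reproduce the same `net start spooler` call by hand, which is a quick way to confirm the command works before wiring it into a recovery.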


Click “Create”, then click OK.

Now pick a managed agent and stop the Spooler service. This will cause a state change for the monitor. If you configured the monitor to alert, it will also create an alert at this time. As soon as the state change occurs, our recovery will run, which should restart the service.

Check the System event log to view the activity. I got the following two events:

Event Type:    Information
Event Source:    Service Control Manager
Event Category:    None
Event ID:    7036
Date:        3/26/2008
Time:        1:24:44 AM
User:        N/A
Computer:    OMTERM
Description:
The Print Spooler service entered the stopped state.

Event Type:    Information
Event Source:    Service Control Manager
Event Category:    None
Event ID:    7036
Date:        3/26/2008
Time:        1:25:04 AM
User:        N/A
Computer:    OMTERM
Description:
The Print Spooler service entered the running state.

So the service was down for about 20 seconds, which is how long it took the monitor to detect the unhealthy state and then run the recovery to restart the service.
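The 20-second figure is simply the gap between the two 7036 event timestamps above. A short sketch of that arithmetic, using the dates and times copied from the events (the format string assumes the US-style "M/D/YYYY h:mm:ss AM" layout shown in the log):

```python
from datetime import datetime

# Timestamps copied from the two Service Control Manager 7036 events above.
FMT = "%m/%d/%Y %I:%M:%S %p"
stopped = datetime.strptime("3/26/2008 1:24:44 AM", FMT)
running = datetime.strptime("3/26/2008 1:25:04 AM", FMT)

# Time between "entered the stopped state" and "entered the running state".
downtime = running - stopped
print(f"Spooler was down for {downtime.total_seconds():.0f} seconds")  # Spooler was down for 20 seconds
```

This window covers both the monitor noticing the stopped service and the recovery command running, so it will vary with the monitor's configuration and agent load.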

Open Health Explorer for the computer object of the test machine, and find the monitor we created (“Check Windows Spooler Service” in this example). It should show as healthy if the recovery worked. Select this monitor, and then click the “State Change Events” tab. The last logged state change should show that the service is currently running. Select the “Service is Not running” state change just below the current one; in the details pane, we should see the output of the recovery task, which ran automatically and logged its output.


So what if we want a more advanced recovery? Perhaps we have a service that doesn't always start reliably on the first try, and we want to try to start it three times over a 3-minute period, and only THEN create the alert. This can be done, but it requires a custom script that provides this logic and then either creates the alert, or creates an event from which a rule generates the alert.
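The retry logic described above can be sketched as follows. This is only a model of the control flow, under stated assumptions: real OpsMgr recovery scripts are typically VBScript or PowerShell, and the callables `start_service`, `is_running`, and `on_give_up` are hypothetical placeholders for "run `net start`", "query the service state", and "write an event that a rule alerts on". Three attempts with a 60-second wait each gives the 3-minute window from the example.

```python
import time
from typing import Callable


def recover_with_retries(
    start_service: Callable[[], None],
    is_running: Callable[[], bool],
    on_give_up: Callable[[int], None],
    attempts: int = 3,
    wait_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Try to start a service up to `attempts` times, waiting between tries.

    Returns True as soon as the service is running. If every attempt fails,
    calls on_give_up(attempts) (hypothetically: write an event for a rule
    to alert on) and returns False.
    """
    for _ in range(attempts):
        start_service()
        sleep(wait_seconds)  # give the service time to come up before checking
        if is_running():
            return True
    on_give_up(attempts)
    return False


if __name__ == "__main__":
    state = {"calls": 0}

    def fake_start() -> None:
        state["calls"] += 1

    def fake_running() -> bool:
        return state["calls"] >= 2  # pretend the second start attempt succeeds

    ok = recover_with_retries(
        fake_start,
        fake_running,
        lambda n: print(f"giving up after {n} attempts"),
        sleep=lambda s: None,  # skip real waiting in this demo
    )
    print(ok, state["calls"])  # True 2
```

Passing `sleep` in as a parameter keeps the waiting logic testable; in a real script it would just be the normal sleep call, and the "give up" step is where the event or alert would be raised.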

13 Comments

  1. Gurunath Reddy

    Hi Kevin,
    Need your help here-
    I want to run a recovery task that automatically logs in through RDP with the service account to an application server whenever it is logged out or restarted for any reason. I have a script that executes remotely and logs in to that server successfully. But when I add it as a recovery task, it shows as succeeded even though it actually didn't work. I suspect this is because the recovery task runs on the application server directly, since the monitor is targeted to it. Is there a way to execute the recovery task on a server other than the monitoring target?

    Thank you
    guru

  2. Dave

    The one thing I’ve never been able to understand is why the Diagnostic and Recovery tab is missing from a log monitor. Even in SCOM 2019 the tab is still missing. Is there an easy way to add diagnostic and recovery tasks to a log monitor? Do we know whether the tab might be added in 2019 UR1?

    • Kevin Holman

      Wow, I had no idea that was the case.

      Nothing will get into a UR without a customer raising a bug escalation/case. Have you opened a case requesting this, or opened a UserVoice item?

      Which example of a “log monitor” are you using?

  3. DAVID YANEZ

    Using a simple event detection, timer reset. I have not opened a case or a UserVoice item yet. Essentially, I'm looking to monitor an SCCM log file for when patches fail to download, and to run a recovery task when the criteria are met.

    • Kevin Holman

      You cannot. Alerts are sent on the statechange. Recoveries are triggered on the statechange. If you need data output from the recovery, then drop events using a recovery script and generate a new alert from those events, or develop a composite datasource that includes monitoring AND recovery actions in the datasource, before the statechange. Alerts can only use what is in the statechange context output.

  4. Vishal Shetty

    Thanks for the prompt response, Kevin. The same answer applies to diagnostic tasks as well, right? Or is there a way to transfer the output of a diagnostic task to the alert description using a property bag?

  5. Amit Kumar GUpta

    Hello All,
    I want to restart the Health Service from the SCOM console itself whenever it goes into a “stopping”, “starting”, or “stopped” state. I know it can be done from SCOM. Could you please help me with how to do that? Is a script needed for that? If so, where do we put the script in SCOM?
