
Monitoring Processes in SCOM

Monitoring a process in SCOM can be pretty straightforward or pretty tricky, depending on the application. In this article, I will run through some examples of how to be successful, and what to avoid.

 

First, SCOM provides a “Process Monitoring” template in the console that works pretty well. It has a nice wizard-based UI that lets you pick each process by name and alert when a process is not running when it is expected to be, or is running when it is not expected to be. It also covers the duration of the running process, minimum and maximum expected process counts, and CPU and Memory monitors for each process.

There are a couple of downsides to our process monitoring template. The biggest is that it uses a group and enables discovery of the “process class” for the members of that group. If the group membership changes and servers are removed from the group, we do not “undiscover” the process class instances on those servers. This can get messy over time, but simplicity comes at a cost.

Another challenge: what if you wish to monitor a counter that isn’t CPU or Memory, such as Handle Count? The template can’t help there, so you are on your own.

You could always use one of the Windows Performance Counter monitor wizards in the console, such as the Consecutive Samples Over Threshold monitor, and then choose the performance counter you wish to monitor.

[Screenshot: choosing the performance counter in the Consecutive Samples Over Threshold monitor wizard]

This works great, UNLESS there are multiple instances of the process. If there will only ever be a single instance of “MonitoringHost” running, the monitor works exactly as intended. However, if there are TWO MonitoringHost (or whatever process) instances running, and one is over the threshold while the other is not, bad things happen. The samples from the two processes are evaluated in series, which makes the monitor behave erratically and “flip flop” back and forth multiple times within a single second. This does bad things, like opening an alert and then immediately closing it… every 60 seconds!
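To see why the flip flop happens, here is a toy Python model (not SCOM code) of a two-state threshold monitor that evaluates each process instance's sample one after another, the way a per-item filter would. It assumes the instances always arrive in the same order; the 5000 Handle Count threshold and the sample values are made up for illustration.

```python
"""Toy model of a two-state threshold monitor fed one data item per
process instance. This is NOT SCOM code; it only mimics a filter that
evaluates each data item in sequence and changes state on each one."""

THRESHOLD = 5000  # hypothetical Handle Count threshold


def sequential_states(samples_per_interval, threshold=THRESHOLD):
    """Return the state changes produced when every instance's sample
    is evaluated one after another within each polling interval."""
    state = "Healthy"
    changes = []
    for interval in samples_per_interval:
        for value in interval:  # one data item per process instance
            new_state = "Critical" if value > threshold else "Healthy"
            if new_state != state:
                changes.append(new_state)
                state = new_state
    return changes


# Two MonitoringHost instances: one above the threshold, one below it,
# sampled over three polling intervals.
print(sequential_states([[7200, 1800]] * 3))
```

With one instance over and one under the threshold, every polling interval produces a Critical change followed immediately by a Healthy change, which is the open-then-close alert pattern described above.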

 

To resolve these issues, I prefer to target discovered application classes for process monitoring. I have created some Management Pack Fragments to help with this.

First off is the fragment: Monitor.Process.mpx

This fragment monitors whether the number of running instances of a process is within the minimum and maximum expected counts. The monitor uses the built-in System.ProcessInformationProvider to get information about the process, and lets you input important values such as ProcessName, MinProcessCount, and MaxProcessCount. It also has a configurable frequency for how often to check, and a configurable number of consecutive samples that must match before alerting, to filter out temporary, transient conditions.

When you load this fragment into Visual Studio or Silect MP Author, you only need to provide a few values:

[Screenshot: the values to provide for the Monitor.Process.mpx fragment]
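The count check that the Monitor.Process.mpx fragment performs can be sketched in Python as follows. The class, field, and method names are hypothetical and are not taken from the fragment's XML; the sketch also assumes that a single in-range sample immediately returns the monitor to Healthy, which may differ from how the real fragment recovers.

```python
"""Sketch of a min/max process-count check with a consecutive-samples
gate, loosely modeled on the Monitor.Process.mpx description.
All names here are hypothetical, not the fragment's actual XML."""

from dataclasses import dataclass


@dataclass
class ProcessCountMonitor:
    process_name: str      # e.g. "MonitoringHost.exe" (illustrative)
    min_count: int         # plays the role of MinProcessCount
    max_count: int         # plays the role of MaxProcessCount
    match_count: int = 3   # consecutive bad samples required to alert
    _bad_streak: int = 0   # internal counter of consecutive bad samples
    state: str = "Healthy"

    def evaluate(self, running_count: int) -> str:
        """Feed one polling sample (number of running instances)."""
        in_range = self.min_count <= running_count <= self.max_count
        if in_range:
            # Assumption: one good sample resets the streak and recovers.
            self._bad_streak = 0
            self.state = "Healthy"
        else:
            self._bad_streak += 1
            if self._bad_streak >= self.match_count:
                self.state = "Critical"
        return self.state


mon = ProcessCountMonitor("MonitoringHost.exe", min_count=1, max_count=4)
# A two-sample dip to zero is ignored; three in a row raises Critical.
states = [mon.evaluate(count) for count in [2, 0, 0, 2, 0, 0, 0]]
print(states)
```

The consecutive-samples gate is what keeps a brief restart of the process (a transient count of zero) from opening an alert.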

 

Next is the fragment: Monitor.Process.Performance.ConsecSamples.TwoState.mpx

This is a performance fragment optimized for Windows processes. It lets you monitor any performance counter for a process, regardless of whether one or several instances of the process are running. It uses a process module included with SCOM: the Microsoft.SystemCenter.Process.ConsecutiveSamplesThreshold.ErrorOnTooHigh monitor type, which ships in Microsoft.SystemCenter.ProcessMonitoring.Library. You simply need to provide the basic data for it to work:

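[Screenshot: the values to provide for the Monitor.Process.Performance.ConsecSamples.TwoState.mpx fragment]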

 

What makes this monitor type able to handle multiple instances? It uses a special ConditionDetection. Most monitors in SCOM end up using the System.ExpressionFilter ConditionDetection. However, one of the challenges with this ConditionDetection is that it processes the multiple data items passed to it one at a time, in sequential order, which causes the “flip flop”. Instead, this example uses System.LogicalSet.ExpressionFilter. What makes this CD special is that it receives the data items from the data source as a “set”, and then passes or blocks based on whether “Any” data item matches a condition, or whether “All” data items match it. This is very useful when the data source outputs multiple data items, such as when multiple instances of a process exist. We can say the monitor is healthy when “ALL” of the processes’ performance counters are at or under the threshold, and unhealthy if “ANY” process breaches the threshold, with zero flip flop. Here is an example:

<!-- Healthy: ALL instances in the set are at or below the threshold; an empty set passes through. -->
<ConditionDetection ID="ThresholdNotBreached" TypeID="System!System.LogicalSet.ExpressionFilter">
  <Expression>
    <SimpleExpression>
      <ValueExpression>
        <XPathQuery Type="Double">Value</XPathQuery>
      </ValueExpression>
      <Operator>LessEqual</Operator>
      <ValueExpression>
        <Value Type="Double">$Config/Threshold$</Value>
      </ValueExpression>
    </SimpleExpression>
  </Expression>
  <EmptySet>Passthrough</EmptySet>
  <SetEvaluation>All</SetEvaluation>
</ConditionDetection>
<!-- Unhealthy: ANY instance in the set is above the threshold; an empty set is blocked. -->
<ConditionDetection ID="ThresholdBreached" TypeID="System!System.LogicalSet.ExpressionFilter">
  <Expression>
    <SimpleExpression>
      <ValueExpression>
        <XPathQuery Type="Double">Value</XPathQuery>
      </ValueExpression>
      <Operator>Greater</Operator>
      <ValueExpression>
        <Value Type="Double">$Config/Threshold$</Value>
      </ValueExpression>
    </SimpleExpression>
  </Expression>
  <EmptySet>Block</EmptySet>
  <SetEvaluation>Any</SetEvaluation>
</ConditionDetection>

This results in a single state change, regardless of how many processes exist:

[Screenshot: monitor state change history showing a single state change]
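The set-based evaluation in the XML example can be modeled in a few lines of Python. This is an illustration, not SCOM's implementation: `threshold_not_breached` and `threshold_breached` are hypothetical stand-ins for the two condition detections, with Python's `all()` and `any()` playing the role of SetEvaluation All and Any. Conveniently, `all([])` is `True` and `any([])` is `False`, which mirrors EmptySet Passthrough on the healthy filter and EmptySet Block on the unhealthy one.

```python
"""Toy model of the two System.LogicalSet.ExpressionFilter condition
detections: all data items from one poll are evaluated as a single set,
instead of item by item. Not SCOM code."""


def threshold_not_breached(values, threshold):
    # SetEvaluation=All, EmptySet=Passthrough: all([]) is True.
    return all(v <= threshold for v in values)


def threshold_breached(values, threshold):
    # SetEvaluation=Any, EmptySet=Block: any([]) is False.
    return any(v > threshold for v in values)


def set_states(samples_per_interval, threshold):
    """Return state changes when each poll's data items are evaluated
    as one set, giving at most one state per poll."""
    state = "Healthy"
    changes = []
    for interval in samples_per_interval:
        new_state = state  # if neither filter matches, keep the state
        if threshold_breached(interval, threshold):
            new_state = "Critical"
        elif threshold_not_breached(interval, threshold):
            new_state = "Healthy"
        if new_state != state:
            changes.append(new_state)
            state = new_state
    return changes


print(set_states([[7200, 1800]] * 3, threshold=5000))  # ['Critical']
print(set_states([[7200, 1800], [1200, 1800]], threshold=5000))  # ['Critical', 'Healthy']
```

With one instance over the threshold and one under, three polls produce a single Critical change; the monitor only returns to Healthy once every instance is at or below the threshold.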

 

You can try this fragment and more using Visual Studio or Silect MP Author. Find out more at:

Authoring Management Packs – the fast and easy way, using Visual Studio??? – Kevin Holman’s Blog

Management Pack authoring the REALLY fast and easy way, using Silect MP Author and Fragments – Kevin Holman’s Blog

MP Author Professional – Silect Software

VSAE Made Easy with Fragments – YouTube

Management Pack Authoring using Fragments Webinar – YouTube

5 Comments

  1. Rodrigo

    Hello Kevin
    A question about process monitoring: SCOM provides Process Monitoring out of the box. My question is about monitoring a process that is in a “not responding” state.
    Even if the process is still in memory, would it be able to detect this state?

    Thank you.

    • Kevin Holman

      Monitoring whether a process is responding has very little to do with whether the process exists. It has to do with “how can you interact with the process to determine whether it is healthy or not”. Normally, the process needs some method of interaction: either providing an API, or responding to stimulus in some manner. The typical approach is then to write a PowerShell monitor that interacts with the process and measures its response.

  2. Mike Lovasco

    Hi Kevin – thank you for this. I have a team that would like to pull CPU/Mem metrics for a process into another system from SCOM, which I can do via SQL, but can’t figure out the query to periodically pull this info. Do you have a template for pulling process performance from the DB? Thanks!
