Monitoring a Process in SCOM can be pretty straightforward, or it can be pretty tricky depending on the application. In this article, I will run through some examples of how to be successful, and what to avoid.
First, SCOM provides a “Process Monitoring” template in the console that works pretty well. It has a nice wizard-based UI that lets you pick each process by name and alert when a process is not running when expected to be, or is running when not expected. It also covers the duration of the running process, minimum and maximum expected process counts, and CPU and Memory monitors for each process.
There are a couple of downsides to our process monitoring template. The biggest is that it uses a group, and enables discovery of the “process class” for members of that group. If the group membership changes and servers are removed from the group, we do not “undiscover” the process class instances on those servers. This can become messy over time, but simplicity comes at a cost.
Another challenge: what if you wish to monitor a counter that isn’t CPU or Memory, such as Handle Count? You are on your own.
You could always use one of the Windows Performance Counter monitor wizards in the console, such as the Consecutive Samples Over Threshold monitor, and then choose the performance counter you wish to monitor.
This works great, UNLESS there are multiple instances of the process. When there will only ever be a single instance of “MonitoringHost” running, the monitor works exactly as intended. However, if there are TWO MonitoringHost (or whatever process) instances running, and one process is over the threshold while the other is not, bad things happen. With two processes, the samples are evaluated one after the other, which makes the monitor behave erratically and “flip flop” back and forth multiple times in a single second. This can open an alert and then immediately close it, every 60 seconds!
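The per-item behavior comes from how a conventional threshold check is built. Most SCOM monitors end up using System.ExpressionFilter condition detections, typically as a healthy/unhealthy pair like the sketch below (the IDs are illustrative, and Threshold is assumed to be a monitortype configuration parameter). Each filter evaluates every incoming data item on its own, so when two process instances arrive in the same sample, one item can pass the unhealthy filter and the next can pass the healthy filter, changing the state twice.

```xml
<!-- Healthy detection: passes EACH data item whose Value is at or under the threshold -->
<ConditionDetection ID="UnderThreshold" TypeID="System!System.ExpressionFilter">
  <Expression>
    <SimpleExpression>
      <ValueExpression>
        <XPathQuery Type="Double">Value</XPathQuery>
      </ValueExpression>
      <Operator>LessEqual</Operator>
      <ValueExpression>
        <Value Type="Double">$Config/Threshold$</Value>
      </ValueExpression>
    </SimpleExpression>
  </Expression>
</ConditionDetection>
<!-- Unhealthy detection: passes EACH data item whose Value is over the threshold -->
<ConditionDetection ID="OverThreshold" TypeID="System!System.ExpressionFilter">
  <Expression>
    <SimpleExpression>
      <ValueExpression>
        <XPathQuery Type="Double">Value</XPathQuery>
      </ValueExpression>
      <Operator>Greater</Operator>
      <ValueExpression>
        <Value Type="Double">$Config/Threshold$</Value>
      </ValueExpression>
    </SimpleExpression>
  </Expression>
</ConditionDetection>
```

With a single instance, only one of these filters can pass per sample; with several instances, both can pass within the same sample, one item at a time.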
To avoid these issues, I prefer targeting discovered application classes for process monitoring. I have created some Management Pack fragments to help with this.
First off is the fragment: Monitor.Process.mpx
This fragment monitors whether the number of running processes is within the thresholds for the minimum and maximum expected process count. The monitor uses the built-in System.ProcessInformationProvider to get information about the process, and lets you input important information like ProcessName, MinProcessCount, and MaxProcessCount. It also has a configurable frequency for how often to check, and a number of consecutive samples to check before alerting, to filter out transient conditions.
When you load this fragment into Visual Studio or Silect MP Author, you only need to replace or provide a limited amount of information.
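As a rough illustration of what those inputs look like once filled in, here is a hypothetical Configuration block for a monitor built from the fragment. ProcessName, MinProcessCount, and MaxProcessCount are the inputs named above; IntervalSeconds and ConsecutiveSamples are assumed element names standing in for the frequency and consecutive-sample settings. Check the fragment itself for the actual names, and for whether the process name should include the .exe extension.

```xml
<!-- Hypothetical monitor configuration. Element names other than ProcessName,
     MinProcessCount and MaxProcessCount are assumptions, not taken from the fragment. -->
<Configuration>
  <!-- Process to count (verify whether the fragment expects the .exe extension) -->
  <ProcessName>MonitoringHost.exe</ProcessName>
  <!-- Minimum expected number of running instances -->
  <MinProcessCount>1</MinProcessCount>
  <!-- Maximum expected number of running instances -->
  <MaxProcessCount>5</MaxProcessCount>
  <!-- Assumed name: how often to check, in seconds -->
  <IntervalSeconds>60</IntervalSeconds>
  <!-- Assumed name: consecutive out-of-range samples required before changing state -->
  <ConsecutiveSamples>3</ConsecutiveSamples>
</Configuration>
```

Assuming the minimum and maximum are inclusive bounds, these values would keep the monitor healthy while one to five instances are running, checked every 60 seconds, and change state only after three consecutive samples fall outside that range.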
Next is the fragment: Monitor.Process.Performance.ConsecSamples.TwoState.mpx
This is a performance fragment optimized for Windows processes. It allows you to monitor any performance counter for a process, regardless of whether one or more instances of the process are running. It uses a process module included with SCOM, the Microsoft.SystemCenter.Process.ConsecutiveSamplesThreshold.ErrorOnTooHigh monitortype, which is included in Microsoft.SystemCenter.ProcessMonitoring.Library. You simply need to provide the basic data for it to work.
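Because the monitortype lives in Microsoft.SystemCenter.ProcessMonitoring.Library, a management pack that uses it needs a reference to that library. A minimal sketch of the manifest reference is below. The alias name is arbitrary, and the version number is a placeholder that you would set to the version of the library imported in your management group.

```xml
<!-- Goes inside <Manifest><References>. "ProcessMonitoring" is an arbitrary alias. -->
<Reference Alias="ProcessMonitoring">
  <ID>Microsoft.SystemCenter.ProcessMonitoring.Library</ID>
  <!-- Placeholder: set to the version of this library in your management group -->
  <Version>7.0.0.0</Version>
  <!-- Public key token used by Microsoft-sealed management packs -->
  <PublicKeyToken>31bf3856ad364e35</PublicKeyToken>
</Reference>
```

A monitor then points at the monitortype through that alias, for example TypeID="ProcessMonitoring!Microsoft.SystemCenter.Process.ConsecutiveSamplesThreshold.ErrorOnTooHigh".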
What makes this monitortype special enough to handle multiple instances? It uses a special ConditionDetection. Most monitors in SCOM end up using the System.ExpressionFilter ConditionDetection. However, one of the challenges with this ConditionDetection is that it processes multiple data items passed to it one at a time, in sequential order, which causes the “flip flop”. Instead, in this example we are using System.LogicalSet.ExpressionFilter. What makes this CD special is that it receives the data items from the data source as a “set”, and then passes or blocks based on whether “Any” of the data items match a condition, or requires that “All” of them match. This is very useful when the data source outputs multiple data items, such as when multiple instances of a process exist. We can say the monitor is healthy when “ALL” process performance counters are at or under a threshold, but unhealthy if “ANY” process breaches the threshold, with zero flip flop. Here is an example:
<ConditionDetection ID="ThresholdNotBreached" TypeID="System!System.LogicalSet.ExpressionFilter">
  <Expression>
    <SimpleExpression>
      <ValueExpression>
        <XPathQuery Type="Double">Value</XPathQuery>
      </ValueExpression>
      <Operator>LessEqual</Operator>
      <ValueExpression>
        <Value Type="Double">$Config/Threshold$</Value>
      </ValueExpression>
    </SimpleExpression>
  </Expression>
  <EmptySet>Passthrough</EmptySet>
  <SetEvaluation>All</SetEvaluation>
</ConditionDetection>
<ConditionDetection ID="ThresholdBreached" TypeID="System!System.LogicalSet.ExpressionFilter">
  <Expression>
    <SimpleExpression>
      <ValueExpression>
        <XPathQuery Type="Double">Value</XPathQuery>
      </ValueExpression>
      <Operator>Greater</Operator>
      <ValueExpression>
        <Value Type="Double">$Config/Threshold$</Value>
      </ValueExpression>
    </SimpleExpression>
  </Expression>
  <EmptySet>Block</EmptySet>
  <SetEvaluation>Any</SetEvaluation>
</ConditionDetection>
This results in a single state change, regardless of how many process instances exist.
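To see where the two condition detections plug in, here is a minimal sketch of a two-state UnitMonitorType that wires them to health states through RegularDetections. It is not the Microsoft.SystemCenter.Process.ConsecutiveSamplesThreshold.ErrorOnTooHigh monitortype itself: it leaves out the consecutive-samples logic, and the monitortype ID, the state IDs, and the Example.Process.PerfDataSource data source are hypothetical placeholders.

```xml
<!-- Sketch only: all Example.* IDs and the state IDs are hypothetical placeholders -->
<UnitMonitorType ID="Example.Process.Perf.Threshold.MonitorType" Accessibility="Internal">
  <MonitorTypeStates>
    <MonitorTypeState ID="UnderThreshold" NoDetection="false" />
    <MonitorTypeState ID="OverThreshold" NoDetection="false" />
  </MonitorTypeStates>
  <Configuration>
    <xsd:element minOccurs="1" name="Threshold" type="xsd:double" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />
  </Configuration>
  <MonitorImplementation>
    <MemberModules>
      <!-- Hypothetical data source that returns one performance data item per
           process instance, delivered together as a single set -->
      <DataSource ID="DS" TypeID="Example.Process.PerfDataSource" />
      <!-- Place the ThresholdNotBreached and ThresholdBreached
           System.LogicalSet.ExpressionFilter condition detections here -->
    </MemberModules>
    <RegularDetections>
      <!-- Healthy only when ALL instances are at or under the threshold -->
      <RegularDetection MonitorTypeStateID="UnderThreshold">
        <Node ID="ThresholdNotBreached">
          <Node ID="DS" />
        </Node>
      </RegularDetection>
      <!-- Unhealthy when ANY instance is over the threshold -->
      <RegularDetection MonitorTypeStateID="OverThreshold">
        <Node ID="ThresholdBreached">
          <Node ID="DS" />
        </Node>
      </RegularDetection>
    </RegularDetections>
  </MonitorImplementation>
</UnitMonitorType>
```

Because each detection evaluates the whole set from one sample, "all at or under" and "any over" are complementary, so only one of the two detections can pass per sample. With the EmptySet settings in the example, an empty set (for instance, no running instances of the process) passes the healthy filter and is blocked by the unhealthy one.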
You can try this fragment and more using Visual Studio or Silect MP Author. Find out more at