Monitoring a Process in SCOM can be pretty straightforward, or it can be pretty tricky depending on the application. In this article, I will run through some examples of how to be successful, and what to avoid.
First, SCOM provides a “Process Monitoring” template in the console that works pretty well. It has a nice wizard-based UI that lets you pick each process by name and alert when a process is not running when it is expected to be, or is running when it is not expected to be. It also covers how long a process has been running, minimum and maximum expected process counts, and CPU and Memory monitors for each process.
There are a couple downsides to our process monitoring template. The biggest is that it uses a group, and enables discovery of the “process class” for members of the group. However, if the group membership changes and servers are removed from the group, we do not “undiscover” the process class members that are removed. It can become messy over time, but simplicity comes at a cost.
Another challenge: what if you wish to monitor a counter that isn’t CPU or Memory, such as Handle Count? You are on your own.
You could always use one of the Windows Performance Counter monitor wizards in the console, such as the Consecutive Samples Over Threshold monitor, and then choose the performance counter you wish to monitor.
This works great, UNLESS there are multiple instances of the process. When there will only ever be a single instance of “MonitoringHost” running, the monitor works exactly as intended. However, if there are TWO MonitoringHost (or whatever process) instances running, and one process is over the threshold while the other is not, bad things happen. The two processes are evaluated in series, one after the other, so the monitor “flip flops” back and forth multiple times in a single second. The visible result is an alert that opens and then immediately closes…. every 60 seconds!
To resolve these issues, I prefer to target discovered application classes for process monitoring. I have created some Management Pack Fragments to help with this.
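To make the “flip flop” concrete, here is a minimal Python sketch of the problem. It is a simulation only, not SCOM code: the function name, the instance names, the counter values, and the threshold are all made up for illustration. It assumes each data item sets the monitor state on its own, in the order it arrives.

```python
# Hypothetical simulation (not SCOM code): every data item is evaluated
# on its own, one after another, and each one can change the monitor state.

def run_interval(state, samples, threshold):
    """Evaluate one polling interval item by item.

    Returns the final state and how many state changes occurred."""
    changes = 0
    for _instance, value in samples:
        new_state = "Unhealthy" if value > threshold else "Healthy"
        if new_state != state:
            changes += 1
            state = new_state
    return state, changes


# Two instances of the same process, one over and one under the threshold.
# Instance names and values are invented for this example.
samples = [("MonitoringHost", 5200.0), ("MonitoringHost#1", 800.0)]

state = "Healthy"
total_changes = 0
for _ in range(3):  # three polling intervals with the same readings
    state, changes = run_interval(state, samples, threshold=2000.0)
    total_changes += changes

print(state, total_changes)
```

In this sketch every interval produces two state changes (unhealthy, then healthy again), which is the open-then-close alert pattern described above.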
First off is the fragment: Monitor.Process.mpx
This fragment monitors whether the number of running processes is within the thresholds for Minimum expected processes and Maximum expected processes. The monitor uses the built-in System.ProcessInformationProvider to get information about the process, and lets you supply the important inputs: ProcessName, MinProcessCount, and MaxProcessCount. It also has a configurable frequency for how often to check, and a number of consecutive samples that must match before alerting, so that transient conditions do not raise alerts.
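The count-and-consecutive-samples logic can be sketched roughly as follows. This is a Python illustration, not the fragment itself: min_count and max_count stand in for MinProcessCount and MaxProcessCount, match_count is a hypothetical name for the consecutive-samples setting, and the assumption that the monitor returns to healthy on the first in-range sample is mine.

```python
# Hypothetical sketch of the process-count check (not the actual fragment).

def evaluate_process_count(observed_counts, min_count, max_count, match_count):
    """Return the monitor state after each sample.

    The state turns Unhealthy only after match_count consecutive samples
    fall outside [min_count, max_count]. Assumption: a single in-range
    sample resets the counter and returns the monitor to Healthy."""
    consecutive_bad = 0
    states = []
    for count in observed_counts:
        if min_count <= count <= max_count:
            consecutive_bad = 0
        else:
            consecutive_bad += 1
        states.append("Unhealthy" if consecutive_bad >= match_count else "Healthy")
    return states


# Made-up samples: the process briefly disappears once, then stays down
# for three samples in a row before coming back.
observed = [1, 0, 1, 0, 0, 0, 1]
states = evaluate_process_count(observed, min_count=1, max_count=1, match_count=3)
print(states)
```

With these invented numbers, the single missed sample does not change state; only the run of three consecutive misses does.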
When you load this fragment into Visual Studio or Silect MP Author, you just need to replace/provide limited information:
Next is the fragment: Monitor.Process.Performance.ConsecSamples.TwoState.mpx
This is a performance fragment optimized for Windows Processes, which will allow you to monitor any performance counter for a process, and it will not matter whether one or many instances of the process are running. It uses a Process module included with SCOM – the Microsoft.SystemCenter.Process.ConsecutiveSamplesThreshold.ErrorOnTooHigh monitortype, which is included in Microsoft.SystemCenter.ProcessMonitoring.Library. You simply need to provide the basic data for it to work:
What makes this monitortype so special that it can handle multiple instances? It uses a special ConditionDetection. Most Monitors in SCOM end up using the ConditionDetection System.ExpressionFilter. However, one of the challenges with this ConditionDetection is that it processes multiple dataitems passed to it in sequential order, which causes the “flip flop”. Instead, in this example we are using System.LogicalSet.ExpressionFilter. What makes this CD special is that it can receive the dataitems from the datasource as a “set” and then proceed or block based on whether “Any” dataitem matches a condition, or whether “All” dataitems match a condition. This is very useful when the datasource outputs multiple dataitems, such as when multiple instances of a process exist. We can say the monitor is healthy when “ALL” process performance counters are under a threshold, but the monitor is unhealthy if “ANY” process breaches a threshold, with zero flip flop. Here is an example:
<ConditionDetection ID="ThresholdNotBreached" TypeID="System!System.LogicalSet.ExpressionFilter">
  <Expression>
    <SimpleExpression>
      <ValueExpression>
        <XPathQuery Type="Double">Value</XPathQuery>
      </ValueExpression>
      <Operator>LessEqual</Operator>
      <ValueExpression>
        <Value Type="Double">$Config/Threshold$</Value>
      </ValueExpression>
    </SimpleExpression>
  </Expression>
  <EmptySet>Passthrough</EmptySet>
  <SetEvaluation>All</SetEvaluation>
</ConditionDetection>
<ConditionDetection ID="ThresholdBreached" TypeID="System!System.LogicalSet.ExpressionFilter">
  <Expression>
    <SimpleExpression>
      <ValueExpression>
        <XPathQuery Type="Double">Value</XPathQuery>
      </ValueExpression>
      <Operator>Greater</Operator>
      <ValueExpression>
        <Value Type="Double">$Config/Threshold$</Value>
      </ValueExpression>
    </SimpleExpression>
  </Expression>
  <EmptySet>Block</EmptySet>
  <SetEvaluation>Any</SetEvaluation>
</ConditionDetection>
The result is a single state change, regardless of how many processes exist.
You can try this fragment and more using Visual Studio or Silect MP Author. Find out more at:
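The “Any” and “All” set semantics can be mimicked in a few lines of Python. This is a rough simulation, not how SCOM implements the module: the function names and counter values are invented, and it assumes that “Passthrough” lets an empty set pass the filter while “Block” stops it.

```python
# Hypothetical simulation of set-based evaluation (not SCOM code).

def logical_set_filter(values, predicate, set_evaluation, empty_set):
    """Return True when the whole set of data items passes the filter."""
    if not values:
        # Assumption: "Passthrough" lets an empty set through, "Block" stops it.
        return empty_set == "Passthrough"
    results = [predicate(v) for v in values]
    return all(results) if set_evaluation == "All" else any(results)


def monitor_state(values, threshold):
    """Mirror the two condition detections in the XML example."""
    breached = logical_set_filter(
        values, lambda v: v > threshold, "Any", "Block")
    not_breached = logical_set_filter(
        values, lambda v: v <= threshold, "All", "Passthrough")
    if breached:
        return "Unhealthy"
    if not_breached:
        return "Healthy"
    return None  # neither filter passed; leave the state unchanged


# Four polling intervals with invented values; the last one has no instances.
intervals = [[5200.0, 800.0], [5200.0, 800.0], [900.0, 800.0], []]

state = "Healthy"
changes = 0
history = []
for values in intervals:
    new_state = monitor_state(values, threshold=2000.0)
    if new_state is not None and new_state != state:
        changes += 1
        state = new_state
    history.append(state)

print(history, changes)
```

Because each interval is judged as one set, a mix of one high and one low instance yields one verdict per interval rather than one verdict per instance.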
Authoring Management Packs – the fast and easy way, using Visual Studio??? – Kevin Holman’s Blog
MP Author Professional – Silect Software
Hello Kevin
A question about process monitoring: SCOM includes Process Monitoring by default. My question is how to monitor a process that is in a “not responding” state.
Even with the process in memory, would SCOM be able to detect this state?
Thank you.
Monitoring whether a process is responding has very little to do with whether the process exists. It has to do with “how can you interact with the process to determine if the process is healthy or not”. Normally, the process needs some method of interaction – either by providing an API, or by responding to stimulus in some manner. Then, the typical method would be to write a PowerShell monitor to interact with the process and measure its response.
Thanks Kevin, I will look into this approach.
Hi Kevin – thank you for this. I have a team that would like to pull CPU/Mem metrics for a process into another system from SCOM, which I can do via SQL, but can’t figure out the query to periodically pull this info. Do you have a template for pulling process performance from the DB? Thanks!
Never mind! Got it from your other blog post! 🙂
https://kevinholman.com/2016/11/11/scom-sql-queries/
Hi Kevin,
The second example using “Monitor.Process.Performance.ConsecSamples.TwoState.mpx” doesn’t seem to have the reference to System.LogicalSet.ExpressionFilter in the Fragment?
Is this using instead something like “Monitor.Performance.MultiInstance.ConsecSamples.TwoState.mpx”?
Hi Kevin – How would you collect or monitor the total sum of a process counter with multiple instances?
I would use PowerShell in a monitor and have the script add up all the process instance data, most likely.
“However, if the group membership changes and servers are removed from the group, we do not “undiscover” the process class members that are removed. ”
How do I remove a server from the discovery once it’s been added?
Disable the discovery for those instances, then run https://kevinholman.com/2021/05/13/demystifying-remove-scomdisabledclassinstance/
Wow, thanks for the quick reply on this old blog page. Appreciated 🙂