The “built in” service monitor in SCOM is hard-coded for how often it checks the service state, and how many service checks have to return “not running” before it alarms. This is a bit unfortunate, as customers would often want to customize this. This article will explain how.
All of the built-in service monitoring uses Monitors that reference the Microsoft.Windows.CheckNTServiceStateMonitorType MonitorType, which lives in the Microsoft.Windows.Library MP.
This MonitorType has a hard coded definition with <Frequency>30</Frequency> and <MatchCount>2</MatchCount>. This means by default, monitors that use this will inspect the service state every 30 seconds, and alarm when it is not running after two consecutive checks. However – the challenge is – Microsoft did not expose these values as override-able parameters.
What if you want to check the service every 60 seconds, and alarm only after it has been consistently down for 15 samples (15 consecutive minutes)? We can do that. We have the tools.
Basically, we need to create our own MonitorType, which will expose these values as overrideable parameters. Here is an example:
<UnitMonitorType ID="Contoso.Demo.Service.MonitorType" Accessibility="Internal">
  <MonitorTypeStates>
    <MonitorTypeState ID="Running" NoDetection="false" />
    <MonitorTypeState ID="NotRunning" NoDetection="false" />
  </MonitorTypeStates>
  <Configuration>
    <xsd:element name="ComputerName" type="xsd:string" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />
    <xsd:element name="ServiceName" type="xsd:string" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />
    <xsd:element name="IntervalSeconds" type="xsd:integer" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />
    <xsd:element name="CheckStartupType" minOccurs="0" maxOccurs="1" type="xsd:string" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />
    <xsd:element name="Samples" type="xsd:integer" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />
  </Configuration>
  <OverrideableParameters>
    <OverrideableParameter ID="IntervalSeconds" Selector="$Config/IntervalSeconds$" ParameterType="int" />
    <OverrideableParameter ID="CheckStartupType" Selector="$Config/CheckStartupType$" ParameterType="string" />
    <OverrideableParameter ID="Samples" Selector="$Config/Samples$" ParameterType="int" />
  </OverrideableParameters>
  <MonitorImplementation>
    <MemberModules>
      <DataSource ID="DS" TypeID="Windows!Microsoft.Windows.Win32ServiceInformationProvider">
        <ComputerName>$Config/ComputerName$</ComputerName>
        <ServiceName>$Config/ServiceName$</ServiceName>
        <Frequency>$Config/IntervalSeconds$</Frequency>
        <DisableCaching>true</DisableCaching>
        <CheckStartupType>$Config/CheckStartupType$</CheckStartupType>
      </DataSource>
      <ProbeAction ID="Probe" TypeID="Windows!Microsoft.Windows.Win32ServiceInformationProbe">
        <ComputerName>$Config/ComputerName$</ComputerName>
        <ServiceName>$Config/ServiceName$</ServiceName>
      </ProbeAction>
      <ConditionDetection ID="ServiceRunning" TypeID="System!System.ExpressionFilter">
        <Expression>
          <Or>
            <Expression>
              <And>
                <Expression>
                  <SimpleExpression>
                    <ValueExpression>
                      <Value Type="String">$Config/CheckStartupType$</Value>
                    </ValueExpression>
                    <Operator>NotEqual</Operator>
                    <ValueExpression>
                      <Value Type="String">false</Value>
                    </ValueExpression>
                  </SimpleExpression>
                </Expression>
                <Expression>
                  <SimpleExpression>
                    <ValueExpression>
                      <XPathQuery Type="Integer">Property[@Name='StartMode']</XPathQuery>
                    </ValueExpression>
                    <Operator>NotEqual</Operator>
                    <ValueExpression>
                      <Value Type="Integer">2</Value> <!-- 0=BootStart 1=SystemStart 2=Automatic 3=Manual 4=Disabled -->
                    </ValueExpression>
                  </SimpleExpression>
                </Expression>
              </And>
            </Expression>
            <Expression>
              <SimpleExpression>
                <ValueExpression>
                  <XPathQuery Type="Integer">Property[@Name='State']</XPathQuery>
                </ValueExpression>
                <Operator>Equal</Operator>
                <ValueExpression>
                  <Value Type="Integer">4</Value> <!-- 0=Unknown 1=Stopped 2=StartPending 3=StopPending 4=Running 5=ContinuePending 6=PausePending 7=Paused 8=ServiceNotFound 9=ServerNotFound -->
                </ValueExpression>
              </SimpleExpression>
            </Expression>
          </Or>
        </Expression>
      </ConditionDetection>
      <ConditionDetection ID="ServiceNotRunning" TypeID="System!System.ExpressionFilter">
        <Expression>
          <And>
            <Expression>
              <Or>
                <Expression>
                  <SimpleExpression>
                    <ValueExpression>
                      <XPathQuery Type="Integer">Property[@Name='StartMode']</XPathQuery>
                    </ValueExpression>
                    <Operator>Equal</Operator>
                    <ValueExpression>
                      <Value Type="Integer">2</Value> <!-- 0=BootStart 1=SystemStart 2=Automatic 3=Manual 4=Disabled -->
                    </ValueExpression>
                  </SimpleExpression>
                </Expression>
                <Expression>
                  <And>
                    <Expression>
                      <SimpleExpression>
                        <ValueExpression>
                          <Value Type="String">$Config/CheckStartupType$</Value>
                        </ValueExpression>
                        <Operator>Equal</Operator>
                        <ValueExpression>
                          <Value Type="String">false</Value>
                        </ValueExpression>
                      </SimpleExpression>
                    </Expression>
                    <Expression>
                      <SimpleExpression>
                        <ValueExpression>
                          <XPathQuery Type="Integer">Property[@Name='StartMode']</XPathQuery>
                        </ValueExpression>
                        <Operator>NotEqual</Operator>
                        <ValueExpression>
                          <Value Type="Integer">2</Value> <!-- 0=BootStart 1=SystemStart 2=Automatic 3=Manual 4=Disabled -->
                        </ValueExpression>
                      </SimpleExpression>
                    </Expression>
                  </And>
                </Expression>
              </Or>
            </Expression>
            <Expression>
              <SimpleExpression>
                <ValueExpression>
                  <XPathQuery Type="Integer">Property[@Name='State']</XPathQuery>
                </ValueExpression>
                <Operator>NotEqual</Operator>
                <ValueExpression>
                  <Value Type="Integer">4</Value> <!-- 0=Unknown 1=Stopped 2=StartPending 3=StopPending 4=Running 5=ContinuePending 6=PausePending 7=Paused 8=ServiceNotFound 9=ServerNotFound -->
                </ValueExpression>
              </SimpleExpression>
            </Expression>
          </And>
        </Expression>
        <SuppressionSettings>
          <MatchCount>$Config/Samples$</MatchCount>
        </SuppressionSettings>
      </ConditionDetection>
    </MemberModules>
    <RegularDetections>
      <RegularDetection MonitorTypeStateID="Running">
        <Node ID="ServiceRunning">
          <Node ID="DS" />
        </Node>
      </RegularDetection>
      <RegularDetection MonitorTypeStateID="NotRunning">
        <Node ID="ServiceNotRunning">
          <Node ID="DS" />
        </Node>
      </RegularDetection>
    </RegularDetections>
    <OnDemandDetections>
      <OnDemandDetection MonitorTypeStateID="Running">
        <Node ID="ServiceRunning">
          <Node ID="Probe" />
        </Node>
      </OnDemandDetection>
      <OnDemandDetection MonitorTypeStateID="NotRunning">
        <Node ID="ServiceNotRunning">
          <Node ID="Probe" />
        </Node>
      </OnDemandDetection>
    </OnDemandDetections>
  </MonitorImplementation>
</UnitMonitorType>
Essentially, we have taken the hard-coded values and replaced them with $Config/Value$ parameters. This allows the Monitor to pass these values to the MonitorType, where they are used in the DataSource and the ConditionDetection. Even if you don’t fully understand that, it’s OK, because I will be wrapping all of this up in a consumable VSAE Fragment that is easy to implement.
The changes made to allow data to be passed in were:
<Frequency>$Config/IntervalSeconds$</Frequency>
<MatchCount>$Config/Samples$</MatchCount>
In the <Configuration> section we added:
<xsd:element name="IntervalSeconds" type="xsd:integer" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />
<xsd:element name="Samples" type="xsd:integer" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />
In the <OverrideableParameters> section – we added:
<OverrideableParameter ID="IntervalSeconds" Selector="$Config/IntervalSeconds$" ParameterType="int" />
<OverrideableParameter ID="Samples" Selector="$Config/Samples$" ParameterType="int" />
In the DataSource, one additional value should be set whenever you use Microsoft.Windows.Win32ServiceInformationProvider and need a result from every run:
<DisableCaching>true</DisableCaching>
This is very important, as this will cause the datasource to output data every time, even if nothing has changed. We need this for the number of samples (MatchCount) to work as desired.
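For context, the Windows! and System! aliases used in the MonitorType above (and the Health! alias used by the Monitor in the next example) must be declared as references in the management pack manifest. A minimal sketch of where the pieces sit in a complete MP might look like this. The MP ID, name, and all version numbers are assumptions for illustration only; use the library versions present in your own environment.

```xml
<ManagementPack ContentReadable="true" SchemaVersion="2.0" OriginalSchemaVersion="1.1" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <Manifest>
    <Identity>
      <!-- Hypothetical MP ID and version -->
      <ID>Contoso.Demo</ID>
      <Version>1.0.0.0</Version>
    </Identity>
    <Name>Contoso Demo</Name>
    <References>
      <!-- Version numbers below are assumed; match them to your installed libraries -->
      <Reference Alias="Windows">
        <ID>Microsoft.Windows.Library</ID>
        <Version>7.5.8501.0</Version>
        <PublicKeyToken>31bf3856ad364e35</PublicKeyToken>
      </Reference>
      <Reference Alias="System">
        <ID>System.Library</ID>
        <Version>7.5.8501.0</Version>
        <PublicKeyToken>31bf3856ad364e35</PublicKeyToken>
      </Reference>
      <Reference Alias="Health">
        <ID>System.Health.Library</ID>
        <Version>7.0.8433.0</Version>
        <PublicKeyToken>31bf3856ad364e35</PublicKeyToken>
      </Reference>
    </References>
  </Manifest>
  <TypeDefinitions>
    <MonitorTypes>
      <!-- The UnitMonitorType (Contoso.Demo.Service.MonitorType) goes here -->
    </MonitorTypes>
  </TypeDefinitions>
  <Monitoring>
    <Monitors>
      <!-- UnitMonitors that reference the MonitorType go here -->
    </Monitors>
  </Monitoring>
</ManagementPack>
```

If you work with VSAE fragments instead of a single hand-edited XML file, the references are typically managed at the project level rather than inside each fragment.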
Now that we have this new MonitorType – we can reference it in our own Monitors. Here is an example of a Monitor using this:
<UnitMonitor ID="Contoso.Demo.Spooler.Service.Monitor" Accessibility="Public" Enabled="true" Target="Windows!Microsoft.Windows.Server.OperatingSystem" ParentMonitorID="Health!System.Health.AvailabilityState" Remotable="true" Priority="Normal" TypeID="Contoso.Demo.Service.MonitorType" ConfirmDelivery="false">
  <Category>AvailabilityHealth</Category>
  <AlertSettings AlertMessage="Contoso.Demo.Spooler.Service.Monitor.Alert.Message">
    <AlertOnState>Error</AlertOnState>
    <AutoResolve>true</AutoResolve>
    <AlertPriority>Normal</AlertPriority>
    <AlertSeverity>Error</AlertSeverity>
    <AlertParameters>
      <AlertParameter1>$Data/Context/Property[@Name='Name']$</AlertParameter1>
      <AlertParameter2>$Target/Host/Property[Type="Windows!Microsoft.Windows.Computer"]/PrincipalName$</AlertParameter2>
    </AlertParameters>
  </AlertSettings>
  <OperationalStates>
    <OperationalState ID="Running" MonitorTypeStateID="Running" HealthState="Success" />
    <OperationalState ID="NotRunning" MonitorTypeStateID="NotRunning" HealthState="Error" />
  </OperationalStates>
  <Configuration>
    <ComputerName />
    <ServiceName>spooler</ServiceName>
    <IntervalSeconds>30</IntervalSeconds>
    <CheckStartupType>true</CheckStartupType>
    <Samples>2</Samples>
  </Configuration>
</UnitMonitor>
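The Monitor above references an alert message ID (Contoso.Demo.Spooler.Service.Monitor.Alert.Message), which has to exist as a string resource with display strings. A minimal sketch of those supporting sections, written as a fragment, could look like the following. The display names are made up for illustration; {0} and {1} are filled from AlertParameter1 and AlertParameter2.

```xml
<ManagementPackFragment SchemaVersion="2.0" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <Presentation>
    <StringResources>
      <!-- Must match the AlertMessage attribute on the UnitMonitor -->
      <StringResource ID="Contoso.Demo.Spooler.Service.Monitor.Alert.Message" />
    </StringResources>
  </Presentation>
  <LanguagePacks>
    <LanguagePack ID="ENU" IsDefault="true">
      <DisplayStrings>
        <!-- Display names below are illustrative assumptions -->
        <DisplayString ElementID="Contoso.Demo.Spooler.Service.Monitor">
          <Name>Contoso Demo Spooler Service Monitor</Name>
        </DisplayString>
        <DisplayString ElementID="Contoso.Demo.Spooler.Service.Monitor" SubElementID="Running">
          <Name>Running</Name>
        </DisplayString>
        <DisplayString ElementID="Contoso.Demo.Spooler.Service.Monitor" SubElementID="NotRunning">
          <Name>Not Running</Name>
        </DisplayString>
        <DisplayString ElementID="Contoso.Demo.Spooler.Service.Monitor.Alert.Message">
          <Name>Contoso Demo Spooler service is not running</Name>
          <!-- {0} = AlertParameter1 (service name), {1} = AlertParameter2 (computer principal name) -->
          <Description>Service {0} is not running on {1}</Description>
        </DisplayString>
      </DisplayStrings>
    </LanguagePack>
  </LanguagePacks>
</ManagementPackFragment>
```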
Once you implement this Monitor, the new IntervalSeconds and Samples parameters are exposed as overrides, alongside CheckStartupType.
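To get the behavior described earlier (check every 60 seconds, alarm after 15 consecutive failed samples, so roughly 60 × 15 = 900 seconds or 15 minutes), the overrides could be expressed in XML roughly as follows. This is a sketch that assumes the overrides live in the same MP as the Monitor; the override IDs are hypothetical names.

```xml
<ManagementPackFragment SchemaVersion="2.0" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <Monitoring>
    <Overrides>
      <!-- Check the service every 60 seconds instead of the default 30 -->
      <MonitorConfigurationOverride ID="Contoso.Demo.Spooler.Service.Monitor.IntervalSeconds.Override" Context="Windows!Microsoft.Windows.Server.OperatingSystem" Enforced="false" Monitor="Contoso.Demo.Spooler.Service.Monitor" Parameter="IntervalSeconds">
        <Value>60</Value>
      </MonitorConfigurationOverride>
      <!-- Require 15 consecutive "not running" samples before changing state -->
      <MonitorConfigurationOverride ID="Contoso.Demo.Spooler.Service.Monitor.Samples.Override" Context="Windows!Microsoft.Windows.Server.OperatingSystem" Enforced="false" Monitor="Contoso.Demo.Spooler.Service.Monitor" Parameter="Samples">
        <Value>15</Value>
      </MonitorConfigurationOverride>
    </Overrides>
  </Monitoring>
</ManagementPackFragment>
```

If the Monitor's MP is sealed and the overrides are stored in a separate MP, the Monitor attribute would need that MP's reference alias as a prefix. The Parameter values must match the OverrideableParameter IDs defined in the MonitorType.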
So the key takeaways are:
- The built in service monitoring does not allow for configurable Interval and Sample count.
- We can customize this using a custom MonitorType that allows for these variables to be passed in.
- Using the Microsoft.Windows.Win32ServiceInformationProvider we MUST set <DisableCaching>true</DisableCaching>
This example has been added to my Fragment Library for you to download at:
https://github.com/thekevinholman/FragmentLibrary
(see: Monitor.Service.WithAlert.FreqAndSamples.mpx)
To learn more about using MP Fragments, and how EASY they are to use with Visual Studio:
https://www.youtube.com/watch?v=9CpUrT983Gc
To make using fragments REALLY EASY, using Silect MP Author Pro, watch the video:
https://www.youtube.com/watch?v=E5nnuvPikFw
Hi Kevin,
I am curious: can this be implemented with the “Application Pool availability” monitor from the IIS management pack? We often restart Application Pools, and that results in alerts that close in the next interval. It would be good for us if we could create an “Application Pool availability” monitor that alerts only if the Application Pool is still disabled after, say, 5 checks/intervals.
Thanks,
Niksa
That will take a little work, but yes, it can be made MUCH better. You could add a consecutive samples filter, or use $Config/Samples$ just like I did above for the service monitor. You will need to re-create the MonitorType like I did above. Silect has a super handy tool that will grab this workflow and pull all these parts into a new MP for you to customize, if you have that. Otherwise you can forklift it into XML.
I am testing just such an Application Pool Availability monitor using the MatchCount expression filter. I added Samples as a configuration element and an overrideable parameter to the data source module and to the unit monitor type, and passed it to the member module that references the data source module and to the MatchCount setting in the unit monitor type. Finally, I added Samples as a configuration item in the unit monitor. The result is an application pool up/down monitor with frequency and samples as overrideable parameters.
Since the MatchCount filter is part of the System.ExpressionFilter in the System Library, it can be used in any unit monitor where you don’t want to change state until a condition has occurred x times in a row.
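As a rough illustration of that point, a generic condition detection with consecutive-sample suppression might be sketched like this. The module ID, the Status property, and the OK value are hypothetical placeholders, and Samples is assumed to be declared in the monitor type's Configuration as in the article above.

```xml
<!-- Hypothetical unhealthy-state filter; only the SuppressionSettings part is the point here -->
<ConditionDetection ID="UnhealthyFilter" TypeID="System!System.ExpressionFilter">
  <Expression>
    <SimpleExpression>
      <ValueExpression>
        <!-- Placeholder property name; use whatever your data source outputs -->
        <XPathQuery Type="String">Property[@Name='Status']</XPathQuery>
      </ValueExpression>
      <Operator>NotEqual</Operator>
      <ValueExpression>
        <Value Type="String">OK</Value>
      </ValueExpression>
    </SimpleExpression>
  </Expression>
  <SuppressionSettings>
    <!-- Output only after the expression has matched this many times in a row -->
    <MatchCount>$Config/Samples$</MatchCount>
  </SuppressionSettings>
</ConditionDetection>
```

For the count to advance on every interval, the data source feeding this filter has to emit data on each run, which is the same reason DisableCaching is set to true in the service example.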
That’s awesome Fred. I have done the exact same monitor for several customers, for things like app pools. It would be nice if we would go back and make MatchCount a standard and mandatory, ANYWHERE the System.ExpressionFilter is used…. and make it override-able.
Hi Kevin, I need your help.
Do I need to create the Classes first, then the type of monitor and then the service monitor? In the fragments folder I do not see the type of monitor.
I created the monitor class and the monitor service for my application; do I now need to create the type of monitor?
This is my code for “Monitor service”: but I do not know how to insert the for the monitor to work…
AvailabilityHealth
Error
true
Normal
Error
$Data/Context/Property[@Name='Name']$
$Target/Host/Property[Type="Windows!Microsoft.Windows.Computer"]/PrincipalName$
MSSQLSERVER
BCBA SQLApp MSSQLSERVER Service Monitor
Running
Not Running
BCBA SQLApp MSSQLSERVER service is not running
Service {0} is not running on {1}
ty for your help.
Hi Kevin
I have created a Custom App MP, based on some of your fragments and few of my own, and is working fine.
I have created the MP monitors to initially generate an alert when HealthStateChanges to Warning, Priority=High, Severity=MatchMonitorHealth – produces a Yellow Health state and Yellow alert. The App Manager can now create overrides if they want to have the Alert Red by changing the Severity from MatchMonitorHealth to Error/Critical.
The issue is, the Health against the Service still shows Yellow, while the Alert shows red.
If the App Manager creates an Override with enabling the Override parameter “Alert on State” and changes this from “The monitor is in a Warning State” to “The monitor is in a Critical State”, no alert is received. Makes sense because the Monitor is not in Health=Critical State, it is in the initial default Health=Warning State.
Can an overrideable parameter be created to change the Health State of the Service monitor from Warning to Critical, so that when the Priority and Severity are overridden to create Yellow or Red Alert, the Healthy State will be Warning/Yellow or Critical/Red to match the colour of the Alert?
Hope this makes sense.
Thanks for any insight.
It makes perfect sense – would totally be awesome, and is totally not possible. 🙁
Hello Kevin,
Is there a way to implement this new monitor type so it can be used through the System Center Operations Manager GUI (Create a monitor)? I’m beginning to explore the customization of SCOM through XML and this is not entirely clear to me.
Building and importing a Management pack with just the new Monitor Type did not seem to accomplish this.
Thanks in advance!
No, there isn’t, not in the SCOM UI.
However, there IS a way to do this easily by using my fragments and Silect’s MP Author Pro, or, a little less simply, by using the Visual Studio Authoring Extensions and my fragments.
https://kevinholman.com/2019/07/15/advanced-mp-authoring-mpu-may-2019/
Hello Kevin,
Is it possible to follow this path and modify the IIS 10 MP (add overrides for Interval Frequency and Samples)?
Yes, absolutely. I’ve done this many times for customers. You have to disable the built-in monitors and replace them with new ones that have a match count suppression property added to the condition detection.
But this looks much more complicated compared to this article, where we added only a few things to the built-in service monitor.
It is the same concept. The thing that makes this easy is that we are including the new MonitorType in the fragment. So instead of using a single built-in MonitorType, you are creating a new one with each MP you deploy. Not as efficient, but very easy to use.
When replacing/disabling monitors in a sealed MP, and replacing those monitors (and potentially monitortypes, datasources, and probe actions) it is more complex. But it follows the same concept.