The Basic Service Unit Monitor is a very common monitor type to check the running status of any Windows Service.
By default, this Monitor is designed to monitor the service ONLY if the Startup Type is set to “Automatic”.
This is because many services are set to Manual or Disabled by design, and we don’t want to treat those as a “failed” state, creating noise out of the box. Therefore, they are ignored.
Probably the biggest complaint about this behavior – is the UI. Health explorer will show “Healthy” for the service monitor, EVEN if the service is not running, or doesn’t exist. Let me explain. If the service is set to Manual or Disabled, and not running – the monitor will initialize, ignore the service, and show healthy. This is probably not the best behavior and it would be nice if we could control this to show warning state or unmonitored state, but that is another topic. Additionally, if the service does not exist – the monitor will also show as healthy. It is simply ignored.
So – to recap – the default Service Monitor will only monitor Automatic startup type services:
Startup Type | Service State | Monitor Health |
Automatic | Running | Healthy |
Automatic | Not Running | Not Healthy |
Manual | Running | Healthy |
Manual | Not Running | Healthy |
Disabled | Not Running | Healthy |
Does Not Exist | Not Running | Healthy |
The PROPER way to monitor a service, NO MATTER the startup type, is to OVERRIDE the Unit monitor, setting “Alert only if service startup type is automatic” to “False”.
With that override in place, the monitor ignores the startup type and only checks whether the service is running.
Using the override set to false:
Startup Type | Service State | Monitor Health |
Automatic | Running | Healthy |
Automatic | Not Running | Not Healthy |
Manual | Running | Healthy |
Manual | Not Running | Not Healthy |
Disabled | Not Running | Not Healthy |
Does Not Exist | Not Running | Not Healthy |
Now – let me explain why and how this works.
The Basic Service Monitor utilizes a specific MonitorType. The MonitorType is “CheckNTServiceStateMonitorType” from the Microsoft.Windows.Library. This MonitorType contains Member Modules of a DataSource, two expression based condition detections, and a Probe.
The datasource is “Win32ServiceInformationProvider” which is a native module to inspect a Windows Service. In the datasource, we will pass the ComputerName, the ServiceName, the Frequency, and the CheckStartupType. The Frequency default is 60 seconds… so we will inspect the service running state every 60 seconds. The “CheckStartupType” is simply a value of True or False, to examine the startup type or not.
The two condition detections are based on System.ExpressionFilter, which is a simple expression. This is where “CheckStartupType” comes into play.
The “ServiceRunning” CD (Condition Detection) uses a compound expression.
It means that we consider the monitor healthy (ServiceRunning) when: ( ( ( CheckStartupType does not equal false ) AND ( StartMode does not equal 2 ) ) OR ( State = 4 ) )
Here you can clearly see why we treat manual, disabled, or non-existent services as healthy when CheckStartupType = True (which is the default): StartMode is not 2 (Automatic), so the first half of the expression matches regardless of State.
When we override CheckStartupType to false, that first half can no longer match, so the service is healthy only when State = 4 (Running). That is why those services change to Unhealthy.
The “ServiceNotRunning” CD (Condition Detection) uses a similar compound expression.
It means that we consider the monitor unhealthy (ServiceNotRunning) when: ( ( ( StartMode = 2 ) OR ( ( CheckStartupType = false ) AND ( StartMode does not equal 2 ) ) ) AND ( State does not equal 4 ) )
So for a service to be considered “Not Running”, State must NOT equal 4 (4 = Running) *AND* ONE of the following must also be true: the service is set to Automatic, *OR* it is set to Manual/Disabled and CheckStartupType = false.
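To check the two expressions against the tables earlier in this post, here is a minimal Python sketch that models them. This is not the actual module code, just the boolean logic as described above. The names are my own, CheckStartupType is modeled as a string because that is how it appears in the monitor configuration, and the StartMode used for a service that does not exist is an assumption (any value other than 2 gives the same result).

```python
# Model of the two condition detections described above (a sketch, not the real module).
AUTOMATIC, MANUAL, DISABLED = 2, 3, 4   # StartMode values
STOPPED, RUNNING, NOT_FOUND = 1, 4, 8   # State values
NO_START_MODE = -1  # ASSUMPTION: placeholder StartMode for a service that does not exist


def service_running(check_startup_type: str, start_mode: int, state: int) -> bool:
    """ServiceRunning: (CheckStartupType != false AND StartMode != 2) OR State = 4."""
    return (check_startup_type != "false" and start_mode != AUTOMATIC) or state == RUNNING


def service_not_running(check_startup_type: str, start_mode: int, state: int) -> bool:
    """ServiceNotRunning: (StartMode = 2 OR (CheckStartupType = false AND StartMode != 2)) AND State != 4."""
    return (
        start_mode == AUTOMATIC
        or (check_startup_type == "false" and start_mode != AUTOMATIC)
    ) and state != RUNNING


def health(check_startup_type: str, start_mode: int, state: int) -> str:
    """Map the two conditions to the health labels used in the tables."""
    if service_not_running(check_startup_type, start_mode, state):
        return "Not Healthy"
    if service_running(check_startup_type, start_mode, state):
        return "Healthy"
    return "Unknown"


# The six rows from the tables: (startup type, service state, StartMode, State).
ROWS = [
    ("Automatic", "Running", AUTOMATIC, RUNNING),
    ("Automatic", "Not Running", AUTOMATIC, STOPPED),
    ("Manual", "Running", MANUAL, RUNNING),
    ("Manual", "Not Running", MANUAL, STOPPED),
    ("Disabled", "Not Running", DISABLED, STOPPED),
    ("Does Not Exist", "Not Running", NO_START_MODE, NOT_FOUND),
]


def truth_table(check_startup_type: str) -> list:
    """Return (startup type, service state, health) for every row."""
    return [
        (startup, service_state, health(check_startup_type, mode, state))
        for startup, service_state, mode, state in ROWS
    ]


if __name__ == "__main__":
    for setting in ("true", "false"):
        print(f"CheckStartupType = {setting}")
        for row in truth_table(setting):
            print(" | ".join(row))
```

In this model an empty CheckStartupType string gives the same results as "true", because both expressions only ever compare the value against "false".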
Ok – that explains the Monitor and how/why it works as it does, with and without the overrides.
There are some blogs out there which document the ability to edit the XML, and set <CheckStartupType>false</CheckStartupType>. This is hard coding the CheckStartupType value. I don’t recommend doing this – for a few reasons:
1. The override use gives more granular options, over which agents you need to set this to.
2. If you ever EDIT the monitor again in any way using the UI (even to change something simple like an alert property, severity, etc…) this will force the XML back to <CheckStartupType>true</CheckStartupType> and break your monitoring. That is simply because the UI expects this setting. As you can see – using the override in this case is far more effective.
Let’s look at the XML of a Service Unit Monitor.
When we create the Service Monitor using the UI – it will look like the following:
<UnitMonitor ID="UIGeneratedMonitor8b9d2b9c2ada46a284429b5569b8185b" Accessibility="Public" Enabled="true" Target="MicrosoftWindowsLibrary6172210!Microsoft.Windows.Server.OperatingSystem" ParentMonitorID="Health!System.Health.AvailabilityState" Remotable="true" Priority="Normal" TypeID="MicrosoftWindowsLibrary6172210!Microsoft.Windows.CheckNTServiceStateMonitorType" ConfirmDelivery="false">
  <Category>Custom</Category>
  <OperationalStates>
    <OperationalState ID="UIGeneratedOpStateId8f7f4049ca124f9db3e4c0a4b3a1c730" MonitorTypeStateID="Running" HealthState="Success" />
    <OperationalState ID="UIGeneratedOpStateId98d7e3348650477598849feb6776f583" MonitorTypeStateID="NotRunning" HealthState="Warning" />
  </OperationalStates>
  <Configuration>
    <ComputerName>$Target/Host/Property[Type="MicrosoftWindowsLibrary6172210!Microsoft.Windows.Computer"]/NetworkName$</ComputerName>
    <ServiceName>Spooler</ServiceName>
    <CheckStartupType>true</CheckStartupType>
  </Configuration>
</UnitMonitor>
When we create the Service Monitor using the Authoring Console – it will look like the following:
<UnitMonitor ID="Spooler.Auth.SpoolerSrv" Accessibility="Internal" Enabled="true" Target="Windows!Microsoft.Windows.Server.OperatingSystem" ParentMonitorID="Health!System.Health.AvailabilityState" Remotable="true" Priority="Normal" TypeID="Windows!Microsoft.Windows.CheckNTServiceStateMonitorType" ConfirmDelivery="false">
  <Category>AvailabilityHealth</Category>
  <OperationalStates>
    <OperationalState ID="Running" MonitorTypeStateID="Running" HealthState="Success" />
    <OperationalState ID="NotRunning" MonitorTypeStateID="NotRunning" HealthState="Warning" />
  </OperationalStates>
  <Configuration>
    <ComputerName />
    <ServiceName>Spooler</ServiceName>
    <CheckStartupType />
  </Configuration>
</UnitMonitor>
Note that the two use slightly different methods to set the CheckStartupType value (an explicit true in the UI version, an empty element in the Authoring Console version), but both have the same effect: the value is treated as true.
If the Monitor has NO configuration for CheckStartupType, then the override will not work and the monitor will always assume “True”.
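If you want to check an exported management pack for monitors that have been hard coded to false, or that are missing the element entirely, something like the following Python sketch could help. It is only an illustration: the function names, the category labels, and the sample monitor IDs are made up for this example, and it assumes the UnitMonitor elements carry no XML namespace, as in the snippets above.

```python
import xml.etree.ElementTree as ET

# Monitor type name from this post; the alias before "!" varies per management pack.
SERVICE_MONITOR_TYPE = "Microsoft.Windows.CheckNTServiceStateMonitorType"


def classify_check_startup_type(unit_monitor: ET.Element) -> str:
    """Describe how CheckStartupType is configured on one UnitMonitor."""
    element = unit_monitor.find("./Configuration/CheckStartupType")
    if element is None:
        return "missing (override will not work)"
    value = (element.text or "").strip().lower()
    if value == "":
        return "empty (treated as true)"
    if value == "false":
        return "false (hard coded)"
    return "true"


def audit_service_monitors(xml_text: str) -> dict:
    """Return {monitor ID: classification} for every basic service monitor found."""
    root = ET.fromstring(xml_text)
    results = {}
    for monitor in root.iter("UnitMonitor"):
        type_name = monitor.get("TypeID", "").split("!")[-1]
        if type_name == SERVICE_MONITOR_TYPE:
            results[monitor.get("ID")] = classify_check_startup_type(monitor)
    return results


# Hypothetical, trimmed-down monitors used only to exercise the function.
SAMPLE = """
<Monitors>
  <UnitMonitor ID="Example.UI" TypeID="Windows!Microsoft.Windows.CheckNTServiceStateMonitorType">
    <Configuration><ServiceName>Spooler</ServiceName><CheckStartupType>true</CheckStartupType></Configuration>
  </UnitMonitor>
  <UnitMonitor ID="Example.AuthoringConsole" TypeID="Windows!Microsoft.Windows.CheckNTServiceStateMonitorType">
    <Configuration><ServiceName>Spooler</ServiceName><CheckStartupType /></Configuration>
  </UnitMonitor>
  <UnitMonitor ID="Example.HandEdited" TypeID="Windows!Microsoft.Windows.CheckNTServiceStateMonitorType">
    <Configuration><ServiceName>Spooler</ServiceName><CheckStartupType>false</CheckStartupType></Configuration>
  </UnitMonitor>
  <UnitMonitor ID="Example.NoElement" TypeID="Windows!Microsoft.Windows.CheckNTServiceStateMonitorType">
    <Configuration><ServiceName>Spooler</ServiceName></Configuration>
  </UnitMonitor>
</Monitors>
"""

if __name__ == "__main__":
    for monitor_id, status in audit_service_monitors(SAMPLE).items():
        print(f"{monitor_id}: {status}")
```

The script only reads the XML. Anything it flags as hard coded or missing still needs to be fixed in the management pack itself.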
So, if you want to monitor services set to something other than Automatic, use the override. It is the best way. Editing the XML and hard coding to false will also work, but your changes will be lost if anyone edits the monitor in any way in the future. Using the override, this will not happen.
There are some advanced scenarios where the basic design won’t work well. One that comes to mind is where you want to monitor a service with a manual startup type, but the service is clustered, so you get alerts from the passive node. This happens when you target your service monitor at a non-cluster-aware class, such as “Windows Server Operating System”. In those cases, you should create a new class that is cluster aware, and then target your service monitor at the new custom class. Take a look at “SQL DBEngine”; it behaves perfectly in this way.
You should target your service monitors to the appropriate class. You should NEVER use “Windows Computer” or “Windows Server” as a monitoring target. If you use a widespread generic class, like “Windows Server Operating System”, you must ONLY monitor a service that would exist on ALL Windows Server Operating Systems. If it doesn’t, then you will see false monitoring conditions, or create an unhealthy state for a computer which does not have the service. In those cases, you should enable your monitor only for a group of systems, or (better) create a new class of systems that will always contain that service or application.
Lastly – you could create some advanced MonitorTypes if you don’t like this one. Use the existing MonitorType as an example, and then change the Expression based Condition Detections as you see fit. You could make a MonitorType that ignores Disabled, but does monitor Auto and Manual services by default, quite easily.
Probably my only complaint in all of this, is that by default, when a service does not exist on a machine, we show the monitor as healthy. To me, we should have some other condition detection capability to consider this an unhealthy condition.
What about delayed start automatic services?
Delayed start automatic is still automatic, so it is monitored the same way.
I have got used to this behaviour over the years: “Additionally, if the service does not exist – the monitor will also show as healthy. It is simply ignored.” However, recently I have been coming across situations where the service has been removed (so it is in state 8) and some servers stay green but some go red and create an alert. The behaviour is not consistent even on a single server. I had one server where an alert was raised for the Computer Browser service, which was no longer on that server. In health explorer the state was critical even though it was state 8, whereas a few days ago when it came up the same service stayed green even though it also showed state 8. I am not sure what has caused this behaviour to change, but it is the inconsistency which is annoying.
That’s odd that it is not consistent. It should be. I don’t like the built-in service monitor type at all, and I do not use it. I use the one in my fragment for service monitoring, which I feel is a better solution overall. It does not ignore missing services, and the expression is MUCH simpler.
If it hasn’t been said anywhere else, I have noticed that the Win32ServiceInformationProvider does not properly null all fields before populating it for return to this monitor type. That causes a situation where sometimes the Start Mode is randomly filled in as 2 and, since the State is not 4, it satisfies the error condition and thus alerts.
Based on this erratic behavior, I opted to write my own version of the CheckNTServiceStateMonitorType as well. I simply added “State is 8” as an OR alongside the standard logic for healthy, and “State does not equal 8” as an AND alongside the standard logic for error.
I do something similar. I don’t like the overly complicated built in expression so in my fragments I provide for this, I use:
StartMode: 0=BootStart 1=SystemStart 2=Automatic 3=Manual 4=Disabled
State: 0=Unknown 1=Stopped 2=StartPending 3=StopPending 4=Running 5=ContinuePending 6=PausePending 7=Paused 8=ServiceNotFound 9=ServerNotFound
Service Unhealthy: (StartMode <> 4) AND (State <> 4)
Service Healthy: (StartMode = 4) OR (State = 4)
https://github.com/thekevinholman/FragmentLibrary
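For comparison with the built-in expression, the simplified pair of conditions above can be modeled in a few lines of Python. This is only a sketch of the logic as written in this comment, not code taken from the fragment library, and the enum and function names are my own. The numeric values are the ones listed above.

```python
from enum import IntEnum


class StartMode(IntEnum):
    BootStart = 0
    SystemStart = 1
    Automatic = 2
    Manual = 3
    Disabled = 4


class State(IntEnum):
    Unknown = 0
    Stopped = 1
    StartPending = 2
    StopPending = 3
    Running = 4
    ContinuePending = 5
    PausePending = 6
    Paused = 7
    ServiceNotFound = 8
    ServerNotFound = 9


def service_unhealthy(start_mode: int, state: int) -> bool:
    """Service Unhealthy: (StartMode <> 4) AND (State <> 4)."""
    return start_mode != StartMode.Disabled and state != State.Running


def service_healthy(start_mode: int, state: int) -> bool:
    """Service Healthy: (StartMode = 4) OR (State = 4)."""
    return start_mode == StartMode.Disabled or state == State.Running


if __name__ == "__main__":
    # Manual and stopped is unhealthy here, with no override needed.
    print(service_unhealthy(StartMode.Manual, State.Stopped))
    # Disabled services are the only ones ignored.
    print(service_healthy(StartMode.Disabled, State.Stopped))
```

The two conditions are exact opposites of each other, so every combination of StartMode and State lands in exactly one health state and there is no CheckStartupType flag to reason about.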
If you intend to monitor a service that you expect to be present on all servers, for example to identify systems which do not have Windows Defender, would this be the best method, or would a custom monitor be required to check for the existence of the service on all computers?
You can do either one. Personally, I prefer my registry monitor example to look for missing registry entries, such as those for a service.