I recently had a customer request: they wanted to generate an alert when a “Bad” event occurred, but ONLY if it was NOT followed by a “Healthy” event within 5 minutes.
One of the scenarios for this was a Redundant Power Supply temporarily losing input power. It was common for their power supplies to log events that one side had lost AC power, but then it would return within seconds. They only wanted to be alerted if it was a sustained power loss.
We have a monitor example of this in the UI, called the Correlated Missing Event Detection monitor type. The problem with a monitor is that sometimes we don’t want to affect health state, and getting a reliable reset mechanism can be troublesome.
I will show how to write a rule that does the same thing: alert on the missing event without touching health state.
Most rules are simple: they contain a Datasource (Microsoft.Windows.EventProvider) and a WriteAction (GenerateAlert). When an event matches the expression, the write action fires.
This rule will be unique, because it will contain TWO datasources and an additional component: a Condition Detection.
I’ll start with the datasources. First, the Good, Healthy, or “clearing” event:
<DataSource ID="GoodEventDS" TypeID="Windows!Microsoft.Windows.EventProvider">
  <ComputerName>$Target/Host/Property[Type="Windows!Microsoft.Windows.Computer"]/NetworkName$</ComputerName>
  <LogName>Application</LogName>
  <Expression>
    <And>
      <Expression>
        <SimpleExpression>
          <ValueExpression>
            <XPathQuery Type="String">PublisherName</XPathQuery>
          </ValueExpression>
          <Operator>Equal</Operator>
          <ValueExpression>
            <Value Type="String">EventCreate</Value>
          </ValueExpression>
        </SimpleExpression>
      </Expression>
      <Expression>
        <SimpleExpression>
          <ValueExpression>
            <XPathQuery Type="UnsignedInteger">EventDisplayNumber</XPathQuery>
          </ValueExpression>
          <Operator>Equal</Operator>
          <ValueExpression>
            <Value Type="UnsignedInteger">102</Value>
          </ValueExpression>
        </SimpleExpression>
      </Expression>
    </And>
  </Expression>
</DataSource>
Next the Bad, Unhealthy, or “trigger” event:
<DataSource ID="BadEventDS" TypeID="Windows!Microsoft.Windows.EventProvider">
  <ComputerName>$Target/Host/Property[Type="Windows!Microsoft.Windows.Computer"]/NetworkName$</ComputerName>
  <LogName>Application</LogName>
  <Expression>
    <And>
      <Expression>
        <SimpleExpression>
          <ValueExpression>
            <XPathQuery Type="String">PublisherName</XPathQuery>
          </ValueExpression>
          <Operator>Equal</Operator>
          <ValueExpression>
            <Value Type="String">EventCreate</Value>
          </ValueExpression>
        </SimpleExpression>
      </Expression>
      <Expression>
        <SimpleExpression>
          <ValueExpression>
            <XPathQuery Type="UnsignedInteger">EventDisplayNumber</XPathQuery>
          </ValueExpression>
          <Operator>Equal</Operator>
          <ValueExpression>
            <Value Type="UnsignedInteger">101</Value>
          </ValueExpression>
        </SimpleExpression>
      </Expression>
    </And>
  </Expression>
</DataSource>
Next up is the condition detection. There are purpose-built condition detections for this scenario, such as System.CorrelatorAutoMissingCondition, which is defined at https://msdn.microsoft.com/en-us/library/ff521631.aspx. However, I could never get these to work in a rule, which is odd because they work great in a monitor. Instead, I chose to peel back the onion and just use the System.CorrelatorCondition module, defined at https://msdn.microsoft.com/en-us/library/ff458713.aspx. With this module, I write my own expression for the missing event.
Here is the XML:
<ConditionDetection ID="Correlator" TypeID="System!System.CorrelatorCondition">
  <Correlator>
    <CorrelationExpression />
    <Count>1</Count>
    <Interval>30</Interval>
    <CorrelationOrder>InSequence</CorrelationOrder>
    <CorrelationItemPolicy>ResetWindow</CorrelationItemPolicy>
  </Correlator>
  <Expression>
    <Or>
      <Expression>
        <And>
          <Expression>
            <SimpleExpression>
              <ValueExpression>
                <XPathQuery Type="UnsignedInteger">Item0Count</XPathQuery>
              </ValueExpression>
              <Operator>Equal</Operator>
              <ValueExpression>
                <Value Type="UnsignedInteger">1</Value>
              </ValueExpression>
            </SimpleExpression>
          </Expression>
          <Expression>
            <SimpleExpression>
              <ValueExpression>
                <XPathQuery Type="UnsignedInteger">Item1Count</XPathQuery>
              </ValueExpression>
              <Operator>Equal</Operator>
              <ValueExpression>
                <Value Type="UnsignedInteger">0</Value>
              </ValueExpression>
            </SimpleExpression>
          </Expression>
        </And>
      </Expression>
      <Expression>
        <And>
          <Expression>
            <SimpleExpression>
              <ValueExpression>
                <Value Type="String">$Config/Correlator/CorrelationOrder$</Value>
              </ValueExpression>
              <Operator>Equal</Operator>
              <ValueExpression>
                <Value Type="String">AnyOrder</Value>
              </ValueExpression>
            </SimpleExpression>
          </Expression>
          <Expression>
            <SimpleExpression>
              <ValueExpression>
                <XPathQuery Type="UnsignedInteger">Item0Count</XPathQuery>
              </ValueExpression>
              <Operator>GreaterEqual</Operator>
              <ValueExpression>
                <Value Type="UnsignedInteger">1</Value>
              </ValueExpression>
            </SimpleExpression>
          </Expression>
          <Expression>
            <SimpleExpression>
              <ValueExpression>
                <XPathQuery Type="UnsignedInteger">Item1Count</XPathQuery>
              </ValueExpression>
              <Operator>Less</Operator>
              <ValueExpression>
                <Value Type="UnsignedInteger">$Config/Correlator/Count$</Value>
              </ValueExpression>
            </SimpleExpression>
          </Expression>
        </And>
      </Expression>
    </Or>
  </Expression>
</ConditionDetection>
It is rather long – but most of that is the complicated expression.
The correlator part is quite simple:
<ConditionDetection ID="Correlator" TypeID="System!System.CorrelatorCondition">
<Correlator>
<CorrelationExpression />
<Count>1</Count>
<Interval>30</Interval>
<CorrelationOrder>InSequence</CorrelationOrder>
<CorrelationItemPolicy>ResetWindow</CorrelationItemPolicy>
</Correlator>
Count is the number of “good” events required to not generate an alert.
Interval is the time window to allow the “good” events to show up after a bad event is observed.
CorrelationOrder specifies whether or not the items are to be correlated in a set sequence or are to be evaluated regardless of order.
CorrelationItemPolicy specifies how the module handles multiple incoming primary data items within a single time interval.
These above are all defined here: https://msdn.microsoft.com/en-us/library/ff458712.aspx
The expression part is likely the most difficult. The ordering of Item0Count and Item1Count was perplexing. What I found is that, in a rule, the first Datasource (the good event) becomes Item1Count, while the second Datasource (the bad event) becomes Item0Count. So be aware: order matters here.
Therefore, my expression states to “match” (generate an alert) when Item0Count (bad event) = 1 and Item1Count (good event) = 0 (or missing) in the time frame. OR, when CorrelationOrder is set to AnyOrder, when Item0Count is greater than or equal to 1 in the time period while Item1Count is less than the configured “Count” value I talked about above. With CorrelationOrder set to InSequence, as in this sample, only the first branch can match.
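Because the second branch of the Or first checks that CorrelationOrder equals AnyOrder, it cannot match while the rule is configured as InSequence. As a rough sketch, and assuming the rule will always stay InSequence, the expression could be reduced to the first branch alone. This is an untested simplification rather than what the attached management pack uses:

```xml
<!-- Sketch only: covers the InSequence case; assumes CorrelationOrder is never changed to AnyOrder -->
<Expression>
  <And>
    <Expression>
      <SimpleExpression>
        <ValueExpression>
          <!-- Item0Count = the bad event (second data source in the rule) -->
          <XPathQuery Type="UnsignedInteger">Item0Count</XPathQuery>
        </ValueExpression>
        <Operator>Equal</Operator>
        <ValueExpression>
          <Value Type="UnsignedInteger">1</Value>
        </ValueExpression>
      </SimpleExpression>
    </Expression>
    <Expression>
      <SimpleExpression>
        <ValueExpression>
          <!-- Item1Count = the good event (first data source in the rule) -->
          <XPathQuery Type="UnsignedInteger">Item1Count</XPathQuery>
        </ValueExpression>
        <Operator>Equal</Operator>
        <ValueExpression>
          <Value Type="UnsignedInteger">0</Value>
        </ValueExpression>
      </SimpleExpression>
    </Expression>
  </And>
</Expression>
```

Keeping the full Or expression costs nothing and lets you switch CorrelationOrder later without rewriting the expression, which is a reasonable argument for leaving it as is.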
So remember: Our rule will have these components in order:
<DS Healthy Event>
<DS Bad Event>
<Correlator Condition Detection>
<Write Action to Generate Alert>
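Put together, the rule could be laid out roughly like the skeleton below. This is only a sketch of the ordering: the rule ID, the Target class, and the “Health!” alias for the GenerateAlert write action are placeholders I am assuming for illustration, and the module configurations are elided because they are the blocks shown earlier.

```xml
<!-- Sketch only: rule ID, Target and the "Health!" alias are hypothetical placeholders -->
<Rule ID="Example.CorrelatedMissingEvent.Rule" Enabled="true" Target="Example.TargetClass" ConfirmDelivery="true" Remotable="true" Priority="Normal" DiscardLevel="100">
  <Category>Alert</Category>
  <DataSources>
    <!-- First data source: the Good event (102). It surfaces as Item1Count. -->
    <DataSource ID="GoodEventDS" TypeID="Windows!Microsoft.Windows.EventProvider">
      <!-- ComputerName, LogName and Expression as shown earlier -->
    </DataSource>
    <!-- Second data source: the Bad event (101). It surfaces as Item0Count. -->
    <DataSource ID="BadEventDS" TypeID="Windows!Microsoft.Windows.EventProvider">
      <!-- ComputerName, LogName and Expression as shown earlier -->
    </DataSource>
  </DataSources>
  <ConditionDetection ID="Correlator" TypeID="System!System.CorrelatorCondition">
    <!-- Correlator settings and Expression as shown earlier -->
  </ConditionDetection>
  <WriteActions>
    <WriteAction ID="Alert" TypeID="Health!System.Health.GenerateAlert">
      <!-- Alert priority, severity and message configuration go here -->
    </WriteAction>
  </WriteActions>
</Rule>
```

Swapping the two DataSource elements would swap which event is counted as Item0Count and which as Item1Count, so the expression would need to change with it.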
I’ll attach my complete management pack below. This sample is designed to fire an alert when Bad event ID 101 is observed and Good event ID 102 is not logged within 30 seconds of the bad event.
Note: you may notice a delay of slightly longer than 30 seconds before the alert fires. This is because the correlator condition detection has two optional properties, Latency and DrainWait, which add a small amount of time before alerting.
Hello, very useful article, thank you very much for the good explanation of the complicated logic.
What if I did not want to react to the first bad event, but instead to this logic: 4 bad events OR the time elapsing without a good event showing up?
I have tried to adapt the code (see below), but it does not work: no alert is raised even after the bad event has been collected 5 times.
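<ComputerName>$Target/Property[Type="Windows!Microsoft.Windows.Computer"]/NetworkName$</ComputerName>
<LogName>MyLog</LogName>
<Expression>
  <And>
    <Expression>
      <SimpleExpression>
        <ValueExpression>
          <XPathQuery Type="String">PublisherName</XPathQuery>
        </ValueExpression>
        <Operator>Equal</Operator>
        <ValueExpression>
          <Value Type="String">MySource</Value>
        </ValueExpression>
      </SimpleExpression>
    </Expression>
    <Expression>
      <SimpleExpression>
        <ValueExpression>
          <XPathQuery Type="UnsignedInteger">EventDisplayNumber</XPathQuery>
        </ValueExpression>
        <Operator>Equal</Operator>
        <ValueExpression>
          <Value Type="UnsignedInteger">94</Value>
        </ValueExpression>
      </SimpleExpression>
    </Expression>
  </And>
</Expression>

<ComputerName>$Target/Property[Type="Windows!Microsoft.Windows.Computer"]/NetworkName$</ComputerName>
<LogName>MyLog</LogName>
<Expression>
  <And>
    <Expression>
      <SimpleExpression>
        <ValueExpression>
          <XPathQuery Type="String">PublisherName</XPathQuery>
        </ValueExpression>
        <Operator>Equal</Operator>
        <ValueExpression>
          <Value Type="String">MySource</Value>
        </ValueExpression>
      </SimpleExpression>
    </Expression>
    <Expression>
      <SimpleExpression>
        <ValueExpression>
          <XPathQuery Type="UnsignedInteger">EventDisplayNumber</XPathQuery>
        </ValueExpression>
        <Operator>Equal</Operator>
        <ValueExpression>
          <Value Type="UnsignedInteger">95</Value>
        </ValueExpression>
      </SimpleExpression>
    </Expression>
  </And>
</Expression>

<Correlator>
  <CorrelationExpression />
  <Count>1</Count>
  <Interval>432000</Interval>
  <CorrelationOrder>InSequence</CorrelationOrder>
  <CorrelationItemPolicy>ResetWindow</CorrelationItemPolicy>
</Correlator>
<Expression>
  <Or>
    <Expression>
      <And>
        <Expression>
          <SimpleExpression>
            <ValueExpression>
              <XPathQuery Type="UnsignedInteger">Item0Count</XPathQuery>
            </ValueExpression>
            <Operator>Equal</Operator>
            <ValueExpression>
              <Value Type="UnsignedInteger">1</Value>
            </ValueExpression>
          </SimpleExpression>
        </Expression>
        <Expression>
          <SimpleExpression>
            <ValueExpression>
              <XPathQuery Type="UnsignedInteger">Item1Count</XPathQuery>
            </ValueExpression>
            <Operator>Equal</Operator>
            <ValueExpression>
              <Value Type="UnsignedInteger">0</Value>
            </ValueExpression>
          </SimpleExpression>
        </Expression>
      </And>
    </Expression>
    <Expression>
      <And>
        <Expression>
          <SimpleExpression>
            <ValueExpression>
              <Value Type="String">$Config/Correlator/CorrelationOrder$</Value>
            </ValueExpression>
            <Operator>Equal</Operator>
            <ValueExpression>
              <Value Type="String">AnyOrder</Value>
            </ValueExpression>
          </SimpleExpression>
        </Expression>
        <Expression>
          <SimpleExpression>
            <ValueExpression>
              <XPathQuery Type="UnsignedInteger">Item0Count</XPathQuery>
            </ValueExpression>
            <Operator>GreaterEqual</Operator>
            <ValueExpression>
              <Value Type="UnsignedInteger">4</Value>
            </ValueExpression>
          </SimpleExpression>
        </Expression>
        <Expression>
          <SimpleExpression>
            <ValueExpression>
              <XPathQuery Type="UnsignedInteger">Item1Count</XPathQuery>
            </ValueExpression>
            <Operator>Less</Operator>
            <ValueExpression>
              <Value Type="UnsignedInteger">$Config/Correlator/Count$</Value>
            </ValueExpression>
          </SimpleExpression>
        </Expression>
      </And>
    </Expression>
  </Or>
</Expression>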
The XML file cannot be opened. Could you please check? I get this error:
“ResourceNotFound: The specified resource does not exist. RequestId:da5c927b-401e-000a-7f8f-4221e0000000 Time:2022-03-28T10:33:03.3415983Z”
Hi Kevin,
The sample XML (MP) is not available. Could you share the sample MP for this one?