Consider the scenario: you want to monitor the event logs for a specific event, but this event has a tendency to “storm”, logging hundreds of events in a short time window. That is not a good condition for a monitoring system: you can quickly overwhelm it, and you do not want hundreds or thousands of alerts for a single condition.
The traditional approach to this would be to enable “Alert Suppression” which will increment a repeat counter on the alert. This has a few negative effects:
1. You still overwhelm the monitoring system, as you have to write this incremented counter to both the OpsDB and the DW. Although this is not as expensive as creating multiple individual alerts, it still has significant impact.
2. You will only get a notification on your FIRST alert. All subsequent alerts will increment the counter, but you will never get another email/ticket on this again, as long as the original alert is still open.
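For reference, alert suppression is configured on the alert-generating write action itself. The fragment below is a minimal sketch, not taken from the example MP: the alert message ID is a hypothetical name, and the two suppression fields are an assumption about which event properties you would want to match on. Alerts whose suppression values match an open alert increment its repeat count instead of creating a new alert.

```xml
<!-- Sketch only: a GenerateAlert write action with suppression enabled. -->
<!-- The AlertMessageId below is a hypothetical name used for illustration. -->
<WriteAction ID="WA" TypeID="Health!System.Health.GenerateAlert">
  <Priority>1</Priority>
  <Severity>1</Severity>
  <AlertMessageId>$MPElement[Name="Demo.SuppressedEvent123.Alert.Rule.AlertMessage"]$</AlertMessageId>
  <Suppression>
    <!-- Assumed choice of fields: same event ID on the same computer. -->
    <SuppressionValue>$Data/EventDisplayNumber$</SuppressionValue>
    <SuppressionValue>$Data/LoggingComputer$</SuppressionValue>
  </Suppression>
</WriteAction>
```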
Another approach is to use a consolidator condition detection. This is similar to the solution I provided here: https://kevinholman.com/2014/12/18/creating-a-repeated-event-detection-rule/
The difference, however, is that instead of waiting for a specific “count” of events to fire in a specific time window, this example will do the following:
- Wait for the event to exist in the event log.
- Start a timer upon the first event, then wait for the timer to expire.
- Create an alert for the event(s), no matter if there was a single event or thousands of events in the timed window.
The XML is fairly simple for this. We will have the following components:
- Event datasource (Microsoft.Windows.EventProvider)
- Consolidation Condition Detection (System.ConsolidatorCondition)
- Alert Write Action (System.Health.GenerateAlert)
Here is the datasource: we simply look for event ID “123”.
<Rule ID="Demo.AlertOnConsolidatedEvent.Event123.Alert.Rule" Enabled="true" Target="Windows!Microsoft.Windows.Server.OperatingSystem" ConfirmDelivery="true" Remotable="true" Priority="Normal" DiscardLevel="100">
  <Category>Alert</Category>
  <DataSources>
    <DataSource ID="DS" TypeID="Windows!Microsoft.Windows.EventProvider">
      <ComputerName>$Target/Host/Property[Type="Windows!Microsoft.Windows.Computer"]/NetworkName$</ComputerName>
      <LogName>Application</LogName>
      <Expression>
        <SimpleExpression>
          <ValueExpression>
            <XPathQuery Type="UnsignedInteger">EventDisplayNumber</XPathQuery>
          </ValueExpression>
          <Operator>Equal</Operator>
          <ValueExpression>
            <Value Type="UnsignedInteger">123</Value>
          </ValueExpression>
        </SimpleExpression>
      </Expression>
    </DataSource>
  </DataSources>
Here is the condition detection. Notice there is no count threshold, only the time window, which my example sets to 30 seconds.
  <ConditionDetection ID="CD" TypeID="System!System.ConsolidatorCondition">
    <Consolidator>
      <ConsolidationProperties>
      </ConsolidationProperties>
      <TimeControl>
        <WithinTimeSchedule>
          <Interval>30</Interval> <!-- seconds -->
        </WithinTimeSchedule>
      </TimeControl>
      <CountingCondition>
        <CountMode>OnNewItemNOP_OnTimerOutputRestart</CountMode>
      </CountingCondition>
    </Consolidator>
  </ConditionDetection>
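For contrast, a count-based consolidator in the style of the repeated-event rule linked earlier might look roughly like the sketch below. The count of 5 and the 30-second interval are illustrative values I chose, not values from that article. The two differences from the condition detection used in this rule are the `Count` element and a count mode that tests the count as each new item arrives, rather than doing nothing on new items and outputting when the timer expires.

```xml
<!-- Sketch only: count-based consolidation, for comparison. -->
<!-- Count (5) and Interval (30) are illustrative assumptions. -->
<ConditionDetection ID="CD" TypeID="System!System.ConsolidatorCondition">
  <Consolidator>
    <ConsolidationProperties>
    </ConsolidationProperties>
    <TimeControl>
      <WithinTimeSchedule>
        <Interval>30</Interval> <!-- seconds -->
      </WithinTimeSchedule>
    </TimeControl>
    <CountingCondition>
      <!-- Output only once this many events arrive inside the window. -->
      <Count>5</Count>
      <CountMode>OnNewItemTestOutputRestart_OnTimerRestart</CountMode>
    </CountingCondition>
  </Consolidator>
</ConditionDetection>
```

With the count-based form, a single stray event never alerts. With the timer-only form used here, even one event produces an alert once the window closes.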
And finally – a simple write action to generate the alert:
  <WriteActions>
    <WriteAction ID="WA" TypeID="Health!System.Health.GenerateAlert">
      <Priority>1</Priority>
      <Severity>1</Severity>
      <AlertMessageId>$MPElement[Name="Demo.AlertOnConsolidatedEvent.Event123.Alert.Rule.AlertMessage"]$</AlertMessageId>
      <AlertParameters>
        <AlertParameter1>$Data/Count$</AlertParameter1>
        <AlertParameter2>$Data/TimeWindowStart$</AlertParameter2>
        <AlertParameter3>$Data/TimeWindowEnd$</AlertParameter3>
        <AlertParameter4>$Data/Context/DataItem/EventDescription$</AlertParameter4>
      </AlertParameters>
    </WriteAction>
  </WriteActions>
</Rule>
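The write action references an alert message ID, so the management pack also needs a matching string resource and display string. The fragment below is a sketch of what those might look like. The element ID comes from the rule above, but the name and description wording are my own placeholder text, not the text from the attached MP. The placeholders `{0}` through `{3}` map to `AlertParameter1` through `AlertParameter4`.

```xml
<!-- Sketch only: the Name and Description wording is assumed placeholder text. -->
<Presentation>
  <StringResources>
    <StringResource ID="Demo.AlertOnConsolidatedEvent.Event123.Alert.Rule.AlertMessage" />
  </StringResources>
</Presentation>
<LanguagePacks>
  <LanguagePack ID="ENU" IsDefault="true">
    <DisplayStrings>
      <DisplayString ElementID="Demo.AlertOnConsolidatedEvent.Event123.Alert.Rule.AlertMessage">
        <Name>Consolidated Event 123 detected</Name>
        <!-- {0}=Count, {1}=TimeWindowStart, {2}=TimeWindowEnd, {3}=EventDescription -->
        <Description>Event 123 was logged {0} time(s) between {1} and {2}. Event description: {3}</Description>
      </DisplayString>
    </DisplayStrings>
  </LanguagePack>
</LanguagePacks>
```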
When I fire off a LOT of Event ID 123 events:
eventcreate /T ERROR /ID 123 /L APPLICATION /SO TEST /D "This is a Test event 123"
I only get a single, consolidated Alert, after the 30 second time window expires:
I will attach the entire MP example here:
Hi Kevin,
I don’t know if you got a message already, as when I was typing I hit something and the message disappeared, not sure if it sent or deleted before I finished.
I need to use Suppression in my Event Rules for an application that tends to generate a lot of Event traffic, and after seeing your blog it seems a better approach is to use a consolidator condition detection.
I added it to my Event Rule with mixed results. The consolidator condition detection was created in the Event Rule. After testing, I deployed to Production, where I found the consolidator condition detection was not in the Event Rule after all.
Racking my brains over what was going on, I tested further in my Test environment: I imported the unsealed MP and the consolidator condition detection was created. I then removed it and imported the sealed MP, without modification, and the consolidator condition detection was not created.
Is there an issue with using consolidator condition detection in sealed MPs? Our standard is to seal all Custom Application MPs.
Hopefully you can shed some light on this and confirm whether it can be used in a sealed MP. If so, what’s the secret, given that the MP doesn’t change between the sealed and unsealed versions?
Thank you.
Graham Parker.
Senior Operations Specialist – ESM
Digital Infrastructure Services, eHealth QLD | Queensland Health
Hi Kevin,
So just to clarify: the consolidator condition detection was available inside the rule via the “Edit…” button in the rule’s Configuration tab, but once the MP is sealed, the “Edit…” button no longer appears in the Configuration tab of the rule? Does this mean it is not usable in a sealed MP, or is it just a GUI difference between sealed and unsealed MPs?
Thanks.
Graham Parker.
Senior Operations Specialist – ESM
Digital Infrastructure Services, eHealth QLD | Queensland Health