Consider this scenario: you want to monitor the event logs for a specific event, but that event has a tendency to “storm”, logging hundreds of events in a short time window. This is a bad condition for a monitoring system: the flood can quickly overwhelm it, and you do not want hundreds or thousands of alerts for a single condition.
The traditional approach to this would be to enable “Alert Suppression”, which increments a repeat counter on the alert. This has a few negative effects:
1. You still overwhelm the monitoring system, as you have to write this incremented counter to both the OpsDB and the DW. Although this is not as expensive as creating multiple individual alerts, it still has a significant impact.
2. You will only get a notification on your FIRST alert. All subsequent alerts will increment the counter, but you will never get another email/ticket on this again, as long as the original alert is still open.
Another approach is to use a consolidator condition detection. This is similar to the solution I provided here: https://kevinholman.com/2014/12/18/creating-a-repeated-event-detection-rule/
The difference, however, is that instead of waiting for a specific “count” of events to fire in a specific time window, this example will do the following:
- Wait for the event to exist in the event log.
- Start a timer upon the first event, then wait for the timer to expire.
- Create an alert for the event(s), no matter if there was a single event or thousands of events in the timed window.
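To make the three steps above concrete, here is a small Python simulation of the timing behavior. It is only a sketch of the logic, not SCOM code: the names `consolidate` and `ConsolidatedAlert` are made up for illustration, and it assumes that an event arriving exactly when the timer expires starts a new window, and that the next event after an expired timer starts a fresh timer.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class ConsolidatedAlert:
    window_start: float  # time of the first event, which started the timer
    fired_at: float      # time the timer expired and the alert was raised
    event_count: int     # how many events arrived inside the window


def consolidate(event_times: Iterable[float],
                window_seconds: float = 30.0) -> List[ConsolidatedAlert]:
    """Simulate 'first event starts a timer, one alert when it expires'."""
    alerts: List[ConsolidatedAlert] = []
    window_start = None
    count = 0
    for t in sorted(event_times):
        # If the current timer has expired, emit its alert and close the window.
        if window_start is not None and t >= window_start + window_seconds:
            alerts.append(ConsolidatedAlert(
                window_start, window_start + window_seconds, count))
            window_start = None
        # The first event after an idle period starts a new timer.
        if window_start is None:
            window_start = t
            count = 0
        count += 1
    # The last open window still produces an alert once its timer expires.
    if window_start is not None:
        alerts.append(ConsolidatedAlert(
            window_start, window_start + window_seconds, count))
    return alerts


if __name__ == "__main__":
    # A storm: 500 events spread over roughly ten seconds.
    storm = [i * 0.02 for i in range(500)]
    for alert in consolidate(storm, window_seconds=30.0):
        print(alert)
```

In this simulation, a storm of 500 events inside ten seconds produces a single alert with an `event_count` of 500, while a lone event still produces one alert 30 seconds after it was logged.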
The XML is fairly simple for this. We will have the following components:
- Event datasource (Microsoft.Windows.EventProvider)
- Consolidation Condition Detection (System.ConsolidatorCondition)
- Alert Write Action (System.Health.GenerateAlert)
Here is the datasource: we simply look for event ID “123”
<Rule ID="Demo.AlertOnConsolidatedEvent.Event123.Alert.Rule" Enabled="true" Target="Windows!Microsoft.Windows.Server.OperatingSystem" ConfirmDelivery="true" Remotable="true" Priority="Normal" DiscardLevel="100">
<DataSource ID="DS" TypeID="Windows!Microsoft.Windows.EventProvider">
Here is the condition detection. Notice there is no counting condition, simply the timer window, where my example uses 30 seconds.
<ConditionDetection ID="CD" TypeID="System!System.ConsolidatorCondition">
<!-- seconds -->
And finally – a simple write action to generate the alert:
<WriteAction ID="WA" TypeID="Health!System.Health.GenerateAlert">
When I fire off a LOT of Event ID 123 events:
eventcreate /T ERROR /ID 123 /L APPLICATION /SO TEST /D "This is a Test event 123"
I only get a single consolidated alert, after the 30-second time window expires:
I will attach the entire MP example here: