
Creating a Repeated Event Detection *Rule*

One of the built-in monitor types in the SCOM console is the repeated event detection monitor. This is a cool way of creating an alert when we want to know that multiple similar events were recorded within a specified time frame. This is helpful for applications that might log one or two events, which might not be evidence of an actual issue, but when the log gets flooded, or the application logs a large number of events in a short time frame, that is evidence of a problem.

The issue I have always had with the SCOM console is that it only provides us with a repeated event detection MONITOR, and not a rule. The problem with using a monitor is that it assumes we want to drive health state from this. Often we don’t; we just want simple alerting on the condition. The other problem with a monitor is that we need a way to “reset” it back to healthy, and our options leave a lot to be desired. We could use “Manual” reset… which is nearly worthless. Manual reset monitors should almost NEVER be used in any case: they create labor-intensive problems where customers have to use the console to reset the monitor, otherwise we stop monitoring for the condition until the monitor is manually reset back to healthy. Another alternative is Event reset, where we would use another, different event to trigger a “healthy” condition. This would be OK, IF the application truly had an event showing that the previous condition had cleared up. Most of the time, this is not the case. Lastly, we could use a timer reset. I end up using these often, simply because there is no other choice. It is still a poor solution, because now the “health state” I am driving is completely meaningless, and I am only resetting it with a timer so that I can get additional alerts in the future.

This leaves us needing a simple rule type with repeated event detection. It is actually quite simple to create; we just cannot create it using the SCOM UI.

For this example, I will show how to author this using the SCOM 2007 R2 Authoring Console, because that is still the simplest tool to use for this type of authoring.  http://www.microsoft.com/en-us/download/details.aspx?id=18222

Open the Authoring console and create a new empty MP:


Give the MP a display name and hit Create.

Choose the Health Model pane and select Rules. Choose New, Custom Rule:


Give the rule a proper ID:


On the General tab, provide a DisplayName for the rule and a good target class. Never target Windows Computer; I like to use Windows Server Operating System as a good generic class:


On the Modules tab: this is where the magic happens, baby!

In a typical alert-generating event rule, we have a data source (the event log and expression) and a write action (the alert). In this example, we will add a condition detection that must be satisfied before processing moves on to the write action. The condition detection will hold the repeat criteria.
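
To make that three-stage flow concrete, here is a bare skeleton of such a rule. It is a sketch that reuses the aliases, target, rule attributes, and module IDs (`DS`, `CD`, `WA`) from the full MP XML at the end of this post; the rule ID is a hypothetical placeholder, and the XML comments stand in for the configuration filled in during the following steps. As I understand the MP schema, a rule can carry at most one condition detection, which sits between the data source and the write action.

```xml
<!-- Skeleton only: the rule ID is hypothetical and module configuration is elided -->
<Rule ID="example.eventrules.skeleton" Enabled="true" Target="Windows!Microsoft.Windows.Server.OperatingSystem" ConfirmDelivery="true" Remotable="true" Priority="Normal" DiscardLevel="100">
  <Category>Custom</Category>
  <DataSources>
    <!-- Stage 1: read the event log and keep only events matching the expression -->
    <DataSource ID="DS" TypeID="Windows!Microsoft.Windows.EventProvider">
      <!-- ComputerName, LogName, Expression go here -->
    </DataSource>
  </DataSources>
  <!-- Stage 2 (the new part): the repeat criteria; it only passes data on
       when the count/interval condition is met -->
  <ConditionDetection ID="CD" TypeID="System!System.ConsolidatorCondition">
    <!-- Consolidator settings (TimeControl, CountingCondition) go here -->
  </ConditionDetection>
  <WriteActions>
    <!-- Stage 3: raise the alert from whatever the condition detection outputs -->
    <WriteAction ID="WA" TypeID="Health!System.Health.GenerateAlert">
      <!-- Priority, Severity, AlertMessageId, AlertParameters go here -->
    </WriteAction>
  </WriteActions>
</Rule>
```

Nothing reaches the write action unless the condition detection outputs a data item, which is why inserting the consolidator is enough to turn a plain event alert rule into a repeat-count rule.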

In the Data Sources section, select Create. Choose Microsoft.Windows.EventProvider, a simple composite data source that combines Microsoft.Windows.BaseEventProvider with a condition detection that applies an expression for the event criteria. http://msdn.microsoft.com/en-us/library/ee809339.aspx
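
Conceptually, that composite wires its two member modules together roughly as below. This is a simplified sketch written from the description above, not the verbatim definition from Microsoft.Windows.Library: the member IDs `DS` and `Filter` are illustrative, I am assuming the filter is `System.ExpressionFilter`, and the real module type may declare additional optional parameters (check the MSDN page above for the exact schema). The point is the data flow: the base provider reads the log, and every event it emits is tested against the expression before leaving the data source.

```xml
<!-- Simplified sketch of the composite's member modules; IDs are illustrative -->
<Composite>
  <MemberModules>
    <!-- Reads raw events from the configured log on the configured computer -->
    <DataSource ID="DS" TypeID="Microsoft.Windows.BaseEventProvider">
      <ComputerName>$Config/ComputerName$</ComputerName>
      <LogName>$Config/LogName$</LogName>
    </DataSource>
    <!-- Drops every event that does not satisfy the rule's Expression
         (assumed here to be System.ExpressionFilter) -->
    <ConditionDetection ID="Filter" TypeID="System!System.ExpressionFilter">
      <Expression>$Config/Expression$</Expression>
    </ConditionDetection>
  </MemberModules>
  <Composition>
    <!-- Inner node feeds the outer node: DS output flows into Filter -->
    <Node ID="Filter">
      <Node ID="DS" />
    </Node>
  </Composition>
</Composite>
```

Because the filtering already happens inside the data source, the rule's own condition detection slot is still free for the consolidator.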


Provide a name for that Datasource (DS) and click OK.  Edit the Datasource we just created, then click Configure:


Here we will find the familiar UI for providing a log and event ID, source, etc.  This is the “Expression” I referenced above:


Hit OK on everything to get back to the Modules tab.

Click Create in the Condition Detection section. We want System.ConsolidatorCondition: http://msdn.microsoft.com/en-us/library/ee809324.aspx


Provide a name for the Condition Detection (CD) and click OK.

Now edit the CD we just created and click the Configure button. For this example, we can choose to trigger on count (sliding), which will allow us to alert any time our event happens “x” times in any window of “y” seconds. Set the compare count to 10 and the interval to 60 seconds for this example:
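
In the MP XML, those settings land inside the condition detection as shown below (the same values appear in the full MP at the end of this post). The comments are my reading of the element names, offered as a guide rather than an authoritative description of the module: `Interval` is the “y” window in seconds, `Count` is the “x” threshold, and the sliding count mode tests the count as each new event arrives while the timer slides the window forward.

```xml
<ConditionDetection ID="CD" TypeID="System!System.ConsolidatorCondition">
  <Consolidator>
    <!-- Left empty, so (as I understand it) all matching events are counted together -->
    <ConsolidationProperties />
    <TimeControl>
      <WithinTimeSchedule>
        <!-- "y": length of the counting window, in seconds -->
        <Interval>60</Interval>
      </WithinTimeSchedule>
    </TimeControl>
    <CountingCondition>
      <!-- "x": number of matching events required within the window -->
      <Count>10</Count>
      <!-- Sliding count: on each new event, test the count and, when met,
           output a data item and restart; on the timer, slide the window -->
      <CountMode>OnNewItemTestOutputRestart_OnTimerSlideByOne</CountMode>
    </CountingCondition>
  </Consolidator>
</ConditionDetection>
```

To alert on, say, 5 events in 5 minutes instead, you would change only `Count` to 5 and `Interval` to 300.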


Click OK twice to get back to the Modules page.

Now, create a write action. We will use System.Health.GenerateAlert:


Provide a name for the Write Action (WA) and click OK.  Then Edit, and Configure the write action.
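
The write action configuration in the XML is shown below, taken from the full MP at the end of this post with comments added. The useful part is that the consolidator's output exposes the event count and the time window, so they can be passed into the alert: `AlertParameter1` through `AlertParameter4` fill the `{0}` to `{3}` placeholders in the alert description string. The GUID-style message ID is simply the one the Authoring Console generated for this example; yours will differ.

```xml
<WriteAction ID="WA" TypeID="Health!System.Health.GenerateAlert">
  <!-- Priority: 0 = Low, 1 = Medium, 2 = High -->
  <Priority>1</Priority>
  <!-- Severity: 0 = Information, 1 = Warning, 2 = Critical -->
  <Severity>1</Severity>
  <!-- Points at the string resource holding the alert name and description -->
  <AlertMessageId>$MPElement[Name="AlertMessageID0e8694e125494edab211685387e39a1b"]$</AlertMessageId>
  <AlertParameters>
    <!-- {0}: how many matching events were counted -->
    <AlertParameter1>$Data/Count$</AlertParameter1>
    <!-- {1} and {2}: start and end of the counting window -->
    <AlertParameter2>$Data/TimeWindowStart$</AlertParameter2>
    <AlertParameter3>$Data/TimeWindowEnd$</AlertParameter3>
    <!-- {3}: description of the event carried in the consolidator's context -->
    <AlertParameter4>$Data/Context/DataItem/EventDescription$</AlertParameter4>
  </AlertParameters>
</WriteAction>
```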


Save, import the MP, and test.  Voila:


Here is the MP XML:

```xml
<ManagementPack ContentReadable="true" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <Manifest>
    <Identity>
      <ID>example.eventrules</ID>
      <Version>1.0.0.0</Version>
    </Identity>
    <Name>example.eventrules</Name>
    <References>
      <Reference Alias="SC">
        <ID>Microsoft.SystemCenter.Library</ID>
        <Version>6.1.7221.0</Version>
        <PublicKeyToken>31bf3856ad364e35</PublicKeyToken>
      </Reference>
      <Reference Alias="Windows">
        <ID>Microsoft.Windows.Library</ID>
        <Version>6.1.7221.0</Version>
        <PublicKeyToken>31bf3856ad364e35</PublicKeyToken>
      </Reference>
      <Reference Alias="Health">
        <ID>System.Health.Library</ID>
        <Version>6.1.7221.0</Version>
        <PublicKeyToken>31bf3856ad364e35</PublicKeyToken>
      </Reference>
      <Reference Alias="System">
        <ID>System.Library</ID>
        <Version>6.1.7221.0</Version>
        <PublicKeyToken>31bf3856ad364e35</PublicKeyToken>
      </Reference>
    </References>
  </Manifest>
  <Monitoring>
    <Rules>
      <Rule ID="example.eventrules.repeatevent1000" Enabled="true" Target="Windows!Microsoft.Windows.Server.OperatingSystem" ConfirmDelivery="true" Remotable="true" Priority="Normal" DiscardLevel="100">
        <Category>Custom</Category>
        <DataSources>
          <DataSource ID="DS" TypeID="Windows!Microsoft.Windows.EventProvider">
            <ComputerName>$Target/Host/Property[Type="Windows!Microsoft.Windows.Computer"]/NetworkName$</ComputerName>
            <LogName>Application</LogName>
            <Expression>
              <And>
                <Expression>
                  <SimpleExpression>
                    <ValueExpression>
                      <XPathQuery Type="UnsignedInteger">EventDisplayNumber</XPathQuery>
                    </ValueExpression>
                    <Operator>Equal</Operator>
                    <ValueExpression>
                      <Value Type="UnsignedInteger">1000</Value>
                    </ValueExpression>
                  </SimpleExpression>
                </Expression>
                <Expression>
                  <SimpleExpression>
                    <ValueExpression>
                      <XPathQuery Type="String">PublisherName</XPathQuery>
                    </ValueExpression>
                    <Operator>Equal</Operator>
                    <ValueExpression>
                      <Value Type="String">TEST</Value>
                    </ValueExpression>
                  </SimpleExpression>
                </Expression>
              </And>
            </Expression>
          </DataSource>
        </DataSources>
        <ConditionDetection ID="CD" TypeID="System!System.ConsolidatorCondition">
          <Consolidator>
            <ConsolidationProperties />
            <TimeControl>
              <WithinTimeSchedule>
                <Interval>60</Interval>
              </WithinTimeSchedule>
            </TimeControl>
            <CountingCondition>
              <Count>10</Count>
              <CountMode>OnNewItemTestOutputRestart_OnTimerSlideByOne</CountMode>
            </CountingCondition>
          </Consolidator>
        </ConditionDetection>
        <WriteActions>
          <WriteAction ID="WA" TypeID="Health!System.Health.GenerateAlert">
            <Priority>1</Priority>
            <Severity>1</Severity>
            <AlertMessageId>$MPElement[Name="AlertMessageID0e8694e125494edab211685387e39a1b"]$</AlertMessageId>
            <AlertParameters>
              <AlertParameter1>$Data/Count$</AlertParameter1>
              <AlertParameter2>$Data/TimeWindowStart$</AlertParameter2>
              <AlertParameter3>$Data/TimeWindowEnd$</AlertParameter3>
              <AlertParameter4>$Data/Context/DataItem/EventDescription$</AlertParameter4>
            </AlertParameters>
          </WriteAction>
        </WriteActions>
      </Rule>
    </Rules>
  </Monitoring>
  <Presentation>
    <StringResources>
      <StringResource ID="AlertMessageID0e8694e125494edab211685387e39a1b" />
    </StringResources>
  </Presentation>
  <LanguagePacks>
    <LanguagePack ID="ENU" IsDefault="true">
      <DisplayStrings>
        <DisplayString ElementID="AlertMessageID0e8694e125494edab211685387e39a1b">
          <Name>Event 1000 has occurred multiple times</Name>
          <Description>The event 1000 has occurred {0} times between {1} and {2} Event Description: {3} </Description>
        </DisplayString>
        <DisplayString ElementID="example.eventrules">
          <Name>Example EventRules</Name>
        </DisplayString>
        <DisplayString ElementID="example.eventrules.repeatevent1000">
          <Name>Repeated Event 1000 Rule</Name>
          <Description />
        </DisplayString>
      </DisplayStrings>
    </LanguagePack>
  </LanguagePacks>
</ManagementPack>
```

As you can see from the XML, this is just like any other typical alert-generating, event-based rule. We simply add a condition detection that uses the consolidator module and pass it specific criteria, such as the time window interval, the count, and the count mode.

15 Comments

  1. Ollie Woodall

    Do the variables $Data/TimeWindowStart$ and $Data/TimeWindowEnd$ exist for a Repeat Monitor as well? Or are these only exposed via the Rule?

    • Kevin Holman

      They exist for monitors as well. Look at the monitor statechange context. It will probably be something like $Data/Context/TimeWindowStart$ and $Data/Context/TimeWindowEnd$

  2. Onkar umarani

    Hi Kevin,
    Is it possible to take the description of an event into account? If the same text keeps appearing again and again, can we consider that as well when creating the rule? For example, event 1000 with the description “testdb”: only if this repeats twice should an alert be generated.

  3. Nick

    I’ve got a question slightly related to this. I have a repeat event monitor that I’m currently testing out. It creates the alert after 3 events occur within 2 minutes, but in the alert description I have $Data/Context/EventDescription$ to get the event description, preferably of the most recent event that made the alert pop. It doesn’t work, and the “Alert Parameter Replacement Failure” rule fires, which makes sense. Is there a way to get the description of the most recent event into the alert description when the monitor itself uses multiple events? I saw you had $Data/Context/DataItem/EventDescription$ in the alert description, but that didn’t work for me either.

  4. Alfredo Colon

    Hi Kevin. I tried testing the new rule using the Simulate feature. Attempting to start the simulation returns the following message box (error):

    The Workflow Simulator needs the OpsMgr R2 agent installed. The agent does not need to be started, running, or belong to any management group. The agent install can be found in the OpsMgr install directory on any Management Server or can be downloaded with the evaluation version of OpsMgr. Please see help for details.

    I looked in the 2019 install media, but didn’t see anything resembling ‘OpsMgr R2’ agent. A quick web search also returned nothing relevant.

    Is this the recommended way to test the new rule? If not, how do I go about testing it?

  5. Andrew

    Thanks Kevin,

    I have a slightly similar situation where we don’t want to drive a health state but we do have an event that says it has resolved. This is specifically with SNMP, where one value will be bad and another will be good.

    I want to have a rule similar to a 2 state monitor where the alert will close once the good value is seen.

    Is there a way to steal similar config from the monitor with a rule?

    Thanks

  6. Mandy

    Hello, I was curious if you have a way to detect ANY event ID that has a certain number of occurrences or if you can only create one for each event ID?

  7. Rahul Vaish

    Hello Kevin,

    I have a situation where I am getting 16949 (unhealthy event) and 16950 (healthy event) every second.
    I want to create a Repeated event based monitor. Which event ID will go in the “Simple Event Expression” tab?
    Healthy Event ID : 16950
    OR
    Unhealthy Event ID : 16949/16947
    Since healthy and unhealthy events are being logged simultaneously, I don’t think a repeated event monitor can auto-close once 16950 is logged.

  8. Mike

    Thank you for the detailed explanation. If event 1000 is generated 3 times, I get an alert. Can we make it auto-close once we get event 900?

    Can we create this using a monitor instead of an alert?

    What would be the additional steps to achieve these?
