One feature that has existed since SCOM 2007R2 is the ability to adjust alert storm thresholds on an agent-by-agent basis. By default, any single workflow on an agent that generates 50 alerts within a 60-second window is automatically disabled for 10 minutes to control alert storms.
All of this activity occurs on the agent itself. There is an alert generated to let you know you are having an alert storm – but the alert is in response to an event in the SCOM agent’s event log only.
Log Name: Operations Manager
Event ID: 5399
Computer: SERVER.opsmgr.net
Description:
A rule has generated 3 alerts in the last 5 seconds. Usually, when a rule generates this many alerts, it is because the rule definition is misconfigured. Please examine the rule for errors. In order to avoid excessive load, this rule will be temporarily suspended until 2018-10-30T15:23:42.2572524-05:00.
Rule: SCOM.Management.TestEvent100.Rule
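To make the threshold behavior concrete, here is a rough Python model of it. This is only an illustration of the counting logic described above, not the agent's actual implementation: the class name, the sliding-window approach, and the choice to let the alert that crosses the threshold through are all assumptions.

```python
from collections import deque


class AlertStormThrottle:
    """Hypothetical model of per-workflow alert storm suppression."""

    def __init__(self, alert_count=50, count_interval=60, suspend_interval=600):
        self.alert_count = alert_count            # alerts needed to trip the throttle
        self.count_interval = count_interval      # observation window, in seconds
        self.suspend_interval = suspend_interval  # suspension length, in seconds
        self._times = deque()                     # timestamps of recent alerts
        self._suspended_until = None

    def record_alert(self, now):
        """Return True if the alert is raised, False if the workflow is suspended."""
        if self._suspended_until is not None:
            if now < self._suspended_until:
                return False
            self._suspended_until = None  # suspension has expired

        # Forget alerts that have fallen out of the observation window.
        while self._times and now - self._times[0] >= self.count_interval:
            self._times.popleft()

        self._times.append(now)
        if len(self._times) >= self.alert_count:
            # Assumption: the alert that crosses the threshold is still raised,
            # and the workflow is suspended from this point on.
            self._suspended_until = now + self.suspend_interval
            self._times.clear()
        return True


# Thresholds matching the sample event above: 3 alerts in 5 seconds,
# with an assumed 10-second suspension.
throttle = AlertStormThrottle(alert_count=3, count_interval=5, suspend_interval=10)
results = [throttle.record_alert(t) for t in (0, 1, 2, 3, 12)]
print(results)
```

With these values the first three alerts are raised, the alert at second 3 is dropped because the workflow is suspended, and the alert at second 12 is raised again once the suspension has expired.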
This is configurable on a per-agent basis, if you wish – via the registry:
HKLM\SYSTEM\CurrentControlSet\Services\HealthService\Parameters\Management Groups\<MGNAME>\
Create three REG_DWORD values:
Alert Count – number of alerts from a single workflow to trigger an event about the alert storm
Alert Count Interval – the time period in SECONDS in which the number of alerts will be observed
Alert Suspend Interval – the number of SECONDS you want the workflow temporarily disabled
You will need to restart the Microsoft Monitoring Agent (HealthService) on the agent for these changes to take effect. You could even consider changing these on a large scale using a SCOM workflow, task, script, or GPO.
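If you want to script this, one option is to generate a .reg file with the three values and import it on the agent. The sketch below only builds the file text. The function name and the management group name "MYMG" are hypothetical placeholders, and the numbers shown are simply the defaults described above.

```python
def build_reg_file(management_group, alert_count, count_interval, suspend_interval):
    """Return .reg file text that sets the three alert storm REG_DWORD values."""
    key = (
        "HKEY_LOCAL_MACHINE\\SYSTEM\\CurrentControlSet\\Services\\HealthService"
        "\\Parameters\\Management Groups\\" + management_group
    )
    values = {
        "Alert Count": alert_count,                # alerts from a single workflow
        "Alert Count Interval": count_interval,    # seconds
        "Alert Suspend Interval": suspend_interval,  # seconds
    }
    lines = ["Windows Registry Editor Version 5.00", "", "[" + key + "]"]
    for name, value in values.items():
        if not 0 <= value <= 0xFFFFFFFF:
            raise ValueError(name + " does not fit in a REG_DWORD")
        # .reg files store DWORDs as eight hexadecimal digits.
        lines.append('"{}"=dword:{:08x}'.format(name, value))
    return "\r\n".join(lines) + "\r\n"


# "MYMG" is a placeholder; substitute your own management group name.
reg_text = build_reg_file("MYMG", alert_count=50, count_interval=60, suspend_interval=600)
print(reg_text)
```

Remember that the agent still needs a restart after the values are imported.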
Hi Kevin,
How is this useful when an upstream switch or router fails for a certain network/site and SCOM sends an alert storm for every server it has lost connectivity with in that network?
Not very. Different scenario.
Is there any way to suppress the "Failed to connect to computer" alerts in SCOM 2012?
Sure – use maintenance mode to suppress alerts from anything in SCOM.
Very droll. However, they could be asking because the management server is unable to ping the server with the agent on it, which just adds more noise when the server has a real heartbeat failure for one of a multitude of reasons. But that’s realistically out of scope for this article.
This article deals with a common issue: things like dynamically named rules, or Linux log file monitoring (set to raise separate alerts so you get useful notifications), can hit that limit of 50 alerts per rule in no time flat.
In POOJA’s case a simple disable override is the way to go if you can’t have ping traffic allowed to that part of the network.
I know we are in that exact situation. If we lose access to our gateways on a remote site, having 2 alerts per server for 300+ servers is a bit much.
Yes, I agree: maintenance mode for the win to stop noise. We use it extensively, both in scheduled form (since SCOM 2007R2) and with our automated systems (i.e. patching). The more objects you have, the more removing noise matters.
Hi Kevin,
How do I suppress alerts for a two-state monitor? I created a two-state monitor for Unix servers and tested it on one server, and it throws an alert every minute. Please let me know how to raise only the first alert and suppress the rest.
A monitor will not do that. Unless you used something bad, like a timer reset.