This is a script example I did a demo on recently. It is designed to help you keep up with agents that are not communicating with SCOM, fix them, or help you categorize them into groups for troubleshooting.
I was working with a customer that has a very large environment, with 1000 agents not communicating. Most of these were down because the customer does not have an integrated decommission process, so people retire servers and do not tell the monitoring team. This creates server-down alerts that just get ignored, because the operations team receiving them recognizes them as decommissioned servers. This is a bad practice, as monitoring removal should be part of any customer's server decommission process.
Quick Download: https://github.com/thekevinholman/RemoteAgentHealthSCOMServerReport
When you run it, it will dump a CSV report to the C:\Windows\Temp directory, and output a grid to the screen.
The script gets all your agents that have a critical Health Service Watcher object, and loops through each one, checking to see:
- Is the server in maintenance mode?
- When was the server last communicating or reset?
- What are the management server assignments?
- Can we resolve the agent from DNS?
- Can we ping the agent now?
- Can we connect to the remote Service Control Manager?
- Can we get the status of Healthservice?
- If stopped, start it
- If disabled, fix it
- If someone uninstalled the agent, let us know
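The connectivity checks in that list can be sketched roughly as follows. This is a minimal illustration, not the code from the downloadable script: the function name `Test-AgentReachability` and the property names on its result are made up for this example, and it only reports status without starting or re-enabling anything. It assumes Windows PowerShell 5.1, since `Get-Service -ComputerName` is not available in PowerShell 7.

```powershell
# Hypothetical helper (not from the original script): runs the DNS, ping,
# and HealthService checks for a single agent name and returns one row.
function Test-AgentReachability {
    param(
        [Parameter(Mandatory)]
        [string]$ComputerName
    )

    $result = [ordered]@{
        ComputerName  = $ComputerName
        DnsResolves   = $false
        PingResponds  = $false
        ServiceStatus = 'Unknown'
    }

    # Can we resolve the agent from DNS? If not, stop here.
    try {
        [void][System.Net.Dns]::GetHostAddresses($ComputerName)
        $result.DnsResolves = $true
    }
    catch {
        return [pscustomobject]$result
    }

    # Can we ping the agent now? A firewall may block ICMP, so keep going either way.
    $result.PingResponds = [bool](Test-Connection -ComputerName $ComputerName -Count 1 -Quiet)

    # Can we reach the remote Service Control Manager and read HealthService?
    # Any failure (no rights, firewall, agent uninstalled) lands in the catch block.
    try {
        $svc = Get-Service -ComputerName $ComputerName -Name 'HealthService' -ErrorAction Stop
        $result.ServiceStatus = $svc.Status.ToString()
    }
    catch {
        $result.ServiceStatus = 'NotFoundOrUnreachable'
    }

    [pscustomobject]$result
}
```

To feed it from SCOM, something like the following should work on a machine with the OperationsManager module loaded. The class name and the `HealthState -eq 'Error'` filter are my assumptions about how to select critical Health Service Watchers, as is using `DisplayName` as the agent name:

```powershell
$HSWClass = Get-SCOMClass -Name 'Microsoft.SystemCenter.HealthServiceWatcher'
$HSWInstances = $HSWClass | Get-SCOMClassInstance | Where-Object { $_.HealthState -eq 'Error' }
$HSWInstances | ForEach-Object { Test-AgentReachability -ComputerName $_.DisplayName } | Out-GridView
```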
This is really helpful when you have a large environment, and a large number of agents that are not communicating.
Obviously, firewalls create issues for running a script like this, and you must have rights on the agent machines in order to remotely interrogate or fix services.
always perfect:)
This is cool, thanks!
Great work and very helpful. Is there an option to report on the last agent communication? I see that you can see the heartbeat last modified, but can you report on when the agent last spoke to SCOM?
I wish there was an easier way to do this. The HB last modified should show this, technically, because it would show the last time the HB monitor went critical, which should be the last time the agent communicated. Unfortunately the DB does not store the raw HB data, and there are even scenarios where HB is working fine, but the agent is no longer sending alerts, events, perf, or state changes. The only other way I know is to scan the perf tables and get the “last” performance data sent by the specific agent, which would only go back as far as you retain perf data in the OpsDB.
This sounds great, but in my environment I get the error below. Any ideas?
Get-SCOMClassInstance : An object of class ManagementPackClass with ID 00000000-0000-0000-0000-000000000000 was not found.
At G:\SCOMADMIN\SCOM Scripts\RemoteAgentHealthSCOMServerReport.ps1:58 char:29
+ $HSWInstances = $HSWClass | Get-SCOMClassInstance | Where {$_.HealthS …
+ ~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : InvalidOperation: (Microsoft.Syste…InstanceCommand:GetSCClassInstanceCommand) [Get-SCOMClassInstance], ObjectNotFoundException
+ FullyQualifiedErrorId : ExecutionError,Microsoft.SystemCenter.OperationsManagerV10.Commands.GetSCClassInstanceCommand