SCOM Remote Agent Health Report Script

This is an example script I demoed recently.  It is designed to help you keep track of agents that are not communicating with SCOM, fix them where possible, and categorize the rest into groups for troubleshooting.

I was working with a customer that has a very large environment, with 1000 agents not communicating.  Most of these were down because the customer has no integrated decommission process, so people retire servers and do not tell the monitoring team.  This creates server-down alerts that get ignored because the operations team receiving them recognizes them as decommissioned servers.  This is a bad practice: removing monitoring should be part of every server decommission process.

Quick Download:  https://github.com/thekevinholman/RemoteAgentHealthSCOMServerReport

When you run it, it dumps a CSV report to the C:\windows\temp directory and outputs a grid to the screen.

The script gets all agents whose Health Service Watcher object is in a critical state, then loops through each one and checks the following:

  • Is the server in maintenance mode?
  • When was the server last communicating or reset?
  • What are the management server assignments?
  • Can we resolve the agent from DNS?
  • Can we ping the agent now?
  • Can we connect to the remote Service Control Manager?
  • Can we get the status of the HealthService service?
  • If the service is stopped, start it
  • If the service is disabled, fix it
  • If someone uninstalled the agent, let us know
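
The checks above work like a decision tree: each one only makes sense if the previous one passed. Here is a minimal Python sketch of that ordering, just to illustrate the triage logic. It is not the script from the download link, and the names `AgentChecks`, `resolves_in_dns`, `triage`, and the category labels are all made up for this example.

```python
import socket
from dataclasses import dataclass
from typing import Optional


@dataclass
class AgentChecks:
    """Results of the remote checks for one agent (hypothetical structure)."""
    in_maintenance: bool
    dns_resolves: bool
    ping_ok: bool
    scm_reachable: bool
    # "running", "stopped", "disabled", or None when the service is not installed
    service_state: Optional[str]


def resolves_in_dns(hostname: str) -> bool:
    """Return True if the host name resolves to at least one address."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except OSError:
        # socket.gaierror (name not found) is a subclass of OSError
        return False


def triage(checks: AgentChecks) -> str:
    """Map check results to a troubleshooting group (labels are illustrative)."""
    if checks.in_maintenance:
        return "maintenance"
    if not checks.dns_resolves:
        return "dns-failure"
    if not checks.ping_ok:
        return "no-ping"
    if not checks.scm_reachable:
        return "scm-unreachable"
    if checks.service_state is None:
        return "agent-not-installed"
    if checks.service_state == "disabled":
        return "service-disabled"
    if checks.service_state == "stopped":
        return "service-stopped"
    return "service-running"
```

Sorting the report by a group like this makes it easy to hand the "dns-failure" and "no-ping" servers to whoever owns decommissioning, and to focus remote fixes on the stopped or disabled services.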

This is really helpful when you have a large environment, and a large number of agents that are not communicating.

Obviously, firewalls create issues for running a script like this, and you must have rights on the agent machines in order to remotely interrogate or fix services.

4 Comments

  1. Dave Smith

    Great work and very helpful. Is there an option to report on the last agent communication? I see that you can see the heartbeat last modified, but can you report on when the agent last spoke to SCOM?

    • Kevin Holman

      I wish there was an easier way to do this. The HB last modified should show this, technically, because it would show the last time the HB monitor went critical, which should be the last time the agent communicated. Unfortunately the DB does not store the raw HB data, and there are even scenarios where HB is working fine, but the agent is no longer sending alerts, events, perf, or state changes. The only other way I know is to scan the perf tables and get the “last” performance data sent by the specific agent, which only goes back as far as you retain perf data in the OpsDB.
