
Automating Agent Load Balancing for Management Servers and Gateways

This is something I have worked on with pretty much every customer. If you assign agents manually to management servers or gateways, you might want to automate load balancing those agents across multiple servers. There are some community solutions out there already, but they often move agents every day, unnecessarily. This solution incorporates a threshold: a set percentage of total agents must be out of balance before any load balancing occurs.

A common scenario I see is something like 5 management servers, where 3 MS are dedicated to monitoring Windows agents but 2 MS are dedicated to UNIX/Linux, URL, or network monitoring, etc. You might wish to keep agents from reporting to, or even failing over to, these dedicated management servers.

Another common scenario is Gateways, when you deploy multiple gateways in a specific network location for high availability. Agents assigned to a GW communicate only with their assigned GW. To get them to fail over to a second GW, you must use PowerShell and the SCOM SDK to configure this manually for every agent. When you use my automation for load balancing, however, it also adds the other GW as a failover for any agents that get moved.
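The failover handling described above boils down to a simple rule: when an agent is assigned to one gateway in a pair, the other gateway(s) in that pair become its failover list. The Python below is a minimal sketch of that idea only. The MP itself does this through PowerShell and the SCOM SDK, and the names `GatewayAssignment`, `assign_with_failover`, and the contoso host names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class GatewayAssignment:
    """Hypothetical record of one agent's primary gateway and failover list."""
    agent: str
    primary: str
    failover: list[str]


def assign_with_failover(agent: str, primary: str, gateway_pair: list[str]) -> GatewayAssignment:
    """Assign an agent to `primary` and use the other gateway(s) in the group as failover.

    Assumption: an agent never lists its own primary gateway as a failover.
    """
    if primary not in gateway_pair:
        raise ValueError(f"{primary} is not a member of {gateway_pair}")
    failover = [gw for gw in gateway_pair if gw != primary]
    return GatewayAssignment(agent=agent, primary=primary, failover=failover)


if __name__ == "__main__":
    pair = ["GW01.contoso.com", "GW02.contoso.com"]  # hypothetical gateway pair
    a = assign_with_failover("agent1.contoso.com", "GW02.contoso.com", pair)
    print(a.primary, a.failover)  # GW02.contoso.com ['GW01.contoso.com']
```

The point of deriving the failover list from the pair, rather than storing it separately, is that every agent moved to either gateway automatically picks up the other one as its failover.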

Quick Download:  SCOM Agent Load Balancing Management Pack on GitHub

Once you import the MP, you will find two rules targeting the “All Management Servers Resource Pool” (AMSRP).


These rules are disabled by default and must be configured before use. Start by opening the rule for Management Servers.

Set the rule to Enabled, then go to the Configuration tab and edit the data source:


By default this rule runs once per day at 5:01 AM.  Change this if needed.

Next, edit the Write Action:


Set SCOMServerList to a comma-separated list of the SCOM management servers that you want to load balance as a unit.
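To illustrate the expected format, the sketch below turns a comma-separated value like the one SCOMServerList takes into a clean list of server names, trimming whitespace and dropping empty entries and case-insensitive duplicates. It is hypothetical Python, not the MP's own script; `parse_server_list` and the server names are placeholders.

```python
from __future__ import annotations


def parse_server_list(value: str) -> list[str]:
    """Split a comma-separated server list, trimming blanks and duplicates.

    Hypothetical helper: host names are compared case-insensitively,
    and the first spelling seen is kept.
    """
    seen: set[str] = set()
    servers: list[str] = []
    for item in value.split(","):
        name = item.strip()
        if name and name.lower() not in seen:
            seen.add(name.lower())
            servers.append(name)
    return servers


if __name__ == "__main__":
    print(parse_server_list("MS1.contoso.com, MS2.contoso.com,,ms1.contoso.com , MS3.contoso.com"))
    # ['MS1.contoso.com', 'MS2.contoso.com', 'MS3.contoso.com']
```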

The script only load balances agents if more than 5 percent of the agents are out of balance. This prevents agents from being moved unnecessarily every day. You can change this threshold if you like.
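The post does not spell out exactly how “out of balance” is measured, so the following is a minimal sketch under one assumption: it counts the fewest agents that would have to move to even out the servers, expresses that as a percentage of all agents, and rebalances only when the result exceeds the threshold (5 by default, mirroring the rule's default). `agents_to_move` and `needs_rebalance` are hypothetical names, not part of the MP.

```python
from __future__ import annotations


def agents_to_move(counts: dict[str, int]) -> int:
    """Fewest agents that must move so server loads differ by at most one.

    Each server's target is total // n; the leftover +1 slots go to the
    busiest servers, which minimizes the number of moves.
    """
    n = len(counts)
    if n == 0:
        return 0
    base, extra = divmod(sum(counts.values()), n)
    ordered = sorted(counts.values(), reverse=True)
    targets = [base + 1 if i < extra else base for i in range(n)]
    # Only servers above their target have to give agents away.
    return sum(max(0, c - t) for c, t in zip(ordered, targets))


def needs_rebalance(counts: dict[str, int], threshold_pct: float = 5.0) -> bool:
    """True only if more than threshold_pct of all agents are out of balance."""
    total = sum(counts.values())
    if total == 0:
        return False
    return agents_to_move(counts) / total * 100 > threshold_pct


if __name__ == "__main__":
    print(needs_rebalance({"MS1": 120, "MS2": 100, "MS3": 80}))  # True: 20 of 300 agents
    print(needs_rebalance({"MS1": 104, "MS2": 100, "MS3": 96}))  # False: 4 of 300 agents
```

With a small drift (4 of 300 agents, about 1.3%) nothing moves, which is exactly the day-to-day churn the threshold is meant to avoid.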

Save the rule and it will start running at the scheduled time.

This workflow logs Event 8001 to the Operations Manager event log on whichever management server currently hosts the AMSRP workflows. It will always run on the same management server unless the number of management servers changes or that MS is down.


If errors occur, alerts are raised.



You can use the GW rule in the same way. If you have multiple GW pairs, I recommend creating one rule per GW pair by copying and pasting the GW rule in the MP XML and editing each copy's configuration.


9 Comments

  1. Nolan

    Hi Kevin

    Damn, this is long overdue and has always been a pain. I even tried scripting it into the agent install, but it never worked perfectly. Great job!

  2. Brian

    Does this overcome the bug with changing the primary management server through the console, where some agents generate heartbeat alerts because they don’t actually change when told?

    • Kevin Holman

      I haven’t ever seen that bug… unless you are talking about Gateways, in which case that’s by design. But I haven’t ever seen it when moving from one MS to another, as long as you provide the failover server list and that list contains at least ONE server that was previously in the agent’s configuration.

      • Michael Stefansen

        Hi Kevin
        Is it possible to change the target from the “All Management Servers Resource Pool” to something you can override in the management pack?

        • Kevin Holman

          I feel like that would be a mistake. Why would you want to do that? Technically, you could change the target to Collection Management Server and just enable the rule for a specific management server, but this is backwards, as you lose high availability for the rule. When you force a rule to run only on a defined MS just so it’s easy to know which MS is running it, you destroy high availability, which is the benefit of targeting a pool. The MS that owns the AMSRP will always own it, unless the number of MS changes or that MS is down. If you truly need to limit or control this in some way, you could create your own custom resource pool with two servers in it and target the rule to that RP.

          • Michael Stefansen

            But Gateway servers are not members of the “All Management Servers Resource Pool”, at least not in any of my SCOM management groups.
            So agent load balancing on the Gateways wouldn’t work with the GW rule in your management pack, then.

          • Kevin Holman

            Where the rule runs has ZERO relationship to what is being load balanced. The AMSRP is simply the target for “where to run this rule, that can access the SDK”. What you are load balancing has ZERO relationship to that.

  3. Michael Stefansen

    Hi Kevin
    How does this management pack differ from the one Tao Yang created in his “OpsMgr Self Maintenance Management Pack”?

    • Kevin Holman

      They are similar. Both target the AMSRP for high availability. His uses a resource pool to select management servers, which I always thought was odd because resource pools have nothing to do with agents. He uses a custom resource pool simply as a selection criterion for which management servers to load balance.

      Also, his does not have a threshold for the number of agents BEFORE balancing. It will load balance even if only ONE agent needs to be moved… so it ends up moving agents, likely every single day. My solution uses a threshold: a set percentage of agents must need to be moved before a rebalance is triggered. This ensures we are not load balancing something every day, or with every change in the number of agents.

      Lastly, my solution works fine with Gateways as well: there is one rule for Management Servers and a separate rule for Gateways. A customer might have multiple network locations served by gateways, each with multiple GW pairs. My solution makes it pretty easy to copy and paste the GW rule for each GW pair.
