
How to multihome a large number of agents in SCOM

Quick download:  https://gallery.technet.microsoft.com/SCOM-MultiHome-management-557aba93

 

I have written solutions that include tasks to add and remove management group assignments to SCOM agents before:

https://kevinholman.com/2017/05/09/scom-management-mp-making-a-scom-admins-life-a-little-easier/

 

But, what if you are doing a side by side SCOM migration to a new management group, and you have thousands of agents to move?  There are a lot of challenges with that:

 

1.  Moving them manually with a task would be very time consuming.

2.  Agents that are down or in maintenance mode are not available to multi-home.

3.  If you move all the agents at once, you will overwhelm the destination management group.

 

I have written a Management Pack called “SCOM.MultiHome” that will manage these issues more gracefully.

 

It contains one (disabled) rule, which will multihome your agents to your intended ManagementGroup and ManagementServer.  This is also override-able so you can specify different management servers initially if you wish:

 

image

 

This rule is special in how it runs.  It is configured to check once per day (86400 seconds) to see if it needs to multi-home the agent.  If the agent is already multi-homed, it will do nothing.  If it is not multi-homed to the desired management group, it will add the new management group and management server.
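Because the rule only adds the management group when it is missing, running it every day is safe. The Python sketch below illustrates that check-then-add logic only; it is not the MP's script (which is PowerShell). The function `ensure_multihomed`, the dictionary standing in for the agent's configuration, and the example names are all hypothetical.

```python
def ensure_multihomed(current_groups, target_group, target_server):
    """Add target_group to an agent's assignments only if it is missing.

    current_groups is a hypothetical dict mapping management group name
    to management server; it stands in for the agent's real configuration.
    Returns True if an assignment was added, False if nothing changed.
    """
    # Assumption: management group names are compared case-insensitively.
    existing = {name.lower() for name in current_groups}
    if target_group.lower() in existing:
        return False  # already multi-homed: do nothing
    current_groups[target_group] = target_server
    return True


# Hypothetical agent that currently reports only to the old management group.
agent = {"OLDMG": "oldms01.contoso.com"}
first_run = ensure_multihomed(agent, "NEWMG", "newms01.contoso.com")
second_run = ensure_multihomed(agent, "NEWMG", "newms01.contoso.com")
print(first_run, second_run, sorted(agent))
```

The second call changes nothing, which is the behavior you want from a workflow that repeats every 24 hours.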

But what is most special is the timing.  Once enabled, the rule uses a scheduler datasource parameter called SpreadInitializationOverInterval.  This is very powerful:

<DataSource ID="Scheduler" TypeID="System!System.Scheduler">
  <Scheduler>
    <SimpleReccuringSchedule>
      <Interval Unit="Seconds">86400</Interval>
      <SpreadInitializationOverInterval Unit="Seconds">14400</SpreadInitializationOverInterval>
    </SimpleReccuringSchedule>
    <ExcludeDates />
  </Scheduler>
</DataSource>

 

This runs the workflow once per day, but the workflow will not initialize immediately.  It will initialize at a random point within the time window provided.  In the example above, this is 14400 seconds, or 4 hours.  This means that if I enable the rule for all agents, they will not all run it immediately; each will pick a random time between now and 4 hours from now to run the multi-home script.  This keeps us from overwhelming the new environment with hundreds or thousands of agents all at once.  You can make this window bigger or smaller by editing the value in the XML.
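To get a feel for what a spread window does to the load, here is a small Python simulation. It is a sketch under one assumption: that each agent independently picks a uniformly random start offset inside the window. The scheduler's real distribution may differ, and the function names are hypothetical.

```python
import random
from collections import Counter

SPREAD_SECONDS = 14400  # SpreadInitializationOverInterval from the rule above


def simulate_start_offsets(agent_count, spread_seconds=SPREAD_SECONDS, seed=None):
    """Return one random start offset (in seconds) per agent.

    Assumes each agent independently picks a uniformly random offset inside
    the spread window; the real scheduler's distribution may differ.
    """
    rng = random.Random(seed)
    return [rng.randrange(spread_seconds) for _ in range(agent_count)]


def starts_per_hour(offsets):
    """Count how many agents start in each one-hour bucket of the window."""
    return Counter(offset // 3600 for offset in offsets)


offsets = simulate_start_offsets(4000, seed=1)
for hour, count in sorted(starts_per_hour(offsets).items()):
    print(f"hour {hour}: {count} agents")
```

With 4000 agents and a 4 hour window, the average works out to about 1000 agents per hour instead of 4000 at the same moment. Widening the window lowers that rate proportionally.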

 

Next up – the Groups.  This MP contains 8 Groups.

 

image

Let’s say you have a management group with 4000 agents.  If you multi-homed all of these to a new management group at once, it would overwhelm the new management group and take a very long time to catch up.  You will see terrible SQL blocking on your OpsMgr database and 2115 events about binding on discovery data while this happens.

The idea is to break up your agents into groups, then override the multi-home rule using these groups in a phased approach.  You can start with 500 agents over a 4 hour period, and see how that works and how long it takes to catch up.  Then add more and more groups until all agents are multi-homed.

These groups will self-populate, dividing up the number of agents you have per group.  They query the SCOM database and use an integer to do this.  By default each group contains 500 agents, but you will need to adjust this for your total agent count.


  <DataSource ID="DS" TypeID="SCOM.MultiHome.SQLBased.Group.Discovery.DataSource">
    <IntervalSeconds>86400</IntervalSeconds>
    <SyncTime>20:00</SyncTime>
    <GroupID>Group1</GroupID>
    <StartNumber>1</StartNumber>
    <EndNumber>500</EndNumber>
          
    <TimeoutSeconds>300</TimeoutSeconds>
  </DataSource>
</Discovery>

Also note there is a sync time set on each group, about 5 minutes apart.  This keeps all the groups from populating at once.  You will need to set this to your desired time, or wait until 10pm local time for them to start populating.
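When adjusting the groups for your own agent count, it can help to work out the StartNumber, EndNumber, and SyncTime values ahead of time. The Python sketch below does that arithmetic. It assumes the 8 groups and 5 minute gap described in this post, takes the first sync time from the sample discovery above, and assumes ranges are 1-based and inclusive. The function `plan_groups` is hypothetical and does not read or write the MP.

```python
from datetime import datetime, timedelta


def plan_groups(total_agents, group_count=8, first_sync="20:00", gap_minutes=5):
    """Return (group_id, start_number, end_number, sync_time) per group.

    Agents are split as evenly as possible; ranges are 1-based and inclusive,
    mirroring StartNumber/EndNumber in the discovery above.
    """
    size = -(-total_agents // group_count)  # ceiling division
    sync = datetime.strptime(first_sync, "%H:%M")
    plan = []
    for index in range(group_count):
        start = index * size + 1
        end = min((index + 1) * size, total_agents)
        if start > total_agents:
            break  # fewer agents than groups: leave remaining groups unplanned
        sync_time = (sync + timedelta(minutes=index * gap_minutes)).strftime("%H:%M")
        plan.append((f"Group{index + 1}", start, end, sync_time))
    return plan


# Example: 600 agents spread across the 8 groups.
for row in plan_groups(600):
    print(row)
```

Whatever numbers you choose, check that the highest EndNumber is at least your total agent count, so that no agent is left outside every range.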

 

Wrap up:

Using this MP, we resolve the biggest issues with side by side migrations:

 

1.  No manual multi-homing is required.

2.  Agents that are down or in maintenance mode will multi-home gracefully when they come back up.

3.  Using the groups, you can control the load placed on the new management group and test the migration in phases.

4.  Using the groups, you can load balance the destination management group across different management servers easily.

8 Comments

  1. Raleine-Ann Asis

    Can I apply this to only select group of Agents? Where will this MP be installed – in the management group where agents are currently running or the Management group where the agents will be migrated over?
    Will there be an impact on the monitoring behavior if the agent version of my new management group is different from the current management group that is live?

    • Kevin Holman

      Yes, via a group. Install this MP in the “old” management group. As far as different agent versions, you should review the supported coexistence statements in the product documentation.

  2. Vipin Prasad K

    Hi Kevin,

    I have installed the MP on SCOM 2016, though i dont see the agents getting populated in the group automatically. Do we have to enable any other settings apart from enabling the rule. I have changed the agent count to 100, as for now we are monitoring 600 Servers overall.

    Regards,
    Vipin

    • Kevin Holman

      Those groups only populate once a day based on a sync time. Wait long enough, and look on your management servers for the events they log to see whats going on.

  3. Greg Smith

    Hi Kevin, We just started using the MP on a SCOM 2019 install. I believe that I found an issue with the way that the MStoASSIGN$ parameter gets passed to the SCOM.MultiHome.AddMG.Rule.WA.ps1 PowerShell script. For some reason, a space is getting added to the end of the string which caused an exception when the AddManagementGroup method was called? I updated the PowerShell code and added $MStoASSIGN = $MStoASSIGN.Trim() and the issue seems to be fixed.

    Exception calling “AddManagementGroup” with “3” argument(s): “The parameter is incorrect. (Exception from HRESULT: 0x80070057 (E_INVALIDARG))”
