
Understanding SCOM Resource Pools



 

 

Resource pools are nothing new – they were introduced in SCOM 2012 RTM, for two reasons:

1.  To remove the single-point-of-failure that was the RMS role in SCOM 2007.

2.  To provide a mechanism for high availability of agentless/remote workflows, such as Unix/Linux, Network, and URL monitoring, among others.

 

That said – they are often not fully understood.

 

Let’s talk about the primary components of a Resource Pool.  I am going to “dumb this down” a lot, because it is actually quite complex behind the scenes, so I will break it down into “roles” with regard to Resource Pools.  The primary “role” components we will discuss are:

1.  Members

2.  Observers

3.  Default Observer

 

Members of a pool are either a Management Server or a Gateway Server.

Observers are “observer-only” roles.  These are Management Servers or Gateway Servers that do NOT load workflows for the pool, but DO participate in quorum decisions.  You would use them if you wanted high availability for your pool while keeping the number of Healthservices that actually run pool workflows limited.  Healthservice-based observer-only roles are rarely used under normal circumstances.

The Default Observer is the SCOM Operations Database.  It is set to “Enabled” or “Disabled” for every pool: enabled by default for all pools created in the UI, and disabled by default for all pools created via PowerShell using the New-SCOMResourcePool cmdlet.  It exists for the following reason:

To allow a pool to have high availability when you have only two management servers in the pool.

 

Let’s talk about that.

A pool requires ONE or more members.

A pool requires THREE (quorum voting) members to establish high availability.

High availability is the ability to have a member be unavailable, with no loss of monitoring.

 

The reason we need THREE (quorum voting) members (not two) for high availability is because of the quorum algorithm.  We require that MORE than 50% of the quorum voting members in a pool be available.  If you have only two members of a pool, and one is down, you have lost quorum, because of the “greater than 50%” rule.

Therefore – the “Default Observer” was dreamed up, so customers would not HAVE to deploy a minimum of THREE management servers just to get high availability for their Resource Pools.  It is a special quorum voting “observer” role, to allow for high availability of pools when you have two management servers deployed.  This reduced cost and complexity for a basic SCOM deployment.
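
The “greater than 50%” rule above can be sketched as a small PowerShell function.  This is only an illustration of the arithmetic, not how SCOM implements quorum internally; Test-PoolQuorum and its parameter names are made up for this example and are not SCOM cmdlets.

```powershell
# Hypothetical helper (NOT a SCOM cmdlet): applies the "more than 50%" quorum rule.
function Test-PoolQuorum {
    param(
        [int]$VotingMembers,     # total quorum voters in the pool
        [int]$AvailableMembers   # voters that are currently available
    )
    # Quorum holds only when strictly more than half of the voters are available.
    return (($AvailableMembers * 2) -gt $VotingMembers)
}

Test-PoolQuorum -VotingMembers 2 -AvailableMembers 1   # False: exactly 50% is not enough
Test-PoolQuorum -VotingMembers 3 -AvailableMembers 2   # True: 2 of 3 is more than 50%
```

The second call is the two management servers plus Default Observer case: three voters, so one can be unavailable without losing quorum.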

 

Let’s break this into “scenarios”:

 

Single Management server in pool

The default observer is enabled by default.

There is no high availability, because the management server is a single point of failure.

The default observer provides no benefit (nor harm) in this case.

 

Two management servers in pool

The default observer is enabled by default.

There is high availability for the pool, because there are three voting members (2 MS + Default Observer)

If you disable the default observer, you will lose high availability for the pool.

 

Three management servers in pool

The default observer is enabled by default.

There is high availability for the pool, because there are four voting members (3 MS + Default Observer)

You can only have ONE management server down and still maintain the pool (greater than 50% rule): if two MS are down, only 50% of the voting members remain, which is not MORE than 50%, so the pool suicides.

The default observer in this case provides NO value.  It does not increase the number of management servers that can be down, therefore it does not increase pool stability.

You can consider removing the DO (Default Observer) in this scenario.

 

Four management servers in pool

The default observer is enabled by default.

There is high availability for the pool, because there are five voting members (4 MS + Default Observer)

You can have up to TWO management servers down and still maintain the pool (greater than 50% rule): if three MS are down, more than 50% of the voting members are unavailable, so the pool suicides.

The default observer in this case provides significant value, because it increases the number of management servers that can be down.  Without the DO in this case, you’d only have 4 quorum members, which only allows for ONE to be unavailable.

 

Five or more management servers in pool

The default observer is enabled by default.

There is high availability for the pool, because with five management servers there are six voting members (5 MS + Default Observer).

You can have up to TWO management servers down and still maintain the pool (greater than 50% rule): if three MS are down, exactly 50% of the voting members remain, which is not MORE than 50%, so the pool suicides.

The default observer in this case provides NO value.  It does not increase the number of management servers that can be down, therefore it does not increase pool stability.

You can consider removing the DO (Default Observer) in this scenario.

 

One could argue that once you have 3 or more management servers in a pool, any odd number of management servers is a good reason to consider removing the DO from the pool.  I’d also argue that once you hit 5 management servers, you are probably big enough that the database is under significant load (you wouldn’t typically have 5 management servers in a small environment).  When the database is under heavy load, the default observer might not perform well, and might experience latency in resource pool calculations/voting.
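
To make the odd/even pattern in the management server scenarios easier to see, here is a minimal sketch that tabulates how many management servers can be down with and without the Default Observer.  Get-MaxServersDown is a hypothetical helper written for this illustration (not a SCOM cmdlet), and it assumes the Default Observer itself stays available.

```powershell
# Hypothetical helper (NOT a SCOM cmdlet): how many management servers can be
# down while MORE than 50% of the quorum voters stay available.
# Assumption: the Default Observer (the database) itself remains reachable.
function Get-MaxServersDown {
    param(
        [int]$ManagementServers,
        [bool]$UseDefaultObserver
    )
    # When enabled, the Default Observer counts as one extra voter.
    $voters = $ManagementServers + [int]$UseDefaultObserver
    # Largest number of voters that can be lost while keeping a strict majority.
    return [int][math]::Floor(($voters - 1) / 2.0)
}

$table = foreach ($count in 1..6) {
    $with    = Get-MaxServersDown -ManagementServers $count -UseDefaultObserver $true
    $without = Get-MaxServersDown -ManagementServers $count -UseDefaultObserver $false
    [pscustomobject]@{
        ManagementServers = $count
        MaxDownWithDO     = $with
        MaxDownWithoutDO  = $without
        DOAddsValue       = ($with -gt $without)
    }
}
$table | Format-Table -AutoSize
```

In this model DOAddsValue is True only for 2, 4, and 6 management servers, which matches the scenarios above: the DO helps with an even number of management servers and adds nothing with an odd number.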

The way the default observer plays a role – is that each MANAGEMENT SERVER in the pool, queries its own local SDK service – which allows it to get data from the database.  There is a table in the SCOM Operations database for the default observer.  So if the SDK service is under load, or the database, we could experience latency that otherwise would not exist.

 

Gateways as resource pool members

 

Next – we should discuss the Gateway role as it pertains to Resource Pools.  Microsoft supports resource pool membership for Management Servers, AND for Gateway servers.

For instance, a customer might monitor Unix/Linux servers in a firewalled off DMZ, or across a small WAN circuit where you want the agentless communication localized.  In this scenario, a customer might create dedicated resource pools for Gateways in those locations, to perform monitoring.

 

Single Gateway server in pool

The default observer is enabled by default.

There is no high availability, because the Gateway server is a single point of failure.

The default observer should NOT be used here, because Gateways do not have a local SDK service, therefore they cannot query the database.

 

Two Gateway servers in pool

The default observer is enabled by default.

One would THINK there is high availability for the pool, because there are two GW’s in the pool, right?  HOWEVER – that is NOT the case.  As we discussed above – we need three voting members to establish high availability for a pool.  Since the Default Observer is NEVER valid for a pool consisting of Gateways, there are only TWO members of this pool.  The pool will run, and will load balance workflows, but if either pool member goes down, the pool suicides.  In this case – you actually have WORSE availability than if you placed a single member in the pool!

In order to maintain high availability for a pool made of Gateways, you need to have THREE GW’s in the pool.

The default observer should NOT be used here, because Gateways do not have a local SDK service, therefore they cannot query the database.

 

Three Gateway servers in pool

The default observer is enabled by default.

There is high availability for the pool, because there are three voting members (3 GW)

You can only have ONE Gateway server down and still maintain the pool (greater than 50% rule): if two GW are down, more than 50% of the voting members are unavailable, so the pool suicides.

The default observer should NOT be used here, because Gateways do not have a local SDK service, therefore they cannot query the database.
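
The claim that a two-Gateway pool has WORSE availability than a single-Gateway pool can be checked with a rough back-of-the-envelope model.  The sketch below assumes each Gateway is independently available 99% of the time (a made-up figure purely for illustration) and, following the scenarios above, counts only the Gateways as voters.  Get-PoolUptime is a hypothetical helper, not a SCOM cmdlet.

```powershell
# Hypothetical helper (NOT a SCOM cmdlet): probability that MORE than 50% of
# the voters are up, assuming every voter fails independently of the others.
function Get-PoolUptime {
    param(
        [int]$Voters,
        [double]$MemberUptime   # e.g. 0.99 = a member is up 99% of the time
    )
    $total = 0.0
    for ($up = 0; $up -le $Voters; $up++) {
        if (($up * 2) -gt $Voters) {
            # Binomial coefficient C(Voters, up), built up iteratively.
            $ways = 1.0
            for ($i = 1; $i -le $up; $i++) {
                $ways = $ways * ($Voters - $up + $i) / $i
            }
            $total += $ways * [math]::Pow($MemberUptime, $up) * [math]::Pow(1 - $MemberUptime, $Voters - $up)
        }
    }
    return $total
}

foreach ($gateways in 1..3) {
    $uptime = Get-PoolUptime -Voters $gateways -MemberUptime 0.99
    "{0} Gateway(s) in pool: pool has quorum about {1:P2} of the time" -f $gateways, $uptime
}
```

Under these assumptions the single-Gateway pool has quorum about 99% of the time, the two-Gateway pool only about 98% (both must be up), and the three-Gateway pool about 99.97%.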

 

 

Let’s take a minute and process this.

 

What we have learned, is that you should remove the DO from any pool comprised of Gateways.

You should consider removing the DO from pools when 5 or more Management Servers are present.

If your pools are stable….. and you aren’t having any problems with high availability….. then this really doesn’t make much difference….. which is why the defaults are set like they are.

 

So we have talked about pool members, and the default observer…… but what about the “observer” role?

This role is really unique, and will not be used very often.  I cannot think of a single enterprise deployment where I have seen it used.  Generally speaking – if we are adding a dedicated observer for a pool (which is a management server or a GW server) then why not just make that server a full blown pool member?

There is only one scenario I can think of where this might be useful: a company with a datacenter where SCOM is deployed, and in the SAME DATACENTER, a DMZ with two gateways deployed because of firewall rules.  In this case, you could potentially make the gateways’ parent management server a dedicated observer only, and this would work because TCP port 5723 is already open for Healthservice communication.  This is incredibly rare, and the best practice would be to just go ahead and plan for three Gateway servers in the DMZ.
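
As a quick sanity check of the DMZ scenario, the sketch below counts quorum voters for two Gateways with and without a dedicated observer.  The server names are placeholders and Get-PoolFailureTolerance is a hypothetical helper (not a SCOM cmdlet); it only does the voting arithmetic.

```powershell
# Hypothetical helper (NOT a SCOM cmdlet): how many voters can be unavailable
# while MORE than 50% of all voters (members + observers) remain available.
function Get-PoolFailureTolerance {
    param(
        [string[]]$Members,          # run pool workflows AND vote
        [string[]]$Observers = @()   # vote only, never run pool workflows
    )
    $voters = $Members.Count + $Observers.Count
    return [int][math]::Floor(($voters - 1) / 2.0)
}

# Placeholder names, for illustration only.
$gateways = 'GW1.dmz.example', 'GW2.dmz.example'

Get-PoolFailureTolerance -Members $gateways                                 # 0: either Gateway down kills the pool
Get-PoolFailureTolerance -Members $gateways -Observers 'MS1.corp.example'   # 1: one voter can be down
```

With the observer added there are three voters, so either Gateway can go down and the remaining Gateway keeps running the pool workflows.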

 

Remember – for resource pool members – Microsoft supports Management Servers and Gateways.

For resource pool observers – the same, Management Servers and Gateways.

 

That said – I have done some testing making an *agent* a dedicated observer, such as the DMZ scenario above, and it does work.  The agent becomes a voting member for quorum, and high availability is created by this.  Microsoft didn’t plan or test this scenario – so it is technically unsupported.

Which got me to thinking – “what if I create a resource pool, and make its membership strictly agents”???

Well, that works too.  You cannot do this using the UI, but you can in PowerShell.  I created a resource pool of only agents, then set up URL monitoring to that pool, and high availability and load distribution worked great.  Again, not technically supported by Microsoft, but a unique capability nonetheless.

 

Lastly – I will demonstrate some PowerShell commands to work with this stuff.

 

To view the pools, their Default Observer status, and if they are Automatic or Manual:

$pools = Get-SCOMResourcePool
$pools | Format-List DisplayName,UseDefaultObserver,IsDynamic

 

To DISABLE the default observer for a pool:

$pool = Get-SCOMResourcePool -DisplayName "Your Pool Name"
$pool.UseDefaultObserver = $false
$pool.ApplyChanges()

 

To ENABLE the default observer for a pool:

$pool = Get-SCOMResourcePool -DisplayName "Your Pool Name"
$pool.UseDefaultObserver = $true
$pool.ApplyChanges()

 

To set a pool to MANUAL membership:

$pool = Get-SCOMResourcePool -DisplayName "Your Pool Name"
$pool | Set-SCOMResourcePool -EnableAutomaticMembership $false
$pool.ApplyChanges()

 

To set a pool to AUTOMATIC membership:

$pool = Get-SCOMResourcePool -DisplayName "Your Pool Name"
$pool | Set-SCOMResourcePool -EnableAutomaticMembership $true
$pool.ApplyChanges()

 

To add or remove Management Servers or Gateways from a manual pool:

$pool = Get-SCOMResourcePool -DisplayName "Your Pool Name"
$MS = Get-SCOMManagementServer -Name "YourMSorGW.domain.com"
# To add the server to the pool:
$pool | Set-SCOMResourcePool -Member $MS -Action "Add"
# To remove the server from the pool:
$pool | Set-SCOMResourcePool -Member $MS -Action "Remove"

 

To add or remove Management Servers or Gateways as Observers only to a pool:

$pool = Get-SCOMResourcePool -DisplayName "Your Pool Name"
$Observer = Get-SCOMManagementServer -Name "YourMSorGW.domain.com"
# To add the server as an observer:
$pool | Set-SCOMResourcePool -Observer $Observer -Action "Add"
# To remove the server as an observer:
$pool | Set-SCOMResourcePool -Observer $Observer -Action "Remove"

 

If you want to play with adding AGENTS as a resource pool member or observer (not supported) then simply change “Get-SCOMManagementServer” above – to “Get-SCOMAgent”

 

 

Credits:

A debt of gratitude to Mihai Sarbulescu at Microsoft for his guidance on this topic – he has forgotten more about Resource Pools than most people at Microsoft ever knew.

12 Comments

  1. peter

    Is it possible to define group members by resource pool alone? If server 1 is resource pool 1 for my Unix computers that use local accounts, and server 2 is resource pool 2 for my Unix computers that use AD-integrated accounts, I need to distribute the different Run As/action account credentials to the relevant servers/resource pools.

    Server naming conventions and OS versions are so mixed and similar that the only way I can define a group is by adding names one by one to the group, or by defining each object in the profile’s Run As accounts. Creating groups based on their resource pool would be less messy and would make it easier to add and remove servers. Any ideas?

    • Kevin Holman

      To make sure I understand – you would like to create a Group, defined in a Management Pack, that contains Windows Computer objects, that are members of a specific Resource Pool?

  2. Mark Van Doren

    Hi Kevin,

    We currently manage about 300 Linux servers in a resource pool comprised of 3 SCOM 2012 R2 management servers. We have 20 gateway hosts that we used to manage our Windows infrastructure. I was wondering if it was possible to create a resource pool comprised of gateway servers to monitor our linux environment. I’m not seeing any practical examples of this — other people seem to only use management servers in their Unix/Linux resource pool (as we do now). Is a resource pool with gateway hosts used to monitor unix/linux something that is supported in SCOM 2012 R2? I can’t seem to find any documentation around that scenario, and it seems as though the gateways don’t have a clean way to import/house the requisite certs. Thanks for any advice you may be able to provide!

  3. Mark Van Doren

    Right now we have 5 Management servers, 3 of which are dedicated to our unix/linux resource pool. We monitor ~1100 windows hosts, and ~300 linux hosts. The benefit of using the gateways for linux monitoring would be that it would solve a lot of firewall headaches for us due to their positioning in our infrastructure. If you have any documentation around creating a resource pool comprised of only gateway servers, and how they are configured in terms of importing the necessary certs/communicating with the management hosts for linux, that would be extremely helpful. Or, do you not need certs on the gateway hosts that manage linux, as the requisite certs are already installed on the management hosts that they would be sending their info through? We are running 2012 R2 with UR14.

  4. Alex

    It would be great if Microsoft would let Kevin put documentation up, because the official documentation was not helpful.

    For anyone authoring MPs that leverage resource groups for Gateway pools: the key, as Kevin called out, is NOT to use the default observer for Gateways.

  5. Thug User

    Hi Kevin. I want to implement SCOM 2019 across 2 datacenters (primary/secondary), so I will have management servers (MS) and gateway servers (GS) in both DCs. It will have a SQL Always On DB cluster. Can I put MS and GS from both DCs in the same resource pool? I also want agents to automatically fail over if one datacenter fails. Please, how can I achieve this?

    • Kevin Holman

      I do not recommend this. In general, I do not recommend putting management servers in more than one datacenter. SCOM management servers need to be VERY low latency to the database, and to each other. The more agents you have the more important this becomes. Management servers should not spread a resource pool across datacenters either, in general. The exception is when you have a DR datacenter a VERY short distance away, and the network latency is VERY, VERY low (always under 5ms). Even then you will see degraded performance as the MS in the remote datacenter has higher latency for SQL locks/writes/stored procedures.

      If the purpose is DR, I recommend using a replication technology to replicate your Management Servers to the DR datacenter, and then boot them up in the case of a disaster.

  6. Thug Usher

    Thanks Kevin. This is for two “primary” datacenters. Please can you advise on the best setup I can use using gateway servers and management servers. This is for 600 servers and 600 network devices

    • Kevin Holman

      Why do you feel you “need” anything in the “remote” datacenter? If the bandwidth is sufficient to remote DC, I wouldn’t put anything in there and manage all resources from the primary DC. Especially so for Windows Computers…. the difference between managing them remotely and managing them through a local Gateway is very, very small. Everything has to end up back in the database.

      If you have significant resources in the remote DC, AND you are concerned about SCOM consuming bandwidth, you potentially could save some bandwidth for monitoring the remote network devices by placing three gateway servers in the remote DC, and creating a resource pool for those network devices (removing the default observer). But honestly, I’d only do this if your network monitoring is consuming significant bandwidth.

      People automatically assume “I have a datacenter, there should be some monitoring infrastructure there” but many times, this thought process just increases complexity, lowers availability, and harms performance and stability.
