Monitoring UNIX/Linux with SCOM 2022

 

image

High Level Overview:

  • Import Management Packs
  • Create a resource pool for monitoring Unix/Linux servers
  • Configure the Xplat certificates (export/import) for each management server in the pool.
  • Create and Configure Run As accounts for Unix/Linux.
  • Configure the sudoers file
  • Discover and deploy the agents

 
Import Management Packs:

The core Unix/Linux libraries are already imported when you install SCOM 2022, but not the detailed MPs for each Linux OS version.  These are on the installation media, in the \ManagementPacks directory.  Import ONLY the specific ones for the Unix or Linux operating systems that you plan to monitor.

Additionally, there is a download location for Unix/Linux MPs which have been *UPDATED*.  However, the updated download does not contain all Unix/Linux packs, so you should always START by importing the relevant management packs from the SCOM 2022 media.

image

Here is an example of the MPs I will import: all the important core libraries, plus Red Hat and Universal Linux (CentOS, Debian, Oracle, Ubuntu, etc.)

image

Once the above are imported – THEN we can update to the most current versions available for those MPs that have updates:

The *LATEST* versions of these MPs (and the ones you should be using) are available for download at:

https://www.microsoft.com/en-us/download/details.aspx?id=104213

Download those, and then import any relevant updated libraries.  The following screenshot shows version 10.22.1039.0 which was from the SCOM 2022 UR1 timeframe as an example:

image

***NOTE: You will need to restart the Microsoft Monitoring Agent service on all Management Servers that will monitor Linux systems, after importing these management packs, before continuing. This restart is required to allow each MS to deploy the agent files locally.  You can verify you have the correct Linux agent files deployed here:

image

 

Create a resource pool for monitoring UNIX/Linux servers

This pool will be associated with management servers that are dedicated to monitoring Unix/Linux systems in larger environments, or it may include existing management servers that also manage Windows agents or Gateways in smaller environments.  Regardless, it is a best practice to create a new resource pool for this purpose, as it will ease administration and make it simpler to scale out in the future.

Under Administration, find Resource Pools in the console:

image

Let’s create a new one by selecting “Create Resource Pool” from the task pane on the right, and call it “UNIX/Linux Monitoring Resource Pool”

Click Add and then click Search to display all management servers.  Select the Management servers that you want to perform Unix and Linux Monitoring.  If you only have 1 MS, this will be easy.  For high availability – you need at least two management servers in the pool.

Add your management servers and create the pool.  In the actions pane – select “View Resource Pool Members” to verify membership.

image

 

Configure the Xplat certificates (export/import) for each management server in the pool

Operations Manager uses certificates to authenticate access to the computers it is managing. When the Discovery Wizard deploys an agent, it retrieves the certificate from the agent, signs the certificate, deploys the certificate back to the agent, and then restarts the agent.

To configure for high availability, each management server in the resource pool must have all the root certificates that are used to sign the certificates that are deployed to the agents on the UNIX and Linux computers. Otherwise, if a management server becomes unavailable, the other management servers would not be able to trust the certificates that were signed by the server that failed.

We provide a tool to handle the certificates, named scxcertconfig.exe.  Essentially, you must log on to EACH management server that will be part of a Unix/Linux monitoring resource pool, and export its SCX (cross-platform) certificate to a file share.  Then import each other's certificates so they are trusted.

If you only have a SINGLE management server, or a single management server in your pool, you can skip this step, then perform it later if you ever add Management Servers to the Unix/Linux Monitoring resource pool.

In this example – I have two management servers in my Unix/Linux resource pool, OM1 and OM2.  Open a command prompt on each MS, and export the cert:

On OM1:

C:\Program Files\Microsoft System Center\Operations Manager\Server>scxcertconfig.exe -export \\servername\sharename\OM1.cer

On OM2:

C:\Program Files\Microsoft System Center\Operations Manager\Server>scxcertconfig.exe -export \\servername\sharename\OM2.cer

Once all certs are exported, you must IMPORT the other management server’s certificate:

On OM1:

C:\Program Files\Microsoft System Center\Operations Manager\Server>scxcertconfig.exe -import \\servername\sharename\OM2.cer

On OM2:

C:\Program Files\Microsoft System Center\Operations Manager\Server>scxcertconfig.exe -import \\servername\sharename\OM1.cer

If you fail to perform the above steps – you will get errors when running the Linux agent deployment wizard later.
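
With more than two management servers in the pool, the number of imports grows quickly, because every server has to import the certificate of every other server.  Here is a minimal sketch, in shell, that only prints the commands to run on each server so you can use it as a checklist.  The function name and the third server OM3 are my own for illustration, and the share path is the same placeholder used above.  The printed commands still have to be run from the Server directory on each Windows management server.

```shell
#!/bin/sh
# Print the scxcertconfig.exe commands needed so that every management
# server in the pool trusts every other server's SCX certificate.
# Arguments: UNC share path, then the management server names.
# (Hypothetical helper: it only prints text, it does not touch any server.)
print_cert_exchange() {
    share="$1"
    shift
    # Step 1: every server exports its own certificate to the share.
    for ms in "$@"; do
        printf 'On %s: scxcertconfig.exe -export %s\\%s.cer\n' "$ms" "$share" "$ms"
    done
    # Step 2: every server imports the certificate of every OTHER server.
    for ms in "$@"; do
        for other in "$@"; do
            [ "$ms" = "$other" ] && continue
            printf 'On %s: scxcertconfig.exe -import %s\\%s.cer\n' "$ms" "$share" "$other"
        done
    done
}

# Example with a three-server pool (OM3 is an assumed extra server).
print_cert_exchange '\\servername\sharename' OM1 OM2 OM3
```

For three servers this prints three export lines and six import lines, which shows why it is worth scripting or at least tracking when the pool grows.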

You can verify these certificates in the MMC, in the Certificates snap-in, by viewing them under Trusted Root Certification Authorities:

image

 

Create and Configure Run As accounts for Unix/Linux

Next up we need to create our run-as accounts for Linux monitoring.   This is documented here: (Link)

We will need Three RunAs accounts (but only two credentials)

Credentials in my example:

  • scommaint – an account used for agent maintenance and SSH to install, sign certificates, and uninstall agents.  Uses sudo elevation.
  • scommon – an account used for monitoring.  Uses sudo elevation on some workflows, and not on others.
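
Both credentials have to exist as local (or directory) accounts on every UNIX/Linux server before the Run As accounts can use them.  If you are creating local accounts, something like the following sketch could be run as root on each server.  The helper name ensure_user is my own, and the useradd options are an assumption that may need adjusting for your distribution and your account standards.

```shell
#!/bin/sh
# Sketch: make sure a local account exists, creating it only when missing.
# Run as root on each UNIX/Linux server; useradd flags can differ per distro.
ensure_user() {
    user="$1"
    if id "$user" >/dev/null 2>&1; then
        echo "exists: $user"
    else
        # -m creates a home directory for the new account.
        useradd -m "$user" && echo "created: $user"
    fi
}

# Harmless demonstration: root is already present, so nothing is created.
ensure_user root

# On a real server you would run these as root, then set passwords or SSH keys:
#   ensure_user scommaint
#   ensure_user scommon
```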

RunAs accounts:

  • UNIX/Linux Agent Maintenance Account
  • UNIX/Linux Privileged Monitoring Account
  • UNIX/Linux Monitoring Account

We need to select “UNIX/Linux Accounts” under administration, then “Create Run As Account” from the task pane.  This kicks off a special wizard for creating these accounts.

image

First – we will create the Agent Maintenance Account.

This account is used for SSH, to be able to deploy, install, uninstall, upgrade, sign certificates, all dealing with the agent on the UNIX/Linux system, and will use elevation.

Select “Create Run As Account” and choose “Agent maintenance account”.    Give the account a name such as “UNIX/Linux Agent Maintenance Account”.

From here you can choose to use an SSH key, or a username and password credential only.  You also can choose to leverage a privileged account, or a regular account that uses sudo.  I will be choosing the most typical – which is an account that will leverage sudo.  My account name is “scommaint”.

 

Next – depending on your OS and elevation standards – choose to use sudo elevation:

image

image

image

Always choose More Secure and click Create.  

Now – since we chose More Secure – we must configure the distribution of the Run As account.  Find your “UNIX/Linux Agent Maintenance Account” you just created, and open the properties.  On the Distribution Security screen, click Add, then select “Search by resource pool name” and click search.  Find your Unix/Linux monitoring resource pool, highlight it, and click Add, then OK.  This will distribute this account credential to all Management servers in our pool:

image

Click Save.

 

Next, let’s create the Privileged Monitoring account, which will use elevation when needed.  Give the monitoring account a display name such as UNIX/Linux Privileged Monitoring Account, and click Next.

On the next screen, type in the credentials that you want to use for monitoring the UNIX/Linux system(s).  These accounts must exist on each UNIX/Linux system and have the required permissions granted.  My account name is “scommon”

image

On the above screen – select to elevate this account with sudo.  This is a privileged account that we will associate with a Privileged RunAs profile for workflows that require sudo elevation. 

On the next screen, always choose “more secure” and click “Create”.

Now – since we chose More Secure – we must configure the distribution of the Run As account.  Find your “UNIX/Linux Privileged Monitoring Account” you just created, and open the properties.  On the Distribution Security screen, click Add, then select “Search by resource pool name” and click search.  Find your Unix/Linux monitoring resource pool, highlight it, and click Add, then OK.  This will distribute this account credential to all Management servers in our pool:

 image

Click Save.

 

Last, let’s create the Monitoring account (that does not use sudo).  Give the monitoring account a display name such as UNIX/Linux Monitoring Account, and click Next.

On the next screen, type in the credentials that you want to use for monitoring the UNIX/Linux system(s).  These accounts must exist on each UNIX/Linux system and have the required permissions granted.  My account name is “scommon”

image

On the above screen – select “Do not use elevation”.  This is an unprivileged account that we will associate with a RunAs profile for workflows that do not require sudo elevation. 

On the next screen, always choose “more secure” and click “Create”.

Now – since we chose More Secure – we must configure the distribution of the Run As account.  Find your “UNIX/Linux Monitoring Account” you just created, and open the properties.  On the Distribution Security screen, click Add, then select “Search by resource pool name” and click search.  Find your Unix/Linux monitoring resource pool, highlight it, and click Add, then OK.  This will distribute this account credential to all Management servers in our pool:

image

Here is what it will look like when complete:

image

 

Now that our accounts are created, we must configure the Run As profiles.

There are three profiles for Unix/Linux accounts:

image

The Unix/Linux Agent Maintenance Account profile is strictly for agent installs, signing, updates, uninstalls, anything that requires SSH.  This will always be associated with a privileged (or sudo elevated) account that has access via SSH, and was created using the Run As account wizard above.

The other two Profiles are used for Monitoring workflows.  These are:

Unix/Linux Privileged account

Unix/Linux Action Account

The Privileged Account Profile will always be associated with a Run As account like we created above, that is Privileged OR an unprivileged account that has been configured with elevation via sudo.  This is what any workflows that typically require elevated rights will execute as.

The Action account is what all your basic monitoring workflows will run as.  This will generally be associated with a Run As account, like we created above, but would be used with a non-privileged user account on the Linux systems, and won’t request sudo elevation.

I will start with the Unix/Linux Agent Maintenance Account profile.  Right click it – choose properties, and on the Run As Accounts screen, click Add, then select our “Unix/Linux Agent Maintenance Account”.  Leave the default of “All Targeted Objects” and click OK, then Save.

Repeat this same process for the Unix/Linux Privileged Account profile, and associate it with your “UNIX/Linux Privileged Monitoring Account”.

Repeat this same process for the Unix/Linux Action Account profile, but use the “Unix/Linux Monitoring Account”.

 

Configure the sudoers file

I need to modify the sudoers file on each UNIX/Linux server, to grant the granular permissions.

NOTE:  The sudoers configuration changes with each version of SCOM. 
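
One place where the version shows up is in the install and upgrade rules of the SCOM 2022 sample in this section: they match the agent kit file name with wildcards, so a kit whose name falls outside the pattern would not be covered by the NOPASSWD rule.  Here is a small sketch that mirrors those two wildcard patterns as shell patterns, so you can test a kit file name against them.  This is an approximation (sudo does its own matching, and the "\:" escapes from sudoers become plain ":" here), and the file names in the demonstration are examples I made up to exercise the pattern, not a list of real kits.

```shell
#!/bin/sh
# Return success when a kit file name fits the wildcard used in the
# SCOM 2022 sample sudoers install/upgrade rules (approximation only).
kit_allowed() {
    case "$1" in
        # Two-digit patch level, e.g. 1.6.10-2
        scx-1.[5-9].[0-9][0-9]-[0-9].universal[[:alpha:]].[[:digit:]].x[6-8][4-6].sh) return 0 ;;
        # One-digit patch level, e.g. 1.6.8-1
        scx-1.[5-9].[0-9]-[0-9].universal[[:alpha:]].[[:digit:]].x[6-8][4-6].sh) return 0 ;;
        *) return 1 ;;
    esac
}

# Demonstration with made-up file names.
for kit in scx-1.6.10-2.universalr.1.x64.sh scx-1.10.0-1.universalr.1.x64.sh; do
    if kit_allowed "$kit"; then
        echo "covered by the sample rules: $kit"
    else
        echo "NOT covered by the sample rules: $kit"
    fi
done
```

If a future agent kit were named outside this pattern, the sudoers rules would need to be updated to match it, which is one more reason to take the sudoers sample that corresponds to your SCOM version.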

Here is a sample sudoers file for Universal operating systems, in SCOM 2022, taken from here: https://social.technet.microsoft.com/wiki/contents/articles/7375.scom-configuring-sudo-elevation-for-unix-and-linux-monitoring.aspx

#----------------------------------------------------------------------------------------
#Example user configuration for SCOM agent
#Example assumes users named: scommaint & scommon
#Replace usernames & corresponding /tmp/scx-<username> specification for your environment

#General requirements
Defaults:scommaint !requiretty

#Agent maintenance
##Certificate signing
scommaint ALL=(root) NOPASSWD: /bin/sh -c cp /tmp/scx-scommaint/scx.pem /etc/opt/microsoft/scx/ssl/scx.pem; rm -rf /tmp/scx-scommaint; /opt/microsoft/scx/bin/tools/scxadmin -restart
scommaint ALL=(root) NOPASSWD: /bin/sh -c cat /etc/opt/microsoft/scx/ssl/scx.pem
scommaint ALL=(root) NOPASSWD: /bin/sh -c if test -f /opt/microsoft/omsagent/bin/service_control; then cp /tmp/scx-scommaint/omsadmin.conf /etc/opt/microsoft/omsagent/scom/conf/omsadmin.conf; /opt/microsoft/omsagent/bin/service_control restart scom; fi

##Install or upgrade
scommaint ALL=(root) NOPASSWD: /bin/sh -c sh /tmp/scx-scommaint/scx-1.[5-9].[0-9][0-9]-[0-9].universal[[\:alpha\:]].[[\:digit\:]].x[6-8][4-6].sh --install --enable-opsmgr; EC=$?; cd /tmp; rm -rf /tmp/scx-scommaint; exit $EC
scommaint ALL=(root) NOPASSWD: /bin/sh -c sh /tmp/scx-scommaint/scx-1.[5-9].[0-9]-[0-9].universal[[\:alpha\:]].[[\:digit\:]].x[6-8][4-6].sh --install --enable-opsmgr; EC=$?; cd /tmp; rm -rf /tmp/scx-scommaint; exit $EC
scommaint ALL=(root) NOPASSWD: /bin/sh -c sh /tmp/scx-scommaint/scx-1.[5-9].[0-9][0-9]-[0-9].universal[[\:alpha\:]].[[\:digit\:]].x[6-8][4-6].sh --upgrade --enable-opsmgr; EC=$?; cd /tmp; rm -rf /tmp/scx-scommaint; exit $EC
scommaint ALL=(root) NOPASSWD: /bin/sh -c sh /tmp/scx-scommaint/scx-1.[5-9].[0-9]-[0-9].universal[[\:alpha\:]].[[\:digit\:]].x[6-8][4-6].sh --upgrade --enable-opsmgr; EC=$?; cd /tmp; rm -rf /tmp/scx-scommaint; exit $EC

##Uninstall
scommaint ALL=(root) NOPASSWD: /bin/sh -c /opt/microsoft/scx/bin/uninstall
scommaint ALL=(root) NOPASSWD: /bin/sh -c if test -f /opt/microsoft/omsagent/bin/omsadmin.sh; then if test "$(/opt/microsoft/omsagent/bin/omsadmin.sh -l | grep scom | wc -l)" \= "1" && test "$(/opt/microsoft/omsagent/bin/omsadmin.sh -l | wc -l)" \= "1" || test "$(/opt/microsoft/omsagent/bin/omsadmin.sh -l)" \= "No Workspace"; then /opt/microsoft/omsagent/bin/uninstall; else /opt/microsoft/omsagent/bin/omsadmin.sh -x scom; fi; else /opt/microsoft/scx/bin/uninstall; fi

##Log file monitoring
scommon ALL=(root) NOPASSWD: /opt/microsoft/scx/bin/scxlogfilereader -p

###Examples
#Custom shell command monitoring example – replace <shell command> with the correct command string
#scommon ALL=(root) NOPASSWD: /bin/sh -c echo error

##For ubuntu18
#scommon ALL=(root) NOPASSWD: /bin/bash -c echo error

#Daemon diagnostic and restart recovery tasks example (using cron)
#scommon ALL=(root) NOPASSWD: /bin/sh -c ps -ef | grep cron | grep -v grep
#scommon ALL=(root) NOPASSWD: /usr/sbin/cron &

#End user configuration for SCOM agent
#-----------------------------------------------------------------------------------

I will edit my sudoers file on my Linux servers and insert this configuration.  You can use vi, visudo, or my personal favorite since I am a Windows guy – download and install WinSCP, which provides a GUI editor for the files and helps anytime you need to transfer files between Windows and UNIX/Linux using SSH.  Generally we want to place this configuration in the appropriate section of the sudoers file – not at the end.  There are items at the end of the file that need to stay there.  I put this right after the existing “Defaults” section in the existing sudoers configuration, and save it.
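
After editing, it is worth a quick sanity check that the rules actually landed in the file for both accounts, since a missing NOPASSWD rule tends to surface later as a failed agent install.  The sketch below is one way to do that with plain grep.  The function name is my own, and it only counts active rule lines for a user; it does not validate sudoers syntax.

```shell
#!/bin/sh
# Count the active "ALL=(root) NOPASSWD:" rules for a user in a sudoers-style file.
# Usage: count_nopasswd_rules <file> <user>
count_nopasswd_rules() {
    file="$1"
    user="$2"
    # The pattern is anchored on the user name at the start of the line, so
    # commented-out example rules (lines starting with '#') are not counted.
    # grep -c prints 0 and exits non-zero when nothing matches, hence "|| true".
    grep -c "^${user}[[:space:]][[:space:]]*ALL=(root)[[:space:]][[:space:]]*NOPASSWD:" "$file" || true
}

# Typical use on the Linux server (as root, since /etc/sudoers is not world readable):
#   count_nopasswd_rules /etc/sudoers scommaint
#   count_nopasswd_rules /etc/sudoers scommon
```

For the syntax itself, running visudo -cf /etc/sudoers as root checks the file without opening an editor, which is handy if you edited it with something other than visudo.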

 

Discover and deploy the agents

Run the discovery wizard.

image

Click “Add”:

image

Here you will type in the FQDN of the Linux/Unix agent, its SSH port, and then choose All Computers in the discovery type.  ((We have another option for discovery type – if you were manually installing the Unix/Linux agent (which is really just a simple provider) and then using a signed certificate to authenticate))

Check the box next to “Use Run As Credentials”.  This will leverage our existing Agent Maintenance account for the discovery and deployment.

image

Click “Save”.  On the next screen – select a resource pool.  We will choose the resource pool that we already created.  Click “Discover”.

There are MANY reasons discovery might fail.  Look at the results and see if they are documented:  SCOM Wiki

image

Check the box next to your discovered system – and click “Manage” to deploy the agent.

image

Oops!

image

That failed.  I click details and see:

Agent verification failed. Error detail: The server certificate on the destination computer (ubuntu20.opsmgr.net:1270) has the following errors:    
The SSL certificate contains a common name (CN) that does not match the hostname.
It is possible that:
   1. The destination certificate is signed by another certificate authority not trusted by the management server.
   2. The destination has an invalid certificate, e.g., its common name (CN) does not match the fully qualified domain name (FQDN) used for the connection.  The FQDN used for the connection is: ubuntu20.opsmgr.net.
   3. The servers in the resource pool have not been configured to trust certificates signed by other servers in the pool.

Essentially – I need to regenerate the certificate using a command to include the hostname, as that is how SCOM will interact with it:

/opt/microsoft/scx/bin/tools/scxsslconfig -f -h <hostname> -d <domain.name>

From:  Certificate Issues | Microsoft Learn
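
To avoid typos when splitting the name, here is a minimal sketch that builds that command line from the FQDN SCOM uses for the connection.  The function name is my own, and it only prints the command so you can review it before running it as root on the Linux server.

```shell
#!/bin/sh
# Build the scxsslconfig command line from the FQDN used for discovery.
# Prints the command instead of running it, so it can be reviewed first.
build_sslconfig_cmd() {
    fqdn="$1"
    case "$fqdn" in
        *.*) ;;                                   # needs at least one dot
        *) echo "not an FQDN: $fqdn" >&2; return 1 ;;
    esac
    host="${fqdn%%.*}"                            # text before the first dot
    domain="${fqdn#*.}"                           # text after the first dot
    printf '/opt/microsoft/scx/bin/tools/scxsslconfig -f -h %s -d %s\n' "$host" "$domain"
}

# Example using the server name from the error message above.
build_sslconfig_cmd ubuntu20.opsmgr.net
```

After regenerating the certificate, I would restart the agent with /opt/microsoft/scx/bin/tools/scxadmin -restart (the same path that appears in the sudoers sample) and confirm the new subject with openssl x509 -noout -subject -in /etc/opt/microsoft/scx/ssl/scx.pem before retrying the wizard.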

image

This will take some time to complete, as the agent is checked for the correct FQDN and certificate, the management servers are inspected to ensure they all have trusted SCX certificates (that we exported/imported above) and the connection is made over SSH, the package is copied down, installed, and the final certificate signing occurs.  If all of these checks pass, we get a success!

There are several things that can fail at this point.  See the troubleshooting section at the end of this article.

 

Monitoring Linux servers:

Assuming we got all the way to this point with a successful discovery and agent installation, we need to verify that monitoring is working.  After an agent is deployed, the Run As accounts will start being used to run discoveries, and start monitoring.  Once enough time has passed for these, check in the Administration pane, under Unix/Linux Computers, and verify that the systems are not listed as “Unknown” but discovered as a specific version of the OS:

Here is what we expect after a few minutes:

image

Next – go to the Monitoring pane – and select the “Unix/Linux Computers” view at the top.  Check that your systems are present and that there is a green healthy check mark next to them:

image

Next – expand the Unix/Linux Computers folder in the left tree (near the bottom) and make sure we have discovered the individual objects, like Linux Server State, Logical Disk State, and Network Adapter state:

image

Run Health explorer on one of the discovered Linux Server State objects.  Remove the filter at the top to see all the monitors for the system:

image

Close health explorer.

Select the Operating System Performance view.   Review the performance counters we collect out of the box for each monitored OS.

image

Out of the box – we discover and apply a default monitoring template to the following objects:

  • Operating System
  • Logical disk
  • Network Adapters

Optionally, you can enable discoveries for:

  • Individual Logical Processors
  • Physical Disks

I don’t recommend enabling additional discoveries unless you are sure that your monitoring requirements cannot be met without discovering these additional objects, as they will reduce the scalability of your environment.

Out of the box – for an OS like RedHat – here is a list of the monitors in place, and the object they target:

image

There are also 50 or more rules enabled out of the box.  46 are performance collection rules for reporting, and 4 rules are event based, dealing with security.  Two are informational letting you know whenever a direct login is made using root credentials via SSH, and when su elevation occurs by a user session.  The other two deal with failed attempts for SSH or SU.

To get more out of your monitoring – you might have other services, processes, or log files that you need to monitor.  For that, we provide Authoring Templates with wizards to help you add additional monitoring, in the Authoring pane of the console under Management Pack templates:

image

image

image

In the reporting pane – we also offer a large number of reports you can leverage, or you can always create your own using our generic report templates, or custom ones designed in Visual Studio for SQL reporting services.

image

As you can see, it is a fairly well rounded solution to include Unix and Linux monitoring into a single pane of glass for your other systems, from the Hardware, to the Operating System, to the network layer, to the applications.

Partners and 3rd party vendors also supply additional management packs which extend our Unix and Linux monitoring, to discover and provide detailed monitoring on non-Microsoft applications that run on these Unix and Linux systems.

 
 
Troubleshooting:

image

The majority of troubleshooting comes in the form of failed discovery/agent deployments.

Microsoft has written a wiki on this topic, which covers the majority of these, and how to resolve:

http://social.technet.microsoft.com/wiki/contents/articles/4966.aspx

And here:

Troubleshooting UNIX and Linux Monitoring | Microsoft Learn

21 Comments

  1. George S.

    Hi Kevin, thank you for the article, great information as always 🙂

    It is a shame that support was dropped for AIX, I have a question for this: are SCOM 2019 AIX agents still able to report to SCOM 2022 (albeit unsupported), or would SCOM 2022 not recognise an AIX agent reporting back at all?

    -George S.

  2. Hamza

    Hello Kevin,
    Would it be possible to monitor server Linux RedHat 8 having PowerPc architecture with this version of Scom ?
    Thanks

  3. Reuven singer

    Hi, you should modify the paths since this applies to SCOM 2022 and not SCOM 2016…

    C:\Program Files\Microsoft System Center 2016\Operations Manager\Server>scxcertconfig.exe -export \\servername\sharename\OM2.cer

    By me this was installed by default in the following path: C:\Program Files\Microsoft System Center\Operations Manager\Server

    • Kevin Holman

      You can use a single monitoring account with different profiles for elevate and not elevate. Or, some prefer to use separate monitoring accounts for this.

  4. Dimitri

    I followed all your instructions, but I always get this error:

    Exception message: Unable to create certificate context ; {ASN1 bad tag value met. }

    I tried to:

    Renew certificates
    Delete and renew certificates
    Cross check the permissions on the profiles

    Nothing worked. I also find very little documentation on the web about this specific error. Any hint ?

    Thanks 🙂

  5. Dimitri

    I forgot: when I start the process to add the server, it does not say “Install agent and manage”, but instead it says “Sign Certificate and Manage”.
    Do I miss something here ?

    • Kevin Holman

      This means someone already installed a SCOM agent on this server, and it will try to overwrite the certificate. Which probably won’t work. You likely need to remove the agent and any certificate files left behind from previous installations, and attempt a clean install.

      • Dimitri

        Omg. I feel very stupid XD
        I will try to understand who installed it and how to remove. By the way, I am using SCOM 2019, but I think the instructions are the same.

        Thanks for help

  6. Dimitri

    I managed to remove the agent installed probably by mistake when we started working on this area (we are not expert with Linux).
    Now it lets me install the agent, but it fails with this odd message:

    Failed to install kit. Exit code: 1
    Standard Output: Sudo path: /usr/bin/

    Standard Error:
    We trust you have received the usual lecture from the local System
    Administrator. It usually boils down to these three things:

    #1) Respect the privacy of others.
    #2) Think before you type.
    #3) With great power comes great responsibility.

    sudo: a terminal is required to read the password; either use the -S option to read from standard input or configure an askpass helper

    Exception Message:

    Nothing else is written.
    Now I am confused….

  7. Jon Black

    Hello,

    Just to confirm.

    Looking here: https://learn.microsoft.com/en-us/system-center/scom/plan-supported-crossplat-os?view=sc-om-2022#universal-linux-debian-package-1

    With the hotfix: https://support.microsoft.com/en-us/topic/system-center-operations-manager-2022-now-has-openssl3-0-integration-kb-5024286-331bd221-10f9-42d5-bc06-775eaabe3081

    RHEL 9 should be supported. I seem to be having an issue in the earliest part of the discovery process, getting this:
    Exception Message:An exception (-1073479162) caused the SSH command to fail – Server unexpectedly closed network connection

    I’ve checked many things and imported RHEL 8 systems without issue. I can connect via SSH from the SCOM MGMT server using the credentials I am using in the discovery, so have confirmed connectivity from that host.

    I’ve confirmed the steps above.

    Any suggestions?

    Thanks,

    Jon

  8. Topher

    This is very helpful, thank you! If a Linux service is detected to be stopped, can SCOM restart the service automatically?

    • Kevin Holman

      Absolutely. You’d need to write a recovery action for this. It’s likely not really easy to do in the SCOM UI, however.
