Microsoft first built Unix and Linux monitoring directly into OpsMgr with OpsMgr 2007 R2, which shipped in 2009. OpsMgr 2012 brought some significant updates to this. Primarily these updates are around:
- Highly available Monitoring via Resource Pools
- Sudo elevation support for using a low priv account with elevation rights for specific workflows.
- ssh key authentication
- New wizards for discovery, agent upgrade, and agent uninstallation
- Additional PowerShell cmdlets
- Performance and scalability improvements
- New monitoring templates for common monitoring tasks
Now – with SCOM 2016 – we have added:
- Support for additional releases of operating systems: (Link)
- Increased scalability (2x) with asynchronous monitoring workflows
- Easier agent deployment using existing RunAs account credentials
- New Management Packs and Providers for LAMP stack
- New UNIX/Linux Script templates to ease authoring (Link)
- Discovery filters for file systems (Link)
I am going to do a step by step guide for getting this deployed with SCOM 2016. As always – a big thanks to Tim Helton of Microsoft for assisting me with all things Unix and Linux.
High Level Overview:
- Import Management Packs
- Create a resource pool for monitoring Unix/Linux servers
- Configure the Xplat certificates (export/import) for each management server in the pool.
- Create and Configure Run As accounts for Unix/Linux.
- Discover and deploy the agents
Import Management Packs:
The core Unix/Linux libraries are already imported when you install OpsMgr 2016, but not the detailed MP’s for each OS version. These are on the installation media, in the \ManagementPacks directory. Import the specific ones for the Unix or Linux Operating systems that you plan to monitor.
Additionally, there is a download location for Unix/Linux MP’s which have been *UPDATED*, however, the updated MP’s do not contain all Unix/Linux packs, so you should always START by importing the relevant management packs from the SCOM 2016 Media.
Here is an example of the MP’s I will import, which is all the important core libraries, and includes Red Hat, SUSE, and Universal Linux (CentOS, Debian, Oracle, Ubuntu)
Once these above are imported – THEN we can update to the most current ones available for those MP’s that have updates:
The *LATEST* version of these MP’s (and the ones you should be using) are located for download at:
https://www.microsoft.com/en-us/download/details.aspx?id=29696
Download those, and then import any relevant updated libraries. The following screenshot shows version 7.6.1072.0 which was from the SCOM 2016 UR2 timeframe.
***NOTE: You will need to restart the Microsoft Monitoring Agent service on all Management Servers that will monitor Linux systems, after importing these management packs, before continuing. This restart is required to allow each MS to deploy the agent files locally.
Create a resource pool for monitoring Unix/Linux servers
The FIRST step is to create a Unix/Linux Monitoring Resource pool. In larger environments, this pool will contain management servers that are dedicated to monitoring Unix/Linux systems. In smaller environments, it may include existing management servers that also manage Windows agents or Gateways. Either way, it is a best practice to create a new resource pool for this purpose, because it eases administration and makes it simpler to scale out in the future.
Under Administration, find Resource Pools in the console:
OpsMgr ships 3 resource pools by default:
Let’s create a new one by selecting “Create Resource Pool” from the task pane on the right, and call it “UNIX/Linux Monitoring Resource Pool”
Click Add and then click Search to display all management servers. Select the Management servers that you want to perform Unix and Linux Monitoring. If you only have 1 MS, this will be easy. For high availability – you need at least two management servers in the pool.
Add your management servers and create the pool. In the actions pane – select “View Resource Pool Members” to verify membership.
Configure the Xplat certificates (export/import) for each management server in the pool
Operations Manager uses certificates to authenticate access to the computers it is managing. When the Discovery Wizard deploys an agent, it retrieves the certificate from the agent, signs the certificate, deploys the certificate back to the agent, and then restarts the agent.
To configure for high availability, each management server in the resource pool must have all the root certificates that are used to sign the certificates that are deployed to the agents on the UNIX and Linux computers. Otherwise, if a management server becomes unavailable, the other management servers would not be able to trust the certificates that were signed by the server that failed.
We provide a tool to handle the certificates, named scxcertconfig.exe. Essentially, you must log on to EACH management server that will be part of a Unix/Linux monitoring resource pool and export its SCX (cross plat) certificate to a file share. Then import each other's certificates so they are trusted.
If you only have a SINGLE management server, or a single management server in your pool, you can skip this step, then perform it later if you ever add Management Servers to the Unix/Linux Monitoring resource pool.
In this example – I have two management servers in my Unix/Linux resource pool, MS1 and MS2. Open a command prompt on each MS, and export the cert:
On MS1:
C:\Program Files\Microsoft System Center 2016\Operations Manager\Server>scxcertconfig.exe -export \\servername\sharename\MS1.cer
On MS2:
C:\Program Files\Microsoft System Center 2016\Operations Manager\Server>scxcertconfig.exe -export \\servername\sharename\MS2.cer
Once all certs are exported, you must IMPORT the other management server’s certificate:
On MS1:
C:\Program Files\Microsoft System Center 2016\Operations Manager\Server>scxcertconfig.exe -import \\servername\sharename\MS2.cer
On MS2:
C:\Program Files\Microsoft System Center 2016\Operations Manager\Server>scxcertconfig.exe -import \\servername\sharename\MS1.cer
If you fail to perform the above steps – you will get errors when running the Linux agent deployment wizard later.
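With more than two management servers in the pool, the number of imports grows quickly, because every server has to import the certificate of every other server. Here is a minimal Python sketch that only prints the scxcertconfig.exe commands to run; it does not execute anything. The function name scx_cert_plan, the third server MS3, and the share path are placeholders for illustration, not part of the product.

```python
from typing import List, Tuple

Step = Tuple[str, str]  # (management server to run on, command line)


def scx_cert_plan(servers: List[str], share: str) -> Tuple[List[Step], List[Step]]:
    """Build the export phase and the import phase for a resource pool."""
    # Phase 1: every server exports its own SCX certificate to the share.
    exports = [
        (server, f"scxcertconfig.exe -export {share}\\{server}.cer")
        for server in servers
    ]
    # Phase 2: every server imports the certificate of every OTHER server.
    imports = [
        (server, f"scxcertconfig.exe -import {share}\\{other}.cer")
        for server in servers
        for other in servers
        if other != server
    ]
    return exports, imports


if __name__ == "__main__":
    exports, imports = scx_cert_plan(["MS1", "MS2", "MS3"], r"\\servername\sharename")
    print("Phase 1, export:")
    for server, command in exports:
        print(f"  On {server}: {command}")
    print("Phase 2, import (only after all exports are done):")
    for server, command in imports:
        print(f"  On {server}: {command}")
```

The two phases are kept separate on purpose: an import can only succeed once the other server's .cer file is already on the share.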
Create and Configure Run As accounts for Unix/Linux
Next up we need to create our run-as accounts for Linux monitoring. This is documented here: (Link)
We need to select “UNIX/Linux Accounts” under administration, then “Create Run As Account” from the task pane. This kicks off a special wizard for creating these accounts.
Let's create the Monitoring account first. Give the monitoring account a display name, and click Next.
On the next screen, type in the credentials that you want to use for monitoring the UNIX/Linux system(s). These accounts must exist on each UNIX/Linux system and have the required permissions granted:
On the above screen, you have two choices. You can use a privileged account for handling monitoring, or you can use an account that is not privileged, but elevated via sudo. I will configure this with the most typical customer scenario, which is to leverage sudo elevation that is specifically granted in the sudoers file. (more on that later)
On the next screen, always choose “more secure” and click “Create”
Now – since we chose More Secure – we must choose the distribution of the Run As account. Find your “UNIX/Linux Monitoring Account” under the UNIX/Linux Accounts screen, and open the properties. On the Distribution Security screen, click Add, then select “Search by resource pool name” and click search. Find your Unix/Linux monitoring resource pool, highlight it, and click Add, then OK. This will distribute this account credential to all Management servers in our pool:
Next up – we will create the Agent Maintenance Account.
This account is used over SSH to deploy, install, uninstall, and upgrade the agent on the UNIX/Linux system, and to sign its certificate.
Give the account a name:
From here you can choose to use a SSH key, or a username and password credential only. You also can choose to leverage a privileged account, or a regular account that uses sudo. I will be choosing the most typical – which is an account that will leverage sudo:
Next – depending on your OS and elevation standards – choose to use SUDO or SU:
On the next screen, always choose “more secure” and click “Create”
Now – since we chose More Secure – we must choose the distribution of the Run As account. Find your “UNIX/Linux Agent Maintenance Account” under the UNIX/Linux Accounts screen, and open the properties. On the Distribution Security screen, click Add, then select “Search by resource pool name” and click search. Find your Unix/Linux monitoring resource pool, highlight it, and click Add, then OK. This will distribute this account credential to all Management servers in our pool:
Next up – we must configure the Run As profiles.
There are three profiles for Unix/Linux accounts:
The agent maintenance account is strictly for agent updates, uninstalls, anything that requires SSH. This will always be associated with a privileged (or sudo elevated) account that has access via SSH, and was created using the Run As account wizard above.
The other two Profiles are used for Monitoring workflows. These are:
Unix/Linux Privileged account
Unix/Linux Action Account
The Privileged Account Profile will always be associated with a Run As account like we created above, that is either privileged OR an unprivileged account that has been configured with elevation via sudo. This is what any workflows that typically require elevated rights will execute as.
The Action account is what all your basic monitoring workflows will run as. This will generally be associated with a Run As account, like we created above, but would be used with a non-privileged user account on the Linux systems, and won't request sudo elevation.
***A note on sudo elevated accounts:
- sudo elevation must be passwordless.
- requiretty must be disabled for the user.
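Both requirements can be spot-checked against the text of a sudoers file before you start a deployment. The following is only a rough Python sketch: the function name sudo_prerequisite_problems is made up for this example, the account names come from the sample sudoers configuration later in this article, and the check deliberately ignores aliases, line continuations and included files, so treat a clean result as a hint and not as proof.

```python
import re
from typing import List


def sudo_prerequisite_problems(sudoers_text: str, user: str) -> List[str]:
    """Return the problems found for `user`; an empty list means none were found."""
    problems: List[str] = []
    lines = [line.strip() for line in sudoers_text.splitlines()]
    # Skip blank lines and comments (this also skips #include directives).
    active = [line for line in lines if line and not line.startswith("#")]

    # requiretty must be switched off, e.g. "Defaults:scxmon !requiretty".
    tty_rule = re.compile(rf"^Defaults:{re.escape(user)}\s+.*!requiretty\b")
    if not any(tty_rule.match(line) for line in active):
        problems.append(f"no 'Defaults:{user} !requiretty' line")

    # Every rule for the user needs NOPASSWD so sudo never prompts.
    user_rule = re.compile(rf"^{re.escape(user)}\s+\S+\s*=")
    for line in active:
        if user_rule.match(line) and "NOPASSWD:" not in line:
            problems.append(f"rule may prompt for a password: {line}")
    return problems


if __name__ == "__main__":
    sample = (
        "Defaults:scxmon !requiretty\n"
        "scxmon ALL=(root) NOPASSWD: /opt/microsoft/scx/bin/scxlogfilereader -p\n"
    )
    print(sudo_prerequisite_problems(sample, "scxmon"))
```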
For my example, I am keeping it very simple. I created two Run As accounts, one for monitoring and one for agent maintenance. I will associate these Run As accounts with the appropriate Run As profiles.
I will start with the Unix/Linux Action Account profile. Right click it – choose properties, and on the Run As Accounts screen, click Add, then select our “UNIX/Linux Monitoring Account”. Leave the default of “All Targeted Objects” and click OK, then save.
Repeat this same process for the Unix/Linux Privileged Account profile, and associate it with your “UNIX/Linux Monitoring Account”.
Repeat this same process for the Unix/Linux Agent Maintenance Account profile, but use the “Unix/Linux Agent Maintenance Account”.
Discover and deploy the agents
Run the discovery wizard.
Click “Add”:
Here you will type in the FQDN of the Linux/Unix agent and its SSH port, and then choose All Computers as the discovery type. (There is another option for discovery type, for when you manually install the Unix/Linux agent, which is really just a simple provider, and then use a signed certificate to authenticate.)
Check the box next to “Use Run As Credentials”. This will leverage our existing Agent Maintenance account for the discovery and deployment.
Click “Save”. On the next screen – select a resource pool. We will choose the resource pool that we already created.
Click Discover, and the results will be displayed:
Check the box next to your discovered system – and click “Manage” to deploy the agent.
DOH!
There are many reasons this could fail. The most common is rights on the UNIX/Linux systems you are trying to manage. In this case, I didn't configure SUDO on the Linux box. Let's discuss that now.
I need to modify the /etc/sudoers file on each UNIX/Linux server, to grant the granular permissions.
NOTE: The sudoers configuration has changed from SCOM 2012 R2 to SCOM 2016. This is because we no longer install each package directly (such as .rpm packages). Now, each agent is included in a .sh file that has logic to determine which packages are applicable, and install only those. Because of this – even if you configured sudoers for SCOM 2012 R2 and previous support, you will need to make some modifications.
Here is a sample sudoers file for all operating systems, in SCOM 2016:
#-----------------------------------------------------------------------------------
#Example user configuration for Operations Manager 2016 agent v1.1
#Example assumes users named: scxmaint & scxmon
#Replace usernames & corresponding /tmp/scx-<username> specification for your environment
##General requirements
#These are any accounts you are using that use SUDO elevation including the Agent Maintenance account and or the monitoring account
Defaults:scxmaint !requiretty
Defaults:scxmon !requiretty
##Agent maintenance
#Agent maintenance for LINUX
#Certificate signing
scxmaint ALL=(root) NOPASSWD: /bin/sh -c cp /tmp/scx-scxmaint/scx.pem /etc/opt/microsoft/scx/ssl/scx.pem; rm -rf /tmp/scx-scxmaint; /opt/microsoft/scx/bin/tools/scxadmin -restart
scxmaint ALL=(root) NOPASSWD: /bin/sh -c cat /etc/opt/microsoft/scx/ssl/scx.pem
#Agent maintenance for UNIX
#Certificate signing
scxmaint ALL=(root) NOPASSWD: /usr/bin/sh -c cp /tmp/scx-scxmaint/scx.pem /etc/opt/microsoft/scx/ssl/scx.pem; rm -rf /tmp/scx-scxmaint; /opt/microsoft/scx/bin/tools/scxadmin -restart
scxmaint ALL=(root) NOPASSWD: /usr/bin/sh -c cat /etc/opt/microsoft/scx/ssl/scx.pem
##Install or upgrade
#AIX
scxmaint ALL=(root) NOPASSWD: /usr/bin/sh -c sh /tmp/scx-scxmaint/scx-1.[5-9].[0-9]-[0-9][0-9][0-9].aix.[[\:digit\:]].ppc.sh --install ; EC=$?; cd /tmp; rm -rf /tmp/scx-scxmaint; exit $EC
scxmaint ALL=(root) NOPASSWD: /usr/bin/sh -c sh /tmp/scx-scxmaint/scx-1.[5-9].[0-9]-[0-9][0-9][0-9].aix.[[\:digit\:]].ppc.sh --upgrade --force ; EC=$?; cd /tmp; rm -rf /tmp/scx-scxmaint; exit $EC
#HPUX
scxmaint ALL=(root) NOPASSWD: /usr/bin/sh -c sh /tmp/scx-scxmaint/scx-1.[5-9].[0-9]-[0-9][0-9][0-9].hpux.11iv3.ia64.sh --install ; EC=$?; cd /tmp; rm -rf /tmp/scx-scxmaint; exit $EC
scxmaint ALL=(root) NOPASSWD: /usr/bin/sh -c sh /tmp/scx-scxmaint/scx-1.[5-9].[0-9]-[0-9][0-9][0-9].hpux.11iv3.ia64.sh --upgrade --force ; EC=$?; cd /tmp; rm -rf /tmp/scx-scxmaint; exit $EC
#RHEL
scxmaint ALL=(root) NOPASSWD: /bin/sh -c sh /tmp/scx-scxmaint/scx-1.[5-9].[0-9]-[0-9][0-9][0-9].rhel.[[\:digit\:]].x[6-8][4-6].sh --install; EC=$?; cd /tmp; rm -rf /tmp/scx-scxmaint; exit $EC
scxmaint ALL=(root) NOPASSWD: /bin/sh -c sh /tmp/scx-scxmaint/scx-1.[5-9].[0-9]-[0-9][0-9][0-9].rhel.[[\:digit\:]].x[6-8][4-6].sh --upgrade --force; EC=$?; cd /tmp; rm -rf /tmp/scx-scxmaint; exit $EC
#RHEL 7.1 PPC
scxmaint ALL=(root) NOPASSWD: /bin/sh -c sh /tmp/scx-scxmaint/scx-1.[5-9].[0-9]-[0-9][0-9][0-9].rhel.[[\:digit\:]].ppc.sh --install; EC=$?; cd /tmp; rm -rf /tmp/scx-scxmaint; exit $EC
scxmaint ALL=(root) NOPASSWD: /bin/sh -c sh /tmp/scx-scxmaint/scx-1.[5-9].[0-9]-[0-9][0-9][0-9].rhel.[[\:digit\:]].ppc.sh --upgrade --force; EC=$?; cd /tmp; rm -rf /tmp/scx-scxmaint; exit $EC
#SLES
scxmaint ALL=(root) NOPASSWD: /bin/sh -c sh /tmp/scx-scxmaint/scx-1.[5-9].[0-9]-[0-9][0-9][0-9].sles.1[[\:digit\:]].x[6-8][4-6].sh --install; EC=$?; cd /tmp; rm -rf /tmp/scx-scxmaint; exit $EC
scxmaint ALL=(root) NOPASSWD: /bin/sh -c sh /tmp/scx-scxmaint/scx-1.[5-9].[0-9]-[0-9][0-9][0-9].sles.1[[\:digit\:]].x[6-8][4-6].sh --upgrade --force; EC=$?; cd /tmp; rm -rf /tmp/scx-scxmaint; exit $EC
#SOLARIS 10
scxmaint ALL=(root) NOPASSWD: /usr/bin/sh -c sh /tmp/scx-scxmaint/scx-1.[5-9].[0-9]-[0-9][0-9][0-9].solaris.10.sparc.sh --install *
scxmaint ALL=(root) NOPASSWD: /usr/bin/sh -c sh /tmp/scx-scxmaint/scx-1.[5-9].[0-9]-[0-9][0-9][0-9].solaris.10.sparc.sh --upgrade --force *
scxmaint ALL=(root) NOPASSWD: /usr/bin/sh -c sh /tmp/scx-scxmaint/scx-1.[5-9].[0-9]-[0-9][0-9][0-9].solaris.10.x86.sh --install *
scxmaint ALL=(root) NOPASSWD: /usr/bin/sh -c sh /tmp/scx-scxmaint/scx-1.[5-9].[0-9]-[0-9][0-9][0-9].solaris.10.x86.sh --upgrade --force *
#SOLARIS 11
scxmaint ALL=(root) NOPASSWD: /usr/bin/sh -c sh /tmp/scx-scxmaint/scx-1.[5-9].[0-9]-[0-9][0-9][0-9].solaris.1[[\:digit\:]].x86.sh --install ; EC=$?; cd /tmp; rm -rf /tmp/scx-scxmaint; exit $EC
scxmaint ALL=(root) NOPASSWD: /usr/bin/sh -c sh /tmp/scx-scxmaint/scx-1.[5-9].[0-9]-[0-9][0-9][0-9].solaris.1[[\:digit\:]].x86.sh --upgrade --force ; EC=$?; cd /tmp; rm -rf /tmp/scx-scxmaint; exit $EC
scxmaint ALL=(root) NOPASSWD: /usr/bin/sh -c sh /tmp/scx-scxmaint/scx-1.[5-9].[0-9]-[0-9][0-9][0-9].solaris.1[[\:digit\:]].sparc.sh --install ; EC=$?; cd /tmp; rm -rf /tmp/scx-scxmaint; exit $EC
scxmaint ALL=(root) NOPASSWD: /usr/bin/sh -c sh /tmp/scx-scxmaint/scx-1.[5-9].[0-9]-[0-9][0-9][0-9].solaris.1[[\:digit\:]].sparc.sh --upgrade --force ; EC=$?; cd /tmp; rm -rf /tmp/scx-scxmaint; exit $EC
#UNIVERSAL LINUX
scxmaint ALL=(root) NOPASSWD: /bin/sh -c sh /tmp/scx-scxmaint/scx-1.[5-9].[0-9]-[0-9][0-9][0-9].universal[[\:alpha\:]].[[\:digit\:]].x[6-8][4-6].sh --install; EC=$?; cd /tmp; rm -rf /tmp/scx-scxmaint; exit $EC
scxmaint ALL=(root) NOPASSWD: /bin/sh -c sh /tmp/scx-scxmaint/scx-1.[5-9].[0-9]-[0-9][0-9][0-9].universal[[\:alpha\:]].[[\:digit\:]].x[6-8][4-6].sh --upgrade --force; EC=$?; cd /tmp; rm -rf /tmp/scx-scxmaint; exit $EC
##Uninstall
#Uninstall for LINUX
scxmaint ALL=(root) NOPASSWD: /bin/sh -c /opt/microsoft/scx/bin/uninstall
#Uninstall for UNIX
scxmaint ALL=(root) NOPASSWD: /usr/bin/sh -c /opt/microsoft/scx/bin/uninstall
##Log file monitoring
scxmon ALL=(root) NOPASSWD: /opt/microsoft/scx/bin/scxlogfilereader -p
###Examples
#Custom shell command monitoring example - replace <shell command> with the correct command string
# scxmon ALL=(root) NOPASSWD: /bin/bash -c <shell command>
#Daemon diagnostic and restart recovery tasks example (using cron)
#scxmon ALL=(root) NOPASSWD: /bin/sh -c ps -ef | grep cron | grep -v grep
#scxmon ALL=(root) NOPASSWD: /usr/sbin/cron &
#End user configuration for Operations Manager agent
#-----------------------------------------------------------------------------------
Since the above file contains ALL OS’s and examples, I am going to trim it down to just what I need for this Ubuntu Linux system:
#-----------------------------------------------------------------------------------
#Ubuntu Linux configuration for Operations Manager 2016 agent
##General requirements
Defaults:scxmaint !requiretty
Defaults:scxmon !requiretty
##Agent maintenance
#Certificate signing
scxmaint ALL=(root) NOPASSWD: /bin/sh -c cp /tmp/scx-scxmaint/scx.pem /etc/opt/microsoft/scx/ssl/scx.pem; rm -rf /tmp/scx-scxmaint; /opt/microsoft/scx/bin/tools/scxadmin -restart
scxmaint ALL=(root) NOPASSWD: /bin/sh -c cat /etc/opt/microsoft/scx/ssl/scx.pem
##Install or upgrade
#UNIVERSAL LINUX
scxmaint ALL=(root) NOPASSWD: /bin/sh -c sh /tmp/scx-scxmaint/scx-1.[5-9].[0-9]-[0-9][0-9][0-9].universal[[\:alpha\:]].[[\:digit\:]].x[6-8][4-6].sh --install; EC=$?; cd /tmp; rm -rf /tmp/scx-scxmaint; exit $EC
scxmaint ALL=(root) NOPASSWD: /bin/sh -c sh /tmp/scx-scxmaint/scx-1.[5-9].[0-9]-[0-9][0-9][0-9].universal[[\:alpha\:]].[[\:digit\:]].x[6-8][4-6].sh --upgrade --force; EC=$?; cd /tmp; rm -rf /tmp/scx-scxmaint; exit $EC
##Uninstall
scxmaint ALL=(root) NOPASSWD: /bin/sh -c /opt/microsoft/scx/bin/uninstall
##Log file monitoring
scxmon ALL=(root) NOPASSWD: /opt/microsoft/scx/bin/scxlogfilereader -p
#-----------------------------------------------------------------------------------
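The install and upgrade rules only match agent kits whose file name fits the wildcard pattern, so it can help to test a kit name against the pattern before blaming sudo. Here is a rough Python sketch that translates just the glob pieces used in the sample (bracket ranges, the escaped [[\:digit\:]] and [[\:alpha\:]] classes, * and ?) into a regular expression. It is not a reimplementation of sudo's own matching, and the kit file names in the usage lines are made-up examples.

```python
import re

# File-name part of the UNIVERSAL LINUX rule from the sample above.
INSTALL_GLOB = (
    r"scx-1.[5-9].[0-9]-[0-9][0-9][0-9]"
    r".universal[[\:alpha\:]].[[\:digit\:]].x[6-8][4-6].sh"
)

# Only the two POSIX classes that appear in the sample are handled.
POSIX_CLASSES = {"[[:digit:]]": "[0-9]", "[[:alpha:]]": "[A-Za-z]"}


def sudoers_glob_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate the subset of glob syntax used in the sample into a regex."""
    # sudoers needs ':' escaped inside character classes; undo that first.
    pattern = pattern.replace(r"\:", ":")
    parts = []
    i = 0
    while i < len(pattern):
        for name, regex in POSIX_CLASSES.items():
            if pattern.startswith(name, i):
                parts.append(regex)
                i += len(name)
                break
        else:
            ch = pattern[i]
            if ch == "[":
                # Plain range such as [5-9]: valid regex as written.
                end = pattern.index("]", i)
                parts.append(pattern[i:end + 1])
                i = end + 1
            elif ch == "*":
                parts.append(".*")
                i += 1
            elif ch == "?":
                parts.append(".")
                i += 1
            else:
                parts.append(re.escape(ch))
                i += 1
    return re.compile("".join(parts))


if __name__ == "__main__":
    matcher = sudoers_glob_to_regex(INSTALL_GLOB)
    for kit in ("scx-1.6.2-339.universalr.1.x64.sh", "scx-1.4.1-292.universalr.1.x64.sh"):
        print(kit, "allowed" if matcher.fullmatch(kit) else "NOT allowed")
```

Note that each [0-9] accepts exactly one digit, so a kit with a two-digit patch level or a build number that is not three digits long would not match this pattern and the rule would need adjusting.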
I will edit my sudoers file and insert this configuration. You can use vi, visudo, or my personal favorite since I am a Windows guy: download and install WinSCP, which gives you a GUI editor for the files and helps anytime you need to transfer files between Windows and UNIX/Linux using SSH. Generally we want to place this configuration in the appropriate section of the sudoers file, not at the end. There are items at the end of the file that need to stay there. I put this right after the existing “Defaults” section in the existing sudoers configuration, and save it.
Now – back in SCOM – I retry the deployment of the agent:
This will take some time to complete, as the agent is checked for the correct FQDN and certificate, the management servers are inspected to ensure they all have trusted SCX certificates (that we exported/imported above) and the connection is made over SSH, the package is copied down, installed, and the final certificate signing occurs. If all of these checks pass, we get a success!
There are several things that can fail at this point. See the troubleshooting section at the end of this article.
Monitoring Linux servers:
Assuming we got all the way to this point with a successful discovery and agent installation, we need to verify that monitoring is working. After an agent is deployed, the Run As accounts will start being used to run discoveries, and start monitoring. Once enough time has passed for these, check in the Administration pane, under Unix/Linux Computers, and verify that the systems are not listed as “Unknown” but discovered as a specific version of the OS:
Here it is immediately, before the discoveries complete:
Here is what we expect after a few minutes:
Next, go to the Monitoring pane and select the “Unix/Linux Computers” view at the top. Verify that your systems are present and that there is a green healthy check mark next to them:
Next – expand the Unix/Linux Computers folder in the left tree (near the bottom) and make sure we have discovered the individual objects, like Linux Server State, Logical Disk State, and Network Adapter state:
Run Health explorer on one of the discovered Linux Server State objects. Remove the filter at the top to see all the monitors for the system:
Close health explorer.
Select the Operating System Performance view. Review the performance counters we collect out of the box for each monitored OS.
Out of the box – we discover and apply a default monitoring template to the following objects:
- Operating System
- Logical disk
- Network Adapters
Optionally, you can enable discoveries for:
- Individual Logical Processors
- Physical Disks
I don’t recommend enabling additional discoveries unless you are sure that your monitoring requirements cannot be met without discovering these additional objects, as they will reduce the scalability of your environment.
Out of the box – for an OS like RedHat Enterprise Linux 5 – here is a list of the monitors in place, and the object they target:
There are also 50 or more rules enabled out of the box. 46 are performance collection rules for reporting, and 4 rules are event based, dealing with security. Two are informational letting you know whenever a direct login is made using root credentials via SSH, and when su elevation occurs by a user session. The other two deal with failed attempts for SSH or SU.
To get more out of your monitoring – you might have other services, processes, or log files that you need to monitor. For that, we provide Authoring Templates with wizards to help you add additional monitoring, in the Authoring pane of the console under Management Pack templates:
In the reporting pane – we also offer a large number of reports you can leverage, or you can always create your own using our generic report templates, or custom ones designed in Visual Studio for SQL reporting services.
As you can see, it is a fairly well rounded solution to include Unix and Linux monitoring into a single pane of glass for your other systems, from the Hardware, to the Operating System, to the network layer, to the applications.
Partners and 3rd party vendors also supply additional management packs which extend our Unix and Linux monitoring, to discover and provide detailed monitoring on non-Microsoft applications that run on these Unix and Linux systems.
Troubleshooting:
The majority of troubleshooting comes in the form of failed discovery/agent deployments.
Microsoft has written a wiki on this topic, which covers the majority of these, and how to resolve:
http://social.technet.microsoft.com/wiki/contents/articles/4966.aspx
- For instance, if the DNS name that you provided does not match the DNS hostname on the Linux server, or does not match its SSL certificate, or if you failed to export/import the SCX certificates for multiple management servers in the pool, you might see:
Agent verification failed. Error detail: The server certificate on the destination computer (rh5501.opsmgr.net:1270) has the following errors:
The SSL certificate could not be checked for revocation. The server used to check for revocation might be unreachable.
The SSL certificate is signed by an unknown certificate authority.
It is possible that:
1. The destination certificate is signed by another certificate authority not trusted by the management server.
2. The destination has an invalid certificate, e.g., its common name (CN) does not match the fully qualified domain name (FQDN) used for the connection. The FQDN used for the connection is: rh5501.opsmgr.net.
3. The servers in the resource pool have not been configured to trust certificates signed by other servers in the pool.
The server certificate on the destination computer (rh5501.opsmgr.net:1270) has the following errors:
The SSL certificate could not be checked for revocation. The server used to check for revocation might be unreachable.
The SSL certificate is signed by an unknown certificate authority.
It is possible that:
1. The destination certificate is signed by another certificate authority not trusted by the management server.
2. The destination has an invalid certificate, e.g., its common name (CN) does not match the fully qualified domain name (FQDN) used for the connection. The FQDN used for the connection is: rh5501.opsmgr.net.
3. The servers in the resource pool have not been configured to trust certificates signed by other servers in the pool.
The solution to these common issues is covered in the Wiki with links to the product documentation.
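For cause number 2, one quick check is to compare the FQDN you typed into the discovery wizard with the common names in the agent certificate. The Python sketch below assumes you have already captured the subject line on the Linux box, for example with openssl x509 -noout -subject -in /etc/opt/microsoft/scx/ssl/scx.pem (reading that file may need root). The function names and the sample subject strings are made up for illustration, and the exact subject layout varies between OpenSSL versions, which is why the parsing is kept loose.

```python
import re
from typing import List


def subject_common_names(subject_line: str) -> List[str]:
    """Extract every CN value from an openssl-style subject line."""
    # Accepts both "CN = host" (comma separated) and "/CN=host" layouts.
    return [cn.strip() for cn in re.findall(r"CN\s*=\s*([^,/\n]+)", subject_line)]


def fqdn_in_certificate(subject_line: str, fqdn: str) -> bool:
    """True if the FQDN used for discovery is one of the certificate CNs."""
    wanted = fqdn.strip().rstrip(".").lower()
    return any(cn.lower() == wanted for cn in subject_common_names(subject_line))


if __name__ == "__main__":
    # Hypothetical subject line; paste your own openssl output here.
    subject = "subject=DC = net, DC = opsmgr, CN = rh5501, CN = rh5501.opsmgr.net"
    print(subject_common_names(subject))
    print(fqdn_in_certificate(subject, "rh5501.opsmgr.net"))
```

If the FQDN is missing from the list, the name in the certificate and the name used for discovery disagree, and the Wiki linked above describes how to resolve that.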
- Perhaps – you failed to properly configure your Run As accounts and profiles. You might see the following show as “Unknown” under administration:
Or you might see alerts in the console:
Alert: UNIX/Linux Run As profile association error event detected
The account for the UNIX/Linux Action Run As profile associated with the workflow “Microsoft.Unix.AgentVersion.Discovery”, running for instance “rh5501.opsmgr.net” with ID {9ADCED3D-B44B-3A82-769D-B0653BFE54F9} is not defined. The workflow has been unloaded. Please associate an account with the profile.
This condition may have occurred because no UNIX/Linux Accounts have been configured for the Run As profile. The UNIX/Linux Run As profile used by this workflow must be configured to associate a Run As account with the target.
Either you failed to configure the Run As accounts, or failed to distribute them, or you chose a low priv account that is not properly configured for sudo on the Linux system. Go back and double-check your work there.
If you want to check if the agent was deployed to a RedHat system, you can provide the following command in a shell session:
More good troubleshooting links and useful info:
Enable logging: https://technet.microsoft.com/en-us/library/ee344801.aspx
Hi Kevin!
We are having an issue with SCOM 2016/OMS agent in some Oracle Linux Servers in Azure. In /var/log/messages I see a segfault in omiengine:
Dec 12 04:34:29 azubdserver01 kernel: omiengine[118112]: segfault at a0000000f ip 00000000004741ca sp 00007fff9597f0a0 error 4 in omiengine[400000+d5000]
Dec 12 04:34:30 azubdserver01 systemd: omid.service: main process exited, code=exited, status=1/FAILURE
Dec 12 04:34:30 azubdserver01 omiserver: /opt/omi/bin/omiserver: server is not running
Dec 12 04:34:30 azubdserver01 systemd: Unit omid.service entered failed state.
Dec 12 04:34:30 azubdserver01 systemd: omid.service failed.
Dec 12 04:34:35 azubdserver01 systemd: omid.service holdoff time over, scheduling restart.
After that, we received alerts from SCOM:
SCOM: Not Present\azubdserver01 — Cannot resolve Hostname Resolution state: New
SCOM: Not Present\azubdserver01 — Heartbeat failed Resolution state: New
Sometimes, it recovers automatically with auto restart:
Dec 12 04:41:22 azubdserver01 systemd: omid.service holdoff time over, scheduling restart.
Dec 12 04:41:22 azubdserver01 systemd: Starting OMI CIM Server…
Dec 12 04:41:22 azubdserver01 systemd: PID file /var/opt/omi/run/omiserver.pid not readable (yet?) after start.
Dec 12 04:41:22 azubdserver01 systemd: Started OMI CIM Server.
And alerts are closed:
SCOM: Not Present\azubdserver01 — Heartbeat failed Resolution state: Closed
SCOM: Not Present\azubdserver01 — Cannot resolve Hostname Resolution state: Closed
But many times, we have to restart agent manually to solve the problem with the command:
/opt/microsoft/scx/bin/tools/scxadmin -restart all
Have you seen this issue before?
Any suggestion for “kernel: omiengine[118112]: segfault at a0000000f ip 00000000004741ca sp 00007fff9597f0a0 error 4 in omiengine[400000+d5000]” error?
Thanks!
Hi this is a great great article, thanks a lot for it!
Most of the links are broken BTW.
Hello
First thank you for all your help!
I am currently deploying SCOM on several linux servers, and i have the following error on some of them (Ubuntu 18.04):
“Failed to install kit. Exit code: 60
Standard Output: Sudo path: /usr/bin/
Extracting…
Installing cross-platform agent …
—– Queuing package: omi (omi-1.6.1-0.ulinux.x64) for installation —–
Error: This system does not have a supported version of OpenSSL installed.
This system’s OpenSSL version: 1.1.0g
Supported versions: 1.0.*
Standard Error:
Exception Message: ”
i have installed the latest version of the UNIX/Linux MP as linked in your UR7 “how to” webpage.
(I am waiting approval to install the UR7 in our infrastructure, but i hope to solve the current issue before that.)
Thx a lot for your help
Olivier Swinnen
I believe SCOM 2019 is the first SCOM that supported Ubuntu 18.04. What version of SCOM are you using?
Hi Kevin !
The same thing with Ubuntu 18.04 and OpsMgr 1807
A very common error is raising during an agent installation with message started from “Task invocation failed with error code -2130771918. Error message was: The SCXCertWriteAction module encountered a DoProcess exception. ”
We’ve checked all the parts might interfering
sudoers file (taken from a fresh Technet page)
name resolving (all works fine)
Also, we use our environment to deploy agents on Centos 7 servers with no issues.
We are totally exhausted with it, have no idea..
Additionally we’ve found out that the Microsoft documentation site says 1807 supports Ubuntu 18.04, but you mentioned here, it isn’t true…Could you, please make it clear – is it possible to push agent from 1807 environment to Ubuntu 18.04 ?
Here is my answer 8)
Yes, it supports !
But you need to add the ‘python-ctypes’ module to Ubuntu to get the agent successfully installed through pushing.
We’ve done it and now we’re able to monitor an Ubuntu 18.04 server .
Is there any way (local config file, SQL table) to determine which Management Server/Gateway in the Resource Pool manages a particular Linux/Unix computer
Hi Kevin,
I’d like to add a special case: using AD account(s) as Linux RunAs account. Testing on RHEL7, I found that:
* using your example with search-and-replace using the AD account as user@domain.ext didn’t work
* using user@domain.ext as member of our AD group that has sudo access, didn’t work either
* using user@domain.ext did work with ALL=(ALL) NOPASSWD:ALL but only AFTER I used sudo once with this account
After quite some trial, error and a bit of frustration here and there (ok, more than a bit 🙂 ), I managed to find the cause and come up with a solution:
* in the defaults, add “,!lecture” to skip the lecture notice
* on all sudo lines, add the @domain.ext part ONLY to the account at the start of the line. Apparently, the install script on RHEL will only use the username part when creating the temp folders. When doing a search-and-replace in your code substituting scxmaint by user@domain.ext, this will include the @domain.ext part in the sudoers path names as well. The user will have sudo permission for /tmp/user@domain.ext/install.sh but the file will be /tmp/user/install.sh, and sudo will fail. As an example, here’s the install line for RHEL:
scxmaint@domain.ext ALL=(root) NOPASSWD: /bin/sh -c sh /tmp/scx-scxmaint/scx-1.[5-9].[0-9]-[0-9][0-9][0-9].rhel.[[\:digit\:]].x[6-8][4-6].sh --install; EC=$?; cd /tmp; rm -rf /tmp/scx-scxmaint; exit $EC
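To make the search-and-replace pitfall concrete, here is a minimal Python sketch that builds the RHEL install rule so that only the leading account carries the @domain.ext suffix. The function name and parameters are hypothetical, the pattern is copied from the line above, and the install option is written as a plain double hyphen.

```python
def sudoers_install_line(user: str, domain: str) -> str:
    """Build an install rule where only the leading account carries @domain."""
    account = f"{user}@{domain}"   # the identity the sudo rule applies to
    temp_dir = f"/tmp/scx-{user}"  # per the comment above, the temp folder uses the bare username
    kit = (f"{temp_dir}/scx-1.[5-9].[0-9]-[0-9][0-9][0-9]"
           r".rhel.[[\:digit\:]].x[6-8][4-6].sh")
    return (f"{account} ALL=(root) NOPASSWD: /bin/sh -c sh {kit} --install; "
            f"EC=$?; cd /tmp; rm -rf {temp_dir}; exit $EC")


if __name__ == "__main__":
    print(sudoers_install_line("scxmaint", "domain.ext"))
```

Generating the lines this way, instead of a blind search-and-replace, keeps the paths under /tmp/scx-scxmaint untouched while the account at the start of the line becomes scxmaint@domain.ext.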
Hi, thanks for a great article. I’m getting the following error when attempting a discovery of an Ubuntu 16.04 server. I’ve followed the guide fully.
0: 08/19/19 10:40:33 : Enter SCXCertWriteAction::DoInit
0: 08/19/19 10:40:33 : XML_INIT_CALL
0: 08/19/19 10:40:33 : Exit SCXCertWriteAction::DoInit
1: 08/19/19 10:40:35 : Enter SCXCertWriteAction::DoProcess
1: 08/19/19 10:40:35 : passed initial arguments validation
1: 08/19/19 10:40:35 : local module, no credentials required
1: 08/19/19 10:40:35 : cert_ws: “Sudo path: /usr/bin/”
1: 08/19/19 10:40:35 : Failed SCXCertWriteAction::DoProcess — ScxCertLibException: Unable to create certificate context
; {ASN1 bad tag value met.
}
I’ve looked in the other log files and can see errdata: sudo: no tty present and no askpass program specified
Any ideas what could cause this?
Hey Nick, did you find a solution for this? I’m getting the same errors, trying to discover a RHEL 7 box.
Hey Lee/Nick,
I am facing the same issue as you. Did you find a solution?
The certificate is created when the agent installs, so it would only be written to when the cross-signing occurs (assuming a non-CA cert).
If you have issues writing it (as the error states), check sudoers for the permissions on the certificate folder/files under “#Certificate signing” in the example Kevin supplied above.
Note: if sudo is not in the normal location, or you use a different method, you need to update the symbolic link that SCOM uses to refer to it.
Hi Kevin,
We are trying to discover a Linux appliance for monitoring,
but on this device we are not supposed to use sudo elevation for the Run As or maintenance account.
Is it possible to discover such systems in SCOM 2016 (1807) without sudo-level permissions?
In the discovery wizard, specify ‘this account has privileged access’ and it will not try to sudo (i.e. it is what you do when using root). This is typically a BAD thing, but if it’s the only way, it’s what you have to do.
Dear Kevin,
many thanks for these great instructions. We went through all the steps, including the sudoers part, but we still get this error:
Failed to install kit. Exit code: 1
Standard Output: Sudo path: /usr/bin/
Standard Error:
We trust you have received the usual lecture from the local System
Administrator. It usually boils down to these three things:
#1) Respect the privacy of others.
#2) Think before you type.
#3) With great power comes great responsibility.
sudo: no tty present and no askpass program specified
Exception Message:
We are trying to add a Red Hat 7.7 server to SCOM. Since I do not have deep Linux knowledge, I do not know where to look to find out why it fails. The progress in SCOM goes “pending -> deploy -> install” and then fails.
What might be the reason for the job failing?
Many thanks in advance.
Hi Daniiel,
your sudoers configuration is incorrect.
Kevin has linked an old configuration file example.
The SCOM agent build version has changed from 3 digits to just 1 (1.6.3-793 -> 1.6.4-7) and therefore the following sudoers entries are no longer working.
scxmaint ALL=(root) NOPASSWD: /bin/sh -c sh /tmp/scx-scxmaint/scx-1.[5-9].[0-9]-[0-9][0-9][0-9].universal[[\:alpha\:]].[[\:digit\:]].x[6-8][4-6].sh --install; EC=$?; cd /tmp; rm -rf /tmp/scx-scxmaint; exit $EC
scxmaint ALL=(root) NOPASSWD: /bin/sh -c sh /tmp/scx-scxmaint/scx-1.[5-9].[0-9]-[0-9][0-9][0-9].universal[[\:alpha\:]].[[\:digit\:]].x[6-8][4-6].sh --upgrade --force; EC=$?; cd /tmp; rm -rf /tmp/scx-scxmaint; exit $EC
You could add the following for the universal agent install:
scxmaint ALL=(root) NOPASSWD: /bin/sh -c sh /tmp/scx-scxmaint/scx-1.[5-9].[0-9]-[0-9].universal[[\:alpha\:]].[[\:digit\:]].x[6-8][4-6].sh --install; EC=$?; cd /tmp; rm -rf /tmp/scx-scxmaint; exit $EC
scxmaint ALL=(root) NOPASSWD: /bin/sh -c sh /tmp/scx-scxmaint/scx-1.[5-9].[0-9]-[0-9].universal[[\:alpha\:]].[[\:digit\:]].x[6-8][4-6].sh --upgrade --force; EC=$?; cd /tmp; rm -rf /tmp/scx-scxmaint; exit $EC
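The effect of the build number shrinking from three digits to one can be checked offline with shell-style wildcard matching. This is only a sketch: Python's fnmatch has no POSIX character classes, so "?" stands in for [[\:alpha\:]] and [0-9] for [[\:digit\:]], and the kit file names are assumed from the version numbers quoted above.

```python
from fnmatch import fnmatchcase

# Simplified stand-ins for the sudoers patterns (see the assumptions in the lead-in).
OLD_PATTERN = "scx-1.[5-9].[0-9]-[0-9][0-9][0-9].universal?.[0-9].x[6-8][4-6].sh"
NEW_PATTERN = "scx-1.[5-9].[0-9]-[0-9].universal?.[0-9].x[6-8][4-6].sh"


def allowed(kit_name, patterns):
    """Return True if any wildcard pattern matches the whole kit file name."""
    return any(fnmatchcase(kit_name, pattern) for pattern in patterns)


if __name__ == "__main__":
    old_kit = "scx-1.6.3-793.universald.1.x64.sh"  # three-digit build
    new_kit = "scx-1.6.4-7.universald.1.x64.sh"    # one-digit build
    print(allowed(old_kit, [OLD_PATTERN]))               # True
    print(allowed(new_kit, [OLD_PATTERN]))               # False
    print(allowed(new_kit, [OLD_PATTERN, NEW_PATTERN]))  # True
```

Keeping both the three-digit and the one-digit entries in sudoers covers old and new kits during a transition.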
Such are the joys of regex and trying to be as restrictive as possible: you only have the ‘current’ naming scheme to base your example on, not the one they later change it to. You could open the pattern up a lot and never worry about it, but if you try to be sensible you always run the risk that this will happen.
And yes, it does suck that they changed it, as a lot of us restricted this to require elevation and not use root long before Kevin posted this article.
You could add an entry for one digit, or keep it flexible and update the pattern to handle one to three digits with just one entry, i.e.
scx-1.[5-9].[0-9]-[0-9]{1,3}
It’s always easier to manage one set of entries than two, at least in my humble opinion. I’m no regex guru, but it’s worth knowing enough to tweak these things as needed.
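One caveat worth testing before relying on a quantifier: the sudoers manual describes these patterns as shell-style wildcards rather than regular expressions (some newer sudo releases add separate regex support), and in wildcard matching {1,3} is literal text. The sketch below uses Python's fnmatch as an approximation of that behaviour; the shortened patterns and the kit name are assumptions for illustration.

```python
from fnmatch import fnmatchcase

KIT = "scx-1.6.4-7.universald.1.x64.sh"

# Regex-style quantifier: under wildcard rules the braces are literal characters.
QUANTIFIED = "scx-1.[5-9].[0-9]-[0-9]{1,3}.universal*"

# Wildcard-only alternative: one explicit pattern per build-number length.
EXPLICIT = [
    "scx-1.[5-9].[0-9]-[0-9].universal*",
    "scx-1.[5-9].[0-9]-[0-9][0-9].universal*",
    "scx-1.[5-9].[0-9]-[0-9][0-9][0-9].universal*",
]


def matches_any(name, patterns):
    """Return True if any wildcard pattern matches the whole name."""
    return any(fnmatchcase(name, pattern) for pattern in patterns)


if __name__ == "__main__":
    print(fnmatchcase(KIT, QUANTIFIED))  # False
    print(matches_any(KIT, EXPLICIT))    # True
```

If the target sudo version only does wildcard matching, separate entries per digit count (or a looser "*") are the safer route; `visudo -c` plus a test install will confirm what a given system accepts.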
Hi Kevin,
I like what you have written, and I am a Unix guy. I would like to know how I can import my Perl monitoring script that currently runs on the HP agent. Is there a good way to create a custom management pack?
Hello Kevin,
I have followed the same process you describe (we have SCOM 1807) and tried to push the agent to Ubuntu 16.04, but I get the error below:
Task invocation failed with error code -2130771918. Error message was: The SCXCertWriteAction module encountered a DoProcess exception. The workflow “Microsoft.Linux.UniversalD.1.Agent.Install.Task” has been unloaded.
Module: SCXCertWriteAction
Location: DoProcess
Exception type: ScxCertLibException
Exception message: Unable to create certificate context
; {ASN1 bad tag value met.
}
Additional data: Sudo path: /usr/bin/
Please help me resolve this.
SCOM 1807 is no longer supported. I’d recommend upgrading to SCOM 2019.
Hi Kevin,
I know 1807 is no longer supported, but I have to fix this issue, so please help me with a suggestion.
Hello Kevin,
I am monitoring several Linux servers with SCOM 2016, and they are all healthy in SCOM.
The problem is that SCOM never generates a “Heartbeat Failed” alert for any of my Linux servers, even if a server is down for hours.
On checking the corresponding monitor (UNIX/Linux Heartbeat Monitor), I found the following information:
“Monitor ensures that the CIM server daemon is running and reachable”, and the problem is that none of my Linux servers has this CIM server daemon on it.
Please advise whether the CIM daemon is required on Linux servers to generate the “Heartbeat Failed” alert.
We have made many changes to improve heartbeats for Linux. Please ensure you are on the latest Update Rollup, and that you have updated the Linux MP’s accordingly.
The CIM server daemon is part of the agent. From memory, SCOM heartbeats via a WinRM call; you can verify this with:
winrm enumerate http://schemas.microsoft.com/wbem/wscim/1/cim-schema/2/SCX_Agent?__cimnamespace=root/scx -username: -password: -r:https://:1270/wsman -auth:basic -skipCACheck -skipCNCheck -skiprevocationcheck -encoding:utf-8
Skipping the CA checks isn’t important; it is just quicker, and testing with and without them set can help identify certificate issues.
I’ve always found that with the agent stopped or erroring (e.g. 503) I do get heartbeats. However, there is no notification on a reboot, as it can complete quickly enough not to trigger one, but you can make your own monitor for that.
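For repeated checks against several hosts, the winrm call above can be assembled programmatically. This is a sketch only: the function name and parameters are hypothetical, and the flags are exactly those shown in the command above. The list would be run from a management server, for example with subprocess.run.

```python
SCX_AGENT_URI = ("http://schemas.microsoft.com/wbem/wscim/1/cim-schema/2/"
                 "SCX_Agent?__cimnamespace=root/scx")


def winrm_agent_query(host, username, password, skip_cert_checks=True):
    """Build the winrm argument list that enumerates SCX_Agent on port 1270."""
    args = [
        "winrm", "enumerate", SCX_AGENT_URI,
        f"-username:{username}",
        f"-password:{password}",
        f"-r:https://{host}:1270/wsman",
        "-auth:basic",
    ]
    if skip_cert_checks:
        # Leave these out to test whether the agent certificate itself is the problem.
        args += ["-skipCACheck", "-skipCNCheck", "-skiprevocationcheck"]
    args.append("-encoding:utf-8")
    return args


if __name__ == "__main__":
    # Host and credentials below are placeholders.
    print(" ".join(winrm_agent_query("linuxhost.example.com", "monuser", "secret")))
```

Running the query once with and once without the skip flags gives a quick indication of whether a failure is caused by the certificate or by something else.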
Hi Kevin, thanks a lot for the great guide you posted on this site. I have succeeded in most of the steps, but I cannot get through the installation of the agent on a CentOS 7 Linux machine.
The host is discovered, but error messages like this keep appearing no matter what changes I make or how generous the permissions I give to the scomagent user.
Here is the error message when trying to install the agent and its dependencies:
Failed to install kit. Exit code: 1
Standard Output: Extracting…
Installing cross-platform agent …
----- Queuing package for upgrade: omi (omi-1.6.1-0.ulinux.x64) -----
Skipping package since existing version >= version available
----- Queuing package: scx (scx-1.6.2-343.universal.x64) for installation -----
----- Installing packages: 100/scx-1.6.2-343.universal.x64.rpm -----
Checking if Apache is installed …
Apache found, Apache agent will be installed
Extracting…
Installing Apache agent …
----- Installing package: apache-cimprov (apache-cimprov-1.0.1-10.universal.1.x86_64) -----
Detected Apache v2.4 …
Checking if MySQL is installed …
MySQL found, MySQL agent will be installed
Extracting…
Installing MySQL agent …
----- Installing package: mysql-cimprov (mysql-cimprov-1.0.1-5.universal.x86_64) -----
Standard Error: error: can’t create transaction lock on /var/lib/rpm/.rpm.lock (Permission denied)
error: can’t create transaction lock on /var/lib/rpm/.rpm.lock (Permission denied)
error: can’t create transaction lock on /var/lib/rpm/.rpm.lock (Permission denied)
Exception Message:
Since I’m mostly a Linux sysadmin, I know that this:
can’t create transaction lock on /var/lib/rpm/.rpm.lock (Permission denied)
depends on permissions.
What is quite strange is that adding the visudo entries you suggest after the Defaults section does not have much effect. Here is the edited part:
#———————————————————————————–
##General requirements #These are any accounts you are using that use SUDO elevation including the Agent Maintenance account and or the monitoring account
Defaults:scomagent !requiretty
Defaults:scommon !requiretty
#Example user configuration for Operations Manager agent
#Example assumes users named: scomagent & scommon
#Replace usernames & corresponding /tmp/scx- specification for your environment
#General requirements
##Defaults:scomagent !requiretty #already defined
#Agent maintenance
##Certificate signing
scomagent ALL=(root) NOPASSWD: /bin/sh -c cp /tmp/scx-scomagent/scx.pem /etc/opt/microsoft/scx/ssl/scx.pem; rm -rf /tmp/scx-scomagent; /opt/microsoft/scx/bin/tools/scxadmin -restart
scomagent ALL=(root) NOPASSWD: /bin/sh -c cat /etc/opt/microsoft/scx/ssl/scx.pem
scomagent ALL=(root) NOPASSWD: /bin/sh -c if test -f /opt/microsoft/omsagent/bin/service_control; then cp /tmp/scx-scomagent/omsadmin.conf /etc/opt/microsoft/omsagent/scom/conf/omsadmin.conf; /opt/microsoft/omsagent/bin/service_control restart scom; fi
##Install or upgrade
scomagent ALL=(root) NOPASSWD: /bin/sh -c sh /tmp/scx-scomagent/scx-1.[5-9].[0-9]-[0-9].universal[[\:alpha\:]].[[\:digit\:]].x[6-8][4-6].sh --install; EC=$?; cd /tmp; rm -rf /tmp/scx-scomagent; exit $EC
scomagent ALL=(root) NOPASSWD: /bin/sh -c sh /tmp/scx-scomagent/scx-1.[5-9].[0-9]-[0-9].universal[[\:alpha\:]].[[\:digit\:]].x[6-8][4-6].sh --upgrade --force; EC=$?; cd /tmp; rm -rf /tmp/scx-scomagent; exit $EC
##Uninstall
#scomagent ALL=(root) NOPASSWD: /bin/sh -c /opt/microsoft/scx/bin/uninstall
scomagent ALL=(root) NOPASSWD: /bin/sh -c if test -f /opt/microsoft/omsagent/bin/omsadmin.sh; then if test "$(/opt/microsoft/omsagent/bin/omsadmin.sh -l | grep scom | wc -l)" \= "1" && test "$(/opt/microsoft/omsagent/bin/omsadmin.sh -l | wc -l)" \= "1" || test "$(/opt/microsoft/omsagent/bin/omsadmin.sh -l)" \= "No Workspace"; then /opt/microsoft/omsagent/bin/uninstall; else /opt/microsoft/omsagent/bin/omsadmin.sh -x scom; fi; else /opt/microsoft/scx/bin/uninstall; fi
##Log file monitoring
scommon ALL=(root) NOPASSWD: /opt/microsoft/scx/bin/scxlogfilereader -p
###Examples
#Custom shell command monitoring example replace with the correct command string
scommon ALL=(root) NOPASSWD: /bin/sh -c echo error
#Daemon diagnostic and restart recovery tasks example (using cron)
#scommon ALL=(root) NOPASSWD: /bin/sh -c ps -ef | grep cron | grep -v grep
#scommon ALL=(root) NOPASSWD: /usr/sbin/cron &
#End user configuration for Operations Manager agent
#———————————————————————————–
If you have any suggestions or any hints it would be really appreciated.
Does SCOM 2016 support AIX 5.3 devices? I am getting multiple 11904 event IDs for AIX 5.3 servers. The error description is as below:
The Microsoft Operations Manager Expression Filter Module failed to query the delivered item, item was dropped.
Property Expression: $Data/WsManData/*[local-name(.)='SCX_EthernetPortStatistics']/*[local-name(.)='BytesTotal']$
Error: 0x80004005
One or more workflows were affected by this.
Workflow name: Microsoft.AIX.5.3.NetworkAdapter.BytesTotalSec.Collection
Instance name: en1
Instance ID: {770DC8B5-0A07-1A38-302F-B8874DDCB3A2}
These events are logged only for AIX 5.3 boxes. Other versions of OS are working fine.
Hello,
Thanks for the great guide.
I’m trying to monitor Ubuntu 18.04 from SCOM 2019 UR1 and nothing works. Is that possible?
I followed the guide step by step.
I succeeded in deploying the agent (only when I specified the maintenance account manually; if I use the Run As account, I get “ws man access denied”), but after a few minutes the monitor is just greyed out.
Some information:
sudo scxadmin -status
omiserver: is running
omiagent: is stopped
sudo cat /var/opt/omi/log/omiserver.log
2020/08/12 18:25:20 [12242,12242] WARNING: null(0): EventId=30119 Priority=WARNING ssl-read: unexpected sys error 0
2020/08/12 18:43:31 [12857,12857] WARNING: null(0): EventId=30102 Priority=WARNING SELECTOR_TIMEOUT reached; so failed
sudo cat /var/opt/omi/log/omiagent.root.root.log
2020/08/12 18:46:52 [13088,13088] WARNING: null(0): EventId=30164 Priority=WARNING XmlSerializer_SerializeClass with flags f00
2020/08/13 09:50:09 [18727,18727] WARNING: null(0): EventId=30164 Priority=WARNING XmlSerializer_SerializeClass with flags f00
sudo cat /var/log/auth.log
Aug 13 10:08:02 Ubuntu_server omiserver: pam_unix(omi:auth): check pass; user unknown
Aug 13 10:08:02 Ubuntu_server omiserver: pam_unix(omi:auth): authentication failure; logname= uid=0 euid=0 tty= ruser= rhost=
Aug 13 10:08:04 Ubuntu_server omiserver: pam_unix(omi:auth): check pass; user unknown
Aug 13 10:08:04 Ubuntu_server omiserver: pam_unix(omi:auth): authentication failure; logname= uid=0 euid=0 tty= ruser= rhost=
For testing, in sudoers:
Defaults:scomlinuxmain !requiretty
maintenanceaccout ALL=(root) NOPASSWD: ALL
monitoringaccount ALL=(root) NOPASSWD: ALL
WSMan test from the management server to the Ubuntu server:
winrm enumerate http://schemas.microsoft.com/wbem/wscim/1/cim-schema/2/SCX_Agent?__cimnamespace=root/scx -username:maintenanceaccout -password:xxxxx -r:https://Ubuntu_Server:1270/wsman -auth:basic -skipCACheck -skipCNCheck -skiprevocationcheck -encoding:utf-8
SCX_Agent
InstanceID = null
Caption = SCX Agent meta-information
Description = Release_Build – 20181107
ElementName = null
InstallDate
Datetime = 2020-08-13T09:21:16Z
Name = scx
OperationalStatus = null
StatusDescriptions = null
Status = null
HealthState = null
CommunicationStatus = null
DetailedStatus = null
OperatingStatus = null
PrimaryStatus = null
VersionString = 1.6.3-793
MajorVersion = 1
MinorVersion = 6
RevisionNumber = 3
BuildNumber = 793
BuildDate = 2018-11-07T00:00:00Z
Architecture = x64
OSName = Ubuntu
OSType = Linux
OSVersion = 18.04
KitVersionString = 1.6.3-793
Hostname = servername.domain.com
OSAlias = UniversalD
UnameArchitecture = x86_64
MinActiveLogSeverityThreshold = INFO
MachineType = Virtual
PhysicalProcessors = 1
LogicalProcessors = 4
NB: I also tried deploying the agent manually and signing the certificate manually, but got the same issue.
Jérémy
Have you managed to fix this?
Hello Jeremy, were you able to fix this issue?
It seems your Run As accounts are misconfigured.
Make sure they are Linux accounts, and make sure the sudoers file on the Linux server is configured correctly.
Also make sure the correct account is linked to the correct Run As profile.
Kevin,
I am getting this error as have others. Can you help with this error?
Module: SCXCertWriteAction
Location: DoProcess
Exception type: ScxCertLibException
Exception message: Unable to create certificate context
; {ASN1 bad tag value met.
}
Additional data: Sudo path: /etc/opt/microsoft/scx/conf/sudodir/
thanks,
Russell
Context would help. What action are you taking when getting the error? What OS version is the linux system? What SCOM version and UR are you at? 🙂
Kevin,
My apologies. I was attempting to add a linux agent. I am using SCOM 2019 10.19.10050.0 UR1.
thanks,
Russell
Hi Kevin,
I am new to SCOM 2016.
Where do we get the Unix/Linux agent installation files? Do they need to be downloaded from a site, or is there a specific location on the SCOM server?
I have a Red Hat Linux 7.3 server to monitor, and I cannot find the agent under the agent folder.
Please advise.
When you use the discovery wizard, SCOM can deploy the agent, provided all the permissions have been set first.
If it didn’t drop the agent in the downloadedkits folder (C:\Program Files\Microsoft System Center\Operations Manager\Server\AgentManagement\UnixAgents\DownloadedKit), Silect has a tool that lets you easily export the .sh file (which contains the agent as a tar archive) from the management pack bundle,
i.e.
C:\Utils\MPBUtil\MPBUtil.exe -extract .\Microsoft.Linux.RHEL.7.mpb agents
Hi Kevin,
Have you documented anything on SCOM 2016/2019 agents on Unix/Linux servers? Please share if so.
Hi Kevin,
Did you get a chance to look at this? I mean manual installation on Solaris, AIX, CentOS and Ubuntu. I have checked the MS docs.
One more clarification is needed about version support: if SCOM 2019 supports Red Hat 8, does that mean it also supports older versions? If so, does it support them from the earliest version?
Please let me know for all Unix/Linux flavours. Thanks in advance.
Hi,
I went through this journey to monitor Debian servers with SCOM 2019. Here is a small update for the sudoers file.
It’s better to use a separate file in the sudoers.d directory, as these survive version upgrades:
touch /etc/sudoers.d/scom_sudoers
chmod 440 /etc/sudoers.d/scom_sudoers
visudo -f /etc/sudoers.d/scom_sudoers
Then the content goes in there (this is for Debian/universal). The updated part is the regex for the command with the agent version:
#———————————————————————————–
#Linux configuration for Operations Manager agent
##General requirements
Defaults:scxmaint !requiretty
Defaults:scxmon !requiretty
##Agent maintenance
#Certificate signing
scxmaint ALL=(root) NOPASSWD: /bin/sh -c cp /tmp/scx-scxmaint/scx.pem /etc/opt/microsoft/scx/ssl/scx.pem; rm -rf /tmp/scx-scxmaint; /opt/microsoft/scx/bin/tools/scxadmin -restart
scxmaint ALL=(root) NOPASSWD: /bin/sh -c cat /etc/opt/microsoft/scx/ssl/scx.pem
##Install or upgrade
#UNIVERSAL LINUX
scxmaint ALL=(root) NOPASSWD: /bin/sh -c sh /tmp/scx-scxmaint/scx-1.[5-9].[0-9]-[0-9]+.universal[[\:alpha\:]].[[\:digit\:]].x[6-8][4-6].sh --install --enable-opsmgr; EC=$?; cd /tmp; rm -rf /tmp/scx-scxmaint; exit $EC
scxmaint ALL=(root) NOPASSWD: /bin/sh -c sh /tmp/scx-scxmaint/scx-1.[5-9].[0-9]-[0-9]+.universal[[\:alpha\:]].[[\:digit\:]].x[6-8][4-6].sh --upgrade --force; EC=$?; cd /tmp; rm -rf /tmp/scx-scxmaint; exit $EC
##Uninstall
scxmaint ALL=(root) NOPASSWD: /bin/sh -c /opt/microsoft/scx/bin/uninstall
##Log file monitoring
scxmon ALL=(root) NOPASSWD: /opt/microsoft/scx/bin/scxlogfilereader -p
#———————————————————————————–
The previous regex was
scx-scxmaint/scx-1.[5-9].[0-9]-[0-9][0-9][0-9].universal[[\:alpha\:]].[[\:digit\:]].x[6-8][4-6].sh
which expects a file name format like scx-1.6.6-001.universald…. (three digits after the “-”),
but the current script name is scx-scxmaint/scx-1.6.6-0.universald.1.x64.sh
Hi Kevin,
For Linux Kerberos authentication, the documentation says: “Enabling Kerberos authentication assumes all UNIX and Linux agents communicating with the management server support Kerberos. Mixed mode authentication where some agents use basic authentication and others leverage Kerberos is not supported.”
Does that apply to the whole management group, or can some gateways still use basic authentication, e.g. for DMZ servers that usually have no Kerberos?
Thank you Kevin for the amazing article !
Can we monitor Redhat 8 with SCOM 2016 ?
No. This is covered in the product documentation: https://docs.microsoft.com/en-us/system-center/scom/plan-supported-crossplat-os?view=sc-om-2016
RHEL 8 is supported in SCOM 2019 UR1 and later: https://docs.microsoft.com/en-us/system-center/scom/plan-supported-crossplat-os?view=sc-om-2019
Hi Kevin,
After installing the Linux MP I can’t see any changes in the Administration view of the console. I cannot discover Linux machines or create Unix/Linux Run As accounts because these options are not visible in the SCOM console.
What am I missing here?
These are the files I have installed (the Microsoft Monitoring Agent has been restarted):
Microsoft.Linux.UniversalD.1 Universal Linux (Debian) Discovery 10.19.1147.0
Microsoft.Linux.Universal.Monitoring Universal Linux Monitoring 10.19.1147.0
Microsoft.Linux.UniversalR.1 Universal Linux (RPM) Discovery 10.19.1147.0
Microsoft.Linux.Universal.Library Universal Linux Operating System Library 10.19.1147.0
Microsoft.Linux.Library Linux Operating System Library 10.19.1147.0
Microsoft.Unix.Process.Library UNIX/Linux Process Monitoring Library 10.19.1147.0
Microsoft.Unix.LogFile.Library UNIX/Linux Log File Monitoring Library 10.19.1147.0
Microsoft.Unix.ShellCommand.Library UNIX/Linux Shell Command and ScriptLibrary 10.19.1147.0
Microsoft.Unix.Library UNIX/Linux Core Library 10.19.1147.0
Solved!
I was missing these files:
Microsoft.Unix.Image.Library.mp
Microsoft.Unix.Views.mp
Microsoft.Unix.ConsoleLibrary.mp
Hi Kevin,
We have added an Oracle Linux 8 agent to SCOM and get the error below. Can you please tell us what the issue is?
The SSH key and sudoers look good on the Linux server.
Standard Error:
Exception Message:An exception (-1073479102) caused the SSH command to fail – Server’s host key did not match the signature supplied
Have you solved your issue?
Thanks!
Hi,
I have the same issue after trying to install the agent on a Linux server whose OS has been replaced with a fresh install. It seems that SCOM’s SSH client keeps the old SSH ID. I tried entering only the IP in the discovery, but it was resolved through DNS…
Running into the same Issue, did you ever figure out how to fix it?
BR
Hi Kevin – I appreciate this post, and all of your content. It has really helped me with my SCOM 2019 deployment. From the example sudoers files, can you please explain the purpose of the regex input after the scom account entries? I have not seen information about this in the Microsoft docs, and as I am struggling with the Unix monitoring in our environment, I’m trying to determine if I need to add similar values for authentication to work. Thanks!
SCOM 2019 has its own sudoers examples that are different than SCOM 2016.
The regex is to support the specific naming of the actual SCOM 2019 agents – the version numbers might change but the format will stay the same. This allows least priv – the elevated rights ONLY allow the account to install this specific agent by name.
Perfect, thank you!
When pushing the client software with SCOM 2019, we faced an issue with making the sudo rules more secure. We don’t want to give the maintenance account all rights, only what is needed.
In our case it worked when granting all rights, but not with the suggested more secure rules on RHEL 8. To find out what is needed, add the following lines to your sudoers file:
Defaults iolog_dir=/var/log/sudo-io/%{user}
scommaintenanceuser ALL=(ALL) NOPASSWD: LOG_INPUT: LOG_OUTPUT: ALL
After editing sudoers, try installing the software again from SCOM.
The install should of course succeed with this rule, and you can then extract the command lines used by that user.
You can use the following command to show them:
cat /var/log/sudo-io/scommaintenanceuser/00/00/0[0-9]/log
Then add those command lines to your sudoers file to restrict the account further and make it more secure.
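Collecting the logged commands can be scripted. The sketch below is hedged: it assumes each sudo I/O `log` file uses the legacy three-line layout (metadata, working directory, full command line), which should be confirmed against the sudoers manual for the installed sudo version, and the function names are hypothetical. The generated rules are candidates to review by hand, since sudoers requires some characters (such as ":" and ",") to be escaped.

```python
def commands_from_io_logs(log_texts):
    """Return the distinct command lines found in sudo I/O `log` file contents.

    Assumed layout: line 1 metadata, line 2 working directory, line 3 command.
    """
    seen = []
    for text in log_texts:
        lines = text.splitlines()
        if len(lines) < 3:
            continue  # not the assumed layout; skip rather than guess
        command = lines[2].strip()
        if command and command not in seen:
            seen.append(command)
    return seen


def as_candidate_rules(user, commands):
    """Format each observed command as a NOPASSWD rule for manual review."""
    return [f"{user} ALL=(root) NOPASSWD: {command}" for command in commands]
```

The file contents could be gathered with something like `pathlib.Path("/var/log/sudo-io/scommaintenanceuser").glob("*/*/*/log")`, reading each file as text before passing the list to `commands_from_io_logs`.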
Did they ever fix that 64k limit that was supposed to be the reason omiagent processes stopped responding, leaving you with potentially hundreds of them (eventually) sitting there twiddling their thumbs?
MS always said it was custom monitoring, but we always returned less than 1k from shell scripts and one object with winrm, which was normally much less than 64k (always, when it was analyzed).
However, on stack-based systems, or just Red Hat KVM hosts, the network monitoring could trivially return 64k+, which really caused issues.
It was easy to monitor and fix (kill the processes), but it would be better if it wasn’t an issue.
Hi Kevin,
When we try to discover a RHEL 8.6 machine from SCOM, it shows this error:
Failed to sign kit. Exit code: 1
Standard Output: Must have root privileges for this operation
RETURN CODE: 1
Standard Error:
Exception Message: