Completely losing a SCOM management server should be rare. Normally, you would restore from a backup or revert to a VM snapshot to quickly recover a lost, damaged, or corrupted SCOM MS.
However, if it does happen, you have options.
The biggest challenge in recovering a SCOM management server is dealing with the RunAs account passwords. A registry entry used to decrypt them is generated when the first management server is installed, and it is normally copied to each new management server added to the management group afterward.
The /Recover command-line switch tells Setup that this is a DR provisioning, so it behaves a little differently. Setup checks whether ANY other management servers in the management group are still online. If they are, Setup contacts them and copies the registry entries needed for RunAs account decryption. If ALL management servers have been lost, however, Setup generates a new decryption key, which means you will have to re-enter your existing RunAs account passwords in SCOM once the recovery is complete.
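To make the two outcomes concrete, here is a minimal Python model of the decision described above. It is only an illustration of the behavior, not Setup's actual code; the `plan_recovery` function, the `RecoverOutcome` type, and the hostname in the usage line are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class RecoverOutcome:
    # Where the RunAs decryption key ends up coming from.
    key_source: str
    # Whether existing RunAs passwords must be re-entered after recovery.
    reenter_runas_passwords: bool


def plan_recovery(other_ms_online: list[str]) -> RecoverOutcome:
    """Hypothetical model of the /Recover decision, not Setup's real logic."""
    if other_ms_online:
        # At least one surviving management server: the decryption
        # registry entries are copied from it, so RunAs passwords still work.
        return RecoverOutcome(
            key_source=f"copied from {other_ms_online[0]}",
            reenter_runas_passwords=False,
        )
    # No surviving management server: a new key is generated, so the
    # existing RunAs passwords can no longer be decrypted.
    return RecoverOutcome(key_source="newly generated", reenter_runas_passwords=True)
```

For example, `plan_recovery(["MS02.domain.com"])` models a management group with one surviving MS, while `plan_recovery([])` models the case where every management server was lost.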
Another challenge is that, at the time of this writing, there is no documentation of the SCOM 2016 command-line parameters, and the SCOM 2012 command-line reference example is missing some data.
Here is a working command line for a recovery:
Setup.exe /silent /AcceptEndUserLicenseAgreement /recover /InstallPath:"D:\Program Files\Microsoft System Center 2016\Operations Manager" /ManagementGroupName:MGNAME /SqlServerInstance:SQLServerName.domain.com /DatabaseName:OperationsManager /DWSqlServerInstance:SQLServerName.domain.com /DWDatabaseName:OperationsManagerDW /ActionAccountUser:DOMAIN\omaa /ActionAccountPassword:password /DASAccountUser:DOMAIN\omdas /DASAccountPassword:password /DatareaderUser:DOMAIN\omdr /DatareaderPassword:password /DataWriterUser:DOMAIN\omdw /DataWriterPassword:password /EnableErrorReporting:Never /SendCEIPReports:0 /UseMicrosoftUpdate:0
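If you keep your environment-specific values in a script, a small Python helper can assemble the same command line so the switches stay consistent. This is a sketch under assumptions: the `build_recover_command` function and `EXAMPLE_PARAMS` dictionary are hypothetical, the values are the placeholders from the command above, and quoting follows the example (only values containing spaces, such as /InstallPath, are wrapped in double quotes).

```python
def build_recover_command(params: dict, setup_exe: str = "Setup.exe") -> str:
    """Assemble a SCOM Setup.exe /recover command line from name/value pairs."""
    # Value-less switches, taken from the working example command line.
    parts = [setup_exe, "/silent", "/AcceptEndUserLicenseAgreement", "/recover"]
    for name, value in params.items():
        value = str(value)
        # Quote values containing spaces, as the example does for /InstallPath.
        if " " in value:
            value = f'"{value}"'
        parts.append(f"/{name}:{value}")
    return " ".join(parts)


# Placeholder values copied from the example; replace them with your own.
EXAMPLE_PARAMS = {
    "InstallPath": r"D:\Program Files\Microsoft System Center 2016\Operations Manager",
    "ManagementGroupName": "MGNAME",
    "SqlServerInstance": "SQLServerName.domain.com",
    "DatabaseName": "OperationsManager",
    "DWSqlServerInstance": "SQLServerName.domain.com",
    "DWDatabaseName": "OperationsManagerDW",
    "ActionAccountUser": r"DOMAIN\omaa",
    "ActionAccountPassword": "password",
    "DASAccountUser": r"DOMAIN\omdas",
    "DASAccountPassword": "password",
    "DatareaderUser": r"DOMAIN\omdr",
    "DatareaderPassword": "password",
    "DataWriterUser": r"DOMAIN\omdw",
    "DataWriterPassword": "password",
    "EnableErrorReporting": "Never",
    "SendCEIPReports": "0",
    "UseMicrosoftUpdate": "0",
}

if __name__ == "__main__":
    print(build_recover_command(EXAMPLE_PARAMS))
```

Python dictionaries keep insertion order, so the switches come out in the same order as the example. The helper only builds the string; running it on the management server (for example, from an elevated prompt) is left to you.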
The key areas to focus on with your custom data are:
As with new SCOM deployments, you will need elevated rights to install SCOM components:
- SCOM Administrator rights
- Local Administrator rights on the Management server OS
- Local Administrator rights on all other Management Servers (required for remote registry connection)
- System Administrator (sysadmin) rights on the SQL Server instances hosting the databases
How long (just an estimate) does the /recover function take? Over an hour?
No, it should be quick. Can you review the MSI logs and see what it is doing that might be taking a long time?
I am not running it currently. I am preparing to have a standby server in the secondary datacenter, but wanted to see how long it would take to run the /recover command on the standby server.
I am not a fan of standby management servers in a remote datacenter, unless this is a DR scenario where the management server is not yet installed or part of the management group, and is just a VM ready to be installed in the case of DR. I am a fan of replicating the management server VMs in the primary datacenter to a DR datacenter, which allows the fastest possible outage recovery with the least configuration.
Is there an example of your replication approach? What if the databases are also moved to the DR site? Apart from the changes to the registry, config file, and database, plus running the sysmessages SQL script, are there any other steps that should be considered? I encountered a .NET Framework error on assembly, 65537.
Btw, I’m running on SCOM 2019 + SQL 2017.
Did you rebuild, migrate, or restore the DB? That's commonly caused by CLR strict security on SQL 2017. See: https://docs.microsoft.com/en-us/system-center/scom/upgrade-sqlserver-2017-opsmgr?view=sc-om-2019
We were restoring the DB to the DR site. I have followed every step in this article: https://docs.microsoft.com/en-us/system-center/scom/manage-move-opsdb?view=sc-om-2019
Apart from the above error, the Operations Manager event log also captures warning events 33333 and 28000. The same issue is described here: https://social.technet.microsoft.com/Forums/WINDOWS/en-US/c8d89d63-5322-4ef6-8f4a-bde85f8c1aa7/issues-with-secondary-management-servers-cannot-set-availability-on-a-health-service-that-doesnt?forum=systemcenterservicemanager
Do I still need to run this “Optional – Enable CLR strict security” script?
There was a patch for this previously, but the link is no longer valid.
After running this on the first management server, I receive this on every management server: “OpsMgr has no configuration for management group [managementgroupname] and is requesting new configuration from the Configuration Service.” No clients are reporting anymore.
You saved my butt! Thank you for all the wonderful guides, especially this one!!!
Thanks for this missed key!
For this setup, the management servers in both the primary and secondary datacenters run in active-active mode, right? And the SQL databases (OperationsManager and the DW) would be active-passive between the primary and DR datacenters?
For SCOM 2019, it's documented in these two articles:
The user executing the recovery command should also have rights on the database; otherwise, the next error in the log will be:
Warn: :Sql error: 14. Error: 229. Error Message: The EXECUTE permission was denied on the object ‘p_MOMManagementGroupInfoSelect’, database ‘OperationsManager’, schema ‘dbo’.
With a SQL trace, you can find the user that has the deny.
This is about a SCOM 2019 migration from on-premises to Azure. There is a great guide for migrating the SCOM databases, but I can't find anything about migrating a SCOM management server. I migrated the on-premises SCOM databases to Azure, and also migrated the SCOM MS under a different hostname, making all the registry changes, the serviceconfig file changes, some table changes, etc. However, the SDK service is now crashing, and I am wondering whether I should just use disaster recovery to build a new SCOM MS against the migrated databases. Any suggestions, please?