Upgrade from SCOM 2016 to SCOM 2019 Checklist

This is a planning checklist that will help you determine if an in-place upgrade is possible, and how to prepare the environment in advance for it.  It is similar to my previous post on Upgrading SCOM 2012R2 to SCOM 2016.

 

1. Verify we are moving from a supported version of SCOM to SCOM 2019.

2. Verify the SQL server versions and service pack levels are supported for both SCOM 2016/1801/1807 and SCOM 2019

3. Verify all OS versions for SCOM server roles will be supported for both SCOM 2016/1801/1807 and SCOM 2019

4. Verify all SERVER ROLES meet minimum hardware sizing for SCOM 2019

5. Verify all AGENT managed Operating Systems are supported for SCOM 2019.

6. Verify all MANAGEMENT PACKS in use are supported for SCOM 2019.

  • Check with third-party MP vendors to confirm their MPs have no known support issues with SCOM 2019. Update these MPs in advance if required.

7. SCOM Database: Verify the OperationsManager database has more than 50 percent free space

8. Optimize Registry settings for management servers

9. Export and review the SCOM management server event logs on all management server roles

  • Look for critical and warning events that indicate major issues that should be resolved before upgrading.
  • Save these for comparison after the upgrade, so you can confirm whether issues seen afterward are actually new.

10. Verify SCOM is healthy

  • Review the “Operations Manager > Management Group Health” dashboard in addition to the event logs and ensure SCOM is healthy

11. T-SQL: Clean up the database ETL table in the OperationsManager database

12. SCOM Console: Remove agents from Pending Management

13. Backup unsealed management packs

  • Get a fresh backup of all your unsealed MPs, which contain all your customizations, for disaster recovery.
  • Example (create the target folder first):    Get-SCOMManagementPack | Where-Object { $_.Sealed -eq $false } | Export-SCOMManagementPack -Path C:\mpbackup

14. SCOM Console: Disable Notification subscriptions
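The same step can also be scripted. The following is a minimal sketch, assuming the OperationsManager PowerShell module is loaded on a management server with an active management group connection. The output file path is a hypothetical choice, used only to record which subscriptions were enabled so the same set can be re-enabled after the upgrade.

```powershell
# Sketch only: assumes the OperationsManager module and a management group connection.
Import-Module OperationsManager

# Hypothetical location for the record of subscriptions that were enabled before the upgrade.
$listPath = Join-Path $env:USERPROFILE 'enabled-subscriptions.txt'

# Collect only the subscriptions that are currently enabled.
$enabled = @(Get-SCOMNotificationSubscription | Where-Object { $_.Enabled })

# Save their names, then disable them.
$enabled | Select-Object -ExpandProperty DisplayName | Set-Content -Path $listPath
$enabled | Disable-SCOMNotificationSubscription
```

After the upgrade, the saved names can be matched against Get-SCOMNotificationSubscription and piped to Enable-SCOMNotificationSubscription, so subscriptions that were deliberately disabled beforehand stay disabled.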

15. Disable product connectors or any external connections to the SDK.

16. Optional but recommended:  Restart the SQL server service on the OpsDB server and DW server

  • This will kill any stuck or old blocking processes, and free up any buffer cache
  • Wait at least 5 minutes after restarting to ensure the DB’s are online and functioning.
  • Ensure there is no active blocking in the OpsDB before continuing.
  • Consider a reboot of the entire server.
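One way to check for active blocking is to query the SQL dynamic management views. The sketch below assumes the SqlServer PowerShell module is installed, that the account has permission to view server state, and that the database uses the default name OperationsManager. The instance name SQL01 is a placeholder, not a value from this checklist.

```powershell
# Sketch only: requires the SqlServer module and VIEW SERVER STATE permission.
Import-Module SqlServer

# Hypothetical instance name; replace with the SQL instance hosting the OpsDB.
$sqlInstance = 'SQL01'

# List requests in the OperationsManager database that are waiting on another session.
$query = @'
SELECT session_id, blocking_session_id, wait_type, wait_time, command
FROM sys.dm_exec_requests
WHERE blocking_session_id <> 0
  AND database_id = DB_ID('OperationsManager');
'@

$blocked = @(Invoke-Sqlcmd -ServerInstance $sqlInstance -Database 'master' -Query $query)

if ($blocked.Count -eq 0) {
    'No blocked requests found in OperationsManager.'
} else {
    $blocked | Format-Table -AutoSize
}
```

An empty result at one moment does not prove there is no intermittent blocking, so it may be worth running the check a few times before continuing.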

17. Optional but recommended:  Uninstall the SCOM Console and Web Console on the FIRST management server you plan to upgrade and REBOOT.

  • Removing these roles reduces the risk of an upgrade failure.
  • These roles are easy to reinstall once the management group upgrade is completed.

18. Stop the Operations Manager services on Management servers

  • Stop the following services on all management servers in the management group, to ensure NO changes are being made to SQL during the backup, so we can get a good backup right before the upgrade:
  • Microsoft Monitoring Agent
  • System Center Data Access
  • System Center Configuration
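As a sketch of the step above, the three services can be stopped from an elevated PowerShell session on each management server. The short service names used here (HealthService, OMSDK, cshost) are the ones a default installation uses for these display names. Confirm them with Get-Service in your environment before relying on this.

```powershell
# Sketch only: run elevated on each management server.
# cshost        = System Center Management Configuration
# OMSDK         = System Center Data Access Service
# HealthService = Microsoft Monitoring Agent
$services = 'cshost', 'OMSDK', 'HealthService'

# Stop the services, then show their state to confirm nothing is still running.
Stop-Service -Name $services -Force
Get-Service -Name $services | Format-Table Name, DisplayName, Status -AutoSize
```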

19. Backup the SCOM databases

20. Backup the Management Servers

  • Take a VM snapshot or a full bare-metal backup that is restorable, with the SCOM services stopped, so there should be no transient data. This will be for use in the case of disaster recovery only.

21. Install SCOM 2019 prerequisites on management servers with consoles

22. Ensure .NET 3.5 and .NET 4 (or 4.5) are both installed on ALL management servers
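A quick way to check this is to query the Windows features on each management server. This is a minimal sketch, assuming Windows Server 2016 or later, where the ServerManager module exposes .NET 3.5 as NET-Framework-Core and the .NET 4.x core as NET-Framework-45-Core.

```powershell
# Sketch only: run on each management server (Windows Server 2016 or later assumed).
Import-Module ServerManager

# Both features should report InstallState = Installed.
Get-WindowsFeature -Name NET-Framework-Core, NET-Framework-45-Core |
    Format-Table Name, DisplayName, InstallState -AutoSize
```

If .NET 3.5 shows as Available or Removed, Install-WindowsFeature -Name NET-Framework-Core can add it, although it may need the -Source parameter pointing at the OS installation media.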

23. Remove any old SDK reference software from the management server

  • Some programs install DLLs that might block the upgrade. Consider removing them if they are installed on your management servers:
  • SCOM 2007 R2 Authoring Console
  • Silect MP Author/MP Studio

24. Optional but recommended:  REBOOT ALL Management servers.

  • Rebooting these servers ensures that any OS related issues are observed or cleared before attempting an upgrade.
  • Rebooting these servers helps remove any question that something was wrong with them prior to the upgrade.
  • If a Management server cannot successfully reboot and start up without errors before an upgrade, it certainly cannot after an upgrade.

25. Upgrade the first management server

26. Upgrade additional management servers

  • It is CRITICAL not to upgrade multiple management servers at the same time. You should wait for one to complete FULLY and inspect the logs to ensure it is working, before continuing with the next.

27. Upgrade ACS (if applicable)

28. Upgrade all gateways (if applicable)

29. Upgrade Stand Alone Web Console servers (if applicable)

30. Upgrade Reporting Server

31. Upgrade Stand-Alone Consoles

32. Post Upgrade tasks

33. Reject Pending Management updates for any agents

  • We will update agents later, after applying the latest Update Rollup for SCOM 2019
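This step can be done from the SCOM shell as well as from the console. The sketch below assumes the OperationsManager module and an active management group connection. The property names in the Select-Object call are what I would expect on the pending action objects, so treat them as an assumption and adjust if your output differs.

```powershell
# Sketch only: assumes the OperationsManager module and a management group connection.
Import-Module OperationsManager

# Review what is waiting in Pending Management before rejecting anything.
$pending = @(Get-SCOMPendingManagement)
$pending | Select-Object AgentName, AgentPendingActionType | Format-Table -AutoSize

# Preview the rejection first; remove -WhatIf to actually reject the pending actions.
$pending | Deny-SCOMPendingManagement -WhatIf
```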

34. Verify your SCOM license is reporting correctly as licensed

35. Apply the latest Cumulative Update Rollup for SCOM 2019

  • You should generally wait a few hours after an upgrade to SCOM 2019, before applying the latest SCOM 2019 update rollup. There are warehouse scripts as part of the upgrade that can take several hours to complete, and it is a best practice to not interrupt these.

36. Upgrade Agents

  • Using whatever method you choose, consider upgrading your agents to SCOM 2019 with the latest UR at this point.

 

What to do when things go wrong?

When SCOM upgrades fail, there will be a log telling us why.  Often you will get an “Error 1603”, which is simply a generic error and does not tell you anything.  These log files are typically located in the user profile directory of the account attempting the installation:  C:\Users\<username>\AppData\Local\SCOM\LOGS.  Review ALL the logs, and if needed provide all these logs to a Microsoft engineer when opening a support case.  Log files are not always easy to interpret, but the root cause is always in them.
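To narrow down where to start reading, the logs can be scanned for likely failure markers. This is a rough sketch, not a definitive diagnostic. The function name Find-SetupLogError is made up for illustration, and the search strings are assumptions: "Error" is a generic term, and "Return value 3" is the marker Windows Installer verbose logs write when an action fails.

```powershell
# Sketch only: Find-SetupLogError is a hypothetical helper, not part of SCOM.
function Find-SetupLogError {
    param(
        # Defaults to the SCOM setup log folder of the account running this script.
        [string]$LogPath = (Join-Path $env:LOCALAPPDATA 'SCOM\LOGS'),
        # Assumed failure markers; extend as needed.
        [string[]]$Pattern = @('Error', 'Return value 3')
    )

    # Search every .log file and return the file name, line number, and matching line.
    Get-ChildItem -Path $LogPath -Filter *.log -File |
        Select-String -Pattern $Pattern -SimpleMatch |
        Select-Object Filename, LineNumber, Line
}

# Example call, run as the account that attempted the installation:
# Find-SetupLogError | Format-Table -AutoSize -Wrap
```

The matches are only pointers. The lines just before each hit usually carry the context needed to understand the failure.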

Common issues causing failures:

  • Lack of permissions for the user account performing the upgrade (requires Local admin, SCOM admin, and SQL SysAdmin)
  • TLS 1.2 enforced on management servers or SQL but missing prerequisites
  • A SCOM Agent is installed on a SCOM Management server
  • SQL Database is experiencing blocking from another process.
  • SQL Database does not have enough free space or transaction log space.

 

Resources:

SCOM 2019 is HERE!

Security changes in SCOM 2019 – Log on as a Service

SCOM 2019 Log On As A Service Management Pack Helper

SCOM 2019 Security Accounts Matrix

SCOM 2019 QuickStart Deployment Guide

27 Comments

  1. Michiel Aubertijn

    Kevin,
    In our case we came across a strange issue. The first three Management Servers went fine. The fourth failed. We reverted to the snapshot, but this time the SCOM services were running. We started the upgrade and everything went fine.
    Best Regards,
    Michiel

    • Kevin Holman

      If you actually follow my steps – the last thing you do before attempting an upgrade is to reboot all management servers. This would leave all services running on everything before attempting an upgrade.

  2. Andrew T

    My SQL server is running 2016 (13.0.5820.21) but I’m getting an error on SQL version validation when starting the install/upgrade. I’m not quite sure what it’s complaining about. is there a log anywhere I can check to see what it’s specifically failing on?

    • Kevin Holman

      Yes – the logs should be available in your user directory – C:\Users\<username>\AppData\Local\SCOM\LOGS

      Feel free to shoot them to me via email if needed.

  3. ANDRII VERESHCHAKA

    Kevin I DO APPRECIATE again and again your really priceless work !
    Thanks a lot . Using this checklist has saved our time enough to have two coffee and tea breaks during our SCOM upgrade procedure 8) we did it easy, because we followed your list step-to-step.

  4. James Farthing

    Hi Kevin,
    Great blog as always. We’ve been planning an update to our environment from 1807 to 2019 and this highlights a few potential issues that we’ve not considered.
    I’ve got a question relating to 26 – upgrading the additional management servers. Normally we would stagger application upgrades of multiple machines over subsequent days where possible, would I be correct in thinking that this wouldn’t be recommended here? From my understanding, after upgrading the first (of 4) Management Server to 2019, we wouldn’t have a functional Management Pool due to being below the quorum threshold. Would you recommend immediately updating a second (or more) Management Server to 2019, or perhaps consider removing the other Management Servers from the pool until they have been upgraded? Alternatively, would the remaining servers continue to work within the Management Pool whilst still running under 1807?
    Thanks in advance.
    James

    • Kevin Holman

      I believe they will continue to work. There is no schema change in these versions… however I would recommend upgrading all components as quickly as possible. Certainly all management servers. If a customer needed to wait a while on a Gateway, or especially agents, that’s fine. But I’d always advise customers to upgrade all their management servers sequentially, but in the same planned outage for the upgrade. I have even worked with customers with 20 management servers, and we would apply the upgrade to all of them in the same evening, starting one as soon as the previous one completed.

      • James Farthing

        Hi Kevin,
        Many thanks for the quick reply, I’ll discuss this with my colleagues.
        It seems like upgrading all the machines in one go is the way forward then, especially as I don’t think we really gain any reduction in risk by spreading out the upgrade over multiple days. The rollback would be the same steps either way around.
        Kind Regards,
        James

  5. Senko S

    Hi Kevin and everyone in the community!

    I would like to ask you or anyone out there about a specific upgrade scenario. I wonder if anyone has tried to do an in-place upgrade from 1807 to 2019 where the SCOM environment basically isn’t allowed to have a long downtime?

    Would it work “in practice” to clone the existing SCOM VM:s in the hypervisor?

    1. Start the cloned VM:s with disabled NIC:s
    2. Start upgrading the management servers in parallel of the old environment
    3. Shut down the original SCOM Management Servers one by one and enable the NIC:s on the cloned VM:s with the same IP addresses as the previous ones?

    • Kevin Holman

      That would not be supportable, since the upgrade modifies both the SQL databases and the management servers at the same time.

      Doing an in place upgrade should be very limited downtime. What is allowed for your planned maintenance?

    • Kevin Holman

      Sure. Just use a SCOM 2016 agent on them. They still work fine as long as they have powershell 2.0 installed. Just not officially supported, but then neither is WS2008.

  6. Abdi

    I have a really annoying issue installing scom 2019 agent on WS2019 Domain controllers. All other servers non DCs (2016 & 2019) i can install the agent successfully.
    It was an in place upgrade from 2016 to 2019 and so the DC’s that already had the 2016 agent have moved over to 2019 successfully. Any new DCs i try to add (i have only tried to add 2019 DCs) have failed with access denied.
    Does SCOM 2019 need any additional permissions above the domain admin level?

  8. Delpol

    We tried upgrading our SCOM instance from 2016 to 2019 but failed with the first management server. During the rollback it removed SCOM completely from the management server and leaving the OPsDB as upgraded. We tried installing 2016 back on it but didn’t work. So rolled back the Ops DB and DW DB to the working state from the backup, then tried recovering the management server. It recovered but still seeing Event ID 29120 (Microsoft.EnterpriseManagement.ManagementConfiguration.Interop.HealthServicePublicKeyNotRegisteredException: Padding is invalid and cannot be removed.)
    Both SDK and Config services are running fine but unable to connect to open the console on the same server after recovery. When opening the Ops Mgr Shell, it throws error: The User (xxxx) does not have sufficient permission to perform the operation.
    Both the Config Account and Action Account have sysadmin role on the DB and Local admin on the management and DB servers. Both accounts are set as logon as a service as well. Any ideas what we could be missing @Kevin ?

    • kevinholman

      How did you recover the management server? Command line? Or just reinstall using the UI? If you restore the DB – it would have been best to restore a snapshot of the management server as well.

      Your DAS account and Management Server action account should NOT have sysadmin to the database. That is an overextension of privilege. It won’t hurt anything, but it is not required and not a best practice.

      The most likely issue is that when you run a recovery, you need to use the command line with the /recover switch. If you didn’t do that, then review the RunAs accounts. You might have something messed up there in the accounts and in the profiles, and may need to reset the passwords.

  9. Brian Hansen

    I need to upgrade from 1801 to 2019. I understand the need to have the actual SCOM servers on Server 2016. But according to the documentation the OpsDB and DWDB SQL servers need to be Server 2016 as well. Is that true? Or will the SQL servers be OK on Server 2012 as long as SQL is 2016? (yes, we will need to upgrade the OS on the SQL servers anyway, but for reasons I won’t go in to I need do the SCOM upgrade first)

    • Kevin Holman

      In order to perform a supported in-place upgrade of SCOM, both the previous version and the new version of SCOM have to be in a supported configuration at all times.

      SCOM 1801 supports server roles (Gateway, Web Console, Management Server, and SQL Servers) on WS2012R2 and WS2016.
      SCOM 2019 supports server roles (Gateway, Web Console, Management Server, and SQL Servers) on WS2016 and WS2019.

      Therefore the only common OS between these two is Windows Server 2016. All server roles MUST be on Windows Server 2016 in order to perform a supported in-place upgrade.

      SCOM 1801 supports SQL server roles (OpsDB, DataWarehouse DB, SCOM Reporting) on SQL 2016.
      SCOM 2019 supports SQL server roles (OpsDB, DataWarehouse DB, SCOM Reporting) on SQL 2016 and SQL 2017 and SQL 2019.

      Therefore the only common SQL version between these two is SQL 2016. All SQL server roles MUST be on SQL Server 2016 in order to perform a supported in-place upgrade.

      I understand this is highly restrictive, and often is the reason customers choose not to adopt current SCOM versions due to the impact here. I have given this feedback to the product teams many times, and we need more customers to provide this feedback to the support and product teams.

    • Kevin Holman

      I haven’t ever done it, so I am not sure – but I would think it would work – the new UI is just far superior but at the end of the day it just makes XML.

  10. Brian Hansen

    Doing an upgrade from 1801 to 2019. My 2 Gateway servers are saying they can’t continue setup because they have other SCOM roles installed (MS, Console, Web, agent, etc). But the only SCOM item installed on these servers is the GWS itself. I have looked through the MomGateway setup log but can’t find what it thinks is there.

    Any suggestions on how to get the upgrade to complete?

  11. Brian Hansen

    Nevermind, I found the culprit:

    From setup log:

    PROPERTY CHANGE: Adding CORECOMPONENTPRESENT_AGENT property. Its value is ‘{EE0183F4-3BF8-4EC8-8F7C-44D3BBE6FDF0}’.
    FindRelatedProducts. Return value 1.

    From Registry: (I deleted this key)

    [HKEY_LOCAL_MACHINE\SOFTWARE\Classes\Installer\Products\4F3810EE8FB38CE4F8C7443DBB6EDF0F]
    "Clients"=hex(7):3a,00,00,00,00,00
    "ProductName"="Microsoft Monitoring Agent"
    "PackageCode"="0DFCAAEAC7718004BADFDCCEDA1286B7"
    "Language"=dword:00000000
    "Version"=dword:080032fd
    "Assignment"=dword:00000001
    "AdvertiseFlags"=dword:00000180
    "ProductIcon"="C:\\WINDOWS\\Installer\\{EE0183F4-3BF8-4EC8-8F7C-44D3BBE6FDF0}\\agentgateway.ico"
    "InstanceType"=dword:00000000
    "AuthorizedLUAApp"=dword:00000000
    "DeploymentFlags"=dword:00000001
