First, I have to begin with a disclaimer – Microsoft does not support customers manually editing the database, unless under the guidance of a Microsoft support engineer with a support case.
However, I have seen several blogs now “leaking” this guidance I send out or use from time to time…. so it is probably time to publish it as an example only. I have always been hesitant to blog about any manual edits to the SCOM database, because all too many times, customers will jump to a conclusion, and start hacking away at their DB because of something they read on some website…. only to find they have corrupted their database, or made their original problem worse. You should NEVER perform any activity like this unless you are experienced and aware of the possible consequences. You should also ensure you take a full backup of the DB’s before ever attempting an edit like this.
Ok, warnings aside…..
There are times in a SCOM Management Groups life cycle, where objects can get orphaned. Either from bugs in SCOM, or bad practices taken by discoveries, or SDK commands, or Product Connector solutions.
For instance, there is a bug in SCOM, when you delete a Management Server (rare, but happens if you are retiring an old OS and replacing with a new one). When you delete the MS, more often than not – we will orphan that management server’s Health Service Watcher:
What can happen, is that when you view the OperationsManager > Management Server > Management Servers State view, you might see old Management Server Health Service Watcher objects here:
This doesn’t really cause a problem, but it is an annoyance. It is just an example where a manual delete is required to clean it up.
Another common scenario, is when customers implement a management pack, that inserts data via the SDK. If objects are inserted via the SDK, they must be removed via the SDK. Sometimes those MP’s get decommissioned, and customers are left with these objects, wondering how to get them removed.
Probably the most common scenario – is Windows Computers showing up in the console, that don’t have an agent. People tend to assume these are “orphans” but often times they are just discovered by some Topology discovery in a management pack, like AD, or SCCM. AEM will also leave you with a lot of ghosts. Sometimes these are caused by running a discovery to “extend” the Windows Computer class, but the discovery written has no logic to “undiscover” the computer when the agent is deleted. It will still be associated to that discovery, even though the agent is long gone.
To inspect the database relationships to see if there is a still a discovery associated with a computer – you can run this query against the OperationsManager database. Just change my computername on the first line to one you are inspecting:
DECLARE @name varchar(255) = '%ws2016.opsmgr.net%' SELECT BME.FullName, BME.DisplayName, BME.Path, dv.DisplayName AS 'DiscoveryDisplayName', dv.Name AS 'DiscoveryName' FROM typedmanagedentity TME JOIN BaseManagedEntity BME ON TME.BaseManagedEntityId = BME.BaseManagedEntityId JOIN DiscoverySourceToTypedManagedEntity DSTME ON TME.TypedManagedEntityID = DSTME.TypedManagedEntityID JOIN DiscoverySource DS ON DS.DiscoverySourceID = DSTME.DiscoverySourceID JOIN DiscoveryView dv ON DS.DiscoveryRuleID=dv.Id WHERE BME.Fullname like @name ORDER BY BME.FullName, DiscoveryDisplayName
If you have fully inspected this and determined the object is not related to any current discoveries – then you can attempt a manual deletion.
You will find lots of blogs talking about setting “IsDeleted = 1” in the BaseManagedEntity table, or even other tables. This is WRONG. You should never do this!!!
The better way to delete managed entities, is the below set of queries.
--Query 1 --First get the Base Managed Entity ID of the object you think is orphaned/bad/needstogo: -- DECLARE @name varchar(255) = '%computername%' -- SELECT BaseManagedEntityId, FullName, DisplayName, IsDeleted, Path, Name FROM BaseManagedEntity WHERE FullName like @name OR DisplayName like @name ORDER BY FullName
Query 1, will help you find the BaseManagedEntity ID of the object you are wanting to delete. Take GREAT care here that you fully understand the object is correct that you want to forcibly remove, or you risk corrupting the SCOM object relationships.
This will give you the BaseManagedEntityId GUID – which we need for the next step:
In Query 2, we paste in that GUID for the orphaned object we are trying to delete.
--Query 2 --Next input that BaseManagedEntityID into the delete statement --This will delete specific typedmanagedentities more gracefully than setting IsDeleted=1 --change "00000000-0000-0000-0000-000000000000" to the ID of the invalid entity -- DECLARE @EntityId uniqueidentifier = '00000000-0000-0000-0000-000000000000' -- DECLARE @TimeGenerated datetime; SET @TimeGenerated = getutcdate(); BEGIN TRANSACTION EXEC dbo.p_TypedManagedEntityDelete @EntityId, @TimeGenerated; COMMIT TRANSACTION
When you run this – it will delete the managed entity in a “graceful” method, that will delete any related objects in ManagedType tables – so you might see a lot more objects deleted in the SQL response than you expected.
In Query 3 – this is just a repeat of Query 1, but we run this to inspect the objects and see if their IsDeleted flag now = 1.
--Query 3 --Check to make sure the IsDeleted field of your BME now = 1 -- DECLARE @name varchar(255) = '%computername%' -- SELECT BaseManagedEntityId, FullName, DisplayName, IsDeleted, Path, Name FROM BaseManagedEntity WHERE FullName like @name OR DisplayName like @name ORDER BY FullName
In Query 4 – we can get an idea of how many objects currently are in a “Deleted” status, and ready to be purged from the database:
--Query 4 --Get an idea of how many BMEs are in scope to purge SELECT count(*) FROM BaseManagedEntity WHERE IsDeleted = 1
Lastly – in Query 5 – we are purging. Purging is a normal cycle and part of SCOM grooming, where we will purge data from the SCOM database after 2-3 days of being marked deleted. We do not delete these immediately, because that is an expensive operation and best reserved for when grooming runs at night. Also, when you are troubleshooting an agent, you might not wish to delete all the data when you delete an agent from the management group. This lets the discovered data remain, if you bring the agent back in within a day or two.
Purging on demand is only useful if you wish to remove the deleted data from SCOM immediately. A common scenario where I use this – is when I need to take a server and install it as a Management Server or a Gateway server, but it was previously managed as an agent. You don’t need to do all the manual SQL deletes in that scenario, but you do need to purge the DB so that when you install it as a Management Server or a GW, there are no issues during the installation.
--Query 5 --This query statement for SCOM 2012 will purge all IsDeleted=1 objects immediately --Normally this is a 2-3day wait before this would happen naturally --This only purges 10000 records. If you have more it will require multiple runs --Purge IsDeleted=1 data from the SCOM 2012 DB: DECLARE @TimeGenerated DATETIME, @BatchSize INT, @RowCount INT SET @TimeGenerated = GETUTCDATE() SET @BatchSize = 10000 EXEC p_DiscoveryDataPurgingByRelationship @TimeGenerated, @BatchSize, @RowCount EXEC p_DiscoveryDataPurgingByTypedManagedEntity @TimeGenerated, @BatchSize, @RowCount EXEC p_DiscoveryDataPurgingByBaseManagedEntity @TimeGenerated, @BatchSize, @RowCount
Review:
Again – doing these DB edits are not supported by Microsoft. I only publish this to keep people from using these queries (which have been published on other blogs now) without understanding the issues, the risks, or the proper way to leverage them. And to get people to STOP forcing the IsDeleted = 1 property using a SQL update statement.
Here are all the Queries together for a quick copy/paste:
(These support SCOM 2012, 2016, and the semi-annual channel builds of SCOM)
--Query 1 --First get the Base Managed Entity ID of the object you think is orphaned/bad/needstogo: -- DECLARE @name varchar(255) = '%computername%' SELECT BaseManagedEntityId, FullName, DisplayName, IsDeleted, Path, Name FROM BaseManagedEntity WHERE FullName like @name OR DisplayName like @name ORDER BY FullName --Query 2 --Next input that BaseManagedEntityID into the delete statement --This will delete specific typedmanagedentities more gracefully than setting IsDeleted=1 --change "00000000-0000-0000-0000-000000000000" to the ID of the invalid entity -- DECLARE @EntityId uniqueidentifier = '00000000-0000-0000-0000-000000000000' DECLARE @TimeGenerated datetime; SET @TimeGenerated = getutcdate(); BEGIN TRANSACTION EXEC dbo.p_TypedManagedEntityDelete @EntityId, @TimeGenerated; COMMIT TRANSACTION --Query 3 --Check to make sure the IsDeleted field of your BME now = 1 -- DECLARE @name varchar(255) = '%computername%' SELECT BaseManagedEntityId, FullName, DisplayName, IsDeleted, Path, Name FROM BaseManagedEntity WHERE FullName like @name OR DisplayName like @name ORDER BY FullName --Query 4 --Get an idea of how many BMEs are in scope to purge -- SELECT count(*) FROM BaseManagedEntity WHERE IsDeleted = 1 --Query 5 --This query statement for SCOM 2012 will purge all IsDeleted=1 objects immediately --Normally this is a 2-3day wait before this would happen naturally --This only purges 10000 records. If you have more it will require multiple runs --Purge IsDeleted=1 data from the SCOM 2012 DB: -- DECLARE @TimeGenerated DATETIME, @BatchSize INT, @RowCount INT SET @TimeGenerated = GETUTCDATE() SET @BatchSize = 10000 EXEC p_DiscoveryDataPurgingByRelationship @TimeGenerated, @BatchSize, @RowCount EXEC p_DiscoveryDataPurgingByTypedManagedEntity @TimeGenerated, @BatchSize, @RowCount EXEC p_DiscoveryDataPurgingByBaseManagedEntity @TimeGenerated, @BatchSize, @RowCount --End of queries
This stored procedure doesn’t seem to exist. I am running SCOM 2012 R2 RTM.
Hello Kevin,
I have imported dpm management packs and the rexecutive summary report was not working so I deleted the management pack. However on the datawarehouse dpm datasets still exist and on reinstalling dpm management pack I see duplicate agent objects in ssrs for dpm.
Hello Kevin!
First, I would like to thank you for the awesome content you’re posting!
I’ve got a question regarding removal of an Agent from cluster nodes. SCOM GUI prevents me from deleting the agent throwing this kind of error:
“Agent is managing other devices and cannot be uninstalled. Please resolve this issue via Agentless managed view in Administration prior to attempting uninstall again.”
I assume that a good approach would be to disable the proxy option on the cluster nodes, wait for the discovery to run and for the agentless objects to disappear, then check again if it’s possible to delete the Agent in the usual way.
And here’s come the issue; the cluster nodes are already decommissioned (the VM’s no longer exists), so disabling the proxy won’t make any difference, so I’m left with two ‘gray’ nodes and an agentless cluster object which apparently prevents me from getting rid of it. Removing the nodes via powershell has left 3 orphaned objects per cluster node (IsDeleted=’0′):
Microsoft.SystemCenter.HealthServiceWatcher:Microsoft.SystemCenter.AgentWatchersGroup
Microsoft.Windows.Computer:thatmadclusternode1.local
Microsoft.Windows.OperatingSystem:thatmadclusternode1.local
I went through your scripts and I had to manually set the IsDeleted rows to ‘1’ for the orphaned objects of each node using your second SQL query and it worked.
I’m wondering if there’s any official way to remove cluster nodes when they no longer exists?
[SCOM2012R2]
Thanks in advance,
Matt B.
1. NEVER set IsDeleted = 1. I have never recommended that. That is a good way to end up with a corrupt database. Unless you mean using the SQL statements I published that use the typemanagedentitydelete function, which will result in setting the objects to deleted (along with any hosted objects)
2. We haven’t had a problem deleting cluster nodes from Agent Managed in many years. What version of SCOM are you using, and what UR level? Is the console you are using patched to the latest UR level as well?
Thanks Kevin for a quick response!
Ad.1) Yes, I’ve used the SQL statement you have published (EXEC dbo.p_TypedManagedEntityDelete @EntityId, @TimeGenerated;).
Ad.2) The infra is running on SCOM 2012R2 with the latest UR14. SCOM console that is used to manage that infra is 2016 (Product version: 7.2.11878.0) and it’s used to connect to a newer SCOM2016 infra also.
After lurking around the web, I’ve found a conversation on Microsoft Technet regarding SCOM Agent removal from a cluster node:
https://social.technet.microsoft.com/Forums/systemcenter/en-US/10ffee08-b875-47af-b788-db07dbfa1b56/cannot-delete-agents-on-cluster-nodes?forum=operationsmanagergeneral
People claim that there’s a bug unresolved for years in SCOM and they’re using ‘dirty hacks’ by setting the IsDisabled flag to 1 that could corrupt their database.
It seems to be a dead-end if the cluster nodes no longer exist but still remain in monitoring.
Hi Kevin
Thanks for a great article – deleting the Windows computer object deleted a lot of rows – that´s great – however some classes remain (e.g. Healthservicewatcher, UserActionManager, CitrixMPAgentMachine.ControllerConnection) for the server when running query 1 again – should they also be manuel removed with this procedure again ?
It “should” be…. but I am curious – what caused this to happen in the first place? What is unique about this orphaned agent, and what caused you to investigate deleting it manually in the first place?
I had the same issue. The process removed all the Windows and child objects for my Orphan, but left behind two objects for Microsoft.SystemCenter.HealthServiceWatcher and Microsoft.SystemCenter.ManagementService. The object is now removed successfully from the console view, but should I be cleaning up these two objects as well? (and which one? Is one the child of the other?). The computer name can still be seen in the Agent State from Health Service Watcher view (in a red critical state for Heartbeat Failure).
Hello Kevin,
thank you for the great article! The timing is very good, because of the EOL from W2K8 we are planning to migrate our old GW’s and MS’s to W2K12 (running SCOM 2012 R2).
But we think there are a lot of unobserved pitfalls. But maybe you can give us a good advise and some best practice approaches.
Due to our implemented firewall policies, we plan to reuse the IP and hostname of the existing systems. Todo that we would like to do the following steps:
1. Install a new server with W2K12 R2 with an other temporary hostname and ip
2. We switch all agents/gw’s to an existing failover system
3. (We remove the first MS/GW from SCOM and) shutdown the existing MS/GW
4. We switch the ip and hostname from the old server tot he new W2K12 R2 and add the MS/GW to the management group again
Regarding to your article above and our migration plan the following questions came up:
– Do we really need to delete the existing objects (from a MS/GW) from the OpsDB with these SQL-Statements. Because we are going to reuse the hostname and ip I think the only difference should be the underlying OS?
– If we have to delete the existing MS/GW objects, i think we got some problems with existing overrides set for the MS/GW’s objects. Is there a smooth solution to get all existing overrides regarding to all objects from a MS/GW (like system drives, computer, or other object instances)? (maybe through a sql-statement?)
– If we don’t want to use the SQL-Statements above, is there another solution to ageing the deleted MS-Objects out from the database? (After a certain period of time?)
– Do you know any other pitfalls if we want to reuse the IP and hostname?
I think these questions are interesting for many customers who are planning a similar migration. So I hope you can give us a good advice! 🙂
Thank you in advance!
Hi Dennis
1. You should NOT be using these SQL statements unless you are working on an open case with Microsoft support.
2. These SQL statements to delete data in the SCOM DB should *never* be used in any way for a planned upgrade/management group transition or maintenance. These SQL statements are “last resort” deletes to clean up a mistake or bug, and should not be part of any “plan”.
3. There is no requirement to reuse a servername and IP. I’d strongly advise against this. I’d absolutely install new servers with new names and IP’s. Then – migrate the agent assignments over – then retire the old management servers. If you delete the MS and GW – with agents assigned – you will orphan those agents, and have a mess on your hands. If there is a requirement to reuse the IP – that’s fine, as you can swap those at any time. But I do not recommend reusing the same names. Everything in SCOM is dependent on FQDN as the unique ID, and reinstalling a MS with the same name can lead to challenges. I’d advise against that.
Hello Kevin,
I’m sure this isn’t the best solution, but I think there’s no way around it for us. The hostname has so many dependencies in our company that are not in our hands.
That’s why I think we’ll have a smaller impact on the reused hostname than resolving the other big dependencies in our infrastructure.
We also don’t want to execute any SQL statements. If there is a solution other than the SQL commands to resume the server with a new OS, this would definitely be the preferred way for us.
You are saying that “due to our company internal requirements, we have no choice but to use ill advised and unsupported solutions to problems we created” ? That sounds like a recipe for disaster. 🙂
Hello, Kevin,
thanks again for your help and tips. Together with the Microsoft support there was fortunately no disaster during the migration. 😉
And with that said – we successfully moved 2 management servers and 14 gateway servers to W2k12 R2.
Next Step -> SCOM 2016
Regards,
Dennis
Hi Kevin,
We are observing 29120 & 29112 event id’s in our SCOM 1801 environment and Management servers are unable to get the latest configuration file if we reset the Health state folder on SCOM Management servers. Could you please advise fix for the same.
Regards,
Kiran Kumar
Hi Kevin,
I’ve been reading trying to diagnose an issue and will not change anything to “IsDeleted=1” 🙂
I am running a report on my DataWarehouse view “vHealthServiceOutage” and there are a few servers listed there that do not exist in BaseManagedEntity. The servers used to exist, some were decommissioned over 4 years ago. They no longer exist anywhere else that I can find in the Operations Manager Database but do exist in places in the datawarehouse.
They are creating erroneous data that I have to cull manually after report runs and it bugs me that the old data is there in the first place.
I’d like to diagnose how the data still exists for these servers when other decommissioned servers do not exist anymore in the datawarehouse… there was obviously something done incorrectly years ago. Also, I’d like to get rid of that data but do it in such a way that my database doesn’t get corrupted.
Any assistance you can give would be appreciated, thanks!
@ZACH did you find a cure for the ghost servers i your reports? i see the same thing; no objects in scom or the operationsmanager database, but the health service watchers for agent servers that were deleted months ago still show up in availability reports.
Again, querying the DB Directly is not supported by Microsoft, but you can try this to identify Orphaned Agents:
declare @DiscoverySourceId uniqueidentifier;
declare @TimeGenerated datetime;
set @TimeGenerated = GETUTCDATE();
set @DiscoverySourceId = dbo.fn_DiscoverySourceId_User();
SELECT TME.[TypedManagedEntityid]
FROM MTV_HealthService HS
INNER JOIN dbo.[BaseManagedEntity] BHS
ON BHS.[BaseManagedEntityId] = HS.[BaseManagedEntityId]
— get host managed computer instances
INNER JOIN dbo.[TypedManagedEntity] TME
ON TME.[BaseManagedEntityId] = BHS.[TopLevelHostEntityId]
AND TME.[IsDeleted] = 0
INNER JOIN dbo.[DerivedManagedTypes] DMT
ON DMT.[DerivedTypeId] = TME.[ManagedTypeId]
INNER JOIN dbo.[ManagedType] BT
ON DMT.[BaseTypeId] = BT.[ManagedTypeId]
AND BT.[TypeName] = N’Microsoft.Windows.Computer’
— only with missing primary
LEFT OUTER JOIN dbo.Relationship HSC
ON HSC.[SourceEntityId] = HS.[BaseManagedEntityId]
AND HSC.[RelationshipTypeId] = dbo.fn_RelationshipTypeId_HealthServiceCommunication()
AND HSC.[IsDeleted] = 0
INNER JOIN DiscoverySourceToTypedManagedEntity DSTME
ON DSTME.[TypedManagedEntityId] = TME.[TypedManagedEntityId]
AND DSTME.[DiscoverySourceId] = @DiscoverySourceId
WHERE HS.[IsAgent] = 1
AND HSC.[RelationshipId] IS NULL;
In a healthy env, this should not return anything. If it does, for each entityid returned, run this:
Run the follow query against each TypedManagedEntityid above:
–change “00000000-0000-0000-0000-000000000000” to the ID of the invalid entity
DECLARE @EntityId uniqueidentifier = ‘00000000-0000-0000-0000-000000000000’
—
DECLARE @TimeGenerated datetime;
SET @TimeGenerated = getutcdate();
BEGIN TRANSACTION
EXEC dbo.p_TypedManagedEntityDelete @EntityId, @TimeGenerated;
COMMIT TRANSACTION
Confirm we are setting these invalid entities to IsDeleted=1:
–Get an idea of how many BMEs are in scope to purge
SELECT count(*) FROM BaseManagedEntity WHERE IsDeleted = 1
Lastly, we should run this query to force purging of the SCOM DB:
–Purge IsDeleted=1 data from the SCOM DB:
DECLARE @TimeGenerated DATETIME, @BatchSize INT, @RowCount INT
SET @TimeGenerated = GETUTCDATE()
SET @BatchSize = 10000
EXEC p_DiscoveryDataPurgingByRelationship @TimeGenerated, @BatchSize, @RowCount
EXEC p_DiscoveryDataPurgingByTypedManagedEntity @TimeGenerated, @BatchSize, @RowCount
EXEC p_DiscoveryDataPurgingByBaseManagedEntity @TimeGenerated, @BatchSize, @RowCount
Hallo Blake, this is really needed (identify Orphaned Agents), thank you
Hi Kevin,
Love your article…
Any ideas how you would do “query 2” against a list of BaseManaged Entity ID’s instead of it being done singularly.
Thanks,
Honestly, it could be done… but I’d be very concerned about this….. as we really should not have to be doing this very often or in great bulk. If we have to do this AT ALL then something went very, very wrong.
Hi Kevin,
Thanks for getting back to me. Yes I totally agree with that statement. For information in our example we had a situation from the Veeam management pack where we had moved the majority of the VMs from one Vcenter to another, but unfortunately it never cleaned out the old discovered entity’s so we had to manually clear these down.
Hi Kevin,
We have requirment for making our MS into GW due to latency issue . I am under the impression your query should help here after we uninstall the MS role & run the Cleanup and then install the GW Role. our MS Server does have HP DMC so lot of HP DMC compoment are discovered & are linked to this MS Server. After installing the GW Role the HP DMC Object should be auto-Discovered is my understanding . DO you see any problem or would recommeded any additional precaution to be taken to avoid conflict between MS & GW role Switch which we are trying to peform.
Regards,
Jason Aranha
Ugh. In general, I never recommend changing a MS to a GW. A MS has all kinds of special relationships. I’d assume once you remove a management server, and then purge the DB, everything “should” be gone so that a computer with the same name could be a simple agent managed role or Gateway role, but I have not tested this.
Hi Kevin,
We manage around 4000 servers in SCOM 2012 and recently migrated Some of the agents to SCOM 2016. But we could see almost 1500 servers removed from SCOM agent managed but still exist in windows computer. Is there any possible solution to remove servers in bulk from windows computer.
Looking for your great help!
Hello Kevin,
thank you for the article just curious to know is this applicable to Nework Devices as well.. we are currently experiencing issue with Orphaned Network Devices….
Here is exactly what happend i used your recent powershell script to delete network devices from scom 2016 , they were deleted from Network Devices view from Administration But not from Network Devices View from Monitoring… and now they are appearing in all our reports in turned off state
i tried to query database with following query and they do still appear and i have no possibility to manually delete them anymore from Administration tab as they do not show up anymore
***********************************************************************************
SELECT
BME.BaseManagedEntityId,
TBME.DisplayName AS Path,
BME.DisplayName,
TYPE.TypeName AS Class,
CASE S.HealthState WHEN 3 THEN ‘Critical’ WHEN 2 THEN ‘Warning’ WHEN 1 THEN ‘Success’ WHEN 0 THEN ‘Uninitialized’ ELSE ‘Unknown’ END AS STATE,
S.LastModified
FROM BaseManagedEntity BME WITH(NOLOCK)
INNER JOIN TypedManagedEntity TME WITH(NOLOCK) ON BME.BaseManagedEntityId = TME.BaseManagedEntityId AND BME.IsDeleted = 0 AND TME.IsDeleted = 0
INNER JOIN BaseManagedEntity TBME WITH(NOLOCK) ON BME.TopLevelHostEntityId = TBME.BaseManagedEntityId AND TBME.IsDeleted = 0
INNER JOIN ManagedType TYPE WITH(NOLOCK) ON TME.ManagedTypeID = TYPE.ManagedTypeID
INNER JOIN State AS S ON S.BaseManagedentityid=BME.BaseManagedentityid
INNER JOIN Monitor M ON M.MonitorId=S.MonitorID
WHERE M.MonitorName=’System.Health.EntityState’ and type.TypeName like ‘System.NetworkManagement.Node’
order by STATE
*************************************************************************************************
i tried your query (Check Any Discoveries Assosiated) on this blog replacing windows computer name with network device IP address and it returns 0 values for discoveries for currently reporting healthy device and Orphaned device in question so confused
Would you please help here appreciate your time…
Hi Kevin,
We see these orphaned objects and have used your guide. What can be done when the object still has related discoveries? I am not able to create an override on the discovery for that orphaned object since it cannot be found. But when using your queries I find the object in database. When I run query 2 it runs successfully and rows are affected, but the IsDeleted flag does not change to 1. Any ideas?
I’d be very careful with manually deleting anything, and recommend a support case to be opened whenever dealing with issues like this.
sat. 9/21/19 6.20am
Thank you Sir!
this worked on scom 2016, like a charm! you are on point!
Thanks for the scripts, Kevin. However I’m running into a problem where when an agent is removed from SCOM, not everything gets marked for deletion in the OperationsManager database. I find myself having to follow these steps for every agent I have to remove from SCOM. If I don’t, the agents will stick around in Windows Computers, and eventually my management servers will stop working, and go grey because it can’t contact these servers. I know this isn’t good practice, and I would like to resolve this issue, but I’m not sure how I can, or what the base problem actually is.
I see this happen when customers “extend” the Windows Computer class using a custom class to add their own properties, and instead of using registry keys to do this on agents, they use a central solution from a CSV or CMDB. Do you have this existing?
If you do – this is common, and customers have poorly written scripts which are not “undiscovering” these objects using the same script that discovered them.
I don’t believe we have anything custom like that. This only recently started happening within the last year when we started removing a lot of 2008 servers. I only started noticing it when I had to remove many at once, and it’s only gotten worse since. It could be a poorly written script, or something like that, but I wouldn’t know where to begin looking.
Take one of your orphaned computers and run this:
–To determine what discoveries are still associated with a computer – helpful in finding old stale computer objects in the console that are no longer agent managed, or desired.
select BME.FullName, DS.DiscoveryRuleID, D.DiscoveryName from typedmanagedentity TME
Join BaseManagedEntity BME ON TME.BaseManagedEntityId = BME.BaseManagedEntityId
JOIN DiscoverySourceToTypedManagedEntity DSTME ON TME.TypedManagedEntityID = DSTME.TypedManagedEntityID
JOIN DiscoverySource DS ON DS.DiscoverySourceID = DSTME.DiscoverySourceID
JOIN Discovery D ON DS.DiscoveryRuleID=D.DiscoveryID
Where BME.Fullname like ‘%SQL2A%’
if query 1 finds zero resuls (there is no basemanagagedentityID for a given agent server that was removed from scom six months ago) why might that server still appear in an availability report as “unmonitored” for the past 24 hours
Hi Kevin,
I’ve had an issue where an agent failed to update to UR9 and was stuck Pending under ‘Agent Requires Update’. I decided to delete it from Agent Managed so that it would be rediscovered into Pending Management>Manual Agent Install where I could approve and then take steps to manually update the agent. This has worked well in the past to get around any agents stuck requiring an update.
However on this occasion the agent deleted from Agent Managed but discovered objects remained elsewhere indefinitely. Subsequently the server has not reappeared into pending, and trying the discovery wizard fails because ‘This agent has already been discovered by Operations Manager and needs to be repaired’. However I cannot repair as it is not showing in Agent Managed and i have been unable to get the agent talking to the MS.
I took to figuring that this computer and associated objects are effectively orphaned / or need to be fully removed from SCOM before I can add it back.
Following your blog I ran Query 1 and it returns a bunch of discoveries, I therefore overrode those discoveries (I presume that is what should be done?? and followed it with Remove-SCOMDisabledClassInstance) I can see that the associated discovered objects are now removed except the Windows Computer object which is still showing in ‘Windows Computers’.
The problem is I cannot rediscover/re-add this agent (because it’s still showing in Windows Computers?) and with active discoveries (that are overriden now but still found by query 1) your blog suggests I should not attempt the additional queries as the object is not truly orphaned.
So…What to do to get the agent back into scom?
Your guidance would be appreciated.
SCOM 2016 UR9*
Hi Paul,
In your case, the computer object is orphaned for whatever reason. Here is what I would do:
1. Take a backup of your OpsDB and DataWarehouse in the case you need to restore.
2. Stop the Healthservice on the agent.
3. Run query 1.
4. Get the GUID of the Windows Computer object that is not “deleted”, and put it in query 2, and run Query 2.
5. Run Query 1 again.
6. If any of the items are still present and not market IsDeleted=1, put their GUID’s into query 2 and delete them, running query 1 in between each time to get a fresh update.
7. Once all the items returned by Query 1 are deleted, run query 5 to purge all the deleted records.
8. Start the healthservice on the agent, and it will show up in pending.
“Does what it says on the tin!”
Thank you Kevin, that worked perfectly.
Cheers,
Paul
Ran through the same process on another machine that I had exactly the same scenario with, only this time when running;
4. Get the GUID of the Windows Computer object that is not “deleted”, and put it in query 2, and run Query 2.
5. Run Query 1 again.
The IsDeleted = 0 (eg it has not changed from 0 to 1 – but in the ops console the orphaned computer has now disappeared….).
a little concerning…
Resolved with a little DBA help. All good working perfectly now. Thanks.
Hello Kevin,
Is this supported in SCOM 2019? I mean it’s anyway not supported by MS. 😉
Thanks and regards,
Arno
Yep. Works exactly the same in SCOM 2019.
You can also run this powershell script I wrote from any server with the SQL Powershell Module (https://docs.microsoft.com/en-us/sql/powershell/download-sql-server-ps-module): https://github.com/v-bldrum/SCOM-Scripts-and-SQL/blob/master/Powershell/Erase-BaseManagedEntity.ps1
I have updated the Powershell Script I wrote to handle these steps automatically. It can be located here: https://github.com/blakedrumm/SCOM-Scripts-and-SQL/blob/master/Powershell/Quick%20Fixes/Remove-SCOMBaseManagedEntity.ps1
It makes it very easy to purge Computers from your SCOM Database! Give it a shot!
SCOM 2019 UR4
Hi Blake, will this script work for servers that are showing up as gray agents and are triggering event IDs: 20071?
The OpsMgr Connector connected to XXXXXX, but the connection was closed immediately without authentication taking place. The most likely cause of this error is a failure to authenticate either this agent or the server.
I have four servers with this issue all on the same domain and I have tried all the basic troubleshooting steps to try and get the agent back to healthy:
1. restarting agent.
2. flushing health state cache on agent server and Management server.
3. Verified port 5723 is open.
4. ensured Microsoft monitoring agent config in control panel is pointing to the correct Management server.
5. Verified there is no DNS issues on server.
6. uninstall/reinstall Agent
But none of these steps worked.
I checked the events log on the management server and all I see is Event ID 20000: A device which is not part of this management group has attempted to access this Health Service.
Requesting Device Name : XXXXX , but I don’t see additional information why the agent is showing up as gray in the console. They trigger a heartbeat failure and then go gray.
My next thought is to try purging the database of all of these servers and then installing the agent on them to see if that helps.
Thanks again
Hello Kevin!
Thank you for the article! Thank you for sharing such special knowledge with us!
Have you faced a similar problem with SCOM groups? We came across a similar issue with some SCOM groups.
For example, we had Group1 and Group2. Group2 is a subgroup for Group1. We deleted Group2 (somehow we were allowed to do that, without removing Group2 from Group1’s subgroups). Now we cannot open Group1’s properties. We found out what Group2 is related to Group1 (via Powershell, in MembershipRules). And we can find Group2 in Datawarehouse (in [OperationsManagerDW].[dbo].[ManagedEntity]). There are no relations among Group2 and anything else in [OperationsManagerDW].[dbo].[Relationship]. I planned to delete Group2, using the article, but Group2 doesn’t exist in BaseManagedEntity table.
Have you faced something like that? I was wondering if you have any idea how to clean it up.
I don’t like or use subgroups. I dont mind having some groups contain other groups, but I write that in XML, not in the UI.
Regardless – this should not stop you from being able to delete a group. If a group membership is defined in something the UI does not understand, the UI will throw an error. However, why can’t you just delete the group and start over?
We thought the same. We created new management pack and created new groups there, but we got the same issue after subgroup removal. I tried to repeat the issue on the same environment, creating new test management pack and test groups, but got no issues. So, unfortunately, I don’t know a root cause of the issue.
I hope, removal of these “partially-removed” groups should help. I’ll try to do it in XML. Thank you for the idea!
scom 2019 UR3. Trying to remove and reinstall a management server (someone elses work that didn’t get connected properly) Ran the queries in order, everything seemed to work fine but the server still shows up in the console. I still have two entries, Microsoft.SystemCenter.HealthServiceWatcher.Microsoft.SystemCenter.CollectionManagementServerWatcherGroup, and Veeam.Virt.Extensions.VMware.CollectorManagement.CollectorInstaller.(ServerName). Help?
I have never had to force remove a management server. If this was a lab I’d just delete those objects using the queries. If this is a customer production then I’d open a support case.
I’m finding a large disparity in the count of agents between Managed agents and Windows Servers displays. I find it hard to rationalize this, and suspect this is due to orphaned objects? I will be comparing lists and running the queries to check the extra entries, but am a little hesitant to remove those that don’t match up without a little reassurance? Is there a better way to do this?
First – you don’t delete ANYTHING unless you KNOW it is an orphan. Deleting things from the DB should NEVER be taken lightly.
Next – it is VERY common to have more Windows Computer objects than agents. This is due to topology discoveries like SCCM, Active Directory, and for clusters where each virtual network name has a Windows Computer object.
What kind of disparity?
Windows Servers state view 1970 objects, Agent Managed view 1892 objects. Any ideas on how to identify ghosts without running the query on each server? And thanks, for the time.
That sounds completely normal to me. What makes you think you have an orphan?
so any advice on vendor based crud that doesn’t play nice in the OM…
i.e. the dbo.MT_*….._log tables there have no date stamps so I’m not sure how its supposed to be clearing those tables (unless it has the data elsewhere and it joins the tables to undo the normalization, so to speak
HALF of the OM is taken up by these. if MT’s tables dont clean up even once all instances are removed, or even cleanup weekly that’s a big issue
i can confirm after a few days after removing all instances from the worst offender we have only seen the state change table drop, and not the the table for this (netapp)
I have a question about the point “orphaned objects still connected to discoveries”. I have two former DCs. The machines would be demoted. The SCOM 2019 agent would be uninstalled. Now these two machines are still connected to some discoveries like “Microsoft.AD.Remote.Topology.Discovery”. How can I remove them from this discoveries? Or can I delete them from the database without disconnecting them from the discovery?
If you delete them from Agent managed in SCOM AFTER they are demoted – then they will get removed.
There is an additional override I recommend on the ADMP – on the “AD Topology Discovery” override the value for “Determines whether to only discover the agent” to True. This will tell the AD topology discovery not to discover a DC if it is not Agent Managed in SCOM.
Thanks for your reply and sorry for the late answer. I activated the override you recommended and wait for the result.
I would like you to know what really caused the issue. In the end I realized that the complete AD Topology Discover didn’t work. I found hints on the web that this discovery object needs .Net 3.5. I checked the management servers and saw that only on the server holding the Reporting Services .Net 3.5 was installed. I installed .Net 3.5 on both other servers too and now Topology Discovery works fine as expected. 🙂
I’m actually 99% sure that I’ve already been able to use the commands from your site successfully.
Of course, I am now on the latest version of SCCM 2309.
Unfortunately, the commands are no longer accepted because he cannot find the objects. Always get:
“invalid ojbect name ‘BasedManagedEntity'” or “invalid ojbect name ‘typedmanagedentity'”
Do you have any idea what the objects are called now? Have they perhaps been renamed?`
I would be very grateful for a solution suggestion.
Greets
No – none of these are for MCM – these are for SCOM, as is my entire site.
Pingback:Troubleshooting System Center Operations Manager, Part 1: Troubleshooting the Windows SCOM Agent | POHN IT-Consulting GmbH
Hmmm. My problem is that a Domain Controller shows discoveries, but never had an agent installed from my new SCOM 2022 deployment. It shows up with an unknown state in Monitoring. Not at all in Administration. I’ve spent several days building this SCOM 2022 UR2 server and had just reached the point where I was bringing my servers into the new SCOM instance. The discovery wizard fails and none of the conditions apply. It still works fine in my SCOM 2019 UR6 instance. The agent is dual homed for both instances. I tried updating the agent to 2022 with hopes that it would show up in Pending Management, but no luck. I really don’t want to start over again because of this one critical server is gummed up.
Domain controllers are always visible in SCOM due to the ADDS management pack. It will create a windows computer object for DC’s even if they are not agent managed. This is by design. You can turn this off easily using an override for the DC Topology discovery to only discover agent managed DC’s.
Simply multi-home that DC agent using the control panel if you want, or do a manual agent install, and make sure you doublecheck this: https://kevinholman.com/2016/11/04/deploying-scom-2016-agents-to-domain-controllers-some-assembly-required/
Thanks for the response.
I’ve manually multi-homed the agent and ran a manual agent update to 2022 UR2 hoping it would show up in pending management before posting, but it still shows up as not managed in Monitoring and does not show up as managed or pending management in Administration. All my other DCs are fine. The problem DC is still communicating fine with the 2019 instance. I double checked the information on the reference link and did not find a problem. HSLockdown.exe /A “NT AUTHORITY\SYSTEM” made no difference either. Not a surprise, since I would expect that I would have had issues with SCOM 2019. I’m afraid I’m at a loss as to where to override the DC Topology discovery to only discover agent managed DC’s. SCOM is only one of many systems I manage and once deployed and tuned, I haven’t had to do much tweaking.
It would be nice to have instructions in this document of what to do if there are current discoveries for the problem system. I kind of feel like I’m dealing with Kobyashi Maru scenario.
Hi Kevin,
I have a similar issue. Working on SCOM 2019 UR5 and 2022 UR1 on clients. The SCOM agent is removed from Active Directory servers. The server object is still in AD.
When you are in the SCOM Console, going to Monitoring, Windows Computers, I can still see the servers there, ‘Not Monitored’.
I followed your steps and removed the relevant server. On the console it is not visible and after and hour or so it reappears.
Is there a way I can get this off from the console not to appear anymore?
Waiting in anticipation for your advice in this regard.
Pieter
You may see this behavior with the Active Directory management pack installed, from the discovery “AD Topology Discovery”, which would pull these machines in as “not monitored”. The objects would get mapped to clients if those DCs were under proper monitoring still.
As Kevin mentioned in another commend – “You can turn this off easily using an override for the DC Topology discovery to only discover agent managed DC’s.” To do this, set the override parameter “Determines whether to only discover the agent” to “true” on the discovery.
Or, if you don’t need/want this, you can disable the discovery, or if you’re not wanting to monitor AD at all, remove the management pack.