Menu Close

SCOM 2012 – Grooming deep dive in the OperationsManager database


Grooming of the OpsDB in OpsMgr 2012 is very similar to OpsMgr 2007.  Grooming is called once per day at 12:00am…. by the rule:  “Partitioning and Grooming” You can search for this rule in the Authoring space of the console, under Rules. It is targeted to the “All Management Servers Resource Pool” and is part of the System Center Internal Library.

image

It calls the “p_PartitioningAndGrooming” stored procedure.  This SP calls two other SP’s:  p_Partitioning and then p_Grooming

p_Partitioning inspects the table PartitionAndGroomingSettings, and then calls the SP p_PartitionObject for each object in the PartitionAndGroomingSettings table where “IsPartitioned = 1”   (note – we partition event and perf into 61 daily tables – just like MOM 2005/SCOM 2007)

The PartitionAndGroomingSettings table:

image

The p_PartitionObject SP first identifies the next partition in the sequence, truncates it to make sure it is empty, and then updates the PartitionTables table in the database, to update the IsCurrent field to the next numeric table for events and perf.  It also sets the current time as the partition end time in the previous “is current” row, and sets the current time in the partition start time of the new “is current” row.  Then it calls the p_PartitionAlterInsertView sproc, to make new data start writing to the “new” current event and perf table.

To review which tables you are writing to – execute the following query:   select * from partitiontables where IsCurrent = ‘1’

A select * from partitiontables will show you all 61 event and perf tables, and when they were used.  You should see a PartitionStartTime updated every day – around midnight (time is stored in UTC in the database).  If partitioning is failing to run, then we wont see this date changing every day. 

Ok – that’s the first step of the p_PartitioningAndGrooming sproc – Partitioning.  Now – if that is all successful, we will start grooming!

The p_Grooming is called after partitioning is successful.  One of the first things it does – is to update the InternalJobHistory table.  In this table – we keep a record of all partitioning and grooming jobs.  It is a good spot check to see what’s going on with grooming.  To have a peek at this table – execute a select * from InternalJobHistory order by InternalJobHistoryId DESC

image

The p_Grooming sproc then calls p_GroomPartitionedObjects

p_GroomPartitionedObjects  will first examine the PartitionAndGroomingSettings and compare the “days to keep” column value, against the current date, to figure out how many partitions to keep vs groom.  It will then inspect the partitions (tables) to ensure they have data, and then truncate the partition, by calling p_PartitionTruncate.  A truncate command is just a VERY fast and efficient way to delete all data from a table without issuing a highly transactional DELETE command.  The p_GroomPartitionedObjects sproc will then update the PartitionAndGroomingSettings table with the current time, under the GroomingRunTime column, to reflect when grooming last ran.

Next – the p_Grooming sproc continues, by calling p_GroomNonPartitionedObjects.

p_GroomNonPartitionedObjects is a short, but complex sproc – in that is calls all the individual sprocs listed in the PartitionAndGroomingSettings table where IsPartitioned = 0.  The following stored procedures are present in my database as non-partitioned data:

  • p_AlertGrooming
  • p_StateChangeEventGrooming
  • p_MaintenanceModeHistoryGrooming
  • p_AvailabilityHistoryGrooming
  • p_JobStatusGrooming
  • p_MonitoringJobStatusGrooming
  • p_PerformanceSignatureGrooming
  • p_PendingSdkDataSourceGrooming
  • p_InternalJobHistoryGrooming
  • p_EntityChangeLogGroom
  • p_UserSettingsStoreGrooming
  • p_TriggerEntityChangeLogStagedGrooming

Now, for the above sprocs, each one could potentially return a success or failure.  They will also likely call additional sprocs, for specific tasks.  You can see, the rabbit hole is deep.  Smile  This is just an example of the complexity involved in self-maintenance and grooming.  If you are experiencing a grooming failure of any kind, and the error messages involve any of the above stored procedures when you execute p_PartitioningAndGrooming manually, you should open a support case with Microsoft for troubleshooting and resolution.  The theory is, that each of the above procedures grooms a specific non-partitioned dataset.  Under NORMAL circumstances, each should be able to complete in a reasonable time frame.  The challenge becomes evident when you have something go wrong, like alert storms, state change even storms from monitors flip-flop, lots of performance signature data from using self-tuning threshold monitors, huge amounts of pending SDK datasource data from large Exchange 2010 environments, or from other MP’s that might leverage this.  Grooming non-partitioned data is slow, and highly resource intensive and transactional.  These are specific delete statements, from tables directly, often combined with creating temp tables in TempDB.  Having a good presized high performance TempDB can help, as will ensuring you have plenty of transaction log space for the database, and having the disk subsystem offer as many IOPS as possible.  http://technet.microsoft.com/en-us/library/ms175527(v=SQL.105).aspx

Next – the p_Grooming sproc continues, by updating the InternalJobHistory table, to give it a status of success (StatusCode of 1 = success, 2= failed, 0 appears to be never completed?)

If you ever have a problem with grooming – or need to get your OpsDB database size under control – simply reduce the data retention days, in the console, under Administration, Settings, Database Grooming.  To start with – I recommend setting all these to just 2 days, from the default of 7.  This keeps your OpsDB under control until you have time to tune all the noise from the MP’s you import.  So just reduce this number, then open up query analyzer, and execute EXEC p_PartitioningAndGrooming  When it is done, check the job status by executing select * from InternalJobHistory order by InternalJobHistoryId DESC   The last groom job should be present, and successful.  The OpsDB size should be smaller, with more free space.  And to validate, you can always run my large table query

5 Comments

  1. Peter

    Hi Kevin!

    When you run select * from partitiontables
    Should ALL the 61 tables have PartitionStartTime updated every day?

    In my environemnt its not like that as you can see below

    ObjectId PartitionId PartitionName PartitionStartTime PartitionEndTime IsCurrent
    1 0 Event_00 2019-01-18 00:00:00.303 2019-01-19 00:00:00.313 0
    1 1 Event_01 2019-01-19 00:00:00.313 2019-01-20 00:00:00.350 0
    1 2 Event_02 2019-01-20 00:00:00.350 2019-01-21 00:00:00.310 0
    1 3 Event_03 2019-01-21 00:00:00.310 2019-01-22 00:00:00.300 0
    1 4 Event_04 2019-01-22 00:00:00.300 2019-01-23 00:00:00.277 0
    1 5 Event_05 2019-01-23 00:00:00.277 2019-01-24 00:00:00.257 0
    1 6 Event_06 2019-01-24 00:00:00.257 2019-01-25 00:00:00.120 0
    1 7 Event_07 2019-01-25 00:00:00.120 2019-01-26 00:00:00.323 0
    1 8 Event_08 2019-01-26 00:00:00.323 2019-01-27 00:00:00.317 0
    1 9 Event_09 2019-01-27 00:00:00.317 2019-01-28 00:00:00.270 0
    1 10 Event_10 2019-01-28 00:00:00.270 2019-01-29 00:00:00.330 0
    1 11 Event_11 2019-01-29 00:00:00.330 2019-01-30 00:00:00.283 0
    1 12 Event_12 2019-01-30 00:00:00.283 2019-01-31 00:00:00.323 0
    1 13 Event_13 2019-01-31 00:00:00.323 2019-02-01 00:00:00.297 0
    1 14 Event_14 2019-02-01 00:00:00.297 2019-02-02 00:00:00.340 0
    1 15 Event_15 2019-02-02 00:00:00.340 2019-02-03 00:00:00.303 0
    1 16 Event_16 2019-02-03 00:00:00.303 2019-02-04 00:00:00.317 0
    1 17 Event_17 2019-02-04 00:00:00.317 2019-02-05 00:00:00.290 0
    1 18 Event_18 2019-02-05 00:00:00.290 2019-02-06 00:00:00.317 0
    1 19 Event_19 2019-02-06 00:00:00.317 2019-02-07 00:00:00.290 0
    1 20 Event_20 2019-02-07 00:00:00.290 2019-02-08 00:00:00.270 0
    1 21 Event_21 2019-02-08 00:00:00.270 2019-02-09 00:00:00.293 0
    1 22 Event_22 2019-02-09 00:00:00.293 2019-02-10 00:00:00.323 0
    1 23 Event_23 2019-02-10 00:00:00.323 2019-02-11 00:00:00.123 0
    1 24 Event_24 2019-02-11 00:00:00.123 2019-02-12 00:00:01.107 0
    1 25 Event_25 2019-02-12 00:00:01.107 2019-02-13 00:00:00.683 0
    1 26 Event_26 2019-02-13 00:00:00.683 2019-02-14 00:00:00.643 0
    1 27 Event_27 2019-02-14 00:00:00.643 2019-02-15 00:00:00.000 1
    1 28 Event_28 2018-12-16 00:00:00.940 2018-12-17 00:00:00.567 0
    1 29 Event_29 2018-12-17 00:00:00.567 2018-12-18 00:00:00.680 0
    1 30 Event_30 2018-12-18 00:00:00.680 2018-12-19 00:00:00.057 0
    1 31 Event_31 2018-12-19 00:00:00.057 2018-12-20 00:00:00.640 0
    1 32 Event_32 2018-12-20 00:00:00.640 2018-12-21 00:00:00.490 0
    1 33 Event_33 2018-12-21 00:00:00.490 2018-12-21 23:59:59.900 0
    1 34 Event_34 2018-12-21 23:59:59.900 2018-12-23 00:00:01.307 0
    1 35 Event_35 2018-12-23 00:00:01.307 2018-12-24 00:00:00.593 0
    1 36 Event_36 2018-12-24 00:00:00.593 2018-12-25 00:00:00.587 0
    1 37 Event_37 2018-12-25 00:00:00.587 2018-12-26 00:00:00.600 0
    1 38 Event_38 2018-12-26 00:00:00.600 2018-12-27 00:00:00.537 0
    1 39 Event_39 2018-12-27 00:00:00.537 2018-12-28 00:00:00.600 0
    1 40 Event_40 2018-12-28 00:00:00.600 2018-12-29 00:00:00.540 0
    1 41 Event_41 2018-12-29 00:00:00.540 2018-12-30 00:00:00.627 0
    1 42 Event_42 2018-12-30 00:00:00.627 2018-12-31 00:00:00.547 0
    1 43 Event_43 2018-12-31 00:00:00.547 2019-01-01 00:00:00.547 0
    1 44 Event_44 2019-01-01 00:00:00.547 2019-01-02 00:00:00.667 0
    1 45 Event_45 2019-01-02 00:00:00.667 2019-01-03 00:00:00.563 0
    1 46 Event_46 2019-01-03 00:00:00.563 2019-01-04 00:00:01.050 0
    1 47 Event_47 2019-01-04 00:00:01.050 2019-01-05 00:00:00.337 0
    1 48 Event_48 2019-01-05 00:00:00.337 2019-01-06 00:00:00.330 0
    1 49 Event_49 2019-01-06 00:00:00.330 2019-01-07 00:00:00.283 0
    1 50 Event_50 2019-01-07 00:00:00.283 2019-01-08 00:00:00.360 0
    1 51 Event_51 2019-01-08 00:00:00.360 2019-01-09 00:00:00.303 0
    1 52 Event_52 2019-01-09 00:00:00.303 2019-01-10 00:00:00.343 0
    1 53 Event_53 2019-01-10 00:00:00.343 2019-01-11 00:00:00.320 0
    1 54 Event_54 2019-01-11 00:00:00.320 2019-01-12 00:00:00.323 0
    1 55 Event_55 2019-01-12 00:00:00.323 2019-01-13 00:00:00.277 0
    1 56 Event_56 2019-01-13 00:00:00.277 2019-01-14 00:00:00.310 0
    1 57 Event_57 2019-01-14 00:00:00.310 2019-01-15 00:00:00.310 0
    1 58 Event_58 2019-01-15 00:00:00.310 2019-01-16 00:00:00.243 0
    1 59 Event_59 2019-01-16 00:00:00.243 2019-01-16 23:59:59.990 0
    1 60 Event_60 2019-01-16 23:59:59.990 2019-01-18 00:00:00.303 0
    2 0 PerformanceData_00 2019-01-18 00:00:00.333 2019-01-19 00:00:00.360 0
    2 1 PerformanceData_01 2019-01-19 00:00:00.360 2019-01-20 00:00:00.367 0
    2 2 PerformanceData_02 2019-01-20 00:00:00.367 2019-01-21 00:00:00.357 0
    2 3 PerformanceData_03 2019-01-21 00:00:00.357 2019-01-22 00:00:00.330 0
    2 4 PerformanceData_04 2019-01-22 00:00:00.330 2019-01-23 00:00:00.310 0
    2 5 PerformanceData_05 2019-01-23 00:00:00.310 2019-01-24 00:00:00.303 0
    2 6 PerformanceData_06 2019-01-24 00:00:00.303 2019-01-25 00:00:00.167 0
    2 7 PerformanceData_07 2019-01-25 00:00:00.167 2019-01-26 00:00:00.463 0
    2 8 PerformanceData_08 2019-01-26 00:00:00.463 2019-01-27 00:00:00.350 0
    2 9 PerformanceData_09 2019-01-27 00:00:00.350 2019-01-28 00:00:00.287 0
    2 10 PerformanceData_10 2019-01-28 00:00:00.287 2019-01-29 00:00:00.360 0
    2 11 PerformanceData_11 2019-01-29 00:00:00.360 2019-01-30 00:00:00.330 0
    2 12 PerformanceData_12 2019-01-30 00:00:00.330 2019-01-31 00:00:00.353 0
    2 13 PerformanceData_13 2019-01-31 00:00:00.353 2019-02-01 00:00:00.327 0
    2 14 PerformanceData_14 2019-02-01 00:00:00.327 2019-02-02 00:00:00.383 0
    2 15 PerformanceData_15 2019-02-02 00:00:00.383 2019-02-03 00:00:00.333 0
    2 16 PerformanceData_16 2019-02-03 00:00:00.333 2019-02-04 00:00:00.350 0
    2 17 PerformanceData_17 2019-02-04 00:00:00.350 2019-02-05 00:00:00.320 0
    2 18 PerformanceData_18 2019-02-05 00:00:00.320 2019-02-06 00:00:00.363 0
    2 19 PerformanceData_19 2019-02-06 00:00:00.363 2019-02-07 00:00:00.320 0
    2 20 PerformanceData_20 2019-02-07 00:00:00.320 2019-02-08 00:00:00.300 0
    2 21 PerformanceData_21 2019-02-08 00:00:00.300 2019-02-09 00:00:00.340 0
    2 22 PerformanceData_22 2019-02-09 00:00:00.340 2019-02-10 00:00:00.353 0
    2 23 PerformanceData_23 2019-02-10 00:00:00.353 2019-02-11 00:00:00.217 0
    2 24 PerformanceData_24 2019-02-11 00:00:00.217 2019-02-12 00:00:01.140 0
    2 25 PerformanceData_25 2019-02-12 00:00:01.140 2019-02-13 00:00:00.697 0
    2 26 PerformanceData_26 2019-02-13 00:00:00.697 2019-02-14 00:00:00.660 0
    2 27 PerformanceData_27 2019-02-14 00:00:00.660 2019-02-15 00:00:00.000 1
    2 28 PerformanceData_28 2018-12-16 00:00:00.970 2018-12-17 00:00:00.597 0
    2 29 PerformanceData_29 2018-12-17 00:00:00.597 2018-12-18 00:00:00.693 0
    2 30 PerformanceData_30 2018-12-18 00:00:00.693 2018-12-19 00:00:00.073 0
    2 31 PerformanceData_31 2018-12-19 00:00:00.073 2018-12-20 00:00:00.670 0
    2 32 PerformanceData_32 2018-12-20 00:00:00.670 2018-12-21 00:00:00.503 0
    2 33 PerformanceData_33 2018-12-21 00:00:00.503 2018-12-21 23:59:59.930 0
    2 34 PerformanceData_34 2018-12-21 23:59:59.930 2018-12-23 00:00:01.323 0
    2 35 PerformanceData_35 2018-12-23 00:00:01.323 2018-12-24 00:00:00.610 0
    2 36 PerformanceData_36 2018-12-24 00:00:00.610 2018-12-25 00:00:00.617 0
    2 37 PerformanceData_37 2018-12-25 00:00:00.617 2018-12-26 00:00:00.617 0
    2 38 PerformanceData_38 2018-12-26 00:00:00.617 2018-12-27 00:00:00.583 0
    2 39 PerformanceData_39 2018-12-27 00:00:00.583 2018-12-28 00:00:00.630 0
    2 40 PerformanceData_40 2018-12-28 00:00:00.630 2018-12-29 00:00:00.557 0
    2 41 PerformanceData_41 2018-12-29 00:00:00.557 2018-12-30 00:00:00.657 0
    2 42 PerformanceData_42 2018-12-30 00:00:00.657 2018-12-31 00:00:00.560 0
    2 43 PerformanceData_43 2018-12-31 00:00:00.560 2019-01-01 00:00:00.580 0
    2 44 PerformanceData_44 2019-01-01 00:00:00.580 2019-01-02 00:00:00.727 0
    2 45 PerformanceData_45 2019-01-02 00:00:00.727 2019-01-03 00:00:00.580 0
    2 46 PerformanceData_46 2019-01-03 00:00:00.580 2019-01-04 00:00:01.063 0
    2 47 PerformanceData_47 2019-01-04 00:00:01.063 2019-01-05 00:00:00.370 0
    2 48 PerformanceData_48 2019-01-05 00:00:00.370 2019-01-06 00:00:00.363 0
    2 49 PerformanceData_49 2019-01-06 00:00:00.363 2019-01-07 00:00:00.300 0
    2 50 PerformanceData_50 2019-01-07 00:00:00.300 2019-01-08 00:00:00.373 0
    2 51 PerformanceData_51 2019-01-08 00:00:00.373 2019-01-09 00:00:00.380 0
    2 52 PerformanceData_52 2019-01-09 00:00:00.380 2019-01-10 00:00:00.407 0
    2 53 PerformanceData_53 2019-01-10 00:00:00.407 2019-01-11 00:00:00.400 0
    2 54 PerformanceData_54 2019-01-11 00:00:00.400 2019-01-12 00:00:00.357 0
    2 55 PerformanceData_55 2019-01-12 00:00:00.357 2019-01-13 00:00:00.323 0
    2 56 PerformanceData_56 2019-01-13 00:00:00.323 2019-01-14 00:00:00.393 0
    2 57 PerformanceData_57 2019-01-14 00:00:00.393 2019-01-15 00:00:00.340 0
    2 58 PerformanceData_58 2019-01-15 00:00:00.340 2019-01-16 00:00:00.273 0
    2 59 PerformanceData_59 2019-01-16 00:00:00.273 2019-01-17 00:00:00.037 0
    2 60 PerformanceData_60 2019-01-17 00:00:00.037 2019-01-18 00:00:00.333 0

  2. Mark

    Hi Kevin. Looking at the InternalJobHistory table for our setup it appears that the job used to run at midnight but then moved to 23:00 after Daylight Summer Time (I am UK based).

    Start End Command
    2019-04-04 23:00:01.560 2019-04-04 23:00:54.517 Exec dbo.p_GroomPartitionedObjects …
    2019-04-03 23:00:01.287 2019-04-03 23:01:02.740 Exec dbo.p_GroomPartitionedObjects …
    2019-04-02 23:00:01.387 2019-04-02 23:00:54.017 Exec dbo.p_GroomPartitionedObjects …
    2019-04-01 23:00:01.550 2019-04-01 23:01:03.210 Exec dbo.p_GroomPartitionedObjects …
    2019-03-31 23:00:02.893 2019-03-31 23:01:28.050 Exec dbo.p_GroomPartitionedObjects …
    2019-03-30 00:00:01.463 2019-03-30 00:00:59.653 Exec dbo.p_GroomPartitionedObjects …
    2019-03-29 00:00:01.453 2019-03-29 00:00:59.600 Exec dbo.p_GroomPartitionedObjects …
    2019-03-28 00:00:01.433 2019-03-28 00:00:54.760 Exec dbo.p_GroomPartitionedObjects …
    2019-03-27 00:00:01.230 2019-03-27 00:00:52.270 Exec dbo.p_GroomPartitionedObjects …

    What could I do to fix this please?

    Thank you.

      • Mark

        Having dug deeper I now see that the Operations Manager database is still running the grooming at midnight (UK Daylight Saving Time) but the timestamp now says 23:00 because the grooming stored procedure uses UTC time to get the date and time.

        So changing the time is not actually the correct issue. That was my mistake to think it was.

        However if we report off of the perf.vPerfDaily view in the DataWarehouse then we can see data that looks like this:

        Server Value DateTime
        Server A 73 2019-03-31 23:00:00.000
        Server A 73 2019-03-30 23:00:00.000
        Server A 73 2019-03-30 00:00:00.000
        Server A 73 2019-03-29 00:00:00.000

        Where it appears we get a double entry for the 30th March. One at 23:00 and one at 00:00. This is around the time that UK DST goes forward an hour.

        The two top entries are one hour behind from UK DST. By amending the SQL select statement for the report to say that if any datetime has got the hour 23 in it, display that date with an hour added to it. Which then returns:

        Server Value DateTime
        Server A 73 2019-04-01 00:00:00.000
        Server A 73 2019-03-31 00:00:00.000
        Server A 73 2019-03-30 00:00:00.000
        Server A 73 2019-03-29 00:00:00.000

Leave a Reply

Your email address will not be published. Required fields are marked *