
Understanding and modifying Data Warehouse retention and grooming

You will likely find that the default retention in the SCOM data warehouse will need to be adjusted for your environment.  I often find customers are reluctant to adjust these – because they don’t know what they want to keep.  So they assume the defaults are good, and they just keep EVERYTHING.

 

This is a bad idea. 

 

A data warehouse will often be one of the largest databases supported by a company.  Large databases cost money.  They cost money to support.  They are more difficult to maintain.  They cost more to back up in time, backup storage capacity, network impact, etc.  They take longer to restore in the case of a disaster.  The larger they get, the more they cost in hardware (disk space) to support them.  And the larger they get, the longer reports can take to complete.
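
If you want to put a number on this for your own environment, a quick check of the current overall warehouse size is below.  This is a minimal sketch using the standard sp_spaceused procedure (it assumes the default warehouse database name of OperationsManagerDW):

--Run in the context of the data warehouse database
USE OperationsManagerDW
GO
--With no parameters, sp_spaceused reports the total database size and unallocated space
EXEC sp_spaceused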

For these reasons – you should give STRONG consideration to reducing your warehouse retention to your reporting REQUIREMENTS.  If you don’t have any – MAKE SOME!

Originally, when the product was released, you had to edit SQL tables directly to adjust this.  Then tools were released to adjust these values, making the process easier and safer.  This post is a walkthrough of that process, to better understand how to use these tools – and what each dataset actually means.

 

Here is a link to the original command line tool, DWDATARP:   https://kevinholman.com/files/dwdatarp.zip

Here is a tool which makes this even simpler:  https://www.scom2k7.com/scom-datawarehouse-grooming-settings-super-easy-with-new-gui-tool/

And the latest new tool which adds features and simplicity, which is what I recommend: https://blakedrumm.com/blog/scom-dw-grooming-tool/

 

Different data types are kept in the Data Warehouse in unique “Datasets”.  Each dataset represents a different data type (events, alerts, performance, etc.) and an aggregation type (raw, hourly, daily).

Not every customer will have exactly the same datasets.  This is because some management packs add their own dataset – when an MP collects something unique that does not fit into the default “buckets” that already exist.

 

So – first – we need to understand the different datasets available – and what they mean.  All the datasets for an environment are kept in the “Dataset” table in the Warehouse database.

SELECT * from dataset ORDER BY DataSetDefaultName

 

This will show us the available datasets.  Common datasets are:

  • Alert data set
  • Event data set
  • Performance data set
  • State data set
  • Client Monitoring data set

Alert, Event, Performance, and State are the most common ones we look at.

 

However – in the warehouse – we also keep different aggregations of some of the datasets – where it makes sense.  The most common datasets that we will aggregate are Performance data and State data.  The reason we have raw, hourly, and daily aggregations is to be able to keep data for longer periods of time – but still have very good performance when running reports.

In MOM 2005 – we used to stick ALL the raw performance data into a single table in the Warehouse.  After a year of data was reached – this meant the perf table would grow to a HUGE size – and running multiple queries against this table would be impossible to complete with acceptable performance.  It also meant grooming this table would take forever, and would be prone to timeouts and failures.

In SCOM – now we aggregate this data into hourly and daily aggregations.  These aggregations allow us to “summarize” the performance, or state data, into MUCH smaller table sizes.  This means we can keep data for a MUCH longer period of time than ever before.  We also optimized this by splitting these into multiple tables.  When a table reaches a pre-determined size, or number of records – we will start a new table for inserting.  This allows grooming to be incredibly efficient – because now we can simply drop the old tables when all of the data in a table is older than the grooming retention setting.
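
If you want to see this table splitting for yourself, the sketch below lists the individual partition tables behind each dataset and aggregation, along with the date range each table covers.  The table and column names come from the consumption query at the end of this post; treat it as a read-only sketch:

--List the physical partition tables behind each dataset aggregation,
--and the time range of data each table holds
SELECT ds.DataSetDefaultName,
  sdas.AggregationTypeId,
  sdas.BaseTableName + '_' + sdtm.TableNameSuffix AS PartitionTable,
  sdtm.StartDateTime,
  sdtm.EndDateTime
FROM Dataset ds
JOIN StandardDatasetAggregationStorage sdas ON sdas.DatasetId = ds.DatasetId
JOIN StandardDatasetTableMap sdtm ON sdtm.DatasetId = sdas.DatasetId AND sdtm.AggregationTypeId = sdas.AggregationTypeId
WHERE sdas.DependentTableInd = 0 --only the primary table for each partition
ORDER BY ds.DataSetDefaultName, sdas.AggregationTypeId, sdtm.StartDateTime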

 

Ok – that’s the background on aggregations.  To see this information – we will need to look at the StandardDatasetAggregation table.

SELECT * from StandardDatasetAggregation

That table contains all the datasets, and their aggregation settings.  To help make more sense of this –  I will join the dataset and the StandardDatasetAggregation tables in a single query – to only show you what you need to look at:

SELECT DataSetDefaultName, AggregationTypeId, MaxDataAgeDays
FROM StandardDatasetAggregation sda
INNER JOIN dataset ds ON ds.datasetid = sda.datasetid
ORDER BY DataSetDefaultName

This query will give us the common dataset name, the aggregation type, and the current maximum retention setting.

For the AggregationTypeId:

0 = Raw

20 = Hourly

30 = Daily

Here is my output:

DataSetDefaultName AggregationTypeId MaxDataAgeDays
Alert data set 0 365
Client Monitoring data set 0 30
Client Monitoring data set 30 400
Event data set 0 10
Exchange 2013: Mailbox Database data warehouse dataset 0 30
Exchange 2013: Mailbox statistics data warehouse dataset 0 30
Exchange 2013: Mailbox statistics data warehouse dataset 30 400
Microsoft.Windows.Client.Win10.Dataset.ClientPerf 0 7
Microsoft.Windows.Client.Win10.Dataset.ClientPerf 30 91
Microsoft.Windows.Client.Win10.Dataset.DiskFailure 0 7
Microsoft.Windows.Client.Win10.Dataset.DiskFailure 30 182
Microsoft.Windows.Client.Win10.Dataset.Memory 0 7
Microsoft.Windows.Client.Win10.Dataset.Memory 30 91
Microsoft.Windows.Client.Win10.Dataset.ShellPerf 0 7
Microsoft.Windows.Client.Win10.Dataset.ShellPerf 30 91
Performance data set 0 10
Performance data set 20 60
Performance data set 30 365
State data set 0 60
State data set 20 60
State data set 30 365

 

You will probably notice – that we only keep 10 days of RAW Performance by default.  Generally – you don’t want to mess with this.  This is simply to keep a short amount of raw data – to build our hourly and daily aggregations from.  All built in performance reports in SCOM run from Hourly, or Daily aggregations by default.
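
If you want to verify how much raw performance data you are actually holding, here is a quick sanity check against the Perf.vPerfRaw view (the same view used in the comments at the end of this post):

--How many days of RAW performance data currently exist in the warehouse
SELECT MIN(DateTime) AS OldestSample,
  MAX(DateTime) AS NewestSample,
  DATEDIFF(d, MIN(DateTime), MAX(DateTime)) AS DaysOfRawPerf
FROM Perf.vPerfRaw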

 

Now we are cooking!

Fortunately – there is a command line tool published that will help make changes to these retention periods, and provide more information about how much data we currently have.  This tool is called DWDATARP.EXE.  It is available for download at the link near the top of this post.

This gives us a nice way to view the current settings.  Download this to your tools machine, your SCOM server, or directly on your warehouse machine.  Run it from a command line.

Run just the tool with no parameters to get help:   

C:\>dwdatarp.exe

To get our current settings – run the tool with ONLY the -s (server\instance) and -d (database) parameters.  This will output the current settings.  However – it does not format well to the screen – so output it to a TXT file and open it:

C:\>dwdatarp.exe -s OMDW\i01 -d OperationsManagerDW > c:\dwoutput.txt

Here is my output (I removed some of the Vista/client garbage for brevity):

 

Dataset name Aggregation name Max Age Current Size, Kb
Alert data set Raw data 400 18,560 ( 1%)
Client Monitoring data set Raw data 30 0 ( 0%)
Client Monitoring data set Daily aggregations 400 16 ( 0%)
Configuration dataset Raw data 400 153,016 ( 4%)
Event data set Raw data 100 1,348,168 ( 37%)
Performance data set Raw data 10 467,552 ( 13%)
Performance data set Hourly aggregations 400 1,265,160 ( 35%)
Performance data set Daily aggregations 400 61,176 ( 2%)
State data set Raw data 180 13,024 ( 0%)
State data set Hourly aggregations 400 305,120 ( 8%)
State data set Daily aggregations 400 20,112 ( 1%)

 

Right off the bat – I can see how little space the daily performance data actually consumes.  I can see how much space just 10 days of RAW perf data consume.  I also see a surprising amount of event data consuming space in the database.  Typically – you will see that perf hourly consumes the most space in a warehouse.

 

So – with this information in hand – I can do two things….

  • I know what is using up most of the space in my warehouse.
  • I know the Dataset name, and Aggregation name… to input to the command line tool to adjust it!

 

Now – on to the retention adjustments.

 

First thing – I will need to gather my reporting service level agreement (SLA) from management.  This is my requirement for how long I need to keep data for reports.  I also need to know “what kind” of reports they want to be able to run for this period.

From this discussion with management – we determined:

  • We require detailed performance reports for 90 days (hourly aggregations)
  • We require less detailed performance reports (daily aggregations) for 1 year for trending and capacity planning.
  • We want to keep a record of all ALERTS for 6 months.
  • We don’t use any event reports, so we can reduce this retention from 100 days to 30 days.
  • We don’t use AEM (Client Monitoring Dataset) so we will leave this unchanged.
  • We don’t report on state changes much (if any) so we will set all of these to 90 days.

 

Now I will use the DWDATARP.EXE tool – to adjust these values based on my company reporting SLA:

 

dwdatarp.exe -s OMDW\i01 -d OperationsManagerDW -ds "Performance data set" -a "Hourly aggregations" -m 90
dwdatarp.exe -s OMDW\i01 -d OperationsManagerDW -ds "Performance data set" -a "Daily aggregations" -m 365
dwdatarp.exe -s OMDW\i01 -d OperationsManagerDW -ds "Alert data set" -a "Raw data" -m 180
dwdatarp.exe -s OMDW\i01 -d OperationsManagerDW -ds "Event data set" -a "Raw data" -m 30
dwdatarp.exe -s OMDW\i01 -d OperationsManagerDW -ds "State data set" -a "Raw data" -m 90
dwdatarp.exe -s OMDW\i01 -d OperationsManagerDW -ds "State data set" -a "Hourly aggregations" -m 90
dwdatarp.exe -s OMDW\i01 -d OperationsManagerDW -ds "State data set" -a "Daily aggregations" -m 90

 

 

Now my table reflects my reporting SLA – and my actual space needed in the warehouse will be much reduced in the long term:

 

Dataset name Aggregation name Max Age Current Size, Kb
Alert data set Raw data 180 18,560 ( 1%)
Client Monitoring data set Raw data 30 0 ( 0%)
Client Monitoring data set Daily aggregations 400 16 ( 0%)
Configuration dataset Raw data 400 152,944 ( 4%)
Event data set Raw data 30 1,348,552 ( 37%)
Performance data set Raw data 10 468,960 ( 13%)
Performance data set Hourly aggregations 90 1,265,992 ( 35%)
Performance data set Daily aggregations 365 61,176 ( 2%)
State data set Raw data 90 13,024 ( 0%)
State data set Hourly aggregations 90 305,120 ( 8%)
State data set Daily aggregations 90 20,112 ( 1%)

 

Here are some general rules of thumb (these might differ if your environment is unique):

  • Only keep the maximum retention of data in the warehouse per your reporting requirements.
  • Do not modify the performance RAW dataset.
  • Most performance reports are run against Perf Hourly data for detailed performance throughout the day.  For reports that span long periods of time (weeks/months) you should generally use Daily aggregations.
  • Daily aggregations should generally be kept for the same retention as hourly – or longer.
  • Hourly datasets use up much more space than daily aggregations.
  • Most people don’t use events in reports – and these can often be groomed much sooner than the default of 100 days.
  • Most people don’t do a lot of state reporting beyond 30 days, and these can be groomed much sooner as well if desired.
  • Don’t modify a setting if you don’t use it.  There is no need.
  • The Configuration dataset generally should not be modified.  This dataset keeps data in the warehouse about the objects you report on.  It should be set to at LEAST the longest retention of any perf, alert, event, or state dataset that you use for reporting (see the query sketch just below this list).
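
For that last bullet – to check what the Configuration dataset retention is currently set to, you can look at the MaintenanceSetting table (the same table the consumption query at the end of this post reads).  A minimal sketch:

--Configuration dataset retention – the effective value is the SMALLER of these two columns
SELECT InstanceMaxAgeDays, ManagementPackMaxAgeDays
FROM MaintenanceSetting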

 

An alternative to DWDATARP.exe and the command line is the newer GUI tool linked at the top of this post.


 

 

Some direct SQL statements which can be used (with great care!) instead of DWDataRP:

Retention settings:

--Retention Settings
--This is the current setting for retention in your DW and the current Grooming Interval
SELECT ds.datasetDefaultName AS 'Dataset Name',
  CASE
    WHEN sda.AggregationTypeId = 0 THEN 'raw'
    WHEN sda.AggregationTypeId = 20 THEN 'hourly'
    WHEN sda.AggregationTypeId = 30 THEN 'daily'
    ELSE 'unknown'
  END AS 'AggregationType',
  sda.MaxDataAgeDays AS 'RetentionDays',
  sda.GroomingIntervalMinutes
FROM dataset ds, StandardDatasetAggregation sda
WHERE ds.datasetid = sda.datasetid
--AND ds.datasetDefaultName IN ('Alert data set','Event data set','Performance data set','State data set')
ORDER by ds.datasetDefaultName

Compare actual data retention:

--Actual Data Retained
--This queries the tables and finds out how old your data actually is in the DW to verify if grooming is working
select min(DWLastModifiedDateTime) as MinDate, max(DWLastModifiedDateTime) as MaxDate, datediff(d,min(DWLastModifiedDateTime),max(DWLastModifiedDateTime)) AS Alert from Alert.vAlert
select min(datetime) as MinDate, max(datetime) as MaxDate, datediff(d,min(datetime),max(datetime)) AS Event from Event.vEvent
select min(datetime) as MinDate, max(datetime) as MaxDate, datediff(d,min(datetime),max(datetime)) AS PerfHourly from Perf.vPerfHourly
select min(datetime) as MinDate, max(datetime) as MaxDate, datediff(d,min(datetime),max(datetime)) AS StateHourly from State.vStateHourly

Modify retention:

--Alert Data
UPDATE StandardDatasetAggregation
SET MaxDataAgeDays = 180 --default is 400
WHERE GroomStoredProcedureName = 'AlertGroom'

--Event Data
UPDATE StandardDatasetAggregation
SET MaxDataAgeDays = 30 --default is 100
WHERE GroomStoredProcedureName = 'EventGroom'

--Performance RAW
UPDATE StandardDatasetAggregation
SET MaxDataAgeDays = 10 --default is 10
WHERE GroomStoredProcedureName = 'PerformanceGroom' AND AggregationTypeId = '0'

--Performance HOURLY
UPDATE StandardDatasetAggregation
SET MaxDataAgeDays = 90 --default is 400
WHERE GroomStoredProcedureName = 'PerformanceGroom' AND AggregationTypeId = '20'

--Performance DAILY
UPDATE StandardDatasetAggregation
SET MaxDataAgeDays = 365 --default is 400
WHERE GroomStoredProcedureName = 'PerformanceGroom' AND AggregationTypeId = '30'

--State RAW
UPDATE StandardDatasetAggregation
SET MaxDataAgeDays = 90 --default is 180
WHERE GroomStoredProcedureName = 'StateGroom' AND AggregationTypeId = '0'

--State HOURLY
UPDATE StandardDatasetAggregation
SET MaxDataAgeDays = 90 --default is 400
WHERE GroomStoredProcedureName = 'StateGroom' AND AggregationTypeId = '20'

--State DAILY
UPDATE StandardDatasetAggregation
SET MaxDataAgeDays = 90 --default is 400
WHERE GroomStoredProcedureName = 'StateGroom' AND AggregationTypeId = '30'

Manually run grooming:

--This will manually run grooming for this dataset
DECLARE @DatasetId uniqueidentifier
SET @DatasetId = (SELECT DatasetId FROM StandardDataset WHERE SchemaName = 'Alert')
EXEC StandardDatasetGroom @DatasetId
GO
--This will manually run grooming for this dataset
DECLARE @DatasetId uniqueidentifier
SET @DatasetId = (SELECT DatasetId FROM StandardDataset WHERE SchemaName = 'Event')
EXEC StandardDatasetGroom @DatasetId
GO
--This will manually run grooming for this dataset
DECLARE @DatasetId uniqueidentifier
SET @DatasetId = (SELECT DatasetId FROM StandardDataset WHERE SchemaName = 'Perf')
EXEC StandardDatasetGroom @DatasetId
GO
--This will manually run grooming for this dataset
DECLARE @DatasetId uniqueidentifier
SET @DatasetId = (SELECT DatasetId FROM StandardDataset WHERE SchemaName = 'State')
EXEC StandardDatasetGroom @DatasetId
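
After kicking off grooming manually, you can confirm that it actually ran by checking the InternalJobHistory table – one of the commenters below points this out as well:

--Most recent grooming and partitioning job history, newest first
SELECT TOP 20 *
FROM InternalJobHistory
ORDER BY InternalJobHistoryId DESC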

View all the dataset data and consumption percentages:

--BEGIN QUERY
DECLARE @DatasetId uniqueidentifier
  ,@AggregationTypeId int
  ,@ServerName sysname
  ,@DatabaseName sysname
  ,@SchemaName sysname
  ,@DatasetName nvarchar(256)
  ,@DatasetDescription nvarchar(max)
  ,@AggregationTypeName nvarchar(50)
  ,@MaxDataAgeDays int
  ,@DataFileGroupName sysname
  ,@IndexFileGroupName sysname
  ,@StandardDatasetTableMapRowId int
  ,@TableGuid uniqueidentifier
  ,@TableNameSuffix varchar(100)
  ,@StartDateTime datetime
  ,@EndDateTime datetime
  ,@StandardDatasetAggregationStorageRowId int
  ,@DependentTableInd tinyint
  ,@BaseTableName nvarchar(90)
  ,@TableName nvarchar(max)
  ,@RowCount bigint
  ,@SizeKb bigint
  ,@RowCountForDailyAvg bigint
  ,@SizeKbForDailyAvg bigint
  ,@MinStartDateTime datetime
  ,@MaxEndDateTime datetime
  ,@TotalHours bigint
  ,@TableCreatedDateTime datetime
  ,@DomainTableRowId int

DECLARE @TableSize TABLE (
  TableName sysname NOT NULL
  ,[RowCount] bigint NOT NULL
  ,Reserved varchar(30) NOT NULL
  ,Data varchar(30) NOT NULL
  ,IndexSize varchar(30) NOT NULL
  ,Unused varchar(30) NOT NULL
)

DECLARE @Result TABLE (
  DatasetId uniqueidentifier NOT NULL
  ,ServerName sysname NOT NULL
  ,DatabaseName sysname NOT NULL
  ,DatasetName nvarchar(256) NOT NULL
  ,AggregationTypeId int NOT NULL
  ,AggregationTypeName nvarchar(50) NOT NULL
  ,MaxDataAgeDays int NOT NULL
  ,[RowCount] bigint NULL
  ,MinStartDateTime datetime NULL
  ,SizeKb bigint NOT NULL
  ,DailySizeKb float NULL
  ,DailyRowCount bigint NULL
  ,TotalSizeKb float NULL
  ,TotalRowCount bigint NULL
  ,DataFileGroupName sysname NOT NULL
  ,IndexFileGroupName sysname NOT NULL
)

SET @DatasetId = '00000000-0000-0000-0000-000000000000'

WHILE EXISTS (SELECT *
              FROM vDataset d
              JOIN StandardDataset sd ON (d.DatasetId = sd.DatasetId)
              JOIN vMemberDatabase mdb ON (d.MemberDatabaseRowId = mdb.MemberDatabaseRowId)
              WHERE (d.DatasetId > @DatasetId) AND (d.InstallCompletedInd = 1))
BEGIN
  SELECT TOP 1
     @DatasetId = d.DatasetId
    ,@SchemaName = sd.SchemaName
    ,@DatasetName = d.DatasetDefaultName
    ,@DatasetDescription = d.DatasetDefaultDescription
    ,@ServerName = mdb.ServerName
    ,@DatabaseName = mdb.DatabaseName
  FROM vDataset d
  JOIN StandardDataset sd ON (d.DatasetId = sd.DatasetId)
  JOIN vMemberDatabase mdb ON (d.MemberDatabaseRowId = mdb.MemberDatabaseRowId)
  WHERE (d.DatasetId > @DatasetId) AND (d.InstallCompletedInd = 1)
  ORDER BY d.DatasetId

  SET @AggregationTypeId = -1

  WHILE EXISTS (SELECT *
                FROM StandardDatasetAggregation
                WHERE (DatasetId = @DatasetId) AND (AggregationTypeId > @AggregationTypeId))
  BEGIN
    SELECT TOP 1
       @AggregationTypeId = a.AggregationTypeId
      ,@AggregationTypeName = at.AggregationTypeDefaultName
      ,@MaxDataAgeDays = a.MaxDataAgeDays
      ,@DataFileGroupName = a.DataFileGroupName
      ,@IndexFileGroupName = a.IndexFileGroupName
    FROM StandardDatasetAggregation a
    JOIN vAggregationType at ON (a.AggregationTypeId = at.AggregationTypeId)
    WHERE (a.DatasetId = @DatasetId) AND (a.AggregationTypeId > @AggregationTypeId)
    ORDER BY a.AggregationTypeId

    SET @RowCount = 0
    SET @SizeKb = 0
    SET @TotalHours = 0
    SET @MinStartDateTime = NULL
    SET @RowCountForDailyAvg = 0
    SET @SizeKbForDailyAvg = 0
    SET @StandardDatasetTableMapRowId = 0

    WHILE EXISTS (SELECT *
                  FROM StandardDatasetTableMap
                  WHERE (DatasetId = @DatasetId) AND (AggregationTypeId = @AggregationTypeId)
                    AND (StandardDatasetTableMapRowId > @StandardDatasetTableMapRowId))
    BEGIN
      SELECT TOP 1
         @StandardDatasetTableMapRowId = StandardDatasetTableMapRowId
        ,@TableGuid = TableGuid
        ,@TableNameSuffix = TableNameSuffix
        ,@StartDateTime = StartDateTime
        ,@EndDateTime = EndDateTime
      FROM StandardDatasetTableMap
      WHERE (DatasetId = @DatasetId) AND (AggregationTypeId = @AggregationTypeId)
        AND (StandardDatasetTableMapRowId > @StandardDatasetTableMapRowId)
      ORDER BY StandardDatasetTableMapRowId

      SET @StandardDatasetAggregationStorageRowId = 0

      WHILE EXISTS (SELECT *
                    FROM StandardDatasetAggregationStorage
                    WHERE (DatasetId = @DatasetId) AND (AggregationTypeId = @AggregationTypeId)
                      AND (StandardDatasetAggregationStorageRowId > @StandardDatasetAggregationStorageRowId))
      BEGIN
        SELECT TOP 1
           @StandardDatasetAggregationStorageRowId = StandardDatasetAggregationStorageRowId
          ,@DependentTableInd = DependentTableInd
          ,@BaseTableName = BaseTableName
        FROM StandardDatasetAggregationStorage
        WHERE (DatasetId = @DatasetId) AND (AggregationTypeId = @AggregationTypeId)
          AND (StandardDatasetAggregationStorageRowId > @StandardDatasetAggregationStorageRowId)
        ORDER BY StandardDatasetAggregationStorageRowId

        SELECT @TableCreatedDateTime = create_date
        FROM sys.objects o
        JOIN sys.schemas s ON (o.schema_id = s.schema_id)
        WHERE (s.name = @SchemaName) AND (o.name = @BaseTableName + '_' + @TableNameSuffix)

        IF (@StartDateTime < @TableCreatedDateTime) SET @StartDateTime = @TableCreatedDateTime
        IF (@EndDateTime > GETUTCDATE()) SET @EndDateTime = GETUTCDATE()

        SET @TableName = QUOTENAME(@SchemaName) + '.' + QUOTENAME(@BaseTableName + '_' + @TableNameSuffix)

        DELETE @TableSize
        INSERT @TableSize (TableName, [RowCount], Reserved, Data, IndexSize, Unused)
        EXEC sp_spaceused @TableName

        SELECT @RowCount = @RowCount + CASE WHEN @DependentTableInd = 0 THEN [RowCount] ELSE 0 END
              ,@SizeKb = @SizeKb + CAST(REPLACE(REPLACE(Reserved, 'KB', ''), ' ', '') as bigint)
        FROM @TableSize

        IF (@StartDateTime IS NOT NULL) AND (@EndDateTime IS NOT NULL)
        BEGIN
          SET @TotalHours = @TotalHours + ABS(DATEDIFF(hour, @StartDateTime, @EndDateTime))

          SELECT @RowCountForDailyAvg = @RowCountForDailyAvg + CASE WHEN @DependentTableInd = 0 THEN [RowCount] ELSE 0 END
                ,@SizeKbForDailyAvg = @SizeKbForDailyAvg + CAST(REPLACE(REPLACE(Reserved, 'KB', ''), ' ', '') as bigint)
          FROM @TableSize

          SET @MinStartDateTime = CASE WHEN @MinStartDateTime IS NULL THEN @StartDateTime WHEN @StartDateTime < @MinStartDateTime THEN @StartDateTime ELSE @MinStartDateTime END
          SET @MaxEndDateTime = CASE WHEN @MaxEndDateTime IS NULL THEN @EndDateTime WHEN @EndDateTime > @MaxEndDateTime THEN @EndDateTime ELSE @MaxEndDateTime END
        END
      END
    END

    SET @TotalHours = ABS(DATEDIFF(hour, @MinStartDateTime, @MaxEndDateTime))

    INSERT @Result (
       DatasetId
      ,ServerName
      ,DatabaseName
      ,DatasetName
      ,AggregationTypeId
      ,AggregationTypeName
      ,MaxDataAgeDays
      ,[RowCount]
      ,MinStartDateTime
      ,SizeKb
      ,DailyRowCount
      ,DailySizeKb
      ,DataFileGroupName
      ,IndexFileGroupName
    )
    SELECT @DatasetId
      ,@ServerName
      ,@DatabaseName
      ,@DatasetName
      ,@AggregationTypeId
      ,@AggregationTypeName
      ,@MaxDataAgeDays
      ,@RowCount
      ,@MinStartDateTime
      ,@SizeKb
      ,ROUND(CASE WHEN @TotalHours > 0 THEN @RowCountForDailyAvg / CAST(@TotalHours AS float) * 24.0 ELSE NULL END, 0)
      ,CASE WHEN @TotalHours > 0 THEN @SizeKbForDailyAvg / CAST(@TotalHours AS float) * 24.0 ELSE NULL END
      ,ISNULL(@DataFileGroupName, 'default')
      ,ISNULL(@IndexFileGroupName, 'default')
  END
END

IF EXISTS (SELECT * FROM sys.objects WHERE name = 'MaintenanceSetting')
BEGIN
  DELETE @TableSize
  SET @DomainTableRowId = 0

  WHILE EXISTS (SELECT * FROM DomainTable WHERE (DomainTableRowId > @DomainTableRowId))
  BEGIN
    SELECT TOP 1
       @DomainTableRowId = DomainTableRowId
      ,@TableName = QUOTENAME(SchemaName) + '.' + QUOTENAME(TableName)
    FROM DomainTable
    WHERE (DomainTableRowId > @DomainTableRowId)
    ORDER BY DomainTableRowId

    INSERT @TableSize (TableName, [RowCount], Reserved, Data, IndexSize, Unused)
    EXEC sp_spaceused @TableName
  END

  INSERT @Result (
     DatasetId
    ,ServerName
    ,DatabaseName
    ,DatasetName
    ,AggregationTypeId
    ,AggregationTypeName
    ,MaxDataAgeDays
    ,SizeKb
    ,DataFileGroupName
    ,IndexFileGroupName
  )
  SELECT '00000000-0000-0000-0000-000000000000'
    ,ServerName
    ,DatabaseName
    ,'Configuration dataset'
    ,at.AggregationTypeId
    ,at.AggregationTypeDefaultName
    ,CASE WHEN ms.InstanceMaxAgeDays > ms.ManagementPackMaxAgeDays THEN ms.ManagementPackMaxAgeDays ELSE ms.InstanceMaxAgeDays END
    ,ISNULL((SELECT SUM(CAST(REPLACE(REPLACE(Reserved, 'KB', ''), ' ', '') as bigint)) FROM @TableSize), 0)
    ,'default'
    ,'default'
  FROM vMemberDatabase mdb
  CROSS JOIN vAggregationType at
  CROSS JOIN MaintenanceSetting ms
  WHERE (mdb.MasterDatabaseInd = 1) AND (at.AggregationTypeId = 0)
END

UPDATE @Result
SET TotalSizeKb = DailySizeKb * MaxDataAgeDays
   ,TotalRowCount = DailyRowCount * MaxDataAgeDays

SELECT DatasetName
  ,AggregationTypeName
  ,MaxDataAgeDays
  ,SizeGB = ROUND((CAST(SizeKb AS float) / 1000000.00),2)
  ,PercentOfDW = ROUND((CAST(SizeKb AS float) / (SELECT SUM(SizeKb) FROM @Result) * 100),2)
FROM @Result
ORDER BY PercentOfDW DESC
--END OF QUERY

15 Comments

    • Adam

      The command should look as follows:
      dwdatarp.exe -s DBServerName -d OperationsManagerDW -ds Performance -a Hourly -m 90
      and so on for the remaining Dataset names.

  1. Nate B

    Do you have the source code for this EXE? I don’t think it is good in this day and age to blindly run an .exe in a production environment. Possibly if someone could convert it to a PowerShell script, that would be good too.

  2. Pingback:SCOM DataWarehouse Grooming Settings is now Super Easy with new GUI tool | SCOM Maintenance Mode Scheduler

  3. Kester Stoner

    SCOM 2019 – I have reduced Event Retention in the DW from 100 to 10. After grooming, both automatically and manually, I can still see events that go back 110 days in the DW. Anything I should look at, or am I missing something? Thanks

  4. charlez

    The business is requesting to have raw performance data over a much longer period than 10 days – we are talking about 6 months to a year at least. Since keeping this data in the DW is a no-go, I was wondering what the possibilities are to facilitate this? Are there products available that offload this data into another DB? Or is extracting the data with PowerShell and dumping it into another DB the only way to go?

    thx

    • Kevin Holman

      I would first consider Azure Log Analytics.

      Why do you say keeping this much RAW data in the DW is a no-go? I have had customers extend RAW data retention before. It just consumes a huge amount of space.

      Does the “business” really understand what they are asking? Do they understand the value of keeping Hourly aggregation data?

      I have also had customers offload RAW data to other systems better equipped at long term storage and reporting, like Splunk.

      • Charlez

        I was thinking that it was a no-go because of the performance of the DW.
        The business does not want to understand; hourly aggregation data is not detailed enough for them.
        I’ll take a look at Azure Log Analytics – that fits best in our environment.

        Thanks for the tip!

    • Alexandru Jamy Craciun

      select * from InternalJobHistory order by InternalJobHistoryId desc
      With this you can see the Dates and hours of past grooming and partitioning.

      You can also use this to see if there is any old data, and compare the values to what you have set up for max days – if it’s smaller, then it’s OK.

      –To view the number of days of total data of each type in the DW:
      SELECT DATEDIFF(d, MIN(DWCreatedDateTime), GETDATE()) AS [Current] FROM Alert.vAlert
      SELECT DATEDIFF(d, MIN(DateTime), GETDATE()) AS [Current] FROM Event.vEvent
      SELECT DATEDIFF(d, MIN(DateTime), GETDATE()) AS [Current] FROM Perf.vPerfRaw
      SELECT DATEDIFF(d, MIN(DateTime), GETDATE()) AS [Current] FROM Perf.vPerfHourly
      SELECT DATEDIFF(d, MIN(DateTime), GETDATE()) AS [Current] FROM Perf.vPerfDaily
      SELECT DATEDIFF(d, MIN(DateTime), GETDATE()) AS [Current] FROM State.vStateRaw
      SELECT DATEDIFF(d, MIN(DateTime), GETDATE()) AS [Current] FROM State.vStateHourly
      SELECT DATEDIFF(d, MIN(DateTime), GETDATE()) AS [Current] FROM State.vStateDaily

      • Blake Drumm

        You may also run this query to get everything in one table output:

        select min(DWLastModifiedDateTime) as MinDate, max(DWLastModifiedDateTime) as MaxDate, datediff(d,min(DWLastModifiedDateTime),max(DWLastModifiedDateTime)) AS DaysOfData, 'Alert' AS Type from Alert.vAlert
        UNION ALL
        select min(datetime) as MinDate, max(datetime) as MaxDate, datediff(d,min(datetime),max(datetime)) AS DaysOfData, 'Event' AS Type from Event.vEvent
        UNION ALL
        select min(datetime) as MinDate, max(datetime) as MaxDate, datediff(d,min(datetime),max(datetime)) AS DaysOfData, 'PerfHourly' AS Type from Perf.vPerfHourly
        UNION ALL
        select min(datetime) as MinDate, max(datetime) as MaxDate, datediff(d,min(datetime),max(datetime)) AS DaysOfData, 'StateHourly' AS Type from State.vStateHourly

  5. Deepak

    Hi,

    I have changed the MaxDataAgeDays to 50 from 100 for Event Data Set.

    After that I see that the Size of the mdf file has stopped growing.

    However, when I run the query mentioned above for viewing all the dataset data and consumption percentages, I see some change in the percentages, but the overall size of the mdf file remains the same.

    Kindly suggest how the mdf file size can be reduced, as we have only 40GB of space left on the drive. I am unable to release unused space as well.

  6. Pingback:Coffee Break: Deep dive into a SCOM Administrator's toolbox - SCOMathon

  7. Joel

    Hello,

    Be cautious about the timing of running the grooming, and be faithful with backups. I ran the T-SQL commands successfully, but there was an issue with logging into the SCOM console for over an hour after the manual grooming was completed. The behavior was that the console was slow trying to load the folders, eventually displaying “ready”, but the display window and right hand pane were blank! I’m running an AlwaysOn cluster for this environment. I double-checked the output against another environment and everything looks fine.
