
Monitoring for Time Drift in your enterprise



 

Time sync is critical in today’s networks.  Time drift across devices can break authentication, skew reporting calculations, and wreak havoc on interconnected systems.  This article shows a demo management pack that monitors time sync across your Windows devices.

The basic idea is to monitor all systems and compare their local time against a target reference time server, using W32Time.  Here is the command as called from PowerShell:

$cmd = w32tm /stripchart /computer:$RefServer /dataonly /samples:$Samples

The script takes two parameters: the reference server, and the threshold (in seconds) for how much time drift is allowed.

Here is the PowerShell script:

#=================================================================================
# Time Skew Monitoring Script
# Kevin Holman
# Version 1.0
#=================================================================================
param([string]$RefServer,[int]$Threshold)

#=================================================================================
# Constants section - modify stuff here:

# Assign script name variable for use in event logging
$ScriptName = "Demo.TimeDrift.PA.ps1"
# Set samples to the number of w32time samples you wish to include
[int]$Samples = '1'
# For testing - assign values instead of parameters to the script
#[string]$RefServer = 'dc1.opsmgr.net'
#[int]$Threshold = '10'
#=================================================================================

# Gather script start time
$StartTime = Get-Date

# Gather who the script is running as
$WhoAmI = whoami

# Load MomScript API and PropertyBag function
$momapi = new-object -comObject 'MOM.ScriptAPI'
$bag = $momapi.CreatePropertyBag()

#Log script event that we are starting task
$momapi.LogScriptEvent($ScriptName,9250,0, "Starting script")

#Start MAIN body of script:

#Getting the required data
$cmd = w32tm /stripchart /computer:$RefServer /dataonly /samples:$Samples

IF ($cmd -match 'error')
{
  #Log error and quit
  $momapi.LogScriptEvent($ScriptName,9250,2, "Getting TimeDrift from Reference Server returned an error . Reference server is ($RefServer). Output of command is ($cmd)")
  exit
}
ELSE
{
  #Assume we got good results from cmd
  $Skew = $cmd[-1..($Samples * -1)] | ConvertFrom-Csv -Header "Time","Skew" | Select -ExpandProperty Skew
  $Result = $Skew | % { $_ -replace "s","" } | Measure-Object -Average | select -ExpandProperty Average
}

#The problem is that you can have time skew in two directions: positive or negative. You can do two
#things: create an IF statement that does check both or just create a positive number.
IF ($Result -lt 0) { $Result = $Result * -1 }

$TimeDriftSeconds = [math]::Round($Result,2)

#Determine if the average time skew is higher than your threshold and report this back to SCOM.
IF ($TimeDriftSeconds -gt $Threshold)
{
  $bag.AddValue("TimeSkew","True")
  $momapi.LogScriptEvent($ScriptName,9250,2, "Time Drift was detected. Reference server is ($RefServer). Threshold is ($Threshold) seconds. Value is ($TimeDriftSeconds) seconds")
}
ELSE
{
  $bag.AddValue("TimeSkew","False")
  #Log good event for testing
  #$momapi.LogScriptEvent($ScriptName,9250,0, "Time Drift was OK. Reference server is ($RefServer). Threshold is ($Threshold) seconds. Value is ($TimeDriftSeconds) seconds")
}

#Add stuff into the propertybag
$bag.AddValue("RefServer",$RefServer)
$bag.AddValue("Threshold",$Threshold)
$bag.AddValue("TimeDriftSeconds",$TimeDriftSeconds)

#Log an event for script ending and total execution time.
$EndTime = Get-Date
$ScriptTime = ($EndTime - $StartTime).TotalSeconds
$ScriptTime = [math]::Round($ScriptTime,2)
$momapi.LogScriptEvent($ScriptName,9250,0,"`n Script has completed. `n Reference server is ($RefServer). `n Threshold is ($Threshold) seconds. `n Value is ($TimeDriftSeconds) seconds. `n Runtime was ($ScriptTime) seconds.")

#Output the propertybag
$bag
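If you want to test the parsing step without SCOM or a live time server, the same logic can be pulled out into a small function. This is only a sketch, not part of the management pack: the function name Get-AverageSkewSeconds and the sample lines are made up for illustration, and it assumes the /dataonly output ends with one "time, offset" row per sample (for example "07:02:38, +00.0012345s").

```powershell
# Hypothetical helper (not in the MP): average the signed offsets from
# w32tm /stripchart /dataonly output, then return the absolute value.
function Get-AverageSkewSeconds {
    param(
        [string[]]$StripchartOutput,   # lines returned by w32tm
        [int]$Samples = 1              # number of data rows at the end
    )

    # Same error test the script uses: any line containing 'error' is a failure
    if ($StripchartOutput -match 'error') {
        throw "w32tm returned an error: $($StripchartOutput -join ' ')"
    }

    # The last $Samples lines are the data rows: "<time>, <offset>s"
    $rows = $StripchartOutput | Select-Object -Last $Samples
    $values = foreach ($row in $rows) {
        $offsetText = ($row -split ',')[1].Trim().TrimEnd('s')
        [double]::Parse($offsetText, [cultureinfo]::InvariantCulture)
    }

    # Average first, then drop the sign, as the script does
    [math]::Abs(($values | Measure-Object -Average).Average)
}

# Illustrative sample only; real output will differ
$sample = @(
    'Tracking dc1.opsmgr.net [10.0.0.1:123].',
    'Collecting 2 samples.',
    'The current time is 1/21/2023 7:02:38 AM.',
    '07:02:38, +00.5000000s',
    '07:02:40, -01.5000000s'
)
Get-AverageSkewSeconds -StripchartOutput $sample -Samples 2   # returns 0.5
```

Note that averaging signed offsets lets a positive and a negative sample partly cancel out, which is also how the script behaves when $Samples is greater than 1.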

 

Next, we put the script into a Probe Action (PA), which is called by a Datasource (DS) with a scheduler.  We break this out because we want to “share” this datasource between a monitor and a rule.  The monitor watches for time skew, while the rule collects the skew as a perf counter, so we can track trends in the environment.

 

So the key components of the MP are the DS, the PA (containing the script), the MonitorType and the Monitor, the Perf collection rule, and some views to show this off:

 

image

 

When a threshold is breached, the monitor raises an alert:

image

 

The performance view will show you the trending across your systems:

image

 

On the monitor (and rule) you can modify the reference server:

image

 

One VERY IMPORTANT concept: if you change anything, you must make identical overrides on BOTH the monitor and the rule.  Otherwise you will break cookdown, and the script will run twice for each interval.  So be sure to set IntervalSeconds, RefServer, and Threshold the same on both the monitor and the rule.  The one difference you might choose to accept is the interval.  If you want the monitor to run much more frequently than the default of once an hour, that’s fine, and you might not want the perf data collected more than once per hour.  That does break cookdown, but the extra script run only happens once per hour, which is probably less of an impact than overcollecting performance data.
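As a quick sanity check, you could write down the values you set on each workflow and compare them before saving the overrides. The sketch below is purely illustrative: Compare-WorkflowSettings is a hypothetical helper, the hashtables are typed in by hand, and nothing here reads overrides from SCOM.

```powershell
# Hypothetical helper: list the settings that differ between the monitor and the rule.
# Any mismatch in these three values means the two workflows will not cook down.
function Compare-WorkflowSettings {
    param(
        [hashtable]$MonitorSettings,
        [hashtable]$RuleSettings,
        [string[]]$Keys = @('IntervalSeconds', 'RefServer', 'Threshold')
    )

    foreach ($key in $Keys) {
        # Compare as strings so 3600 and '3600' count as equal
        if ("$($MonitorSettings[$key])" -ne "$($RuleSettings[$key])") {
            [pscustomobject]@{
                Setting = $key
                Monitor = $MonitorSettings[$key]
                Rule    = $RuleSettings[$key]
            }
        }
    }
}

# Example values only
$monitor = @{ IntervalSeconds = 600;  RefServer = 'dc1.opsmgr.net'; Threshold = 10 }
$rule    = @{ IntervalSeconds = 3600; RefServer = 'dc1.opsmgr.net'; Threshold = 10 }

$mismatches = @(Compare-WorkflowSettings -MonitorSettings $monitor -RuleSettings $rule)
$mismatches | Format-Table -AutoSize
```

With these example values only IntervalSeconds is reported, which is the one mismatch described above as an acceptable trade-off.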

From here, you could add in a recovery to force a resync of w32time if you wanted, or add in additional alert rules for w32time events.
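A recovery script for the resync could be as small as the sketch below. It is not included in the example MP; the function names are made up for illustration, and how you wire it into a recovery (and which account it runs as) is left to you. It uses the documented w32tm /resync command with the optional /rediscover switch.

```powershell
# Hypothetical recovery sketch: build the w32tm arguments, then run the resync.
function Get-ResyncArguments {
    param([switch]$Rediscover)

    $arguments = @('/resync')
    # /rediscover asks w32time to redetect its time sources before syncing
    if ($Rediscover) { $arguments += '/rediscover' }
    $arguments
}

function Invoke-TimeResync {
    param([switch]$Rediscover)

    # @() keeps a single argument as an array so splatting works
    $arguments = @(Get-ResyncArguments -Rediscover:$Rediscover)
    $output = & w32tm.exe @arguments 2>&1

    [pscustomobject]@{
        ExitCode = $LASTEXITCODE
        Output   = ($output -join "`n")
    }
}

# Usage on a Windows agent (requires rights to control the time service):
# Invoke-TimeResync -Rediscover
```

Returning the exit code and output makes it easy to log the result with LogScriptEvent, the same way the monitoring script does.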

 

The example MP is available here:

https://github.com/thekevinholman/MonitorTimeDrift


29 Comments

    • Kevin Holman

      That kind of defeats the design of time sync. You use an authoritative source and you sync with it. The MP allows this to be overridden for agents who need a different authoritative source.

  1. Naman

    Hey Kevin,
    What if I want to make sure that the PDC itself is not out of Sync with the external NTP. Can we achieve that by comparing the time of the DC with an external time source?

  2. Jay Gopinathan

    Hi Kevin, very helpful and I just implemented it. How can I make the threshold to 50ms? Also how can I run the monitor every 10 minutes while not impacting performance counters?

    Thanks
    Jay

  3. William

    Hi Kevin, I would like to download the Time Drift Management Pack from the technet and the technet was closed. Would you like to let me know where I can download the pack?

  4. Fabricio

    Thank You Kevin for all the information that you share about SCOM , I was working with SCOM 2007 R2 but now I have installed SCOM 2016 , the knowledge that you share with us , helps me to clarify all my doubts and concerns

  5. David Culebras Minguez

    Updated the script to use the current time provider configured on the server also retry if one sample gets error:

    #=================================================================================
    # Time Skew Monitoring Script
    # Kevin Holman
    # Version 1.0
    #=================================================================================
    param([string]$RefServer,[int]$Threshold)

    #=================================================================================
    # Constants section - modify stuff here:

    # Assign script name variable for use in event logging
    $ScriptName = "Windows.TimeDrift.PA.ps1"
    # Set samples to the number of w32time samples you wish to include
    #[int]$Samples = '1'
    # For testing - assign values instead of parameters to the script
    #[string]$RefServer = 'dc1.opsmgr.net'
    #[int]$Threshold = '10'
    #=================================================================================

    [int]$Samples = '3'

    #Get current NTP Source of the server
    $RefServer = w32tm /query /source
    $RefServer = $RefServer.ToString().Replace(" ","")
    if($RefServer -like "time.wind*")
    {
      $RefServer= $RefServer.Split(",")[0]
    }

    #If server dont have source force to your main ntp server
    if($RefServer -like "*Free-runningSystemClock*")
    {
      $RefServer = "YOURNTPSERVERHERE"
      $Threshold = 300
    }

    #if your ntp server is not resolving try time sync with time.windows.com
    if($RefServer -like "*YOURNTPSERVERHERE*")
    {
      Try{
        $query = Resolve-DnsName $RefServer -QuickTimeout A -ErrorAction Stop
      }
      catch{
        $RefServer = "time.windows.com"
        $Threshold = 300
      }
    }

    # Gather script start time
    $StartTime = Get-Date

    # Gather who the script is running as
    $WhoAmI = whoami

    # Load MomScript API and PropertyBag function
    $momapi = new-object -comObject 'MOM.ScriptAPI'
    $bag = $momapi.CreatePropertyBag()

    #Log script event that we are starting task
    $momapi.LogScriptEvent($ScriptName,9250,0, "Starting script")

    #Start MAIN body of script:

    #Getting the required data
    $cmd = w32tm /stripchart /computer:$RefServer /dataonly /samples:$Samples

    #If you get an error on the first try, try it again
    IF ($cmd -match '0x800705B4')
    {
      Sleep 2
      $cmd = w32tm /stripchart /computer:$RefServer /dataonly /samples:$Samples
    }

    IF ($cmd -match 'error')
    {
      #Log error and quit
      $momapi.LogScriptEvent($ScriptName,9250,2, "Getting TimeDrift from Reference Server returned an error . Reference server is ($RefServer). Output of command is ($cmd)")
      exit
    }
    ELSE
    {
      #Assume we got good results from cmd
      $Skew = $cmd[-1..($Samples * -1)] | ConvertFrom-Csv -Header "Time","Skew" | Select -ExpandProperty Skew
      $Result = $Skew | % { $_ -replace "s","" } | Measure-Object -Average | select -ExpandProperty Average
    }

    #The problem is that you can have time skew in two directions: positive or negative. You can do two
    #things: create an IF statement that does check both or just create a positive number.
    IF ($Result -lt 0) { $Result = $Result * -1 }

    $TimeDriftSeconds = [math]::Round($Result,2)

    #Determine if the average time skew is higher than your threshold and report this back to SCOM.
    IF ($TimeDriftSeconds -gt $Threshold)
    {
      $bag.AddValue("TimeSkew","True")
      $momapi.LogScriptEvent($ScriptName,9250,2, "Time Drift was detected. Reference server is ($RefServer). Threshold is ($Threshold) seconds. Value is ($TimeDriftSeconds) seconds")
    }
    ELSE
    {
      $bag.AddValue("TimeSkew","False")
      #Log good event for testing
      #$momapi.LogScriptEvent($ScriptName,9250,0, "Time Drift was OK. Reference server is ($RefServer). Threshold is ($Threshold) seconds. Value is ($TimeDriftSeconds) seconds")
    }

    #Add stuff into the propertybag
    $bag.AddValue("RefServer",$RefServer)
    $bag.AddValue("Threshold",$Threshold)
    $bag.AddValue("TimeDriftSeconds",$TimeDriftSeconds)

    #Log an event for script ending and total execution time.
    $EndTime = Get-Date
    $ScriptTime = ($EndTime - $StartTime).TotalSeconds
    $ScriptTime = [math]::Round($ScriptTime,2)
    $momapi.LogScriptEvent($ScriptName,9250,0,"`n Script has completed. `n Reference server is ($RefServer). `n Threshold is ($Threshold) seconds. `n Value is ($TimeDriftSeconds) seconds. `n Runtime was ($ScriptTime) seconds.")

    #Output the propertybag
    $bag

  6. Jaegermeiste

    Can multiple instances of this perf counter be set up simultaneously? E.g., I have a scenario where I have an authoritative time server that two machines sync to, but one of those machines is also a time server, and I want to see the delta between the two on the third machine, while also confirming the drift on the third machine against the original time server.

    • Kevin Holman

      You’d have to copy/paste the rule, or you’d have to re-write the script to accept multiple time sources and multiple outputs.

  7. Niels

    I’ve found an elegant solution (in my view), for a contingent reference to the timeserver :
    I replaced "dc1.opsmgr.net" with "$Target/Host/Property[Type="Windows!Microsoft.Windows.Computer"]/DomainDnsName$"

    This resolves to the domain catch-all address which also calls the NTP server, in our case.

    so:

    3600
    $Target/Host/Property[Type="Windows!Microsoft.Windows.Computer"]/DomainDnsName$
    200

    It doesn’t work when an agent is in a WORKGROUP, however.

  8. Simha

    Hi Kevin.
    What I need to change in the script and in XML in case when there are servers from different domains with different DCs? Create a few Windows.TimeDrift MPs for each domain? How to do it? Help me please!

      • simha

        I’m creating 2 rules overrides for each type of OS (is it right?) and i’m selecting a group that i’ve created before. Now a question – in “Override properties” window in Select destination MP area no possibility to select another MP. Selected and greyed only MP=”Windows TimeDrift”. Is this one normally and all right? Thank you for help, for your site and time.

      • simha

        Ups. After “Apply” in Override properties windows i am getting error (maybe there is solution?):
        OpsMgr SDK Service error 23319 “an exception was thrown while processing TryUpdateManagementPackWithResources for session ID ….. Exception message: database error. MPInfra_p_ManagementPackInstall failed with exception:
        Database error. MPInfra_p_ManagementPackInstall failed with exception:
        Failed to validate item:
        Alias…..OverrideForRuleWindowsTimeDriftPerfCollectionRuleForContextUINameSpace……Group

        • Jason

          I’m seeing this same error and I’m not seeing all my groups when trying to override by groups. Is there a fix or update for this?

    • Tobias Redelberger

      I’ve added below to the Powershell Script to find its Root-PDC (which is by default also the forest’s time source in a NT5DS Hierarchy):

      [..]
      # Check if $RefServer is default dummy
      IF ($RefServer -eq "dc1.opsmgr.net")
      {
        # Get ForestRootPDC and assign it as $RefServer instead
        $DomainFQDN = (Get-WmiObject -Namespace root\cimv2 -Class Win32_ComputerSystem | Select Domain).Domain
        $context = new-object System.DirectoryServices.ActiveDirectory.DirectoryContext("Domain",$DomainFQDN)
        $ForestRootDomainFQDN = (([System.DirectoryServices.ActiveDirectory.Domain]::GetDomain($context)).Forest).Name
        $context = new-object System.DirectoryServices.ActiveDirectory.DirectoryContext("Forest",$ForestRootDomainFQDN)
        $ForestRootPDC = ([System.DirectoryServices.ActiveDirectory.Forest]::GetForest($context)).RootDomain | %{$_.pdcRoleOwner.Name}
        $RefServer = $ForestRootPDC
      }
      [..]

  9. Randall Landes

    I’m receiving the below alerts, how can I correct this issue:
    Alert description: A script error occurred when measuring for Time Drift: Event Description: Windows.TimeDrift.PA.ps1 : Getting TimeDrift from Reference Server returned an error . Reference server is (0.pool.ntp.org). Output of command is (Tracking 0.pool.ntp.org [216.229.4.66:123]. Collecting 1 samples. The current time is 1/21/2023 7:02:38 AM. 07:02:38, error: 0x800705B4)

  10. Randall Landes

    I’m receiving the below error alert (did anyone else experience this):
    Alert description: A script error occurred when measuring for Time Drift: Event Description: Windows.TimeDrift.PA.ps1 : Getting TimeDrift from Reference Server returned an error . Reference server is (0.pool.ntp.org). Output of command is (Tracking 0.pool.ntp.org [216.229.4.66:123]. Collecting 1 samples. The current time is 1/21/2023 7:02:38 AM. 07:02:38, error: 0x800705B4)

    • Paul Ramagost

      Under Authoring > Management Pack Objects > Rules (‘Windows Server Operating System’ management pack scope) right-click the ‘Time Drift Monitoring Script had an error’ rule then select Properties. On the Configuration tab click the Edit button for Data sources. On the Expression tab click the Insert button.

      -Parameter Name = EventDescription
      -Operator = Does not contain
      -Value = 0X800705B4

      Save your edits.

      • Mark Ronsman

        I too am getting a 0X800705B4 error for just about every server we have. I did exclude those events, like you suggested, but what does that error message mean? Is the management pack/monitor still working if that error keeps happening?

        • Mark Ronsman

          Full error message for the above post:

          Time Drift Script Error Rule, Description: A script error occurred when measuring for Time Drift: Event Description: Windows.TimeDrift.PA.ps1 : Getting TimeDrift from Reference Server returned an error . Reference server is (servername.company.com). Output of command is (Tracking servername2.company.com [10.84.170.42:123]. Collecting 1 samples. The current time is 5/13/2024 1:00:37 PM. 13:00:37, error: 0x800705B4)

  11. Tobias H

    Hello everyone,

    Does anyone know how to reset the health state of a monitor, or monitors in general, when dealing with a time jump?

    I have a red event in the future, and every time I reset the health, I create a green reset event in the Health Explorer from today, but the red event remains at the top…

    I even disabled the monitor via an override. I see the info ‘the monitor has been disabled or removed’ in the Health Explorer, but the red event remains at the top, and the health is still bad.
