Reducing Noise in Error Logs / Removing PS Errors

on Monday, January 13, 2020

I have a nightly scheduled job which will send out a notification email if the job has an error occur anywhere within it (even when the error is handled). This job infrequently sends out the error email. However, my long history of reviewing these emails has brought me to the point where I assume the error is always:

  • There was a file lock on file X when the file was being save; the function detected the error, waited a brief time period for the lock to clear and then retried the save operation successfully.

I can't remember a time when that wasn't the case. Because of this, I am finding myself less interested in actually reading the error message and desiring to simply ignore the email. But, I know that is going to lead to a situation where something unexpected will happen and I'll ignore the warning emails. Which would be a failure of the entire warning system.

So, what I have is a very narrowly defined and well known case of when the exception occurs, and I have a desire to ignore it. If I setup the code to simply suppress this error after the save operation successfully completes, then I should be able to safely reduce the amount of noise in the error messages that are sent to me. (It should still report the error if the retries never complete successfully)

This is a very common scenario: Teams setup a warning mechanism that is highly effective when a system is first built. At that time, there are a myriad of possible unforeseen errors that could occur. There also hasn’t been enough operational history to feel that the system is stable, so being notified on every potential problem is still a welcome learning experience. As those problems are reduced or eliminated it builds trust in the new system. However, it’s also very common that once a team completes a project and does a moderate amount of post deployment bug fixes, they are asked to move on and prioritize a new project. Which gives no devoted / allocated time to maintaining small and inconsistent issues that arise in the previous project(s).

Unfortunately, the side effect of not giving the time needed to maintain and pay down the technical debt on the older projects is that you can become used to “little” problems that can occur on them; including ignoring the warning messages that they send out. And this creates an effect where you can start to distrust that the warning messages coming from a system are important, because you believe that you know the warning is “little” or “no big deal”.

The best way to instill confidence in the warning and error messages produced by a system is to ensure that the systems only send out important messages, separating the Signal from the Noise.

For my scenario above, the way I’m going to do this is to prevent these handled errors from sending out notification emails. This goes against best practices because I will need to alter the global error monitor in Powershell, $global:Error. But, given that my end goal is to ensure that I only receive important error messages, this seems like an appropriate time to go against best practices.

Below is a snippet of code which can be used to remove error records from $global:Error that fit a given criteria. It will only remove the most recent entries of that error, in order to try and keep the historical error log intact.

You need to be careful with this. If the error you’re looking for occurs within a loop with a retry policy on it, then you need to keep the errors which continued to fail beyond the retry policy, and only remove future errors when the retry policy succeeded. You can better handle the retry policy situation by using the –Last 1 parameter.

<#
.SYNOPSIS
Removes errors from $global:Error which match the given type or message. The errors will be removed
from the end of the $global:Error "queue" (most recently occurred), back towards the front of
the queue (the oldest errors). If an element in the queue does not match the given type or
message, then it will stop removing errors from the queue. This also means, that if the most
recent error doesn't match the type or message given, then no errors will be removed. This will
search inner exceptions for matches by default.
.PARAMETER All
Clears all occurrences of the error record instead of just the last ones
#>
function Clear-LatestGlobalErrors {
param(
[string] $Type = [string]::Empty,
[string] $Message = [string]::Empty,
[switch] $IgnoreInnerExceptions,
[switch] $All,
[int] $Last = -1
)
if([string]::IsNullOrWhiteSpace($Type) -and [string]::IsNullOrWhiteSpace($Message)) {
throw (
"Either parameter Type or Message are required, please specify one of them for usage. ",
"If you would like to clear all global errors, use `$global:Error.Clear() instead."
) -join ""
}
$hasType = -not [string]::IsNullOrWhiteSpace($Type)
$hasMessage = -not [string]::IsNullOrWhiteSpace($Message)
$hasLastCount = $Last -gt 0
# regexify the message
$MessageRegex = $Message
if($hasMessage) {
$MessageRegex = $MessageRegex.Replace(".", "\.").Replace("*", ".*");
}
$count = $global:Error.Count
if($count -le 0) {
# there are no errors, stop early
return;
}
# create a copy of the current error arraylist, to ensure it doesn't change size/shape
$globalErrorClone = $global:Error.Clone()
# search and remove errors
$i = 0
$removeMatch = $false
$toRemove = @()
do {
$removeMatch = $false;
<#
In VSCode debugging, using $global:Error may cause a bunch of
CmdletInvocationExceptions, with a message of
'The scope number 'X' exceeds the number of active scopes.'
These values can be safely ignored.
The errors that we are really interested in all have Exceptions
associated with them
#>
$errorRecord = $null
$testi = $i
do {
$errorRecord = $globalErrorClone[$testi]
if($null -eq $errorRecord.Exception) { $testi++ }
} while($null -eq $errorRecord.Exception -and ($testi -le $globalErrorClone.Count))
$i = $testi
$current = $errorRecord.Exception
if($null -eq $current) {
# Related to the CmdletInvocationException note above;
# when the error record has no Exception associated with it, we will break the loop
# however, if All messages are supposed to be cleared out, then continue the loop
if(-not $All) {
break;
} else {
continue;
}
}
do {
if($hasType) {
$etype = $current.GetType()
if( $etype.Name -eq $Type `
-or $etype.FullName -eq $Type
) {
$removeMatch = $true
}
}
if($hasMessage) {
if($current.Message -match $MessageRegex) {
$removeMatch = $true
}
}
$current = $current.InnerException
} while( # stop when you run out of inner exceptions or you find a match
(-not $IgnoreInnerExceptions -and $null -ne $current) `
-and $removeMatch -eq $false
);
if($removeMatch) {
$toRemove += @($errorRecord)
}
$i++
} while( # stop when you no longer find matches or run out of errors to search through
# if all matching error records are to be removed, only stop when you
# run out of errors to search through
# it can also stop if we've reached the maximum number of entries they wanted to remove ($Last)
($removeMatch -or $All) `
-and (-not $hasLastCount -or $Last -lt $toRemove.Count) `
-and ($i -lt $globalErrorClone.Count)
);
# finally, remove the entries
foreach($r in $toRemove) {
$global:Error.Remove($r)
}
}
# http://stackoverflow.com/questions/1183183/path-of-currently-executing-powershell-script
$root = Split-Path $MyInvocation.MyCommand.Path -Parent;
if($global:CoreUcsbInvokePester -eq $null -or $global:CoreUcsbInvokePester -eq $false) {
Import-Module CoreUcsb -Force
Import-Module Pester
}
Describe -Tag "Unit","Public" -Name "Clear-LatestGlobalErrors" {
function Get-ExceptionTypeCount {
param([string] $Type)
$matching =
$global:Error |? {
if($null -eq $_.Exception) { return $false }
if($null -eq $_.Exception.InnerException) { return $false }
return $_.Exception.InnerException.GetType().Name -eq $Type
}
return $matching.Count
}
function Get-ExceptionMessageCount {
param([string] $Message)
$matching =
$global:Error |? {
if($null -eq $_.Exception) { return $false }
if($_.Exception.Message -match $Message) { return $true }
if($null -eq $_.Exception.InnerException) { return $false }
return $_.Exception.InnerException.Message -match $Message
}
return $matching.Count
}
It "Prompts user to use `$global:Error.Clear()" {
$global:Error.Clear()
{ Clear-LatestGlobalErrors } |
Should -Throw "Error.Clear"
}
It "Clears the last error by type (DivideByZeroException)" {
$global:Error.Clear()
try { 1 / 0 } catch { <# surpress #> }
(Get-ExceptionTypeCount -Type "DivideByZeroException") | Should -Be 1
Clear-LatestGlobalErrors -Type DivideByZeroException
(Get-ExceptionTypeCount -Type "DivideByZeroException") | Should -Be 0
}
It "Only clears the last error if it really was the last error (DivideByZeroException)" {
$global:Error.Clear()
try { 1 / 0 } catch { <# surpress #> }
try { throw "something something" } catch { <# surpress #> }
(Get-ExceptionTypeCount -Type "DivideByZeroException") | Should -Be 1
Clear-LatestGlobalErrors -Type DivideByZeroException
(Get-ExceptionTypeCount -Type "DivideByZeroException") | Should -Be 1
}
It "Clears the last error by message (DivideByZeroException)" {
$global:Error.Clear()
try { 1 / 0 } catch { <# surpress #> }
$ex.InnerException.GetType().Name | Should -Be DivideByZeroException
(Get-ExceptionMessageCount -Message "Attempted to divide by zero.") | Should -Be 1
Clear-LatestGlobalErrors -Message "Attempted to divide by zero."
(Get-ExceptionMessageCount -Message "Attempted to divide by zero.") | Should -Be 0
}
It "Clears powershell thrown errors (`"something something`")" {
$global:Error.Clear()
try { throw "something something" } catch { <# surpress #> }
(Get-ExceptionMessageCount -Message "something something") | Should -Be 1
Clear-LatestGlobalErrors -Message "something something"
(Get-ExceptionMessageCount -Message "something something") | Should -Be 0
}
It "Clears all of an exception message (`"something something`")" {
$global:Error.Clear()
try { throw "something something" } catch { <# surpress #> }
try { 1 / 0 } catch { <# surpress #> }
try { throw "something something" } catch { <# surpress #> }
(Get-ExceptionMessageCount -Message "something something") | Should -Be 2
Clear-LatestGlobalErrors -Message "something something" -All
(Get-ExceptionMessageCount -Message "something something") | Should -Be 0
}
It "Clears Last 1 of an exception message (`"something something`")" {
$global:Error.Clear()
try { throw "something something" } catch { <# surpress #> }
try { 1 / 0 } catch { <# surpress #> }
try { throw "something something" } catch { <# surpress #> }
(Get-ExceptionMessageCount -Message "something something") | Should -Be 2
Clear-LatestGlobalErrors -Message "something something" -Last 1
(Get-ExceptionMessageCount -Message "something something") | Should -Be 1
(Get-ExceptionTypeCount -Type DivideByZeroException) | Should -Be 1
}
}

0 comments:

Post a Comment


Creative Commons License
This site uses Alex Gorbatchev's SyntaxHighlighter, and hosted by herdingcode.com's Jon Galloway.