How to Crash Exchange Using IIS Healthchecks

on Saturday, September 23, 2017

So, I had a bad week. I crashed a multi-server, redundant, highly available Exchange Server setup using the IIS healthchecks of a single website in Dev and Test (not even Prod).

How did I do this? Well …

  • Start with a website that is only in Dev & Test; and hasn’t moved to Prod.
    • All of the database objects are only in Dev & Test.
  • Do a database refresh from Prod and overlay Dev & Test.
    • The database refresh takes 2 hours, but for the next 17 hours the Dev & Test environments don’t have the database objects available to them, because those objects weren’t part of the refresh.
  • So, now you have 19 hours of a single website being unable to properly make a database call.
  • Why wasn’t anyone notified? Well, that’s all on me. It was the Dev & Test version of the website, and I was ignoring those error messages (those many, many error messages).
  • Those error messages were from ELMAH. If you use ASP.NET and don’t know ELMAH, then please learn about it; it’s amazing!
    • In this case, I was using ELMAH with WebAPI, so I was using the Elmah.Contrib.WebAPI package. I’m not singling them out as a problem, I just want to spread the word that WebAPI applications need to use this package to get error reporting.
  • Finally, you have the IIS WebFarm Healthcheck system.
    • The IIS WebFarm healthcheck system is meant to help a WebFarm route requests to healthy application servers behind a proxy. If a single server is having a problem, then requests are no longer routed to it and only the healthy servers are sent requests to process. It’s a really good idea.
    • Unfortunately, … (You know what? … I’ll get back to this below)
    • Our proxy servers have around 215 web app pools.
    • The way IIS healthchecks are implemented, every one of those web app pools will run the healthchecks on every web farm. So, this one single application gets 215 healthchecks every 30 seconds (the default healthcheck interval).
    • That’s 2 healthchecks per minute, by 215 application pools …
    • Or 430 healthchecks per minute … per server
    • Times 3 servers (1 Dev & 2 Test Application Servers) … 1290 healthchecks per minute
    • Times 60 per hour, times 19 hours … 1,470,600 healthchecks in 19 hours.
  • Every one of the 1,470,600 healthchecks produced an error, and ELMAH diligently reported every one of those errors. (First email type)
  • Now for Exchange
    • Even if we didn’t have a multi-server, redundant, highly available Exchange server, 1.5 million emails would have probably crashed it.
    • But, things got crazier because we have a multiple server, redundant, highly available setup.
    • So, the error emails went to a single recipient, me.
    • And, eventually my Inbox filled up (6 GB limit on my Inbox), which started to produce response emails saying “This Inbox is Full”. (Second email type)
    • Well … those response emails went back to the sender … which was a fake email address I used for the website (it’s never supposed to be responded to).
    • Unfortunately, that fake email address has the same domain as my account, which sent all the responses back to the same Exchange server.
    • Those “Inbox is Full” error messages then triggered Exchange to send back messages that said “This email address doesn’t exist”. (Third email type)
    • I’m not exactly sure how this happened, but there were a number of retry attempts on the [First Email Type] which again re-triggered the Second and Third email types. I call the retries the (Fourth email type).
    • Once all of the error messages get factored into the equation, the 1.5 million healthcheck emails generated 4.5 million healthcheck and SMTP error emails.
    • Way before we hit the 4.5 million mark, our Exchange server filled up …
      • Its database
      • The disk on the actual Exchange servers
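The multiplication in the bullets above can be checked with a few lines (Python here just for the arithmetic; the numbers are the ones stated in the list):

```python
# The incident math, step by step, using the numbers from the bullets above.
interval_s = 30      # default healthcheck interval
app_pools = 215      # web app pools on each proxy server
servers = 3          # 1 Dev + 2 Test application servers
hours = 19           # window where the database objects were missing

checks_per_minute_per_pool = 60 // interval_s                    # 2 healthchecks per minute
per_server_per_minute = app_pools * checks_per_minute_per_pool   # 430 per server per minute
all_servers_per_minute = per_server_per_minute * servers         # 1290 across the 3 servers
total_checks = all_servers_per_minute * 60 * hours

print(total_checks)  # 1470600 healthchecks in 19 hours
```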

So, I don’t really understand Exchange too well. I’m trying to understand this diagram a little better. One thing that continues to puzzle me is why the Exchange server sent out error emails to “itself”. (My email address and the ELMAH sender address are on the same domain … so the error emails were sent to a domain which that Exchange server owns.) Or does it …

  • So, from the diagram, consultation, and my limited understanding … our configuration is this:
    • We have a front end email firewall that owns the MX record (DNS routing address) for our domain.
      • The front end email firewall is supposed to handle external email DDOS attacks and ridiculous spam emails.
    • We have an internal Client Access Server / Hub Transport Server which takes in the ELMAH emails from our applications and routes them into the Exchange Servers.
    • We have 2 Exchange servers with 2 Databases behind them, which our email inboxes are split across.
    • So, the flow might be (again, I don’t have this pinned down)
      • The application sent the error email to the Client Access Server
      • The Client Access Server queued the error email and determined which Exchange server to process it with (let’s say Exchange1)
      • Exchange1 found that the mailbox was full and, per SMTP, it needed to send an “Inbox is full” error message. Exchange1 looked up the MX record for where to send it and found that it needed to go to the Email Firewall. It sent it …
      • The Email Firewall then found that the sender wasn’t an actual address and maybe sent it to Exchange2 for processing?
      • Exchange2 found it was a fake address and sent back a “This address doesn’t exist” email, which went back to the Email Firewall.
      • The Email Firewall forwarded the email or dropped it?
      • And, somewhere in all this mess, the emails that couldn’t be delivered to my real address because my “Inbox was full” got put into a retry queue … in case my inbox cleared up. And, this helped generate more “Inbox is full” and “This address doesn’t exist” emails.
  • Sidenote: I said above “One thing that continues to puzzle me is the why the Exchange server sent out error emails to “itself”. ”
    • I kinda get it. Exchange does an MX lookup for the domain and finds the Email Firewall as the IP address, which isn’t itself. But …
    • Shouldn’t Exchange know that it owns the domain? Why does it need to send the error email at all?
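As a rough sanity check on the 4.5 million figure, here is a deliberately simplified model. The 1:1 multipliers are my assumption for illustration (each original error email produces exactly one “Inbox is Full” response, and each of those produces one “address doesn’t exist” message; the retry traffic, the fourth email type, is excluded):

```python
# Simplified bounce-amplification model of the incident. The multipliers are
# illustrative assumptions, not measured values; retries are not modeled.
healthcheck_errors = 1_470_600                 # first email type (ELMAH reports)
inbox_full_responses = healthcheck_errors      # second type: one per original
bad_address_responses = inbox_full_responses   # third type: one per response

total_emails = healthcheck_errors + inbox_full_responses + bad_address_responses
print(total_emails)  # 4411800 -- already near the ~4.5 million mark before retries
```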

So … the biggest problem in this whole equation is me. I knew that IIS had this healthcheck problem beforehand. And, I had even created a support ticket with Microsoft to get it fixed (which they say has been escalated to the Product Group … but nothing has happened for months).

I knew of the problem, I implemented ELMAH, and I completely forgot that the database refresh would wipe out the db objects which the applications would need.

Of course, we/I’ve now gone about implementing fixes, but I want to dig into this IIS Healthcheck issue a little more. Here’s how it works.

  • IIS has a feature called ARR (Application Request Routing)
    • It’s used all the time in Azure. You may have set up a Web App, which requires an “App Service”. The App Service is actually a proxy server that sits in front of your Web App. The proxy server uses ARR to route the requests to your Web App. But, in Azure they literally create a single proxy server for your single web application server. If you want to scale up and “move the slider”, more application servers are created behind the proxy. BUT, in Azure, the number of Web Apps that can sit behind an App Service/Proxy Service is very limited (less than 5). <rant>Nowhere in the IIS documentation do they tell you to limit yourself to 5 applications; and the “/Build conference” videos from the IIS team make you believe that IIS is meant to handle hundreds of websites.</rant>
  • We use ARR to route requests for all our custom made websites (~215) to the application servers behind our proxy.
  • ARR uses webfarms to determine where to route requests. The purpose of the webfarms is to have multiple backend Application Servers, which handle load balancing.
  • The webfarms have a Healthcheck feature, which allows the web farms to check if the application servers behind the proxy are Healthy. If one of the application servers isn’t healthy then it’s taken out of the pool until it’s healthy again.
    • I really like this feature and it makes a lot of sense.
    • So, every application pool that runs on the frontend proxy server, loads the entire list of webfarms into memory.
    • If any of those webfarms happens to have a healthcheck url, then that application pool will consider itself the responsible party for checking that healthcheck url.
    • So, if a healthcheck url has a healthcheck interval of 30 seconds …
    • And a proxy server has 215 application pools on it; then that is 215 healthchecks every 30 seconds.
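In other words, the healthcheck load on a backend server scales with the number of app pools on the proxy, not with the number of web farms as the design intends. A quick model of that amplification (Python for illustration; the function name is mine):

```python
def checks_per_minute(app_pools: int, interval_s: int = 30) -> float:
    """Healthchecks a single backend server receives per minute when every
    app pool on the proxy runs the farm's healthcheck itself."""
    return app_pools * (60 / interval_s)

intended = checks_per_minute(app_pools=1)    # the design: one checker -> 2.0/min
actual = checks_per_minute(app_pools=215)    # the implementation -> 430.0/min
print(actual / intended)  # 215.0x amplification, one factor per app pool
```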

I think the design of the Healthcheck feature is great. But, the IMPLEMENTATION is flawed. HEALTHCHECKS ARE NOT DESIGNED THE WAY THEY ARE IMPLEMENTED.

Of course I’ve worked on other ways to prevent this problem in the future. But, IIS NEEDS TO FIX THE WAY HEALTHCHECKS ARE IMPLEMENTED.

I get bothered when people complain without a solution, so here’s the solution I propose:

  • Create a new xmlnode in the <webfarm> section of applicationHost.config which directly links webfarms to application pools.
  • Example (sorry, I’m having a lot of problems getting code snippets to work in this version of my LiveWriter)
<webfarm enabled="true" name="">
  <applicationpool name="" />
  <server enabled="true" address="" />
  <protocol reverserewritehostinresponseheaders="false" timeout="00:00:30">
    <cache enabled="false" querystringhandling="Accept" />
  </protocol>
  <affinity cookiename="" usecookie="true" />
  <loadbalancing algorithm="WeightedRoundRobin" />
</webfarm>

Healthchecks Should Not Be Pings

on Saturday, December 17, 2016

I had a long-held belief that health checks should just be pings. “Is the website up?” And for years, that was right. Not anymore.

Recently, a developer asked me if he should use health checks to ensure that the Entity Framework cache stays in memory. It took me a while to disassociate health checks from pings, but he was right. YES, you should use health checks to ensure the health of your site.

You should use health checks to do this:

  • Ensure your site is up and running (ping)
  • Ensure all cached values are available and, if possible, at the latest value.
  • Ensure Entity Framework’s cache is hit before your first user
    • EF is a total hog of resources and a complete slowdown on first hit
  • Same thing for WCF
  • Cache any application specific values needed before first hit

Health checks should not be pings. They should check the entire health of the site and its responsiveness: the cache, the database connectivity, and everything that makes a website work. It’s a “health check”, not a ping.
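The idea above can be sketched as a handler that runs a set of named checks and reports each one’s status. This is a minimal illustration in Python (the check names and the failing database check are hypothetical; a real ASP.NET site would wire the same pattern into its healthcheck endpoint):

```python
from typing import Callable, Dict

def run_health_check(checks: Dict[str, Callable[[], None]]) -> Dict[str, str]:
    """Run every named check; a check signals failure by raising."""
    results = {}
    for name, check in checks.items():
        try:
            check()
            results[name] = "healthy"
        except Exception as exc:
            results[name] = f"unhealthy: {exc}"
    return results

def ping() -> None:
    pass  # if this code runs at all, the site is up

def database() -> None:
    raise ConnectionError("db down")  # stand-in for a failed connectivity test

status = run_health_check({"ping": ping, "database": database})
print(status)  # {'ping': 'healthy', 'database': 'unhealthy: db down'}
```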

Tyk in Docker on Windows 10

on Sunday, October 16, 2016

I’m very new to all this technology so, please take this with a grain of salt. The reason I’m writing it is because I couldn’t find another guide that had end-to-end setup on Tyk in Docker on Windows 10.

Tyk is an API Gateway product that can be used to provide a centralized location for managing many services/microservices. It is built on top of the nginx web server, and nginx is really only supported as a “server” product on *nix based systems; its Windows build is considered a beta.

So, there are already some good guides for each of the next steps, I’m just gonna pull them all together, and add one extra piece at the end.

Install Docker

There are a couple of ways to get around the limitation of nginx only being “production ready on *nix”, but I chose to try out Tyk on Docker. Docker is the multiplatform container host that has created a lot of buzz within the cloud space. But, it also seems pretty awesome at setting up small containers on your local machine too.

Note: At this time, 2016-10-16, if you download a Docker for Windows installer, use the Beta Channel. The stable channel has a bug when trying to mount volumes into containers.

The Docker installation wizard is pretty straightforward, so no worries there. Once installed, right-click on the Docker systray icon and select Open Kitematic …


A pop-up window should come up containing instructions on how to download and install Kitematic. It was amazingly simple and gave a nice GUI interface over the command line.

Follow Tyk’s Installer Instructions

Tyk provides instructions to setup the API gateway & dashboard with Docker on their website. I would suggest getting an account at Docker Hub. I don’t remember when in the process I created one, but I needed it to access … something.

In Step 2. Get the quick start compose files you’ll need to git clone the files to a folder under your C:\Users\XXXX folder. For me, Docker had a permissions restriction that only allowed containers to mount volumes from folders under my user folder. (So, that could be interesting if you run a container on a server under a service account.)

The silver lining about this set of containers is that they only need to use config files from your local drive. So, it’s not like your C:\Users folder is going to store a database.

In Step 4. Bootstrap your dashboard and portal, if you have bash available to you I would suggest using it to run the setup script. I hadn’t installed the Win10 Anniversary Update, Git Bash, or Cygwin, so I didn’t have bash available to run it.

However, I do feel somewhat comfortable in PowerShell, and the script didn’t look too long. Below is the PowerShell conversion, which should be saved in the same directory as the original script; run .\setup.ps1 from the PowerShell ISE with the arguments that you want.

After that, I had a running Tyk API Gateway.


Other Thoughts

Since this was all new technology to me, I ran into a lot of errors and read through a lot of issue/forum posts. Which makes me think this might not be the best idea for a production setup. If you’re able to make Linux servers within your production environment, I would strongly suggest that instead.

Because I made so many mistakes, I got used to these three commands, which really helped recreate the environment whenever I messed things up. I hope this helps.


on Friday, April 1, 2016

In a previous post I forgot to include the PowerShell code for Get-FullDomainAccount. Sorry about that.

Here it is:

Function Get-FullDomainAccount {
<#
.SYNOPSIS
	Ensures that the given domain account also has the domain prefix. For example,
	if the -DomainAccount is "IUSR_AbcXyz" then "<your domain>\IUSR_AbcXyz" would most likely
	be returned. The domain is pulled from the current user's domain, $env:USERDOMAIN.

	If -Environment is provided, this will also run the -DomainAccount through
	Get-EnvironmentDomainAccount to replace any environment specific information.

.PARAMETER Environment
	Used to apply environment specific values to the domain account

.EXAMPLE
    $result = Get-FullDomainAccount -DomainAccount "IUSR_AbcXyz"
    $result -eq "<your domain>\IUSR_AbcXyz"
#>
Param (
	[string] $DomainAccount,
	[string] $Environment = ""
)
	$accountName = $DomainAccount;

	if($Environment -ne "") {
        $accountName = Get-EnvironmentDomainAccount -Environment $Environment -DomainAccount $DomainAccount;
    }

    if($accountName -match "ApplicationPoolIdentity") {
        $accountName = "IIS AppPool\$accountName"
    }

    if($accountName -match "LocalSystem") {
        $accountName = "$($env:COMPUTERNAME)\$accountName"
    }

	if($accountName -notmatch "\\") {
		$accountName = $env:USERDOMAIN + "\" + $accountName;
	}
	return $accountName;
}
WebAdministration Not Loaded Correctly on Remote

on Friday, November 14, 2014

When making remote calls that use the WebAdministration module you can sometimes get this error, inconsistently:

ERROR: Get-WebSite : Could not load file or assembly 'Microsoft.IIS.PowerShell.Framework' or one of its dependencies. The system cannot find the file specified.

It’s a really tricky error because it’s inconsistent. But, there is a workaround that will prevent the error from giving you too much trouble. From the community troubleshooting on this, the problem seems to occur on the first call that uses the WebAdministration module. If you wrap that call in a try/catch, then subsequent calls will work correctly.

$scriptBlock = {
    Import-Module WebAdministration

    try {
        $sites = Get-WebSite
    } catch {
        # the first call can fail intermittently; the retry will succeed
        $sites = Get-WebSite
    }
    $sites
}

Invoke-Command -ScriptBlock $scriptBlock -ComputerName Remote01

PowerShell AppPool Assignment Problems

on Friday, November 7, 2014

The WebAdministration module has a provider called IIS:. It essentially acts like a drive letter or a URI protocol. It’s really convenient and makes accessing app pool, site, and SSL binding information easy.

I recently noticed two problems with assigning values through the IIS: provider or the objects which it works with:

StartMode Can’t Be Set Directly

For some reason, using Set-ItemProperty to set the startMode value directly throws an error. But, if you retrieve the appPool into a variable and set the value using an = operator, everything works fine.

ipmo webadministration

New-WebAppPool ""

Set-ItemProperty IIS:\AppPools\ startMode "AlwaysRunning" # throws an error

$a = Get-Item IIS:\AppPools\
$a.startMode = "AlwaysRunning"
Set-Item IIS:\AppPools\ $a # works

Here is the error that gets thrown:

Set-ItemProperty : AlwaysRunning is not a valid value for Int32.
At C:\Issue-PowershellThrowsErrorOnAppPoolStartMode.ps1:6 char:1
+ Set-ItemProperty IIS:\AppPools\ startMode "AlwaysRunning" # throws an e ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: (:) [Set-ItemProperty], Exception
    + FullyQualifiedErrorId : System.Exception,Microsoft.PowerShell.Commands.SetItemPropertyCommand


CPU’s resetInterval Can’t Directly Use New-TimeSpan’s Result

I think the example can show the problem better than I can describe it:

ipmo webadministration

New-WebAppPool ""

$a = Get-ItemProperty IIS:\AppPools\ cpu
$a.resetInterval = New-TimeSpan -Minutes 4 # this will throw an error
Set-ItemProperty IIS:\AppPools\ cpu $a

$a = Get-ItemProperty IIS:\AppPools\ cpu
$k = New-TimeSpan -Minutes 4 # this will work
$a.resetInterval = $k
Set-ItemProperty IIS:\AppPools\ cpu $a

Here is the error that gets thrown:

Set-ItemProperty : Specified cast is not valid.
At C:\Issue-PowershellThrowsErrorOnCpuLimitReset.ps1:8 char:1
+ Set-ItemProperty IIS:\AppPools\ cpu $a
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: (:) [Set-ItemProperty], InvalidCastException
    + FullyQualifiedErrorId : System.InvalidCastException,Microsoft.PowerShell.Commands.SetItemPropertyCommand

The links on each section correspond with bug reports for the issues, so hopefully they will get looked into.

PowerShell Wrapper for Http Namespaces

on Friday, October 31, 2014

When hosting HTTP WCF services as self-hosted Windows Services, the server needs to have the HTTP namespace reserved. The reservation allows the domain account which runs the service to set up a listener on a particular port, for a particular address.

There are some tools already available which can help in this process:

  • HTTP Namespace Manager – A nice GUI interface, which is easy to understand and setup. It also works on Server Core Servers.
  • httpcfg – Windows Server 2003
  • netsh – Windows Server 2008+

But, there are no PowerShell wrappers for these commands. So, here’s a wrapper that provides:

  • Add-HttpNamespace
  • Get-HttpNamespace
  • Get-HttpNamespaces
  • Test-HttpNamespaceExists

There’s no remove because I haven’t needed it yet. A namespace is usually associated with a particular port, and I haven’t been involved in a situation where a port needed to be reused.

<#
    Parses the output from netsh to turn it into PSObjects.
#>
Function Get-HttpNamespaces {

    # the flag handling below causes a lot of errors to occur, but the results are accurate.
    # so this helps hide the errors
    $originalErrorAction = $ErrorActionPreference
    $ErrorActionPreference = 'SilentlyContinue'

    try {

        # pull the data from netsh
        $urlaclOutput = . netsh http show urlacl

        # parse the data into PSObjects
        $httpNamespaces = New-Object System.Collections.Generic.List[PSObject]
        $props = @{}
        $userProps = @{}
        $users = New-Object System.Collections.Generic.List[PSObject]
        $userRdy = $false
        for($i = 0; $i -lt $urlaclOutput.Count; $i++) {
            $line = $urlaclOutput[$i].Trim()

            $split = $line.Split(":", [StringSplitOptions]::RemoveEmptyEntries)

            $first = ""
            if($split.Count -gt 0) { $first = $split[0] }

            # line parsing
            switch($first.Trim()) {
                "Reserved URL" {
                    $props.ReservedUrl = $line.Substring(25).Trim()
                    $users = New-Object System.Collections.Generic.List[PSObject]
                }
                "User" {
                    if($userRdy) {
                        $user = New-Object PSObject -Property $userProps
                        $users.Add($user)
                        $userProps = @{}
                        $userRdy = $false
                    }
                    $userProps.User = $split[1].Trim()
                }
                "Listen" { $userProps.Listen = $split[1].Trim() }
                "Delegate" {
                    $userProps.Delegate = $split[1].Trim()
                    $userRdy = $true
                }
                "SDDL" {
                    $userProps.SDDL = $line.Substring(5).Trim()
                    $userRdy = $true
                }
                "" {
                    if($userRdy) {
                        # user
                        $user = New-Object PSObject -Property $userProps
                        $users.Add($user)
                        $userProps = @{}

                        # url
                        $props.Users = $users.ToArray()
                        $cnObj = New-Object PSObject -Property $props
                        $httpNamespaces.Add($cnObj)
                        $props = @{}

                        # reset flag
                        $userRdy = $false
                    }
                }
            }
        }
    } finally {
        $ErrorActionPreference = $originalErrorAction # revert the error action
    }

    return $httpNamespaces.ToArray()
}

<#
    Retrieves the namespace information for a given namespace. It will also search for namespaces
    which match but the host names have been replaced with + or * symbols.
#>
Function Get-HttpNamespace {
Param (
    [Parameter(Mandatory = $true)]
    [string] $HttpNamespace
)

    $httpNamespaces = Get-HttpNamespaces

    # get * and + versions of the url ready
    $starNamespace = $HttpNamespace
    $plusNamespace = $HttpNamespace
    $namespaceRegex = [regex] "http.*://(.*):.*/.*"
    if($HttpNamespace -match $namespaceRegex) {
        $hostname = $Matches[1]
        $starNamespace = $HttpNamespace.Replace($hostname, "*")
        $plusNamespace = $HttpNamespace.Replace($hostname, "+")
    }

    # sometimes the http namespaces get /'s added to the end
    $namespace = $httpNamespaces |? {
                            $_.ReservedUrl -eq $HttpNamespace `
                    -or     $_.ReservedUrl -eq ($HttpNamespace + '/') `
                    -or     $_.ReservedUrl -eq $starNamespace `
                    -or     $_.ReservedUrl -eq ($starNamespace + '/') `
                    -or     $_.ReservedUrl -eq $plusNamespace `
                    -or     $_.ReservedUrl -eq ($plusNamespace + '/')
    }

    return $namespace
}

<#
    Checks if a namespace already exists. It will also search if the namespace has had its host name
    replaced with + or * symbols.
#>
Function Test-HttpNamespaceExists {
Param (
    [Parameter(Mandatory = $true)]
    [string] $HttpNamespace
)

    $namespace = Get-HttpNamespace $HttpNamespace

    return $namespace -ne $null
}

<#
    Adds a new Http Namespace. This will automatically swap out the host name for a + symbol. The
    + symbol allows the Http Namespace to bind on all NIC addresses.
#>
Function Add-HttpNamespace {
Param (
    [Parameter(Mandatory = $true)]
    [string] $HttpNamespace,
    [Parameter(Mandatory = $true)]
    [string] $DomainAccount
)

    $create = $true
    if(Test-HttpNamespaceExists $HttpNamespace) {
        # it already exists, so maybe not create it
        $create = $false

        $namespace = Get-HttpNamespace $HttpNamespace
        # but, if the given DomainAccount doesn't exist then create it
        $user = $namespace.Users |? { $_.User -eq $DomainAccount }
        if($user) {
            Write-Warning "NET $env:COMPUTERNAME - Http Namespace '$HttpNamespace' already contains a rule for '$DomainAccount'. Skipping creation."
        } else {
            $create = $true
        }
    }

    if($create) {
        # the standard pattern to use is http://+:port/servicename.
        #   eg. http://contoso01:15110/EmployeeService would become http://+:15110/EmployeeService
        $plusNamespace = $HttpNamespace
        $namespaceRegex = [regex] "http.*://(.*):.*/.*"
        if($HttpNamespace -match $namespaceRegex) {
            $hostname = $Matches[1]
            $plusNamespace = $HttpNamespace.Replace($hostname, "+")
        } else {
            throw "NET $env:COMPUTERNAME - Http Namespace '$HttpNamespace' could not be parsed into plus format before being added. Plus format " + `
                "looks like http://+:port/servicename. For example, http://contoso01:15110/EmployeeService would be formatted into " + `
                "http://+:15110/EmployeeService."
        }

        # ensure the full domain account name is used
        $fullDomainAccount = Get-FullDomainAccount $DomainAccount

        # create the permission
        Write-Warning "NET $env:COMPUTERNAME - Adding Http Namespace '$HttpNamespace' for account '$fullDomainAccount'"
        $results = . netsh http add urlacl url=$plusNamespace user=$fullDomainAccount listen=yes delegate=yes
        Write-Host "NET $env:COMPUTERNAME - Added Http Namespace '$HttpNamespace' for account '$fullDomainAccount'"
    }

    $namespace = Get-HttpNamespace $HttpNamespace
    return $namespace
}

Creative Commons License
This site uses Alex Gorbatchev's SyntaxHighlighter, and is hosted by Jon Galloway.