Book Review?: Team Topologies

on Monday, December 30, 2019

Matthew Skelton created a blog post/website describing team topologies a few years ago, and with the help of others has created a book from it (Amazon, Audible).

The book breaks down teams into 4 archetypes (supplemental materials at itrevolution.com):

  • Stream Aligned Team
  • Enabling Team
  • Complex Subsystem Team
  • Platform Team

And 3 Interaction Modes:

  • Collaboration
  • X-as-a-Service
  • Facilitating

The book describes each of these team types and their usage patterns.

I ran across the website, which shows the Anti-Patterns and Patterns of Team Topologies, about a year ago. At the time I was trying to answer coworkers’ questions about “What should our teams look like if we do DevOps?”, and the website left me with a lot of guesswork about the details of each pattern. I don’t think the original intent of the website was to be confusing, but it explored an area I wanted to know more about, so it left me with more questions than answers. And this book didn’t directly answer those questions either.

It seemed to expand on the original work with new ideas and new thoughtful constructs. The team archetypes lined up well with what I had seen in my own work, and also with my prior understanding of DevOps / Lean Project Management methodologies, all of which are well described in the books from itrevolution.com and other publishers. (This is not an advertisement; these books have well established and useful knowledge within them.)

For the most part, the Stream Aligned Teams make a lot of sense to me because I have seen working examples of them. At work, we have multiple teams which sit within distinct departments and work on the applications/projects/products for those departments.

However, the Enabling and Platform teams seemed like they could be much more intertwined than what the book described. I personally feel like I work within a platform team, but the platforms that I help provide are only useful if (1) they are developed in collaboration with a Stream Aligned Team and (2) the platform product is demonstrated/shared/knowledge-transferred to all Stream Aligned Teams. To me, #2 seems like the function of an Enabling Team. So, it feels like a company that wants to establish a permanent Enabling Team would need its members to be in constant rotation with the Platform teams in order to:

  • Keep Enabling Team members up to date on the latest utilities developed by the Platform teams
  • Keep the Enabling Team members’ skills sharp on implementation details of the Platform team products
  • Allow for Enabling Team members to develop new functionality into the Platform without having to cross a communication/trust boundary (fighting Conway’s Law)

I think that the Enabling and Platform teams might have been separated in the book due to the need for the Enabling Team members to have a higher degree of interpersonal communication and collaborative skills. To be very blunt, the Silicon Valley archetypes of system administrators didn’t come out of thin air (1, 2), and Platform teams need good System Administrators.

So, maybe it’s the more evangelist personality types of the Platform Teams that can rotate into the Enabling Teams.

The only team type that I never fully connected with was the Complex Subsystem Team. I thought of some potential example projects which could fit the description, but those teams were ad hoc and my example cases never felt quite right. Maybe I just don’t work someplace that has difficult enough problems to require such teams. The book is very clear that these teams are optional and would not be needed in all work environments.

In the end, I think the book adds to the overall body of literature on DevOps, and it can really help a group that is scratching its head about what sort of team structure could improve its organization.

Update .NET Core Runtime Installer for SDK

on Monday, December 23, 2019

A while back, I wrote a post titled Monitor and Download Latest .NET Core Installer. That post was about a script which could be used to monitor the aspnet/aspnetcore releases page and, if a new release came out, download the installer for the .NET Hosting Bundle. A piece of code I didn’t include in that post was a script that then took the downloaded .exe installer and created an installation package from it. That installation package was targeted at servers that host ASP.NET Core web applications. This post is not about that secondary script.

Instead, this is a small tweak to the monitor/download script: instead of downloading the .NET Hosting Bundle, it downloads the SDK. And the installation package that it eventually creates (not included in this post) is targeted at build servers.
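The full script isn’t reproduced here, but the heart of the tweak is just the link filter. A minimal sketch, assuming the script has already gathered the release page’s download links into a `$links` array and has a `$downloadFolder` variable (both names are placeholders):

```powershell
# Original behavior: grab the Hosting Bundle installer (e.g. dotnet-hosting-2.2.7-win.exe)
$hostingLink = $links | Where-Object { $_ -match 'dotnet-hosting-.*-win\.exe$' } | Select-Object -First 1

# Tweaked behavior: grab the SDK installer instead (e.g. dotnet-sdk-2.2.402-win-x64.exe)
$sdkLink = $links | Where-Object { $_ -match 'dotnet-sdk-.*-win-x64\.exe$' } | Select-Object -First 1

Invoke-WebRequest -Uri $sdkLink -OutFile (Join-Path $downloadFolder (Split-Path $sdkLink -Leaf))
```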

Create a Custom ProblemDetailsFactory

on Monday, December 16, 2019

An Exception Handler is also needed for this to work. You can read more within the follow-up post, ExceptionHandler Needed.

In .NET Core 2.2/3.0 the ASP.NET Team made a move towards a validation and error reporting standard called Problem Details (RFC 7807). I don’t know much about the history of the standard except for what’s listed in its description. It became a proposed standard in March 2016 (which means it was probably being developed for years before that), and it was sponsored by M. Nottingham (W3C Technical Architecture Group), Akamai (they’re pretty famous) and E. Wilde (Siemens).

This standardization also lines up with something David Fowler has been talking about for a little while (1, 2, 3): Distributed Systems Tracing. From an outsider’s perspective it really feels like many of the teams at Microsoft are trying their best to get more observability, metrics, and tracing into their tooling. And Microsoft seems to be using the “extensions” described in the Problem Details RFC to add a new property called “traceId”. I think this property lines up with a larger effort by Microsoft to support OpenTelemetry (Microsoft reference) and potential improvements to Application Insights.

So … Microsoft has these great baseline ProblemDetail objects which help standardize the way 500 and 400 errors are returned from Web APIs. But, how can you extend upon their work to add some customizations that are particular to your needs?

Well, when you read the Microsoft Handle errors in ASP.NET Core web APIs documentation, you feel like it must be pretty easy because they say you just need to “Implement ProblemDetailsFactory”. But that’s all the documentation does; it just “says” you should implement it, and there is no example code to work from. The example that is given shows how to replace the default factory with your custom factory (which is a great example, thank you!), but there’s no example of what your factory could look like.

This leads to the question of “How does Microsoft do it?”. Well … they use an internal (non-public) DefaultProblemDetailsFactory.

It would be great if DefaultProblemDetailsFactory could be made public.

One of the striking features of that default implementation is that it never references System.Exception. Its job is to translate an uncaught exception into a 500 Internal Server Error response object, yet it never uses an exception object in its code.

Maybe that’s because the translation happens earlier in the process, like in ProblemDetailsClientErrorFactory. I really don’t know how it all connects. The original developers are some pretty smart people to get it all working.

Anyways … for this example, I’m going to:

  • Use the DefaultProblemDetailsFactory as the starting code to extend.
  • Create a custom class which the Factory will look for in order to alter the returned ProblemDetails object.
  • Use a Feature on the httpContext to pull in Exception information (I don’t know how else to get the exception object?)
  • Use the ProblemDetailsFactoryTest class from Microsoft to help build a unit test.
  • Update the unit test to inject the exception.

Let’s start with the custom class (YourBussException.cs) that will be used by our custom factory to extend the returned data. The class will:

  • Use the underlying Exception’s Message property to fill in the “Detail” property of the ProblemDetail object.
  • Add an ExtendedInfo object where your development teams can add extra information that can help inform API clients on how to resolve the issue.

Next we'll make some small updates to the factory in order to create one that will translate our exception into a 400 Bad Request response (YourBussProblemDetailsFactory.cs):
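Again, the embedded gist is missing, so here is a simplified sketch (not the post’s original code) of a factory that follows the shape of DefaultProblemDetailsFactory and adds the YourBussException handling. It pulls the exception from the IExceptionHandlerFeature that the exception handler middleware populates:

```csharp
using Microsoft.AspNetCore.Diagnostics;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using Microsoft.AspNetCore.Mvc.Infrastructure;
using Microsoft.AspNetCore.Mvc.ModelBinding;

public class YourBussProblemDetailsFactory : ProblemDetailsFactory
{
    public override ProblemDetails CreateProblemDetails(
        HttpContext httpContext,
        int? statusCode = null,
        string title = null,
        string type = null,
        string detail = null,
        string instance = null)
    {
        // The exception handler middleware stashes the uncaught exception in this feature.
        var exception = httpContext?.Features.Get<IExceptionHandlerFeature>()?.Error;

        var problemDetails = new ProblemDetails
        {
            Status = statusCode ?? StatusCodes.Status500InternalServerError,
            Title = title,
            Type = type,
            Detail = detail,
            Instance = instance,
        };

        if (exception is YourBussException bussException)
        {
            // Business exceptions are the client's problem, not the server's.
            problemDetails.Status = StatusCodes.Status400BadRequest;
            problemDetails.Title = "Bad Request";
            problemDetails.Detail = bussException.Message;

            if (bussException.ExtendedInfo != null)
            {
                problemDetails.Extensions["extendedInfo"] = bussException.ExtendedInfo;
            }
        }

        problemDetails.Extensions["traceId"] = httpContext?.TraceIdentifier;

        return problemDetails;
    }

    public override ValidationProblemDetails CreateValidationProblemDetails(
        HttpContext httpContext,
        ModelStateDictionary modelStateDictionary,
        int? statusCode = null,
        string title = null,
        string type = null,
        string detail = null,
        string instance = null)
    {
        // Validation errors get the same treatment DefaultProblemDetailsFactory gives them,
        // just without the internal helper classes.
        return new ValidationProblemDetails(modelStateDictionary)
        {
            Status = statusCode ?? StatusCodes.Status400BadRequest,
            Title = title ?? "One or more validation errors occurred.",
            Type = type,
            Detail = detail,
            Instance = instance,
        };
    }
}
```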

Alternatively, you can use a Decorator Pattern to wrap the default InvalidModelStateFactory as described in AspNetCore.Docs/issue/12157, How to log automatic 400 responses on model validation errors, option #2 (using PostConfigure<>). The concern I have with that approach is that you are no longer using the Dependency Injection system to create your factory. You are hand-creating an instance of the factory, and that instance is no longer easily referenceable by any code that wants to interact with it. This also makes the code more brittle to changes and less testable.

Finally, we can use the example code from Microsoft Handle errors in ASP.NET Core web APIs documentation, to swap in our new YourBussProblemDetailsFactory (Startup.cs):
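The docs’ example boils down to a DI registration in ConfigureServices; a sketch:

```csharp
using Microsoft.AspNetCore.Mvc.Infrastructure;
using Microsoft.Extensions.DependencyInjection;

public void ConfigureServices(IServiceCollection services)
{
    services.AddControllers();

    // Replace the internal DefaultProblemDetailsFactory with the custom one.
    services.AddTransient<ProblemDetailsFactory, YourBussProblemDetailsFactory>();
}
```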

Now, you should be able to throw your exception from anywhere in the code and have it translated back as a 400 error:
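For example, a hypothetical controller like this would produce a Problem Details 400 once the exception handler and the factory are wired up:

```csharp
using Microsoft.AspNetCore.Mvc;

[ApiController]
[Route("api/[controller]")]
public class WidgetsController : ControllerBase
{
    [HttpGet("{id}")]
    public ActionResult<string> Get(int id)
    {
        if (id <= 0)
        {
            // Translated by YourBussProblemDetailsFactory into a 400 ProblemDetails response.
            throw new YourBussException($"'{id}' is not a valid widget id.");
        }

        return $"widget-{id}";
    }
}
```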

Some things to take note of:

  • The ProblemDetailsFactory class was introduced in ASP.NET Core 3.0. So, you’ll have to update your target framework to ‘netcoreapp3.0’ or above to make this work.
  • You’ll also need to add in a FrameworkReference to ‘Microsoft.AspNetCore.App’ as the Microsoft.AspNetCore.Mvc.Infrastructure namespace only exists within it. And you can only get that reference through the FrameworkReference (as opposed to nuget packages). See Migrate from ASP.NET Core 2.2 to 3.0 for an example.
  • The null-coalescing assignment operator (??=) only compiles under C# 8.0. So, if your project, or your referenced projects, depend on ‘netstandard2.0’ or ‘netcoreapp2.X’ then you’ll need to update them to get the compiler to work (this took a while to figure out). (<-- That’s right, your referenced projects have to update too; it’s really non-intuitive.)

Finally, let’s take a look at a unit test. I’m going to make this code sample a bit short. To make the full example work, you will need to copy all of these internal classes into your testing code:

This is the code snippet needed just for the test (YourBussProblemDetailsFactoryTests.cs):
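The post’s original test built on Microsoft’s internal ProblemDetailsFactoryTest helpers, which aren’t reproduced here. A simplified xUnit sketch against the factory sketched above, injecting the exception through the same feature the factory reads, might look like this:

```csharp
using Microsoft.AspNetCore.Diagnostics;
using Microsoft.AspNetCore.Http;
using Xunit;

public class YourBussProblemDetailsFactoryTests
{
    [Fact]
    public void CreateProblemDetails_WithYourBussException_Returns400WithDetail()
    {
        // Arrange: stuff the exception into the feature collection, the same way
        // the exception handler middleware would during a real request.
        var httpContext = new DefaultHttpContext();
        httpContext.Features.Set<IExceptionHandlerFeature>(
            new ExceptionHandlerFeature { Error = new YourBussException("The widget id is invalid.") });

        var factory = new YourBussProblemDetailsFactory();

        // Act
        var problemDetails = factory.CreateProblemDetails(httpContext);

        // Assert
        Assert.Equal(StatusCodes.Status400BadRequest, problemDetails.Status);
        Assert.Equal("The widget id is invalid.", problemDetails.Detail);
    }
}
```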

An Exception Handler is also needed for this to work. You can read more within the follow-up post, ExceptionHandler Needed.

Using Swagger UI and ReDoc in ASP.NET Core 3.0

on Monday, December 9, 2019

I was working with the Swashbuckle.AspNetCore library and wanted to play around with both the Swagger-UI active documentation endpoint and the ReDoc active documentation endpoint at the same time. I wanted to compare the two to see which was easier to use. (It turns out they’re both really easy to use.)

But, I ran into a problem: both UI endpoints wanted to use the OpenAPI json document to pull in the Web API’s definition. And, the default configuration that comes with Swashbuckle.AspNetCore will generate that document at “swagger/v1/swagger.json”. (You can change this path using the .RouteTemplate property during configuration.)

https://your.domain.com/subapp/swagger/v1/swagger.json

This wasn’t a problem with the Swagger-UI system because the .UseSwaggerUI configuration has a default base path of “swagger”. This means all you need to do is call .SwaggerEndpoint(“v1/swagger.json”, “My API V1”) and it will find the correct path.

Swagger-UI’s default path is “swagger”, so adding “v1/swagger.json” will create “swagger/v1/swagger.json”.

However, the .UseReDoc configuration has a base path of “api-docs”, which means you shouldn’t be able to find the default json documentation path under it.

You shouldn’t be able to … but in a very interesting twist, the configuration does allow for relative pathing, which means you can use “..” to escape the default path. This was great! You can use a relative path, “../swagger/v1/swagger.json”, to reference it.

ReDoc’s default path is “api-docs”, so you can escape it using a relative path like “../swagger/v1/swagger.json”. This will resolve to the expected “swagger/v1/swagger.json”.

Here’s an example:
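The embedded example isn’t included here, but a minimal Configure sketch along the lines described above (document titles and versions are placeholders) could look like this:

```csharp
using Microsoft.AspNetCore.Builder;

public void Configure(IApplicationBuilder app)
{
    // Serves the OpenAPI document at the default route: swagger/v1/swagger.json
    app.UseSwagger();

    // Swagger-UI already lives under "swagger", so a path relative to that works as-is.
    app.UseSwaggerUI(c =>
    {
        c.SwaggerEndpoint("v1/swagger.json", "My API V1");
    });

    // ReDoc lives under "api-docs", so escape it with a relative path back to the document.
    app.UseReDoc(c =>
    {
        c.SpecUrl("../swagger/v1/swagger.json");
    });

    // ... the rest of the normal pipeline configuration ...
}
```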

Getting Root / Base Path in ASP.NET Core 3.0 (IIS)

on Monday, December 2, 2019

In my mind a root / base path for an IIS application looks something like this:

https://my.host.com/some/subdir

That format contains the scheme://host/base-path

For most web applications, this information isn’t important. And, for the most part, you should try to develop your applications to be agnostic of where they’re hosted. To do this you can use techniques like always using relative paths in your web application’s links (<a href=”../something/relative.html”>) and web api results. Of course, if you know a link is pointing to a resource outside of your application, then using an absolute path would make sense.

However, using relative paths isn’t always possible, and sometimes you will need to create an absolute path for your local application. I’ve run into this scenario when a nuget package or submodule wasn’t designed with relative paths in mind and required an absolute Uri/path as input.

When I googled for how to get the base path within an Asp.Net Core 3.0 application today, the top results were focused on getting the base path of a request, which is really easy; the HttpRequest/Message will contain that information and it’s always at your fingertips. But the scenario I had in mind was during Startup.cs, specifically during the .Configure(IApplicationBuilder app, IWebHostEnvironment env) method.

There is a good stackoverflow post, How to get base url without accessing a request, which demonstrates how to get the information in ASP.NET Core 2.X. It showed that there is an internal feature called IServerAddressesFeature which contains the full root / base path information (like the example at the start of the post). In ASP.NET Core 2.X, that service/feature was accessible from the IWebHost object that could be found in the Program.cs class.

In 3.0, IWebHost has been replaced by IHost, and the .ServerFeatures property is not available on it. But that’s because it’s even easier to get ahold of. When the IHost.Run() method is called in Program.cs, it will (eventually) call Startup.Configure(IApplicationBuilder app, IWebHostEnvironment env). And the ServerFeatures property is exposed on the IApplicationBuilder object, so you can retrieve the IServerAddressesFeature information directly within the .Configure method.

For example:
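The embedded snippet isn’t included here, but the retrieval itself is roughly this (a sketch):

```csharp
using System.Linq;
using Microsoft.AspNetCore.Hosting.Server.Features;

// Inside Startup.Configure(IApplicationBuilder app, IWebHostEnvironment env):
var addressFeature = app.ServerFeatures.Get<IServerAddressesFeature>();
var baseAddress = addressFeature?.Addresses.FirstOrDefault();   // e.g. https://my.host.com/some/subdir
```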

And a full example:
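The full example isn’t embedded here either; a fuller sketch of how it might be used (the static BasePath holder is just an illustration, not the post’s original code):

```csharp
using System.Linq;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Hosting;
using Microsoft.AspNetCore.Hosting.Server.Features;

public class Startup
{
    // Hypothetical holder so non-request code (packages, submodules) can get the base path later.
    public static string BasePath { get; private set; }

    public void Configure(IApplicationBuilder app, IWebHostEnvironment env)
    {
        var addressFeature = app.ServerFeatures.Get<IServerAddressesFeature>();

        // Under IIS there is normally a single address; fall back to "/" if none is reported.
        BasePath = addressFeature?.Addresses.FirstOrDefault() ?? "/";

        // ... the rest of the normal pipeline configuration ...
    }
}
```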

Default Configurations for PS Modules

on Monday, November 25, 2019

A common problem with Powershell modules is that they need to be configured slightly differently when being used for different needs. For example, developers may want a module to use a local instance of a service in order to do development or testing. But, on a server, the module might be expected to connect to the instance of a service specific to that environment. These are two separate groups of users, but each has the same need: a default configuration that makes sense for them.

One way we’ve found to help make this a little more manageable is to create a standardized way for developers to set up local default configurations, while also creating an interface which can be used by service providers to set default configurations for use on the servers.

This comes about by standardizing on 4 functions:

  • Set-{ModuleName}Config –Environment [Prod|Test|Dev|Local]

    This is the function that most people will use. If you want to point the module at a particular environment’s services, use this function.

    For developers, this is useful to point the module at their most commonly used environment. For a service they help build and maintain, that would most likely be Local. But, for services they only consume, that is usually Prod.

    For module developers, this function can be used to set the default configuration for the module. In general, this defaults to Prod. If you’re not the developer of a service and you are going to use a Powershell module to interact with that service, you generally want to point it at Prod. That’s the most common use case, and module developers usually set up module defaults for the most common use case.

    For service developers that use the module within their services, this command is flexible enough for them to determine what environment their service is running in and set up the module to connect to the correct endpoints.
  • Save-{ModuleName}DefaultConfig

    This is mostly used by developers.

    Once you have the environment set up the way you want it, use the Save function to save the configuration locally to disk. We have had success saving this file under the user’s local folder (right next to their profile), so the settings are not machine wide, but user specific.

  • Restore-{ModuleName}DefaultConfig

    This function usually isn’t called by developers / end users.

    This function is called when the module loads and it will check if the user has a local configuration file. If it finds one, it will load the values into memory.

    Services usually don’t have a local configuration file.
  • Test-{ModuleName}Configured

    This function usually won't be called by the end user. It's used internally to determine if all the important properties are set up before saving them to disk.

To get people to adopt this strategy, you have to make it easy for module developers to add the functionality into their module. To do that there’s one more function:

  • Add-DefaultConfigToModule –ModuleName <ModuleName> –Path <Path>

    This will add 4 templated files to a module, one for each function. It will also update the .psm1 file to end with a call to Restore-{ModuleName}DefaultConfig.

Below is a very mashed together version of the files for the module.

The code assumes all the module configuration information is stored in $global:ModuleName.

And, these files are to be placed within a subdirectory of the DefaultConfig module called /resources/AddTemplate:
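Those files aren’t embedded here, but a rough sketch of what the four templated functions could look like for a hypothetical module named Widget (everything below is illustrative, not the original code):

```powershell
# Set-WidgetConfig.ps1 - point the module at a particular environment's endpoints.
function Set-WidgetConfig {
    param(
        [ValidateSet('Prod', 'Test', 'Dev', 'Local')]
        [string] $Environment = 'Prod'
    )

    if (-not $global:Widget) { $global:Widget = @{} }

    $serviceUrl = switch ($Environment) {
        'Prod'  { 'https://widget.example.com' }
        'Test'  { 'https://widget-test.example.com' }
        'Dev'   { 'https://widget-dev.example.com' }
        'Local' { 'http://localhost:5000' }
    }

    $global:Widget.Environment = $Environment
    $global:Widget.ServiceUrl  = $serviceUrl
}

# Test-WidgetConfigured.ps1 - make sure the important properties exist before saving.
function Test-WidgetConfigured {
    return -not [string]::IsNullOrWhiteSpace($global:Widget.ServiceUrl)
}

# Save-WidgetDefaultConfig.ps1 - persist the current settings next to the user's profile.
function Save-WidgetDefaultConfig {
    if (-not (Test-WidgetConfigured)) { throw 'The Widget module is not fully configured.' }

    $path = Join-Path (Split-Path $PROFILE) 'WidgetDefaultConfig.json'
    $global:Widget | ConvertTo-Json | Set-Content -Path $path
}

# Restore-WidgetDefaultConfig.ps1 - called at the end of the .psm1 when the module loads.
function Restore-WidgetDefaultConfig {
    $path = Join-Path (Split-Path $PROFILE) 'WidgetDefaultConfig.json'
    if (Test-Path $path) {
        $saved = Get-Content -Path $path -Raw | ConvertFrom-Json
        $global:Widget = @{}
        $saved.PSObject.Properties | ForEach-Object { $global:Widget[$_.Name] = $_.Value }
    }
}
```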

Submitting a Bug/Pull Request for Asp.Net Core

on Monday, November 18, 2019

It’s great that the Asp.Net Core team has made the project open source. It makes finding bugs, resolving them, and submitting updates/pull requests tremendously more satisfying than when you would call Microsoft support, report your bug, and hope the product team considered your problem high enough priority to justify their time.

Last week I put together a post on a NullReferenceException within the Microsoft.AspNetCore.Http.ItemsCollection object. I finished the post by filing a bug with the AspNetCore team (#16938). The next step was to download the source code for Asp.Net Core and create a fix for it.

So, the standard forking of the repository and creating a new branch for the fix was easy enough, but now came the interesting part: submitting the Pull Request and receiving their feedback. The first thing that was pretty amazing was that I submitted the fix around 10 AM on Saturday morning (Pull Request #16947). By noon on a Saturday, someone from the Asp.Net Core team had reviewed the pull request and already found improvements to the code. The best part of the review is that they are very talented programmers and found multiple ways to improve the unit test submitted, and they even found other errors in the original file that was being updated. They didn’t just review the code, they looked for ways to improve the overall product.

The fix I found for the NullReferenceException was to use a null-conditional operator to ensure that the exception didn’t occur. But what they did was search the entire class for every place this might occur and suggest where else a null-conditional operator could be applied to prevent future issues. They are detailed.

The parts of my pull request they had the most feedback on were the unit tests, and the suggestions were useful for simplifying the code and getting at the core of what was being tested. When running the unit tests on my local machine, I could tell that they really focused on how to make the unit tests as fast and efficient as possible. The dotnet test runner could run the 169 unit tests in the Microsoft.AspNetCore.Http library in under 4 seconds. For comparison, I’ve mostly been working with Powershell unit tests for a while, and loading up the Powershell runtime and the Pester module, before even running the tests, usually takes a good ~5 seconds.

Overall it was a pretty easy process and they accepted the update for inclusion in the Asp.Net Core 5.0 preview1 release. Now, for getting the fix merged into Asp.Net Core 3.X, that was a little more difficult (#17068).

NullReferenceException in Http.ItemsCollection

on Monday, November 11, 2019

The other day a coworker and I were adding the ElmahCore/ElmahCore library to an AspNetCore web project and we ran into a strange NullReferenceException when using it. The problem didn’t occur when the library was used in an AspNetCore 2.2 project, and this was the first time trying it on an AspNetCore 3.0 project. So, we wanted to believe it was something to do with 3.0, in which case there would probably be other people running into the issue and Microsoft would have a fix ready for it very soon. But we needed to move forward on the project, so we wanted to find a workaround in the meantime.

Personally, I found a bit of humor in this Exception, because at first glance it looked like it was occurring within an Exception Management system (Elmah). I mean, the one thing that an Exception Management system definitely doesn’t want to do is throw Exceptions. However, since I was already troubleshooting an Exception problem that seemed kind of funny, I leaned into the ridiculousness and decided to debug the Exception Management system with another Exception Management system, Stackify Prefix ( … it’s more of an APM, but it’s close enough).

The truly laugh out loud moment came when I hooked up Stackify Prefix, and it fixed the Elmah exception. Elmah started working perfectly normally once Stackify Prefix was setup in the application. So … one Exception Management system fixed another Exception Management system. (Or, at least, that’s how it felt.)

Of course, I needed to figure out what Stackify did to fix Elmah, so I needed to pull Stackify back out of the application and really get into Elmah. Which meant grabbing a clone of ElmahCore to work with.

Even before having the source code available, I had the function name and stack trace of the Exception. So once I had the source code, zeroing in on the code was pretty straightforward. But what I found was unexpected.

As best I could tell, the problematic code wasn’t in Elmah, but was coming from an underlying change inside of Microsoft.AspNetCore.Http.ItemsDictionary. It seemed like an internal field, _items, was null within the ItemsDictionary object. And when the enumerator for the collection was used, a NullReferenceException was being generated.

(More detailed information at ElmahCore Issue #51)

It seemed like the workaround for the issue was to populate the request.Items collection with a value, in order to ensure the internal _items collection was not null. And to double check this, I loaded Stackify Prefix back up and checked what the value of the collection was when Stackify was involved. Sure enough, Stackify had populated the dictionary with a correlation id, and that’s why it fixed the issue.
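A minimal sketch of that workaround (an assumption about how you might seed the collection, not ElmahCore’s or Stackify’s actual code) is a tiny inline middleware registered early in Startup.Configure:

```csharp
// Seed HttpContext.Items so the internal _items dictionary is created before
// anything tries to enumerate the collection.
app.Use(async (context, next) =>
{
    context.Items["__itemsInitialized"] = true;
    await next();
});
```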

For ElmahCore, I implemented a really bad code fix and created a pull request for them. It is a really bad fix because the solution triggers the exception, catches it, and then swallows it, which lets the Elmah code continue to execute successfully. But any .NET Profiler that ties into the really low level .NET Profiler APIs (these are the APIs which power APM solutions like Stackify Retrace, AppDynamics, etc.) will record that exception and report it as if it’s an exception occurring within your application.

At this point, the next steps are to inform Microsoft of the problem and get a bug fix/pull request going for them. I’ve opened the bug (AspNetCore Issue #16938), but I will need to get the code compiling on my machine before submitting a pull request.

Performance Gains by Caching Google JWT kids

on Monday, November 4, 2019

To start with, a “kid” is a key id within a JSON Web Key Set (JWKS). Within the OpenID Connect protocol (which is kind of like an OAuth2 extension), Authentication Services can ensure the data integrity of their JWT tokens by signing them, and they sign the tokens with a private key. The token signature can then be verified using a public key; this is somewhat similar to SSL certificates for websites over https. With https, the public certificate is immediately given to the browser as soon as you navigate to the website. But for JWT tokens, your application has to go “look up” the public key. And that’s where OpenID Connect comes in.

OpenID Connect’s goal wasn’t to standardize JWT tokens or the signing of them; that was a secondary feature. However, the secondary feature was pretty well done, and OAuth2 enthusiasts adopted that part of the protocol while leaving the rest of it alone. OpenID Connect specifies that there should be a “well known” endpoint where any system can look up common configuration information about an Authentication Service, for example:

https://accounts.google.com/.well-known/openid-configuration

One of the standard values is `jwks_uri`, which is the link to the JSON Web Key Set. In this case:

https://www.googleapis.com/oauth2/v3/certs

In the example above, the public key material is in the `n` value (the RSA modulus), and the `kid` is the key to look up which signing key to use. So, that’s what kids are; they’re the ids of the signing keys & algorithms.
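As a rough sketch of that lookup chain (not production code; it just walks Google’s public endpoints mentioned above):

```csharp
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;

public static class JwksLookup
{
    public static async Task<string> GetSigningKeysAsync()
    {
        using var http = new HttpClient();

        // 1. Discovery: the "well known" OpenID Connect configuration document.
        var discoveryJson = await http.GetStringAsync(
            "https://accounts.google.com/.well-known/openid-configuration");
        using var discovery = JsonDocument.Parse(discoveryJson);

        // 2. The configuration points at the JSON Web Key Set (this is where the "kids" live).
        var jwksUri = discovery.RootElement.GetProperty("jwks_uri").GetString();

        // 3. Download the key set; this is the payload worth caching on your application servers.
        return await http.GetStringAsync(jwksUri);
    }
}
```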

So, where does the performance gain come in?

The performance gain for these publicly available keys is that they can be cached on your application servers. If your application is going to use Google OAuth for authentication, and use JWT tokens to pass the user information around, then you can verify the token signatures using the cached keys. This keeps all the authentication overhead on your application server instead of in a synchronous callback to an Authentication Service.

But, there is a small performance penalty in the first call to retrieve the JWKS.

What’s the first call performance penalty look like?

Not much, about 500 ms. But, here’s what it looks like with an actual example.

First call that includes the JWT Token:

  • It reaches out to https://accounts.google.com/.well-known/openid-configuration which has configuration information
  • The configuration information indicates where to get the “kids”: https://www.googleapis.com/oauth2/v3/certs
  • It downloads the JWKS and caches them
  • Then it performs validation against the JWT token (my token was expired in all of the screenshots, this is why there are “bugs” indicated)
  • Processing time: 582 ms
  • Processing time overhead for JWT: about 500 ms (in the list on the left side, the request just before it was the same request with no JWT token, it took about 99 ms)



(info was larger than one screenshot could capture)

Second call with JWT:

  • The caching worked as expected and the calls to google didn’t occur.
  • Processing Time: 102 ms
  • So, the 500 ms overhead of the google calls doesn’t happen when the caching is working.


(info was larger than one screenshot could capture)

Test Setup:

  • The first call is the application load time. This also included an Entity Framework first load penalty when it called a database to verify if I have permissions to view the requested record.
    • Processing Time: 4851 ms
    • This call did not include the JWT.
  • The second call was to baseline the call without a JWT.
    • Processing Time: 96 ms
  • The third call was to verify the baseline without a JWT.
    • Processing Time: 99 ms

Wrap Up …

So, it isn’t much of a performance gain. But it’s enough to make caching the value and keeping all the authentication logic on your application server worthwhile.

(Ohh … and Stackify Prefix is pretty awesome! If you haven’t tried it, you should: https://stackify.com/prefix-download/)

Translate a BDD User Story to a Pester/Selenium Test

on Monday, October 28, 2019

This is a very simple example of taking a User Story that’s written in a BDD style given-when-then pattern and converting it into a Selenium based UI Test. To do this, I’m going to use Pester and a Selenium Powershell module (mentioned previously in Selenium & Powershell) to write the executable test.

So, let’s start out with a very common scenario: logging into a website. It’s not the greatest example, as it’s generally assumed that development teams will need to ensure this functionality works, and it might not make it into requirements written by end users or driven by business value statements. But here’s the simplified use case anyways:

Scenario: User logs in with valid credentials

Given that I am not logged in,

When I enter my username and password and click the Log In button,

Then I will be logged in and I will see the Search page.

So, let’s translate this to some example code:
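The embedded test isn’t included here, so below is a rough sketch of the translation. The Selenium module cmdlet names and parameters (Start-SeChrome, Enter-SeUrl, Find-SeElement, Send-SeKeys, Invoke-SeClick, Stop-SeDriver) vary a bit between module versions, and all of the URLs and element ids are assumptions about the site under test:

```powershell
Describe 'User logs in with valid credentials' {

    It 'logs in and shows the Search page' {
        # Given that I am not logged in
        $driver = Start-SeChrome
        Enter-SeUrl -Driver $driver -Url 'https://your.domain.com/subapp/login'

        # When I enter my username and password and click the Log In button
        $username = Find-SeElement -Driver $driver -Id 'username'
        $password = Find-SeElement -Driver $driver -Id 'password'
        Send-SeKeys -Element $username -Keys 'test.user'
        Send-SeKeys -Element $password -Keys 'P@ssw0rd!'

        $logInButton = Find-SeElement -Driver $driver -Id 'log-in'
        Invoke-SeClick -Element $logInButton

        # Then I will be logged in and I will see the Search page
        $driver.Url | Should -BeLike '*/search*'

        Stop-SeDriver -Driver $driver
    }
}
```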

What’s useful in a deployment summary?

on Monday, October 21, 2019

There’s a larger question that the title question stems from: What information is useful when troubleshooting problematic code in production? If a deployment goes out and things aren’t working as expected, what information would be useful for the development team to track down the problem?

Some things that would be great, would be:

  • Complete observability of the functions that were having the problem with information about inputs and the bad outputs.
  • Knowledge of what exactly was changed in the deployment compared to a previously known good deployment.
  • Confidence that the last update to the deployment could be the only reason that things aren’t functioning correctly.

All three of those things would be fantastic to have, but I’m only going to focus on the middle one. And, I’m only going to focus on the image above.

How do we gain quick and insightful knowledge into what changed in a deployment compared to a previously known ‘good’ deployment?

I guess a good starting place is to write down what are the elements that are important to know about a deployment for troubleshooting (by the development team)?

  • You’ll need to look up the deployment history information, so you’ll need a unique identifier that can be used to look up the info. (I’m always hopeful that this identifier is readily known or very easy to figure/find out. That’s not always the case, but it’s something to shoot for.)
  • When the deployment occurred, date and time?

    This would be useful information to know if the problem definitely started after the deployment went out.
  • Links to all of the work items that were part of the deployment?

    Sometimes you can guess which work item is most likely associated with an issue by the use of a key term or reference in its description. This can help narrow down where in the logs or source control you may need to look.

    If they are described in a way that is easily understood by the development team (and with luck, members outside the development team) that would be great.
  • Links to the build that went into the production deployment? Or, all previous builds since the last production deployment?

    Knowing the dates and the details of the previous builds can help track the issue back to code commits or similar behavior in testing environments.

    Of course, if you can get to full CI/CD (one commit per build/deployment), then tracking down which work item / commit had the problem becomes a whole lot easier.
  • Links to the source control commit history diffs?

    If you want to answer the question “What exactly changed?” A commit diff can answer that question effectively.
  • Links directly to sql change diffs?

    In the same vein as source control commit history diffs, how about diffs to what changed in the databases.
  • Statistics on prior build testing results? If a previous build didn’t make it to production, why didn’t it make it there? Were there failing unit tests? How about failing integration tests or healthchecks?

    Would quick statistics on the number of tests run on each build (and passed) help pinpoint a production issue? How about code coverage percentages? Hmmm … I don’t know if those statistics would lead to more effective troubleshooting.

Another thing that seems obvious from the picture, but might not always be obvious is “linkability”. A deployment summary can give you an at-a-glance view into the deployment, but when you need to find out more information about a particular aspect, having links to drill down is incredibly useful.

But, there has to be more. What other elements are good to have in a deployment summary?

A FakeHttpMessageResponse Handler

on Monday, October 14, 2019

The Microsoft ASP.NET Team has done a great deal of work over the years to make their libraries more friendly to unit testing. Some of the best parts of their efforts were their adoption of open source testing tools like xUnit and Moq.

An interesting use case when testing is the need to mock data from a data source. Generally this means mocking the result from a database call, which the EF Core Team has made considerably easier with the addition of UseInMemoryDatabase.

But sometimes your data store is going to be a Web API, and in those cases you might use an HttpClient to retrieve your data from the service. So, how would you test that? The way that Microsoft intended for you to handle this is to create an HttpMessageHandler (or DelegatingHandler) and add it into the baseline handlers when creating your HttpClient through the HttpClientFactory. Unfortunately, that isn’t always as straightforward as it would seem.

Luckily, Sebastian Gingter has this great way of using Moq to simplify the process with a FakeHttpMessageHandler. It really cleans up the clutter in the code and makes the usage readable.

This example has a small twist on his implementation, as it doesn’t pass the mocked HttpMessageHandler in through the constructor. Instead it exposes the HttpClient’s internal handler through a public property so it can be replaced after the instantiation of the HttpClient object. This is definitely not something Microsoft wants you to do.

But, here we go anyway:

  • AlphaProxyTests.cs

    This is example Test code which shows the FakeHttpMessageHandler in use.
  • FakeHttpMessageHandler.cs

    This is the slightly modified version of Sebastian Gingter’s FakeHttpMessageHandler class. (A rough sketch of the underlying pattern appears after this list.)
  • HttpClientWrapper.cs

    This is a wrapper class for HttpClient which exposes the internal HttpMessageHandler as a public property. This is not recommended by Microsoft.
  • HttpMessageInvokeExtensions.cs

    This uses reflection to access internal fields in order to directly manipulate the internal HttpMessageHandler that HttpClient will call. Because this accesses internal fields from HttpClient it is unsupported by Microsoft and it will break from time to time. It has already broken once, and there is a good chance it may not work with all ASP.NET Core implementations.
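The gists themselves aren’t embedded here, but the heart of the approach is the handler pattern plus a Moq setup. A minimal sketch of those two pieces (class and test names are placeholders, and the reflection-based HttpClient swap from HttpMessageInvokeExtensions.cs is intentionally left out):

```csharp
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;
using Moq;
using Xunit;

// A handler whose synchronous Send method can be mocked or overridden in tests.
public class FakeHttpMessageHandler : HttpMessageHandler
{
    public virtual HttpResponseMessage Send(HttpRequestMessage request)
    {
        throw new System.NotImplementedException("Override or mock Send() in your test.");
    }

    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        return Task.FromResult(Send(request));
    }
}

public class FakeHttpMessageHandlerTests
{
    [Fact]
    public async Task HttpClient_ReturnsTheFakedResponse()
    {
        // Arrange: mock the handler and can the response the "service" should return.
        var handler = new Mock<FakeHttpMessageHandler> { CallBase = true };
        handler.Setup(h => h.Send(It.IsAny<HttpRequestMessage>()))
               .Returns(new HttpResponseMessage(HttpStatusCode.OK)
               {
                   Content = new StringContent("{ \"name\": \"alpha\" }")
               });

        var client = new HttpClient(handler.Object);

        // Act
        var response = await client.GetAsync("https://unit.test/api/alpha");
        var body = await response.Content.ReadAsStringAsync();

        // Assert
        Assert.Equal(HttpStatusCode.OK, response.StatusCode);
        Assert.Contains("alpha", body);
    }
}
```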

Analysis of an Antivirus False Positive

on Monday, October 7, 2019

Here’s the background scenario:

A team member zipped up a website from a production server and copied it to their local machine. This particular production server did not have an antivirus scanner on it at the time, and when the zip file reached the team member’s machine, this warning message appeared from the machine’s antivirus software:

[screenshot: antivirus warning flagging PHP/Remoteshell.B in one of the website’s Error.*.log files]

The team member did the right thing and deleted the files from disk and contacted the security team to investigate.

Soo … what are the right steps to perform from here? I’m not sure, but here are some steps that could be taken.

  • Despite the constant warnings of security trainers, don’t assume that a virus has now been distributed onto the server and onto the team member’s workstation. This is a classic fear-based training technique used to prevent the spread of an attack, but it may not always be the right thing to do.
  • Instead, look at the evidence presented so far and try to see what more information can be learned from it.

Okay, so what can be determined from the information so far:

  • A server could possibly have a backdoor virus called PHP/Remoteshell.B on it.
    • If the backdoor was executed, then potentially a different/higher privileged malware could be on the server.
  • A team member machine could possibly have a different/higher privileged malware on it.
  • The website was running Sitefinity
    • Sitefinity is an ASP.NET product, not a PHP product
  • The virus signature was found in a file called Error.*.log
    • Sitefinity uses Error.*.log files to record messages about errors it detects or encounters in order for administrators to review them later.
    • Therefore the Error.*.log files would be error messages and not executable code.

And, what assumptions could we make from the information so far:

  • Given that it’s an error log, viewing the log file would probably be benign and could possibly give information about how an attack started and if the attack stopped.
  • Given that it’s an ASP.NET web application and not a PHP application, the attack did not succeed. So, there is very little to no risk in assuming the server is not infected with a backdoor or virus.
  • However, if it was infected with a virus, and that virus was capable of propagating itself through RDP, then the team member had already brought the virus into their/your network.
  • In this highly unlikely scenario, if the team member had already brought the virus onto your network then it wouldn’t be terrible to connect with an isolated VM instance from that network which could be used to look at the files on the server.

At this point, the risk of proliferating a virus onto a disposable VM in order to view the actual log files on the server seemed like a good risk vs reward scenario.

So, using a throw away virtual machine, you could connect up to the server and take a look at the Error.*.log files, which might show this:

[screenshot: the contents of the Error.*.log file, showing the attack requests]

There are a number of lines that look very similar to the ones above, and they were all made over a short 30 second period and then stopped. The requested urls varied, with syntaxes ranging from PHP and .NET to regular old SQL injection.

So, what’s the new information that we can learn from this:

  • This looks like an automated attack signature from a bot that was penetration testing.

This server is fortunate enough to get routine penetration testing from our automated penetration testing system around every two weeks. If any of the attacks are found to be successful, then a report is generated and sent to the Security and Development teams to implement fixes.

So, this could be a nice moment to call the analysis done, but it did highlight a few missing things. It resulted in the antivirus software being hooked up to the server and a scan being performed, along with a good amount of information sharing and more awareness of the automated penetration testing system.

There’s probably a lot more that could be done, and a lot more safety precautions that could have been taken during the analysis. At least, there’s always tomorrow to improve.

Telemetry in PowerShellForGitHub

on Monday, September 30, 2019

The PowerShellForGitHub module has a number of interesting things about it. One interesting aspect is its implementation for collecting telemetry data, Telemetry.ps1. Telemetry is very useful to help answer questions like:

  • What aspects of your service are being used most frequently?
  • What aspects are malfunctioning?
  • Are there particular flows of usage which can be simplified?

But, to be able to collect the data necessary to answer those questions, you have to make the collection process incredibly easy. The goal would be to boil the collection process down to a single line of code. And that’s what PowerShellForGitHub tried to do.

The level of data collection the module provides does take more than a single line of code to use, but it’s so easily done as a part of the development process that it doesn’t feel like “extra work” or “overhead”. Here’s a snippet from GitHubRelease.ps1’s Get-GitHubRelease:

Looking through the function you can see a hashtable called $telemetryProperties created early on, and its properties are slowly filled in as the function continues. Eventually, it gets to the point where a common function, Invoke-GHRestMethodMultipleResults, is called and the telemetry information is passed off to the underlying provider.
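The actual snippet from GitHubRelease.ps1 isn’t embedded here, but the pattern it illustrates looks roughly like this (a paraphrase, not the module’s source; the parameter and property names are approximations):

```powershell
function Get-SomethingFromGitHub {
    param(
        [string] $OwnerName,
        [string] $RepositoryName
    )

    # Start the telemetry bag early and fill it in as the function learns more.
    $telemetryProperties = @{
        OwnerName      = Get-PiiSafeString -PlainText $OwnerName
        RepositoryName = Get-PiiSafeString -PlainText $RepositoryName
    }

    $params = @{
        UriFragment         = "repos/$OwnerName/$RepositoryName/releases"
        Description         = "Getting releases for $RepositoryName"
        TelemetryEventName  = $MyInvocation.MyCommand.Name
        TelemetryProperties = $telemetryProperties
    }

    # The common helper sends the telemetry off to the underlying provider.
    return Invoke-GHRestMethodMultipleResults @params
}
```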

All of the hard work of setting up where the information will be collected and how it’s collected is abstracted away, and it boils down to being a subfeature of a single function, Invoke-GHRestMethodXYZ. Boiling everything down to that single line of code is what makes the telemetry in that module so useful: it’s approachable. It’s not a headache that you have to go set up yourself or get permissions to; it just works.

To make it work at that level was no easy feat though! The code which makes all of that possible, including amazing supplementary functions like Get-PiiSafeString, is really long and involves the usage of nuget.exe to download and load Microsoft’s Application Insights and Event Tracing .NET libraries. These are hidden away in Telemetry.ps1 and NugetTools.ps1.

So, given the idea that “Telemetry is an incredibly useful and necessary piece of software development in order to answer the questions asked at the top of this article”, the new question becomes “How can you refactor the telemetry code from PowerShellForGitHub into an easy to reuse package that any PowerShell module could take advantage of?”

Monitor and Download Latest .Net Core Installer

on Monday, September 23, 2019

.NET Core has changed its IIS server installation model compared to the .NET Full Framework. Full Framework updates were available as offline installers, but they were also available through Windows Update/SCCM. However, with .NET Core, the IIS server installer, the “Hosting Bundle”, is only available as an offline installer that can be found by following these steps (example for 2.2.7):

Even though this process is really quick, it can feel repetitive and like something you shouldn’t need to do. This feeling can be compounded if you missed that a new release came out weeks or months before and someone else points out the update to you.

So, a small solution to these two minor inconveniences is to set up a watcher script which will monitor the AspNetCore team’s github repository for new releases, and upon finding a new one will download the Hosting Bundle and notify you of the update. This solution could run on a nightly basis.

In this example script those previously mentioned pieces of functionality are provided by:

  • PowerShellForGithub\Get-GitHubRelease

    This function can be used to pull back all the releases for a github repository and the results can be filtered to find the latest stable release.

    In order to know if the latest release is newer than the release you currently have on your servers, you could simply check whether the date stamp for that release is the current day. However, this sample script is going to assume you can query an external system to find out the latest installed version. (A rough sketch of the release check appears after this list.)

  • Selenium \ SeleniumUcsb

    Selenium is used to navigate through the github release pages and find links to the download. It then downloads the Hosting Bundle installer by parsing through the pages to find the appropriate link. The parsing is actually a bit difficult to figure out sometimes, so an XPath Helper/tester for your browser can be really handy.

  • Emailing

    Send-MailMessage … yeah, that’s pretty straight forward.
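A minimal sketch of the release-check piece, using PowerShellForGitHub (the Get-CurrentlyInstalledVersion call is a stand-in for whatever external system you query):

```powershell
Import-Module PowerShellForGitHub

# Latest stable (non-prerelease) release of aspnet/AspNetCore.
$latest = Get-GitHubRelease -OwnerName 'aspnet' -RepositoryName 'AspNetCore' |
    Where-Object { -not $_.prerelease } |
    Sort-Object -Property published_at -Descending |
    Select-Object -First 1

# Stand-in: however you look up what's currently installed on your servers.
$installedVersion = Get-CurrentlyInstalledVersion

if ($latest.tag_name.TrimStart('v') -ne $installedVersion) {
    Write-Output "New release found: $($latest.tag_name) (published $($latest.published_at))"
    # ... hand off to the Selenium download and Send-MailMessage notification steps ...
}
```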

.NET Core Installer Wait Loop

on Monday, September 16, 2019

.NET Core has a pretty fast release cycle and the team is not offering their hosting bundles through the Windows Update/SCCM update channels. So, you may find yourself needing to install the bundle on your systems yourself. (I envy all the Docker folks that don’t need to think about this ever again.)

But, if you are looking to create a small installation script for .NET Core that can run remotely on your servers using Powershell, this snippet might help you get started. The script relies on the command Get-InstalledNetCoreVersion, which can be found in a previous post.

The installer is really quick, always under 3 minutes, but the tricky piece is waiting until the installation completes. This particular wait loop can actually exit before the installer completes, but the 60 second wait period between “checks” hasn’t created any race conditions yet.
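The original snippet isn’t embedded here, but a rough sketch of the shape it describes (Get-InstalledNetCoreVersion is the helper from the earlier post, and the format of its output here is an assumption):

```powershell
param(
    [string] $InstallerPath,   # e.g. \\fileshare\installers\dotnet-hosting-2.2.7-win.exe
    [string] $TargetVersion    # e.g. '2.2.7'
)

# Kick off a quiet install of the Hosting Bundle.
Start-Process -FilePath $InstallerPath -ArgumentList '/quiet', '/install', '/norestart'

# Wait for the new version to show up, checking once a minute (up to ~5 minutes).
$attempts = 0
do {
    Start-Sleep -Seconds 60
    $attempts++
    $installed = Get-InstalledNetCoreVersion   # helper from the previous post
} until (($installed -contains $TargetVersion) -or ($attempts -ge 5))

if ($installed -contains $TargetVersion) {
    Write-Output ".NET Core $TargetVersion installed."
} else {
    Write-Warning "Timed out waiting for .NET Core $TargetVersion to finish installing."
}
```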

Basic PowerShell Convertor for MSTest to xUnit

on Monday, September 9, 2019

This is just a quick script that can help convert a .Tests.csproj which was originally written for MSTest over to using xUnit. It probably doesn’t cover every conversion aspect, but it can get you moving in the right direction. (A rough sketch of the script appears after the list of conversions below.)

What it will convert:

  • Replace using Microsoft.VisualStudio.TestTools.UnitTesting; with using Xunit;
  • Remove [TestClass]
  • Replace [TestMethod] with [Fact]
  • Replace Assert.AreEqual with Assert.Equal
  • Replace Assert.IsTrue with Assert.True
  • Replace Assert.IsFalse with Assert.False
  • Replace Assert.IsNull with Assert.Null
  • Replace Assert.IsNotNull with Assert.NotNull
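The original script isn’t embedded here, but a minimal sketch that performs those replacements over a test project’s .cs files could look like this:

```powershell
param(
    [Parameter(Mandatory)]
    [string] $TestProjectFolder
)

Get-ChildItem -Path $TestProjectFolder -Recurse -Filter '*.cs' | ForEach-Object {
    $content = Get-Content -Path $_.FullName -Raw

    $content = $content -replace 'using Microsoft\.VisualStudio\.TestTools\.UnitTesting;', 'using Xunit;'
    $content = $content -replace '\[TestClass\]\s*', ''                  # xUnit has no class-level attribute
    $content = $content -replace '\[TestMethod\]', '[Fact]'
    $content = $content -replace 'Assert\.AreEqual', 'Assert.Equal'
    $content = $content -replace 'Assert\.IsTrue', 'Assert.True'
    $content = $content -replace 'Assert\.IsFalse', 'Assert.False'
    $content = $content -replace 'Assert\.IsNotNull', 'Assert.NotNull'   # before IsNull, so it isn't mangled
    $content = $content -replace 'Assert\.IsNull', 'Assert.Null'

    Set-Content -Path $_.FullName -Value $content
}
```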

Azure/Elastic Kubernetes Service and gMSA

on Friday, September 6, 2019

I’ve written previously about how Docker Containers Are Not Domain Joined and all of the difficulties that it creates.

This post simply adds to that previous article with a few more links and a little more information.

When I first heard of Docker, I imagined a system where you could throw a container at a service and it would figure out everything that was needed to run the container and just make it happen. Obviously that’s extremely difficult to do, and as I learned more about Docker the larger and more engrossing the problem became. My current understanding is nowhere near complete, but here’s some more info on the problem.

In 2018, around the time I looked at AWS’ ALB prices, I looked into a price comparison of a Dockerized Web Farm vs an IIS EC2 Web Farm. When developing out the system architecture for the Dockerized Web Farm I ran into two major issues:

  • Theoretically, it looks like Windows containers use an absolute limit (search for “CPU limit is enforced as an absolute limit”) when allocating CPU utilization to the container.

    NOTE: I have not gotten to the point where I can prove or disprove the above statement; and OLDER Docker documentation doesn’t seem to indicate that Windows has this problem.

    What this means is that if you have a 2 CPU host system and you were to allocate .5 CPU to a Windows container, then the Windows container would be given that .5 CPU for its sole usage. No other container could use that .5 CPU, and the container would be hard-capped at .5 CPU.

    In Linux containers this is not an issue. You can allocate dozens of containers on a single host to use .5 CPU each, and they would (a) all share the full 100% of the CPU resources available, (b) never be hard-capped, and (c) only fall back to their .5 CPU share once the CPU reached 100% utilization and it needed to arbitrate between containers fighting over the CPU’s time.
  • The gMSA issue that was brought up in previous Is SQL Server looking to Dockerize on Windows? post.

Even with those issues, I was curious about what AWS was doing with containers, in hopes that they had the same idea that I did: we should be able to give a container image to a service and the service just figures out everything needed to run it and makes it happen. And they did: AWS Fargate.

But!! …

They were also frustrated with the permissions and gMSA security issues that the Windows OS introduced into the equation. And, as such, they don’t support Windows Containers on Fargate. They don’t directly say that they don’t support it because of the gMSA/permissions issues, but when you look at what needs to be done to support gMSA it becomes an easily rationalized conclusion. Here’s what it looks like to use a gMSA account on a Windows Container, with all the secret/password storage and management removed (a rough command-level sketch follows the list):

  1. Create a gMSA account in Active Directory.
  2. Select the Docker Host that will host the new container instance.
  3. Update Active Directory to register the gMSA to be usable on that Docker Host.
  4. Register the gMSA on the Docker Host (checks with Active Directory to validate the request).
  5. Start the container, and you’re now able to use the gMSA account within the container.
  6. You’ll need to reapply the registrations (steps 2-4) for each Docker Host that the container will run on.
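Roughly, those steps look something like this at the command level (the cmdlet and parameter specifics, especially for Microsoft’s CredentialSpec module, vary by version, so treat this as an approximation):

```powershell
# 1. Create the gMSA in Active Directory (run from a machine with the ActiveDirectory module).
New-ADServiceAccount -Name 'WebAppGmsa' `
    -DNSHostName 'WebAppGmsa.corp.example.com' `
    -PrincipalsAllowedToRetrieveManagedPassword 'DockerHostsGroup'

# 2/3. Pick a Docker host that is a member of DockerHostsGroup, then on that host
#      install and verify the account against Active Directory.
Install-ADServiceAccount -Identity 'WebAppGmsa'
Test-ADServiceAccount -Identity 'WebAppGmsa'   # should return True

# 4. Generate a credential spec file on the Docker host.
Install-Module CredentialSpec
New-CredentialSpec -AccountName 'WebAppGmsa'

# 5. Start the container with the credential spec; the container can now use the gMSA.
docker run -d --security-opt "credentialspec=file://WebAppGmsa.json" mcr.microsoft.com/windows/servercore:ltsc2019
```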

With a fully automated provisioning process, that’s not that difficult. It’s really doable in fact. However, here’s the list of difficult specifics that a fully managed Kubernetes infrastructure (like Fargate) would have to deal with:

  1. Where is the Active Directory located?
  2. Are the networking routes open for the ECS/Fargate infrastructure to it?
  3. Are there other security requirements?
  4. What versions of Active Directory are supported?
  5. etc, etc, etc …

I don’t know at what bullet point you just *facepalm* and say “We’re not supporting this.”

But!! …

Figuring out all the details of this should be in the wheelhouse of Azure, right? It’s the Microsoft OS and platform; they are probably working on this problem.

So, here’s what the landscape looks like today with AKS:

So there’s the update.

A somewhat scientific analysis of a quality issue

on Monday, August 26, 2019

I was recently working through a concern over the quality of data being entered manually, which was causing a number of reprocessing requests. The person that performed the actual task of processing the incoming requests was noticing a number of requests which were nearly duplicates of previous work, and they needed to remove / clean up the original work along with processing the new work. This was classic Muda and I wanted to figure out why.

The person that had reported the issue was convinced that the solution to prevent the reprocessing was to add a manual review step to the process, where the original request would be routed back to another group for review before continuing on to their station for execution. The description he gave of the problem made sense, but I had had some direct involvement with some of the people putting in the requests and with the requests themselves as they were submitted. So, something didn’t feel quite right and I wanted to dig into the data.

Background: The process was the addition of a database role onto a service account. The input data for the process were: Database Name, Service Account Name, Role Name, CreateRole (true/false), and Environment (dev/test/prod/all).

After getting ahold of the data this is what it looked like:

44584 - Db1, ISVC_Acct1, Role1 (all)
44582 - Db1, IUSR_Acct2, Role1 (all)
44536 - Db2, ISVC_Acct3, Role2 (all)
44504 - Db3, ISVC_Acct4, Role3Role (all) - Reprocessing (Bad Name) - Pulled name from Documentation. Docs were later corrected. Docs written by maglio-s. see 44447
44449 - Db4, ISVC_Acct4, Role3 (all)
44448 - Db3, ISVC_Acct4, Role3 (all) - Reprocessing (Wrong Database) - Developer didn't read documentation closely enough. Docs written by maglio-s. see 44447
44447 - Db1, ISVC_Acct4, Role3 (all)
44360 - Db5, ISVC_Acct1, Role4 (all)
44359 - Db6, ISVC_Acct5, Role5 (all)
44358 - Db6, ISVC_Acct1, Role6 (all)
43965 - Db1, IUSR_Acct6, Role1 (all) - Reprocessing (Bad Name) - Pulled name from Documentation. Docs were later corrected. Docs written by maglio-s. see 43960
43964 - Db7, IUSR_Acct6, Role7 (all)
43963 - Db7, IUSR_Acct6, Role8 (all)
43962 - Db7, IUSR_Acct6, Role9 (all)
43961 - Db7, IUSR_Acct6, Role1Role (all)
43960 - Db1, IUSR_Acct6, Role1Role (all)
43959 - Db8, IUSR_Acct6, Role10 (all) - Extra Message - Db8 didn't yet exist in Prod. This wasn't a problem that affected the results or required reprocessing.
43585 - Db9, IUSR_Acct7, Role11 (dev) - Extra Processing - Detected problem with script (updated bot after next Deployments / Dev Support meeting)
43295 - Db11, SVC_Acct8, Role12 (prod)
43294 - Db11, SVC_Acct8, Role12 (test)
43256 - Db7, IUSR_Acct9, Role8 (all)
43255 - Db7, IUSR_Acct9, Role9 (all)
43254 - Db7, IUSR_Acct9, Role7 (all)
43144 - Db3, ISVC_Acct10, Role3Role (all)
43088 - Db10, SVC_Acct11, Role13 (all)
43087 - Db1, SVC_Acct11, Role1 (all)
43086 - Db1, SVC_Acct11, Role14 (all)
43063 - Db11, SVC_Acct12, Role15 (prod)
42918 - Db11, SVC_Acct12, Role15 (test)
42920 - Db12, SVC_Acct12, Role16 (all) - Reviewed Before Running / Reprocessing (Bad Name), see also 42919
42921 - Db12, SVC_Acct13, Role16 (all) - Reviewed Before Running - CJ determined it wasn't necessary (requestor: maglio-s)

(*maglio-s = me; I figured I might as well out myself as the guilty party for a lot of these.)

It doesn’t look like too much reprocessing, until you look at the breakdown of the overall defect rates:

[chart: breakdown of the overall defect rates]

Overall there were 6 defects: 4 reprocessing needed, 1 reviewed and rejected, and 1 bug during processing. That’s 20% defects, with 13.3% reprocessing.

Upon the first review, there did seem to be a data quality issue, but more of an issue with my documentation and people trusting my documentation. If the engineer that was reporting this data quality issue was trying to get me to improve my thoroughness without pointing a finger at me, then good job!

But, when I was talking with the engineer that reported the issue, they were adamant that it wasn’t a single person but an overall quality issue. I couldn’t totally agree with them, but there definitely was a quality problem. Now, how do we improve the quality?

As mentioned earlier, for the engineer, the solution was to add a manual review step by another group before it got to him for processing. But, that was something I was adamantly trying to avoid. I wanted to avoid it because:

  • It would take a manual process and move that manual labor to another group, rather than replace it.
  • The other group would need to be consulted because it was going to increase their workload, and they would need to add their own ideas and solutions into the conversation.
  • It wasn’t creating a smaller feedback loop for the requestor to figure out if they had submitted bad input.

I’m a fan of Henrik Kniberg’s saying, “Manage for the normal, treat the exceptions as exceptional.”

Each of these reprocessing issues seemed (to me) to be exceptions. And I wanted to deal with each one as an exceptional case rather than implement a new review step that would become part of the normal process.

The easy part of dealing with each one as an exception is that you don’t have to change the overall process. And, because I had already been involved in resolving some of them earlier, the implementation costs of correcting the documentation and “fixing the bug in the bot” were already taken care of.

However, neither of these approaches really seemed like a surefire way to ensure the quality of the data increased. They both felt like they required a “let’s wait and see if this fixes the problem” approach. And the reporting engineer had a really good point that we needed to improve the quality and lower the amount of reprocessing work.

But then something new started to stand out. At the top of this article I mentioned the inputs to the system. One of the inputs to the system that didn’t make it into the analysis data was the parameter CreateRole. In the original implementation of the system, if the role in the database didn’t exist, the script which added the database role would fail. The CreateRole flag was asked for by the development team, so they could indicate to the engineering team that the role would need to be created. The engineering team looked at this problem and fixed the system by ALWAYS creating the role if it didn’t exist. And this is where the heart of the confusion occurred. The development team thought that if CreateRole was set to ‘false’, and the role didn’t exist, then the system would throw an error. The assumption was that even if they got the name wrong, it would be fine because the system wouldn’t create a new role that wasn’t asked for.

After looking at the new information, 3 out of the 4 reprocessed requests (75%) were attributable to the CreateRole flag being ignored. So how do we improve the system?

Multifold:

  • Hold myself to a higher standard when writing documentation in order to prevent downstream team members from using the wrong names.
  • Ensure that Role names are unique enough to not be confused with each other. (The ones that needed to be reprocessed had Role names that were really similar to other Role names.)
  • Add a fast feedback loop by setting up the input mechanism to verify whether a role exists at the time the request is put in (when the CreateRole flag is set to false).

The most important change that came from the data analysis was introducing a new fast feedback loop. And, I don’t think we would have found it without analyzing the data. It’s a hard discipline to gather the metrics, but we need to start doing it much more frequently and with greater detail.
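
As a rough illustration, that request-time check could look something like the sketch below; the parameter names and the Invoke-Sqlcmd approach are assumptions for the example, not the actual implementation.

    # Hypothetical sketch: when CreateRole is false, verify the role already exists
    # before accepting the request, so the requestor gets immediate feedback on a bad role name.
    # Assumes the SqlServer module (Invoke-Sqlcmd) is available.
    param (
        [Parameter(Mandatory)] [string] $ServerInstance,
        [Parameter(Mandatory)] [string] $Database,
        [Parameter(Mandatory)] [string] $RoleName,
        [bool] $CreateRole = $false
    )

    if (-not $CreateRole) {
        $query = "SELECT COUNT(*) AS RoleCount FROM sys.database_principals WHERE type = 'R' AND name = N'$RoleName'"
        $result = Invoke-Sqlcmd -ServerInstance $ServerInstance -Database $Database -Query $query

        if ($result.RoleCount -eq 0) {
            throw "Role '$RoleName' does not exist in database '$Database' and CreateRole was not requested. Please verify the role name before submitting."
        }
    }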

Promoting Vision and Behavior through Validation

on Monday, August 19, 2019

Building out a new architecture requires many processes and behaviors to be re-evaluated and modified. Those changes take time to refine, communicate, and become embedded within a department’s teams. And when it comes to changing the understanding of how a process works, progress is very slow; it takes a large amount of repetition to build a communal understanding of the larger vision. Usually, a lot of the details get lost when you’re trying to shift your mindset from one large vision over to a new one. The small stuff doesn’t really sink in until the large stuff has taken root.

But there are small opportunities to remind team members of those details when implementing validation checks within your systems. Software developers do this all the time when developing websites for their end customers, but it isn’t done as often for internal-facing applications or infrastructure pieces. For example, Domino’s requires you to put in a telephone number when you order a pizza for delivery, because if their driver becomes lost they will need to call you. The error message on the Domino’s order form does say the field is required, but it also has a little pop-up tip explaining why it’s required. When was the last time you saw an internal-facing application explain why it wants you to do something in a validation message?

It’s difficult to know when it will be worth the extra time to make an error message more informative. So being on the lookout for anyone reporting an issue who directly says, “I didn’t know we should do it that way,” is very helpful in locating places where a little more attention could go a long way.

Here’s an example. This was the original error message:

Service Accounts should start with ISVC_, IUSR_, or SVC_. Domain Account = '{0}'. AD User and Group accounts can also be used. Service Accounts should not start with '{domain name}\'

And after the update, here’s the new error message:

Service Accounts should start with ISVC_, IUSR_, or SVC_. Domain Account = '{0}'. AD User and Group accounts can also be used. Service Accounts should not start with '{domain name}\'. Please see `serviceaccount guidelines` for more information.

And this is what `serviceaccount guidelines` provides:

Service Accounts

Service accounts cannot be over 20 characters long including the endings of 'Dev' and 'Test'. Because of that, the maximum allowed length is 16 characters, which should include the prefix.

Prefixes

    IUSR_   Stands for IIS User Account. Used with websites that are seen/used by end users.
    ISVC_   Stands for IIS Service Account. Used with webservices and webjobs.
    SVC_    Stands for Win Services Account. Used with Windows Services and Scheduled Tasks.

Example

    ISVC_OrgAppName{env}


When choosing names for accounts, try to avoid redundancies within the name. For example, IUSR_DoorWebApp{env} would have 2 redundancies. 'Web' is redundant because the prefix of 'IUSR_' indicates it's a top level web application. And, 'App' is redundant because the prefix of 'IUSR_' indicates it's a web application. Another example of redundancy would be to add 'Svc' in an 'ISVC_' account name, eg. 'ISVC_DoorSvc{env}'.

It’s a small addition, but it has two effects. First, it communicates a group of standardized application types and how to signal to others what the role of your application is. Second, it empowers team members with the information necessary to make solid decisions without needing to reach out to other teams (who may be busy with other work).

It’s extra overhead to put together the extra documentation, but it can definitely be worth it.
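
For illustration, a validation routine that enforces those guidelines could be as small as this sketch (the Test-ServiceAccountName function is hypothetical, and the real system’s check may differ):

    function Test-ServiceAccountName {
        # Hypothetical sketch of a check that encodes the service account guidelines above.
        param ([Parameter(Mandatory)] [string] $Name)

        $validPrefixes = @('IUSR_', 'ISVC_', 'SVC_')
        $hasValidPrefix = $validPrefixes | Where-Object { $Name.StartsWith($_) }

        if (-not $hasValidPrefix) {
            throw "Service Accounts should start with ISVC_, IUSR_, or SVC_. Please see 'serviceaccount guidelines' for more information."
        }

        # The 20-character limit minus the 'Dev'/'Test' environment suffix leaves 16 characters, including the prefix.
        if ($Name.Length -gt 16) {
            throw "Service account names can be at most 16 characters (including the prefix) so the environment suffix still fits."
        }

        return $true
    }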

Powershell: Using a file hash to test for a change

on Monday, August 12, 2019

The PowershellForGitHub module is great! But … sometimes it can be a bit verbose when it’s trying to help out new users/developers of the module. This isn’t a bad thing in any way, just a personal preference thing. And, the module owner, Howard Wolosky, is really open to suggestions. Which is great!

So, I opened a ticket (PowerShellForGitHub Issue #124) to explain my confusion over the warning messages. And, to be fair, I explained my confusion in a very confusing way. But, he was nice enough to work through it with me and we found that something we needed was a way to tell if someone had updated a settings file after downloading the module on their machine.

Enter, Get-FileHash.

This command looks like it’s been around for quite a while, and it does the classic job of creating a hash of a file. That hash can be stored in code, so it can be used to check whether the file has changed.
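
For example, a minimal sketch of that pattern might look like this (the file path and the recorded hash are placeholders, not the module’s actual values):

    # Hypothetical sketch: detect whether a settings file has changed since it was shipped.
    $expectedHash = '3C72D87A0E2AD3D5B93A2B1DB51B8E28A1C7E4F0B3D5A6C9E8F1A2B3C4D5E6F7'  # placeholder value recorded at release time
    $settingsPath = 'C:\temp\module-settings.json'                                        # placeholder path

    $currentHash = (Get-FileHash -Path $settingsPath -Algorithm SHA256).Hash

    if ($currentHash -ne $expectedHash) {
        Write-Warning 'The settings file has been modified since the module was installed.'
    }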

So, how do you use the check?

Here’s the original code:

And, here’s the updated code using Get-FileHash:

PowerShellForGitHub–Adding Get-GitHubRelease

on Monday, August 5, 2019

PowerShellForGitHub is an awesome powershell module for interacting with the GitHub API. It has a wide set of features that are already implemented and it’s supported by Microsoft (!!). You can also tell the amount of care the maintainer, Howard Wolosky, has put into it when you dig into the code and read through the inline documentation, contributing documentation, and telemetry support (!!). BTW, if you ever need to create a PII transmittable string, check it out: Get-PiiSafeString.

One of the features I was looking for the other day was the ability to retrieve a list of releases for a repository. (I was building a script to monitor the dotnet/core releases; in order to build an ASP.NET Core Hosting Bundle auto-installer.)
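
As a rough sketch of that monitoring idea (assuming the module’s usual -OwnerName/-RepositoryName parameter convention and the release property names returned by the GitHub API):

    # Hypothetical sketch: find the most recent dotnet/core release.
    Import-Module PowerShellForGitHub

    $latest = Get-GitHubRelease -OwnerName 'dotnet' -RepositoryName 'core' |
        Sort-Object -Property published_at -Descending |
        Select-Object -First 1

    Write-Output "Latest dotnet/core release: $($latest.name) ($($latest.tag_name))"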

I submitted a pull request with the update last week and was really impressed with all the automation in the pull request processing. The first thing that surprised me was the integrated msftclas bot (Microsoft Contribution License Agreements), which posted a legal agreement form confirming that I (or the company I represent) consent to give Microsoft ownership of the code we contribute. It was so smooth and easy to do.

Next was the meticulous level of comments and review notes on the pull request. If he made all those comments by hand, holy moly! That would be amazing and I would want to praise him for his patience and level of detail. Hopefully, some of the comments were stubbed out by a script/bot, which would be a really cool script to know about.

So, I’m gonna go through the comments and see if I can update this pull request.

  • *facepalm* Wow, I really missed changing the name GitHubLabels.ps1 to GitHubReleases.ps1 in the .tests.ps1 file.
  • White space in .tests.ps1: Ahhh … I can better see the white space formatting style now.
  • Examples missing documentation: Hahaha! My mistake. It looks like I started writing them and got distracted.
  • Telemetry: I loved the note:

    For these, I think it's less interesting to store the encrypted value of the input, but more so that the input was provided (simply in terms of tracking how a command is being used).

    Thank you for pointing that out! It makes complete sense.

In summary, a big Thank You to Howard Wolosky and the Microsoft team for making this module! It was a huge time saver and really informative on how to write Powershell code in a better way.

Pester Testing Styles

on Monday, July 29, 2019

Pester is a great testing framework for Powershell. And it can be used in a variety of different testing styles: TDD, BDD, etc. I’m going to look at two different styles, both of which are perfectly good to use.

TDD’ish with BeforeAll / AfterAll

Lines 4 through 7 are used to ensure that modules don’t get repeatedly imported when these tests are run as part of a test suite. However, they will allow modules to be reloaded if you are running the individual test file within VSCode. For the most part, they can be ignored.

In this more Test Driven Development style test:

  • The Describe block’s name is the function under test
  • Each It test is labelled to describe the specific scenario it is going to test
  • All the logic for setting up and executing the test is contained within the It block
  • This relies on the Should assertions having clear enough error messages that, when reading the unit test output, you can intuit what the failing condition was

This is a very straightforward approach and it’s really easy to see how all the pieces are set up. It’s also very easy for someone new to the project to add a test to it, because everything is so isolated. One thing that can really help future maintainers of a project is to write much lengthier and more descriptive It block names than the ones in the example, in order to help clarify what is under test.

Some things to note:

In this setup, the BeforeAll script is used to configure the environment before the tests are run. Over time, this function has been replaced with BeforeEach, but for this example I’m using BeforeAll. The BeforeAll block sets up values and variables that I want available when the tests run. I put a $script: prefix on the variables created within the BeforeAll block because I have seen behavior where a variable was no longer defined outside of the scope of BeforeAll.

The AfterAll block is the counterpart to BeforeAll, and is pretty self-explanatory. The interesting part of these two blocks is that they have to be declared within the Describe block and not within the InModuleScope block. They will not be run if they are declared in the InModuleScope block.
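
Here’s a minimal sketch of this style in Pester 4 syntax (Get-WidgetName is a hypothetical function defined inline so the sketch is self-contained, and the InModuleScope wrapper from the original example is omitted):

    # Hypothetical function under test, defined inline so the sketch is self-contained.
    function Get-WidgetName {
        param ([Parameter(Mandatory)] [string] $Path)
        (Get-Content -Path $Path -Raw -ErrorAction Stop | ConvertFrom-Json).name
    }

    Describe 'Get-WidgetName' {
        BeforeAll {
            # $script: keeps the variable visible inside the It blocks.
            $script:configPath = Join-Path $TestDrive 'widget.json'
            Set-Content -Path $script:configPath -Value '{ "name": "widget-01" }'
        }

        AfterAll {
            Remove-Item -Path $script:configPath -ErrorAction SilentlyContinue
        }

        It 'returns the name stored in the configuration file' {
            Get-WidgetName -Path $script:configPath | Should -Be 'widget-01'
        }

        It 'throws when the configuration file does not exist' {
            { Get-WidgetName -Path (Join-Path $TestDrive 'missing.json') } | Should -Throw
        }
    }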

BDD’ish with try / finally

Lines 10 and 11 are used to ensure that the module has been configured correctly (for normal usage, not specific to the tests) and that the module isn’t being reloaded when run as part of a test suite.

In this more Behavior Driven Development style test:

  • Uses the Describe block to outline the preconditions for the tests
  • Immediately following the declaration of the Describe block, it has the code which will set up those preconditions
  • Uses the Context block to outline the specific scenario the user would be trying
  • And, immediately following the declaration, it has the code which will execute that scenario
  • Uses the It blocks to outline the specific conditions that are being tested
  • This requires more code, but makes it clearer which condition actually failed when reviewing the unit test output

This is not as straightforward an approach, as different areas of the code create the conditions being tested. You might have to search around a bit to fully understand the test setup. It also adds a little more overhead when testing multiple conditions, as you will be writing more It block statements. The upside of that extra work is that the unit test output is easier to understand.

Some things to note:

In this setup, variable scope is less of an issue because variables are defined at the highest scope needed to be available in all tests.

The BeforeAll/AfterAll blocks have also been replaced with try/finally blocks. This alternative approach is better supported by Pester, and it can also help new developers gain a key insight into the way Pester tests are run: they are not run in parallel, but in order from top to bottom. Because of this, you can use some programming tricks to mock and redefine variables in particular sections of the code without worrying about affecting the results of other tests.
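
For comparison, here’s a minimal sketch of the try/finally layout, again in Pester 4 syntax with the hypothetical Get-WidgetName defined inline:

    # Hypothetical function under test, defined inline so the sketch is self-contained.
    function Get-WidgetName {
        param ([Parameter(Mandatory)] [string] $Path)
        (Get-Content -Path $Path -Raw -ErrorAction Stop | ConvertFrom-Json).name
    }

    Describe 'When a widget configuration file exists' {
        $configPath = Join-Path $TestDrive 'widget.json'
        Set-Content -Path $configPath -Value '{ "name": "widget-01" }'

        try {
            Context 'and the widget name is requested' {
                $result = Get-WidgetName -Path $configPath

                It 'returns the stored name' {
                    $result | Should -Be 'widget-01'
                }

                It 'leaves the configuration file in place' {
                    Test-Path -Path $configPath | Should -Be $true
                }
            }
        }
        finally {
            # Cleanup that runs even if a test above fails.
            Remove-Item -Path $configPath -ErrorAction SilentlyContinue
        }
    }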

Deleting cached metadata from local NuGet

on Monday, July 22, 2019

The Issue

This article will help you resolve a nuget issue that involves packages which have case-sensitive misspellings in their dependency lists. Here is a sample error message:

Install-Package : Unable to resolve dependency 'Microsoft.Extensions.COnfiguration.Builder'

The Conditions

This is a very specific scenario. So, it takes a couple of preconditions to create:

  • You have to have attempted installing the nuget package into a project.
  • And, the installation had to fail with the above message. Here’s a screen shot for more context.


  • The installation needs to leave your system with the metadata for the package downloaded, but the actual .nupkg is nowhere on disk.

The Solution

In this scenario, your package metadata cache (called the http-cache) has been updated with the package’s dependency list. So, the next time you call nuget.exe, instead of fetching a fresh copy of the package’s metadata from the source, it will use the cached version. To fix this, we’ll need to remove the cached metadata.

  1. Find the package metadata store on your computer (reference docs):

    nuget locals all -list


  2. Use the http-cache value provided and open that folder in File Explorer. Then, find the subfolder which matches the package source.


  3. Within the subfolder, find the package you’re looking for and delete it.


  4. Now, you are ready to reinstall the package from nuget using the latest metadata information.
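
If you find yourself doing this repeatedly, the manual steps can be scripted. Here’s a rough sketch that assumes nuget.exe is on your PATH and uses an example package id; the -WhatIf switch lets you confirm the matches before deleting anything:

    # Hypothetical sketch that automates steps 1 through 3 above.
    $packageId = 'Microsoft.Extensions.Configuration.Builder'   # example package id

    # Step 1: locate the http-cache folder.
    $httpCache = (& nuget locals http-cache -list | Select-Object -First 1) -replace '^http-cache:\s*', ''

    # Steps 2 and 3: look through each package source subfolder and delete the cached metadata for the package.
    Get-ChildItem -Path $httpCache -Directory |
        ForEach-Object { Get-ChildItem -Path $_.FullName -Filter "*$packageId*" } |
        Remove-Item -WhatIf   # remove -WhatIf once you have confirmed the matches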

RSAT Setup on Windows 10 1903 - 0x800f0954

on Monday, July 15, 2019

Windows 10 1803 was the last version of Windows 10 to have a separate RSAT download bundle. This is the note from the download page:

IMPORTANT: Starting with Windows 10 October 2018 Update, RSAT is included as a set of "Features on Demand" in Windows 10 itself. See "Install Instructions" below for details, and "Additional Information" for recommendations and troubleshooting. RSAT lets IT admins manage Windows Server roles and features from a Windows 10 PC.

This is great! It makes re-installation of the RSAT tools just a little bit easier, and a little bit more aligned with automation.

A very nice Microsoft MVP, Martin Bengtsson, saw this new direction for installation and built out an easy to use installation script written in powershell. Here’s a blog post on what it does and how to use it.

The download of the script, execution and setup would have been pretty easy except for one thing … Error 0x800f0954.

It turns out that you need to enable a Group Policy that will allow your machine to download the optional RSAT packages from Windows Update servers instead of your on-premise Windows Server Update Services.

Luckily, Prajwal Desai has already figured this out and has an easy-to-follow set of instructions to update your Group Policy and allow the download to occur.
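
Once the Group Policy change is in place, the Features on Demand install itself can be done with the built-in DISM cmdlets. Here’s a minimal sketch for the Active Directory tools (Martin’s script does considerably more, such as handling all of the RSAT capabilities and logging):

    # Hypothetical sketch: install the Active Directory RSAT tools as a Feature on Demand.
    # Requires an elevated prompt.
    Get-WindowsCapability -Online -Name 'Rsat.ActiveDirectory.DS-LDS.Tools*' |
        Where-Object { $_.State -ne 'Installed' } |
        Add-WindowsCapability -Online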

Basic Install-WindowsTaskTemplate

on Monday, July 8, 2019

I don’t install powershell scripts as Windows Tasks every day (and probably need to find a way for another system to manage that responsibility), so it’s easy to forget how to do them. Here’s a quick template to install a Windows Task on a remote machine:
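
A rough sketch of that kind of template might look like the following, where the server name, script path, schedule, and task name are all placeholders:

    # Hypothetical sketch: register a scheduled task on a remote machine that runs a powershell script nightly.
    Invoke-Command -ComputerName 'APPSERVER01' -ScriptBlock {
        $action   = New-ScheduledTaskAction -Execute 'powershell.exe' `
                        -Argument '-NoProfile -ExecutionPolicy Bypass -File "C:\Scripts\Invoke-NightlyJob.ps1"'
        $trigger  = New-ScheduledTaskTrigger -Daily -At '02:00'
        $settings = New-ScheduledTaskSettingsSet -StartWhenAvailable

        Register-ScheduledTask -TaskName 'Invoke-NightlyJob' `
            -Action $action -Trigger $trigger -Settings $settings `
            -User 'NT AUTHORITY\SYSTEM' -RunLevel Highest
    }

Running the task as SYSTEM keeps the sketch free of password handling; swap in a dedicated service account if the script needs network or database access.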

Fun Little Cryptogram

on Monday, July 1, 2019

There’s an interesting site https://ironscripter.us/ which creates powershell-based scripting challenges for practicing DevOps thinking and continual learning. It’s kind of like a kata website with a funny “Battle for the Iron Throne” feel to it.

A few days ago they posted a really small cryptogram to find a hidden message within some text. I say really small because I have a coworker that is active in crypto games and the stuff he does is mind blowing (https://op011.com/).

Ironscripter’s challenge is more lighthearted and just a quick game to help you think about powershell, string manipulation, visualizing data to make it useful, and so on. So, here’s my solution to the challenge.

(I think I might go back later and use a language file to try and match the text in the possible solutions; instead of trying to look through them manually.)

(Thanks to David Carroll for pointing this site out: His solution)

