Implementation Efficiency Frustration Tiers

on Monday, February 17, 2020

For me, a lot of stress and anxiety about working efficiently comes from the momentary feelings of being ineffective. If I need to accomplish a work item, how long will it take to complete? How many sub-tasks do I need to finish before I can complete the work item? How many of those do I understand and can do with a minimal amount of effort, and how many do I need to research before I can even begin implementing them? The more time, energy, and new knowledge a work item requires, the more stressful it becomes to complete.

So, I wanted to take a moment and start to break down those feelings into categories. I read Scott Hanselman’s Yak Shaving post years ago, and it has become a part of the shared language among the development teams I work with. Before reading that post, I had described the act of Yak Shaving as “speed bumps”; but I would have to explain it every time I used it. Hopefully, getting this written down can help me define a language so I can communicate this feeling more easily.

At the moment, the feeling of implementation efficiency can be broken down as:

Tier 3

This is when you need to implement something, but in order to do it you are going to have to learn a new technology stack or a new paradigm. The task you’re trying to complete could be something as trivial as adding Exception Handling to an application, but in order to do it, you’re going to research APM solutions, determine which best fits your needs, and then implement the infrastructure and plumbing that will allow you to use the new tool.

An example of this might be your first usage of Azure Application Insights in an ASP.NET Core application. Microsoft has put in a tremendous amount of work to make it very easy to use, but you’ll still need to learn how to create an Application Insights resource and how to add Application Insights into an ASP.NET Core application. Then you’ll re-evaluate whether you created your Application Insights resource correctly to handle multiple environments (and most likely reimplement it with Dev, Test, and Prod in mind), determine which parameters unique to your company should always be recorded, and work with external teams to set up firewall rules, develop risk profiles, and work through all the other details necessary to get a working solution.
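The in-code portion of that list is genuinely small; a minimal sketch of the ASP.NET Core wiring might look like the following (this assumes the Microsoft.ApplicationInsights.AspNetCore NuGet package is installed and that each environment supplies its own instrumentation key through configuration; the class shown is illustrative, not taken from a specific project):

```csharp
using Microsoft.Extensions.DependencyInjection;

public class Startup
{
    public void ConfigureServices(IServiceCollection services)
    {
        // Registers the Application Insights telemetry pipeline. The
        // instrumentation key is read from configuration (for example,
        // ApplicationInsights:InstrumentationKey), which is what lets Dev,
        // Test, and Prod each point at their own Application Insights
        // resource without code changes.
        services.AddApplicationInsightsTelemetry();

        services.AddControllers();
    }
}
```

Everything around that one call (resource creation, environment separation, firewall rules, risk profiles) is where the real learning curve lives.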

Tier 3 is the most frustrating because you have to learn so much yourself just to get to your end value. So, for me, it’s also the one I feel the most nervous about taking on, because it can feel like I’m being incredibly inefficient by doing so much work to produce something that feels so small.

Tier 2

This is when you already have all the knowledge of how to do something and you understand what configuration needs to take place, but you are going to have to do the configuration yourself. When you know at the beginning exactly how much work it will take to complete, there is a lot less frustration, because you can rationalize the amount of time spent against the end value that’s achieved. The moment this becomes frustrating is when the extra work that you’re putting in is a form of Yak Shaving. For example, when you are dealing with a production issue and you realize that you’re going to need to implement component X to get the information necessary to solve the problem, that’s the moment you heavily sigh, because you realize the amount of hand work you’re going to have to put in place just to get component X working.

This level of efficiency usually happens when you’re working on the second or third project with a particular technology stack. Let’s use Application Insights as the example again. You’ve probably already developed some scripts which can automatically create the Application Insights instances, and you’re comfortable installing the NuGet packages that you need, but you still need to run those scripts by hand, set up permissions by hand, and maybe even request firewall rules to be put in place. None of these tasks will really take up too much time, but it feels like wasted time because you’re not producing the real end value that you had in mind in the first place.

Tier 1

This is when the solution is not only well known to you, but your organization has developed the tooling and infrastructure to rigorously minimize the amount of time spent implementing it. This doesn’t come cheap, but the peace of mind that comes with having an instantaneous solution to a problem is what makes work enjoyable. The ability to stumble upon a problem, think, “Oh, I can fix that”, and within moments be back to working on whatever you were originally doing creates a sense that any problem can be overcome. It removes the feeling that you’re slogging through mud with no end in sight, and replaces it with confidence that you can handle whatever is thrown at you.

It’s rare that you can get enough tooling and knowledge built up in an organization that Tier 1 can be achieved on a regular and ongoing basis. It requires constant improvement of work practices and continual investment in people’s knowledge, skillsets, and processes to align the tooling and capabilities of their environment with their needs.

When creating working environments, everyone starts out with a goal of creating a Tier 1 scenario. But, it seems pretty difficult to get there and maintain it.

This is one of the pieces I find very frustrating about security. There is a lot of information available about what could go wrong and about different risk scenarios, but there just isn’t a lot of premade tooling which can get you to a Tier 1 level of Implementation Efficiency. People are trying, though: OWASP has the Glue Docker image, GitHub’s automated security update scanner is fantastic, and NWebSec for ASP.NET Core is a step in the right direction. But, overall, there needs to be a better way to get security into that Tier 1 Implementation Efficiency zone.

3rd Party Event Tracing Calls in Apigee

on Monday, February 10, 2020

Apigee has information on their website which makes event tracing of calls to a 3rd party system relatively easy. But, the information is spread out over a couple of pages. To provide this functionality effectively, you’ll want to use two different features together:

  • Use a PostClientFlow to ensure the event logging is performed after the response is sent to the client.
  • Use a ServiceCallout Policy, with the <Response /> element removed. This will ensure the call to the 3rd party system is done as a Fire-and-Forget call, rather than one that waits for a response before continuing processing.

    There is a MessageLogging Policy, which is specifically designed for this logging scenario. However, the MessageLogging policy doesn’t allow header information to be added to the call, and a number of 3rd party logging systems (like Splunk) use the Authorization header to verify the incoming caller.

The end result of making these changes looks a little like this:

The 5 steps within the workflow that are grouped by a red box show a group of 2 service callouts which are each logging to separate 3rd party systems (we wanted to compare the two products to see which would fit our needs better). In the top left red box is the complete processing time within Apigee, 78 ms. And the small red box at the bottom right (in Postman) is the amount of time from the client’s perspective, just 46 ms.

To do this, you’ll want to set up a shared flow that will make the ServiceCallouts. Remember that each ServiceCallout should have its <Response> element removed.

Once that’s in place, you’ll just need to use the shared flow as part of a <PostClientFlow> within your APIs. I wish this was an element I could use within the Post-proxy Flow Hook; that way I could add it to all APIs in one place.

A communication benefit of microservices

on Monday, February 3, 2020

Recently, a blog post called Monoliths are the Future caught a coworker’s attention, and he had some interesting questions that stemmed from it:

You know what's interesting; I feel like we all get this sense that we _have_ to be doing microservices

Like any existing architecture is garbage and, regardless of your business or technical constraints, you are failing if you don't immediately make a wholesale switch over to microservices

And so now I feel like I see more and more articles saying, "Hey, whoever told you that you needed to stop everything and do microservices was wrong. You should take both designs into consideration and implement something that meets the needs of your business/technical constraints."

I'm just interested in where the all-or-nothing perspective leaked in; was it our perception as self-conscious technologists? Or was it click-baity writing?

For me, I think a fair share of the attention that microservices get stems from the email Jeff Bezos wrote around 2002, which was made famous by a Google+ rant from Steve Yegge. In that post, Yegge outlined Jeff Bezos’s “Big Mandate”:

  • All teams will henceforth expose their data and functionality through service interfaces.
  • Teams must communicate with each other through these interfaces.
  • There will be no other form of inter-process communication allowed: no direct linking, no direct reads of another team’s data store, no shared-memory model, no back-doors whatsoever. The only communication allowed is via service interface calls over the network.
  • It doesn’t matter what technology they use.
  • All service interfaces, without exception, must be designed from the ground up to be externalizable. That is to say, the team must plan and design to be able to expose the interface to developers in the outside world. No exceptions.

This list wasn’t shared on the internet until 2011, and SOA and microservices had already become popular by that point. But what Bezos’s list did was create the building blocks for AWS, and AWS’s incredible success is what I consider to be one of the largest factors in people looking at microservices and believing that they are the way to be successful. Many people don’t understand the reasons behind AWS’s microservice-based success; they simply believe the false equivalency that they will achieve AWS’s success by using microservices.

But, for me, that list from Bezos captures an underlying management strategy that is far more impactful than the microservices themselves. What that list succinctly describes is that all interactions between the systems at Amazon/AWS would now be based around contracts and interfaces that are developed and managed by service providers. This means that the service provider is responsible for working with their customers to develop a usable and meaningful service contract that guarantees that if the client can provide inputs X, then their system will provide service Y. They guarantee that the service will be available 24 hours a day. They guarantee that it will be usable without requiring any communication between the two separate teams, without the overhead of one-off reviews, and without any specialized approval processes.

This contract-first always-available platform approach effectively breaks one of the most difficult constraints on all projects: Communication Overhead.

In The Mythical Man-Month, Fred Brooks describes how adding another person onto a project also adds a rapidly growing number of communication paths to that project team; with n people there are n(n-1)/2 possible pairs. If you had 10 people on a project and you add an 11th, then you just added 10 more avenues of communication and slow down.

Within teams that are highly effective, a significant fraction of their productivity comes from their shared knowledge about what they’re building and the goals they’re trying to achieve together. Agile seemingly builds on the back of the lessons learned from the Mythical Man Month by trying to reduce that communication overhead. Agile uses the daily stand-up to build shared knowledge and shared vision on a daily basis. The daily stand-up is where a team member can quickly ask “I want to do X because I think it will give us benefit Y. Is everyone on board with that?” Because everyone at the daily team meeting has been sharing their knowledge, that statement wouldn’t require a great deal of time explaining the context and history of how the thought came to be or why it would be beneficial. The reduction in explanation time between those team members is one of the aspects that makes that team effective.

Conversely, when you have to communicate outside of your team, that’s when you introduce a communication constraint. Bringing another team into the conversation and bringing them up to speed takes time. You have to explain to them where your team is at and get them into your team’s mindset. The hardest part of that conversation is that while you’re explaining the background and reasoning for how your project got to its current point, the new team is going to view every decision that was made along the way through their current understanding and their guiding principles, not through your team’s current understanding and guiding principles. This leads to multiple levels of slow down and overhead, not to mention the hardest thing of all: disagreement.

What Jeff Bezos’s email did was lower the overall cross-team communication overhead company-wide at Amazon. If your project could satisfy the requirements of interface/contract X, then the service would provide the results Y at any time of day, with no waiting for two teams to find time to meet, no evaluation of requirements and agreement on the purpose of the product, and no time-consuming process of bringing individuals with differing viewpoints into alignment.

But many people don’t think about that side of it. I feel like the original designers of microservices have tried to stress the importance of Team Autonomy in Microservices as a key component of ensuring that each team/each service provider can create new functionality without requiring agreement and sign-off from another team. In a monolithic database project that I have worked on, I have seen this problem occur around the very few but very important tables that multiple teams share. Changes to those tables require a great deal of communication overhead to ensure all teams are aware, have analyzed and reviewed the change, and have implemented plans to handle any consequences of the update; all done before even the first action can be taken on the update.

But, as Fred Brooks said, there is No Silver Bullet, and reducing cross team communication overhead is just one piece of a much larger and more complicated puzzle of making an effective working environment.

So, I think “the all-or-nothing perspective leaked in” because of a combination of things: AWS’s great success, Martin Fowler (et al.) evangelizing microservices, and the audience that was absorbing this information not having the full perspective of what was truly driving the benefits.

As if I haven’t already given my two cents … here were my actual thoughts on the original blog post, Monoliths are the Future:

  • There were a number of statements within the post (and I have not listened to the audio) which made me believe that an underlying problem the speakers were grappling with was low code quality standards and coding practices at their companies. For example, statements like “we lost all of our discipline in the monolith”, “they’re initiating things and throwing it over a network and hoping that it comes back”, and “Now you went from writing bad code to building bad infrastructure”.

    I believe that ensuring quality within the products and services you provide is a critical necessity for anything to be successful. In Lean Six Sigma practices, defects are one of the critical wastes. You must ensure high quality and resilient code in order to reduce the amount of time spent on rework.

    You don’t have to use microservices, or Gang of Four, or XYZ to ensure high quality standards; but the company you work for has to define high quality services and products as one of their highest priorities. From there, the people at the company will develop the standards, processes, and tooling to ensure that they are creating high quality products and also monitoring that those standards are upheld every day.
     
  • The line “There are reasons that you do a microservice. So, to me a microservice makes sense in the context of…” was a silver lining to me.

    The writer was outlining that when he can see value in using a particular approach, he is on board with making that approach successful. This is probably the most important aspect of choosing any architectural approach. If the people implementing the approach can see the value in it, then they are not only on board with making it happen, they will find ways to make it better than it was originally designed.
     
  • There is an aspect of microservices where monolithic datastores are separated into small autonomous datastores. This separation is to improve team autonomy and lower communication overhead. But, there is a flip-side to creating those small autonomous datastores. Besides the obvious duplication of data, it also creates a new need of bringing the data back together in order to analyze it from a system wide perspective. Whenever data stores are broken apart, there is a need to create a new centralized datastore for reporting and business insights. These have most recently been coming up as Big Data, Data Lakes, and other data collectors for Business Intelligence and Data Analytics platforms.

    So, eventually, you always get back to a monolithic data store; but maybe not a monolithic application.
     
  • My last thought is pretty negative. Again, I haven’t listened to the audio, so this might be completely off base. I just don’t get a strong sense that the author of the article or the speakers being quoted are really thinking about things from an overall workload productivity perspective. How are all the people of the company working together to make the company’s product? What are the processes that the company has in order to make those products? And, what are the most critical foundational principles that the company needs to do in order to make those processes as effective as possible? If you have the answers to those questions, then the question of microservices vs monoliths will have a clear answer.

ExceptionHandler Needed

on Monday, January 27, 2020

As a follow-up to the Create a Custom ProblemDetailsFactory post, it has been discovered that a custom Exception Handler must be defined in order to use the ProblemDetailsFactory. The Exception Handler can be incredibly simple, but it must produce an IActionResult that will trigger the calling of a ProblemDetailsFactory. This can be accomplished with something as simple as this:
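The original sample isn’t reproduced here, but a minimal sketch of such a handler could look like the following (the route matches the “/your-buss-exception-handler” path referenced below; the controller name is illustrative):

```csharp
using Microsoft.AspNetCore.Mvc;

[ApiController]
public class YourBussExceptionHandlerController : ControllerBase
{
    // Returning Problem() is what triggers ASP.NET Core to call the
    // registered ProblemDetailsFactory when an unhandled exception is
    // routed to this endpoint.
    [Route("/your-buss-exception-handler")]
    public IActionResult HandleException() => Problem();
}
```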

The controller only needs to create a Problem IActionResult to trigger the usage of your ProblemDetailsFactory.

To ensure that YourBussProblemDetailsFactory is used, you can create two extension functions to use during Startup.cs’s ConfigureServices and Configure methods. That might look like this:
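The original code sample isn’t shown here, but a sketch of that usage could look like this (the two extension method names are assumptions for illustration; they are defined in the sketch after the list below):

```csharp
using Microsoft.AspNetCore.Builder;
using Microsoft.Extensions.DependencyInjection;

public class Startup
{
    public void ConfigureServices(IServiceCollection services)
    {
        services.AddControllers();

        // Hypothetical extension method that registers YourBussProblemDetailsFactory
        // and the exception handler controller.
        services.AddYourBussProblemDetails();
    }

    public void Configure(IApplicationBuilder app)
    {
        // Hypothetical extension method that wires up the global exception handler.
        app.UseYourBussProblemDetails();

        app.UseRouting();
        app.UseEndpoints(endpoints => endpoints.MapControllers());
    }
}
```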

And the extensions functions that provide the wiring would look like the code sample below. Some noticeable pieces in the example are:

  • Within the IServiceCollection extension method, .AddMvc().AddApplicationPart(thisAssembly) is used to ensure that the controller from above is included within the top level application. This is how you can add controllers from sub-libraries.
  • Within the IApplicationBuilder extension method, the final wiring sets up the global exception handler to point at “/your-buss-exception-handler”.
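Since the original sample isn’t included here, a sketch of extension methods matching that description might look like the following (it assumes the YourBussProblemDetailsFactory class from the earlier post; the extension method names are illustrative):

```csharp
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Mvc.Infrastructure;
using Microsoft.Extensions.DependencyInjection;

public static class YourBussProblemDetailsExtensions
{
    public static IServiceCollection AddYourBussProblemDetails(this IServiceCollection services)
    {
        // Swap the default ProblemDetailsFactory for the custom one.
        services.AddSingleton<ProblemDetailsFactory, YourBussProblemDetailsFactory>();

        // AddApplicationPart makes the exception handler controller defined in
        // this class library discoverable by the top-level application.
        var thisAssembly = typeof(YourBussProblemDetailsExtensions).Assembly;
        services.AddMvc().AddApplicationPart(thisAssembly);

        return services;
    }

    public static IApplicationBuilder UseYourBussProblemDetails(this IApplicationBuilder app)
    {
        // Route all unhandled exceptions to the handler endpoint from above.
        return app.UseExceptionHandler("/your-buss-exception-handler");
    }
}
```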

SSH Key Auth to GitHub on Win 10 w/ VSCode

on Monday, January 20, 2020

My work computer has been using SSH keys to authenticate to GitHub for a while. But I’ve slept a few nights since I set that up and I have no clear memory of what I did.

I wanted to set up my home computer the same way and struggled to figure out how to do it. So, I thought it might be worth documenting.

The secret (I think) … install git 2.20.0 or higher

In the end, the final change that made SSH key authentication work was updating my git installation from version 2.15.0 to 2.25.0. My work computer has 2.20.0 on it, so I figure that should be the minimum level.

Here’s an outline of the things I tried and notes about them:

git (version)

  • Work computer: 2.20.0
  • Home computer: 2.15.0 -> 2.25.0
  • Notes: Didn’t work with 2.15.0. Finally worked with 2.25.0.

SSH keys

  • Work computer: I don’t remember how I generated them. I think I generated the keys using Ubuntu WSL.
  • Home computer: Copied them from my work computer using the normal NTFS file system (didn’t need Git Bash, WSL, or any of those). I did register the keys using ssh-add in `Git Bash`, `wsl`, and using the Windows 10 ssh-add (see Notes). But, in the end, I turned off Win10’s ssh-agent service and the SSH keys continued to be used for authentication.
  • Notes: On my current version of Win 10, you can start an ssh-agent service in Windows, which means you don’t need to use a bash command prompt to execute `ssh-keygen` or `ssh-add` commands. Reminder: use `ssh-add -l` to list already registered keys.

GitHub PAT

  • Work computer: I never created one for this machine.
  • Home computer: I created one for this machine, and it would work for an individual commit (username: normal GitHub account name, password: PAT). I don’t think this is needed.

Reducing Noise in Error Logs / Removing PS Errors

on Monday, January 13, 2020

I have a nightly scheduled job which will send out a notification email if an error occurs anywhere within it (even when the error is handled). This job infrequently sends out the error email. However, my long history of reviewing these emails has brought me to the point where I assume the error is always:

  • There was a file lock on file X when the file was being saved; the function detected the error, waited a brief time period for the lock to clear, and then retried the save operation successfully.

I can't remember a time when that wasn't the case. Because of this, I find myself less interested in actually reading the error message and more inclined to simply ignore the email. But I know that is going to lead to a situation where something unexpected happens and I ignore the warning email, which would be a failure of the entire warning system.

So, what I have is a very narrowly defined and well-known case of when the exception occurs, and I have a desire to ignore it. If I set up the code to simply suppress this error after the save operation successfully completes, then I should be able to safely reduce the amount of noise in the error messages that are sent to me. (It should still report the error if the retries never complete successfully.)

This is a very common scenario: teams set up a warning mechanism that is highly effective when a system is first built. At that time, there are a myriad of possible unforeseen errors that could occur. There also hasn’t been enough operational history to feel that the system is stable, so being notified of every potential problem is still a welcome learning experience. As those problems are reduced or eliminated, it builds trust in the new system. However, it’s also very common that once a team completes a project and does a moderate amount of post-deployment bug fixes, they are asked to move on and prioritize a new project, which leaves no devoted/allocated time for maintaining the small and inconsistent issues that arise in the previous project(s).

Unfortunately, the side effect of not giving the time needed to maintain and pay down the technical debt on the older projects is that you can become used to the “little” problems that occur on them, including ignoring the warning messages that they send out. And this creates an effect where you can start to distrust that the warning messages coming from a system are important, because you believe that you know the warning is “little” or “no big deal”.

The best way to instill confidence in the warning and error messages produced by a system is to ensure that the systems only send out important messages, separating the Signal from the Noise.

For my scenario above, the way I’m going to do this is to prevent these handled errors from sending out notification emails. This goes against best practices because I will need to alter the global error monitor in PowerShell, $global:Error. But, given that my end goal is to ensure that I only receive important error messages, this seems like an appropriate time to go against best practices.

Below is a snippet of code which can be used to remove error records from $global:Error that match a given criterion. It will only remove the most recent entries of that error, in order to try and keep the historical error log intact.

You need to be careful with this. If the error you’re looking for occurs within a loop with a retry policy on it, then you need to keep the errors which continued to fail beyond the retry policy, and only remove future errors when the retry policy succeeded. You can better handle the retry policy situation by using the -Last 1 parameter.

Book Review?: The Unicorn Project

on Monday, January 6, 2020

The Unicorn Project (amazon, audible, supplements: itrevolution) is a new book and follow-up to The Phoenix Project by Gene Kim.

And it’s much more in line with what I was expecting The Phoenix Project to be. The Phoenix Project focused on The 3 Ways, with a strong emphasis on their connection to Lean Management. This was done intentionally, as the book was supposed to be a retelling of The Goal done with DevOps in mind. In order for The Phoenix Project to tell its story, it needed to be told from the perspective of someone who was required to see the whole picture of the company, to facilitate understanding of The First Way. To do that, the protagonist is a high-level CIO type who has oversight of all IT operations in the company. This means that a lot of the day-to-day aspects of a mid-level manager or front-line implementer are glossed over. I would even describe the book as mostly focusing on The First Way (it takes more than half the book to explain), while The Second and Third Ways also get a bit glossed over. But, in the context of that book, it’s fine, because “the goal” of that book is to introduce The 3 Ways and give practical examples to help them stick with the reader.

This book continues to build upon the information given in The Phoenix Project, but it presents the information in two modified ways:

  • The book is from the point of view of someone who is really a mid-level manager, but the book needs to force her into a front-line implementer position from time to time. This is done to allow for more tangible day-to-day examples to be presented of what can be done.
  • The details of the external world are updated to more closely match the current state of DevOps and IT work in 2018/2019. The book references some of the newer capabilities in NoSQL databases, functional programming, and automated testing.

If The Phoenix Project was about describing The 3 Ways, then this book is about describing The Five Ideals (which are still Lean-aligned, plus some other ideas): Locality and Simplicity; Focus, Flow, and Joy; Improvement of Daily Work; Psychological Safety; and Customer Focus.

They are all very useful ideals, but the book seemed to fall prey to glossing over details on how to achieve them. As mentioned earlier, there was a similar problem in The Phoenix Project. An example in this book is that our protagonist, Maxine, worked with her team to help define that a Continuous Integration (build) system needs to run Unit Tests in order to verify that each check-in of code doesn’t break the overall functionality. This is introduced as a new concept for their team. The night she introduces the idea, she falls ill and is sick for the next three days. When she returns to work, everyone on the team is writing well designed unit tests and the system has full code coverage. What?! Getting a team that has never used unit tests to (a) embrace the value that unit tests provide, (b) take the time to learn a unit testing pattern that isn’t brittle, and (c) create meaningful code coverage takes weeks, and it involves a great deal of mentoring and code review and will cause frustration about where your team’s time is most valuably spent. But, for this book, it can happen overnight with no negative consequences or trade-offs.

One thing that I really like about this book is that it is trying to take years and years of knowledge and distill it into an easily understandable and entertaining format that might get someone interested in learning more. Hopefully, it encourages anyone that enjoys the book to continue reading. The book’s publisher, itrevolution.com, has a number of other books that dive deeper into the subject matter of DevOps and Business Management. By reading or listening to any of their books, you will find a long list of referenced material to continue learning from.

