Apigee–Throughput, Response Time and Error Rate

on Monday, December 31, 2018

I was listening to Lynda.com’s training course DevOps Foundations: Monitoring and Observability with Ernest Mueller and Peco Karayanev. I thought I heard Mr. Mueller say that Google had a trinity of metrics which they always cover when developing Web APIs: Throughput, Response Time, and Error Rate. He stressed that it’s important to look at all of these indicators on the same dashboard, because they create a more informed view when displayed together. I did a quick search and couldn’t find it on a Google site, but what Mueller described matches pretty closely what’s on New Relic’s Application Monitoring page.

One thing that made a lot of sense to me was how these three metrics are strong indicators of when a major outage is going to happen. If response times go up but everything else stays the same, then a downstream system (like a database) is probably having an issue. If throughput drops but response times don’t, then a system earlier in the workflow is probably having an issue. And a spike in errors points to a problem in the Web API itself.

So, I wanted to know how difficult it would be to create a report in Apigee which could visualize all three of these metrics on the same screen.

To do this, I created a Custom Report in the Analyze section of the Edge Admin portal and set up these metrics (a rough API sketch follows the list):

  • Average Transactions per Second

    This should be your throughput. One thing that Mueller and Karayanev said was that you shouldn’t look at averaged metrics in isolation. If you monitor an average for a particular metric, you should also have a Max or Min of the same metric. That way you don’t get misled when the average number of transactions looks normal but is actually hiding spikes in traffic. Unfortunately, there was no pre-built metric for Sum of Transactions per Second.
  • Total Response Time (Average) and Total Response Time (Max)

    Following the guidance from above (don’t look at average metrics in isolation), both of these metrics combine to show a solid view of the real response times.
  • Target Errors (Sum)

    The system also has Proxy Errors and Policy Errors. I chose to go with Target Errors because those are the errors that indicate whether the actual Web API is having problems.
  • Filter

    I also used a filter at the end to remove two things that could corrupt the data:

    1) The dining-cams proxy is a streaming proxy and never closes its connection, so its average response time is always pegged at the maximum value the graph will hold.

    2) In an earlier post I talked about using a /upcheck endpoint to ensure proxies were up and running. I needed to remove those requests so the data wouldn’t be skewed by synthetic calls.
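
For anyone who’d rather script this than click through the portal, here’s a minimal sketch of pulling roughly the same numbers from the Edge analytics stats API with PowerShell. The organization and environment names are placeholders, and the metric and filter names are written from memory rather than copied out of my report, so double-check them against the Apigee analytics reference:

```powershell
# Sketch only: placeholder org/environment, and metric/filter names from memory.
$org         = 'my-org'
$environment = 'prod'
$cred        = Get-Credential   # an Edge management API user

$select    = 'tps,avg(total_response_time),max(total_response_time),sum(target_error)'
$filter    = [uri]::EscapeDataString("(apiproxy ne 'dining-cams') and (proxy_pathsuffix ne '/upcheck')")
$timeRange = [uri]::EscapeDataString('12/01/2018 00:00~12/31/2018 23:59')

$uri = "https://api.enterprise.apigee.com/v1/organizations/$org/environments/$environment/stats/apiproxy" +
       "?select=$select&timeRange=$timeRange&timeUnit=day&filter=$filter"

$stats = Invoke-RestMethod -Uri $uri -Credential $cred

# One entry per proxy; each metric carries a series of per-day values.
foreach ($proxy in $stats.environments[0].dimensions) {
    Write-Host $proxy.name
    foreach ($metric in $proxy.metrics) {
        Write-Host ("  {0}: {1}" -f $metric.name, ($metric.values.value -join ', '))
    }
}
```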

Overall, it looked like this:

[image: the custom report showing all three metrics on one dashboard]

We have very few users of the system, so there wasn’t much data to actually show. But, it did provide a few pieces of good information:

  • The Registrations endpoint is pretty sluggish.
    • The private endpoints are expected to take a long time (those response times are actually kind of fast)
  • The Dining Menu endpoint’s error logs are worth looking over.


.dental domain registry

on Monday, December 24, 2018

A local dentist office has a website at http://lompoc.dental. The simplicity of the name captured my attention. When did .dental become a domain name? Why didn’t they put www in front of it? So simple, so clean.

So, when did .dental become a domain name?

Back in 2014 ICANN opened up a whole slew of new domain name possibilities with its New gTLD Program. A company called Donuts gathered a vast amount of funding in order to apply for 307 new top-level domains (.free, .inc, .poker, etc.), and one of those was .dental. It seems like Donuts’ goal was to beat the land rush for customized domain names (like lompoc.dental) and recoup the money it spent ($56 million?) registering the new domains.

This is probably a pretty solid plan if they are charging a reasonable amount for registering domains with them. I headed over to godaddy.com to check out what the price would be for someplacenew.dental, which turned out to be about $70. I figure GoDaddy is making a little money off the transaction, but probably not much. So let’s say Donuts nets $65 per domain registration: how many registrations would they need to make back ~$60 million? Roughly 923,000 in total; spread over a decade of annual renewals, that’s about 92,300 names a year.
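
Back-of-envelope, using my assumed $65 net per registration:

```powershell
# Hypothetical numbers: ~$60M outlay, ~$65 net to Donuts per registration per year.
$outlay             = 60e6
$netPerRegistration = 65
$totalRegistrations = $outlay / $netPerRegistration   # ~923,077 registrations to break even
$perYearOverDecade  = $totalRegistrations / 10         # ~92,308 renewals a year for ten years
"{0:N0} registrations total, or about {1:N0} a year over a decade" -f $totalRegistrations, $perYearOverDecade
```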

That seems like a lot. I hope they purchased some really good names.

Copying Build Task Groups with VSTeam

on Monday, December 17, 2018

So, we finally got around to updating our VSTS / Azure DevOps XAML-based builds to Pipeline builds (the slick web-based builds). This effort is just in time to get the functionality switched over before XAML builds get disabled on Azure DevOps. From Brian Harry’s Blog:

By the end of 2018, we will remove all support for XAML builds in all Team Services accounts.  By that time, all customers will need to have migrated to the newer build system version because their XAML builds can no longer be run.  We will continue to support the web based experience for viewing previously completed XAML based builds so that you have access to all your historical data.

The process of converting over these builds has been tremendously helped by the excellent open source VSTeam PowerShell module (github). The creator of this module, DarqueWarrior (Donovan Brown), is amazingly talented and particular in his development practices. And, I love him for it.

Using his (and his team’s) module as the underlying framework, it was pretty quick and easy to build out a little additional functionality to copy Task Groups from one project to another. I would love to contribute the addition back to the project, but I just don’t have the time to put together the meticulous unit tests, follow the excellent coding standards, and integrate it into the underlying provider. I’m still in a race to get these builds converted before they get turned off.

So, here’s a quick gist of building some copying functionality on top of VSTeam:
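
The embedded gist isn’t reproduced here, so below is a minimal sketch of the same idea that goes straight at the Azure DevOps task groups REST endpoint instead of through VSTeam’s internals. The organization URL, project names, PAT variable, and api-version are all placeholders to adjust:

```powershell
# Sketch: copy every task group from one project to another via the REST API.
# $Organization, the project names, the PAT, and the api-version are placeholders.
param(
    [string]$Organization        = 'https://dev.azure.com/my-org',
    [string]$SourceProject       = 'SourceProject',
    [string]$TargetProject       = 'TargetProject',
    [string]$PersonalAccessToken = $env:AZDO_PAT
)

$headers = @{
    Authorization = 'Basic ' + [Convert]::ToBase64String(
        [Text.Encoding]::ASCII.GetBytes(":$PersonalAccessToken"))
}
$apiVersion = '5.0-preview.1'

# 1. Read every task group from the source project.
$sourceUri  = "$Organization/$SourceProject/_apis/distributedtask/taskgroups?api-version=$apiVersion"
$taskGroups = (Invoke-RestMethod -Uri $sourceUri -Headers $headers).value

# 2. Recreate each one in the target project, dropping server-generated fields.
foreach ($group in $taskGroups) {
    $body = $group |
        Select-Object -Property * -ExcludeProperty id, createdBy, createdOn, modifiedBy, modifiedOn |
        ConvertTo-Json -Depth 100

    $targetUri = "$Organization/$TargetProject/_apis/distributedtask/taskgroups?api-version=$apiVersion"
    Invoke-RestMethod -Uri $targetUri -Headers $headers -Method Post -Body $body -ContentType 'application/json' | Out-Null

    Write-Host "Copied task group '$($group.name)'"
}
```

A sketch like this also glosses over details the real version has to care about, such as task groups that reference other task groups, which need to be created in dependency order.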

How much overhead does Apigee add to a request?

on Monday, December 10, 2018

So, I got asked this question earlier today and was surprised that I never memorized an answer.

It’s definitely dependent on your configuration, but for our purposes it looks like it’s about 60 ms. And, it’s probably less than that (see below).

[image: trace of a single request through the proxy]

The trace reports a total time of 267 ms, but the response is actually sent back to the client around the 220 ms mark. Of those 220 ms, about 162 ms is spent making the round trip to the application server to process the request, which leaves roughly 58 ms of Apigee overhead. Below is a more detailed breakdown. But be aware that many of the 1 ms values listed below are actually < 1 ms, so the totals are probably lower than the values quoted.

[image: detailed breakdown of the per-step timings from the trace]
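
A single trace is only one data point. For a broader estimate, one rough approach is to compare the average total response time against the average target response time over a window and treat the difference as proxy overhead. This is just a sketch: the org/environment values are placeholders and the metric names should be verified against the Apigee analytics reference:

```powershell
# Rough overhead estimate from analytics instead of a single trace (placeholder names).
$org         = 'my-org'
$environment = 'prod'
$cred        = Get-Credential

$uri = "https://api.enterprise.apigee.com/v1/organizations/$org/environments/$environment/stats/" +
       "?select=avg(total_response_time),avg(target_response_time)" +
       "&timeRange=" + [uri]::EscapeDataString('12/03/2018 00:00~12/10/2018 00:00')

$stats   = Invoke-RestMethod -Uri $uri -Credential $cred
$metrics = @{}
$stats.environments[0].metrics | ForEach-Object { $metrics[$_.name] = [double]$_.values[0] }

$overheadMs = $metrics['avg(total_response_time)'] - $metrics['avg(target_response_time)']
"Approximate Apigee overhead: {0:N0} ms" -f $overheadMs
```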

MyGetUcsb–Unlisting and Soft Deletes

on Monday, December 3, 2018

In a previous post about moving packages between feeds in MyGet I introduced a Remove-MyGetPackage command. The command’s default behavior was to perform a “hard delete”, which completely removes the package from the feed rather than unlisting it. That made sense in the context of a Move-MyGetPackage command, because the whole point is to move the package from one feed to another.

But, it introduced some other problems when actually using it to Delete packages.

Sometimes, the day after you publish a package you find a bug, do a quick fix and publish a new version of the package. In those scenarios, I thought it was okay to delete the old package because anyone that had downloaded the old version would just update to the new version when they found it was no longer available in the feed. This turned out to be way more difficult than I thought.

Here’s what I didn’t understand:

The way NuGet updates are designed, the original package needs to be available before an update can be performed.

So, if you download someone else’s source code and try to build it, you can run into a “Package Not Found” error during package restore. This might happen because I deleted that version of the package. My assumption was that the person who downloaded the code would check MyGet, see that a newer version of the package was available, and run Update-Package to move to it. However, this is where the problem lies.

Update-Package requires the previous version of the package to be available before it will perform the update. And since the package doesn’t exist anymore, it can’t do that.

And this is by design. As such, the NuGet API makes the delete operation a soft delete (unlist), and I was overriding the original designers’ intentions by defaulting to a hard delete. So, I did two things to get back in line with the original designers’ game plan:

  • Added an -Unlist parameter to Remove-MyGetPackage
  • Added a wrapper function called Hide-MyGetPackage (that’s the best verb I could find for “Unlist”). A rough sketch of the wrapper is below.
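
For illustration, here’s a minimal sketch of what that wrapper might look like. It assumes Remove-MyGetPackage takes -FeedName/-PackageId/-Version parameters along with the new -Unlist switch; those parameter names are guesses, not the module’s actual signature:

```powershell
function Hide-MyGetPackage {
    <#
    .SYNOPSIS
        Unlists a package (soft delete) instead of hard-deleting it.
    .NOTES
        Illustrative sketch only: the parameter names passed through to
        Remove-MyGetPackage are assumptions, not the real module's signature.
    #>
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)] [string] $FeedName,
        [Parameter(Mandatory)] [string] $PackageId,
        [Parameter(Mandatory)] [string] $Version
    )

    # Delegate to Remove-MyGetPackage, but force the soft-delete (unlist) path
    # so the old version stays resolvable for Update-Package scenarios.
    Remove-MyGetPackage -FeedName $FeedName -PackageId $PackageId -Version $Version -Unlist
}
```

Something like Hide-MyGetPackage -FeedName 'ucsb' -PackageId 'Some.Package' -Version '1.0.0' would then remove the version from the feed’s listing while keeping it restorable for existing consumers.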

