The other day a coworker and I were adding the ElmahCore/ElmahCore library to an AspNetCore web project and we ran into a strange NullReferenceException when using it. The problem didn’t occur when the library was used in an AspNetCore 2.2 project, and this was the first time trying it on an AspNetCore 3.0 project. So, we wanted to believe it was something to do with 3.0, in which case there would probably be other people that were running into the issue and Microsoft have a fix ready for it very soon. But, we needed to move forward on the project, so we wanted to find a work around in the meantime.
Personally, I found a bit of humor with this Exception, because at first glance it looked like it was occurring within an Exception Management system (Elmah). I mean the one thing that an Exception Management system definitely doesn’t want to do is throw Exceptions. However, being that I was already troubleshooting an Exception problem that seemed kind of funny, I leaned into the ridiculousness and decided to debug the Exception Management system with another Exception Management system, Stackify Prefix ( … it’s more of an APM, but it’s close enough).
The truly laugh out loud moment came when I hooked up Stackify Prefix, and it fixed the Elmah exception. Elmah started working perfectly normally once Stackify Prefix was setup in the application. So … one Exception Management system fixed another Exception Management system. (Or, at least, that’s how it felt.)
Of course, I needed to figure out what Stackify did to fix Elmah, so needed to pull Stackify back out of the application and really get into Elmah. Which meant grabbing a clone of ElmahCore to work with.
Even before having the source code available, I had the function name and stack trace of the Exception. So once I had the source code available, zeroing in on the code was pretty straight forward. But, what I found was unexpected.
As best I could tell, the problematic code wasn’t in Elmah, but was occurring from an underlying change inside of Microsoft.AspNetCore.Http.ItemsDictionary. It seemed like an internal field, _items, was null within the ItemsDictionary object. And, when the enumerator for the collection was used, a NullReferenceException was being generated.
(More detailed information at ElmahCore Issue #51)
It seemed like the workaround for the issue was to populate the request.Items collection with a value, in order to ensure the internal _items collection was not null. And to double check this, I loaded back up Stackify Prefix and checked what the value of the collection was when Stackify was involved. Sure enough, Stackify had populated the dictionary with a corellation id; and that’s why it fixed the issue.
For ElmahCore, I implemented a really bad code fix and created a pull request for them. It is a really bad fix because the solution triggers the exception, catches it, and then swallows it. Which makes the Elmah code continue to execute successfully. But, any .NET Profiler that ties into the really low level .NET Profiler APIs (these are the APIs which power APM solutions like Stackify Retrace, AppDynamics, etc) will record that exception and report as if its an exception occurring within your application.
At this point, the next steps are how to inform Microsoft of the problem and get a bug fix/pull request going for them. I’ve opened the bug (AspNetCore Issue #16938), but I will need to get the code compiling on my machine before submitting a pull request.
0 comments:
Post a Comment