Performing OpsGenie Heartbeats with Seq

When we investigated OpsGenie, one feature I was attracted to was Heartbeat Monitoring. This is a feature that can help to answer a fundamental question - "How do you know if you have a major site or infrastructure outage?"

There are plenty of ways that you could go about this, especially if you have multiple sites with connections that could raise an alert if another site went down. Heartbeat Monitoring is a relatively simple solution, provided you have something that can send a heartbeat to OpsGenie.

As we were implementing OpsGenie as the nerve centre of our IT operations monitoring and alerting, and Seq was a major part of that design, I contemplated Seq as one of the options. 

Seq, as a structured logging server, isn't necessarily the system that you would expect to perform an outbound heartbeat call. For starters, there was no built-in functionality or existing app that would do it. Secondly, is this really logging related?

I considered this and came to the following conclusions:

  1. We mandate an application logging server for all key applications and processes, so any heartbeat process should itself be sending logs to Seq, giving us visibility of it as a monitoring component.
  2. Seq is always online and has extensibility with Seq apps - so instead of implementing something external to Seq, ensuring it performs logging, and directing those logs to Seq - we could simply implement the heartbeat function within Seq.

This was actually the first Seq app that I attempted, and I wanted to accomplish it quickly. Hence I initially forked Seq.Input.Healthcheck, which already had functionality to send an HTTP GET to a URL, and with some tweaks, adapted it into a heartbeat app, Seq.Input.OpsGenieHeartbeat. It was certainly quick to get up and running, and it worked well. It wasn't long before a major internet outage showed the value of the heartbeat.

I recently reviewed the app with an eye to adding proxy functionality, for sites that don't or can't have their server directly accessing the internet. I could do this with the Healthcheck fork, but I'd ultimately adapted an input designed to generically process one or more URLs and log statistics, into an app that performed a single function. The code that instantiates multiple tasks for each URL is well designed for its original purpose, with multiple classes for performing and reporting the health check.

Altering the code for a heartbeat meant I was creating somewhat of a Frankenstein's monster. In fact, OpsGenie's normal status code (HTTP 202) was logged as a Warning event. I could correct that, but I was really changing an app from its intended design, for what should be a simple application for a single purpose - just a timer-based check would do the job.

So I decided to simplify. I took the fundamental design of the heartbeat app, and re-implemented it as Seq.App.OpsGenieHeartbeat:

  • Configure a REST API URL for your heartbeat
  • Configure an API key from an OpsGenie API integration
  • Configure the interval for sending your heartbeat (60 seconds default)
  • Optionally - configure a proxy server
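
The four settings above, driven by a simple timer, could be sketched roughly as follows. This is a Python illustration only, not the app's actual code (Seq apps are written in .NET); the names `HeartbeatSettings` and `run_heartbeat_loop` are hypothetical, and the function that actually sends the heartbeat is passed in as a parameter.

```python
import threading
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class HeartbeatSettings:
    """Hypothetical container mirroring the four settings listed above."""
    api_url: str                      # REST API URL for the heartbeat
    api_key: str                      # key from an OpsGenie API integration
    interval_seconds: float = 60.0    # 60 seconds by default
    proxy: Optional[str] = None       # optional proxy server


def run_heartbeat_loop(settings: HeartbeatSettings,
                       send: Callable[[HeartbeatSettings], None],
                       stop: threading.Event) -> int:
    """Call send() once per interval until stop is set; return the attempt count."""
    attempts = 0
    while not stop.is_set():
        try:
            send(settings)
        except Exception as exc:
            # A failed heartbeat is reported, but the timer keeps running.
            print(f"Error: heartbeat failed: {exc}")
        attempts += 1
        # Event.wait doubles as an interruptible sleep.
        stop.wait(settings.interval_seconds)
    return attempts
```

Waiting on a `threading.Event` rather than sleeping means the loop can be stopped promptly instead of waiting out the remainder of the interval.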

Along the way, I added a few diagnostic logs for startup, and set the log level according to the OpsGenie response - HTTP 202 gets a Debug event, anything else gets a Warning, and exceptions get an Error.
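
As a rough illustration of that mapping, here is a minimal Python sketch of a single heartbeat call. It is not the app's code: the function names are hypothetical, the `GenieKey` authorization header is my assumption about how an OpsGenie API integration key is presented, and the opener is passed in so that a proxy (or a test double) can be swapped in.

```python
import urllib.error
import urllib.request
from typing import Optional, Tuple


def level_for_status(status: int) -> str:
    """HTTP 202 is the normal OpsGenie response; anything else is a Warning."""
    return "Debug" if status == 202 else "Warning"


def build_request(url: str, api_key: str) -> urllib.request.Request:
    # Assumption: the API key is sent as "Authorization: GenieKey <key>".
    return urllib.request.Request(
        url, headers={"Authorization": f"GenieKey {api_key}"}, method="GET")


def build_opener(proxy: Optional[str] = None) -> urllib.request.OpenerDirector:
    """Return an opener that routes through the proxy when one is configured."""
    if proxy:
        handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
        return urllib.request.build_opener(handler)
    return urllib.request.build_opener()


def send_heartbeat(url: str, api_key: str, opener) -> Tuple[str, str]:
    """Send one heartbeat and return (log level, message)."""
    try:
        with opener.open(build_request(url, api_key), timeout=10) as response:
            status = response.status
    except urllib.error.HTTPError as err:
        # urllib raises for non-2xx responses; they still carry a status code.
        status = err.code
    except Exception as exc:
        # Network failures, timeouts and similar exceptions become Errors.
        return ("Error", f"Heartbeat failed: {exc}")
    return (level_for_status(status), f"Heartbeat returned HTTP {status}")
```

A call might look like `send_heartbeat(url, key, build_opener("http://proxy.example.invalid:8080"))`, where both the URL and the proxy address are placeholders.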

One interesting side effect of this is that the typical elapsed time for a heartbeat dropped from 150 - 200ms with the Healthcheck fork to 0.2 - 0.5ms with the new app - variable, of course, depending on the many factors affecting internet speeds.

Neither result is particularly terrible, but it's quite a noticeable difference. The new Heartbeat app doesn't need to inspect the returned content and output stats; it only needs the status code to determine if a heartbeat was successful. I suspect that Healthcheck also instantiates an HttpClient each time (I haven't checked); as usual, I'm using Flurl.Http, with a cheerful little implementation that configures an HttpClient that is reused for every call - so the first call typically takes ~200ms or so, and then subsequent calls drop to the 0.2 - 0.5ms range.
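
The general idea of creating a client once and reusing it can be sketched in Python as below. This is only an analogy to the Flurl.Http setup, not a port of it: the class name `ReusedConnectionPinger` is hypothetical, the `GenieKey` header is an assumption, and the connection factory is injectable so the sketch can be exercised without a network.

```python
import http.client
from typing import Callable, Optional


class ReusedConnectionPinger:
    """Sends pings over one lazily created connection, reused between calls."""

    def __init__(self, host: str, path: str, api_key: str,
                 connect: Optional[Callable[[str], object]] = None):
        self.host = host
        self.path = path
        # Assumption: the API key is sent as "Authorization: GenieKey <key>".
        self.headers = {"Authorization": f"GenieKey {api_key}"}
        self._connect = connect or (
            lambda h: http.client.HTTPSConnection(h, timeout=10))
        self._conn = None

    def ping(self) -> int:
        """Send one GET and return the HTTP status code."""
        if self._conn is None:
            # Only the first call (or the first after a failure) pays for setup.
            self._conn = self._connect(self.host)
        try:
            self._conn.request("GET", self.path, headers=self.headers)
            response = self._conn.getresponse()
            response.read()  # drain the body so the connection can be reused
            return response.status
        except Exception:
            # Discard a broken connection so the next ping starts fresh.
            self._conn.close()
            self._conn = None
            raise
```

Whether reuse accounts for the whole difference in elapsed time is not something this sketch demonstrates; it only shows the create-once, reuse-thereafter pattern.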

The result - an app that does everything needed for an OpsGenie Heartbeat, and nothing else. It outputs meaningful Seq logs for each heartbeat, including an AppName property (which I've tended to standardise on for all logs sent to Seq). These logs in turn could be monitored or alerted on. And ultimately, the desired effect is achieved: if your site, or indeed your Seq server, drops off the face of the earth, your OpsGenie instance can make certain you know at 2am.

You can install Seq.App.OpsGenieHeartbeat to your Seq installation by specifying the package id. 
