Building up the Seq app for OpsGenie
Over the past few weeks, I've worked with Nicholas Blumhardt to enhance the Seq.App.OpsGenie application for Seq. Nicholas is the founder and CEO of Datalust, the company behind Seq, and is very active in the community - which is awesome, and has meant that there's a bunch of open source Seq apps created by him, which extend and enhance Seq's capabilities, including interfaces to other systems and platforms. He created the Opsgenie app, and we've been using it extensively to dramatically transform our monitoring and alerting landscape.
With Nicholas' kind support, encouragement, and feedback, I added a number of enhancements to the app:
- Optionally configure an event property containing an array or comma-delimited list of tags to pass to OpsGenie (pass-through/dynamic tags)
- Configure OpsGenie Priority
- Optionally configure an event property to map to OpsGenie priorities (pass-through/dynamic priorities), with a default priority if not matched.
- Defaults to using @Level, so you can map Seq error levels to OpsGenie priorities!
- Configure OpsGenie Responders
- Optionally configure an event property to map to OpsGenie responders (pass-through/dynamic responders), with default responders if not matched
- Additional debug logging
- Logging the alert properties to debug to allow watching for errors and sending them through another app, such as Email+ or Jira as an alternate way to raise an alarm
- Add the $EventUri property for Handlebars templates, to allow simple passing of the full event URI to OpsGenie
These changes are now up and running on NuGet, and I've successfully tested the new features with OpsGenie.
Many of the changes most immediately benefit Event Timeout for Seq, and the recent v1.4.2 release was planned to take full advantage of them. As events panned out, I found a regression in the previous version that caused the AbstractAPI Public Holidays deserialization to fail when there was a public holiday to evaluate. I'd made a couple of mistakes while refactoring with ReSharper, which were readily corrected.
That meant that Event Timeout 1.4.x was released earlier than expected, as it was stable and the best candidate for release with the fix. It had only been awaiting a merge of the last pull request for Seq.App.OpsGenie to be released.
With the OpsGenie app now updated on NuGet, you can take advantage of the new features. In short - you can create a single OpsGenie instance to watch for Event Timeout alert events, with Event Timeout controlling the Priority, Responders, and Tags that will be sent to OpsGenie via properties that it logs!
Dynamic Priorities, Responders, and Tags
If, like us, you have different on-call support for the various components of your infrastructure, this is invaluable. You can target the timeout to the right responders, with the right priority, and the tags you need to pass to OpsGenie. All of this feeds straight into OpsGenie rules and policies to give you the power you need over your alerts. And of course - if you have Jira and use the OpsGenie Edge Connector script that I customised to pass OpsGenie priorities and tags through to Jira - your timeout priorities and tags will make it all the way to your Jira tickets!
With Event Timeout, this is controlled with the following configuration properties:
- Priority for timeouts (new to v1.4.x)
- Responders for timeouts (new to v1.4.x)
- Alert tags
These configurations will be passed to Seq when a timeout alert is raised, which leaves them ready to be picked up by an OpsGenie app instance:
So all we need is an OpsGenie instance configured to look for and map these properties!
Here's what that looks like:
You can see the new configuration properties in the above screenshot:
- Priority Property
- Alert Priority or Property Mapping
- Default Priority
- Responder Property
- Responders or Property Mapping
- Default Responders
- Include event tags (checkbox)
- Event tag property
Static Priority, Responder, and Tags
If you simply want to pass a static Priority, Responder, and Tags, that's done with the following configs:
- Alert Priority or Property Mapping: enter the desired priority, eg. P3
- Responders or Property Mapping: specify the responders with the format "Name=Type". For example, [email protected]=user,Barry=team will pass the user [email protected] and the team named Barry to OpsGenie. If no type is specified (Barry) it will default to Team.
- Alert Tags: enter a comma delimited list of tags to send to OpsGenie. For example, test,alert.
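As a rough illustration of the "Name=Type" format described above, here is a minimal Python sketch of how such a list could be parsed. It is not the app's actual code: the function name `parse_responders`, the dictionary shape, and the placeholder address `ops@example.com` are all assumptions made for the example.

```python
def parse_responders(config: str) -> list[dict[str, str]]:
    """Parse 'Name=Type,Name=Type' into responder entries.

    Hypothetical helper: an entry with no '=Type' part falls back to 'team',
    mirroring the default described in the text.
    """
    responders = []
    for item in config.split(","):
        item = item.strip()
        if not item:
            continue  # ignore empty entries such as a trailing comma
        name, sep, rtype = item.partition("=")
        rtype = rtype.strip().lower()
        responders.append({
            "name": name.strip(),
            "type": rtype if sep and rtype else "team",
        })
    return responders


# 'ops@example.com' is a placeholder address used only for this example.
print(parse_responders("ops@example.com=user,Barry"))
```

Here the entry for Barry carries no type, so the sketch assigns it the team type.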
The magic comes when you use the other new properties.
We use three properties to control the pass-through/dynamic priority mappings.
- Priority Property: the event property that will be examined to map events. By default, if not specified, the @Level property will be used - so you can map Seq's Fatal, Error, Warning, Information, Debug, and Verbose error levels to OpsGenie priorities! For Event Timeout, you would set this to Priority.
- Alert Priority or Property Mapping: Here's where the translation between the property and OpsGenie occurs. This is done as Name=Priority, eg. Error=P1. For an example of a mapping between @Level and OpsGenie, you could configure Fatal=P1,Error=P2,Warning=P3,Debug=P4,Verbose=P5.
- A mapping must always exist, even if (like Event Timeout) the property will contain the OpsGenie priority. So for Event Timeout, if you are configuring P1, P2, P3, etc - you need to configure a mapping of P1=P1,P2=P2,P3=P3, etc.
- However this has its benefits, too - it means you can look at the same property from different apps - one returns P1, the other returns Fatal - and still have them map correctly by simply including both in the mapping. Plus, it affords the opportunity to change priorities around. For example - we don't use P5 for OpsGenie, so we can specify P5=P4 to remap a P5 that is passed to a priority that we do use.
- Default Priority: To ensure we can always raise an alert - even if a mapping wasn't possible - a default priority must be specified, eg. P3.
These three properties must be set for priority mapping to be performed. You're not constrained to just @Level - any valid property that is passed by an event can be used, including Event Timeout's Priority property!
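The lookup described above can be sketched in a few lines of Python. This is an illustration of the idea rather than the app's implementation: `parse_mapping` and `resolve_priority` are hypothetical names, and the case-insensitive matching is an assumption.

```python
def parse_mapping(config: str) -> dict[str, str]:
    """Turn 'Name=Priority,Name=Priority' into a lookup table (hypothetical helper)."""
    mapping = {}
    for pair in config.split(","):
        name, sep, priority = pair.partition("=")
        if sep and name.strip() and priority.strip():
            # Assumption: names are matched case-insensitively.
            mapping[name.strip().lower()] = priority.strip().upper()
    return mapping


def resolve_priority(event: dict, prop: str, mapping: dict[str, str], default: str) -> str:
    """Map the event's property value to a priority, falling back to the default."""
    value = event.get(prop)
    if value is None:
        return default
    return mapping.get(str(value).strip().lower(), default)


mapping = parse_mapping(
    "Fatal=P1,Error=P2,Warning=P3,Debug=P4,Verbose=P5,P1=P1,P2=P2,P3=P3,P4=P4,P5=P4"
)
print(resolve_priority({"@Level": "Error"}, "@Level", mapping, "P3"))        # P2
print(resolve_priority({"Priority": "P5"}, "Priority", mapping, "P3"))       # P4 (remapped)
print(resolve_priority({"@Level": "Information"}, "@Level", mapping, "P3"))  # P3 (default)
```

The single mapping covers both @Level names and literal P1 to P5 values, which is how one configuration can serve events from different apps, and the P5=P4 entry shows the remapping trick.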
We use three properties to control pass-through/dynamic responder mappings.
- Responder Property: the event property that will be examined to map events. For Event Timeout, you would set this as Responders.
- Responders or Property Mapping: Translation between the property and OpsGenie responders. You'll configure a list of all responders that can be passed, with the same format as you'd pass for a static set of responders. For example, [email protected]=user,Barry=team,Windows Escalation=schedule.
- Default Responders: If a responder can't be matched from the property, who should we direct the alert to? This should be one or more of the responders that you configured in Responders or Property Mapping. You don't need to provide the Type here - just the name. For example - if I want to make Barry and Windows Escalation the default, I'll just specify Barry,Windows Escalation. The app already knows the responder type, because you've configured it.
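To make the fallback behaviour concrete, here is a small Python sketch of the matching described above. It assumes the event property holds either a comma-delimited string or a list of names, and that names are matched case-insensitively. Neither detail is confirmed here, and `resolve_responders` is a hypothetical name, not part of the app.

```python
def resolve_responders(value, configured: dict[str, str], defaults: list[str]) -> list[dict[str, str]]:
    """Match responder names from an event property against the configured mapping.

    configured: name -> type, as set up in the mapping configuration.
    defaults:   names only; their types come from the configured mapping.
    """
    if value is None:
        names = []
    elif isinstance(value, str):
        names = value.split(",")  # assumption: comma-delimited string
    else:
        names = list(value)       # assumption: array of names

    lookup = {name.lower(): (name, rtype) for name, rtype in configured.items()}
    matched = []
    for raw in names:
        key = str(raw).strip().lower()
        if key in lookup:
            matched.append(lookup[key])
    if not matched:
        # Nothing matched, so fall back to the configured defaults.
        matched = [lookup[d.lower()] for d in defaults if d.lower() in lookup]
    return [{"name": name, "type": rtype} for name, rtype in matched]


configured = {"Barry": "team", "Windows Escalation": "schedule"}
print(resolve_responders("Windows Escalation", configured, ["Barry"]))
print(resolve_responders("Nobody", configured, ["Barry", "Windows Escalation"]))
```

Because the defaults are looked up in the same table, only their names are needed, which matches the note that the responder type is already known from the mapping.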
Pass-through/dynamic tags were actually the first feature. Event Timeout already logged a Tags property with tags configured for a given timeout, and being able to pass those through to OpsGenie was the original reason that I started to work on the Seq.App.OpsGenie code.
In short, you simply need to configure as follows:
- Include event tags: Check this box to allow mapping a property containing a comma-delimited string, or array, of tags.
- Event tag property: The name of the property that will contain the tags. This will default to Tags if not specified.
The pass-through/dynamic tag feature will append the tags that are passed from an event to any that you configured in the Alert Tags configuration. This means you can combine static and dynamic tags seamlessly!
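A minimal Python sketch of that append behaviour follows. The function name `merge_tags` is hypothetical, and skipping duplicates is my own assumption rather than something the app is known to do.

```python
def merge_tags(static_tags: str, event: dict, tag_property: str = "Tags") -> list[str]:
    """Append tags found on the event to the statically configured tags."""
    tags = [t.strip() for t in static_tags.split(",") if t.strip()]

    value = event.get(tag_property)
    if value is None:
        dynamic = []
    elif isinstance(value, str):
        dynamic = value.split(",")  # comma-delimited string
    else:
        dynamic = list(value)       # array of tags

    for tag in dynamic:
        tag = str(tag).strip()
        if tag and tag not in tags:  # assumption: duplicates are skipped
            tags.append(tag)
    return tags


print(merge_tags("test,alert", {"Tags": ["disk", "alert"]}))  # ['test', 'alert', 'disk']
print(merge_tags("test,alert", {"Tags": "disk, sql"}))        # ['test', 'alert', 'disk', 'sql']
```

Static tags come first and event tags are appended after them, so both the array and the comma-delimited forms end up in a single list.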
Results, of course, are what really matter. Below is a screenshot from Seq of a test alert to OpsGenie. I had to blank out the Responder Mappings data as it had an email address that I didn't want to show - but without a means to show you an OpsGenie alert in action, this is the best illustration of the OpsGenie app passing the Event Timeout priority, responder, and tags through to OpsGenie, where our rules, policies, and escalations can take care of the rest. And, of course, the priorities and tags are translated all the way through to Jira, thanks to my OEC script!
I think this is a cool addition to Seq and the OpsGenie app, and of course a major boost for Event Timeout.
I'd love to see other inputs and "log output" apps implement the Responders, Priority, and Tags properties to allow the OpsGenie app to pass them through - and of course more Seq apps that interface with other systems could also benefit. I may well take a pass at some of my favourite Seq apps with an eye to this.
I also have an enhancement suggestion open for Seq itself, to allow dashboard alerts to pass these properties. Nicholas seemed pretty positive about it, so fingers crossed!