Alerts explored in-depth

In this tutorial, you will learn about alert de-duplication and simple correlation techniques, as well as alert tags, custom attributes, environments, services and more.


Overview

Although event and resource are the only mandatory attributes, the standard alert format is extensive, with more than two dozen fields.

This tutorial explains what the different attributes are and what they are for. Once you understand their purpose, you will be able to choose more useful values for them and get the most out of Alerta.

Prerequisites

It is assumed that you have completed Tutorial 1, or that you have access to an Alerta server to which you can send alerts using the alerta command-line tool.

Access to the Alerta web console would also help, because you can watch the alerts update in real time rather than having to run the alerta query command repeatedly to see the results.

Step 1: De-duplication

Alert de-duplication reduces the number of alerts in the console by displaying duplicate alerts only once, while updating key alert attributes and incrementing a duplicate counter.

To demonstrate de-duplication, run the following command several times to generate the same alert:

$ alerta send -r user01 -e loginError -s major -E Production -S Security \
-t 'user01 login failed.'
57eb528a-84bf-4080-b54a-37e2888207f3 (indeterminate -> major)

$ alerta send -r user01 -e loginError -s major -E Production -S Security \
-t 'user01 login failed.'
57eb528a-84bf-4080-b54a-37e2888207f3 (1 duplicates)

Note that this is the default behaviour: no special configuration or alert format is required. As long as the alert resource and event are the same (within the same environment, as described in Step 4), alerts will be de-duplicated.
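The de-duplication rule can be modelled in a few lines of Python. This is a simplified, in-memory sketch, not Alerta's server code: the `Alert` and `AlertStore` classes, the `receive` method and the `duplicate_count` field are hypothetical names chosen for illustration. It assumes alerts are matched on environment, resource and event, and that a repeat with the same severity only refreshes the alert and increments a counter.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    environment: str
    resource: str
    event: str
    severity: str
    text: str = ""
    duplicate_count: int = 0  # hypothetical field name


class AlertStore:
    """Hypothetical in-memory store illustrating de-duplication."""

    def __init__(self) -> None:
        self.alerts: dict[tuple[str, str, str], Alert] = {}

    def receive(self, incoming: Alert) -> Alert:
        # Assumed identity: environment, resource and event.
        key = (incoming.environment, incoming.resource, incoming.event)
        existing = self.alerts.get(key)
        if existing is None:
            # First occurrence: store it as a new alert.
            self.alerts[key] = incoming
            return incoming
        if existing.severity == incoming.severity:
            # Duplicate: keep a single alert, refresh its text, bump the counter.
            existing.text = incoming.text
            existing.duplicate_count += 1
        else:
            # Same alert with a new severity: update it in place instead.
            existing.severity = incoming.severity
            existing.text = incoming.text
        return existing


store = AlertStore()
for _ in range(2):
    alert = store.receive(
        Alert("Production", "user01", "loginError", "major", "user01 login failed.")
    )
print(len(store.alerts), alert.duplicate_count)  # 1 1
```

Sending the same alert twice leaves one stored alert with a duplicate count of 1, mirroring the "(1 duplicates)" output of the second command above.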

Step 2: Simple Correlation

Alerta supports simple correlation, which means that it can be configured to update one alert with another, related alert.

To demonstrate simple correlation, run the following commands to generate alerts that replace each other, so that only the most recent one is shown:

$ alerta send -r user01 -e loginError -s major -E Production -S Security \
-t 'user01 login failed.' -C loginError -C loginWarn -C loginOk
572cb438-5d09-4cdc-babd-410020e3bc15 (indeterminate -> major)

$ alerta send -r user01 -e loginWarn -s warning -E Production -S Security \
-t 'user01 password reset.' -C loginError -C loginWarn -C loginOk
572cb438-5d09-4cdc-babd-410020e3bc15 (major -> warning)

$ alerta send -r user01 -e loginOk -s normal -E Production -S Security \
-t 'user01 login success.' -C loginError -C loginWarn -C loginOk
572cb438-5d09-4cdc-babd-410020e3bc15 (warning -> normal)

The most important part of the commands above is the -C loginError -C loginWarn -C loginOk arguments. The “-C” option is short for “--correlate” and tells the Alerta server that alerts with these events should be correlated together.
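A rough sketch of how a correlate list could be used to find the alert to update is shown below. It is a simplified model, not the Alerta server implementation: `is_correlated`, `receive` and the `Alert` class are hypothetical. It assumes an incoming alert matches a stored alert with the same environment and resource when each alert's event appears in the other's correlate list, and that a match replaces the stored event, severity and text.

```python
from dataclasses import dataclass, field


@dataclass
class Alert:
    environment: str
    resource: str
    event: str
    severity: str
    text: str = ""
    correlate: list[str] = field(default_factory=list)


def is_correlated(stored: Alert, incoming: Alert) -> bool:
    """Assumed rule: same environment and resource, and each alert's
    event appears in the other alert's correlate list."""
    return (
        stored.environment == incoming.environment
        and stored.resource == incoming.resource
        and stored.event in incoming.correlate
        and incoming.event in stored.correlate
    )


def receive(alerts: list[Alert], incoming: Alert) -> tuple[Alert, str]:
    """Update the first correlated alert or append a new one.
    Returns the resulting alert and its previous severity."""
    for stored in alerts:
        if is_correlated(stored, incoming):
            previous = stored.severity
            # The newer alert replaces the stored event details.
            stored.event = incoming.event
            stored.severity = incoming.severity
            stored.text = incoming.text
            stored.correlate = incoming.correlate
            return stored, previous
    alerts.append(incoming)
    return incoming, "indeterminate"


group = ["loginError", "loginWarn", "loginOk"]
alerts: list[Alert] = []
for event, severity, text in [
    ("loginError", "major", "user01 login failed."),
    ("loginWarn", "warning", "user01 password reset."),
    ("loginOk", "normal", "user01 login success."),
]:
    alert, previous = receive(
        alerts, Alert("Production", "user01", event, severity, text, list(group))
    )
    print(f"{previous} -> {alert.severity}")
```

The three printed transitions follow the same pattern as the CLI output above, and only one alert remains in the list at the end.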

Interestingly, the de-duplication logic demonstrated in Step 1 can be used to produce results similar to this simple correlation.

To demonstrate correlation by de-duplication, replace the different login event names with the more generic “loginStatus” and move the specific event name to the alert “value” (-v).

$ alerta send -r user01 -e loginStatus -v loginError -s major -E Production \
-S Security -t 'user01 login failed.'
1acab7c8-e08e-4fef-98ad-3b07ba238120 (indeterminate -> major)

$ alerta send -r user01 -e loginStatus -v loginWarn -s warning -E Production \
-S Security -t 'user01 password reset.'
1acab7c8-e08e-4fef-98ad-3b07ba238120 (major -> warning)

$ alerta send -r user01 -e loginStatus -v loginOk -s normal -E Production \
-S Security -t 'user01 login success.'
1acab7c8-e08e-4fef-98ad-3b07ba238120 (warning -> normal)

This method gives you the benefits of correlation without the overhead of having to define all the correlated event names in advance.
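Why this works can be shown by comparing lookup keys. In this hypothetical sketch, an alert's identity is assumed to be the (environment, resource, event) tuple; the specific outcome travels in `value`, which is not part of the key, so all three generic login alerts resolve to the same stored alert. The `Alert` tuple and `alert_key` function are illustrative names only.

```python
from typing import NamedTuple


class Alert(NamedTuple):
    environment: str
    resource: str
    event: str
    value: str
    severity: str


def alert_key(alert: Alert) -> tuple[str, str, str]:
    # Assumed identity: value and severity are deliberately excluded.
    return (alert.environment, alert.resource, alert.event)


specific = [
    Alert("Production", "user01", "loginError", "", "major"),
    Alert("Production", "user01", "loginWarn", "", "warning"),
    Alert("Production", "user01", "loginOk", "", "normal"),
]
generic = [
    Alert("Production", "user01", "loginStatus", "loginError", "major"),
    Alert("Production", "user01", "loginStatus", "loginWarn", "warning"),
    Alert("Production", "user01", "loginStatus", "loginOk", "normal"),
]

# Without a correlate list, specific event names give three separate alerts...
print(len({alert_key(a) for a in specific}))  # 3
# ...while a generic event name collapses them into a single alert.
print(len({alert_key(a) for a in generic}))  # 1

latest: dict[tuple[str, str, str], Alert] = {}
for a in generic:
    latest[alert_key(a)] = a  # a later alert replaces the earlier one
print(next(iter(latest.values())).value)  # loginOk
```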

Step 3: Automatic status changes

In the examples above, you sent alerts with different severities, which caused the same alert to transition from one severity to another.

What you might not have noticed is that the alert status also changed. For example, when a new alert was received, its status was automatically set to open, and when the alert severity changed to normal, the status automatically changed to closed.

$ alerta send -r user01 -e loginStatus -v loginError -s major -E Production \
-S Security -t 'user01 login failed.'
12c4d5f4-1be9-436d-a90a-1adc1a473815 (indeterminate -> major)
=> open

$ alerta send -r user01 -e loginStatus -v loginOk -s normal -E Production \
-S Security -t 'user01 login success.'
12c4d5f4-1be9-436d-a90a-1adc1a473815 (major -> normal)
=> closed
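The status behaviour seen so far can be summarised as a small function. This is a sketch of the rule described above, not Alerta's status logic in full: `next_status` is a hypothetical name, only open and closed are modelled, and the ack and re-open rules covered next are deliberately left out.

```python
from typing import Optional


def next_status(current: Optional[str], severity: str) -> str:
    """Return the status after receiving an alert with this severity.

    `current` is None when no matching alert exists yet.
    Hypothetical helper: ack and re-open rules are not modelled.
    """
    if severity == "normal":
        # A normal severity clears the alert.
        return "closed"
    if current is None:
        # A brand-new, non-normal alert starts out open.
        return "open"
    return current


status = next_status(None, "major")
print(status)  # open
status = next_status(status, "normal")
print(status)  # closed
```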

In addition to open and closed, you can set the status of alerts to ack or assign, depending on your alert-handling procedures.

An important feature of Alerta is that it automatically re-opens an acknowledged alert if a new alert arrives with a higher severity than the one already received.

$ alerta send -r user01 -e loginStatus -v loginError -s major -E Production \
-S Security -t 'user01 login failed.'
9df79583-397b-4d6b-8c6e-3f446bd0c7b3 (indeterminate -> major)
=> open

$ alerta ack --id 9df79583
=> ack

$ alerta send -r user01 -e loginStatus -v loginError -s critical -E Production \
-S Security -t 'user01 login failed.'
9df79583-397b-4d6b-8c6e-3f446bd0c7b3 (major -> critical)
=> open
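The re-open rule for acknowledged alerts depends on comparing severities. The sketch below assumes a simple ranking in which a smaller number means more severe; the `SEVERITY_RANK` values and the `status_after` helper are illustrative and are not taken from Alerta's own severity definitions.

```python
# Illustrative ranking only (assumption): smaller number = more severe.
SEVERITY_RANK = {
    "critical": 1,
    "major": 2,
    "minor": 3,
    "warning": 4,
    "indeterminate": 5,
    "normal": 6,
}


def status_after(status: str, old_severity: str, new_severity: str) -> str:
    """Hypothetical helper: re-open an acked alert if severity increases."""
    more_severe = SEVERITY_RANK[new_severity] < SEVERITY_RANK[old_severity]
    if status == "ack" and more_severe:
        return "open"
    return status


print(status_after("ack", "major", "critical"))  # open
print(status_after("ack", "major", "minor"))  # ack
```

An acked major alert that escalates to critical is re-opened, as in the commands above, while a downgrade leaves the acknowledgement in place.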

Alerts that are closed or expired are also re-opened when an alert with any severity other than normal is received for them.

$ alerta send -r user01 -e loginStatus -v loginError -s major -E Production \
-S Security -t 'user01 login failed.'
9564d012-1d37-45c2-94c6-ba5e26af8389 (indeterminate -> major)
=> open

$ alerta send -r user01 -e loginStatus -v loginOk -s normal -E Production \
-S Security -t 'user01 login success.'
9564d012-1d37-45c2-94c6-ba5e26af8389 (major -> normal)
=> closed

$ alerta send -r user01 -e loginStatus -v loginError -s major -E Production \
-S Security -t 'user01 login failed.'
9564d012-1d37-45c2-94c6-ba5e26af8389 (normal -> major)
=> open
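The rule for closed and expired alerts is simpler because it does not compare severities: any non-normal severity re-opens the alert. A minimal sketch, using a hypothetical `reopen_if_needed` helper:

```python
def reopen_if_needed(status: str, new_severity: str) -> str:
    """Hypothetical helper: re-open a closed or expired alert
    when any severity other than normal is received."""
    if status in ("closed", "expired") and new_severity != "normal":
        return "open"
    return status


status = "closed"  # after the loginOk (normal) alert
status = reopen_if_needed(status, "major")
print(status)  # open
print(reopen_if_needed("expired", "warning"))  # open
print(reopen_if_needed("closed", "normal"))  # closed
```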

Step 4: Environments and Services

The alert environment plays an important role in de-duplication and correlation because it is used to “namespace” the alert resource. Environments provide a scope for resources, and a resource name only needs to be unique within its environment.

This means that if two alerts are received for the same resource but in different environments, they are considered different alerts and are neither de-duplicated nor correlated.

This allows hosts, applications, devices or anything else to share the same resource name across different environments and still be treated independently.

Run the following commands to generate two “loginStatus” alerts with the value “loginError”, one for the “Production” environment and the other for “Development”:

$ alerta send -r user01 -e loginStatus -v loginError -s major -E Production \
-S Security -t 'user01 login failed.'
f0948bf7-d351-47f8-8670-0eb84127816b (indeterminate -> major)

$ alerta send -r user01 -e loginStatus -v loginError -s major -E Development \
-S Security -t 'user01 login failed.'
4cd197b8-eb19-49f5-9afe-841390c03ff9 (indeterminate -> major)
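Namespacing by environment amounts to including the environment in the key used to look up existing alerts. In this hypothetical sketch, the two alerts for resource `user01` and event `loginStatus` are stored separately because their environments differ; dropping the environment from the key would merge them.

```python
alerts: dict[tuple[str, str, str], dict[str, str]] = {}

for environment in ("Production", "Development"):
    incoming = {
        "environment": environment,
        "resource": "user01",
        "event": "loginStatus",
        "value": "loginError",
        "severity": "major",
    }
    # The environment is part of the (assumed) key, so it scopes the resource.
    key = (incoming["environment"], incoming["resource"], incoming["event"])
    alerts.setdefault(key, incoming)

print(len(alerts))  # 2
# Without the environment in the key, both alerts would collide.
print(len({(a["resource"], a["event"]) for a in alerts.values()}))  # 1
```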

The alert service is used to list the services affected by the alert.

Step 5: Groups, types and origins

TBC

Step 6: Tags and Custom attributes

TBC

Step 7: Saving raw data

TBC

Next Steps

After you deploy your Alerta server, you might want to try some of the following tutorials: