Suppressing Alerts using Blackouts
In this tutorial you will learn about suppressing alerts during scheduled downtime using blackout periods.
Overview
Suppressing (muting) alerts during scheduled downtime, in effect putting the affected systems into “maintenance mode”, is important because false alerts cause “alert fatigue” and operators can become complacent.
This tutorial will explain how to suppress alerts by defining blackout periods that match on different alert attributes.
Prerequisites
It is assumed that you have completed Tutorial 1 or you have access to an
Alerta server that you can send alerts to using the alerta command-line tool.
Preferably you have also completed Tutorial 3, which explains how to
enable/disable plugins and how they work. For this tutorial the “blackout”
plugin must be enabled. To enable a built-in plugin, simply add it to the
list of PLUGINS in the server configuration file.
It would also help to have access to the Alerta web console, which shows
alerts updating in real time and saves you from repeatedly running the
alerta query command to see the results.
Step 1: Blackout by Environment
Alert suppression works by matching alert attributes against any active
blackout periods. At a minimum, a blackout period must define an alert
environment to suppress.
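The matching idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Alerta's actual implementation: the Blackout class and the matches function are hypothetical names, and every attribute is treated as a single string for simplicity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Blackout:
    """Hypothetical stand-in for a blackout period (not Alerta's own class)."""
    environment: str                # the only mandatory attribute
    service: Optional[str] = None   # None means "match anything" (shown as * in listings)
    resource: Optional[str] = None
    event: Optional[str] = None
    group: Optional[str] = None


def matches(blackout: Blackout, alert: dict) -> bool:
    """Return True if every attribute set on the blackout equals the alert's value."""
    if blackout.environment != alert["environment"]:
        return False
    for attr in ("service", "resource", "event", "group"):
        wanted = getattr(blackout, attr)
        if wanted is not None and wanted != alert.get(attr):
            return False
    return True


# Attributes of the alert sent in Step 1 of this tutorial.
alert = {
    "environment": "Production",
    "service": "System",
    "resource": "host05:/dev/disk1",
    "event": "FsInodeUtil",
    "group": "OS",
}

print(matches(Blackout(environment="Production"), alert))   # True
print(matches(Blackout(environment="Development"), alert))  # False
```

A blackout that sets only the environment matches every alert in that environment, which is why it is the broadest form of suppression.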
To demonstrate how to suppress all alerts for the Production
environment run the following commands:
$ alerta send -r host05:/dev/disk1 -e FsInodeUtil -s major -E Production -S System \
-g OS -t '/dev/disk1 inode utilisation high.'
ed8dd6b3-37a5-4687-8a98-99d318eb6c37 (indeterminate -> major)
$ alerta blackout --environment Production
26997703-6705-457a-b603-0c151762129c
$ alerta send -r host05:/dev/disk1 -e FsInodeUtil -s major -E Production -S System \
-g OS -t '/dev/disk1 inode utilisation high.'
217ebb7e-b51a-4f15-b8b6-852c5e965894 (Suppressed alert during blackout period)
Instead of the “(1 duplicates)” response that might have been expected, the server indicated that the alert matched a blackout period and was suppressed.
To confirm that the blackout period is active run:
$ alerta blackouts
ID CUSTOMER ENVIRONMENT SERVICE RESOURCE EVENT GROUP TAGS STATUS START DURATION
26997703 * Production * * * * * active 2017/08/01 08:27:03 3600s
Note that the short “blackout id” (i.e. 26997703) shown in the output above
is the first eight characters of the id returned by the alerta blackout
command.
Step 2: Blackout by Service or Group
Blanket alert suppression can be achieved by defining a blackout period
based on service or group:
$ alerta blackout -E Development -S Network --duration 86400
51ca8a3b-39fd-4315-a748-9150c63632aa
$ alerta blackout -E Development -g Performance
06beb220-26ac-4c8a-9e23-bd05911a13b2
$ alerta blackouts
ID CUSTOMER ENVIRONMENT SERVICE RESOURCE EVENT GROUP TAGS STATUS START DURATION
51ca8a3b * Development Network * * * * active 2017/08/01 21:02:14 86400s
06beb220 * Development * * * Performance * active 2017/08/01 21:03:36 3600s
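In the listing above, the blackout created with --duration 86400 lasts 24 hours, while the one created without that option shows 3600s. The relationship between START, DURATION and STATUS can be sketched as follows. This is a rough illustration only: the blackout_status function is hypothetical, and the "pending" and "expired" status names are assumptions that do not appear in the output shown in this tutorial.

```python
from datetime import datetime, timedelta


def blackout_status(start: datetime, duration_s: int, now: datetime) -> str:
    """Classify a blackout window relative to `now` (status names are assumed)."""
    end = start + timedelta(seconds=duration_s)
    if now < start:
        return "pending"
    if now < end:
        return "active"
    return "expired"


# START of the Network blackout from the listing above.
start = datetime(2017, 8, 1, 21, 2, 14)

print(start + timedelta(seconds=86400))  # 2017-08-02 21:02:14
print(blackout_status(start, 86400, datetime(2017, 8, 2, 9, 0, 0)))  # active
print(blackout_status(start, 3600, datetime(2017, 8, 2, 9, 0, 0)))   # expired
```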
Step 3: Blackout by Event and/or Resource
It is possible to suppress alerts from a particular resource or for a
specific event (or, more specifically still, for a particular
resource-event combination).
$ alerta blackout -E Development --resource stl-cr-01 --event linkDown
3c31b062-e3f5-418a-93be-0b70ee593d58
$ alerta blackouts
ID CUSTOMER ENVIRONMENT SERVICE RESOURCE EVENT GROUP TAGS STATUS START DURATION
3c31b062 * Development * stl-cr-01 linkDown * * active 2017/08/01 21:18:59 3600s
Step 4: Blackout by Tag
When generic blackouts based on service or group, or specific blackouts
based on resource or event, don’t meet the requirements, it is possible to
define a blackout rule based on tags for maximum flexibility.
$ alerta blackout --environment Production --tag blackout
f4fc4ba5-a36f-4508-bd01-5550124ce26f
$ alerta send -r host05:/dev/disk1 -e FsInodeUtil -s major -E Production -S System \
-g OS -t '/dev/disk1 inode utilisation high.' --tag blackout
488ea442-73b6-4b28-bd3e-dd0ae281d094 (Suppressed alert during blackout period)
Tip
Add the “blackout” tag using a pre-receive hook to make alert suppression
dynamic, driven by a lookup table that could be managed externally to Alerta.
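The tagging logic such a hook might apply can be sketched as below. This is a minimal sketch, not a complete plugin: MAINTENANCE_RESOURCES and add_blackout_tag are hypothetical names, the lookup table is hard-coded here where a real one would be loaded from an external source, and the wiring into an Alerta plugin's pre-receive hook is not shown.

```python
from typing import Iterable, List, Set

# Hypothetical lookup table of resources currently under maintenance.
# In practice this might be read from a file, database or CMDB.
MAINTENANCE_RESOURCES: Set[str] = {"host05:/dev/disk1", "stl-cr-01"}


def add_blackout_tag(resource: str, tags: Iterable[str],
                     lookup: Set[str] = MAINTENANCE_RESOURCES) -> List[str]:
    """Return the alert's tags, adding "blackout" if the resource is in the lookup."""
    result = list(tags)
    if resource in lookup and "blackout" not in result:
        result.append("blackout")
    return result


print(add_blackout_tag("host05:/dev/disk1", ["linux"]))  # ['linux', 'blackout']
print(add_blackout_tag("host99:/dev/disk1", ["linux"]))  # ['linux']
```

Combined with the tag-based blackout from Step 4, only alerts for resources in the lookup table would be suppressed, and the table can change without touching the blackout rule.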
Step 5: Accept alerts during Blackout Periods
To avoid situations where a blackout rule prevents a normal or ok alert
from auto-closing an existing alert, it is possible to accept “clearing”
alerts that would otherwise have been suppressed.
Set the BLACKOUT_ACCEPT server configuration variable to the list of
allowable severities:
BLACKOUT_ACCEPT=['normal', 'ok', 'cleared']
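The effect of this setting can be pictured with the following sketch. It is an illustration of the decision described above, not the server's code; should_suppress is a hypothetical function name.

```python
# Same value as the configuration example above.
BLACKOUT_ACCEPT = ['normal', 'ok', 'cleared']


def should_suppress(severity: str, in_blackout: bool,
                    accept=BLACKOUT_ACCEPT) -> bool:
    """Suppress only alerts that match a blackout and are not an accepted severity."""
    return in_blackout and severity not in accept


print(should_suppress("major", True))   # True: suppressed as usual
print(should_suppress("normal", True))  # False: accepted so it can clear the alert
print(should_suppress("major", False))  # False: no blackout matched
```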
Step 6: Ending Blackout Periods
Delete blackout periods using the web UI. There is no support for deleting a
current, active blackout period using the alerta command-line tool. It is
possible to “purge” expired blackout periods:
$ alerta blackouts --purge
ID CUSTOMER ENVIRONMENT SERVICE RESOURCE EVENT GROUP TAGS STATUS START DURATION
f4fc4ba5 * Production * * * * blackout deleted 2017/08/01 17:35:38 3600s
Next Steps
Now that you understand alert blackouts, you might want to try some of the following tutorials: