Alert Manager Setup

This guide outlines the steps to set up Alertmanager to send alerts when a Tangle node or DKG is being disrupted. If you do not have a Tangle node set up yet, please review the Tangle Node Quickstart setup guide first.

In this guide we will configure Alertmanager to send alerts from a running Tangle node.

Prometheus evaluates alerting rules against its metrics and, as soon as a threshold is crossed (CPU % usage, for example), passes the alert to Alertmanager, which pushes out the notification.

What is Alert Manager?

The Alertmanager handles alerts sent by client applications such as the Prometheus server. It takes care of deduplicating, grouping, and routing them to the correct receiver integration such as email, PagerDuty, or OpsGenie. It also takes care of silencing and inhibition of alerts. To learn more about Alertmanager, please visit the official docs site.

Getting Started

Start by downloading the latest release of Alertmanager.

This guide assumes that the user has root access to the machine running the Tangle node and follows the steps below on that machine. It also assumes that Prometheus is already configured on this machine.

1. Download Alertmanager

AMD64 (x86_64) version:

wget https://github.com/prometheus/alertmanager/releases/download/v0.24.0/alertmanager-0.24.0.linux-amd64.tar.gz

ARM64 version:

wget https://github.com/prometheus/alertmanager/releases/download/v0.24.0/alertmanager-0.24.0.linux-arm64.tar.gz
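As a minimal sketch, the choice between the two archives can be automated by mapping the output of uname -m to the architecture suffix of the release. The function name alertmanager_url and the AM_VERSION variable are hypothetical helpers, not part of Alertmanager, and the sketch assumes a Linux host, since the later steps (useradd, systemd) are Linux-specific.

```shell
#!/bin/sh
# Hypothetical helper: print the Alertmanager download URL for a machine type.
AM_VERSION="0.24.0"   # version used throughout this guide

alertmanager_url() {
  # $1: machine type as reported by `uname -m` (defaults to this machine)
  machine="${1:-$(uname -m)}"
  case "$machine" in
    x86_64|amd64)  arch="amd64" ;;
    aarch64|arm64) arch="arm64" ;;
    *) echo "unsupported architecture: $machine" >&2; return 1 ;;
  esac
  echo "https://github.com/prometheus/alertmanager/releases/download/v${AM_VERSION}/alertmanager-${AM_VERSION}.linux-${arch}.tar.gz"
}

# Usage: wget "$(alertmanager_url)"
```

Any machine type other than the two listed makes the function fail instead of guessing, so an unexpected platform is noticed before anything is downloaded.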

2. Extract the Downloaded Files:

Run the following command:

tar xvf alertmanager-*.tar.gz

3. Copy the Extracted Files into /usr/local/bin:

Note: The example below uses the linux-amd64 build. Adjust the directory name to match the build you downloaded (for example, linux-arm64).

Copy the alertmanager binary and amtool:

sudo cp ./alertmanager-*.linux-amd64/alertmanager /usr/local/bin/ &&
sudo cp ./alertmanager-*.linux-amd64/amtool /usr/local/bin/

4. Create a Dedicated User:

Now we want to create a dedicated user for the Alertmanager module we have installed:

sudo useradd --no-create-home --shell /usr/sbin/nologin alertmanager

5. Create Directories for Alertmanager:

sudo mkdir /etc/alertmanager &&
sudo mkdir /var/lib/alertmanager

6. Change the Ownership for all Directories:

We need to give our alertmanager user ownership of these directories and of the installed binaries:

sudo chown alertmanager:alertmanager /etc/alertmanager/ -R &&
sudo chown alertmanager:alertmanager /var/lib/alertmanager/ -R &&
sudo chown alertmanager:alertmanager /usr/local/bin/alertmanager &&
sudo chown alertmanager:alertmanager /usr/local/bin/amtool

7. Finally, clean up the downloaded archive and the extracted directory:

rm -rf ./alertmanager*

Great! You have now installed Alertmanager and set up your environment. The next series of steps configures the service.

Configuration

For implementation examples, refer to our GitHub.

Prometheus

The first thing we need to do is add a rules.yml file to our Prometheus configuration.

Let’s create the rules.yml file that defines the alerting rules Prometheus evaluates:

sudo touch /etc/prometheus/rules.yml
sudo nano /etc/prometheus/rules.yml

We are going to create two basic rules that trigger an alert if the instance is down or if CPU usage crosses 80%. You can create many other kinds of rules; refer to our full list.

Add the following lines and save the file:

groups:
  - name: alert_rules
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} down"
          description: "[{{ $labels.instance }}] of job [{{ $labels.job }}] has been down for more than 5 minutes."
 
      - alert: HostHighCpuLoad
        expr: 100 - (avg by(instance)(rate(node_cpu_seconds_total{mode="idle"}[2m])) * 100) > 80
        for: 0m
        labels:
          severity: warning
        annotations:
          summary: "Host high CPU load (instance {{ $labels.instance }})"
          description: "CPU load is > 80%\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"

The criteria for triggering an alert are set in the expr: part. You can customize these triggers as you see fit.
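To make the arithmetic in the HostHighCpuLoad expression concrete, here is a minimal sketch that reproduces it outside Prometheus. The helpers cpu_usage_pct and would_fire are hypothetical, and the idle values are made-up inputs standing in for the averaged rate(node_cpu_seconds_total{mode="idle"}[2m]) result.

```shell
#!/bin/sh
# Hypothetical helpers mirroring the arithmetic of the HostHighCpuLoad rule:
#   usage % = 100 - (average idle rate * 100)
cpu_usage_pct() {
  # $1: average idle rate across cores, a fraction between 0 and 1
  awk -v idle="$1" 'BEGIN { printf "%.1f\n", 100 - idle * 100 }'
}

would_fire() {
  # Succeeds when the computed usage is above the threshold ($2, default 80)
  awk -v idle="$1" -v limit="${2:-80}" 'BEGIN { exit !((100 - idle * 100) > limit) }'
}

cpu_usage_pct 0.25                      # cores idle a quarter of the time
would_fire 0.10 && echo "alert fires"   # 90% usage is above the threshold
would_fire 0.40 || echo "no alert"      # 60% usage is below the threshold
```

Because the rule sets for: 0m, the alert fires on the first evaluation where the expression holds; raising that value would require the condition to hold for the whole duration first.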

Then, check the rules file:

promtool check rules /etc/prometheus/rules.yml

And finally, check the Prometheus config file:

promtool check config /etc/prometheus/prometheus.yml
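For the rules to be evaluated and for alerts to reach Alertmanager, prometheus.yml has to reference the rules file under rule_files and list Alertmanager under alerting. The sketch below is a rough grep-based check rather than a YAML parser. The function name check_prometheus_wiring is hypothetical, and the target localhost:9093 is an assumption that Alertmanager runs on the same machine on its default port.

```shell
#!/bin/sh
# The fragment prometheus.yml is expected to contain (assumed values):
#
#   rule_files:
#     - "rules.yml"
#
#   alerting:
#     alertmanagers:
#       - static_configs:
#           - targets:
#               - localhost:9093

# Hypothetical helper: report the first missing piece of the wiring.
check_prometheus_wiring() {
  cfg="$1"
  grep -q '^rule_files:' "$cfg"   || { echo "no rule_files section in $cfg"; return 1; }
  grep -q 'rules\.yml' "$cfg"     || { echo "rules.yml is not referenced in $cfg"; return 1; }
  grep -q '^alerting:' "$cfg"     || { echo "no alerting section in $cfg"; return 1; }
  grep -q 'localhost:9093' "$cfg" || { echo "no Alertmanager target in $cfg"; return 1; }
  echo "prometheus.yml references rules.yml and Alertmanager"
}

# Usage: check_prometheus_wiring /etc/prometheus/prometheus.yml
```

After editing prometheus.yml, Prometheus needs to be restarted or reloaded before the new rules take effect.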

Gmail setup

We can use a Gmail address to send the alert emails. For that, we need to generate an app password from our Gmail account.

Note: we recommend using a dedicated email address for your alerts. Review Google's own guide for proper setup.

Slack notifications

We can also send the alerts as Slack notifications. For that, we need a specific Slack channel to send the notifications to, and we need to install the Incoming WebHooks Slack application.

To do so:

Navigate to Administration > Manage Apps.
Search for "Incoming Webhooks".
Install it into your Slack workspace.
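Before wiring the webhook into Alertmanager, it can help to confirm that it accepts messages at all. The sketch below posts a one-line JSON body with curl. The helper names slack_payload and send_slack_test are hypothetical, and the URL in the usage line is a placeholder for the webhook URL that Slack shows after the app is installed.

```shell
#!/bin/sh
# Hypothetical helpers for smoke-testing an incoming webhook.
slack_payload() {
  # Builds a minimal JSON body; escapes backslashes and double quotes only.
  text=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '{"text":"%s"}\n' "$text"
}

send_slack_test() {
  # $1: webhook URL (placeholder in the usage line), $2: optional message
  curl -sS -X POST -H 'Content-type: application/json' \
    --data "$(slack_payload "${2:-Test message from the Tangle node}")" "$1"
}

# Usage: send_slack_test 'INSERT SLACK API URL'
```

If the message shows up in the channel, the same URL can be used as slack_api_url in the Alertmanager configuration.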

Alertmanager

The Alertmanager config file sets the external service that is called when an alert is triggered. Here, we are going to use the Gmail and Slack notifications set up previously.

Let’s create the file:

sudo touch /etc/alertmanager/alertmanager.yml
sudo nano /etc/alertmanager/alertmanager.yml

Add the Gmail configuration to it and save the file:

global:
 resolve_timeout: 1m
 
route:
 receiver: 'gmail-notifications'
 
receivers:
- name: 'gmail-notifications'
  email_configs:
  - to: 'EMAIL-ADDRESS'
    from: 'EMAIL-ADDRESS'
    smarthost: 'smtp.gmail.com:587'
    auth_username: 'EMAIL-ADDRESS'
    auth_identity: 'EMAIL-ADDRESS'
    auth_password: 'APP-PASSWORD'
    send_resolved: true
 
 
# ********************************************************************************************************************************************
# Alternative: Alert Manager for Slack Notifications
# Use this configuration instead of the Gmail one above. A single
# alertmanager.yml should contain only one global, route, and receivers section.
# ********************************************************************************************************************************************

global:
  resolve_timeout: 1m
  slack_api_url: 'INSERT SLACK API URL'

route:
  receiver: 'slack-notifications'

receivers:
- name: 'slack-notifications'
  slack_configs:
  - channel: 'channel-name'
    send_resolved: true
    icon_url: https://avatars3.githubusercontent.com/u/3380462
    title: |-
     [{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}] {{ .CommonLabels.alertname }} for {{ .CommonLabels.job }}
     {{- if gt (len .CommonLabels) (len .GroupLabels) -}}
       {{" "}}(
       {{- with .CommonLabels.Remove .GroupLabels.Names }}
         {{- range $index, $label := .SortedPairs -}}
           {{ if $index }}, {{ end }}
           {{- $label.Name }}="{{ $label.Value -}}"
         {{- end }}
       {{- end -}}
       )
     {{- end }}
    text: >-
     {{ range .Alerts -}}
     *Alert:* {{ .Annotations.summary }}{{ if .Labels.severity }} - `{{ .Labels.severity }}`{{ end }}
     *Description:* {{ .Annotations.description }}
     *Details:*
       {{ range .Labels.SortedPairs }} • *{{ .Name }}:* `{{ .Value }}`
       {{ end }}
     {{ end }}

Of course, you have to replace the email addresses with your own and set auth_password to the app password generated from Google previously.
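Since a YAML mistake in this file will keep the service from starting, it is worth validating it first. amtool, installed alongside the alertmanager binary in the earlier steps, provides a check-config command for this. The wrapper name validate_alertmanager_config below is hypothetical; it only adds a clearer message when amtool is not on the PATH.

```shell
#!/bin/sh
# Hypothetical wrapper around `amtool check-config`.
validate_alertmanager_config() {
  cfg="${1:-/etc/alertmanager/alertmanager.yml}"
  if ! command -v amtool >/dev/null 2>&1; then
    echo "amtool not found in PATH" >&2
    return 2
  fi
  # Parses the file and reports syntax or schema errors
  amtool check-config "$cfg"
}

# Usage: validate_alertmanager_config /etc/alertmanager/alertmanager.yml
```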

Service Setup

Alertmanager

Create the Alertmanager service file:

sudo tee /etc/systemd/system/alertmanager.service > /dev/null << EOF
[Unit]
  Description=AlertManager Server Service
  Wants=network-online.target
  After=network-online.target
 
[Service]
  User=alertmanager
  Group=alertmanager
  Type=simple
  ExecStart=/usr/local/bin/alertmanager \
   --config.file /etc/alertmanager/alertmanager.yml \
   --storage.path /var/lib/alertmanager \
   --web.external-url=http://localhost:9093 \
   --cluster.advertise-address='0.0.0.0:9093'
 
[Install]
WantedBy=multi-user.target
EOF

Starting the Services

Reload the systemd daemon so that it picks up the new service:

sudo systemctl daemon-reload

Next, start the Alertmanager service:

sudo systemctl start alertmanager.service

And check that it is working:

sudo systemctl status alertmanager.service

If everything is working, enable the service so that it starts on boot:

sudo systemctl enable alertmanager.service
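Beyond systemctl status, Alertmanager exposes a /-/healthy HTTP endpoint that can be polled to confirm that the process is actually serving requests. The sketch below is one way to do that with curl. The function name wait_for_alertmanager and its retry parameters are hypothetical, and the default URL assumes the address given in --web.external-url above.

```shell
#!/bin/sh
# Hypothetical helper: poll Alertmanager's health endpoint until it answers.
wait_for_alertmanager() {
  url="${1:-http://localhost:9093}"   # assumed to match --web.external-url
  attempts="${2:-5}"                  # how many times to try
  delay="${3:-2}"                     # seconds to wait between tries
  i=1
  while [ "$i" -le "$attempts" ]; do
    if curl -sf "$url/-/healthy" >/dev/null 2>&1; then
      echo "Alertmanager is healthy at $url"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "Alertmanager did not respond at $url" >&2
  return 1
}

# Usage: wait_for_alertmanager http://localhost:9093
```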

Amazing! We have now successfully added alert monitoring for our Tangle node!
