With Kibana’s basic license you can only use the Index and Server log connectors for alert monitors; other connectors, such as Email, require a Gold license.
As a workaround, you can send emails from the Server log connector using swatchdog.
In Kibana, I added an alert monitor with condition:
WHEN Max OF system.filesystem.used.pct IS ABOVE 95%
FOR THE LAST 1 minute
and a default server log action message:
{{alertName}} - {{context.group}} is in a state of {{context.alertState}}
Reason:
{{context.reason}}
When this alert is triggered, Kibana logs a message like the following to /var/log/kibana/kibana.log:
{"type":"log","@timestamp":"2022-05-31T16:12:49+01:00","tags":["info","plugins","actions"],"pid":33378,"message":"Server log: Disk Usage Monitor - my.host.name is in a state of ALERT;;Reason:;system.filesystem.used.pct is greater than a threshold of 95% (current value is 96.9%) for my.host.name;"}
I then installed the swatchdog package and added the following to /root/.swatchdogrc:
watchfor /Server log/
exec /root/scripts/swatchdog-notify.sh '$_'
swatchdog matches any line that Kibana writes containing Server log, and passes the entire line to my script in the $_ variable.
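Before relying on the rule, it can help to check how many lines the pattern would select. The following is a rough sketch that uses grep as a stand-in for swatchdog's own matching (for a plain literal like this the two should agree). The second sample line is made up for illustration, and `count_matches` is a hypothetical helper name.

```shell
#!/bin/bash
# The alert line from the Kibana log above (message shortened), plus a
# hypothetical non-alert line that only shows what the pattern skips.
alert='{"type":"log","@timestamp":"2022-05-31T16:12:49+01:00","tags":["info","plugins","actions"],"pid":33378,"message":"Server log: Disk Usage Monitor - my.host.name is in a state of ALERT;"}'
other='{"type":"log","@timestamp":"2022-05-31T16:13:00+01:00","tags":["info","http"],"pid":33378,"message":"hypothetical unrelated message"}'

# Count the lines that a /Server log/ pattern would select.
count_matches() {
    printf '%s\n' "$@" | grep -c 'Server log'
}

count_matches "$alert" "$other"
```

Running `grep -c 'Server log' /var/log/kibana/kibana.log` against the real log gives the same kind of count, which is a quick way to see whether anything other than alerts would trigger an email.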
In /root/scripts/swatchdog-notify.sh, I have a basic email template:
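#!/bin/bash
# $1 is the full Kibana log line passed in by swatchdog.
# The blank line after the headers separates them from the message body.
sendmail -t <<EOF
To: [email protected]
Subject: Kibana Alert
From: [email protected]

Kibana has flagged the following:
$(echo "$1" | tr '"' '\n' | sed -n '8p;22p')
EOF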
I’m converting the " separators from the Kibana log into new lines, then printing just lines 8 and 22, which are the timestamp and the alert message respectively.
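To see where the numbers 8 and 22 come from, the pipeline can be run by hand on a sample line. This is a minimal sketch using the log line shown earlier with the message shortened. It also includes an alternative that keys off the field names rather than positions, which I'd expect to hold up better if the field order ever changes. Both `extract_fields` and `extract_by_name` are hypothetical helper names, and the alternative assumes the message contains no escaped double quotes.

```shell
#!/bin/bash
# Sample line copied from the Kibana log above (message shortened here).
line='{"type":"log","@timestamp":"2022-05-31T16:12:49+01:00","tags":["info","plugins","actions"],"pid":33378,"message":"Server log: Disk Usage Monitor - my.host.name is in a state of ALERT;"}'

# Put every double-quote-delimited piece on its own numbered line, so the
# positions of the timestamp (8) and the message (22) are visible.
printf '%s\n' "$line" | tr '"' '\n' | cat -n

# Positional approach: print only pieces 8 and 22.
extract_fields() {
    printf '%s\n' "$1" | tr '"' '\n' | sed -n '8p;22p'
}

# Alternative sketch: capture the values by field name instead.
# Assumes neither value contains an escaped double quote.
extract_by_name() {
    printf '%s\n' "$1" | sed -n 's/.*"@timestamp":"\([^"]*\)".*/\1/p'
    printf '%s\n' "$1" | sed -n 's/.*"message":"\([^"]*\)".*/\1/p'
}

extract_fields "$line"
extract_by_name "$line"
```

Both functions print the timestamp followed by the message for this sample. The positional version is shorter, while the by-name version does not depend on how many fields precede the message.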
I run swatchdog via a systemd unit, in /etc/systemd/system/swatchdog.service:
[Unit]
Description=Swatchdog Service
After=network.target
[Service]
Type=forking
User=root
ExecStart=/usr/bin/swatchdog --daemon -c /root/.swatchdogrc -t '/var/log/kibana/kibana.log'
[Install]
WantedBy=multi-user.target
When the alert triggers in Kibana, I receive an email containing:
Kibana has flagged the following:
2022-05-31T16:12:49+01:00
Server log: Disk Usage Monitor - my.host.name is in a state of ALERT;;Reason:;system.filesystem.used.pct is greater than a threshold of 95% (current value is 96.9%) for my.host.name;
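The message arrives on one line, with ; where the action template had line breaks. As an optional tidy-up, here is a minimal sketch that splits it back into separate lines. It assumes every ; in the logged message stands for a line break, so an alert name containing a literal ; would be split as well. `format_message` is a hypothetical helper name.

```shell
#!/bin/bash
# Assumption: each ';' in the logged message stands for a line break from
# the action template. Empty lines produced by ';;' are dropped.
format_message() {
    printf '%s\n' "$1" | tr ';' '\n' | sed '/^$/d'
}

format_message 'Server log: Disk Usage Monitor - my.host.name is in a state of ALERT;;Reason:;system.filesystem.used.pct is greater than a threshold of 95% (current value is 96.9%) for my.host.name;'
```

In the notify script, the same `tr ';' '\n' | sed '/^$/d'` stage could be appended to the existing pipeline to get the alert state, the Reason: label, and the reason text on separate lines.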