Monit alert system — config examples

Mike R
Aug 21, 2022


Monit (from Tildeslash) is an excellent alerting and monitoring system which I have been using for a few years to monitor my fleet of servers.

There are several products on the market that offer similar services (Nagios, Prometheus, Zabbix, etc.), but the reason I use Monit over the others is its sheer simplicity and its flexibility for writing custom alerting rules.

Other solutions do the job well, but they are absolute behemoths to stand up and manage, and it takes time to understand all the moving parts. Nagios is a big dog among alerting frameworks, and it took me quite a while to grok its structure and how it works. Compared to these, Monit takes about an hour to learn and deploy.

Additionally, being written in C, Monit is very light on your host's resources. I've never seen the Monit agent use more than 1 or 2% of a host's CPU or memory.

While it is simple to spin up, manage and control, Monit is also very flexible: it can watch hosts, filesystems, files, processes and the output of your own scripts. This guide shows some common use cases for Monit and breaks down the syntax so you can write your own Monit rule files to monitor various system and infrastructure events.

My only beef with Monit is its confusing naming. It's hard to find information by searching the web for "monit", as the term trips up the search algorithms, so a more unique name would help. Additionally, the monitoring server is called M/Monit while the reporting agent is called Monit, which is, again, confusing.

I won't go into the installation details; the official docs cover how to install both the Monit agent and the M/Monit server.

This guide will focus on writing Monit rules and rule syntax.

M/Monit vs Monit

To clear things up: Tildeslash, the makers of Monit (whose source code is publicly available), have two offerings: the free, open source Monit (community) and the M/Monit console, which is the paid version.

If you are managing more than a handful of servers, I suggest you get the paid M/Monit server. It's relatively cheap and is a one-time fee.

The server makes it easy to see all your agents visually, control alerting via groups, add things like Slack notifications, and view historical trends (CPU, memory usage, etc.).

The M/Monit console runs on a server and listens for events from the agents. Each agent has its own rule file, which it reads and acts on when an event occurs, sending the resulting alerts over to the M/Monit server.

Figure: a simple Monit alerting setup

M/Monit (which I will refer to as the Monit Server) can use a variety of databases to store its agent data, from SQLite (the default) to Postgres. For a large set of servers, anything over 30, use a production-ready DB like Postgres or MariaDB.

Sending basic metrics from Agent to Server (CPU and Memory)

Once your Server is up and running, it will listen on port 19840 (you can change this in the Monit server config). Your agents will send their alert and metric data over TCP to this port.

Assuming your server is up and your agents are sending their data to it, let's configure a basic CPU and memory alert for Host 1.

This is a basic Monit agent rule file, assuming you have already installed a Monit agent on the host:

### symlink in /home/monit for .monitrc > /etc/monit/monit.conf
root@host1> ls -la /home/monit
-rw-rw-r--. 1 monit 1002 32 Aug 26 2019 .monit.id
-rw-r--r--. 1 monit monit 6 Aug 6 03:38 .monit.pid
lrwxrwxrwx. 1 root root 21 Aug 26 2019 .monitrc -> /etc/monit/monit.conf
root@host1> cat /etc/monit/monit.conf
-------------------------------------------
set daemon 5 # Poll at 5-second intervals
set logfile /var/log/monit.log
set eventqueue basedir /home/monit/tmp slots 1000
set mmonit http://monit:monit@server1:19840/collector
set httpd port 19841
allow localhost
allow server1
allow 127.0.0.1
allow monit:monit
check system host1
if memory > 75% then alert
if memory > 80% then alert
if memory > 90% then exec "/etc/monit/scripts/top.py mem"
if memory > 95% then alert

On the first line I set the poll interval, which Monit calls a "cycle": Monit will poll for system status every 5 seconds, so one cycle is 5 seconds.

Line 2 is the log file location.

set eventqueue basedir /home/monit/tmp slots 1000

The set eventqueue statement on line 3 is optional, but recommended. It allows Monit to store event messages if the connection to the Server is temporarily unavailable and to retry delivery later. This way, no events are lost. The slots option sets a limit on how many events can be stored, so the queue will not grow without bound if the Server (M/Monit) is not available.

The size of a queued message is small (ca. 200 bytes), so the space required for, let's say, 1000 queued events is only about 200 kB.

set mmonit http://monit:monit@server1:19840/collector

Here the agent is told where to send its metrics and alerts, using the default monit:monit username and password (you can use any valid M/Monit user for this).

set httpd port 19841
allow localhost
allow server1
allow 127.0.0.1
allow monit:monit

This tells the agent to run its HTTP service on port 19841 and allows the Server to talk to it using the monit:monit username and password (a TLS option is also available).

The next block contains the actual alerting rules. Here I am checking the host for memory usage:

check system host1
if memory > 75% then alert
if memory > 80% then alert
if memory > 90% then exec "/etc/monit/scripts/top.py mem"
if memory > 95% then alert

This will alert each time memory usage rises above one of these thresholds.
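A single short spike is enough to trip rules like these. If that turns out to be noisy, one option is to require the condition to hold for several cycles before acting. The following is a minimal sketch that assumes the same 5-second cycle; the cycle counts are illustrative assumptions, and the block is meant to replace the `check system host1` rules above rather than sit next to them, since Monit expects each check name to be unique.

```
check system host1
    # alert only if memory stays above 80% for 12 cycles in a row (12 x 5s = 60s)
    if memory usage > 80% for 12 cycles then alert
    # run the snapshot script only after 5 cycles in a row above 90% (25s)
    if memory usage > 90% for 5 cycles then exec "/etc/monit/scripts/top.py mem"
```

The trade-off is latency: a longer `for N cycles` window filters out brief spikes but delays the first alert by N times the poll interval.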

If memory usage is above 90%, the Monit agent will execute a custom script, top.py, which emails me a memory usage snapshot broken down by process. For example, when one of my servers hit 90% memory usage, Monit ran top.py and emailed me the snapshot.

To check the syntax of your Monit agent config file, run:

monit -t -c monit.conf

What I love about Monit is its flexibility: I can always deploy custom scripts to do additional checking if something is not available via the agent itself.

You can see all my custom monit alert scripts here:
https://github.com/perfecto25/monit-scripts

Monit rule examples

Here are additional blocks of rules that you can add on each host.

CPU usage

check system host1
if loadavg (5min) > 75 for 2 times within 64 cycles then exec "/etc/monit/scripts/top.py cpu"

This tells the agent to fire off the custom top.py script if the 5-minute load average is over 75 in at least 2 of the last 64 cycles (64 * 5 seconds (poll interval) = 320 seconds), so the condition is evaluated over a window of roughly 5 minutes. Keep in mind that load average is not a percentage, so pick a threshold that suits the number of cores on your host.

With this rule I get a CPU snapshot in my Slack channel and email inbox.

You can also just make it alert you (send an email or Slack message with an alert):

check system host1
if loadavg (5min) > 75 for 2 times within 64 cycles then alert
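Because load average depends on core count, a percentage-based CPU test can be easier to reason about. The following is a minimal sketch using Monit's `cpu usage` test for the system check; the thresholds and cycle counts are illustrative assumptions, and the block would replace the `check system host1` block above rather than be added alongside it.

```
check system host1
    # total CPU above 90% for 12 cycles in a row (12 x 5s = 60s)
    if cpu usage > 90% for 12 cycles then alert
    # keep a load average rule as a second signal; 8 is an assumed value
    # for a host with a handful of cores
    if loadavg (1min) > 8 for 6 cycles then alert
```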

Disk space usage

To check disk space, add a new block:

check filesystem root-/ with path /
if space usage > 80% then alert
if space usage > 85% then alert
if space usage > 90% then alert
if space usage > 95% then alert
check filesystem host1-/home with path /home
if space usage > 80% then alert
if space usage > 85% then alert
if space usage > 90% then alert
if space usage > 95% then alert
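A filesystem can also run out of inodes while it still has free space, so it may be worth adding an inode test to the same kind of block. The following is a minimal sketch; the check name `data-fs`, the path `/data` and the thresholds are hypothetical and only illustrate the syntax.

```
check filesystem data-fs with path /data
    # space test, damped so a brief burst of writes does not alert
    if space usage > 90% for 5 cycles then alert
    # inode exhaustion (many small files) is tested separately
    if inode usage > 90% then alert
```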

Check file content

If a file receives a string or message that needs an alert, simply add a search string:

check file sfptpd.log with path /var/log/sfptpd.log
if content = "error" then alert
if content = "failed to receive Announce" then alert
check file oom-killer with path /var/log/messages
if content = "Out of memory" then alert
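A broad pattern such as "error" can match lines you do not care about. Monit's `ignore content` statement lets you skip those lines before the `if content` tests are evaluated. The following is a minimal sketch; the check name, log path and patterns are hypothetical, and it assumes your Monit build has regular expression support, which is what makes the `|` alternation work.

```
check file app-log with path /var/log/app.log
    # lines matching an ignore pattern are never tested by the rules below
    ignore content = "DEBUG"
    # the pattern is a regular expression, so one rule can cover several strings
    if content = "ERROR|FATAL" then alert
```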

Check process state

You can monitor a process in any number of ways:

## using a port
check host openvpn-service with address 0.0.0.0
if failed port 1194 type udp then alert
check process postgres matching "/usr/pgsql-14/bin/postmaster -D /var/lib/pgsql/14/data/"
if failed port 5432 then alert
## using a pid file (can also start/stop)
check process apache with pidfile /var/run/httpd.pid
start program = "/etc/init.d/httpd start" with timeout 60 seconds
stop program = "/etc/init.d/httpd stop"
if failed port 80 for 2 cycles then restart
if failed port 443 for 2 cycles then restart
## using name matching
check process stunnel matching "/usr/bin/stunnel /etc/stunnel/stunnel.conf"
## using name matching with wildcard
check process sshuttle matching "/usr/bin/python /usr/share/sshuttle/main.py /usr/bin/python*"
## use regex to capture all procs with "myproc xxxx run" wildcard
## (every check needs a unique name, here "myproc")
check process myproc matching "/home/user/bin/myproc ........ run"
## using a script output value
check program mail-queue path "/etc/monit/scripts/mailq.sh"
if status != 0 within 2 cycles then alert
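Beyond "is it running", a process check can also watch the resources a process uses and stop Monit from restarting it in a loop. The following is a minimal sketch; the service name `myapp`, the pidfile, the systemctl paths and all thresholds are hypothetical assumptions, not taken from a real host.

```
check process myapp with pidfile /var/run/myapp.pid
    start program = "/usr/bin/systemctl start myapp"
    stop program = "/usr/bin/systemctl stop myapp"
    # sustained CPU use by this process
    if cpu > 80% for 5 cycles then alert
    # memory used by the process and its children; restart if it keeps growing
    if totalmem > 500 MB for 5 cycles then restart
    # give up after repeated restarts instead of flapping forever
    if 3 restarts within 5 cycles then unmonitor
```

After an `unmonitor`, the check stays disabled until you re-enable it (for example with `monit monitor myapp`), which is usually what you want for a service that cannot stay up.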

Include additional rules

If you have many rules, you can classify them and break them down by filename, e.g. diskcheck, cpu, procs, etc.

In your monit.conf file, include any additional rule sets:

include /etc/monit/rules/*.conf
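As a rough illustration of how this split can look, the sketch below keeps the global settings in monit.conf and moves one group of checks into its own file. The file name `disk.conf` and the check in it are hypothetical; only the `include` line comes from the setup above.

```
# /etc/monit/monit.conf: global settings stay here
set daemon 5
set logfile /var/log/monit.log
include /etc/monit/rules/*.conf

# /etc/monit/rules/disk.conf: one topic per file (hypothetical file name)
check filesystem home-fs with path /home
    if space usage > 90% then alert
```

Check names must still be unique across all included files, so a naming scheme per file helps avoid collisions.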

More samples and scripts here https://github.com/perfecto25/monit-scripts
