DRAFT: Homelab monitoring alternatives
Homelab monitoring
I currently use Icinga2 (with Icinga Director) to monitor internal homelab machines and services, as well as external VPSes and external services.
This has worked reasonably well, but as I’ve recently been migrating to a new HashiCorp cluster, I’ve realised how manual the configuration is.
I want to find a new solution that can perform this monitoring, but that can be configured more dynamically.
Requirements
- Be fully open source
- Be configurable with Terraform
- Preferably light per-host configuration
- Preferably supports auto-discovery of services, though using host-tags would be acceptable
- Easy-to-install agent (no replicating check scripts to each host etc.)
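To make the host-tags fallback concrete, here is a minimal sketch of deriving a host's checks purely from its tags, so that per-host configuration stays light. The tag names, check names and the `checks_for` helper are all hypothetical and are not taken from any of the tools explored below.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from a host tag to the checks that tag implies.
# Tag and check names are illustrative only, not from any real tool.
TAG_CHECKS: dict[str, list[str]] = {
    "linux": ["load", "disk", "memory"],
    "web": ["http", "tls-expiry"],
    "nomad-client": ["nomad-agent"],
}


@dataclass
class Host:
    name: str
    tags: list[str] = field(default_factory=list)


def checks_for(host: Host) -> list[str]:
    """Return the de-duplicated checks implied by the host's tags, in tag order."""
    seen: dict[str, None] = {}  # dict preserves insertion order
    for tag in host.tags:
        # Unknown tags simply contribute no checks.
        for check in TAG_CHECKS.get(tag, []):
            seen.setdefault(check, None)
    return list(seen)


if __name__ == "__main__":
    vps = Host("vps-1", ["linux", "web"])
    print(vps.name, checks_for(vps))
```

With something like this, adding a new machine would only mean declaring its name and tags; the checks follow from the shared mapping rather than from per-host configuration.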
Explorations
Checkmk
I saw Checkmk and immediately drew comparisons to Datadog.


For reference, Datadog has great integration with AWS. It auto-discovers running applications and adds checks.
Checkmk, whilst looking like Datadog, does meet some of my criteria:
- The agent is light - a simple package install
- It auto-discovers running services
- It has an API and there is a third-party Terraform provider
However, there’s a lot of negativity about it (https://www.reddit.com/r/devops/comments/cu80wj/anyone_used_checkmk_monitoring_system_looking_for/).
The interface is indeed filled with iframes. The agent-creation process on the page is not very straightforward, at least from an automation perspective: it is a wizard where you add the host, then click discover, then select the services to monitor, and so on.
It is open source, but there are paid features that unlock, among other things, dashboards (quite a basic necessity, I would have thought!)
Nagios XI
Getting Nagios XI installed wasn’t the best experience. Their official docs (aka a random PDF that doesn’t even talk about the latest version) led to another document, which led to a failed install with a missing /usr/local/nagios-xi directory. I ended up using a random third-party Docker image, which seemed to work.
Nagios XI is a little different from the original Nagios Core (which I remember “fondly” from years past). It has an API for creating hosts, for which there appears to be a Terraform provider: https://devopsdunkin.github.io/terraform-provider-nagios/
It does have host discovery, but this results in a page where you select which hosts to add and which discovered services to enable. Installing the Nagios NRPE client straight from their site resulted in an installation that had to compile the libraries, and it failed:
cd ./src/; make
make[1]: Entering directory '/linux-nrpe-agent/subcomponents/nrpe/nrpe-4.1.0/src'
gcc -g -O2 -I/usr/include/openssl -DHAVE_CONFIG_H -I ../include -I ./../include -o nrpe ./nrpe.c ./utils.c ./acl.c -L/usr/lib/x86_64-linux-gnu/ -lssl -lcrypto -lnsl
./nrpe.c:45:12: fatal error: ../include/dh.h: No such file or directory
45 | # include "../include/dh.h"
| ^~~~~~~~~~~~~~~~~
compilation terminated.
make[1]: *** [Makefile:48: nrpe] Error 1
make[1]: Leaving directory '/linux-nrpe-agent/subcomponents/nrpe/nrpe-4.1.0/src'
make: *** [Makefile:65: all] Error 2
cd ./src/; make install
make[1]: Entering directory '/linux-nrpe-agent/subcomponents/nrpe/nrpe-4.1.0/src'
make install-plugin
make[2]: Entering directory '/linux-nrpe-agent/subcomponents/nrpe/nrpe-4.1.0/src'
/usr/bin/install -c -m 755 -d /usr/local/nagios/bin
/usr/bin/install -c -m 755 ../uninstall /usr/local/nagios/bin/nrpe-uninstall
/usr/bin/install -c -m 775 -o nagios -g nagios -d /usr/local/nagios/libexec
/usr/bin/install -c -m 775 -o nagios -g nagios -d /usr/local/nagios/libexec
/usr/bin/install -c -m 775 -o nagios -g nagios check_nrpe /usr/local/nagios/libexec
/usr/bin/install: cannot stat 'check_nrpe': No such file or directory
make[2]: *** [Makefile:60: install-plugin] Error 1
make[2]: Leaving directory '/linux-nrpe-agent/subcomponents/nrpe/nrpe-4.1.0/src'
make[1]: *** [Makefile:54: install] Error 2
make[1]: Leaving directory '/linux-nrpe-agent/subcomponents/nrpe/nrpe-4.1.0/src'
make: *** [Makefile:89: install] Error 2
NRPE-POST
WONDERFUL :D
Nagios XI also requires licensing, which doesn’t make it a straightforward answer.
ZenOSS
Whilst waiting for the 4GB ISO to download from their main mirror (SourceForge) at 200KB/s (cry), I decided to find some information on the internet to get a feel for what it’s like (their website is very “markety” and doesn’t give much away). Most of the videos that I found were around 15 years old and showed a typical 2005-era static monitoring solution, completely geared around SNMP. Let’s hope the newer version has some surprises!