Occasional blog posts from a random systems engineer

Systems Engineering an agent thanks to the world of Golang

· Read in about 5 min · (1006 Words)

My homelab and work life have followed very different trajectories, though they often influence one another. I like to try out interesting thought experiments at home and see how they work out, to determine whether they’re worth investing in.

Right now, my homelab consists of a load of VMs, and my initial goal was to find a new way of monitoring them (often these small tasks lead to a big spray of different tasks).

I had decided to ditch Icinga, ditch custom monitoring solutions and unify on node_exporter. The reason isn’t that it’s better, and the actual reason isn’t too important for this post; what it entails to get there, however, is what I want to talk about. My existing Prometheus-esque stack used Consul service discovery heavily to find node exporters, and I liked it. The problem was that most of my VMs weren’t integrated with Consul (until now, it was only used in a specific HashiCorp part of my stack).

So, my standard approach to this had been very static: deploy consul-template, which bootstrapped a Vault agent, which in turn allowed another consul-template to bootstrap a Consul client.

Whilst deploying this iteration of the stack, I had moved heavily to Docker — and this isn’t “Docker for deploying applications”, this is using Docker to deploy everything after the initial cloud-init bootstrap of a VM. However, the setup was pretty static and quite cumbersome — tonnes of roles, policies and more roles and more policies (completely isolated for each application: consul-template, Vault agent, etc.). Pre-provisioning for one of these VMs was literally hundreds of Terraform resources and, whilst that was fine for that part of the stack, for a simple monitoring integration I wanted something more dynamic.

So… after being jealous of the fact that cloud providers provide identity management for VMs which can then be used in all sorts of ways, I wrote a small JWT service, which provides a metadata endpoint to the VMs. This meant every VM could query a well-known IP for a token that could then be used to bootstrap itself further.

Once this was done, I set up basic configuration in Vault and Consul which would allow the VM to authenticate, obtain a Consul token and register itself and, for now, node_exporter as a service.
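For reference, the two payloads involved are small. This sketch builds the request body for Vault’s JWT login endpoint (`POST /v1/auth/jwt/login`) and a Consul service registration (`PUT /v1/agent/service/register`) for node_exporter; the role name, port and check interval here are illustrative placeholders, not the real config:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// vaultJWTLoginBody builds the body for Vault's JWT auth login
// (POST /v1/auth/jwt/login). The role name is caller-supplied.
func vaultJWTLoginBody(role, jwt string) ([]byte, error) {
	return json.Marshal(map[string]string{"role": role, "jwt": jwt})
}

// serviceCheck and serviceRegistration mirror the fields Consul's
// agent service registration endpoint accepts.
type serviceCheck struct {
	HTTP     string `json:"HTTP"`
	Interval string `json:"Interval"`
}

type serviceRegistration struct {
	Name  string       `json:"Name"`
	Port  int          `json:"Port"`
	Check serviceCheck `json:"Check"`
}

// nodeExporterRegistration describes node_exporter to Consul, with an
// HTTP health check against its metrics endpoint (values illustrative).
func nodeExporterRegistration() serviceRegistration {
	return serviceRegistration{
		Name: "node_exporter",
		Port: 9100,
		Check: serviceCheck{
			HTTP:     "http://127.0.0.1:9100/metrics",
			Interval: "30s",
		},
	}
}

func main() {
	login, _ := vaultJWTLoginBody("vm", "eyJ...")
	fmt.Println(string(login))
	reg, _ := json.Marshal(nodeExporterRegistration())
	fmt.Println(string(reg))
}
```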

VM Setup

Now, we come to the part I want to talk about… the VM. As I said, historically this meant deploying a bunch of containers and, for what I was trying to achieve, also node_exporter. This had a lot of downsides:

  • lots of Terraform to maintain (especially deploying to ~50 separate machines)
  • lots of dependencies: it meant pre-provisioning configs onto the machine using Terraform, which I’m not that keen on (a means to an end)
  • tonnes of inter-dependencies — all of the containers relied on each other, in a long chain

This seems to be a common thing, though—people layering tonnes of custom services onto a machine just to bootstrap the basic base services, before even setting it up for the purpose it’s being deployed for. Running under systemd seems no better—lots of packages deploying interconnecting services, which end up barely tested (or even testable!) and rely on monitoring systemd for health checks (and pulling logs), especially when most of these services are written by someone else.

So so so! After reading about Microsoft’s mess, and probably drawing inspiration on a complete tangent to the author’s expectations, I decided—you don’t need to be a big corporation to own your own agents.

Goal

Since more and more of the tools I use are now written in Golang, I wondered why I didn’t just write a single application, a single binary(!), that I could deploy to do all the grunt work. One binary, one log stream, one thing to make sure is up—simple.

I needed to determine exactly what I wanted from each of these applications:

  • I’d start with calling my JWT endpoint to obtain tokens
  • For Vault, I would just use their API library to login
  • For Consul, I’d run the full agent—within the application

After a bit of playing around, and a little help from AI (since my personal time has become much more limited), I ended up with a single deployable binary. It took all of one configuration option, which enabled/disabled the Consul service registration for node_exporter. Everything else (domains, datacenters, configs, SSL… EVERYTHING) is hard-coded in the binary… because I don’t need it to be configurable. It had a simple routine to check all of the tokens (Vault, Consul) and automatically renew them (including the JWT).

Going further

Now I had a machine up and registered in Consul, with the node_exporter service check failing, so the next step was to deploy node_exporter… but I just couldn’t. bring. myself. to. do. it.

So I created a branch, pulled the contents of main.go from node_exporter into my agent, ran a simple go get, called the main function in a goroutine, built it and tested… and I had node_exporter running. Obviously I did some tuning, but seriously, this was getting too easy.
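The pattern is almost embarrassingly simple. `exporterMain` below is a stub standing in for node_exporter’s real main function (which parses flags and serves /metrics until the process exits); the point is just the wiring of running it in a goroutine alongside the agent, with a channel to know it came up:

```go
package main

import (
	"fmt"
	"sync"
)

// exporterMain is a hypothetical stand-in for node_exporter's main,
// pulled into the agent's module. The real one blocks serving HTTP
// on :9100; this stub just signals readiness and returns.
func exporterMain(ready chan<- struct{}) {
	close(ready) // signal that the exporter has started
	// ... the real main would block here serving /metrics ...
}

func main() {
	ready := make(chan struct{})
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		exporterMain(ready) // the embedded exporter runs alongside the agent
	}()
	<-ready
	fmt.Println("embedded node_exporter started")
	wg.Wait()
}
```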

Not only this… I literally had a single 100MB binary (without trying to remove any unnecessary dependencies), which is already multiple times smaller than the combination of the tools I was using before. Not only that, but as soon as anything expired, it just rotated and kept working… it didn’t need Docker bind mounts that sometimes cause issues because of rshared/rprivate propagation… it just worked.

Next steps

So I want to go further… I want to pull my logging stack in too (hey, half of those tools are written in Golang, so why not).

But before then, I’ll package it nicely, add some solid ways of deploying it (somewhere between cloud-init and Ansible), add some good monitoring and quality-of-life features to ensure it stays up, and look at failure modes.

But hey, I’ve gone from managing 5 daemons (hopefully increasing coverage to 6–8) to a single one, all tied to an authentication endpoint that is now natively available on every VM.