12 November 2023

Challenges of defining a variable for a packaged systemd unit

I manage an Arch Linux VPS I’d like to keep an eye on. Grafana Cloud combined with node_exporter looks like a good start. Grafana Cloud uses Prometheus to collect metrics. Prometheus works by pulling data from exporters. Therefore, my node_exporter needs to be accessible to Grafana Cloud’s Prometheus service.

I don’t want to expose node_exporter to the World Wide Web. In a more controlled environment, it’s possible to get Prometheus and node_exporter onto one network through tailscale, nebula, a plain VPN, or a tunnel. Plenty of options. This isn’t possible with Grafana Cloud since it’s beyond my control. Grafana Labs documentation has a solution: install Prometheus on the node, run it locally, scrape the local node_exporter, and push the resulting data to Prometheus in Grafana Cloud. This is the intended way to use Prometheus in Grafana Cloud: you ship them data you pull yourself with Prometheus, Grafana agent, or any other compatible tool.

I installed node_exporter on the Arch Linux VPS: pacman -S prometheus-node-exporter. Got it running: systemctl enable --now prometheus-node-exporter.service. Now its web interface is accessible on port :9100. The problem is, it’s publicly accessible. Not what I intended. Stopping the service for now.

Although blocking access to a port with a firewall is a valid option, I see it only as an additional measure. This measure is fine for protection against misconfiguration, but ideally, the port should not be exposed at all. Alternatively, we can ditch web communication altogether in favor of a Unix socket. A wonderful option to have, but it involves a file system with paths and permissions - extra hoops to jump through. The solution I’ve chosen is to bind the port to a local-only address.

By default, node_exporter binds to *:9100, which is configurable with the --web.listen-address= CLI argument. Setting it to 127.0.0.1:9100 limits connections to the ones originating from the machine itself.

systemctl show prometheus-node-exporter.service to check what’s in store for us. It looks like a well-thought-out systemd unit file:

...
ExecStart={ path=/usr/bin/prometheus-node-exporter ; argv[]=/usr/bin/prometheus-node-exporter $NODE_EXPORTER_ARGS ; ignore_errors=no ; start_time=[Sun 2023-11-05 20:49:05 UTC] ; stop_time=[n/a] ; pid=27419 ; code=(null) ; status=0/0 }
...

It uses $NODE_EXPORTER_ARGS to save us from overriding the whole ExecStart. That’s plain awesome! When ExecStart changes with future updates, it will be updated by the package manager since we’re not overriding it.

All that is left to do is to set a variable. systemctl edit prometheus-node-exporter.service to define an override. cat /etc/systemd/system/prometheus-node-exporter.service.d/override.conf to check that it’s saved.

[Service]
Environment="NODE_EXPORTER_ARGS=--web.listen-address=127.0.0.1:9100"

Looks good. systemctl daemon-reload to get new config into systemd. systemctl start prometheus-node-exporter.service.

Configs are not configuring

systemctl show prometheus-node-exporter.service shows that the configuration line is present:

...
Environment=NODE_EXPORTER_ARGS=--web.listen-address=127.0.0.1:9100
EnvironmentFiles=/etc/conf.d/prometheus-node-exporter (ignore_errors=yes)
...

Yet it starts with the default parameters. I’ve tried a different port just to make sure that it’s a configuration issue, not an exporter ignoring the IP address part. Still the defaults.

Only after 10 more minutes of playing around with ps and ss did I notice that this unit file comes with EnvironmentFiles defined. Well, it’s definitely a good place to define a variable. Let’s try it then.

Huh, it’s not empty. cat /etc/conf.d/prometheus-node-exporter:

NODE_EXPORTER_ARGS=""

The variable I’m struggling with is conveniently defined here as an empty one. It overrode my attempts to define it in the systemd unit override: settings read from EnvironmentFile= take precedence over those made with Environment=.
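
The ordering can be mimicked in plain shell. This is only a sketch of the precedence, not of how systemd is implemented: it assumes the value from Environment= is applied first and any assignment found in the environment file replaces it. The function name effective_value and the simplified parsing (one NAME=VALUE or NAME="VALUE" per line) are assumptions made for illustration.

```shell
#!/bin/sh
# effective_value NAME UNIT_VALUE ENV_FILE
# Mimics the ordering only: start with the value from Environment=,
# then let any assignment of NAME in the environment file replace it.
effective_value() {
    name=$1
    value=$2
    env_file=$3
    if [ -r "$env_file" ]; then
        while IFS= read -r line || [ -n "$line" ]; do
            case $line in
                "$name="*)
                    value=${line#"$name="}
                    # Strip one pair of surrounding double quotes, if present.
                    case $value in
                        \"*\") value=${value#\"}; value=${value%\"} ;;
                    esac
                    ;;
            esac
        done < "$env_file"
    fi
    printf '%s\n' "$value"
}
```

Called as effective_value NODE_EXPORTER_ARGS '--web.listen-address=127.0.0.1:9100' /etc/conf.d/prometheus-node-exporter with the packaged file in place, it prints an empty line: the file’s empty assignment wins.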

So, I’m setting it in an environment file:

NODE_EXPORTER_ARGS="--web.listen-address=127.0.0.1:9100"

Units reload, service restart, and node_exporter is no longer exposed. Mission accomplished 🎉
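
To double-check the result, the listening sockets can be inspected with ss. A small sketch follows. The helper names is_loopback_only and check_port are hypothetical, and the code assumes the local address sits in the fourth column of `ss -Htln` output in address:port form.

```shell
#!/bin/sh
# is_loopback_only ADDR:PORT
# Succeeds when the address part is an IPv4 or IPv6 loopback address.
is_loopback_only() {
    addr=${1%:*}          # drop the trailing :port
    case $addr in
        127.*|'[::1]') return 0 ;;
        *) return 1 ;;
    esac
}

# check_port PORT
# Reads `ss -Htln`-style lines on stdin and reports listeners on PORT.
check_port() {
    port=$1
    while read -r _state _recvq _sendq laddr _rest; do
        case $laddr in
            *:"$port")
                if is_loopback_only "$laddr"; then
                    echo "local-only: $laddr"
                else
                    echo "EXPOSED: $laddr"
                fi
                ;;
        esac
    done
}
```

Usage: ss -Htln | check_port 9100. Any line starting with EXPOSED means the port is still bound to a non-loopback address.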

Concluding Thoughts

Crisis averted, but I’m left with a feeling that a thing that is made for convenience is causing me trouble. I’m not aware of any consensus on clearing or resetting a variable for a systemd unit file externally. On the other hand, I can see how it’s useful not to have your service broken because of a variable set elsewhere. Fortunately, systemd doesn’t source many places to get environment variables for a service. NODE_EXPORTER_ARGS is a self-explanatory and unique name. The chance of interfering with a variable defined elsewhere is extremely low. In my humble opinion, it’s better to leave this one commented out in the Env file. It’s easy to uncomment and use, while the user still has the freedom to define a variable in a unit file. Moreover, the Env file is managed by the package manager. The service override file isn’t touched by a package manager, so the chance of getting a .pacnew conflict is lower.

I want to be able to use a systemd unit override. I want to define variables with it. Please, don’t block it with an external (env) file.
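
A possible workaround, not tested in this post: per systemd.exec(5), assigning an empty string to EnvironmentFile= resets the list of environment files, so a drop-in could clear that list before setting the variable. The sketch below writes such a drop-in to a scratch directory rather than to /etc, so it can be reviewed first; DROPIN_DIR is a hypothetical variable name. Note that clearing the list discards everything else the packaged file defines.

```shell
#!/bin/sh
# Scratch location for the drop-in; hypothetical default, adjust as needed.
DROPIN_DIR=${DROPIN_DIR:-${TMPDIR:-/tmp}/prometheus-node-exporter.service.d}
mkdir -p "$DROPIN_DIR"

# An empty EnvironmentFile= clears the list inherited from the packaged unit,
# so the Environment= line below is no longer overridden by the conf.d file.
cat > "$DROPIN_DIR/override.conf" <<'EOF'
[Service]
EnvironmentFile=
Environment="NODE_EXPORTER_ARGS=--web.listen-address=127.0.0.1:9100"
EOF
```

After review, the file would go to /etc/systemd/system/prometheus-node-exporter.service.d/override.conf, followed by systemctl daemon-reload and a service restart.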

Broader Perspective

Why stop at my opinion? What’s there in other Arch packages? Community and AUR packages are less affected by the official Arch Linux way of doing things. I’m going through the main packages.

I’ve managed to find 58 files with variables in ExecStart using GitLab search.

To make the following research a bit easier, I’ve downloaded repositories with these files locally. Since some files belong to the same package, there are only 46 repositories to look through.
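
A local search along the same lines can be sketched with GNU grep. The function name units_with_exec_vars and the layout (one checkout per package under a common directory) are assumptions. The 58 files above came from GitLab search, not from this snippet.

```shell
#!/bin/sh
# units_with_exec_vars DIR
# Lists systemd service files under DIR whose ExecStart= line references
# a variable such as $FOO_ARGS or ${FOO_OPTS}.
units_with_exec_vars() {
    grep -rlE --include='*.service' \
        '^ExecStart=.*\$\{?[A-Za-z_][A-Za-z0-9_]*' "$1" | sort
}
```

Usage: units_with_exec_vars ~/arch-packages, where the directory name is whatever holds the cloned repositories.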

Almost all Prometheus exporters use this pattern:

Ripgrepping through the exporters' config files:
rg --files | rg 'exporter\.conf$' | xargs -L 1 bat --style 'header,grid'
──────────────────────────────────────────────────────────────────────────────────
File: nginx-prometheus-exporter/nginx-prometheus-exporter.conf
──────────────────────────────────────────────────────────────────────────────────
NGINX_EXPORTER_ARGS=""
──────────────────────────────────────────────────────────────────────────────────
──────────────────────────────────────────────────────────────────────────────────
File: prometheus-postgres-exporter/prometheus-postgres-exporter.conf
──────────────────────────────────────────────────────────────────────────────────
DATA_SOURCE_NAME=""
POSTGRESQL_EXPORTER_ARGS=""
──────────────────────────────────────────────────────────────────────────────────
──────────────────────────────────────────────────────────────────────────────────
File: prometheus-blackbox-exporter/prometheus-blackbox-exporter.conf
──────────────────────────────────────────────────────────────────────────────────
BLACKBOX_EXPORTER_ARGS="--config.file='/etc/prometheus/blackbox.yml'"
──────────────────────────────────────────────────────────────────────────────────
──────────────────────────────────────────────────────────────────────────────────
File: prometheus-systemd-exporter/prometheus-systemd-exporter.conf
──────────────────────────────────────────────────────────────────────────────────
SYSTEMD_EXPORTER_ARGS=""
──────────────────────────────────────────────────────────────────────────────────
──────────────────────────────────────────────────────────────────────────────────
File: prometheus-node-exporter/prometheus-node-exporter.conf
──────────────────────────────────────────────────────────────────────────────────
NODE_EXPORTER_ARGS=""
──────────────────────────────────────────────────────────────────────────────────
──────────────────────────────────────────────────────────────────────────────────
File: prometheus-bird-exporter/prometheus-bird-exporter.conf
──────────────────────────────────────────────────────────────────────────────────
BIRD_EXPORTER_ARGS=""
──────────────────────────────────────────────────────────────────────────────────
──────────────────────────────────────────────────────────────────────────────────
File: prometheus-mysqld-exporter/prometheus-mysqld-exporter.conf
──────────────────────────────────────────────────────────────────────────────────
DATA_SOURCE_NAME=""
MYSQLD_EXPORTER_ARGS=""
──────────────────────────────────────────────────────────────────────────────────
──────────────────────────────────────────────────────────────────────────────────
File: prometheus-wireguard-exporter/prometheus-wireguard-exporter.conf
──────────────────────────────────────────────────────────────────────────────────
WIREGUARD_EXPORTER_ARGS="--prepend_sudo=true"
──────────────────────────────────────────────────────────────────────────────────
──────────────────────────────────────────────────────────────────────────────────
File: prometheus-memcached-exporter/prometheus-memcached-exporter.conf
──────────────────────────────────────────────────────────────────────────────────
MEMCACHED_EXPORTER_ARGS=""
──────────────────────────────────────────────────────────────────────────────────

It’s not a good sample since most of them are packaged by the same person (Jelle van der Waa). Setting instead of resetting was introduced by others.

So far there are two patterns: setting an *ARGS variable in an Env file, and resetting it to an empty value.

I went through the rest of the packages and files. The two most common names for variables used in ExecStart are *ARGS and *OPTS. So there is no agreement on that, which is fine.

There are some other packages that only reset a variable in an Env file: rngd, subversion, prometheus, kubelet.
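
Spotting such resets across a pile of Env files can be automated. A minimal sketch, assuming the files use plain NAME=VALUE lines; the function name empty_assignments is made up for this example.

```shell
#!/bin/sh
# empty_assignments FILE...
# Prints FILE:NAME for every variable assigned an empty value,
# written as NAME=, NAME="" or NAME=''. Commented-out lines are skipped
# because the pattern is anchored at the start of the line.
empty_assignments() {
    grep -HE "^[A-Za-z_][A-Za-z0-9_]*=(\"\"|'')?[[:space:]]*\$" "$@" |
        sed -E "s/=(\"\"|'')?[[:space:]]*\$//"
}
```

Usage: empty_assignments */*.conf from the directory that holds the package checkouts.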

badvpn sets one variable and resets the args variable:

NCD_CONFIG="/etc/ncd.conf"
NCD_ARGS=""

and uses both in a unit file:

ExecStart=/usr/bin/badvpn-ncd $NCD_ARGS --config-file $NCD_CONFIG

Some use an Env file only to set variables: dkfilter, cyrus-sasl, dhcp, opendkim.

Others set variables in a systemd unit file: openssh agent, logstash. Here is logstash:

[Service]
...
Environment=LS_HOME=/var/lib/logstash
Environment=LS_HEAP_SIZE="500m"
Environment=LS_CONF_DIR=/etc/logstash/conf.d
Environment=LS_LOG_DIR=/var/log/logstash
Environment=LS_SETTINGS_DIR=/etc/logstash
...
ExecStart=/usr/share/logstash/bin/logstash -f $LS_CONF_DIR  --path.logs $LS_LOG_DIR --path.data $LS_HOME --path.settings $LS_SETTINGS_DIR
...

Back to the Env file. iodine & Jenkins define a bunch of variables, but there are two that get purged in Jenkins’ file: JAVA_OPTS and JENKINS_OPTS.

JAVA=/usr/lib/jvm/java-17-openjdk/bin/java
JAVA_ARGS=-Xmx512m
JAVA_OPTS=
JENKINS_USER=jenkins
JENKINS_HOME=/var/lib/jenkins
JENKINS_WAR=/usr/share/java/jenkins/jenkins.war
JENKINS_WEBROOT=--webroot=/var/cache/jenkins
JENKINS_PORT=--httpPort=8090
JENKINS_OPTS=
JENKINS_COMMAND_LINE="$JAVA $JAVA_ARGS $JAVA_OPTS -jar $JENKINS_WAR $JENKINS_WEBROOT $JENKINS_PORT $JENKINS_OPTS"

opensearch & elasticsearch set variables in a unit file. In an Env file, JAVA_HOME is defined and other JVM/runtime-specific variables are commented out.

With distcc we finally move to my suggested turf - comments:

DISTCC_ARGS="--allow 127.0.0.1"
#DISTCC_ARGS="--allow 192.168.0.0/24 --log-level error --log-file /tmp/distccd.log"

A default value for the arguments, plus a commented-out suggestion on how to use it. My idea of commenting out suggestions is getting somewhere. The next two take it to the state I was thinking about after figuring out the node_exporter trick.

openfire env file:

# If you wish to set any specific options to pass to the JVM, you can
# set them with the following variable.
#OPENFIRE_OPTS="-Xmx1024m"

syncplay env file:

# This is the file that [email protected] loads settings from, it does not affect the binary itself
# See https://syncplay.pl/guide/server/ for a list of available flags and description
#port="--port=8999"
#isolate="--isolate-room"
#password="--password yourpassword"
#salt="--salt RANDOMSALT"
#motd="--motd-file /etc/syncplay/motd"
#ready="--disable-ready"
#chat="--disable-chat"
#maxChars="--max-chat-message-length 500"
#usernameLength="--max-username-length 20"
#statsFile="--stats-db-file /etc/syncplay/stats.db"
#tls="--tls /etc/letsencrypt/live/syncplay.example.com/"

A bit of documenting comments and commented-out suggestions that leave me with the freedom to define variables in whatever way I please.

I’m afraid that there is no silver bullet, no proper way to define environment variables that fits every situation. It depends, as always: on whether there are default arguments you have to provide, and probably on many other factors I’m missing without a deeper dive. Resetting a variable outside of the systemd unit file where it’s used does feel like the wrong one, though.