Basic metrics for systemd jobs with Google Cloud Logging
I have a bunch of batch jobs that systemd executes periodically, and I wanted visibility into their success rate as well as a way to be notified of any failures. Since I already use Google Cloud on the same project, I decided to give its logging system a try and was pleasantly surprised at how easy it was to set up, so I'm documenting it here in case somebody finds it useful.
Even though the Google service is called Logging, I'll refer to these as metrics in the rest of the post, as I don't want them to be confused with the actual logs of a systemd job (e.g. what you'd get with journalctl -xe or systemctl status <jobname>).
There are 2 aspects of the solution:
- get the result of a job run
- push the result into Google Logging
Getting the result of a job run
My initial thought was that I'd have to write something to query systemd about the status of the job and then use one of the available ways to push the metrics into Google Logging, but then I discovered the ExecStartPost= and ExecStopPost= hooks. Both can be defined in the [Service] section of the systemd unit file and are documented here, but the TL;DR is that systemd will execute these after the job is started or stopped, respectively. An important detail with ExecStopPost= is that it's called regardless of the success of the main job defined in ExecStart=.
I'm specifically using ExecStopPost= since my batch jobs are Type=oneshot and take anywhere from 5 to 30 seconds. ExecStartPost= would be useful for a long-running service where you're interested in knowing that it was started.
Another benefit of ExecStopPost= is that, since the job is done when systemd invokes it, systemd passes in three environment variables with some info on the status of the job that just ran: $SERVICE_RESULT, $EXIT_CODE and $EXIT_STATUS. You can tell from their names what they represent; for more info, check out the full documentation.
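As a minimal sketch of how this fits together (the job name and script path are made up for illustration), a oneshot unit could dump those three variables into the journal like this:

[Unit]
Description=Example batch job with a status hook

[Service]
Type=oneshot
ExecStart=/usr/local/bin/example-batch-job
# Runs after the job finishes, even if ExecStart= failed; systemd exports
# SERVICE_RESULT, EXIT_CODE and EXIT_STATUS into this command's environment.
ExecStopPost=/bin/sh -c 'echo "result=$SERVICE_RESULT exit_code=$EXIT_CODE exit_status=$EXIT_STATUS"'

The echoed line ends up in the unit's journal, which is a handy way to confirm the variables are populated before sending them anywhere.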
Pushing results into Google Logging
Google has a bunch of ways to push logs in; I went with the simplest one for this case: the Cloud SDK and the gcloud CLI tool. The install was pretty simple and took about 2 minutes.
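Before wiring it into systemd, it's worth sanity-checking the setup from a shell; here's a rough example (the log name and project ID are placeholders):

# Authenticate and point gcloud at the right project first:
#   gcloud auth login
#   gcloud config set project my-project-id
# Then write a test entry with a structured (JSON) payload:
gcloud logging write test-log '{ "message": "hello from gcloud", "result": "success" }' --payload-type=json

If that entry shows up in the log viewer, the CLI side is working.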
Now it was just a matter of putting the gcloud call into the ExecStopPost= hook, and my first attempt looked like this:
[Service]
...
ExecStart=...
ExecStopPost=/path/to/gcloud logging write price-sync-intraday '{ "message": "Intraday price sync completed with ${SERVICE_RESULT}", "result": "${SERVICE_RESULT}", "exit_code": "${EXIT_CODE}", "exit_status": "${EXIT_STATUS}" }' --payload-type=json
Looks a bit ugly, but it works. The third argument to gcloud is the name of the log and the fourth is the actual payload. The payload has no required structure, except that whatever you put in message becomes the "title" of the entry in the log viewer. Since I'm using this in almost a dozen jobs and didn't want to paste this mess into a dozen unit files (and customize the name of the job in each), I extracted it into a shell script, post_metrics.sh, and replaced the ExecStopPost= line:
[Service]
...
ExecStart=...
ExecStopPost=post_metrics.sh %N $SERVICE_RESULT $EXIT_CODE $EXIT_STATUS
Here I'm also using the %N specifier to pass the name of the systemd unit to the script, so the script can post to the appropriate log category.
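For reference, a minimal sketch of what post_metrics.sh might look like, given the arguments passed above (the gcloud path is a placeholder):

#!/bin/sh
# post_metrics.sh <unit-name> <service-result> <exit-code> <exit-status>
# Sketch only: adjust the gcloud path and the message format to taste.
UNIT="$1"
RESULT="$2"
EXIT_CODE="$3"
EXIT_STATUS="$4"

/path/to/gcloud logging write "$UNIT" "{
  \"message\": \"$UNIT completed with $RESULT\",
  \"result\": \"$RESULT\",
  \"exit_code\": \"$EXIT_CODE\",
  \"exit_status\": \"$EXIT_STATUS\"
}" --payload-type=json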
And finally, here's how that looks in the Log Viewer:
From here, it was pretty easy to define a metric based on any of the fields in the JSON payload, and then an alert based on that metric.
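For example, a log-based metric counting failed runs can also be created with gcloud; the metric name, project ID and filter below are illustrative:

# Counts entries in the price-sync-intraday log whose result isn't "success".
gcloud logging metrics create price_sync_intraday_failures \
  --description="Failed intraday price sync runs" \
  --log-filter='logName="projects/my-project-id/logs/price-sync-intraday" AND jsonPayload.result != "success"'

An alerting policy in Cloud Monitoring can then be pointed at that metric.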