I have a bunch of batch jobs that are executed periodically by systemd, and I wanted visibility into their success rate as well as notifications on any failures. Since I already use Google Cloud on the same project, I decided to give its logging system a try and was pleasantly surprised at how easy it was to set up, so I'm documenting it here in case somebody finds it useful.

Even though the Google service is called Logging, I'm calling these "metrics" in the rest of the post, as I don't want them confused with the actual logs of a systemd job (e.g. what you'd get with journalctl -xe or systemctl status <jobname>).

There are two aspects to the solution:

Getting the result of a job run

My initial thought was that I'd have to write something to query systemd about the status of the job and then use one of the supported ways to push the metrics into Google Logging, but then I discovered the ExecStartPost= and ExecStopPost= hooks. Both can be defined in the [Service] section of the systemd unit file and are documented here; the TL;DR is that systemd will execute them after the job is started or stopped, respectively. An important detail about ExecStopPost= is that it's invoked regardless of whether the main job defined in ExecStart= succeeded.

I'm specifically using ExecStopPost= since my batch jobs are Type=oneshot and take anywhere from 5 to 30 seconds. ExecStartPost= would be useful for a long-running service where you're interested in knowing that it was started. Another benefit of ExecStopPost= is that since the job is done when systemd invokes it, systemd passes in three environment variables with information about the run that just finished: $SERVICE_RESULT, $EXIT_CODE and $EXIT_STATUS. Their names are fairly self-explanatory; for more details, check out the full documentation.
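If you want to see what those variables contain before wiring up any real logging, a quick way is a throwaway oneshot unit whose ExecStopPost= just echoes them into the journal. The unit below is only an illustration; the name and the ExecStart= command are placeholders:

[Unit]
Description=Throwaway job for inspecting ExecStopPost= variables

[Service]
Type=oneshot
ExecStart=/bin/true
# Runs after the main command exits, successfully or not; systemd expands
# the ${...} variables and the output lands in this unit's journal.
ExecStopPost=/bin/echo "result=${SERVICE_RESULT} exit_code=${EXIT_CODE} exit_status=${EXIT_STATUS}"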

Pushing results into Google Logging

Google has a bunch of ways to push logs in; I went with the simplest one in this case, using the Cloud SDK and the gcloud CLI tool. Installation was pretty simple and took about two minutes.
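Before touching any unit files, you can sanity-check the setup with a throwaway entry straight from the shell. The log name test-log below is just a placeholder:

# write a structured test entry, then read it back to confirm it landed
gcloud logging write test-log '{ "message": "hello from the CLI" }' --payload-type=json
gcloud logging read 'logName:"test-log"' --limit=1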

Now it was just a matter of putting the gcloud call into the ExecStopPost= hook, and my first attempt looked like this:

[Service]
...
ExecStart=...
ExecStopPost=/path/to/gcloud logging write price-sync-intraday '{ "message": "Intraday price sync completed with ${SERVICE_RESULT}", "result": "${SERVICE_RESULT}", "exit_code": "${EXIT_CODE}", "exit_status": "${EXIT_STATUS}" }' --payload-type=json

It looks a bit ugly, but it works. The third argument to gcloud is the name of the log and the fourth is the actual payload. The payload has no required structure, except that whatever you put in the message field becomes the "title" of the entry in the log viewer. Since I'm using this in almost a dozen jobs, and didn't want to paste this mess into a dozen unit files (and customize the name of the job in each), I extracted it into a shell script, post_metrics.sh, and replaced ExecStopPost=:

[Service]
...
ExecStart=...
ExecStopPost=/path/to/post_metrics.sh %N $SERVICE_RESULT $EXIT_CODE $EXIT_STATUS

Here I'm also using the %N specifier to pass the name of the systemd unit to the script, so the script can post to the appropriate log category.
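The script itself is essentially the inline gcloud call with the unit name and status passed in as arguments; a minimal sketch, assuming the same gcloud path and JSON fields as above, looks something like this:

#!/bin/sh
# post_metrics.sh <unit-name> <service-result> <exit-code> <exit-status>
# Writes one structured entry to the log named after the unit.
UNIT="$1"
RESULT="$2"
EXIT_CODE="$3"
EXIT_STATUS="$4"

/path/to/gcloud logging write "$UNIT" "{
  \"message\": \"$UNIT completed with $RESULT\",
  \"result\": \"$RESULT\",
  \"exit_code\": \"$EXIT_CODE\",
  \"exit_status\": \"$EXIT_STATUS\"
}" --payload-type=json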

And finally, here's how that looks in the Log Viewer:

Google Log Viewer

From here, it was pretty easy to define a logs-based metric on any of the fields in the JSON payload, and then an alert based on that metric.
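As a concrete (hypothetical) example, a counter metric for failed runs of the job above could be created from the CLI like this, assuming $SERVICE_RESULT is "success" on a clean run; an alerting policy in Cloud Monitoring can then watch this metric:

# count entries from the price sync job whose result field is anything other than "success"
gcloud logging metrics create price_sync_failures \
  --description="Runs of the intraday price sync that did not succeed" \
  --log-filter='logName:"price-sync-intraday" AND jsonPayload.result != "success"'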