2026-05-02 • 10 min read

Systemd Timer Monitoring: How to Detect Failed or Missed Timers

Systemd timer monitoring matters when you use Linux timers for real production work: backups, imports, billing tasks, report generation, cleanup scripts, queue maintenance, certificate renewal, and dozens of other scheduled jobs that nobody wants to babysit.

Systemd timers are often cleaner than cron. They integrate with systemctl, log through journald, support dependencies, and can run missed jobs after boot. But they still have one uncomfortable weakness: a timer can stop doing useful work while the server itself looks perfectly healthy.

The machine is up. SSH works. Your app responds. The timer unit exists.

And yet the job did not run.

That is the gap systemd timer monitoring should close.

The problem

A systemd timer is usually made of two units:

# /etc/systemd/system/example-backup.timer
[Unit]
Description=Run example backup daily

[Timer]
OnCalendar=daily
Persistent=true

[Install]
WantedBy=timers.target

And the service it triggers:

# /etc/systemd/system/example-backup.service
[Unit]
Description=Example daily backup

[Service]
Type=oneshot
ExecStart=/usr/local/bin/example-backup.sh

You enable it:

systemctl enable --now example-backup.timer

Then you check it:

systemctl list-timers

Everything looks fine.

The problem is that “timer exists” does not mean “the work is being completed successfully.”

A timer can be active while the service fails. A service can exit successfully while the script skipped the important part. A job can hang forever. A server can be off during the scheduled window. A deployment can replace the script path. Permissions can change. Environment variables can disappear.

If nobody checks the actual execution signal, these failures can stay silent for days.

Why it happens

Systemd timers are reliable, but they are not magic. They schedule execution. They do not automatically prove that the business task succeeded.

Common failure modes include:

  • The .timer unit is enabled, but the .service unit fails.
  • The service exits with code 0, but the script did not complete meaningful work.
  • The job depends on network access before the network is ready.
  • The script works manually but fails under systemd’s limited environment.
  • The timer was disabled during maintenance and never re-enabled.
  • The server rebooted, and the timer did not catch up because Persistent=true was missing.
  • A long-running service overlaps with the next scheduled run.
  • Logs rotate or disappear before anyone checks them.
  • A package update changes permissions, paths, or runtime behavior.

A classic example is a backup script:

#!/usr/bin/env bash
set -euo pipefail

pg_dump "$DATABASE_URL" > /backups/app.sql
aws s3 cp /backups/app.sql s3://example-backups/app.sql

This may work perfectly from your shell.

But when systemd runs it, $DATABASE_URL may not exist. The AWS credentials may not be loaded. The script may not have permission to write to /backups. DNS may fail for a few minutes after boot.

You will probably see the failure in journald if you look:

journalctl -u example-backup.service

But the whole point of monitoring is not needing to remember to look.

Why it’s dangerous

Missed systemd timers are dangerous because they usually affect work that happens behind the scenes.

Users do not immediately notice that:

  • backups stopped running
  • reports were not generated
  • invoices were not sent
  • expired sessions were not cleaned up
  • data syncs stopped
  • temporary files are filling the disk
  • webhooks are not being retried
  • usage counters are stale
  • SSL renewal hooks did not run

The app can look healthy while important background work is broken.

This is why uptime monitoring is not enough. An uptime check tells you that an HTTP endpoint responded. It does not tell you that last night’s backup finished. It does not tell you that a timer ran at 03:00. It does not tell you that your cleanup job is stuck waiting on a locked file.

For small teams and side projects, this can be especially painful. You may not have a full observability stack. You may not check servers every morning. You may only discover the issue when something has already gone wrong.

A missed timer is rarely dramatic at first. It is quiet.

That is what makes it risky.

How to detect it

Good systemd timer monitoring should answer a simple question:

Did the expected job complete within the expected time window?

There are a few signals you can use.

First, systemd itself can show scheduled timers:

systemctl list-timers --all

This tells you the next run, last run, and associated unit.
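The output looks roughly like this (illustrative timestamps, not real output):

```
NEXT                        LEFT     LAST                        PASSED UNIT                 ACTIVATES
Sun 2026-05-03 03:00:00 UTC 14h left Sat 2026-05-02 03:00:04 UTC 9h ago example-backup.timer example-backup.service
```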

Second, you can inspect service status:

systemctl status example-backup.service

Third, you can check logs:

journalctl -u example-backup.service --since "24 hours ago"

These are useful debugging tools.

But they are mostly pull-based. You have to remember to check them.
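You can automate part of that pull side with a script that asks systemd when the timer last fired. A sketch, assuming GNU date, systemctl's LastTriggerUSec property, and an example threshold of 25 hours (24 hours plus 1 hour of grace):

```shell
#!/usr/bin/env bash
# Sketch of an automated pull-based freshness check. Timer name and
# threshold are example values; run it from cron or config management.
set -euo pipefail

# Seconds elapsed since a systemd timestamp such as
# "Sat 2026-05-02 03:00:04 UTC".
seconds_since() {
  echo $(( $(date +%s) - $(date -d "$1" +%s) ))
}

# Succeeds (exit 0) only if the timer fired within the allowed window.
check_timer() {
  local timer="$1" max_age="$2" last
  last=$(systemctl show "$timer" --property=LastTriggerUSec --value)
  if [ -z "$last" ] || [ "$last" = "n/a" ]; then
    echo "FAIL: $timer has never fired"
    return 1
  fi
  if [ "$(seconds_since "$last")" -gt "$max_age" ]; then
    echo "FAIL: $timer last fired over ${max_age}s ago"
    return 1
  fi
  echo "OK: $timer"
}

# Only do the live query on a machine actually booted with systemd.
if [ -d /run/systemd/system ] && command -v systemctl >/dev/null 2>&1; then
  check_timer "example-backup.timer" $(( 25 * 3600 ))
fi
```

Even automated, this check still runs on or against the machine it is watching, which limits what it can catch.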

For production monitoring, you usually want push-based detection. The job should emit a small success signal after it completes. If that signal does not arrive on time, your monitoring system alerts you.

That is heartbeat monitoring.

The timer runs the service. The service runs the script. At the end of a successful run, the script sends a heartbeat ping.

If the ping arrives, the job completed.

If the ping does not arrive by the expected deadline, something is wrong:

  • the timer did not fire
  • the service failed
  • the script crashed
  • the server was down
  • the network was unavailable
  • the job hung before completion

Heartbeat monitoring does not replace logs. It answers a different question: “Did the scheduled work happen?”

Simple solution

Let’s say you have a daily backup job triggered by a systemd timer.

Your service calls this script:

#!/usr/bin/env bash
set -euo pipefail

BACKUP_FILE="/var/backups/app-$(date +%F).sql"

pg_dump "$DATABASE_URL" > "$BACKUP_FILE"
gzip "$BACKUP_FILE"
aws s3 cp "$BACKUP_FILE.gz" "s3://example-backups/"

curl -fsS "https://quietpulse.xyz/ping/YOUR_TOKEN"

The important part is that the ping happens only after the meaningful work succeeds.

Do not ping at the start. Do not ping before the upload. Do not ping before the database dump completes.

Ping after success.

Your service file might look like this:

[Unit]
Description=Daily application backup

[Service]
Type=oneshot
EnvironmentFile=/etc/example-backup.env
ExecStart=/usr/local/bin/example-backup.sh

Your timer:

[Unit]
Description=Run daily application backup

[Timer]
OnCalendar=03:00
Persistent=true
Unit=example-backup.service

[Install]
WantedBy=timers.target

Then enable it:

systemctl daemon-reload
systemctl enable --now example-backup.timer

Check that systemd knows about it:

systemctl list-timers example-backup.timer

With heartbeat monitoring, you configure the expected interval externally. For example, if the backup runs every day at 03:00, you might expect one ping every 24 hours with a small grace period.
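The deadline rule itself is simple. Sketched below as illustrative logic (not QuietPulse's actual implementation): a job is overdue once the current time passes the last ping plus the interval plus the grace period.

```shell
#!/usr/bin/env bash
# Illustrative heartbeat deadline logic; all values are epoch seconds.
set -euo pipefail

# overdue LAST_PING INTERVAL GRACE NOW -> exit 0 if an alert should fire.
overdue() {
  local last_ping="$1" interval="$2" grace="$3" now="$4"
  [ "$now" -gt $(( last_ping + interval + grace )) ]
}

# Example: daily job, 30-minute grace, last ping 25 hours ago.
now=$(date +%s)
if overdue "$(( now - 25 * 3600 ))" "$(( 24 * 3600 ))" "$(( 30 * 60 ))" "$now"; then
  echo "alert: heartbeat missing"
fi
```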

If no ping arrives, you get alerted.

Instead of building that alerting logic yourself, you can use a simple heartbeat monitoring tool like QuietPulse. Create a monitor, copy the ping URL, and call it from the end of your systemd-triggered script. The important idea is still the same: alert on missing success signals, not just server uptime.

A better pattern for scripts

For more robust scripts, add an error trap so failures are easier to pinpoint in the logs, but keep the success ping at the end.

Example:

#!/usr/bin/env bash
set -euo pipefail

log() {
  echo "[$(date --iso-8601=seconds)] $*"
}

trap 'log "Backup failed (exit status $?) near line $LINENO"' ERR

log "Starting backup"

BACKUP_FILE="/var/backups/app-$(date +%F).sql"

pg_dump "$DATABASE_URL" > "$BACKUP_FILE"
gzip "$BACKUP_FILE"
aws s3 cp "$BACKUP_FILE.gz" "s3://example-backups/"

log "Backup completed successfully"

curl -fsS "https://quietpulse.xyz/ping/YOUR_TOKEN"

log "Heartbeat sent"

This gives you two layers:

  • journald logs for investigation
  • heartbeat monitoring for missed execution detection

If the script fails before the final curl, the heartbeat does not fire. That is exactly what you want.

Common mistakes

1. Monitoring only the timer unit

Checking that a timer is enabled is not enough.

systemctl is-enabled example-backup.timer

This only tells you that systemd is configured to schedule it. It does not prove successful execution.

You need to monitor completion, not configuration.

2. Sending the heartbeat too early

A common mistake is placing the ping at the top of the script:

curl -fsS "https://quietpulse.xyz/ping/YOUR_TOKEN"

pg_dump "$DATABASE_URL" > backup.sql

This creates a false positive. The monitor sees a successful ping even if the actual job fails immediately afterward.

The ping should be the last step after the important work completes.

3. Ignoring the systemd environment

Systemd services do not run with the same environment as your interactive shell.

This often breaks scripts that depend on:

  • shell profile files
  • local PATH changes
  • exported secrets
  • user-specific credentials
  • working directories

Use explicit paths, EnvironmentFile=, and clear permissions.
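For example, the EnvironmentFile= referenced in the service unit above might look like this (hypothetical values; keep the file root-owned with mode 0600, since it holds secrets):

```ini
# /etc/example-backup.env (hypothetical contents)
DATABASE_URL=postgres://backup:secret@db.internal:5432/app
# Point the AWS CLI at a dedicated credentials file instead of ~/.aws
AWS_SHARED_CREDENTIALS_FILE=/etc/example-backup/aws-credentials
```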

4. Forgetting Persistent=true

If the server is off when a run is scheduled, Persistent=true tells systemd to trigger the missed job once after the next boot.

Without it, some jobs may simply be skipped.

For daily maintenance jobs, backups, and syncs, this setting is often worth enabling.

5. Not setting timeouts

A oneshot service can hang longer than expected if a command waits forever.

Use systemd options like:

[Service]
Type=oneshot
TimeoutStartSec=30min

A hung job can be just as bad as a missed one.
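The same idea works one level down with coreutils timeout, which caps a single command rather than the whole unit. A small demonstration (the 1-second limit is only for illustration):

```shell
#!/usr/bin/env bash
# Demonstrates coreutils `timeout`: it kills the command and exits with
# status 124 when the time limit is reached.
set -uo pipefail

if timeout 1s sleep 10; then
  echo "completed"
else
  echo "timed out with status $?"
fi
```

In a real backup script you might cap the slow step, for example `timeout 25m pg_dump …`; curl's --max-time flag plays the same role for the heartbeat ping itself.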

Alternative approaches

Heartbeat monitoring is usually the simplest way to detect missed timers, but it is not the only useful signal.

Journald logs

You can inspect logs with:

journalctl -u example-backup.service --since today

This is excellent for debugging.

But logs are passive. They help after you know something is wrong.

Systemd status checks

You can check failed units:

systemctl --failed

Or inspect one service:

systemctl status example-backup.service

This helps catch hard service failures.

But it may not catch a script that exits successfully while doing incomplete work.
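One native option worth knowing here: systemd's OnFailure= directive can start another unit whenever the service enters a failed state. A sketch, assuming a hypothetical /usr/local/bin/send-alert.sh notifier:

```ini
# Added to the [Unit] section of example-backup.service
[Unit]
Description=Example daily backup
OnFailure=failure-alert@%n.service

# /etc/systemd/system/failure-alert@.service (hypothetical template unit)
[Unit]
Description=Alert for failed unit %i

[Service]
Type=oneshot
ExecStart=/usr/local/bin/send-alert.sh "%i"
```

This still only catches hard failures with non-zero exits; it cannot notice a run that never started, which is why the heartbeat remains the primary signal.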

Metrics and dashboards

If you already use Prometheus, Grafana, or another monitoring stack, you can export timer metrics and alert on them.

This is powerful, but it may be too much for a small VPS, indie app, or simple background job.

Email from scripts

Some scripts send email on failure. This can work, but it depends on mail delivery, spam filtering, and correct error handling.

Also, failure-only alerts do not catch every missed run. If the script never starts, it may never send the email.

Uptime checks

Uptime checks are still useful for web apps.

They just do not answer the systemd timer question. Your website can be up while your daily job is broken.

Use uptime checks for endpoints. Use heartbeat checks for scheduled work.

FAQ

What is systemd timer monitoring?

Systemd timer monitoring is the practice of checking whether scheduled systemd timer jobs actually run and complete successfully. It usually combines systemd status, logs, and heartbeat checks that alert when an expected job does not report success.

How do I know if a systemd timer failed?

You can start with:

systemctl list-timers --all
systemctl status your-service.service
journalctl -u your-service.service

For proactive detection, add a heartbeat ping at the end of the job and alert when the ping is missing.

Are systemd timers better than cron?

Systemd timers are often better for Linux services because they integrate with unit dependencies, journald, boot behavior, and systemctl. Cron is simpler and widely known. Both still need monitoring if the scheduled work matters.

Can uptime monitoring detect missed systemd timers?

No, not reliably. Uptime monitoring checks whether a service or endpoint responds. A missed systemd timer can happen while the server and application are still online.

Where should I put the heartbeat ping?

Put the heartbeat ping at the end of the script, after the important work has completed successfully. If you ping at the beginning, you may hide failures that happen later.

Conclusion

Systemd timers are a strong replacement for many cron jobs, but they still need monitoring.

Do not stop at “the timer is enabled.” Monitor whether the job actually completed.

Use systemd logs and status for debugging. Use heartbeat monitoring to catch missed or failed execution automatically. For backups, syncs, reports, cleanup scripts, and other scheduled production work, that small success ping can be the difference between a quiet failure and an early alert.