2026-04-03 • 8 min read

Heartbeat Monitoring for Cron Jobs Explained

You set up a backup script to run every night at 2 AM. Cron says it's scheduled. The logs look fine from last week. But nobody actually checked whether it ran last night. Three weeks pass. A database corruption hits, and the backup that should have saved you never ran. Nobody noticed.

This is the exact problem heartbeat monitoring for cron jobs solves: it tells you when a job doesn't show up on time, without you having to ask.

The Problem

Cron is fire-and-forget. You schedule a task, and that's it. If the job fails to start, hangs, or exits with an error code you don't capture, cron stays silent. There's no built-in mechanism to say "hey, I was supposed to run but something went wrong."

Most teams discover this the hard way. Reports stop generating. Backups go stale. Data syncs fall behind. And the alert comes from an angry customer, not from your own infrastructure.

Why It Happens

Cron jobs fail for reasons that have nothing to do with the script itself:

  • Resource exhaustion: the server ran out of memory and the process got killed by the OOM killer
  • Dependency failures: a database connection pool is full, an API endpoint moved
  • Silent hangs: a network request stalls far longer than the job should take, or a lock file wasn't released from a previous run
  • Permission changes: a credentials file rotated, file permissions changed
  • Silent success: the script ran but produced corrupt output (exit code 0, wrong result)

None of these necessarily produce an error in the cron log. The system believes everything is fine.

Why It's Dangerous

The danger scales with how critical the job is. A daily report that stops generating is annoying. A nightly database backup that silently stops is catastrophic.

Here's what makes it worse:

  1. It compounds: the longer a job has been failing, the harder it is to recover. Missing backups snowball. Unprocessed queues grow.
  2. You lose trust: once you discover a silent failure, you start second-guessing everything else.
  3. Detection costs time: by the time you notice, you're not fixing a 5-minute issue. You're recovering from weeks of accumulated damage.

The worst failures are the ones you don't know about.

How Heartbeat Monitoring Works

The concept is borrowed from network monitoring, where a "heartbeat" is a periodic signal that says "I'm alive." Applied to cron jobs, it works like this:

  1. Your job sends a lightweight HTTP request ("I just finished") to a monitoring endpoint when it completes.
  2. The monitoring system expects to receive this signal on a defined schedule.
  3. If the signal doesn't arrive within the expected window, the monitoring system alerts you.
┌─────────────┐     ✅ "I ran!"      ┌──────────────┐
│  Cron Job   │ ──────────────────→   │  Monitoring  │
│  (any task) │                       │  Service     │
└─────────────┘                       └──────┬───────┘
                                             │
                                  Missed? ───┤
                                             │
                                       🔔 Alert!

The key insight: heartbeat monitoring detects absence of evidence. You don't need to predict every possible failure mode. If the job doesn't check in, something went wrong, and you get told about it.
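The check on the monitoring side can be sketched in a few lines of shell. This is only an illustration of the idea, not how any particular service implements it: the function name `heartbeat_overdue` and the timestamps are made up for the example, and all values are in seconds.

```shell
# Hypothetical helper: decide whether a heartbeat is overdue.
#   $1 = Unix time of the last heartbeat received
#   $2 = current Unix time
#   $3 = expected interval between heartbeats, in seconds
#   $4 = grace period added on top of the interval, in seconds
heartbeat_overdue() {
  hb_deadline=$(( $1 + $3 + $4 ))
  [ "$2" -gt "$hb_deadline" ]
}

# Example: a daily job (86400 s) with a one-hour grace period (3600 s).
# The "current time" here is one second past the deadline.
last_beat=1000000
if heartbeat_overdue "$last_beat" $(( last_beat + 90001 )) 86400 3600; then
  echo "ALERT: heartbeat missed"
else
  echo "OK: heartbeat within window"
fi
```

The monitor never needs to know why the job failed. It only compares the time of the last check-in against a deadline.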

Simple Solution with curl

The simplest way to add heartbeat monitoring to any cron job is a single curl command:

# Run the actual job; send a heartbeat only if it succeeded
if /usr/local/bin/backup.sh; then
  curl -fsS -m 10 --retry 3 https://your-monitor-endpoint.com/beat/job-123
fi

This sends a GET request after the backup script completes successfully. The monitoring endpoint expects this request every 24 hours. If it doesn't arrive, it fires an alert.

For more detailed monitoring, send exit codes:

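#!/usr/bin/env bash
# SECONDS is a bash built-in that counts seconds since the script started,
# so this script needs bash rather than plain sh.

/usr/local/bin/backup.sh
EXIT_CODE=$?

# Report the exit code and runtime whether the job succeeded or failed
curl -fsS -m 10 --retry 3 -X POST \
  -H "Content-Type: application/json" \
  -d "{\"status\": \"$EXIT_CODE\", \"duration\": \"$SECONDS\"}" \
  https://your-monitor-endpoint.com/beat/job-123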

This pattern works with any cron job: shell scripts, Python scripts, Node.js, Go binaries. If your job can make an HTTP request, it can send a heartbeat.
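If several jobs need the same treatment, the pattern can be wrapped in a small function. The sketch below is one possible shape, not a standard tool: `run_with_heartbeat` is a hypothetical name, the URL is the same placeholder used above, and the JSON fields simply mirror the earlier example rather than any real service's API.

```shell
# Hypothetical wrapper: run a command, report its exit code, and return it.
#   $1 = heartbeat URL (placeholder)
#   remaining arguments = the job command and its arguments
run_with_heartbeat() {
  hb_url=$1
  shift
  hb_start=$(date +%s)
  hb_status=0
  "$@" || hb_status=$?
  hb_duration=$(( $(date +%s) - hb_start ))
  # Report the result; a failed ping must not mask the job's own status.
  curl -fsS -m 10 --retry 3 -X POST \
    -H "Content-Type: application/json" \
    -d "{\"status\": \"$hb_status\", \"duration\": \"$hb_duration\"}" \
    "$hb_url" >/dev/null || true
  return "$hb_status"
}

# Usage inside a script called from cron (URL is a placeholder):
# run_with_heartbeat https://your-monitor-endpoint.com/beat/job-123 /usr/local/bin/backup.sh
```

Returning the job's own exit code means anything else that inspects the result still sees the real outcome.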

Integrating QuietPulse into the Workflow

Instead of building this yourself, you can use a simple heartbeat monitoring tool like QuietPulse. You create jobs in the dashboard, copy a unique heartbeat URL into your scripts, and get Telegram alerts when jobs don't check in. No infrastructure, no configuration: paste a URL and you're done. You can try it at quietpulse.xyz.

Common Mistakes

1. Only Sending Heartbeats on Success

If your job fails and never sends a heartbeat, you'll get an alert, but you'll have no idea why it failed. Send the exit code or at least distinguish between "ran successfully" and "ran with errors."

2. Setting Timeout Windows Too Tight

If your job runs between 30 seconds and 3 minutes, don't set the monitoring window to 60 seconds. Random delays (slow DNS, temporary locks) will cause false alarms. Size the window from the longest run you have observed, then add a buffer on top.
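As a rough worked example, the window can be derived from the longest observed runtime plus a percentage buffer. The helper name `alert_window` and the 50% figure are assumptions for illustration. Pick a buffer that fits how much your job's runtime actually varies.

```shell
# Hypothetical helper: derive an alert window from the longest observed run.
#   $1 = longest observed runtime in seconds
#   $2 = buffer as a percentage of that runtime
alert_window() {
  echo $(( $1 * (100 + $2) / 100 ))
}

# A job that takes up to 3 minutes (180 s), with an assumed 50% buffer:
alert_window 180 50   # prints 270
```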

3. Not Handling the Heartbeat Request Itself

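If the heartbeat HTTP call fails (network issue on your server), that shouldn't fail your job. Give curl a timeout (-m) so it can't hang, and remember that curl -f exits non-zero on HTTP errors: in a bash script that uses set -e, append || true to the curl command so a failed ping can't abort the rest of the script.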
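A minimal sketch of a ping that cannot take the job down with it, assuming a hypothetical helper named `send_heartbeat` and a placeholder URL:

```shell
# Hypothetical helper: send a heartbeat without letting a failed
# request abort a script that runs under `set -e`.
#   $1 = heartbeat URL (placeholder)
send_heartbeat() {
  # -f: treat HTTP errors as failures, -sS: quiet but still print errors,
  # -m 10: give up after 10 seconds, --retry 3: retry transient failures.
  curl -fsS -m 10 --retry 3 "$1" >/dev/null || {
    echo "warning: heartbeat to $1 failed" >&2
    return 0
  }
}

# Usage (URL is a placeholder):
# send_heartbeat https://your-monitor-endpoint.com/beat/job-123
```

The warning goes to stderr, so it still shows up in whatever captures the job's output, but the function always returns 0.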

4. Monitoring Only the Easy Jobs

The jobs you monitor should be the ones that hurt most when they fail. Start with backups, data exports, payment reconciliation โ€” not log rotation.

5. Ignoring the Alert

This sounds obvious, but it happens constantly: teams set up heartbeat monitoring, get the first alert, dismiss it as a fluke, and miss the real pattern. Treat the first missed heartbeat as a real failure until proven otherwise.

Alternative Approaches

Heartbeat monitoring isn't the only way to detect cron job failures, but it's often the most practical. Here's how it compares to other approaches:

Log Monitoring

Parse cron logs (/var/log/cron or journalctl) and look for execution entries. Pros: no code changes. Cons: doesn't detect hangs or silent errors. The job might run and produce garbage output.
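For illustration, a log check might count how often cron recorded a start for a given command. This sketch assumes log lines containing `CMD (<command>)`, as commonly written by cron to syslog. The helper name `cron_run_count` is hypothetical, and the log path and line format vary by distribution.

```shell
# Hypothetical helper: count how often cron logged a start for a command.
#   $1 = path to a cron log file
#   $2 = command string to look for
cron_run_count() {
  # -F matches the command literally; -c prints the number of matching lines.
  # Note: grep exits non-zero when the count is 0, but still prints "0".
  grep -cF -- "CMD ($2)" "$1"
}

# Usage (log path depends on your system):
# cron_run_count /var/log/cron /usr/local/bin/backup.sh
```

A non-zero count only shows that cron launched the command. It says nothing about whether the command finished or produced usable output.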

Exit Code Tracking

Capture and store exit codes from every cron job execution. Pros: more detail. Cons: requires wrapping every job, and still doesn't detect jobs that never start.
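One way to wrap a job for exit code tracking is sketched below. The function name `track_exit_code`, the log location, and the line format are all assumptions for the example.

```shell
# Hypothetical wrapper: run a command and append its exit code to a log.
#   $1 = log file to append to
#   remaining arguments = the job command and its arguments
track_exit_code() {
  tec_log=$1
  shift
  tec_status=0
  "$@" || tec_status=$?
  # One line per run: UTC timestamp, exit code, and the command that ran.
  printf '%s exit=%s cmd=%s\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$tec_status" "$*" >> "$tec_log"
  return "$tec_status"
}

# Usage (log path is a placeholder):
# track_exit_code /var/log/job-exit-codes.log /usr/local/bin/backup.sh
```

If cron never starts the job, the wrapper never runs and no line is written, which is exactly the gap this approach leaves open.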

Output Monitoring

Check that your job produces the expected output files or database records. Pros: validates actual results. Cons: complex to set up for every job, requires knowing the expected output format.
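A very basic output check might only confirm that the expected file exists, is non-empty, and was modified recently. The sketch below assumes a hypothetical helper named `output_is_fresh` and relies on the `-mmin` option of `find`, which GNU and BSD find provide but POSIX does not require. It does not validate the file's contents.

```shell
# Hypothetical check: is the output file non-empty and recently modified?
#   $1 = path to the expected output file
#   $2 = maximum age in minutes
output_is_fresh() {
  # -s is true only if the file exists and has a size greater than zero.
  [ -s "$1" ] || return 1
  # find prints the path only if it was modified less than $2 minutes ago.
  [ -n "$(find "$1" -mmin "-$2" 2>/dev/null)" ]
}

# Usage (path is a placeholder): alert if last night's dump is missing or stale.
# output_is_fresh /var/backups/db.sql.gz 1500 || echo "backup output missing or stale"
```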

Uptime Monitoring

Traditional uptime checks (pinging a server, checking HTTP response). Pros: simple. Cons: only tells you the server is up, not that your specific jobs ran.

Heartbeat Monitoring

The job actively reports completion. Pros: detects any failure that prevents the heartbeat from being sent. Cons: requires a small code change (adding the HTTP call).

For most teams, heartbeat monitoring provides the best signal-to-noise ratio: simple to set up, reliable, and it catches exactly what matters: the jobs that didn't run.

FAQ

What is heartbeat monitoring for cron jobs?

Heartbeat monitoring is a pattern where a scheduled task sends a signal (like an HTTP request) when it completes. A monitoring system expects these signals on a defined schedule and alerts you if they stop arriving. It detects the absence of expected activity.

How is heartbeat monitoring different from log monitoring?

Log monitoring checks that cron tried to run a job. Heartbeat monitoring checks that the job actually completed successfully. A job can appear in cron logs while silently failing or hanging; heartbeat monitoring catches this.

Do I need a special tool for heartbeat monitoring?

Technically, no. You can build a basic version with a simple API endpoint. But dedicated tools like QuietPulse handle scheduling, alert routing, history, and edge cases (timezone handling, grace periods) out of the box.

How often should I expect heartbeats?

Your heartbeat interval should match your job's schedule plus some buffer. A daily job should heartbeat every 24 hours with a grace period of 1 to 2 hours. An hourly job might heartbeat every 60 minutes with a 15-minute grace period.

Can I send heartbeats from inside Docker containers or Kubernetes jobs?

Yes, as long as the container can make outbound HTTP requests. The heartbeat call is just a curl or equivalent, so it works from any environment with network access.

Conclusion

Cron is great at starting jobs and terrible at telling you when they fail. Heartbeat monitoring closes that gap by having each job check in when it's done. One extra line in your script, and you'll never find out about a missed backup from an angry user again.

The simplest approach: add a curl call at the end of your critical jobs. If you want something that handles scheduling, history, and alerts without building infrastructure, tools like QuietPulse make it painless. Either way, monitor the jobs that matter.