2026-05-25

Heartbeat Monitoring: Detect Failed Cron and Batch Jobs

Tags: heartbeat monitoring, cron monitoring, batch jobs, uptime monitoring

What Is Heartbeat Monitoring?

Heartbeat monitoring works by having the monitored task send a periodic "I'm alive" signal, and treating the absence of that signal as a failure. Just like a heartbeat: as long as the pulse keeps coming, everything is fine; when it stops, something is wrong.

Most website monitoring works the other way around: the monitoring service reaches out to your site and checks the response. But tasks like cron jobs and overnight batch processes cannot be reached from the outside, so this approach does not work for them. Heartbeat monitoring solves this by having the task ping a URL only when it succeeds, which means you can detect when a job failed to run or crashed partway through.

Active vs Passive Monitoring

Monitoring generally flows in one of two directions.

Aspect           | Active (Pull)                        | Passive (Push)
Signal direction | Monitoring service → target          | Target → monitoring service
Typical targets  | Websites, APIs, servers              | Cron jobs, batch tasks, scheduled work
What it detects  | Site down, slow response, SSL expiry | Job not run, job failed, process stopped
Also known as    | External / synthetic monitoring      | Heartbeat / dead man's switch

HTTP and ping checks, the kind used in synthetic monitoring, are active. Heartbeat monitoring is passive, and the key difference is that its alert condition is a signal that never arrives. The two are not competitors; you pick the right one for each target.

Why Cron Jobs and Batch Tasks Need It

Scheduled jobs tend to fail silently and stay broken because nobody notices. Consider these common scenarios:

  • A nightly database backup cron stopped running after a server reboot and stayed dead for weeks
  • An inventory sync batch crashed on an error, but the failure email never went out either
  • A typo in the crontab meant the job never executed at all

These are not "something happened" failures; they are "something that should have happened did not" failures. Active monitoring cannot catch them. Heartbeat monitoring can, because it expects a signal only when the job completes successfully and alerts you when that signal goes missing.

Setting It Up in Cron and Batch Tasks

The core pattern is simple: ping the signal URL only when the task succeeds. Using the shell && operator, the ping runs only if the preceding command exits successfully.

# Run the backup at 3:00 daily and send a heartbeat only on success.
# If the backup fails (non-zero exit), no ping is sent and the monitor
# goes Down once the interval is exceeded.
0 3 * * * /usr/local/bin/backup.sh && curl -fsS https://miterl.com/heartbeat/YOUR_TOKEN
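
Where shell chaining is not convenient, the same success gating could be sketched as a small Python wrapper. This is only an illustration, not a Miterl tool: run_then_ping and http_ping are hypothetical names, and the URL you pass in would be the same placeholder as in the crontab line above.

```python
import subprocess
import urllib.request
from typing import Callable, Sequence


def http_ping(url: str) -> None:
    """Send a GET request to the heartbeat URL (raises on HTTP or network errors)."""
    with urllib.request.urlopen(url, timeout=10):
        pass


def run_then_ping(command: Sequence[str], url: str,
                  ping: Callable[[str], None] = http_ping) -> int:
    """Run `command` and call `ping(url)` only if it exits with status 0.

    Mirrors `command && curl ...`: a non-zero exit sends no heartbeat.
    """
    exit_code = subprocess.run(command).returncode
    if exit_code == 0:
        ping(url)
    return exit_code
```

Calling run_then_ping(["/usr/local/bin/backup.sh"], "https://miterl.com/heartbeat/YOUR_TOKEN") would behave like the crontab line: the ping is skipped whenever the backup exits non-zero.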

You can also send the signal from inside the script itself, adding a single line at the very end after the real work completes.

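import urllib.request

HEARTBEAT_URL = "https://miterl.com/heartbeat/YOUR_TOKEN"

def run_inventory_sync():
    """Placeholder: replace with the real sync job."""

def run_report_export():
    """Placeholder: replace with the real export job."""

def main():
    run_inventory_sync()   # the actual work
    run_report_export()
    # Reached only if everything above ran without raising an exception.
    # The with-block closes the HTTP response once the ping has been sent.
    with urllib.request.urlopen(HEARTBEAT_URL, timeout=10):
        pass

if __name__ == "__main__":
    main()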

If an exception is raised partway through, the last line is never reached and no ping is sent, so you also detect jobs that fail midway.
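
One caveat with the in-script approach is that a brief network glitch during the ping itself would look like a failed job. A possible mitigation, sketched here under assumptions, is to retry the ping a few times. The names send_heartbeat and http_get, the retry count, and the delay are all illustrative choices, not Miterl recommendations.

```python
import time
import urllib.request
from typing import Callable

HEARTBEAT_URL = "https://miterl.com/heartbeat/YOUR_TOKEN"  # placeholder token


def http_get(url: str) -> None:
    """Send one GET request; raises an OSError subclass on network or HTTP errors."""
    with urllib.request.urlopen(url, timeout=10):
        pass


def send_heartbeat(url: str = HEARTBEAT_URL, attempts: int = 3,
                   delay_seconds: float = 2.0,
                   get: Callable[[str], None] = http_get) -> bool:
    """Try the ping up to `attempts` times; return True as soon as one succeeds."""
    for attempt in range(1, attempts + 1):
        try:
            get(url)
            return True
        except OSError:
            # urllib's URLError/HTTPError and socket timeouts all derive from OSError.
            if attempt < attempts:
                time.sleep(delay_seconds)
    return False
```

Returning a boolean instead of raising lets the job decide whether a lost ping should change its own exit status.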

Setting Up a Heartbeat Monitor with Miterl

In Miterl, simply choosing "Heartbeat" as the monitor type generates a dedicated ping URL for you.

  1. Create a new monitor and select Heartbeat as the type
  2. Set the expected interval (for a once-a-day job, 24 hours plus a buffer)
  3. Copy the ping URL shown on the monitor detail page
  4. Append a call to that URL at the end of your cron or batch task, on success only

The generated URL looks like this. The token is a hard-to-guess random value, so only a process that knows the URL can send a signal.

# The heartbeat URL Miterl generates (just send a GET request).
# Receiving it sets the monitor to Up and records the last-seen time.
curl -fsS https://miterl.com/heartbeat/abc123def456...
# Response: {"ok":true}
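
To illustrate what "hard-to-guess random value" can mean in practice, here is a minimal sketch of generating and checking such a token with Python's standard library. How Miterl actually creates or validates its tokens is not documented here, so new_token, ping_url, and token_matches are hypothetical helpers.

```python
import hmac
import secrets

BASE_URL = "https://miterl.com/heartbeat/"  # path prefix taken from the example URL


def new_token(num_bytes: int = 16) -> str:
    """Return a URL-safe token drawn from the OS cryptographic random source."""
    return secrets.token_urlsafe(num_bytes)


def ping_url(token: str) -> str:
    """Build the full heartbeat URL for a token."""
    return BASE_URL + token


def token_matches(presented: str, expected: str) -> bool:
    """Compare two tokens in constant time so timing does not leak partial matches."""
    return hmac.compare_digest(presented.encode(), expected.encode())
```

Because the token is the only credential, the URL should be treated like a password and kept out of public repositories and logs.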

Each time a signal arrives, the monitor is marked Up and the last-seen timestamp is recorded. If the next signal fails to arrive before the configured interval elapses, the monitor automatically flips to Down, an incident is created, and an alert is delivered to Slack or email.
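
The Up/Down behavior described above can be sketched as a small state machine. This is an illustrative model only, not Miterl's implementation: the HeartbeatMonitor class, its "Pending" initial state, and the idea of calling check() periodically are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class HeartbeatMonitor:
    interval: timedelta                  # maximum allowed gap between pings
    last_seen: Optional[datetime] = None
    status: str = "Pending"              # hypothetical state before the first ping

    def record_ping(self, now: datetime) -> None:
        """A signal arrived: mark the monitor Up and store the last-seen time."""
        self.last_seen = now
        self.status = "Up"

    def check(self, now: datetime) -> bool:
        """Flip to Down once the interval has elapsed without a ping.

        Returns True only on the Up -> Down transition, which is the moment
        an incident would be opened and an alert sent.
        """
        if self.status == "Up" and now - self.last_seen > self.interval:
            self.status = "Down"
            return True
        return False
```

Returning True only on the transition keeps a monitor that stays Down from producing a new alert on every check.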

Best Practices for the Interval

The most important design decision in heartbeat monitoring is the interval (timeout). Too short and normal delays cause false alarms; too long and you find out about failures too late.

Job type          | Frequency    | Recommended interval
Queue worker      | Every minute | 5–10 minutes
Hourly sync batch | Hourly       | 90 minutes – 2 hours
Daily backup      | Once a day   | 25–26 hours
Weekly report     | Once a week  | ~8 days

The principle is to set the interval somewhat longer than the time between runs. Leave enough margin to absorb variance in run time and delays caused by server load; very frequent jobs need proportionally more margin, which is why the every-minute worker gets 5–10 minutes. Miterl supports long intervals such as daily and weekly, so even low-frequency batch jobs can have an appropriate timeout.
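
To make the trade-off concrete, the arithmetic for the daily backup row can be worked through in a few lines. The function names and the sample timestamp are made up for illustration; the only assumption about the monitor is the one stated earlier, that it goes Down once the interval has elapsed since the last ping.

```python
from datetime import datetime, timedelta


def alert_deadline(last_ping: datetime, interval: timedelta) -> datetime:
    """Moment the monitor would go Down if no further ping arrives."""
    return last_ping + interval


def detection_delay(period: timedelta, interval: timedelta) -> timedelta:
    """Margin between when the next ping was due and when the alert fires."""
    return interval - period


# Daily backup that last pinged at 03:10, monitored with a 25-hour interval.
last_ping = datetime(2026, 5, 25, 3, 10)
interval = timedelta(hours=25)

print(alert_deadline(last_ping, interval))            # 2026-05-26 04:10:00
print(detection_delay(timedelta(days=1), interval))   # 1:00:00
```

With these numbers, a backup that silently stops is reported about one hour after its next ping was due; widening the interval to 26 hours tolerates slower runs at the cost of a two-hour detection delay.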

Summary

Heartbeat monitoring exists to detect not that something is running, but that something failed to run. It is essential for cron jobs and batch tasks that cannot be reached from the outside.

  • Heartbeat monitoring is passive (push-based) and treats a missing signal as a failure
  • The core pattern is to ping the URL only when a cron or batch task succeeds
  • Miterl issues a dedicated ping URL the moment you choose the Heartbeat type
  • Set the interval slightly longer than the time between runs to avoid false alarms

For monitoring publicly reachable websites, see synthetic monitoring, and for what to do when something breaks, read the incident response guide. Full setup details are in the documentation, and you can try it on a free plan.