2026-05-25

Heartbeat Monitoring: Detect Failed Cron and Batch Jobs

What Is Heartbeat Monitoring?

Heartbeat monitoring works by having the monitored task send a periodic "I'm alive" signal, and treating the absence of that signal as a failure. Just like a heartbeat: as long as the pulse keeps coming, everything is fine; when it stops, something is wrong.

Most website monitoring works the other way around: the monitoring service reaches out to your site and checks the response. But tasks like cron jobs and overnight batch processes cannot be reached from the outside, so this approach does not work for them. Heartbeat monitoring solves this by having the task ping a URL only when it succeeds, which means you can detect when a job failed to run or crashed partway through.

Active vs Passive Monitoring

Monitoring generally flows in one of two directions.

	Active (Pull)	Passive (Push)
Signal direction	Monitoring service → target	Target → monitoring service
Typical targets	Websites, APIs, servers	Cron jobs, batch tasks, scheduled work
What it detects	Site down, slow response, SSL expiry	Job not run, job failed, process stopped
Also known as	External / synthetic monitoring	Heartbeat / dead man's switch

HTTP and ping checks, the kind used in synthetic monitoring, are active. Heartbeat monitoring is passive: its alert condition is a signal that never arrives. The two are not competitors; you pick the right one for each target. For how to design active HTTP response monitoring by status code, see "HTTP Response Monitoring: Detect Failures by Status Code."

Why Cron Jobs and Batch Tasks Need It

Scheduled jobs tend to fail silently and stay broken because nobody notices. Consider these common scenarios:

A nightly database backup cron stopped running after a server reboot and stayed dead for weeks
An inventory sync batch crashed on an error, but the failure email never went out either
A typo in the crontab meant the job never executed at all

These are not "something happened" failures; they are "something that should have happened did not" failures. Active monitoring cannot catch them. Heartbeat monitoring can, because it expects a signal only when the job completes successfully and alerts you when that signal goes missing.

Setting It Up in Cron and Batch Tasks

The core pattern is simple: ping the signal URL only when the task succeeds. The shell && operator enforces this, because the command after it runs only if the preceding command exits with status 0.

# Run the backup at 3:00 daily and send a heartbeat only on success.
# If the backup fails (non-zero exit), no ping is sent and the monitor
# goes Down once the interval is exceeded.
0 3 * * * /usr/local/bin/backup.sh && curl -fsS https://miterl.com/heartbeat/YOUR_TOKEN
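
When a crontab one-liner is not convenient, the same success-only rule can be applied from a small wrapper. The following is a minimal sketch, not Miterl tooling: run_then_ping and send_heartbeat are hypothetical names, and YOUR_TOKEN is the same placeholder used in the cron example.

```python
import subprocess
import sys
import urllib.request
from typing import Callable, Sequence

# Placeholder URL from the cron example; replace YOUR_TOKEN with a real token.
HEARTBEAT_URL = "https://miterl.com/heartbeat/YOUR_TOKEN"


def send_heartbeat() -> None:
    """Send a GET request to the heartbeat URL; raises on HTTP or network errors."""
    with urllib.request.urlopen(HEARTBEAT_URL, timeout=10) as response:
        response.read()


def run_then_ping(command: Sequence[str],
                  ping: Callable[[], None] = send_heartbeat) -> int:
    """Run `command` and call `ping` only if it exits with status 0."""
    result = subprocess.run(list(command))
    if result.returncode == 0:
        ping()  # mirrors `command && curl ...`
    return result.returncode


# Hypothetical usage from a cron-invoked script:
#   sys.exit(run_then_ping(["/usr/local/bin/backup.sh"]))
```

Returning the command's exit code keeps the wrapper transparent to cron, so existing failure mail or logging still sees the original status.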

You can also send the signal from inside the script itself, adding a single line at the very end after the real work completes.

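import urllib.request

HEARTBEAT_URL = "https://miterl.com/heartbeat/YOUR_TOKEN"

def run_inventory_sync():
    """Placeholder for the real sync step; it should raise on failure."""

def run_report_export():
    """Placeholder for the real export step; it should raise on failure."""

def main():
    run_inventory_sync()   # the actual work
    run_report_export()
    # Reached only if everything above ran without raising an exception.
    with urllib.request.urlopen(HEARTBEAT_URL, timeout=10) as response:
        response.read()    # consume the body so the connection closes cleanly

if __name__ == "__main__":
    main()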

If an exception is raised partway through, the last line is never reached and no ping is sent, so you also detect jobs that fail midway.
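
One design question this pattern leaves open is what happens when the work succeeds but the ping itself fails, for example during a brief network outage. The sketch below shows one possible choice, stated as an assumption rather than a recommendation from Miterl: let exceptions from the work propagate so the ping is skipped, but log a failed ping instead of crashing the job. The names ping_heartbeat and run_job, and the injectable opener parameter, are hypothetical.

```python
import logging
import urllib.request
from typing import Callable, Iterable

# Placeholder URL; replace YOUR_TOKEN with a real token.
HEARTBEAT_URL = "https://miterl.com/heartbeat/YOUR_TOKEN"


def ping_heartbeat(url: str = HEARTBEAT_URL,
                   opener: Callable = urllib.request.urlopen) -> bool:
    """Return True if the ping was delivered, False if the request failed."""
    try:
        with opener(url, timeout=10) as response:
            response.read()
        return True
    except OSError as exc:
        # URLError, HTTPError and socket timeouts all derive from OSError.
        logging.warning("heartbeat ping failed: %s", exc)
        return False


def run_job(steps: Iterable[Callable[[], None]],
            ping: Callable[[], object] = ping_heartbeat) -> None:
    """Run each step in order and ping only if none of them raised."""
    for step in steps:
        step()  # an exception here propagates, so the ping below is skipped
    ping()
```

With this choice, a ping lost to a network error still leaves the monitor without a signal, so the interval should include enough margin to avoid alerting on a single missed ping if that matters for the job.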

Setting Up a Heartbeat Monitor with Miterl

In Miterl, simply choosing "Heartbeat" as the monitor type generates a dedicated ping URL for you.

Create a new monitor and select Heartbeat as the type
Set the expected interval (for a once-a-day job, 24 hours plus a buffer)
Copy the ping URL shown on the monitor detail page
Append a call to that URL at the end of your cron or batch task, on success only

The generated URL looks like this. The token is a hard-to-guess random value, so only a process that knows the URL can send a signal.

# The heartbeat URL Miterl generates (just send a GET request).
# Receiving it sets the monitor to Up and records the last-seen time.
curl -fsS https://miterl.com/heartbeat/abc123def456...
# Response: {"ok":true}

Each time a signal arrives, the monitor is marked Up and the last-seen timestamp is recorded. If the next signal fails to arrive before the configured interval elapses, the monitor automatically flips to Down, an incident is created, and an alert is delivered to Slack or email.
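
The Up/Down rule described above can be sketched in a few lines. This is an illustration of the idea only, not Miterl's implementation; monitor_status is a hypothetical function, and the timestamps are made-up example values.

```python
from datetime import datetime, timedelta, timezone


def monitor_status(last_seen: datetime, interval: timedelta, now: datetime) -> str:
    """Return 'Up' while the newest signal is within `interval`, otherwise 'Down'."""
    return "Up" if now - last_seen <= interval else "Down"


# Daily backup at 03:00 UTC with a 25-hour interval (24 hours plus a buffer).
interval = timedelta(hours=25)
last_seen = datetime(2026, 5, 24, 3, 5, tzinfo=timezone.utc)

on_time = datetime(2026, 5, 25, 3, 30, tzinfo=timezone.utc)  # 24 h 25 min after the last signal
overdue = datetime(2026, 5, 25, 4, 6, tzinfo=timezone.utc)   # 25 h 1 min after the last signal

print(monitor_status(last_seen, interval, on_time))  # Up
print(monitor_status(last_seen, interval, overdue))  # Down
```

The check never looks at the job itself; it only compares the age of the last signal with the interval, which is why a job that never starts is detected the same way as one that crashes.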

Best Practices for the Interval

The most important design decision in heartbeat monitoring is the interval (timeout). Too short and normal delays cause false alarms; too long and you find out about failures too late.

Job type	Frequency	Recommended interval
Queue worker	Every minute	5–10 minutes
Hourly sync batch	Hourly	90 minutes – 2 hours
Daily backup	Once a day	25–26 hours
Weekly report	Once a week	~8 days

The principle is to set the interval slightly longer than the run frequency. Leave enough margin to absorb variance in run time and delays caused by server load. Miterl supports long intervals such as daily and weekly, so even low-frequency batch jobs can have an appropriate timeout.
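
The margin can be worked out as simple arithmetic. The sketch below assumes you can estimate how much the job's finish time varies and how late it may start; both helper names and all the sample numbers are hypothetical.

```python
from datetime import timedelta


def interval_with_margin(frequency: timedelta,
                         finish_time_spread: timedelta,
                         scheduling_delay: timedelta) -> timedelta:
    """Run frequency plus margin for run-time variance and delayed starts."""
    return frequency + finish_time_spread + scheduling_delay


def count_false_alarms(observed_gaps, interval: timedelta) -> int:
    """Count gaps between consecutive successful pings that exceed `interval`."""
    return sum(1 for gap in observed_gaps if gap > interval)


# Daily job whose finish time varies by up to 45 minutes and which may
# start up to 15 minutes late under load: 24 h + 45 min + 15 min = 25 h.
daily = interval_with_margin(
    timedelta(hours=24), timedelta(minutes=45), timedelta(minutes=15)
)

# Made-up gaps between successful pings on three consecutive days.
gaps = [
    timedelta(hours=24, minutes=10),
    timedelta(hours=23, minutes=50),
    timedelta(hours=24, minutes=40),
]

print(daily == timedelta(hours=25))                     # True
print(count_false_alarms(gaps, timedelta(hours=24)))    # 2
print(count_false_alarms(gaps, daily))                  # 0
```

In this example an interval of exactly 24 hours would have alerted twice on healthy runs, while 25 hours absorbs the variance, which matches the 25–26 hour recommendation for a daily backup.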

Summary

Heartbeat monitoring exists to detect not that something is running, but that something failed to run. It is essential for cron jobs and batch tasks that cannot be reached from the outside.

Heartbeat monitoring is passive (push-based) and treats a missing signal as a failure
The core pattern is to ping the URL only when a cron or batch task succeeds
Miterl issues a dedicated ping URL the moment you choose the Heartbeat type
Set the interval slightly longer than the run frequency to avoid false alarms

For monitoring publicly reachable websites, see synthetic monitoring, and for what to do when something breaks, read the incident response guide. To take automation further, such as triggering rollbacks or on-call escalations when a heartbeat goes missing, see Webhook Integration for Uptime Monitoring. To wire heartbeat monitoring into a pre-launch script that also checks HTTPS, DNS, and mail authentication, see "Pre-Launch Test Automation Guide for Web Agencies." Full setup details are in the documentation, and you can try it on a free plan.

While heartbeat monitoring watches your scheduled tasks, remember that SSL certificate expiry is a separate failure mode, one that catches your site visitors off guard when the padlock disappears. Pair heartbeat checks with automated certificate expiry alerts, as explained in "How to Prevent SSL Certificate Expiry Incidents with Automated Monitoring." For agencies managing multiple WordPress sites, SSL renewal failures have WordPress-specific causes: caching plugins blocking the ACME challenge, security plugins intercepting certbot, and certbot config not surviving server migrations. "Managing SSL Certificate Expiry Across WordPress Client Sites" covers the failure patterns and the Miterl setup to catch them before clients notice.

If you are currently on UptimeRobot and wondering whether heartbeat monitoring support is a good reason to switch, "Why Agencies Switch from UptimeRobot to Miterl: 5 Reasons" lays out the full comparison.

Heartbeat monitoring confirms that a scheduled task ran, but it cannot tell you whether an external-facing API or web page is responding slowly. For complete coverage, pair heartbeat checks with HTTP response time monitoring that fires an alert when response latency crosses a threshold or a status code outside 2xx appears. "Response Time Monitoring Guide: Detecting HTTP Response Failures Early" covers threshold configuration and alert setup in detail.