Heartbeat Monitoring: Detect Failed Cron and Batch Jobs
What Is Heartbeat Monitoring?
Heartbeat monitoring works by having the monitored task send a periodic "I'm alive" signal, and treating the absence of that signal as a failure. Just like a heartbeat: as long as the pulse keeps coming, everything is fine; when it stops, something is wrong.
Most website monitoring works the other way around: the monitoring service reaches out to your site and checks the response. But tasks like cron jobs and overnight batch processes cannot be reached from the outside, so this approach does not work for them. Heartbeat monitoring solves this by having the task ping a URL only when it succeeds, which means you can detect when a job failed to run or crashed partway through.
Active vs Passive Monitoring
Monitoring generally flows in one of two directions.
| | Active (Pull) | Passive (Push) |
|---|---|---|
| Signal direction | Monitoring service → target | Target → monitoring service |
| Typical targets | Websites, APIs, servers | Cron jobs, batch tasks, scheduled work |
| What it detects | Site down, slow response, SSL expiry | Job not run, job failed, process stopped |
| Also known as | External / synthetic monitoring | Heartbeat / dead man's switch |
HTTP and ping checks, the staples of synthetic monitoring, are active. Heartbeat monitoring is passive, and the key difference is that it treats a signal that never arrives as the alert condition. The two are not competitors; you pick the right one for each target.
Why Cron Jobs and Batch Tasks Need It
Scheduled jobs tend to fail silently and stay broken because nobody notices. Consider these common scenarios:
- A nightly database backup cron stopped running after a server reboot and stayed dead for weeks
- An inventory sync batch crashed on an error, but the failure email never went out either
- A typo in the crontab meant the job never executed at all
These are not "something happened" failures; they are "something that should have happened did not" failures. Active monitoring cannot catch them. Heartbeat monitoring can, because it expects a signal only when the job completes successfully and alerts you when that signal goes missing.
Setting It Up in Cron and Batch Tasks
The core pattern is simple: ping the signal URL only when the task succeeds. Using the shell && operator, the ping runs only if the preceding command exits successfully.
# Run the backup at 3:00 daily and send a heartbeat only on success.
# If the backup fails (non-zero exit), no ping is sent and the monitor
# goes Down once the interval is exceeded.
0 3 * * * /usr/local/bin/backup.sh && curl -fsS https://miterl.com/heartbeat/YOUR_TOKEN
You can also send the signal from inside the script itself, adding a single line at the very end after the real work completes.
import urllib.request

HEARTBEAT_URL = "https://miterl.com/heartbeat/YOUR_TOKEN"

def main():
    run_inventory_sync()  # the actual work
    run_report_export()
    # Reached only if everything above ran without raising an exception.
    urllib.request.urlopen(HEARTBEAT_URL, timeout=10)

if __name__ == "__main__":
    main()
If an exception is raised partway through, the last line is never reached and no ping is sent, so you also detect jobs that fail midway.
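One possible refinement, sketched here under assumptions rather than taken from Miterl's documentation: a brief network problem while pinging should not turn a successful job into a crashed one. The helper below retries the ping a few times and reports the outcome as a boolean. The name `send_heartbeat` and the `attempts`, `delay`, and `opener` parameters are hypothetical choices made for this illustration.

```python
import time
import urllib.request

HEARTBEAT_URL = "https://miterl.com/heartbeat/YOUR_TOKEN"


def send_heartbeat(url, attempts=3, delay=2.0, opener=urllib.request.urlopen):
    """Ping the heartbeat URL, retrying on network errors.

    Returns True once a 2xx response is received, False if every attempt fails.
    The opener parameter exists only so the function can be exercised offline.
    """
    for attempt in range(1, attempts + 1):
        try:
            with opener(url, timeout=10) as response:
                if 200 <= response.status < 300:
                    return True
        except OSError:
            # URLError, HTTPError, and socket timeouts all derive from OSError.
            pass
        if attempt < attempts:
            time.sleep(delay)  # short pause before the next attempt
    return False
```

Called as the last line of `main()`, for example `send_heartbeat(HEARTBEAT_URL)`, this keeps the "ping only on success" rule intact while letting the job log a failed ping instead of exiting with a traceback.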
Setting Up a Heartbeat Monitor with Miterl
In Miterl, simply choosing "Heartbeat" as the monitor type generates a dedicated ping URL for you.
- Create a new monitor and select Heartbeat as the type
- Set the expected interval (for a once-a-day job, 24 hours plus a buffer)
- Copy the ping URL shown on the monitor detail page
- Append a call to that URL at the end of your cron or batch task, on success only
The generated URL looks like this. The token is a hard-to-guess random value, so only a process that knows the URL can send a signal.
# The heartbeat URL Miterl generates (just send a GET request).
# Receiving it sets the monitor to Up and records the last-seen time.
curl -fsS https://miterl.com/heartbeat/abc123def456...
# Response: {"ok":true}
Each time a signal arrives, the monitor is marked Up and the last-seen timestamp is recorded. If the next signal fails to arrive before the configured interval elapses, the monitor automatically flips to Down, an incident is created, and an alert is delivered to Slack or email.
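The Up/Down rule described above can be pictured as a comparison between the last-seen time and the configured interval. The following is only a conceptual sketch, not Miterl's actual implementation; the function name `monitor_status` and the example timestamps are invented for illustration.

```python
from datetime import datetime, timedelta, timezone


def monitor_status(last_seen, interval, now):
    """Return "Up" while the last signal is within the interval, else "Down"."""
    return "Up" if now - last_seen <= interval else "Down"


# Hypothetical daily backup that last pinged at 03:00 UTC, with a 26-hour interval.
last_seen = datetime(2024, 1, 1, 3, 0, tzinfo=timezone.utc)
interval = timedelta(hours=26)

# 25 hours after the last ping: still inside the interval.
print(monitor_status(last_seen, interval, datetime(2024, 1, 2, 4, 0, tzinfo=timezone.utc)))   # Up
# 26.5 hours after the last ping: the signal is overdue.
print(monitor_status(last_seen, interval, datetime(2024, 1, 2, 5, 30, tzinfo=timezone.utc)))  # Down
```

Nothing in this check depends on the job reporting an error. The passage of time alone is enough to trigger the alert.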
Best Practices for the Interval
The most important design decision in heartbeat monitoring is the interval (timeout). Too short and normal delays cause false alarms; too long and you find out about failures too late.
| Job type | Frequency | Recommended interval |
|---|---|---|
| Queue worker | Every minute | 5–10 minutes |
| Hourly sync batch | Hourly | 90 minutes – 2 hours |
| Daily backup | Once a day | 25–26 hours |
| Weekly report | Once a week | ~8 days |
The principle is to set the interval somewhat longer than the time between runs. Leave enough margin to absorb variance in run time and delays caused by server load. Miterl supports long intervals such as daily and weekly, so even low-frequency batch jobs can have an appropriate timeout.
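As a rough worked example of that margin, one way to arrive at a value is to add the job's worst-case run time and some extra slack to the time between runs. Because the ping is sent when the job finishes, a slow run pushes the next signal later by up to that run time. The formula and the name `suggested_interval` are assumptions made for illustration, not a rule that Miterl prescribes.

```python
from datetime import timedelta


def suggested_interval(period, max_runtime, slack):
    """Time between runs, plus worst-case run time, plus extra slack."""
    return period + max_runtime + slack


# Hypothetical daily backup that can take up to 30 minutes, with 30 minutes of slack.
daily = suggested_interval(
    period=timedelta(hours=24),
    max_runtime=timedelta(minutes=30),
    slack=timedelta(minutes=30),
)
print(daily)  # 1 day, 1:00:00
```

With these inputs the result is 25 hours, which falls in the 25–26 hour range the table recommends for a daily backup.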
Summary
Heartbeat monitoring exists not to confirm that something is running, but to detect that something failed to run. It is essential for cron jobs and batch tasks that cannot be reached from the outside.
- Heartbeat monitoring is passive (push-based) and treats a missing signal as a failure
- The core pattern is to ping the URL only when a cron or batch task succeeds
- Miterl issues a dedicated ping URL the moment you choose the Heartbeat type
- Set the interval somewhat longer than the time between runs to avoid false alarms
For monitoring publicly reachable websites, see synthetic monitoring, and for what to do when something breaks, read the incident response guide. Full setup details are in the documentation, and you can try it on a free plan.