Cron Job Monitoring: Catch Silent Failures Before Users Do

Published June 10, 2026 · 9 min read
TL;DR
  • Cron jobs fail by not running — a dead server, a lost crontab, or a hung process produces zero errors and zero alerts
  • Use the heartbeat (dead man's switch) pattern: the job pings a URL on success, and monitoring alerts when the ping does not arrive
  • Chain the ping with &&, never ; — otherwise a failed job still sends a “success” heartbeat
  • Point heartbeats at a Hooklistener endpoint, attach an inactivity monitor, and get Telegram or Slack alerts when a job goes quiet
  • If the job itself is just an HTTP call, run it with Hooklistener Schedules: cron expressions, status-code and keyword success checks, and a 7-day execution history

Most monitoring tells you when something bad happens: an exception, a 500, a timeout. Cron failures are different. A cron job that stops running produces nothing: no error, no log line, no alert. The nightly backup just quietly stops existing, and you find out three weeks later when you actually need it. This guide covers how cron jobs fail silently, how the heartbeat pattern catches those failures, and how to set it up with Hooklistener, or with a dedicated tool when that fits better.

How cron jobs fail without telling you

Unmonitored cron has three distinct failure modes, and only one of them is the kind your error tracker can see:

Silent skips: the job never starts

The server was rebuilt and the crontab didn't survive the migration. The VM died. The Kubernetes CronJob was suspended during an incident and never resumed. The container image changed and the scheduler entry points at a script that no longer exists. In every case, nothing runs — so nothing can throw an error.

Hangs: the job starts and never finishes

A database lock, a network call without a timeout, an interactive prompt waiting for input that will never come. Cron does not kill stuck jobs. The process sits there for days, and depending on your setup, new runs either pile up behind it or skip silently.
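One way to guard against both the hang and the pile-up is to bound the runtime and take a lock before the job starts. The sketch below assumes GNU coreutils `timeout` and util-linux `flock` are installed (common on Linux, not present by default on macOS). The helper name `run_guarded`, the lock path, and the one-hour limit are illustrative, not part of the setup described here.

```shell
#!/usr/bin/env bash
# Sketch: bound a job's runtime and prevent overlapping runs.
# Assumes GNU coreutils `timeout` and util-linux `flock` are available.

# run_guarded LOCKFILE MAX_SECONDS COMMAND [ARGS...]
#   flock -n : give up immediately (exit status 1) if another run holds the lock
#   timeout  : send TERM to the command after MAX_SECONDS (exit status 124)
run_guarded() {
  local lockfile=$1 max_seconds=$2
  shift 2
  flock -n "$lockfile" timeout "$max_seconds" "$@"
}

# Hypothetical crontab line using the same two tools directly:
#   30 2 * * * flock -n /tmp/nightly-export.lock timeout 3600 /usr/local/bin/nightly-export.sh
```

A timed-out or skipped run exits non-zero, so a heartbeat chained with && after it is not sent and the missed ping becomes the alert.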

Partial failures with exit 0

The job runs, half of it fails, and it still reports success. This is the default behavior of a bash script: without set -e, the script keeps going after a failed command and exits with the status of the last command.

That last one bites constantly. Consider a backup script:

nightly-backup.sh
#!/usr/bin/env bash

# pg_dump fails (disk full, auth error, whatever) -> script keeps going.
# The s3 upload of an empty/partial file succeeds -> exit code 0.
# Cron sees "success". You have no backup.
pg_dump mydb > /backups/mydb.sql
aws s3 cp /backups/mydb.sql s3://my-backups/mydb.sql

The fix is one line at the top:

nightly-backup.sh
#!/usr/bin/env bash
set -euo pipefail

# Now any failed command (including inside pipes) aborts the
# script with a non-zero exit code that cron — and your
# heartbeat — can actually see.
pg_dump mydb > /backups/mydb.sql
aws s3 cp /backups/mydb.sql s3://my-backups/mydb.sql
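To see the difference without touching a real database, here is a minimal sketch that runs the same commands with and without strict mode. `false` stands in for a failing pg_dump and `true` for the upload that still succeeds; both stand-ins are assumptions made for the demo.

```shell
#!/usr/bin/env bash
# Sketch: same commands, with and without strict mode.
# `false` plays the failing pg_dump, `true` the upload that still succeeds.

# Without strict mode the failure is ignored; the last command decides.
lenient_status=0
bash -c 'false; true' || lenient_status=$?

# With strict mode the script aborts at the first failing command.
strict_status=0
bash -c 'set -euo pipefail; false; true' || strict_status=$?

# pipefail: a failure on the left side of a pipe is no longer masked.
pipe_status=0
bash -c 'set -euo pipefail; false | true' || pipe_status=$?

echo "lenient=$lenient_status strict=$strict_status pipe=$pipe_status"
# prints: lenient=0 strict=1 pipe=1
```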

Info: Classic cron emails stdout/stderr to MAILTO, which on most modern servers is a local mail spool nobody reads, or no working mail setup at all. And even a perfectly configured MAILTO only reports jobs that ran and failed. It says nothing about jobs that never ran.

The heartbeat pattern: a dead man's switch for cron

You cannot alert on an error that was never raised. So the heartbeat pattern inverts the logic: instead of waiting for a failure signal, the job sends a success signal — an HTTP request to a known URL — every time it completes. The monitoring side alerts when the signal does not arrive within the expected window. This is the same idea as a dead man's switch on a train: the alarm fires on the absence of input.

In a crontab, the heartbeat is one curl appended to the job:

crontab
# m  h  dom mon dow  command
30 2 *   *   *  /usr/local/bin/nightly-export.sh && curl -fsS -m 10 --retry 3 https://my-project.hook.events/heartbeats/nightly-export > /dev/null

The curl flags matter:

  • -f — treat HTTP errors (4xx/5xx) as a failed command, so a broken heartbeat URL does not look like a delivered ping
  • -sS — silent, but still print real errors to stderr where cron can log them
  • -m 10 — give up after 10 seconds so the ping itself can never hang your job
  • --retry 3 — ride out a transient network blip instead of producing a false “missed heartbeat”

And the single most common mistake in this pattern is using ; instead of &&:

semicolon-pitfall
# ❌ Wrong — ";" runs the ping UNCONDITIONALLY.
# The export can crash and the heartbeat still says "all good".
30 2 * * * /usr/local/bin/nightly-export.sh; curl -fsS https://my-project.hook.events/heartbeats/nightly-export

# ✅ Right — "&&" only runs the ping when the job exits 0.
# A failing job sends no heartbeat, and the missing ping is the alert.
30 2 * * * /usr/local/bin/nightly-export.sh && curl -fsS https://my-project.hook.events/heartbeats/nightly-export

Important: The && chain only works if your script's exit code is honest, which is exactly why set -euo pipefail from the previous section is a prerequisite, not a nice-to-have. A script that swallows failures and exits 0 will happily send heartbeats while it corrupts your data.
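If you want to convince yourself of the difference without sending real pings, the chaining behavior can be rehearsed locally. In this sketch `ping_heartbeat` is a hypothetical stand-in that counts calls instead of running curl, and `failing_job` simulates a script that exits 1.

```shell
#!/usr/bin/env bash
# Sketch: which chaining operator lets a ping through after a failed job?
# cron's shell does not run with errexit, so mirror that here.
set +e

pings=0
ping_heartbeat() { pings=$((pings + 1)); }   # stand-in for the curl call
failing_job() { return 1; }                  # stand-in for a crashing script

# ";" runs the ping regardless of the job's exit status.
failing_job; ping_heartbeat
pings_with_semicolon=$pings

# "&&" runs the ping only when the job exits 0.
pings=0
failing_job && ping_heartbeat
pings_with_and=$pings

echo "semicolon=$pings_with_semicolon and=$pings_with_and"
# prints: semicolon=1 and=0
```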

For long-running jobs, you can extend the pattern with a second ping at the start (e.g. /heartbeats/nightly-export/start). Comparing start and finish pings tells you whether a job is hanging, not just whether it is missing — and gives you a rough duration for free.
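A wrapper along these lines is one way to send both pings. It is a sketch rather than a Hooklistener-specific client: `run_with_heartbeat` and `ping_url` are hypothetical helper names, the URL is the example endpoint from the crontab above, and the /start suffix follows the path suggested in the previous paragraph.

```shell
#!/usr/bin/env bash
# Sketch: wrap a job with a start ping and a success ping.
# HEARTBEAT_URL defaults to the article's example endpoint; override as needed.

HEARTBEAT_URL="${HEARTBEAT_URL:-https://my-project.hook.events/heartbeats/nightly-export}"

# Best-effort ping: a monitoring outage should not fail the job itself.
ping_url() {
  curl -fsS -m 10 --retry 3 "$1" > /dev/null || true
}

# run_with_heartbeat COMMAND [ARGS...]
run_with_heartbeat() {
  ping_url "$HEARTBEAT_URL/start"
  "$@" || return $?   # job failed: skip the success ping, keep its exit code
  ping_url "$HEARTBEAT_URL"
}

# Usage from a job script sourced by cron (path is a placeholder):
#   run_with_heartbeat /usr/local/bin/nightly-export.sh
```

The design choice here is that a failed job sends only the start ping. A start without a matching finish then reads as either a hang or a failure, and the job's own exit code and logs tell you which.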

Running scheduled jobs without infrastructure: Hooklistener Schedules

A lot of “cron jobs” are really just HTTP calls on a timer: hit /tasks/nightly-report, trigger a cache rebuild, kick off a data sync. For those, you can skip the server, the crontab, and the heartbeat plumbing entirely. Hooklistener Schedules (available on paid plans) runs cron-scheduled HTTP requests for you, from the Schedules page in the app sidebar. Because Hooklistener executes the request itself, it knows immediately when a run fails. No silent skips: the scheduler and the monitor are the same thing.

What a schedule gives you:

  • Standard cron expressions — the usual 5-field syntax plus macros like @hourly, with a human-readable description shown in the UI (“At 02:30 every day”) so you catch off-by-one cron mistakes before they ship
  • Any HTTP method — GET, POST, PUT, PATCH, or DELETE, with custom headers and a request body, and a configurable timeout of up to 120 seconds
  • Explicit success criteria — a run succeeds only if the response status matches your expected status codes, and optionally only if the response body contains a keyword you define
  • An active/inactive toggle — pause a schedule during a migration instead of deleting it and recreating it from memory afterwards
  • 7-day execution history — every run is recorded with its status, HTTP status code, duration, and message

The keyword check is worth pausing on, because it addresses the exit-0 problem in HTTP form. Plenty of endpoints return 200 OK with a body like {"status": "failed", "processed": 0}. A concrete setup for a nightly job:

schedule-config
Name:      Nightly report generation
Cron:      30 2 * * *        ("At 02:30 every day")
Method:    POST
URL:       https://api.example.com/tasks/nightly-report
Headers:   Authorization: Bearer <service-token>
           Content-Type: application/json
Body:      {"scope": "all-tenants"}
Timeout:   120 seconds
Success:   Status 200 AND body contains "completed"

With success defined as status 200 plus the keyword completed, a run that returns 200 with "status": "failed" in the body is correctly recorded as a failure. When something breaks at 02:30 and you look at it at 09:00, the execution history answers the questions you would otherwise be grepping logs for: did it run at all, what status code came back, how long did it take, and what was the failure message, for every run over the past 7 days.
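The status-plus-keyword rule is simple enough to rehearse locally before you rely on it. The sketch below illustrates the logic only and is not Hooklistener's implementation; `run_succeeded` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch of the "status AND keyword" success rule.
# Illustrates the logic only; not how Hooklistener evaluates runs internally.

# run_succeeded STATUS EXPECTED_STATUS KEYWORD BODY
run_succeeded() {
  local status=$1 expected=$2 keyword=$3 body=$4
  [ "$status" = "$expected" ] || return 1   # wrong status code: failure
  case $body in
    *"$keyword"*) return 0 ;;               # keyword present: success
    *)            return 1 ;;               # 200 without the keyword: failure
  esac
}

# Example against a real endpoint (URL is the article's placeholder):
#   status=$(curl -sS -m 120 -o /tmp/body.json -w '%{http_code}' \
#     -X POST https://api.example.com/tasks/nightly-report)
#   run_succeeded "$status" 200 completed "$(cat /tmp/body.json)"
```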

Schedules are also manageable programmatically via the REST API, so you can create and audit them from CI or scripts:

schedules-api.sh
# List all schedules in your organization
curl https://api.hooklistener.com/api/schedules \
  -H "Authorization: Bearer hklst_your_api_key_here"

# Review the recent runs of one schedule (replace :id with the schedule's ID)
curl https://api.hooklistener.com/api/schedules/:id/executions \
  -H "Authorization: Bearer hklst_your_api_key_here"

The same operations are exposed as MCP tools, so an AI coding assistant can create a schedule or pull the execution history of a failing one while you debug — see the MCP setup guide for how to connect one.

Receiving heartbeats: endpoint, inactivity monitor, Telegram

For jobs that genuinely have to run on your own machines — database dumps, filesystem work, anything that is not an HTTP call — you keep the crontab and add the heartbeat from earlier. The receiving side needs two things: somewhere for the ping to land, and an alert when it stops landing.

  1. Create a Hooklistener endpoint to receive the pings (see setting up an endpoint). Every heartbeat is captured with its method, path, timestamp, headers, and body, so the endpoint doubles as a run log you can inspect request by request. Use one endpoint with distinct paths per job (/heartbeats/nightly-export, /heartbeats/log-rotation) and POST a small JSON body (duration, rows processed) if you want context attached to each ping.
  2. Attach an inactivity monitor to the endpoint. This is the dead man's switch half: Hooklistener's inactivity monitors alert you when expected requests stop arriving, with a configurable silence window from 1 minute up to 24 hours, email and Slack notifications, an alert cooldown to avoid spam, and a logged recovery event when pings resume. For an hourly job, a 75-minute window catches a single missed run while tolerating jitter.
  3. Optionally connect Telegram for a positive confirmation of every run. On paid plans, open the endpoint's Integrations panel, enter your Telegram chat ID, and the Hooklistener bot sends a formatted message for every incoming webhook: HTTP method, path, timestamp, and a body preview of up to 500 characters. There is a Test button right in the panel to verify the connection before you rely on it. A nightly “heartbeat arrived” message in a quiet Telegram channel is a surprisingly effective habit: its absence is noticeable at a glance.

Important: Telegram notifications fire on arrival, not absence. On their own they are a presence signal, not a dead man's switch. The inactivity monitor is what turns a missing ping into an alert. Use both: the monitor wakes you up, the Telegram history and captured requests tell you when the job last succeeded and what it reported.
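For the JSON body mentioned in step 1, a small helper keeps the payload consistent across jobs. This is a sketch under assumptions: the field names duration_seconds and rows_processed are illustrative, and `heartbeat_payload` and `send_heartbeat` are hypothetical helper names. The endpoint captures whatever body you send, so pick fields that are useful to you.

```shell
#!/usr/bin/env bash
# Sketch: attach run context to a heartbeat as a small JSON body.
# Field names are illustrative; both arguments must be integers.

# heartbeat_payload DURATION_SECONDS ROWS_PROCESSED
heartbeat_payload() {
  printf '{"duration_seconds": %d, "rows_processed": %d}' "$1" "$2"
}

# send_heartbeat URL DURATION_SECONDS ROWS_PROCESSED
# curl sends a POST automatically when -d is given.
send_heartbeat() {
  local url=$1 duration=$2 rows=$3
  curl -fsS -m 10 --retry 3 \
    -H 'Content-Type: application/json' \
    -d "$(heartbeat_payload "$duration" "$rows")" \
    "$url" > /dev/null
}

# Usage inside a job script (URL is the article's example endpoint):
#   start=$(date +%s)
#   ...do the work and set rows=...
#   send_heartbeat https://my-project.hook.events/heartbeats/nightly-export \
#     "$(( $(date +%s) - start ))" "$rows"
```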

When a dedicated cron monitoring tool fits better

Honest trade-off time. Healthchecks.io and Cronitor are purpose-built dead man's switches, and for some setups they are simply the better tool:

  • Cron-aware expected windows. You give them the cron expression itself, and they compute exactly when the next ping is due — including grace periods for jobs with variable runtime.
  • Long periods. Weekly and monthly jobs are first-class. Hooklistener's inactivity windows currently max out at 24 hours, so a weekly billing job cannot be covered by a single inactivity monitor.
  • Fleet scale. If you are monitoring dozens or hundreds of cron jobs across many hosts, their per-check dashboards, start/fail ping semantics, and integrations are built precisely for that shape of problem.

Where Hooklistener fits instead: when scheduled jobs are one part of a webhook-shaped system rather than the whole problem. If you are already using it to test webhooks, capture traffic in a webhook inbox, or tunnel to localhost, then Schedules, inactivity monitors, and heartbeat endpoints live in the same place as the rest of your event debugging — one tool, one execution history, one set of alerts. If all you need is “alert me when any of my 200 crontabs goes quiet,” use the dedicated tool without guilt.

Putting it together

A monitoring setup that catches all three silent failure modes:

  1. Move HTTP-triggerable jobs into Schedules. The scheduler that runs the job is also the monitor, so “never started” becomes impossible to miss. Define success strictly: expected status codes plus a response keyword.
  2. Add heartbeats to everything that stays in crontab. set -euo pipefail in the script, && curl -fsS -m 10 --retry 3 to a Hooklistener endpoint at the end of the job.
  3. Alert on absence. Attach an inactivity monitor to the heartbeat endpoint with a silence window slightly longer than the job interval; add Telegram on the endpoint if you also want positive per-run confirmations.
  4. Test the failure path, not just the success path. Deliberately break the job once — rename the script, revoke the token — and confirm the alert actually reaches a human. An unverified dead man's switch is just decoration.
  5. When something breaks, start with the history. The 7-day execution log (or the captured heartbeat requests) tells you the last good run, the first bad one, and what changed in between — which is most of the debugging work done before you open a shell.

The pattern generalizes beyond cron, too: any event stream that is supposed to keep flowing — provider webhooks, queue consumers, sensor data — can be monitored for absence the same way. For the webhook side of that story, see our guide on keeping real-time webhooks reliable.

Monitor Your Scheduled Jobs with Hooklistener

Run cron-scheduled HTTP jobs with explicit success checks, receive heartbeats on a debug endpoint, and get alerted the moment a job goes quiet — all in the same place you already debug webhooks. Get started at app.hooklistener.com.