How an Online Shop Detects Checkout Failures Within 30 Seconds
How a $300K/month online retailer uses Miterl to catch checkout-page failures, inventory glitches, and image CDN outages within 30 seconds — including SLA design and peak-hour intensive monitoring.
The team and assumptions
A B2C select-shop, "Shop D," doing about $300K/month in gross revenue. The frontend runs on Shopify, with custom in-house APIs handling coupons and inventory sync. The team is the shop owner plus four ops staff and one part-time engineer on retainer. About 70% of revenue lands between 8 PM and 11 PM, Friday through Sunday — a textbook e-commerce traffic curve.
For Shop D, "site down" literally means "revenue stops." A 5-minute outage costs roughly $1,000–$3,000 in missed sales. This use case walks through the multi-layer monitoring setup that has driven that loss to nearly zero.
Three failure modes that actually hurt e-commerce
1. Checkout page errors
The shopper adds an item to the cart, hits "Checkout," and gets a 500. The product page itself is fine — homepage monitoring will never catch this — but every order is lost. This is the worst failure mode.
2. Inventory display drift
Out-of-stock items showing as "in stock" (overselling) or in-stock items showing as sold out (lost sales). Both happen when the inventory-sync webhook backs up.
3. Image CDN-wide failures
Every product image goes blank at once. Caused by CDN provider outages, expired signing tokens, or rate-limit blocks. Spreads on social media within minutes — detection speed is everything.
Peak-hour intensive monitoring
Off-peak (5-minute interval)
- Homepage, category pages, top 10 product pages — HTTP monitors
- Checkout page (post-cart navigation target) — keyword monitor
- Inventory API /api/v1/inventory/health — HTTP monitor
- Representative image CDN URL — HTTP monitor
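Each HTTP monitor boils down to one question: does this URL answer with a success status, quickly? A minimal curl equivalent of a single check, with placeholder URLs (substitute the shop's real endpoints):

```shell
# Rough local equivalent of one Miterl HTTP monitor check.
# The URLs below are illustrative placeholders, not Shop D's real endpoints.
check_http() {
  # -f: fail on HTTP status >= 400, -s: silent, 10-second budget per check
  curl -sf -o /dev/null --max-time 10 "$1"
}

check_all() {
  for url in \
    "https://shop.example.com/" \
    "https://shop.example.com/api/v1/inventory/health" \
    "https://cdn.example.com/products/sample.jpg"
  do
    check_http "$url" || echo "ALERT: $url failed"
  done
}
```

Running `check_all` from cron every 5 minutes approximates the off-peak tier, minus Miterl's alert routing, retries, and multi-region probes.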
Peak hours (Fri-Sun 19:00–24:00, 1-minute interval)
We use Miterl's Release Mode to crank the interval down to 1 minute during peak revenue hours. Release Mode can be triggered via webhook, so cron handles the schedule:
# crontab: every Friday at 19:00, kick off Release Mode (5 hours = until 24:00)
# NB: a crontab entry must be a single line; it is wrapped here for readability only
0 19 * * 5 curl -X POST https://miterl.com/api/v1/webhooks/release-mode/$TOKEN \
  -H "Content-Type: application/json" \
  -d '{"duration_hours": 5, "interval_seconds": 60}'
Saturday and Sunday have their own cron entries. When duration_hours elapses, monitoring automatically returns to the off-peak interval — no manual reset.
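The resulting schedule is easy to reason about because the interval is a pure function of day-of-week and hour. Miterl applies this server-side via Release Mode; the sketch below only documents the window (feed it `date +%u` and `date +%H`):

```shell
# Interval selection as a pure function of time.
# dow: 1=Mon ... 7=Sun (date +%u), hour: 0-23 (date +%H).
# Miterl's Release Mode enforces this server-side; this sketch just
# documents the peak window for reference.
interval_seconds() {
  dow=$1; hour=$2
  if [ "$dow" -ge 5 ] && [ "$hour" -ge 19 ]; then
    echo 60    # peak: Fri-Sun 19:00-24:00, 1-minute checks
  else
    echo 300   # off-peak: 5-minute checks
  fi
}
```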
Keyword monitoring on the checkout page
The Shopify checkout page reliably renders specific strings ("Proceed to checkout", "Order summary") when healthy. Those strings go into a Miterl keyword must_contain monitor; if any of them disappears, an alert fires.
We also configure must_not_contain rules for "Internal Server Error", "Page not found" and similar — a double-check that catches malformed responses that still return HTTP 200.
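The two rule types compose into a single pass/fail check on the response body. A grep-based sketch of what the monitor evaluates, using the strings listed above (Miterl runs the real check server-side):

```shell
# Keyword-monitor logic as a local sketch: every must_contain string must be
# present and every must_not_contain string absent for the page to pass.
checkout_page_ok() {
  body=$1
  for want in "Proceed to checkout" "Order summary"; do
    if ! printf '%s' "$body" | grep -qF "$want"; then
      return 1   # must_contain rule violated
    fi
  done
  for bad in "Internal Server Error" "Page not found"; do
    if printf '%s' "$body" | grep -qF "$bad"; then
      return 1   # must_not_contain rule violated
    fi
  done
  return 0
}
```

Note that the must_not_contain pass is what catches the "HTTP 200 but broken" case: the status code alone never fails, only the body does.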
Why we picked 99.9% as the SLA target
Shop D set a 99.9% monthly SLA in Miterl. That's roughly 43 minutes of allowable downtime per month.
| SLA tier | Allowed monthly downtime | Right for |
|---|---|---|
| 99% | ~7h 18m | Early-stage launches |
| 99.5% | ~3h 39m | Stable operations |
| 99.9% | ~43m | Revenue-critical e-commerce |
| 99.99% | ~4m | Enterprise / financial |
99.9% is the sweet spot for Shop D's revenue scale. Miterl's "remaining-budget" alert (fires when less than 25% of allowable downtime remains) gives the team a real-time view of "are we close to a breach this month?" — which has changed how they schedule risky deploys.
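The table's numbers all fall out of one formula: allowed downtime = (100% − SLA) × hours in an average month (730.5, i.e. 365.25 days / 12). A one-liner reproduces them:

```shell
# Allowed monthly downtime in minutes for a given SLA percentage,
# using a 730.5-hour average month (365.25 days / 12).
sla_budget_minutes() {
  awk -v sla="$1" 'BEGIN { printf "%.1f", (100 - sla) / 100 * 730.5 * 60 }'
}
```

For example, `sla_budget_minutes 99.9` prints 43.8, the "~43m" row; the 25%-remaining alert therefore fires once roughly 11 minutes of budget are left in the month.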
Webhook-based maintenance during emergency hotfixes
The standing rule is "do not deploy during peak hours." When a hotfix has to ship anyway, the deploy script wraps the work in a maintenance webhook:
# Before hotfix
curl -X POST https://miterl.com/api/v1/webhooks/maintenance/$TOKEN/start \
-d '{"duration_hours": 1, "name": "Hotfix: Cart bug"}'
# Apply patch + smoke tests
./scripts/hotfix_deploy.sh
./scripts/cart_smoke_test.sh
# Resume
curl -X POST https://miterl.com/api/v1/webhooks/maintenance/$TOKEN/end
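One failure mode of this wrapper is worth guarding against: if the deploy script aborts, the end call never runs and monitoring stays muted for the full hour. A trap-based variant (same $TOKEN and script paths as above) closes the window on any exit path:

```shell
# Hardened hotfix wrapper: the EXIT trap fires whether the deploy succeeds,
# fails, or is interrupted, so the maintenance window always ends.
start_maintenance() {
  curl -sf -X POST "https://miterl.com/api/v1/webhooks/maintenance/$TOKEN/start" \
    -d '{"duration_hours": 1, "name": "Hotfix: Cart bug"}'
}

end_maintenance() {
  curl -sf -X POST "https://miterl.com/api/v1/webhooks/maintenance/$TOKEN/end"
}

hotfix() {
  start_maintenance
  trap end_maintenance EXIT    # runs on normal exit, error, or Ctrl-C
  ./scripts/hotfix_deploy.sh
  ./scripts/cart_smoke_test.sh
}
```

Use `hotfix` as the body of the deploy script (with `set -e` at the top); a failed smoke test then still re-enables alerting when the script exits.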
Monthly reports as marketing input
Miterl's monthly PDF report goes straight into the marketing team's KPI deck:
- Total uptime last month
- Average response time during peak hours
- Incident counts broken out by cause (5xx vs CDN vs DB)
Quantifying "last month's CDN incident cost roughly $4,200 in missed orders" makes contract-tier upgrades a data-driven decision rather than a gut call.
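The dollar figure itself is simple arithmetic over the incident window. A sketch, with hypothetical inputs chosen only to illustrate how a number like $4,200 is reached (the real inputs come from the shop's order data):

```shell
# Missed-order estimate: outage minutes x orders per minute x average order
# value. All three inputs below are hypothetical, not Shop D's real figures.
missed_revenue() {
  awk -v mins="$1" -v opm="$2" -v aov="$3" 'BEGIN { printf "%.0f", mins * opm * aov }'
}
```

For example, a 35-minute CDN outage at 1.5 orders/minute with an $80 average order works out to `missed_revenue 35 1.5 80`, i.e. $4,200.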
Related reading
- Miterl Documentation — Release Mode and SLA setup
- Webhook-based maintenance windows — CI/CD integration examples
- Pricing — Plus / Pro SLA reporting
- All use cases — playbooks for other industries