How an Online Shop Detects Checkout Failures Within 30 Seconds
How a $300K/month online retailer uses Miterl to catch checkout-page failures, inventory glitches, and image CDN outages within 30 seconds — including SLA design and peak-hour intensive monitoring.
The team and assumptions
A B2C select-shop, "Shop D," doing about $300K/month in gross revenue. The frontend runs on Shopify, with custom in-house APIs handling coupons and inventory sync. The team is the shop owner plus four ops staff and one part-time engineer on retainer. About 70% of revenue lands between 8 PM and 11 PM, Friday through Sunday — a textbook e-commerce traffic curve.
For Shop D, "site down" literally means "revenue stops." A 5-minute outage costs roughly $1,000–$3,000 in missed sales. This use case walks through the multi-layer monitoring setup that has driven that loss to nearly zero.
Three failure modes that actually hurt e-commerce
1. Checkout page errors
The shopper adds an item to the cart, hits "Checkout," and gets a 500. The product page itself is fine — homepage monitoring will never catch this — but every order is lost. This is the worst failure mode.
2. Inventory display drift
Out-of-stock items showing as "in stock" (overselling) or in-stock items showing as sold out (lost sales). Both happen when the inventory-sync webhook backs up.
3. Image CDN-wide failures
Every product image goes blank at once. Caused by CDN provider outages, expired signing tokens, or rate-limit blocks. Spreads on social media within minutes — detection speed is everything.
Peak-hour intensive monitoring
Off-peak (5-minute interval)
- Homepage, category pages, top 10 product pages — HTTP monitors
- Checkout page (post-cart navigation target) — keyword monitor
- Inventory API /api/v1/inventory/health — HTTP monitor
- Representative image CDN URL — HTTP monitor
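Each HTTP monitor boils down to one question: does this URL answer with a success status, quickly? A minimal curl equivalent of a single check, with placeholder URLs (substitute the shop's real endpoints):

```shell
# Rough local equivalent of one Miterl HTTP monitor check.
# The URLs below are illustrative placeholders, not Shop D's real endpoints.
check_http() {
  # -f: fail on HTTP status >= 400, -s: silent, 10-second budget per check
  curl -sf -o /dev/null --max-time 10 "$1"
}

check_all() {
  for url in \
    "https://shop.example.com/" \
    "https://shop.example.com/api/v1/inventory/health" \
    "https://cdn.example.com/products/sample.jpg"
  do
    check_http "$url" || echo "ALERT: $url failed"
  done
}
```

Running `check_all` from cron every 5 minutes approximates the off-peak tier, minus Miterl's alert routing, retries, and multi-region probes.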
Peak hours (Fri-Sun 19:00–24:00, 1-minute interval)
We use Miterl's Release Mode to crank the interval down to 1 minute during peak revenue hours. Release Mode can be triggered via webhook, so cron handles the schedule:
# crontab: every Friday at 19:00, kick off Release Mode (5 hours = until 24:00)
# NB: a crontab entry must be a single line; it is wrapped here for readability only
0 19 * * 5 curl -X POST https://miterl.com/api/v1/webhooks/release-mode/$TOKEN \
  -H "Content-Type: application/json" \
  -d '{"duration_hours": 5, "interval_seconds": 60}'
Saturday and Sunday have their own cron entries. When duration_hours elapses, monitoring automatically returns to the off-peak interval — no manual reset.
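The resulting schedule is easy to reason about because the interval is a pure function of day-of-week and hour. Miterl applies this server-side via Release Mode; the sketch below only documents the window (feed it `date +%u` and `date +%H`):

```shell
# Interval selection as a pure function of time.
# dow: 1=Mon ... 7=Sun (date +%u), hour: 0-23 (date +%H).
# Miterl's Release Mode enforces this server-side; this sketch just
# documents the peak window for reference.
interval_seconds() {
  dow=$1; hour=$2
  if [ "$dow" -ge 5 ] && [ "$hour" -ge 19 ]; then
    echo 60    # peak: Fri-Sun 19:00-24:00, 1-minute checks
  else
    echo 300   # off-peak: 5-minute checks
  fi
}
```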
Keyword monitoring on the checkout page
The Shopify checkout page reliably renders specific strings ("Proceed to checkout", "Order summary") when healthy. Those strings go into a Miterl keyword must_contain monitor; if any of them disappears, an alert fires.
We also configure must_not_contain rules for "Internal Server Error", "Page not found" and similar — a double-check that catches malformed responses that still return HTTP 200.
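The two rule types compose into a single pass/fail check on the response body. A grep-based sketch of what the monitor evaluates, using the strings listed above (Miterl runs the real check server-side):

```shell
# Keyword-monitor logic as a local sketch: every must_contain string must be
# present and every must_not_contain string absent for the page to pass.
checkout_page_ok() {
  body=$1
  for want in "Proceed to checkout" "Order summary"; do
    if ! printf '%s' "$body" | grep -qF "$want"; then
      return 1   # must_contain rule violated
    fi
  done
  for bad in "Internal Server Error" "Page not found"; do
    if printf '%s' "$body" | grep -qF "$bad"; then
      return 1   # must_not_contain rule violated
    fi
  done
  return 0
}
```

Note that the must_not_contain pass is what catches the "HTTP 200 but broken" case: the status code alone never fails, only the body does.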
Why we picked 99.9% as the SLA target
Shop D set a 99.9% monthly SLA in Miterl. That's roughly 43 minutes of allowable downtime per month.
| SLA tier | Allowed monthly downtime | Right for |
|---|---|---|
| 99% | ~7h 18m | Early-stage launches |
| 99.5% | ~3h 39m | Stable operations |
| 99.9% | ~43m | Revenue-critical e-commerce |
| 99.99% | ~4m | Enterprise / financial |
99.9% is the sweet spot for Shop D's revenue scale. Miterl's "remaining-budget" alert (fires when less than 25% of allowable downtime remains) gives the team a real-time view of "are we close to a breach this month?" — which has changed how they schedule risky deploys.
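The table's numbers all fall out of one formula: allowed downtime = (100% − SLA) × hours in an average month (730.5, i.e. 365.25 days / 12). A one-liner reproduces them:

```shell
# Allowed monthly downtime in minutes for a given SLA percentage,
# using a 730.5-hour average month (365.25 days / 12).
sla_budget_minutes() {
  awk -v sla="$1" 'BEGIN { printf "%.1f", (100 - sla) / 100 * 730.5 * 60 }'
}
```

For example, `sla_budget_minutes 99.9` prints 43.8, the "~43m" row; the 25%-remaining alert therefore fires once roughly 11 minutes of budget are left in the month.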
Webhook-based maintenance during emergency hotfixes
The standing rule is "do not deploy during peak hours." When a hotfix has to ship anyway, the deploy script wraps the work in a maintenance webhook:
# Before hotfix
curl -X POST https://miterl.com/api/v1/webhooks/maintenance/$TOKEN/start \
-d '{"duration_hours": 1, "name": "Hotfix: Cart bug"}'
# Apply patch + smoke tests
./scripts/hotfix_deploy.sh
./scripts/cart_smoke_test.sh
# Resume
curl -X POST https://miterl.com/api/v1/webhooks/maintenance/$TOKEN/end
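One failure mode of this wrapper is worth guarding against: if the deploy script aborts, the end call never runs and monitoring stays muted for the full hour. A trap-based variant (same $TOKEN and script paths as above) closes the window on any exit path:

```shell
# Hardened hotfix wrapper: the EXIT trap fires whether the deploy succeeds,
# fails, or is interrupted, so the maintenance window always ends.
start_maintenance() {
  curl -sf -X POST "https://miterl.com/api/v1/webhooks/maintenance/$TOKEN/start" \
    -d '{"duration_hours": 1, "name": "Hotfix: Cart bug"}'
}

end_maintenance() {
  curl -sf -X POST "https://miterl.com/api/v1/webhooks/maintenance/$TOKEN/end"
}

hotfix() {
  start_maintenance
  trap end_maintenance EXIT    # runs on normal exit, error, or Ctrl-C
  ./scripts/hotfix_deploy.sh
  ./scripts/cart_smoke_test.sh
}
```

Use `hotfix` as the body of the deploy script (with `set -e` at the top); a failed smoke test then still re-enables alerting when the script exits.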
Monthly reports as marketing input
Miterl's monthly PDF report goes straight into the marketing team's KPI deck:
- Total uptime last month
- Average response time during peak hours
- Incident counts broken out by cause (5xx vs CDN vs DB)
Quantifying "last month's CDN incident cost roughly $4,200 in missed orders" makes contract-tier upgrades a data-driven decision rather than a gut call.
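The dollar figure itself is simple arithmetic over the incident window. A sketch, with hypothetical inputs chosen only to illustrate how a number like $4,200 is reached (the real inputs come from the shop's order data):

```shell
# Missed-order estimate: outage minutes x orders per minute x average order
# value. All three inputs below are hypothetical, not Shop D's real figures.
missed_revenue() {
  awk -v mins="$1" -v opm="$2" -v aov="$3" 'BEGIN { printf "%.0f", mins * opm * aov }'
}
```

For example, a 35-minute CDN outage at 1.5 orders/minute with an $80 average order works out to `missed_revenue 35 1.5 80`, i.e. $4,200.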
Related reading
- Miterl Documentation — Release Mode and SLA setup
- Webhook-based maintenance windows — CI/CD integration examples
- Pricing — Plus / Pro SLA reporting
- All use cases — playbooks for other industries