How We Calculate Status

Is AI Down turns anonymous crowd reports into a small set of labels and a confidence score. No magic—mostly counting, time windows, and guardrails so one person cannot move the needle alone.

The short version

For each AI product we track, we look at how many weighted reports arrive in overlapping windows (roughly minutes, about an hour, and about a day). A calm baseline with almost no fresh friction reads as healthy. A coordinated jump in a short window reads like an incident. Everything in between gets a softer label until the pattern is obvious.

Pipeline at a glance:

  • Reports in: structured submissions per platform, spam-throttled
  • Windows: compare volume vs. baseline across 5 min / 1 h / 24 h style slices
  • Label + confidence: human-readable output, with high / medium / low confidence
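
As a rough sketch, the published output for one platform might reduce to the shape below; the field names are assumptions for illustration, not the real API.

```ts
// A minimal sketch of the human-readable output, assuming illustrative
// field names rather than the real API.
interface PlatformStatus {
  platform: string;                       // e.g. "chatgpt"
  label: string;                          // one of the five labels below
  confidence: "high" | "medium" | "low";
  updatedAt: string;                      // ISO timestamp of the last recompute
}
```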

Inputs and limits

What people submit

Each status page accepts structured reports: what went wrong, which product, and rough timing (a sketch follows the list below). That is the main input; we are not secretly probing vendor APIs from here.

  • Categories match common AI failure modes (timeouts, bad output, login, billing, and similar)
  • CAPTCHA and rate limits reduce drive-by spam
  • We read the text you send to weigh severity during aggregation, not to publish your story verbatim as news
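
For illustration, a submitted report could reduce to a shape like the sketch below, assuming invented field and category names rather than the production schema.

```ts
// A minimal sketch of a structured report, assuming invented field and
// category names rather than the production schema.
type ReportCategory = "timeout" | "bad_output" | "login" | "billing" | "other";

interface ProblemReport {
  platform: string;          // which product, e.g. "claude"
  category: ReportCategory;  // what went wrong
  submittedAt: Date;         // rough timing
  note?: string;             // free text, read only to gauge severity
}
```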

How we slice time

The backend scores reports inside multiple rolling windows (for example on the order of minutes, an hour, and a day). Spikes matter more than a lone old ticket; a minimal sketch follows the list below.

  • A quiet hour after a storm still influences the longer window
  • Sudden jumps versus a rolling baseline are the usual “something’s up” signal
  • Category weights can make an outage-style report count differently than a cosmetic glitch
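
A minimal sketch of that bucketing, assuming invented window sizes and category weights:

```ts
// A minimal sketch of multi-window bucketing with category weights.
// Window sizes and weights here are illustrative assumptions.
const WINDOWS_MS = {
  short: 5 * 60 * 1000,       // on the order of minutes
  medium: 60 * 60 * 1000,     // about an hour
  long: 24 * 60 * 60 * 1000,  // about a day
} as const;

// Outage-style categories count for more than cosmetic glitches.
const CATEGORY_WEIGHT: Record<string, number> = {
  timeout: 1.0,
  login: 1.0,
  bad_output: 0.6,
  billing: 0.4,
};

interface ProblemReport {
  category: string;
  submittedAt: Date;
}

// Sum weighted reports falling inside each rolling window.
function windowScores(reports: ProblemReport[], now = Date.now()) {
  const scores = { short: 0, medium: 0, long: 0 };
  for (const r of reports) {
    const age = now - r.submittedAt.getTime();
    const weight = CATEGORY_WEIGHT[r.category] ?? 0.5;
    if (age <= WINDOWS_MS.short) scores.short += weight;
    if (age <= WINDOWS_MS.medium) scores.medium += weight;
    if (age <= WINDOWS_MS.long) scores.long += weight;
  }
  return scores;
}
```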

What we are not doing

Is AI Down is a community mirror, not an official SRE dashboard for OpenAI, Anthropic, Google, or anyone else.

  • We do not claim to measure packet loss inside a provider’s network
  • Vendor status pages and RSS feeds may appear as “official sources” on a platform’s page for human readers; they do not automatically override the math here unless we wire that in explicitly
  • When in doubt, trust the provider’s own incident page for authoritative truth

Status labels you will see

You may still see legacy wording on some pages (for example “Operational,” “Minor issues,” or “Probably down”) while older data migrates. The five labels below are what we use for the current AI-focused pipeline.

Likely Operational

Few or no recent problem reports compared with what we expect for that platform, so nothing looks off right now.

Roughly when we pick it

  • Report counts in short and long windows stay near the usual baseline
  • No sharp spike that lines up across overlapping windows
  • The model is not seeing a coordinated wave of failures

Possible Issues

Enough fresh reports that something may be wrong, but volume or timing still looks ambiguous.

Roughly when we pick it

  • Elevated report rate versus baseline, without a full spike profile yet
  • Signals can come from one category or a mix (latency, errors, auth, etc.)
  • We may still be one noisy burst away from a clearer call

Potential Problems

Patterns suggest trouble—often an uptick that has not yet met the threshold for “likely down,” or mixed signals.

Roughly when we pick it

  • Report weight crosses internal thresholds for concern but not for worst case
  • Sometimes reflects regional or category-specific pain more than a total outage

Likely Down

A strong burst of aligned reports in a tight window—what you would expect when many people hit the same failure mode.

Roughly when we pick it

  • High report volume relative to baseline in overlapping windows
  • Similar symptoms described across many submissions
  • Not used for a single anecdotal complaint

Monitoring Reports

Reports exist, but clustering looks localized, inconsistent, or still moving—worth watching before we move the headline status.

Roughly when we pick it

  • Activity without a clean spike shape, or conflicting categories
  • We keep the headline cautious until patterns stabilize
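
Putting the five labels together, the decision might resemble the sketch below; every threshold in it is invented for illustration, and the real cutoffs live in the code.

```ts
// A minimal sketch of mapping window scores to the five labels.
// Every threshold here is invented for illustration.
interface WindowScores { short: number; medium: number; long: number }

function pickLabel(s: WindowScores, baseline: number): string {
  // Compare against a rolling expectation, never against zero.
  const ratio = s.medium / Math.max(baseline, 0.5);
  const alignedBurst = s.short >= 3;  // several weighted reports in a tight window

  if (alignedBurst && ratio >= 4) return "Likely Down";
  if (ratio >= 2) return "Potential Problems";
  if (ratio >= 1.3) return "Possible Issues";
  if (s.long > 0 && ratio > 1) return "Monitoring Reports";
  return "Likely Operational";
}
```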

Confidence

High confidence

Windows agree: the story is the same in short and longer views.

  • Spike shape is clear against baseline
  • Enough independent reports that it is unlikely to be one person refreshing

Medium confidence

Something is happening, but volume, timing, or category mix is still messy.

  • Mixed signals between windows or regions
  • Borderline counts—could flip after a few more reports or a quiet hour

Low confidence

Thin data, contradictory hints, or we are early in an incident arc.

  • Very few points on the chart
  • One loud report without corroboration yet
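
Across the three tiers, confidence could fall out of window agreement and raw report volume roughly like the sketch below; the cutoffs are illustrative assumptions.

```ts
// A minimal sketch of confidence from window agreement and volume.
// Cutoffs are illustrative assumptions.
interface WindowScores { short: number; medium: number; long: number }

function confidence(s: WindowScores, reportCount: number): "high" | "medium" | "low" {
  // Do the short and longer views tell the same story?
  const windowsAgree = (s.short >= 3) === (s.medium >= 5);

  if (windowsAgree && reportCount >= 10) return "high";  // not one person refreshing
  if (reportCount >= 3) return "medium";                 // something, but still messy
  return "low";                                          // thin or uncorroborated data
}
```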

Under the hood

Scoring

  • Reports are bucketed into short, medium, and long lookbacks before we map to a status enum.
  • We compare current volume to a rolling expectation, not to zero (see the sketch after this list).
  • One-off noise should not look like a regional outage.
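
One common way to keep such a rolling expectation is an exponentially weighted moving average; the sketch below assumes that technique, with an invented smoothing factor.

```ts
// A minimal sketch of a rolling expectation via an exponentially
// weighted moving average (EWMA). The smoothing factor is an assumption.
const ALPHA = 0.1;  // smaller = slower-moving, more stable baseline

// Fold the latest window's weighted volume into the running baseline.
function updateBaseline(baseline: number, currentVolume: number): number {
  return ALPHA * currentVolume + (1 - ALPHA) * baseline;
}

// A spike is volume far above the baseline, not merely above zero.
function isSpike(currentVolume: number, baseline: number): boolean {
  return currentVolume > 4 * Math.max(baseline, 0.5);
}
```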

Live updates

  • New accepted reports feed the calculation pipeline quickly.
  • Open pages can refresh over WebSockets so you are not stuck on a stale badge (sketched below).
  • We bias toward stability so labels do not flicker on a single stray report.
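
A minimal sketch of the page side of this, assuming an invented endpoint URL and message shape:

```ts
// A minimal sketch of the page side of live updates. The endpoint URL
// and message shape are assumptions for illustration.
const socket = new WebSocket("wss://example.com/status-stream");

socket.addEventListener("message", (event) => {
  const update = JSON.parse(event.data) as {
    platform: string;
    label: string;
    confidence: string;
  };
  // Swap the badge text for that platform without a full page reload.
  document
    .querySelector(`[data-platform="${update.platform}"]`)
    ?.replaceChildren(`${update.label} (${update.confidence} confidence)`);
});
```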

Refresh cadence and storage

Not a once-a-day batch job

  • Ingestion is continuous; recomputation happens on a short interval and when traffic spikes.
  • Very short bursts are smoothed so the headline does not oscillate every minute (see the sketch below).
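
One way to get that stability is simple hysteresis: the headline moves only after a new label has held for a few consecutive recomputations. A sketch, with an invented hold count:

```ts
// A minimal sketch of headline smoothing: the published label changes
// only after a new label holds for several recomputations in a row.
// The hold count is an illustrative assumption.
const HOLD = 3;

let published = "Likely Operational";
let candidate = published;
let streak = 0;

function smoothedLabel(newLabel: string): string {
  streak = newLabel === candidate ? streak + 1 : 1;
  candidate = newLabel;
  if (streak >= HOLD && candidate !== published) {
    published = candidate;  // only now does the headline move
  }
  return published;
}
```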

History

  • We retain enough history for charts and incident timelines; retention periods are described in the privacy policy.
  • Details live in Legal → Privacy.

Questions or ideas?

If a label feels wrong for a real incident—or you want to propose a fairer weight for a report category—we can iterate. This page should match reality; when the code changes, this text should too.

Contact