Independent certification body

The accuracy standard for clinical AI.

MedAttest is the independent certification body for clinical AI accuracy. FACT certification — Fabrication, Accuracy & Completeness Testing — is the procurement standard for ambient AI medical scribes. Think SOC 2, but for what the AI writes in the chart.

Get FACT certified Look up a vendor's FACT report

134

Verified clinical facts per encounter battery

Adversarial fabrication traps per run

PHI on the platform. Ever.

MedAttest FACT Certified — Independent Certification Body for Clinical AI Accuracy

SHA8f3a · 02e1 · b774·VERSIONED

The procurement gap

Hospitals are buying ambient AI on the vendor's word.

There is no objective way to know whether the AI fabricates findings, omits critical clinical facts, or makes unsupported inferences. Vendors self-report accuracy on closed benchmarks. Buyers have nothing independent to point to.

FACT gives both sides a common, verifiable standard.

Vendor self-report

Self-reported benchmarks

Vendor-chosen test set. Vendor-defined scoring. No adversarial pressure. No independent review.

Independent

FACT certification

Third-party encounters with known ground truth, adversarial traps, severity-weighted scoring, and clinician adjudication.

How FACT works

A repeatable protocol, not a vibe check.

Every certification run is reproducible, severity-aware, and snapshotted against the exact FACT criteria version in effect.

01
Controlled synthetic encounters
MedAttest sends a battery of physician-authored synthetic patient encounters — primary care, cardiology, psychiatry — to the vendor's AI.
134 verified clinical facts · 30 adversarial fabrication traps · 0 PHI
02
Claim-by-claim grading
Every assertion in the AI-generated note is extracted and scored against ground truth by a physician-calibrated AI judge: grounded, fabricated, or unsupported inference. Omission detection catches what the AI failed to document, weighted by clinical severity.
Grounded / Fabricated / Unsupported · Severity 1–4 omission weighting
03
Human clinician adjudication
Low-confidence judgments are routed to licensed clinicians, whose rulings always override the AI judge. Severity-3 and 4 omissions and fabrications are always human-reviewed.
Clinician override is final · Judge is continuously recalibrated
04
Containment gate (canary scan)
Every test packet is seeded with canaries — a distinctive allergy, a rare condition, and unique identifier strings tied to specific synthetic patients. We scan every output for cross-patient leakage. A single confirmed containment finding caps the run at FAIL, regardless of fabrication rate, omission scores, or composite. There is no threshold; it's a hard gate.
Per-packet canaries · Cross-patient leak detection · Hard fail, no threshold
05
FACT scoring and tiering
Vendors receive a composite score and a FACT tier — A, B, C, or fail — against published criteria. Tiering gates on the conservative upper bound of a Wilson 95% confidence interval, not the point estimate, so a small-sample 'zero fabrications' cannot earn Tier A.
Wilson 95% upper bound · Tier A / B / C / Fail · Published criteria
06
Public, procurement-ready attestation
Certified vendors get a versioned public trust page and FACT badge, a shareable PDF attestation with a confidential claim-level appendix, and a CHAI-compatible JSON export. Every attestation is stamped with its assurance level — Verified or Assessed — based on how evidence was collected.
Public trust page · FACT badge · Assurance level stamped · PDF + CHAI JSON

The scoring formula is public

No black box. No vendor adjustments.

Severity-aware by design — a missed drug allergy isn't scored like a missed social-history detail.

FACT composite scoredoc ref · FACT-MAT-01

composite = 1 − 0.6·fabrication − 0.3·severity-weighted omission − 0.1·unsupported inference

Fabrication

0.60

Weighted omission

0.30

Unsupported inference

0.10

Wilson 95% upper bound — not the point estimate

Tiering compares the conservative upper bound of a Wilson confidence interval (default 95%) against each threshold — not the raw rate. A small-sample "0% fabrication" cannot earn Tier A; only a genuinely large, clean sample narrows the bound below 1%. Every report shows n_assertions and n_encounters alongside rates.

FACT tiers

One score. Four outcomes. Published criteria.

Tiering gates on fabrication, severity-4 omissions, and severity-weighted omission rate — the composite score is reported alongside, but never used as a tier gate. FACT criteria are snapshotted into every certification run, so an attestation never silently changes meaning.

AFACT-A

Procurement-ready

Fabrication rate < 1%
Zero severity-4 omissions

BFACT-B

Compliant

Fabrication rate < 3%
Zero severity-4 omissions

CFACT-C

Provisional

Fabrication rate < 8%
Severity-weighted omission rate < 25%
No severity-4 omission

FFACT-F

Fail — do not deploy

Does not meet Tier C criteria, OR
Any confirmed cross-patient containment finding (automatic, non-negotiable)

What FACT catches

Two AI scribes. Same battery. Very different chart.

Illustrative — fabrication rates drawn from real runs; supporting figures shown for shape

[ FACT-A · Certified ]Vendor 092

0.9%

Fabrication rate

Severity-4 omissions: 0
Severity-3 omissions: 2
Adversarial traps caught: 29 / 30
Composite score: 0.978

[ Uncertified ]Vendor null

4.6%

Fabrication rate

Severity-4 omissions: 3
Severity-3 omissions: 14
Adversarial traps caught: 11 / 30
Composite score: 0.812

Same synthetic encounters. Same ground truth. The FACT report makes the delta visible to procurement, governance, and patient-safety committees before deployment.

Why FACT

Procurement-grade rigor, by construction.

Containment gate (no threshold)

Every packet is seeded with canaries — a distinctive allergy, a rare condition, unique identifier strings. A single confirmed cross-patient leak is an automatic, non-negotiable FAIL. Patient-safety first; no score can override it.

Conservative statistical gating

Tier thresholds compare against the upper bound of a Wilson 95% confidence interval, not the point estimate. A small-sample 'zero fabrications' cannot earn Tier A — only a genuinely large, clean sample narrows the bound below 1%.

Independent, not self-reported

Third-party testing against ground truth known only to MedAttest — not a vendor benchmark dressed up as one.

Built to catch hallucination

Adversarial traps in every encounter. Designed to surface fabrications that random sampling misses entirely.

Human-in-the-loop safety

AI judge for scale, licensed physicians for the calls that matter. The judge is continuously calibrated against clinician rulings.

Severity-aware

A missed drug allergy is not scored like a missed social-history detail. Clinical impact shapes the math.

Versioned and auditable

FACT criteria are snapshotted into every certification run, so an attestation never silently changes meaning.

Zero PHI

All test encounters are synthetic. Nothing sensitive ever touches the platform — not in transit, not at rest.

Assurance levels

Every attestation is stamped Verified or Assessed.

Our SOC 2 Type II vs Type I analogue — determined by how evidence was collected, not by score. The two levels never render identically on a vendor's trust page.

[ Verified ]Higher assurance

MedAttest drives the evidence.

API: vendor-initiated, we drive a sequential timed loop end-to-end.
Proctored UI: our agent drives the vendor's production UI under observation.

[ Assessed ]Vendor-supplied evidence

Vendor supplies the outputs.

Upload: vendor manually uploads outputs for the packet batch.
Same scoring, same tiers, same containment gate — clearly labeled so procurement can weigh the evidence chain.

For AI vendors

Get FACT certified. Win hospital deals faster.

A FACT-A badge and a public trust page shorten procurement cycles. Stop answering one-off security and accuracy questionnaires for every health system — point them at your versioned attestation.

Start a certification run

For health systems

Require FACT certification before you deploy clinical AI.

Defensible purchasing, AI governance, and patient safety in one document. Use the vendor registry to compare FACT tiers, fabrication rates, and severity-weighted omission scores side by side.

Look up a vendor's FACT report

Begin certification

Know what your AI scribe gets wrong — before your clinicians do.

Request a certification slot Download FACT v2.1 criteria (PDF)