RESEARCH

Evidence, not marketing

TELOS governance is grounded in production data, standardized benchmarks, and established statistical methodology. We publish where we perform well and where we don't.

NIST NCCoE Comment

Submitted April 2026 · AI-Identity@nist.gov

Comment on the NCCoE concept paper "Accelerating the Adoption of Software and AI Agent Identity and Authorization." Introduces purpose-bound agent identity as a complement to static authorization, with implementation evidence from 29,520+ governed actions.

Key contribution: agent identity should include not only WHO the agent is and WHAT it can access, but WHY it was deployed and HOW WELL it is serving that purpose — validated per-action in real time.
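A minimal sketch of what a purpose-bound identity record could look like; every field name, function, and threshold here is hypothetical and illustrative, not the TELOS schema:

```python
from dataclasses import dataclass

@dataclass
class AgentIdentity:
    """Illustrative purpose-bound identity record (field names are hypothetical)."""
    who: str              # agent identifier
    what: list            # resources the agent is statically authorized to access
    why: str              # declared deployment purpose
    how_well: float       # running purpose-alignment score in [0, 1]

def validate_action(identity: AgentIdentity, resource: str,
                    alignment_threshold: float = 0.8) -> bool:
    """Per-action check: static authorization AND current purpose alignment."""
    return resource in identity.what and identity.how_well >= alignment_threshold

agent = AgentIdentity(who="agent-001", what=["crm:read"],
                      why="triage support tickets", how_well=0.93)
print(validate_action(agent, "crm:read"))   # True: authorized and on-purpose
print(validate_action(agent, "crm:write"))  # False: outside static authorization
```

The point of the sketch is the fourth field: even a statically authorized action fails validation once the alignment score drifts below threshold.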

Full comment (coming soon) →

Benchmark Results

Adversarial Safety Benchmarks

0% observed attack success rate across 2,550 scenarios from four standardized benchmark suites.

  • AILuminate: content safety
  • MedSafetyBench: healthcare domain
  • HarmBench: harmful content generation
  • SB 243: regulatory compliance

95% CI upper bound: 0.12%. Caveats: benchmark selection may not cover all attack surfaces. No claim against novel or adaptive adversaries. These are content-safety benchmarks, not governance-evasion tests.
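The 0.12% figure is consistent with an exact one-sided binomial (Clopper-Pearson) upper bound for zero observed successes in 2,550 trials, which can be checked with the standard library alone:

```python
import math

def zero_success_upper_bound(n: int, confidence: float = 0.95) -> float:
    """Exact one-sided upper bound on an event rate when 0 events occur in n
    trials: solves (1 - p)^n = 1 - confidence for p (Clopper-Pearson, x = 0)."""
    return 1.0 - (1.0 - confidence) ** (1.0 / n)

bound = zero_success_upper_bound(2550)
print(f"{bound:.4%}")  # 0.1174%, which rounds to the reported 0.12%
```

With zero observed successes, this closed form replaces the usual numeric inversion of the binomial CDF, since the likelihood reduces to a single term.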

SetFit Boundary Classifier

  • AUC (5-fold CV): 0.9804
  • Detection rate: 91.8%
  • False positive rate: 5.2%

171 training examples, balanced classes. LOCO AUC: 0.972. AUC uncertainty: ± 0.018.
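AUC needs no ML framework to verify: it is the probability that a randomly chosen positive outranks a randomly chosen negative (the Mann-Whitney statistic). A pairwise sketch with illustrative scores, not the classifier's actual outputs:

```python
def auc(pos_scores, neg_scores):
    """AUC as the probability that a positive outranks a negative, ties
    counting half. O(n*m) pairwise form for clarity, not for speed."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical boundary-classifier scores, not TELOS data:
positives = [0.91, 0.84, 0.77, 0.95]  # true boundary violations
negatives = [0.12, 0.35, 0.81, 0.05]  # benign actions
print(auc(positives, negatives))  # 0.9375: one pair misordered out of 16
```

On real data the rank-sum formulation (sort once, O((n+m) log(n+m))) gives the same value; the pairwise form just makes the definition explicit.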

Production Deployment

  • Governed actions: 29,520+
  • Continuous operation: 31 days
  • Calibration cycles: 44
  • p95 latency: 20ms

One agent (Claude-based), one operator, single deployment on Apple M3 Ultra. Verdict distribution: 85.9% EXECUTE, 11.0% ESCALATE, 1.8% CLARIFY, 1.2% INERT.

Known Limitations

  • Category C accuracy: 69.7% on contextually ambiguous actions — insufficient for safety-critical deployments without human oversight
  • Domain accuracy: 82.6% on Nearmap healthcare corpus — cross-domain performance varies
  • False positive rate: 24.8% in generic contexts (5.2% overall, 8% healthcare-specific)
  • Single-system evidence: Validated on one agent architecture, one model family, one deployment
  • Embedding model: MiniLM-L6-v2 trained on English web text — multilingual/multicultural performance untested
  • Goodhart vulnerability: Agent cannot modify scoring function but can learn decision boundary through observation

Validation Methodology

Governance claims are validated through the Counterfactual Configuration Replay System (CCRS). Historical actions are replayed against alternative configurations using established statistical methodology from financial risk management, pharmaceutical regulation, and aviation safety.

Statistical Tests

  • McNemar's test (verdict consistency)
  • Kupiec's POF test (coverage accuracy)
  • Christoffersen's interval test (independence)
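Kupiec's proportion-of-failures (POF) test, for example, asks whether the observed exception rate in a replay is statistically consistent with the configured coverage level. A stdlib-only sketch with illustrative numbers (not CCRS output):

```python
import math

def kupiec_pof(n: int, x: int, p: float) -> float:
    """Kupiec POF likelihood-ratio statistic for x exceptions in n
    observations against expected rate p. Asymptotically chi-square, 1 df."""
    phat = x / n

    def loglik(q):
        # Binomial log-likelihood; the x*log(q) term vanishes when x == 0.
        return (n - x) * math.log(1 - q) + (x * math.log(q) if x else 0.0)

    return -2.0 * (loglik(p) - loglik(phat))

# Illustrative: 5 exceptions in 250 replayed actions vs. a 1% expected rate
lr = kupiec_pof(250, 5, 0.01)
print(round(lr, 3))  # 1.957 < 3.841 (chi-square 95% critical value): not rejected
```

The same likelihood-ratio scaffolding underlies Christoffersen's interval test, which additionally conditions on whether the previous observation was an exception.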

Cross-Domain Provenance

  • Basel Committee (financial backtesting)
  • FDA (pharmaceutical process capability)
  • FOQA/ASAP (aviation safety monitoring)
  • NRC (nuclear safety analysis)

Standards Engagement

Questions about our methodology or results?