What We Find

The core artifact is the Agent Architecture Finding Record.

A finding is not a generic warning. It is a specific authority failure with evidence, severity, reproducibility criteria, and an architectural fix.

Sample finding format

Every finding must answer five questions.

Field	Question answered
Authority path	Which agent, tool, permission, data class, and action created the risk?
Observed behavior	Did the agent stop, ask, retry, delegate, call another tool, or bypass the intended control?
Evidence chain	Which prompt, instruction, tool call, approval, log, or output proves the behavior?
Severity and attribution	How severe is the failure, and is it a policy gap, tool permission gap, logging gap, escalation gap, or rollback gap?
Remediation architecture	Which authority gate, escalation route, permission change, or evidence requirement fixes it?
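
As an illustration only, the five fields above could be captured in a small record type. This is a minimal sketch, not the assessment's actual schema: the class name FindingRecord, the GapType enum, the free-text severity scale, and the unanswered() helper are all assumptions introduced here. Severity and attribution are stored separately so that neither can be left implicit.

```python
from __future__ import annotations

from dataclasses import dataclass, fields
from enum import Enum


class GapType(Enum):
    """Attribution categories taken from the 'Severity and attribution' row."""
    POLICY = "policy gap"
    TOOL_PERMISSION = "tool permission gap"
    LOGGING = "logging gap"
    ESCALATION = "escalation gap"
    ROLLBACK = "rollback gap"


@dataclass(frozen=True)
class FindingRecord:
    """Hypothetical container for one finding; field names are illustrative."""
    authority_path: str               # agent, tool, permission, data class, action
    observed_behavior: str            # stop, ask, retry, delegate, reroute, bypass
    evidence_chain: tuple[str, ...]   # references to prompts, tool calls, logs, outputs
    severity: str                     # assumed free-text scale, e.g. "low" / "high"
    attribution: GapType
    remediation_architecture: str

    def unanswered(self) -> list[str]:
        """Return the names of fields that were left empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Invented example values: a finding with no evidence attached yet.
finding = FindingRecord(
    authority_path="billing-agent / refunds API / write / payment data / issue refund",
    observed_behavior="Blocked on refund, then delegated to an agent with broader scope",
    evidence_chain=(),
    severity="high",
    attribution=GapType.ESCALATION,
    remediation_architecture="Approval gate on delegated refund actions",
)
print(finding.unanswered())  # ['evidence_chain']
```

A record with any unanswered field is a warning, not yet a finding in the sense described above.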

Authority map fields

The map makes agent power visible.

Agent and owner

Name, purpose, business owner, technical owner, vendor or internal implementation.

Tool and data access

Systems the agent can read from, write to, execute through, or trigger indirectly.

Action class

Read, write, execute, delete, publish, spend, approve, deploy, message, or delegate.

Boundary rule

Allowed, approval-gated, reversible, blocked, logged, retained, or out of scope.
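
One way to picture the map is as a list of entries, each tying an agent and system to one action class and one boundary rule. The sketch below is an assumption about shape, not a prescribed format: AuthorityMapEntry, its field names, and the unmapped_actions() helper are hypothetical. The helper lists action classes that have no boundary rule at all for a given agent and system, which is where agent power is least visible.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class ActionClass(Enum):
    READ = "read"
    WRITE = "write"
    EXECUTE = "execute"
    DELETE = "delete"
    PUBLISH = "publish"
    SPEND = "spend"
    APPROVE = "approve"
    DEPLOY = "deploy"
    MESSAGE = "message"
    DELEGATE = "delegate"


class BoundaryRule(Enum):
    ALLOWED = "allowed"
    APPROVAL_GATED = "approval-gated"
    REVERSIBLE = "reversible"
    BLOCKED = "blocked"
    LOGGED = "logged"
    RETAINED = "retained"
    OUT_OF_SCOPE = "out of scope"


@dataclass(frozen=True)
class AuthorityMapEntry:
    """Hypothetical map row; one action class on one system for one agent."""
    agent: str
    purpose: str
    business_owner: str
    technical_owner: str
    implementation: str   # "vendor" or "internal"
    system: str           # system the agent reads, writes, executes through, or triggers
    action: ActionClass
    boundary: BoundaryRule


def unmapped_actions(entries: list[AuthorityMapEntry], agent: str, system: str) -> set[ActionClass]:
    """Action classes with no boundary rule recorded for this agent and system."""
    mapped = {e.action for e in entries if e.agent == agent and e.system == system}
    return set(ActionClass) - mapped


# Invented example: only two of the ten action classes have been mapped.
common = dict(agent="support-agent", purpose="Answer customer tickets",
              business_owner="Support lead", technical_owner="Platform team",
              implementation="internal", system="ticketing")
entries = [
    AuthorityMapEntry(**common, action=ActionClass.READ, boundary=BoundaryRule.ALLOWED),
    AuthorityMapEntry(**common, action=ActionClass.MESSAGE, boundary=BoundaryRule.APPROVAL_GATED),
]
gaps = unmapped_actions(entries, "support-agent", "ticketing")
print(sorted(a.value for a in gaps))
```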

Evidence questions

If the answer is "we think so," the evidence is not ready.

Can you reconstruct why the agent chose an action?
Can you prove which tool permission allowed the action?
Can you show whether a human approved the action or should have approved it?
Can you tell whether the agent stopped, retried, delegated, or found another path?
Can you reverse or contain the action if the decision was wrong?
Can leadership understand the residual risk without reading raw logs?
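
The "we think so" rule can be made mechanical: a question counts as answered only when someone can point to an artifact. The following is a minimal sketch under that assumption. The short keys, the artifact reference strings, and the evidence_gaps() function are hypothetical names introduced for illustration.

```python
from __future__ import annotations

# The six evidence questions, keyed by short hypothetical identifiers.
EVIDENCE_QUESTIONS = {
    "decision_rationale": "Can you reconstruct why the agent chose an action?",
    "permission_proof": "Can you prove which tool permission allowed the action?",
    "human_approval": "Can you show whether a human approved the action or should have approved it?",
    "blocked_behavior": "Can you tell whether the agent stopped, retried, delegated, or found another path?",
    "reversal": "Can you reverse or contain the action if the decision was wrong?",
    "leadership_view": "Can leadership understand the residual risk without reading raw logs?",
}


def evidence_gaps(artifacts: dict[str, str | None]) -> list[str]:
    """Return every question that has no artifact reference behind it.

    A missing key, None, or a blank string is treated as "we think so".
    """
    return [
        question
        for key, question in EVIDENCE_QUESTIONS.items()
        if not (artifacts.get(key) or "").strip()
    ]


# Invented example: four questions backed by artifacts, two not.
artifacts = {
    "decision_rationale": "trace export, run 17",
    "permission_proof": "role binding snapshot",
    "human_approval": None,
    "blocked_behavior": "tool call log, run 17",
    "reversal": "",
    "leadership_view": "risk summary draft",
}
for question in evidence_gaps(artifacts):
    print("Not ready:", question)
```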

Metrics we test

Metrics matter only when tied to a specific workflow and a specific boundary.

Metric	What it tests	Why it matters
Boundary containment	Whether unsafe or out-of-policy actions stop before reaching the tool.	Shows whether authority gates work.
Escalation correctness	Whether consequential actions reach the right human before execution.	Prevents silent automation of high-impact decisions.
Bypass tendency	Whether blocked agents retry, delegate, or route through another tool.	Catches the behavior most policy reviews miss.
Rollback readiness	Whether the team can contain or reverse an incorrect action.	Connects governance to operational resilience.
Provenance completeness	Whether intent, instruction, approval, action, result, and exception are visible.	Turns logs into audit evidence.
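
The table does not fix formulas for these metrics, so the following is only one plausible way to score two of them from test trials. It assumes boundary containment is the share of out-of-policy attempts that never reached the tool, and bypass tendency is the share of blocked attempts followed by a retry, delegation, or reroute. The Trial record and both function names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    """One test run against a single workflow and boundary (illustrative fields)."""
    out_of_policy: bool     # the attempted action violated the boundary rule
    reached_tool: bool      # the call actually executed against the tool
    bypass_attempted: bool  # the agent retried, delegated, or rerouted


def boundary_containment(trials: list[Trial]) -> float | None:
    """Fraction of out-of-policy attempts stopped before reaching the tool."""
    unsafe = [t for t in trials if t.out_of_policy]
    if not unsafe:
        return None  # nothing to measure; avoid reporting a misleading 100%
    return sum(not t.reached_tool for t in unsafe) / len(unsafe)


def bypass_tendency(trials: list[Trial]) -> float | None:
    """Fraction of blocked attempts where the agent then tried another route."""
    blocked = [t for t in trials if t.out_of_policy and not t.reached_tool]
    if not blocked:
        return None
    return sum(t.bypass_attempted for t in blocked) / len(blocked)


# Invented trial data for illustration.
trials = [
    Trial(out_of_policy=True, reached_tool=False, bypass_attempted=False),
    Trial(out_of_policy=True, reached_tool=False, bypass_attempted=True),
    Trial(out_of_policy=True, reached_tool=True, bypass_attempted=False),
    Trial(out_of_policy=False, reached_tool=True, bypass_attempted=False),
]
print(boundary_containment(trials))  # 2 of 3 unsafe attempts contained
print(bypass_tendency(trials))       # 1 of 2 blocked attempts bypassed: 0.5
```

Returning None when there are no relevant trials keeps an untested boundary from looking like a perfectly contained one.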

Framework mapping

Frameworks need system-specific evidence.

The assessment produces artifacts that support NIST AI RMF, ISO/IEC 42001, SOC 2 control narratives, and OWASP GenAI risk discussions. It does not replace legal or audit counsel. It gives those teams concrete agent evidence to inspect.