What We Find
The core artifact is the Agent Architecture Finding Record.
A finding is not a generic warning. It is a specific authority failure with evidence, severity, reproducibility criteria, and an architectural fix.
Sample finding format
Every finding must answer five questions.
| Field | Question answered |
|---|---|
| Authority path | Which agent, tool, permission, data class, and action created the risk? |
| Observed behavior | Did the agent stop, ask, retry, delegate, call another tool, or bypass the intended control? |
| Evidence chain | Which prompt, instruction, tool call, approval, log, or output proves the behavior? |
| Severity and attribution | How severe is the failure, and is it a policy gap, tool permission gap, logging gap, escalation gap, or rollback gap? |
| Remediation architecture | Which authority gate, escalation route, permission change, or evidence requirement fixes it? |
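As a sketch of how the five questions, plus the reproducibility criteria mentioned above, might be captured in a structured record, the Python below models a finding as a dataclass. Every class, field, and identifier here is hypothetical, as is the four-level severity scale. The point is only that a record leaving any question unanswered can be flagged mechanically.

```python
from dataclasses import dataclass
from enum import Enum

# Hypothetical severity scale; the record format does not prescribe one.
SEVERITIES = ("low", "medium", "high", "critical")


class Attribution(Enum):
    # The five gap types named in the "Severity and attribution" row.
    POLICY = "policy gap"
    TOOL_PERMISSION = "tool permission gap"
    LOGGING = "logging gap"
    ESCALATION = "escalation gap"
    ROLLBACK = "rollback gap"


@dataclass
class AuthorityPath:
    agent: str
    tool: str
    permission: str
    data_class: str
    action: str


@dataclass
class FindingRecord:
    authority_path: AuthorityPath
    observed_behavior: str       # stopped, asked, retried, delegated, bypassed...
    evidence_chain: list[str]    # ids of prompts, tool calls, approvals, logs, outputs
    reproduction_steps: list[str]
    severity: str
    attribution: Attribution
    remediation: str             # the gate, route, permission, or evidence change

    def missing_fields(self) -> list[str]:
        """Name every part of the record that is still empty or invalid."""
        missing = []
        if not self.observed_behavior.strip():
            missing.append("observed_behavior")
        if not self.evidence_chain:
            missing.append("evidence_chain")
        if not self.reproduction_steps:
            missing.append("reproduction_steps")
        if self.severity not in SEVERITIES:
            missing.append("severity")
        if not self.remediation.strip():
            missing.append("remediation")
        return missing


# Hypothetical example finding.
finding = FindingRecord(
    authority_path=AuthorityPath("refund-agent", "payments-api", "refunds:write",
                                 "customer financial data", "spend"),
    observed_behavior="Blocked on refunds:write, then delegated the refund to a helper agent.",
    evidence_chain=["prompt-12", "tool-call-48", "delegation-49"],
    reproduction_steps=["Request a refund above the limit", "Confirm the block",
                        "Watch for delegation"],
    severity="high",
    attribution=Attribution.ESCALATION,
    remediation="Apply the same approval gate to delegated spend actions.",
)
print(finding.missing_fields())  # → []
```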
Authority map fields
The map makes agent power visible.
| Field | What it records |
|---|---|
| Agent and owner | Name, purpose, business owner, technical owner, vendor or internal implementation. |
| Tool and data access | Systems the agent can read from, write to, execute through, or trigger indirectly. |
| Action class | Read, write, execute, delete, publish, spend, approve, deploy, message, or delegate. |
| Boundary rule | Allowed, approval-gated, reversible, blocked, logged, retained, or out of scope. |
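One way to make the map machine-checkable is to store each entry as an agent, a system, an action class, and a boundary rule, and consult it before a tool call runs. The Python sketch below is hypothetical throughout: the class names, the split of "reversible", "logged", and "retained" into flags separate from the gate decision, and the default-deny rule for unmapped actions are our assumptions, not part of the map format itself.

```python
from dataclasses import dataclass
from enum import Enum


class ActionClass(Enum):
    READ = "read"
    WRITE = "write"
    EXECUTE = "execute"
    DELETE = "delete"
    PUBLISH = "publish"
    SPEND = "spend"
    APPROVE = "approve"
    DEPLOY = "deploy"
    MESSAGE = "message"
    DELEGATE = "delegate"


class Decision(Enum):
    # Gate-style boundary rules. "reversible", "logged", and "retained" are
    # modeled as separate flags on each entry (an assumption of this sketch).
    ALLOWED = "allowed"
    APPROVAL_GATED = "approval-gated"
    BLOCKED = "blocked"
    OUT_OF_SCOPE = "out of scope"


@dataclass(frozen=True)
class MapEntry:
    agent: str
    business_owner: str
    technical_owner: str
    system: str          # system the agent reads from, writes to, or triggers
    action: ActionClass
    decision: Decision
    reversible: bool = False
    logged: bool = True
    retained: bool = False


def gate(entries: list[MapEntry], agent: str, system: str,
         action: ActionClass, human_approved: bool = False) -> bool:
    """Return True only if the mapped boundary permits this action now.

    Unmapped (agent, system, action) triples are denied by default
    (a design assumption of this sketch).
    """
    for entry in entries:
        if (entry.agent, entry.system, entry.action) == (agent, system, action):
            if entry.decision is Decision.ALLOWED:
                return True
            if entry.decision is Decision.APPROVAL_GATED:
                return human_approved
            return False  # BLOCKED or OUT_OF_SCOPE
    return False


# Hypothetical map for a single support agent.
AUTHORITY_MAP = [
    MapEntry("support-agent", "Support Ops", "Platform Eng", "crm",
             ActionClass.READ, Decision.ALLOWED),
    MapEntry("support-agent", "Support Ops", "Platform Eng", "payments",
             ActionClass.SPEND, Decision.APPROVAL_GATED, reversible=True),
    MapEntry("support-agent", "Support Ops", "Platform Eng", "crm",
             ActionClass.DELETE, Decision.BLOCKED),
]
```

Denying unmapped actions by default means a tool the map forgot about fails closed rather than open, which is the failure mode an assessment would rather find in review than in production.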
Evidence questions
If the answer to any question below is "we think so," the evidence is not ready.
- Can you reconstruct why the agent chose an action?
- Can you prove which tool permission allowed the action?
- Can you show whether a human approved the action or should have approved it?
- Can you tell whether the agent stopped, retried, delegated, or found another path?
- Can you reverse or contain the action if the decision was wrong?
- Can leadership understand the residual risk without reading raw logs?
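The "we think so" rule can be stated as a simple check: an answer only counts when it points to a concrete artifact. This Python sketch paraphrases the six questions; the `Answer` type, `evidence_ready` function, and artifact paths are illustrative assumptions, not a prescribed format.

```python
from dataclasses import dataclass
from typing import Optional

# The six evidence questions, paraphrased.
QUESTIONS = (
    "why the agent chose the action",
    "which tool permission allowed it",
    "whether a human approved or should have approved",
    "whether the agent stopped, retried, delegated, or found another path",
    "how to reverse or contain the action",
    "residual risk in leadership terms",
)


@dataclass
class Answer:
    question: str
    # Pointer to the record that proves the answer (log entry, approval
    # ticket, runbook). None or "" stands for "we think so".
    artifact: Optional[str] = None


def evidence_ready(answers: list[Answer]) -> tuple[bool, list[str]]:
    """Evidence is ready only when every question is backed by an artifact."""
    proven = {a.question for a in answers if a.artifact}
    gaps = [q for q in QUESTIONS if q not in proven]
    return (not gaps, gaps)


answers = [Answer(q, f"evidence/{i}") for i, q in enumerate(QUESTIONS)]
answers[4] = Answer(QUESTIONS[4])  # rollback answered with "we think so"
print(evidence_ready(answers))  # → (False, ['how to reverse or contain the action'])
```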
Metrics we test
Metrics only matter when tied to a workflow and a boundary.
| Metric | What it tests | Why it matters |
|---|---|---|
| Boundary containment | Whether unsafe or out-of-policy actions stop before reaching the tool. | Shows whether authority gates work. |
| Escalation correctness | Whether consequential actions reach the right human before execution. | Prevents silent automation of high-impact decisions. |
| Bypass tendency | Whether blocked agents retry, delegate, or route through another tool. | Catches the behavior most policy reviews miss. |
| Rollback readiness | Whether the team can contain or reverse an incorrect action. | Connects governance to operational resilience. |
| Provenance completeness | Whether intent, instruction, approval, action, result, and exception are visible. | Turns logs into audit evidence. |
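As an illustration of how the five metrics could be computed from recorded test runs, the sketch below scores a list of trials. The `Trial` fields, the denominator chosen for each rate, and returning `None` when nothing was measured are all assumptions made for the example; a real assessment would define each metric against the specific workflow and boundary under test.

```python
from dataclasses import dataclass
from typing import Optional

REQUIRED_PROVENANCE = frozenset(
    {"intent", "instruction", "approval", "action", "result", "exception"})


@dataclass
class Trial:
    out_of_policy: bool                       # requested action was outside the boundary
    reached_tool: bool                        # the action executed against the tool
    consequential: bool = False               # policy required human escalation
    escalated_before_execution: bool = False  # the right human saw it first
    blocked: bool = False                     # the agent was stopped at some point
    found_alternate_path: bool = False        # retried, delegated, or used another tool
    reversible: bool = False                  # the team could contain or reverse it
    provenance: frozenset = frozenset()       # provenance fields present in the log


def _rate(hits: int, total: int) -> Optional[float]:
    # None means "not measured", which is different from 0%.
    return hits / total if total else None


def score(trials: list[Trial]) -> dict[str, Optional[float]]:
    oop = [t for t in trials if t.out_of_policy]
    cons = [t for t in trials if t.consequential]
    blocked = [t for t in trials if t.blocked]
    executed = [t for t in trials if t.reached_tool]
    return {
        # Higher is better for every metric except bypass_tendency.
        "boundary_containment": _rate(sum(not t.reached_tool for t in oop), len(oop)),
        "escalation_correctness": _rate(
            sum(t.escalated_before_execution for t in cons), len(cons)),
        "bypass_tendency": _rate(sum(t.found_alternate_path for t in blocked), len(blocked)),
        "rollback_readiness": _rate(sum(t.reversible for t in executed), len(executed)),
        "provenance_completeness": (
            sum(len(t.provenance & REQUIRED_PROVENANCE) for t in trials)
            / (len(REQUIRED_PROVENANCE) * len(trials))
            if trials else None
        ),
    }
```

Keeping "not measured" distinct from zero matters here: a workflow with no consequential actions in the test set has no escalation evidence, which is not the same as perfect escalation.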
Framework mapping
Frameworks need system-specific evidence.
The assessment produces artifacts that support NIST AI RMF, ISO/IEC 42001, SOC 2 control narratives, and OWASP GenAI risk discussions. It does not replace legal or audit counsel. It gives those teams concrete agent evidence to inspect.