What We Find

The core artifact is the Agent Architecture Finding Record.

A finding is not a generic warning. It is a specific authority failure with evidence, severity, reproducibility criteria, and an architectural fix.

Every finding must answer five questions.

| Field | Question answered |
| --- | --- |
| Authority path | Which agent, tool, permission, data class, and action created the risk? |
| Observed behavior | Did the agent stop, ask, retry, delegate, call another tool, or bypass the intended control? |
| Evidence chain | Which prompt, instruction, tool call, approval, log, or output proves the behavior? |
| Severity and attribution | Is this a policy gap, tool permission gap, logging gap, escalation gap, or rollback gap? |
| Remediation architecture | Which authority gate, escalation route, permission change, or evidence requirement fixes it? |
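
As a minimal sketch, the five fields could be held in a small record type that reports which questions are still unanswered. The class name `FindingRecord`, the method `unanswered`, and the example values are all hypothetical and chosen for illustration; the source does not prescribe a schema.

```python
from dataclasses import dataclass, fields


@dataclass
class FindingRecord:
    """Hypothetical shape for one finding; one field per question."""
    authority_path: str
    observed_behavior: str
    evidence_chain: list[str]
    severity_and_attribution: str
    remediation_architecture: str

    def unanswered(self) -> list[str]:
        """Return the names of fields left empty (questions not yet answered)."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Illustrative values only, not taken from a real assessment.
finding = FindingRecord(
    authority_path="support agent, refund tool, payments write access, issue refund",
    observed_behavior="called another tool after the first call was blocked",
    evidence_chain=[],  # no prompt, tool call, approval, or log attached yet
    severity_and_attribution="escalation gap",
    remediation_architecture="approval gate on refund actions",
)
print(finding.unanswered())  # ['evidence_chain']
```

A record with any unanswered field would not yet count as a finding in the sense described above, because the evidence or the fix is still missing.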

The map makes agent power visible.

Agent and owner: Name, purpose, business owner, technical owner, vendor or internal implementation.
Tool and data access: Systems the agent can read from, write to, execute through, or trigger indirectly.
Action class: Read, write, execute, delete, publish, spend, approve, deploy, message, or delegate.
Boundary rule: Allowed, approval-gated, reversible, blocked, logged, retained, or out of scope.
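
One way to picture a map entry is as a typed record whose action class and boundary rules come from the lists above. This is a sketch under assumptions: the names `ActionClass`, `BoundaryRule`, `AuthorityMapEntry`, and `summarize` are hypothetical, the example agent is invented, and allowing several boundary rules per entry is an assumption rather than something the source states.

```python
from dataclasses import dataclass
from enum import Enum


class ActionClass(Enum):
    READ = "read"
    WRITE = "write"
    EXECUTE = "execute"
    DELETE = "delete"
    PUBLISH = "publish"
    SPEND = "spend"
    APPROVE = "approve"
    DEPLOY = "deploy"
    MESSAGE = "message"
    DELEGATE = "delegate"


class BoundaryRule(Enum):
    ALLOWED = "allowed"
    APPROVAL_GATED = "approval-gated"
    REVERSIBLE = "reversible"
    BLOCKED = "blocked"
    LOGGED = "logged"
    RETAINED = "retained"
    OUT_OF_SCOPE = "out of scope"


@dataclass(frozen=True)
class AuthorityMapEntry:
    agent: str
    business_owner: str
    technical_owner: str
    system: str                         # tool or data system the agent touches
    action: ActionClass
    boundary: tuple[BoundaryRule, ...]  # assumption: rules can be combined


def summarize(entries):
    """Group entries by agent so each agent's reach is visible at a glance."""
    power = {}
    for e in entries:
        rules = ", ".join(r.value for r in e.boundary)
        power.setdefault(e.agent, []).append(f"{e.action.value} {e.system} [{rules}]")
    return power


# Invented example entries for illustration.
entries = [
    AuthorityMapEntry("billing-agent", "finance lead", "platform team", "ledger",
                      ActionClass.READ, (BoundaryRule.ALLOWED, BoundaryRule.LOGGED)),
    AuthorityMapEntry("billing-agent", "finance lead", "platform team", "payments API",
                      ActionClass.SPEND, (BoundaryRule.APPROVAL_GATED, BoundaryRule.LOGGED)),
]
summary = summarize(entries)
print(summary)
```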

If the answer to any of the questions below is "we think so," the evidence is not ready.

  1. Can you reconstruct why the agent chose an action?
  2. Can you prove which tool permission allowed the action?
  3. Can you show whether a human approved the action or should have approved it?
  4. Can you tell whether the agent stopped, retried, delegated, or found another path?
  5. Can you reverse or contain the action if the decision was wrong?
  6. Can leadership understand the residual risk without reading raw logs?
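
The six questions above can be treated as a readiness check in which only a demonstrated "yes" passes. The sketch below assumes hypothetical short keys for each question and a hypothetical helper named `evidence_gaps`; the sample answers are invented.

```python
# Hypothetical short keys for the six readiness questions.
READINESS_QUESTIONS = {
    "decision_rationale": "Can you reconstruct why the agent chose an action?",
    "tool_permission": "Can you prove which tool permission allowed the action?",
    "human_approval": "Can you show whether a human approved the action or should have approved it?",
    "agent_path": "Can you tell whether the agent stopped, retried, delegated, or found another path?",
    "containment": "Can you reverse or contain the action if the decision was wrong?",
    "leadership_summary": "Can leadership understand the residual risk without reading raw logs?",
}


def evidence_gaps(answers):
    """Return the keys of questions not answered with a plain 'yes'.

    A missing answer or a hedge such as 'we think so' counts as a gap.
    """
    return [
        key for key in READINESS_QUESTIONS
        if answers.get(key, "").strip().lower() != "yes"
    ]


# Invented sample answers; leadership_summary is deliberately left out.
answers = {
    "decision_rationale": "yes",
    "tool_permission": "yes",
    "human_approval": "we think so",
    "agent_path": "yes",
    "containment": "Yes",
}
gaps = evidence_gaps(answers)
print(gaps)  # ['human_approval', 'leadership_summary']
```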

Metrics only matter when tied to a workflow and a boundary.

| Metric | What it tests | Why it matters |
| --- | --- | --- |
| Boundary containment | Whether unsafe or out-of-policy actions stop before reaching the tool. | Shows whether authority gates work. |
| Escalation correctness | Whether consequential actions reach the right human before execution. | Prevents silent automation of high-impact decisions. |
| Bypass tendency | Whether blocked agents retry, delegate, or route through another tool. | Catches the behavior most policy reviews miss. |
| Rollback readiness | Whether the team can contain or reverse an incorrect action. | Connects governance to operational resilience. |
| Provenance completeness | Whether intent, instruction, approval, action, result, and exception are visible. | Turns logs into audit evidence. |
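
To illustrate what tying a metric to a workflow and a boundary might look like, the sketch below scores two of the metrics per (workflow, boundary) pair. The source does not define formulas, so treating each metric as a simple fraction of test trials is an assumption, as are the names `Trial` and `score` and the sample data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    """One out-of-policy action attempted during testing (hypothetical shape)."""
    workflow: str
    boundary: str
    contained: bool         # action stopped before reaching the tool
    bypass_attempted: bool  # agent retried, delegated, or rerouted


def score(trials):
    """Compute per-(workflow, boundary) rates; assumed definitions, not a standard."""
    groups = defaultdict(list)
    for t in trials:
        groups[(t.workflow, t.boundary)].append(t)
    return {
        key: {
            "boundary_containment": sum(t.contained for t in group) / len(group),
            "bypass_tendency": sum(t.bypass_attempted for t in group) / len(group),
        }
        for key, group in groups.items()
    }


# Invented trial data for illustration.
trials = [
    Trial("refunds", "approval gate", contained=True, bypass_attempted=False),
    Trial("refunds", "approval gate", contained=True, bypass_attempted=True),
    Trial("refunds", "approval gate", contained=False, bypass_attempted=False),
    Trial("refunds", "approval gate", contained=True, bypass_attempted=False),
]
scores = score(trials)
print(scores[("refunds", "approval gate")])
# {'boundary_containment': 0.75, 'bypass_tendency': 0.25}
```

Keying the results by workflow and boundary keeps a single aggregate number from hiding a weak gate in one workflow behind strong results in another.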

Frameworks need system-specific evidence.

The assessment produces artifacts that support NIST AI RMF, ISO/IEC 42001, SOC 2 control narratives, and OWASP GenAI risk discussions. It does not replace legal or audit counsel. It gives those teams concrete agent evidence to inspect.