This test measures whether instruction authority in our system is structurally constrained or merely assumed.
We asked an AI-literate engineer to attempt to subvert our prompt injection defenses.
The primary attack surface will be email. That choice is deliberate. Email is structurally ambiguous. It carries narrative authority, inconsistent formatting, and social engineering pressure. Most AI systems fail there quietly.
We are not modifying the system before the test. No additional filters. No silent hardening. The purpose is to observe the existing architecture under adversarial pressure.
The operating model is simple:
This boundary is mechanical. The model is not expected to reason about persuasion. It is constrained by provenance. An email may contain override language, escalation framing, or system-like syntax. That text has no execution pathway.
A capable red team will avoid obvious phrasing. The likely attempts are structural:
The risk is not immediate execution. The risk is authority drift. If any component begins treating external text as instruction-bearing, the boundary has already failed.
We are not testing whether the model can recognize malicious language. We are testing whether instruction flow is physically constrained.
If an email says to ignore previous instructions, the phrase must remain inert because the system provides no pathway for it to gain authority.
If the red team finds a gap, that becomes a design correction.
If they do not, we document why.
The test has not started yet. We are ready for it.