When AI agents can book meetings, build websites, and process payments without human input, the question is no longer "Is the code secure?" It is "Can the AI models be tricked?"
CLIENT
Global Web Services & Hosting Provider
INDUSTRY
SaaS / Web Services & AI Platforms
Critical & High Findings Uncovered
AI Agent Attack Vectors Identified
Time to First Validated Finding
When AI Takes the Wheel
The client had built something genuinely impressive: an autonomous, AI-first service engine, codenamed NexusFlow, that used the Model Context Protocol (MCP) to let users interact via natural language. Instead of clicking buttons, users spoke to an LLM, which would then build websites, manage profiles, and process end-to-end payments without human intervention.
The underlying REST APIs were secure in isolation. But as the system moved to agentic workflows, something fundamentally new emerged: an LLM layer capable of making real decisions, with real consequences, in real time. And that layer had never been tested by an adversary.
The Attack Surface Nobody Was Watching
Standard DAST scanners check for code vulnerabilities: SQL injection, XSS, misconfigured headers. They cannot detect what Siemba calls a cognitive vulnerability: a flaw where the AI simply agrees to do something it should refuse.
Two specific risks made this engagement critical:
Most teams find out from an attacker. Some find out from Siemba.
Book a Security Assessment
AI-on-AI: GenPT Meets Expert PTaaS
Siemba deployed a hybrid approach: GenPT for automated adversarial testing at scale, combined with human PTaaS experts for deep business logic validation. The two capabilities are complementary: GenPT covers breadth, PTaaS covers depth.
Adversarial prompting at scale (GenPT)
GenPT autonomously crafted complex, multi-turn adversarial prompts designed to confuse the agent into overriding its system instructions, testing for system prompt leakage and instruction bypass at a scale no human tester could achieve manually.
AI-on-AI attack simulation (GenPT)
The platform used its own AI models to dynamically generate injection payloads, testing whether the target agent could be coerced into validating transactions that had never actually occurred.
Business logic deep-dive (PTaaS)
Siemba's pentesters mapped the hidden communication between user prompts, Agent Servers (MCP), and backend APIs, identifying hallucination risks and specifically testing AuthN/AuthZ within agent-driven sessions, where standard tools miss the interaction entirely.
From Vulnerable to Hardened
F-01 - Context-Aware Guardrails
The engineering team decoupled the AI's "reasoning" layer from its "execution" privileges. Manipulating one can no longer affect the other: the agent can be convinced of anything, but it can execute only what its privilege scope explicitly permits.
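One way to picture this separation is a gate that sits between the model's proposed action and the code that runs it. The following is a minimal sketch, not the client's implementation; the names `ToolCall`, `ScopeError`, `TOOLS`, and `execute` are hypothetical, and the scope is assumed to come from the authenticated session rather than from anything the model says.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:
    """An action the model proposes; it is a request, not a command."""
    tool: str
    args: dict


class ScopeError(PermissionError):
    """Raised when a proposed tool call falls outside the session's scope."""


# Hypothetical tool registry: names and behavior are illustrative only.
TOOLS = {
    "update_profile": lambda args: f"profile updated: {sorted(args)}",
    "issue_refund": lambda args: f"refunded {args['amount']}",
}


def execute(call: ToolCall, granted_scope: frozenset) -> str:
    """Run a tool call only if the session's scope explicitly permits it.

    The scope is assumed to be derived from the authenticated session,
    never from model output, so a manipulated model cannot widen it.
    """
    if call.tool not in granted_scope:
        raise ScopeError(f"tool {call.tool!r} is outside the granted scope")
    return TOOLS[call.tool](call.args)
```

In this sketch, a session scoped to `update_profile` can never trigger `issue_refund`, however the conversation with the model went; the request is rejected before any tool code runs.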
F-02 - Deterministic Verification Layers
Hard-coded logic checks were established at every payment validation step. These checks sit entirely outside the AI's reasoning layer; no prompt, however persuasive, can override them.
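As an illustration of what a deterministic check of this kind might look like, the sketch below fulfils an order only when a system of record confirms the payment. It is a simplified example under stated assumptions: `LEDGER`, `payment_is_verified`, and `fulfil_order` are hypothetical names, and the in-memory dictionary stands in for a real payment processor or ledger query.

```python
from decimal import Decimal

# Hypothetical system of record: payment ID -> (status, amount).
# In practice this would be a query against the payment processor or ledger.
LEDGER = {
    "pay_001": ("settled", Decimal("49.00")),
    "pay_002": ("pending", Decimal("49.00")),
}


def payment_is_verified(payment_id: str, expected_amount: Decimal) -> bool:
    """Return True only if the ledger shows a settled payment of the exact amount.

    No model output is consulted here.
    """
    record = LEDGER.get(payment_id)
    if record is None:
        return False
    status, amount = record
    return status == "settled" and amount == expected_amount


def fulfil_order(payment_id: str, expected_amount: Decimal, agent_says_paid: bool) -> str:
    """Decide fulfilment from the ledger alone.

    agent_says_paid is deliberately ignored: the agent's belief that a
    payment happened carries no weight in the decision.
    """
    if not payment_is_verified(payment_id, expected_amount):
        return "rejected"
    return "fulfilled"
```

The design choice is that the agent's claim is accepted as a parameter but never read, which makes the point explicit: a coerced agent asserting that a transaction occurred changes nothing.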
F-03 - LLM Output Sanitization
Strict sanitization protocols were applied specifically at the LLM output layer, ensuring the agent functions as a firewall against malicious payloads rather than a carrier for them.
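A minimal sketch of output-layer sanitization follows. It assumes, purely for illustration, that agent output is rendered into an HTML page; other sinks (SQL, shell, JSON) would need their own context-specific encoding. The function name `sanitize_llm_output` and the `max_len` default are hypothetical.

```python
import html
import re

# Illustrative assumption: agent output is rendered into an HTML page.
# Matches ASCII control characters except tab, newline, and carriage return.
_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")


def sanitize_llm_output(text: str, max_len: int = 4000) -> str:
    """Treat model output as untrusted before it reaches a browser.

    Strips control characters, truncates to max_len characters, then
    HTML-escapes the result so injected markup is displayed, not executed.
    """
    cleaned = _CONTROL_CHARS.sub("", text)[:max_len]
    return html.escape(cleaned, quote=True)
```

For example, `sanitize_llm_output("<script>alert(1)</script>")` returns `&lt;script&gt;alert(1)&lt;/script&gt;`, so a payload smuggled through the model is shown as inert text.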
From Code Security to Cognitive Security
This engagement proved something that will define the next decade of application security: APIs can be perfectly secure while the AI agents using them are completely exploitable. The vulnerability is not in the code; it is in the reasoning.
As agentic workflows become the default mode of software delivery, security teams need tools that can think adversarially at the AI layer. GenPT was built for exactly this. Standard scanners were not.
Siemba's GenPT tests what traditional scanners cannot see: the cognitive layer. Find out what your AI agents are capable of before attackers do!
Book a Demo