“We were building AI agents that interacted with payments, sensitive workflows, and user data. Siemba’s team immediately understood that the real risk wasn’t just the application layer, it was the AI’s decision-making logic itself.."

JH
James Holloway
VP of Engineering Security

"The team approached the engagement by testing how the agents reasoned, handled context, processed prompts, and executed actions under adversarial conditions. All of it requires a fundamentally different cybersecurity skill set than traditional web application testing.”

JH
James Holloway
VP of Engineering Security
9

Critical & High Findings Uncovered

3

AI Agent Attack Vectors Identified

72 hrs

Time to First Validated Finding

How a Global AI Platform Secured the Attack Surface Nobody Had a Playbook For


KEY SOLUTIONS

GenPT · PTaaS (Adversarial Prompt Testing, AI Agent Logic Validation, Business Logic Analysis)

THE SCENARIO

When AI Takes the Wheel

The client had built something genuinely impressive: an autonomous AI-first service engine, codenamed NexusFlow - that used the Model Context Protocol (MCP) to let users interact via natural language. Instead of clicking buttons, users spoke to an LLM which would then build websites, manage profiles, and process end-to-end payments without human intervention.

The underlying REST APIs were secure in isolation. But as the system moved to agentic workflows, something fundamentally new emerged: an LLM layer capable of making real decisions, with real consequences, in real time. And that layer had never been tested by an adversary.

THE THREAT

The Attack Surface Nobody Was Watching

Standard DAST scanners check for code vulnerabilities; SQL injection, XSS, misconfigured headers. They cannot detect what Siemba calls a cognitive vulnerability: a flaw where the AI simply agrees to do something it should refuse.

Two specific risks made this engagement critical:

  • The "Confused Deputy" - could a carefully worded prompt trick the agent into performing high-privilege actions the user was never authorized to take? Could an attacker, in plain language, instruct the AI to skip a payment step, expose an API schema, or execute an admin workflow?

  • Logic vs. Code - traditional penetration testing finds code flaws. This engagement needed to find reasoning flaws: moments where the AI's "logic" could be overridden by a persuasive enough instruction.

What if your AI agent could be talked into processing a payment that never happened?

Most teams find out from an attacker. Some find out from Siemba.

Book a Security Assessment

THE TEST

AI-on-AI: GenPT Meets Expert PTaaS

Siemba deployed a hybrid approach - GenPT for automated adversarial testing at scale, combined with human PTaaS experts for deep business logic validation. The two capabilities are complementary: GenPT covers breadth, PTaaS covers depth.

Adversarial prompting at scale (GenPT)

GenPT autonomously crafted complex, multi-turn adversarial prompts designed to confuse the agent into overriding its system instructions, testing for system prompt leakage and instruction bypass at a scale no human tester could achieve manually.

AI-on-AI attack simulation (GenPT)

The platform used its own AI models to dynamically generate injection payloads, testing whether the target agent could be coerced into validating transactions that had never actually occurred.

Business logic deep-dive (PTaaS)

Siemba's pentesters mapped the hidden communication between user prompts, Agent Servers (MCP), and backend APIs, identifying hallucination risks and specifically testing AuthN/AuthZ within agent-driven sessions, where standard tools miss the interaction entirely.

Findings_CS1 (1)

THE FIX

From Vulnerable to Hardened

F-01 - Context-Aware Guardrails

The engineering team decoupled the AI's "reasoning" layer from its "execution" privileges. Manipulation of one can no longer affect the other, the agent can be convinced of anything, but can execute only what its privilege scope explicitly permits.

F-02 - Deterministic Verification Layers

Hard-coded logic checks were established at every payment validation step. These checks are outside the AI's reasoning layer entirely, no prompt, however persuasive, can override them.

F-03 - LLM Output Sanitization

Strict input sanitization protocols were applied specifically at the LLM output layer, ensuring the agent functions as a firewall against malicious payloads rather than a carrier for them.

THE LESSON

From Code Security to Cognitive Security

This engagement proved something that will define the next decade of application security: APIs can be perfectly secure while the AI agents using them are completely exploitable. The vulnerability is not in the code, it is in the reasoning.

As agentic workflows become the default mode of software delivery, security teams need tools that can think adversarially at the AI layer. GenPT was built for exactly this. Standard scanners were not.

"The insights from Siemba didn't just point out what we needed to fix, they taught us how to think about security in a more sophisticated and proactive way. This has significantly propelled us forward, making our approach to cybersecurity more robust and better prepared to face the challenges ahead."

Alvin Allen
Head of Cybersecurity, FrontSteps

Is Your AI Agent Secure?

Siemba's GenPT tests what traditional scanners cannot see. The cognitive layer. Find out what your AI agents are capable of before attackers do!

Book a Demo