Posted in

New ‘Agentjacking’ Attack Tricks AI Coding Assistants Into Running Malicious Code

Cybersecurity researchers have identified a new attack technique called Agentjacking, a method that abuses AI-powered coding assistants to execute malicious code directly on developers’ machines.

Discovered by security firm Tenet Security, the attack exploits the interaction between Sentry, a widely used error-monitoring platform, and AI coding tools such as Claude Code and Cursor. The findings highlight how trusted AI workflows can become a new attack surface for cybercriminals.

How the Agentjacking Attack Works

At the core of the attack is a design weakness involving Sentry’s event ingestion system and its integration with AI agents through the Model Context Protocol (MCP).

According to Tenet Security researchers Ron Bobrov, Barak Sternberg, and Nevo Poran, attackers can submit specially crafted error reports to Sentry using a publicly accessible credential known as a Data Source Name (DSN). These malicious error messages are then retrieved by AI coding assistants and interpreted as legitimate troubleshooting instructions.

As a result, the AI agent may unknowingly execute attacker-controlled commands on the developer’s machine.

Attack Flow Explained

The Agentjacking attack typically follows these steps:

  1. Identify a Public Sentry DSN
    Attackers locate a target organization’s Sentry DSN, which is often embedded in websites and applications.
  2. Inject a Malicious Error Event
    Using the DSN, the attacker submits a fake error report to Sentry’s ingestion endpoint.
  3. Embed Hidden Instructions
    The malicious report contains carefully formatted markdown content designed to resemble legitimate diagnostic guidance.
  4. AI Agent Retrieves the Data
    When a developer asks an AI coding assistant to investigate unresolved Sentry issues, the AI retrieves the injected error event through MCP.
  5. Execution of Malicious Code
    The AI interprets the embedded instructions as trusted recommendations and executes the attacker’s commands with the developer’s permissions.

Why This Attack Is Dangerous

Unlike traditional cyberattacks that require phishing emails, compromised servers, or malware delivery, Agentjacking targets the trust relationship between developers and their AI assistants.

The attacker never directly interacts with the victim’s infrastructure. Instead, malicious instructions are disguised as legitimate troubleshooting guidance inside an error report. When the AI agent processes the information, it effectively becomes an unwitting participant in the attack.

Because the commands execute with the developer’s existing privileges, attackers may gain access to:

  • Environment variables
  • Git credentials
  • Private repository URLs
  • Developer identities
  • Sensitive project information

The Role of MCP and Trust Boundaries

The vulnerability highlights a broader security challenge surrounding the Model Context Protocol (MCP) ecosystem.

AI agents often treat information retrieved from connected services as trustworthy. In this scenario, the AI assistant cannot distinguish between a genuine application error and one intentionally injected by an attacker.

This breakdown of trust boundaries creates an opportunity for arbitrary code execution whenever the AI processes manipulated external data.

Real-World Impact

Tenet Security reported identifying 2,388 organizations with valid and injectable Sentry DSNs exposed to this attack technique.

Researchers also conducted controlled testing against more than 100 organizations and achieved an 85% success rate when targeting injected errors across several popular AI coding assistants.

These findings suggest that Agentjacking is not merely a theoretical risk but a practical attack method capable of affecting a large number of organizations currently adopting AI-powered development tools.

Sentry’s Response

Sentry acknowledged the issue but reportedly chose not to implement a direct fix, stating that the problem is “technically not defensible” due to the nature of the architecture involved.

However, the company has introduced a global content filter designed to block a specific malicious payload pattern identified during the research.

A New Security Challenge for AI Development

The discovery of Agentjacking underscores an important reality: as organizations increasingly rely on AI coding assistants, the AI systems themselves become valuable targets for attackers.

Traditional security controls such as:

  • Endpoint Detection and Response (EDR)
  • Web Application Firewalls (WAF)
  • Identity and Access Management (IAM)
  • VPN protections
  • Cloud-based security services
  • Network firewalls

may not detect this type of attack because every action appears legitimate and authorized.

Final Thoughts

Agentjacking demonstrates how attackers can exploit trusted AI workflows rather than traditional software vulnerabilities. By manipulating the data consumed by AI coding assistants, threat actors can potentially execute commands on developer machines without breaching infrastructure or deploying malware.

As AI agents become increasingly integrated into software development pipelines, organizations will need to rethink trust models, validate external data sources, and implement safeguards that prevent AI systems from blindly acting on unverified instructions.

The research serves as a warning that the next generation of cybersecurity threats may focus not only on users and systems—but also on the AI assistants developers depend on every day.

Leave a Reply

Your email address will not be published. Required fields are marked *