When Claude Makes a Mistake: AI Incident Response for Enterprise Teams

Mar 10, 2026

When Claude Makes a Mistake: AI Incident Response for Enterprise Teams

What happens when an AI tool takes an unintended action or leaks data? Most IR playbooks don't cover it yet. Here's how to build one that does.

Your incident response playbook almost certainly covers ransomware, phishing, and insider threats. It almost certainly does not cover what happens when your AI assistant follows a malicious instruction embedded in a document, sends an email it was not asked to send, or quietly exfiltrates data through a misconfigured integration. That gap needs closing.

AI incidents are not hypothetical. As enterprise AI adoption accelerates and tools like Claude gain access to real systems through integrations like MCP, the conditions for AI-specific incidents are already in place. The question is not whether one will happen in your organisation, but whether you will be ready when it does.

What Is an AI Incident?

Before you can respond to an AI incident, you need to define one. For enterprise purposes, an AI incident is any unintended or unauthorised AI action, output, or data exposure that has or could have a negative impact on the organisation, its employees, customers, or partners.

This includes: a prompt injection attack that causes an AI agent to exfiltrate data or take an unintended action; an AI tool processing confidential data it was not authorised to access; an AI-generated output being published or sent that contains false, harmful, or sensitive information; an AI integration behaving unexpectedly due to a misconfigured MCP server or API connection; and an employee using an unapproved AI tool in a way that creates regulatory or legal exposure.

Not every unexpected AI output is an incident. An AI tool giving a poor answer is a quality issue. An AI tool sending a confidential document to the wrong recipient is an incident.

Phase 1: Detection

AI incidents are often silent. Unlike a ransomware attack that announces itself, a prompt injection or data leak via an AI integration may leave no immediately obvious trace. Detection therefore requires proactive monitoring rather than waiting for something to break visibly.

Key detection signals include: anomalous MCP tool calls logged by your SIEM, such as file reads outside normal scope or unexpected API calls; outbound data transfers to AI service endpoints that are larger than expected; user reports of unexpected AI behaviour, such as Claude taking an action they did not request; alerts from your CASB on unusual data uploads to AI platforms; and OAuth consent grants appearing for AI tools that were not formally approved.

Build detection rules in your SIEM specifically for AI-related activity. Without them, you are relying on luck.

Phase 2: Containment

Once a potential AI incident is identified, the priority is to stop the bleeding without destroying evidence. Containment steps depend on the incident type, but a general framework applies.

For AI agent or MCP-related incidents, immediately revoke the relevant MCP server's access tokens or API keys to stop further automated actions. Disable the affected integration at the platform level if possible. Do not simply close the application, as background processes may continue.

For data exposure incidents, identify what data was involved and where it may have been sent. If an AI tool transmitted data to a third-party service, contact that vendor immediately to request deletion and obtain written confirmation. Preserve logs before any remediation that might overwrite them.

For AI-generated content incidents, such as a false or harmful output that was published or sent, act quickly to retract or correct the content. Document exactly what was generated, when, and to whom it was sent, as this will be needed for any regulatory notification.

Phase 3: Investigation

The investigation phase answers four questions: what happened, how it happened, what data or systems were affected, and who needs to be notified.

For AI incidents, investigation often requires technical skills that traditional IR teams may not have. You will need to understand how the AI tool was configured, what integrations were active, what prompts were sent, and what actions the tool took. If your organisation uses Claude via the API, Anthropic's audit logging provides a record of API calls. For Claude for Enterprise, organisational audit logs are available through the admin console.

Prompt injection investigations are particularly complex. You will need to identify the source of the malicious instruction: was it in a document the AI processed, a webpage it browsed, a database entry it read? Trace the full chain from the injected instruction to the action taken.

Phase 4: Notification

AI incidents may trigger notification obligations under data protection law. If personal data was involved, GDPR requires notification to the relevant supervisory authority within 72 hours of becoming aware of a breach, and to affected individuals without undue delay where the breach is likely to result in a high risk to their rights and freedoms.

The involvement of AI does not change these obligations. An AI tool that processes personal data without authorisation, or that exposes personal data through a misconfiguration, constitutes a personal data breach in the same way as any other technical failure.

Document your notification decisions carefully, including the legal basis for any decision not to notify. Regulators are increasingly scrutinising AI-related incidents, and a well-documented decision-making process will serve you better than an undocumented assumption that notification was not required.

Phase 5: Recovery and Lessons Learned

Recovery from an AI incident involves more than restoring systems to their previous state. It requires understanding why the incident occurred and putting controls in place to prevent recurrence.

Common root causes of AI incidents include: overly permissive MCP server configurations, as covered in our earlier post on securing Claude's access to MCP tools; lack of input validation, where the AI tool processed untrusted data from external sources without appropriate safeguards; insufficient monitoring, where anomalous AI behaviour was occurring but no detection rules existed to surface it; and policy gaps, where employees used AI tools in ways that were not explicitly prohibited but created risk.

For each root cause identified, define a specific remediation action with an owner and a deadline. Then schedule a post-incident review within two weeks while the details are still fresh.

Building Your AI IR Playbook

Every organisation with meaningful AI tool adoption should have an AI-specific appendix to their incident response plan. At minimum, it should cover: the definition of an AI incident for your organisation; the roles and responsibilities for AI incident response, including who has the authority to revoke AI tool access; detection sources and monitoring requirements; containment procedures for agent, data exposure, and content incidents; notification obligations and decision criteria; and the escalation path for incidents involving customer data or regulatory risk.

The playbook should be reviewed every six months, or after any significant change to your AI tool landscape.

A Note on Vendor Coordination

AI incidents often involve a third-party vendor whose platform was the vector or recipient of the incident. Establish relationships with your key AI vendors before an incident occurs. Know who your account contact is, what their security incident contact process is, and what data retention and deletion capabilities they offer. Trying to establish these relationships in the middle of an incident is far harder than doing so in advance.

Anthropic, for instance, provides enterprise customers with a dedicated support channel and maintains a responsible disclosure process for security issues with its models and products. Knowing how to reach them quickly matters when time is critical.

The Bottom Line

AI incidents are coming. The organisations that handle them well will be those that treated the possibility seriously before it became a reality, built detection capabilities, documented their response procedures, and practised them. The playbook you build now is the one you will rely on when it counts.

‹ governance

audit ›