1. Getting Started
Throughout this book, we’ll explore several important concepts for reviewing and assessing the incident response process. Some involve vocabulary, enabling everyone on the team to communicate effectively and efficiently. Others are strategies that provide a clear way to measure the effectiveness of the response effort.
In this chapter, we’ll look at some of the common terms used in incident response. We’ll also examine core incident response concepts and the considerations that shape them.
Incident
An incident is an adverse event in an information system and/or network, or the threat of such an event occurring. [1] An incident implies harm, or an attempt to cause harm.
This is the classic definition of an incident, but there are many nuances to consider. For example, not all incidents look the same: some result from a deliberate attack, others from human error, and still others from environmental causes beyond anyone’s control. Defining an incident as malicious, accidental, or environmental helps clarify its nature and shapes how the response team approaches it.
Malicious Incidents
These incidents are the result of deliberate action by a threat actor or adversary. Exploiting a vulnerability, abusing stolen credentials, or deploying malicious code to compromise a system all qualify as malicious incidents.
| Malicious incidents are the primary focus of this book. While the incident response practices in this book also apply to accidental and environmental incidents, the techniques and strategies discussed here are primarily intended to address malicious incidents in which an attacker is actively trying to compromise systems and data. |
Accidental Incidents
These incidents result from unintentional actions that cause harm to systems or data. A user deleting sensitive data, a firewall misconfiguration that exposes internal services to the internet, a developer committing API credentials to a public repository, or a partner accidentally exposing customer data in a Google Cloud bucket all warrant an incident response. [2] Accidental incidents often overlap with malicious incidents during the early stages of investigation, since the initial indicators (e.g., exposed data or unauthorized access) can look similar before the root cause is understood. Once the root cause is identified, the incident response effort shifts to remediation and prevention, with less focus on determining a threat actor’s actions.
Environmental Incidents
These incidents are caused by natural events or infrastructure failures rather than human action. Examples include power outages that result in unclean system shutdowns and data corruption, flooding that damages a data center, or a cloud provider outage that disrupts critical services. While environmental incidents are outside the scope of this book, organizations should maintain response plans that address them alongside cybersecurity incidents.
Event
An event is any observable occurrence in a system and/or network. The definition is straightforward, but how analysts assess and evaluate events can be complicated, and the vocabulary varies when working with commercial products and vendors that shape terminology to fit their offerings. The following examples expand on the concept of events.
Whether working on an incident response team or in a security operations center (SOC), applying digital forensics investigation techniques to a case, or maintaining systems as a system administrator, analysts work with events. Events are an unending flow of information that characterizes and records the activities of systems and networks. Examples of events include:
- Fortinet firewall log entry: msg="Unregistered device localhost add succeeded" device="localhost" adom="FortiManager"
- Apache web server error log: PHP Fatal error: Uncaught Error: Call to undefined function add_action() in public_html/wp-content/plugins/hello.php:69
- Support desk ticket #12345: User reports sluggish performance on their workstation
- Windows Event ID 104: The System event log was cleared.
- Linux auth log entry: systemd: pam_unix(systemd-user:session): session opened for user httpd by (uid=0)
- Network device log: Administrator accessed UniFi OS via unifi.ui.com. Source IP: 95.181.86.2
- OPS engineer reports a previously unrecognized local user on camera access server: badmin
Many analysts think of events exclusively as data recorded by logging sources. However, events can also come from non-digital sources, such as a system administrator’s report of a system outage, or a user’s request for assistance when their workstation suddenly becomes sluggish.
When discussing the flow of events in any collection of systems, this book uses the analogy of a waterfall. Consider the waterfall as the flow of events, cascading down from the top of a rocky outcropping, as shown in Figure 1.
Analysts extract insights from event flows, identifying patterns and anomalies that could indicate an incident. The event might be a message in a log file, an activity observed on the network, a report of unusual activity from a user, or another observable occurrence. Most events are not indicative of an attack or incident. However, some events will warrant further investigation. These are referred to as Events of Interest (EOI).
Event of Interest
An Event of Interest (EOI) is an event that warrants further investigation. When considering the unending flow of events across all systems, analysts quickly realize they cannot process them all. Filtering is necessary to identify which events are interesting. The definition of interesting will vary from organization to organization.
Consider the events shown in Figure 2. This illustration annotates several of the events:
- Unrecognized browser session: The web application server records the start of a new session for a user who has not previously logged in.
- Access from source IP 95.181.86.2: A Web Application Firewall (WAF) logs an access request from 95.181.86.2, an IP address that is known to be associated with malicious activity by Cyber Threat Intelligence (CTI) sources.
- Unusual access time: A system accesses an application at 3:00 AM, a time when the system is typically not in use.
- Incorrect username login attempt: A user attempts to log in with the username admin, which is not a valid username for the system.
Each of these events could indicate an incident, or they could be benign. However, any single event by itself provides limited insight. Events are best reviewed in context, and that context is often provided by other events, ordered by time, source, reporting system, and other factors. While some individual events may indicate the presence of an incident, analysts often need to assess the flow of events as a whole to identify an incident.
Consider the updated illustration shown in Figure 3. This illustration merges some of the previous EOI into an aggregated view, denoting multiple login failure events and a successful login followed by an MFA failure as aggregate EOI. This view is still not necessarily an incident, depending on organizational policy (see the sidebar Organizational Definition of EOI). However, the aggregate EOI provides a fuller picture of the events that could indicate an incident warranting investigation.
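The kind of aggregation shown in Figure 3 can be sketched in code. The following Python fragment is a minimal illustration using hypothetical event tuples and an arbitrary policy of three or more login failures within five minutes; a production pipeline would read events from a SIEM or log store instead.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical event stream: (timestamp, source, event_type)
events = [
    (datetime(2025, 3, 12, 3, 0, 5),  "95.181.86.2", "login_failure"),
    (datetime(2025, 3, 12, 3, 0, 9),  "95.181.86.2", "login_failure"),
    (datetime(2025, 3, 12, 3, 0, 14), "95.181.86.2", "login_failure"),
    (datetime(2025, 3, 12, 3, 1, 2),  "95.181.86.2", "login_success"),
    (datetime(2025, 3, 12, 9, 15, 0), "10.0.0.8",    "login_failure"),
]

def aggregate_eoi(events, window=timedelta(minutes=5), threshold=3):
    """Flag sources with `threshold` or more login failures inside `window`."""
    failures = defaultdict(list)
    for ts, src, etype in events:
        if etype == "login_failure":
            failures[src].append(ts)
    flagged = []
    for src, times in sorted(failures.items()):
        times.sort()
        for i in range(len(times)):
            # Count failures that fall within `window` of the i-th failure
            in_window = [t for t in times[i:] if t - times[i] <= window]
            if len(in_window) >= threshold:
                flagged.append(src)
                break
    return flagged

print(aggregate_eoi(events))  # → ['95.181.86.2']
```

The individual failures are unremarkable; only the aggregate view (three failures from the same source in under ten seconds) rises to an EOI worth an analyst’s attention.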
For the incident response team, events are an important data source that provides insights into potential incidents. However, they are not the only sources analysts use to assess threats and incidents.
Indicator of Compromise
An Indicator of Compromise (IOC) is evidence of a successful compromise. While some events can also be used as IOCs, IOCs are typically more specific data points. IOCs are associated with a source of intelligence, such as a threat intelligence feed, to provide the insight needed to characterize evidence as produced by a compromise. IOCs may be characterized as atomic, computed, or behavioral. [3]
The most commonly reported IOC type is an atomic IOC. The phrase atomic means the IOC is a single, specific data point that can’t be broken down further. Atomic IOCs are valuable as an analysis tool because they rarely generate false positives. They are also easy to apply in automated detection systems, as no additional computation or analysis is required to identify the IOC. Common examples of atomic IOCs include file names, registry keys, URLs, and IP addresses.
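Because atomic IOCs require no additional computation, applying them can be as simple as a substring or set-membership check. A minimal Python sketch, using hypothetical watchlist values rather than indicators from a real feed:

```python
# Hypothetical atomic IOC watchlist (IP addresses from a threat feed)
ioc_ips = {"95.181.86.2", "203.0.113.77"}

log_entries = [
    "Administrator accessed UniFi OS via unifi.ui.com. Source IP: 95.181.86.2",
    "User login from Source IP: 192.168.1.20",
]

def match_atomic_ioc(entry, iocs):
    """Return the first IOC value that appears verbatim in the log entry, if any."""
    return next((ioc for ioc in sorted(iocs) if ioc in entry), None)

for entry in log_entries:
    hit = match_atomic_ioc(entry, ioc_ips)
    if hit:
        print(f"IOC match on {hit}: {entry}")
```

Checks this simple are cheap enough to run inline in a log pipeline, which is why atomic IOCs are the easiest indicator type to automate.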
A computed IOC is derived from data observed in a system after some calculation or analysis. For example, an attacker might randomize the name of a file used to stage a persistence mechanism on a compromised Windows workstation, thereby preventing easy atomic IOC identification based on the file name. However, computing a hash of the file contents, disregarding the file name, produces an IOC that can be used to identify the file. Another example of a computed IOC is a regular expression match against command-line content: the pattern can trigger an IOC alert when the command line exceeds a specified length or contains content that matches a known attack pattern. Like atomic IOCs, computed IOCs are easy to apply in automated detection systems, but require slightly more processing to identify.
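Both forms of computed IOC described above can be sketched in a few lines of Python. The payload bytes and the encoded-PowerShell pattern below are illustrative assumptions, not indicators from a real intelligence feed:

```python
import hashlib
import re

# Computed IOC 1: hash the file contents, ignoring the randomized file name
def content_hash(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

payload = b"example persistence payload"  # stand-in for the staged file's bytes
print(content_hash(payload))

# Computed IOC 2: a pattern for base64-encoded PowerShell command lines
encoded_ps = re.compile(
    r"powershell(\.exe)?\s+.*-enc(odedcommand)?\s+[A-Za-z0-9+/=]{20,}",
    re.IGNORECASE,
)

cmdline = "powershell.exe -nop -w hidden -enc SQBFAFgAIAAoAE4AZQB3AC0ATwBiAGoA"
print(bool(encoded_ps.search(cmdline)))  # → True
```

The hash is stable no matter what the attacker names the file, and the regular expression fires on the behavior (an encoded command) rather than any single literal value.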
Finally, a behavioral IOC is a data point derived from the combined analysis of attacker behavior, events, atomic IOCs, and computed IOCs. Behavioral IOCs are more complex to identify, as they require analyzing data and understanding attacker Tactics, Techniques, and Procedures (TTPs). They also require a baseline of normal system behavior. Behavioral IOCs are also a valuable tool for detecting incidents, allowing analysts to identify attackers using new or unknown techniques through anomaly analysis or pattern matching.
Analysts work with IOCs to identify the presence of an attacker in a system, on the network, in logging data, or as part of forensic analysis. Threat intelligence feeds provide a valuable source of IOCs, allowing analysts to identify known threats and apply indicators to their systems to detect attacker presence. For example, consider the IOCs shown here, disclosing logging entries when an attacker exploits a vulnerability in the Fortinet FortiGate firewall product (CVE-2024-55591):
Following is an admin creation log entry with a seemingly randomly generated user name and source:
type="event" subtype="system" level="information" vd="root" logdesc="Object attribute configured" user="admin" ui="jsconsole(127.0.0.1)" action="Add" cfgtid=1411317760 cfgpath="system.admin" cfgobj="vOcep" cfgattr="password[*]accprofile[super_admin]vdom[root]" msg="Add system.admin vOcep" [4]
Using this threat intelligence, analysts can examine Fortinet device logs to identify successful exploitation of the Fortinet FortiOS vulnerability described in CVE-2024-55591. By developing a detection capability that can automatically evaluate Fortinet device logs, analysts can use the IOC to identify compromised Fortinet devices. In this case, analysts can match the log entry using an atomic IOC by focusing on the static elements of the log entry. Using the syntax described by the Sigma detection format, analysts can develop a computed IOC using pattern matching with the regular expression syntax. [5] A sample Sigma rule is included in Listing 2.
title: Fortinet FortiOS Exploit Attempt - FG-IR-24-535
id: 8c67a5b8-9e6b-4f34-a90f-7b728f02856e
status: experimental
description: Detects exploitation of CVE-2024-55591 against Fortinet FortiOS devices where an attacker creates a new admin user.
references:
    - https://www.fortiguard.com/psirt/FG-IR-24-535
author: Joshua Wright
date: 2025-03-12
logsource:
    category: firewall
    product: fortinet
    service: fortios
detection:
    keywords: (1)
        - 'type="event"'
        - 'subtype="system"'
        - 'logdesc="Object attribute configured"'
        - 'action="Add"'
        - 'cfgpath="system.admin"'
        - 'cfgattr="password[*]accprofile[super_admin]vdom[root]"'
    condition: keywords
falsepositives:
    - Unknown
level: high
tags:
    - attack.persistence
    - cve.2024.55591
    - cve.2024.24472
    - fortios
    - fortinet
| 1 | The keywords used in the Sigma rule to identify the FortiOS exploit matching the IOC intelligence disclosed in the Fortinet advisory. |
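To show how little machinery a keyword-style detection needs, the Python sketch below checks the Fortinet log entry against the rule’s keyword list. Note that it requires every keyword to match, which is stricter than Sigma’s semantics (values in a Sigma list are OR-linked); the all-keywords check is used here only to illustrate matching the specific exploitation log entry:

```python
# Keywords taken from the Sigma rule in Listing 2
KEYWORDS = [
    'type="event"',
    'subtype="system"',
    'logdesc="Object attribute configured"',
    'action="Add"',
    'cfgpath="system.admin"',
    'cfgattr="password[*]accprofile[super_admin]vdom[root]"',
]

def matches_all_keywords(entry: str, keywords=KEYWORDS) -> bool:
    """Simplified detection: every keyword must appear as a substring."""
    return all(kw in entry for kw in keywords)

exploit_log = (
    'type="event" subtype="system" level="information" vd="root" '
    'logdesc="Object attribute configured" user="admin" '
    'ui="jsconsole(127.0.0.1)" action="Add" cfgtid=1411317760 '
    'cfgpath="system.admin" cfgobj="vOcep" '
    'cfgattr="password[*]accprofile[super_admin]vdom[root]" '
    'msg="Add system.admin vOcep"'
)
benign_log = 'type="event" subtype="system" logdesc="Admin login successful"'

print(matches_all_keywords(exploit_log))  # → True
print(matches_all_keywords(benign_log))   # → False
```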
An IOC is a piece of intelligence that analysts can make actionable. Whether applied through Sigma rules for log detection, an Endpoint Detection and Response (EDR) product, a Network Detection and Response (NDR) product, or another tool, the IOC represents the actionable insight needed to identify incidents.
Mean Time To Detect
Mean time to detect (MTTD) is a metric that measures the average time it takes to identify a security incident. It is commonly used to gauge the performance of an organization’s SOC and incident response team.
MTTD is calculated by taking the total time to detect all incidents over a specific period and dividing it by the number of incidents detected during that period:
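Written as a formula, where N is the number of incidents detected in the period:

```latex
\mathrm{MTTD} = \frac{1}{N} \sum_{i=1}^{N} \left( t_{\text{detected},\,i} - t_{\text{occurred},\,i} \right)
```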
As a performance metric, MTTD is useful for tracking the time lag between when an incident occurs and when it is detected. For example, if an attacker compromises a system and remains undetected for several days, the MTTD will be high, indicating that the organization’s detection capabilities need improvement. Conversely, a low MTTD indicates that the organization can quickly identify and respond to incidents.
Challenges with MTTD Measurement
MTTD is a broad metric that can vary significantly based on the type of incident, the organization’s size and complexity, and the maturity of its security operations. According to various security reports, average MTTD ranges from days to months, depending on the organization and attack type. The IBM Cost of a Data Breach Report indicates that organizations take an average of 204 days to identify a breach. [7] The Verizon Data Breach Investigations Report shows significant variance across different attack vectors, with some incidents detected in minutes while others remain undetected for months. [8] Organizations with mature security operations centers (SOCs) and robust detection capabilities often achieve MTTD measured in hours or even minutes for critical incidents. Industry leaders with advanced threat detection and automated monitoring systems report MTTD of less than twenty-four hours for most incident types. [9] [10]
Another challenge with MTTD is defining when an incident occurs. In many cases, the exact time of the compromise is unknown, and organizations should estimate the occurrence time based on the earliest observed IOC. However, this metric is subject to change, and the occurrence time will be updated as new evidence is discovered during the investigation. Often, the occurrence time is only known after the incident has been fully investigated and remediated, making the MTTD a retrospective metric.
For MTTD to be useful, it should be tracked over time and consistently defined across the organization. Organizations should establish clear criteria for what constitutes an incident and ensure that all incidents are logged with accurate timestamps for when they occurred and when they were detected. By tracking MTTD over time, organizations can identify trends, assess the effectiveness of their detection capabilities, and make informed decisions about where to invest in improving their security posture.
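As a worked example, the MTTD calculation can be sketched as follows. The incident timestamps are hypothetical:

```python
from datetime import datetime

# Hypothetical incident records: (occurred, detected)
incidents = [
    (datetime(2025, 1, 2, 8, 0),  datetime(2025, 1, 5, 8, 0)),   # 72 hours
    (datetime(2025, 2, 10, 0, 0), datetime(2025, 2, 10, 6, 0)),  # 6 hours
    (datetime(2025, 3, 1, 12, 0), datetime(2025, 3, 2, 12, 0)),  # 24 hours
]

def mttd_hours(incidents):
    """Average lag between occurrence and detection, in hours."""
    total = sum((detected - occurred).total_seconds()
                for occurred, detected in incidents)
    return total / len(incidents) / 3600

print(f"MTTD: {mttd_hours(incidents):.1f} hours")  # → MTTD: 34.0 hours
```

As the chapter notes, the occurrence timestamps are the weakest input here: they are usually estimates based on the earliest observed IOC and may be revised as the investigation proceeds.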
Mean Time To Respond
Mean time to respond (MTTR) is a metric that measures the average time to respond to and resolve a security incident after it is detected. While MTTD measures the efficacy of an organization’s detection processes, MTTR measures the efficiency of its incident response capabilities.
MTTR is calculated by taking the total time to respond to and resolve all incidents over a specific period and dividing it by the number of incidents resolved during that period:
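Written as a formula, where N is the number of incidents resolved in the period:

```latex
\mathrm{MTTR} = \frac{1}{N} \sum_{i=1}^{N} \left( t_{\text{resolved},\,i} - t_{\text{detected},\,i} \right)
```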
As a performance metric, MTTR helps track how quickly an organization can contain, resolve, and recover from security incidents. For example, if an organization takes an average of forty-eight hours to respond to and resolve incidents, this provides a baseline for measuring improvement efforts. A lower MTTR indicates that the organization has effective incident response processes, adequate staffing, and appropriate tools to quickly address security threats. Conversely, a high MTTR may indicate gaps in response procedures, resource constraints, or technical challenges that slow down incident resolution.
MTTR is often used alongside MTTD to provide a comprehensive view of an organization’s security posture. While MTTD measures how quickly threats are identified, MTTR measures how quickly they are addressed. Together, these metrics help organizations understand the complete lifecycle from initial compromise to resolution.
The total time from when an incident occurs to when it is resolved can be expressed as:
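Combining the two metrics:

```latex
t_{\text{resolved}} - t_{\text{occurred}} = \mathrm{MTTD} + \mathrm{MTTR}
```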
Challenges with MTTR Measurement
For MTTR to be a useful metric, organizations should clearly define what constitutes "resolution" for each incident type. This might include containing the threat, removing the attacker’s presence, recovering affected systems, and verifying that the threat has been fully addressed. Consistency in measurement is crucial for MTTR to be a valuable metric, as different teams or departments may have varying definitions of what it means to resolve an incident.
Like MTTD, MTTR varies widely by incident type, organization size and complexity, and the maturity of its incident response processes. For example, a simple phishing incident in which a user clicks a malicious link might be resolved in a matter of hours. A complex ransomware attack that requires system restoration from backups could take weeks or months to resolve. Because the impact of these incidents varies significantly, organizations can track MTTR using different categories.
- Incident type: Different incident types (e.g., malware infection, data breach, insider threat) can have vastly different MTTRs. Tracking MTTR by incident type provides more granular insight into response effectiveness.
- Severity level: Incidents can be categorized by priority and severity (e.g., P0/Critical, P1/High, P2/Medium, and P3/Low), with MTTR tracked separately for each level.
- Using percentiles: Instead of relying solely on average MTTR, organizations can track median and percentile MTTR (e.g., 90th percentile) to better understand the distribution of response times and identify outliers. MTTR P50 (the median) reflects the typical response time, while MTTR P90 highlights the longer tail of incidents that take significantly more time to resolve.
- Time to containment vs. time to resolution: Organizations can track MTTR in stages, such as time to containment (stopping the immediate threat) versus time to resolution (complete recovery and validation). Containment is often a critical milestone that prevents further damage, and measuring it separately from overall resolution time gives visibility into each stage of the response.
Incident response teams should work with decision makers to determine the most relevant categories for tracking and reporting MTTR. The focus should be on what provides the most actionable insight to improve incident response effectiveness and to measure the impact of investments in tools, training, and process improvements.