Chapter 8 — Log Analysis

Logs are the most mundane and the most essential forensic source. Every modern system (operating systems, applications, network devices, security tools, cloud services) emits records of its activity: authentication attempts, file accesses, configuration changes, network connections, errors, transactions. In a serious incident investigation, log analysis is rarely glamorous but almost always central. This chapter covers which logs are forensically valuable, how logs are correlated with each other and with other system information, how logs are stored and retrieved in a way that supports legal use, and the standard log formats, applications, and tools that make log analysis practical at scale.

8.1 Analysing system, application, and access logs

Logs

A log is a structured record of events generated by a system, application, or device, typically containing a timestamp, a source identifier, a description of the event, and supporting metadata, persisted in a file or sent to a logging service for later analysis.

Logs differ from other digital evidence in being designed, in principle, for examination. Their format is documented; their content is intentional; their retention is part of normal operations. Forensic value emerges from this designed-for-inspection nature.

Categories of logs

System logs. Generated by the operating system about its own operation.

Windows. The Windows Event Log subsystem categorises events into channels: Security, System, Application, Setup, Forwarded Events, and many application-specific channels. Stored in .evtx format under C:\Windows\System32\winevt\Logs\.
Linux. Traditionally the /var/log/ directory, with syslog, messages, auth.log, kern.log, and others. Modern systemd-based distributions use journald, queried with journalctl; the journal is stored in binary form under /var/log/journal/ when persistent storage is enabled.
macOS. Unified logging system since macOS Sierra (2016); accessible via log show and Console.app. Older files in /var/log/.

Application logs. Generated by applications.

Web servers. Apache access_log and error_log; Nginx access.log and error.log; IIS logs.
Database servers. MySQL general query log, slow query log, binary log; PostgreSQL log files; Oracle alert log; SQL Server error log.
Application servers. Tomcat catalina.out; JBoss/WildFly logs; node.js application logs; Django/Flask logs.
Mail servers. Postfix, sendmail, Exchange logs.
Security tools. IDS/IPS alert logs; antivirus quarantine logs; EDR telemetry; vulnerability-scanner reports.

Access logs. Subset of application logs focused on access events.

VPN logs. Connection and disconnection events; IP addresses assigned; durations.
Authentication logs. Login attempts (successful and failed); MFA events; privilege escalations.
Building access (badge) logs. Where applicable for combined physical-digital investigations.
File-access auditing. On Windows with auditing enabled, every access to monitored files.

Windows event logs

Particularly important for forensics. The principal channels:

Security log. The most important channel forensically. Records:

Logon/logoff events (4624 success, 4625 failure, 4634 logoff).
Account management: creation (4720), enabling (4722), modification (4738), deletion (4726).
Privilege use (4672, 4673, 4674).
Object access (4663 — when auditing is enabled).
Process creation (4688 — when auditing is enabled).
Policy changes (4719).
Many others; the full list runs to hundreds of event IDs.

System log. Operating-system events — service starts and stops, driver loads, shutdown events.

Application log. Application-emitted events — often less structured than Security log entries.

Forwarded Events. Centralised collection from other systems via Windows Event Forwarding.

Application and Service Logs. Per-application channels — PowerShell (Microsoft-Windows-PowerShell), AppLocker, Sysmon (if installed), Task Scheduler.

Sysmon deserves a separate note. The System Monitor service from Microsoft Sysinternals provides much more granular telemetry than the default Windows audit subsystem: process creation with command line, network connections, file creates, registry modifications, image loads. Sysmon logs to Microsoft-Windows-Sysmon/Operational. It is the de facto standard in security-conscious Windows environments.

Linux logs

Common files in /var/log/:

auth.log / secure — authentication events (SSH logins, sudo, su).
messages / syslog — general system messages.
kern.log — kernel messages.
dmesg — boot messages.
wtmp / btmp — login records (binary; viewed with last and lastb).
audit.log — Linux Auditd records (if enabled).
dpkg.log / yum.log — package management.
nginx/, apache2/, mysql/ — application directories.

Linux's auditd subsystem provides Windows-Security-Log-style detailed auditing when configured.

Web server logs

A web server log line in Apache Combined Log Format:

192.168.1.100 - alice [21/May/2026:14:30:15 +0545] "GET /admin/dashboard HTTP/1.1" 200 5420 "https://example.com/" "Mozilla/5.0 ..."

Fields: source IP, identd (rarely used), authenticated user, timestamp (with time zone), request line (method, URL, HTTP version), status code, response size, referer, user-agent.

For forensics:

Source IP localises traffic.
Authenticated user attributes activity.
URLs reveal what resources were accessed.
Status codes show success and failure.
User-agent identifies the client type (browser, tool, bot).

For a Nepali bank's web server, log lines like the above provide the timeline of administrative access to the management dashboard — invaluable for investigating unauthorised access.
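The field breakdown above can be turned into a small parser. The following is a minimal sketch, assuming well-formed Combined Log Format lines; the regular expression and the function name parse_combined are illustrative, not part of Apache, and malformed or truncated request lines will simply fail to match.

```python
import re
from datetime import datetime

# Illustrative pattern for the Combined Log Format fields described above.
COMBINED_RE = re.compile(
    r'(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_combined(line):
    """Return a dict of fields, or None if the line does not match."""
    m = COMBINED_RE.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    # %z accepts offsets such as +0545, so the time zone is preserved.
    rec["time"] = datetime.strptime(rec["time"], "%d/%b/%Y:%H:%M:%S %z")
    rec["status"] = int(rec["status"])
    rec["size"] = 0 if rec["size"] == "-" else int(rec["size"])
    return rec


line = ('192.168.1.100 - alice [21/May/2026:14:30:15 +0545] '
        '"GET /admin/dashboard HTTP/1.1" 200 5420 '
        '"https://example.com/" "Mozilla/5.0 ..."')
rec = parse_combined(line)
print(rec["user"], rec["url"], rec["status"], rec["time"].isoformat())
```

Keeping the timestamp as a time-zone-aware value, rather than a string, makes later conversion to UTC and comparison with other sources straightforward.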

Database logs

For databases hosting customer or transactional data:

MySQL/MariaDB. General query log records every query; slow query log records queries above a threshold; binary log records changes (used for replication and point-in-time recovery, also forensically useful).
PostgreSQL. Configurable; can log statements, durations, connections, errors.
MongoDB. Default logs to /var/log/mongodb/mongod.log.

Database logs are critical when investigating data exfiltration — what queries were run, by which user, returning what volumes of data.

Application logs in practice

Beyond standard servers, applications log their own activity. Logging discipline varies dramatically:

Mature applications log structured events with timestamps, correlation IDs, user identifiers, and machine-readable fields (JSON is common).
Less mature applications log unstructured text, often without timestamps or with locale-specific timestamps, sometimes with sensitive data (passwords, tokens) leaked into log output.

For a forensic analyst working with diverse application logs, format conversion and normalisation are routine tasks.
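As an illustration of such normalisation, the sketch below maps one JSON log line and one unstructured text line onto a common record with a UTC timestamp. Both input formats, the field names (timestamp, event, user, action), and the function names are hypothetical; a real application's format has to be taken from its documentation or inferred from samples.

```python
import json
import re
from datetime import datetime, timedelta, timezone

# Hypothetical legacy text format: "21/05/2026 14:31:02 user=alice action=export"
LEGACY_RE = re.compile(
    r'(?P<ts>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}) '
    r'user=(?P<user>\S+) action=(?P<action>\S+)'
)


def normalise_json(line, source):
    """Normalise a JSON event whose timestamp is ISO 8601 with an offset."""
    obj = json.loads(line)
    ts = datetime.fromisoformat(obj["timestamp"])
    return {"time": ts.astimezone(timezone.utc), "source": source,
            "user": obj.get("user"), "action": obj.get("event")}


def normalise_legacy(line, source, tz):
    """Normalise a legacy text event; the zone is not in the line, so the
    analyst must supply it (and document how it was established)."""
    m = LEGACY_RE.match(line)
    if m is None:
        return None
    local = datetime.strptime(m["ts"], "%d/%m/%Y %H:%M:%S").replace(tzinfo=tz)
    return {"time": local.astimezone(timezone.utc), "source": source,
            "user": m["user"], "action": m["action"]}


NPT = timezone(timedelta(hours=5, minutes=45))  # Nepal Time, UTC+05:45
a = normalise_json(
    '{"timestamp": "2026-05-21T14:30:15+05:45", "user": "alice", "event": "login"}',
    source="webapp")
b = normalise_legacy("21/05/2026 14:31:02 user=alice action=export",
                     source="legacy-app", tz=NPT)
print(a["time"].isoformat(), b["time"].isoformat())
```

Once every source produces the same keys and UTC times, the records can be merged and sorted without per-source special cases.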

8.2 Correlating logs with other system information

A single log entry rarely tells the full story. Correlation across sources is where investigations come together.

Why correlation matters

A typical incident timeline involves:

A user receives a phishing email (mail server log).
The user clicks a link (web proxy log, EDR network log).
A document is downloaded (browser cache, EDR file activity).
The document opens; a macro runs (Sysmon process creation events).
PowerShell is launched (Sysmon, PowerShell logging).
A network connection is made to a C2 (firewall logs, Sysmon network events, EDR).
Reconnaissance is performed (Sysmon process activity, Security log).
Credentials are dumped (Security log event 4688, EDR, possibly memory artefacts).
Lateral movement (additional logon events on other systems, network logs).
Data is collected and exfiltrated (file-access events, network logs).

Each step touches different log sources. The investigator's job is to assemble them into a coherent narrative.

Correlation by time

The simplest and most fundamental form of correlation. Two events on different systems within a brief window of each other may well be related.

For time-based correlation to work:

Every system's clock must be synchronised (NTP).
Time zones must be tracked.
Timestamps must have appropriate precision (millisecond-level for fast-moving sequences; second-level is often sufficient for slower ones).

Clock-skew is a common forensic problem. A system whose clock is 5 minutes off introduces consistent 5-minute discrepancies. Documenting clock-skew and correcting for it is part of careful analysis.
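A minimal sketch of the correction, assuming the skew of each source has already been measured and documented. The function name to_corrected_utc and the sample events are illustrative.

```python
from datetime import datetime, timedelta, timezone


def to_corrected_utc(ts, skew=timedelta(0)):
    """Convert an aware timestamp to UTC and remove the documented skew.

    skew is positive when the source clock ran ahead of reference time.
    """
    if ts.tzinfo is None:
        raise ValueError("naive timestamp: establish the time zone first")
    return ts.astimezone(timezone.utc) - skew


NPT = timezone(timedelta(hours=5, minutes=45))  # Nepal Time, UTC+05:45

# Firewall logs in UTC with an accurate clock.
firewall_event = datetime(2026, 5, 21, 8, 45, 20, tzinfo=timezone.utc)
# Server logs in local time, and its clock was 5 minutes fast.
server_event = datetime(2026, 5, 21, 14, 35, 15, tzinfo=NPT)

corrected = to_corrected_utc(server_event, skew=timedelta(minutes=5))
gap = abs(corrected - to_corrected_utc(firewall_event))
print(corrected.isoformat(), gap)
```

After correction the two events are 5 seconds apart instead of appearing roughly 5 minutes apart. Refusing naive timestamps forces the time zone to be decided explicitly rather than guessed.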

Correlation by identity

User identifier, host identifier, IP address, session ID. Following a single identity through multiple sources.

A user alice shows up:

In the AD Security log on the domain controller (logon events).
In the firewall log (her workstation's IP).
In the web proxy log (her HTTP traffic).
In the EDR telemetry (process activity on her workstation).
In application logs (queries to the customer database).
In the cloud audit log (her actions in AWS).

Stitching her activity across all sources is correlation by identity.

Correlation by indicator

An IOC — a hash, an IP, a domain — observed in one source motivates a search across other sources. Threat intelligence may identify an IP as suspicious; correlating that IP across firewall, proxy, web server, and EDR logs builds the picture of what connected to it and what they did.

Correlation by host

For investigating a specific compromised host, every log mentioning that host's hostname or IP, in chronological order, is the starting point.
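The identity, indicator, and host pivots share one mechanical step: select the events that match a chosen value across all sources and order them by time. The sketch below assumes events have already been normalised into dictionaries with a UTC time field; the source names, field names, and build_timeline itself are hypothetical.

```python
from datetime import datetime, timezone


def build_timeline(sources, **pivot):
    """Merge events from several sources, keep those matching every
    pivot field, and return them in chronological order."""
    merged = []
    for name, events in sources.items():
        for ev in events:
            if all(ev.get(k) == v for k, v in pivot.items()):
                merged.append({**ev, "source": name})
    return sorted(merged, key=lambda ev: ev["time"])


def t(hour, minute):
    """Helper for sample data: a UTC time on 2026-05-15."""
    return datetime(2026, 5, 15, hour, minute, tzinfo=timezone.utc)


sources = {
    "vpn": [{"time": t(2, 30), "user": "alice", "ip": "203.0.113.42",
             "event": "connect"}],
    "ad": [{"time": t(2, 34), "user": "alice", "host": "db01", "event": "logon"},
           {"time": t(2, 33), "user": "bob", "host": "ws07", "event": "logon"}],
    "proxy": [{"time": t(2, 31), "user": "alice", "host": "ws12",
               "event": "GET /"}],
}

timeline = build_timeline(sources, user="alice")
for ev in timeline:
    print(ev["time"].isoformat(), ev["source"], ev["event"])
```

The same function serves the other pivots, for example build_timeline(sources, host="db01") or build_timeline(sources, ip="203.0.113.42").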

Tools for correlation

Doing correlation by hand on text files is feasible for small investigations. For larger investigations, structured tools are essential:

SIEM platforms. Splunk, IBM QRadar, Microsoft Sentinel, Elastic Stack, Wazuh. Section 8.4 discusses these.
Specialised forensic suites. Magnet AXIOM, Cellebrite Pathfinder, FTK include log-correlation features.
Custom scripting. Python with pandas, jq for JSON logs, command-line utilities.

The SIEM as correlation engine

A SIEM (Security Information and Event Management) system is a platform that aggregates logs and security events from many sources, normalises them into a common schema, indexes them for fast search, applies correlation rules to detect patterns spanning multiple sources, and provides analyst interfaces for investigation and dashboarding.

A SIEM is the operational tool that institutionalises correlation. Common SIEM use cases:

Search. "Show all events involving IP 203.0.113.42 in the last 30 days."
Detection rules. "Alert when more than 10 failed logins arrive from a single IP within 5 minutes."
Dashboards. Real-time monitoring of security metrics.
Forensic queries. Historical analysis after an incident.
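The failed-login detection rule quoted above amounts to a sliding-window count per source IP. The following sketch shows that logic in plain Python; it is not how any particular SIEM implements its rules, and the function name and sample data are illustrative.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta, timezone


def detect_bruteforce(events, threshold=10, window=timedelta(minutes=5)):
    """Yield (ip, time) each time an IP exceeds `threshold` failed logins
    within `window`. `events` is an iterable of (time, ip) pairs for
    failed logins, sorted by time."""
    recent = defaultdict(deque)
    for when, ip in events:
        q = recent[ip]
        q.append(when)
        # Drop failures that have fallen out of the window.
        while when - q[0] > window:
            q.popleft()
        if len(q) > threshold:
            yield ip, when


start = datetime(2026, 5, 21, 8, 0, tzinfo=timezone.utc)
# 11 failures in 200 seconds from one address, 3 spread-out failures from another.
events = [(start + timedelta(seconds=20 * i), "203.0.113.42") for i in range(11)]
events += [(start + timedelta(minutes=i), "198.51.100.7") for i in range(3)]
events.sort()

alerts = list(detect_bruteforce(events))
print(alerts)
```

The threshold and window are parameters because they are tuning decisions: too low and the rule fires on users who mistype passwords, too high and slow password-guessing goes unnoticed.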

Correlating logs with non-log information

Beyond logs, several other information sources support investigations:

Configuration data. What was the system configured to do at the time of the incident? Asset inventory, configuration management database (CMDB), IT-service-management records.
HR data. Who was employed, in what role, with what authority at the relevant time?
Physical access records. Badge logs, video surveillance.
External threat intelligence. What is known about the apparent attacker; what have they done elsewhere.
Vendor data. What is the cloud provider's audit trail; what does the SaaS provider's log show.

Modern SIEM and security platforms increasingly integrate these — particularly the relationship between HR records and digital identity records (so that an investigator can see what user alice@example.com corresponds to in the HR system, and her role and access level at the time).

8.3 Storing and retrieving logs for legal admissibility

Logs that will potentially be used in legal proceedings have additional requirements beyond operational utility.

Retention requirements

How long must logs be kept?

Regulatory. Banking, telecom, and healthcare regulators typically require 1-7 years depending on the data and jurisdiction. Nepal Rastra Bank's directives include retention periods for various banking records.
Contractual. Customer contracts may specify retention.
Investigative. Incidents may not be discovered for months. Logs from before discovery must still exist.
Legal hold. Once litigation is anticipated, all potentially-relevant data (including logs) must be preserved — even beyond normal retention.

Common practice: 90 days online (fast retrieval), 1-2 years near-line (slower retrieval), 5+ years archive (slow but available). The exact periods depend on regulation and cost.

Integrity requirements

Logs used in legal proceedings must be shown to be unaltered:

Hash chains. Each log file or chunk is hashed together with the hash of the previous one, so altering any chunk invalidates every later hash; the hashes themselves are recorded elsewhere.
Write-once storage. Log archive on storage that cannot be modified after writing (WORM media, S3 Object Lock, Azure immutable storage).
Centralisation. Forwarding logs to a central logging server reduces the chance of in-place tampering by attackers on the originating system.
Authenticated, encrypted transports. Syslog over TLS with mutual authentication; secure log-forwarding agents.
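One common reading of a hash chain is that each digest covers both the current chunk and the previous digest, so a change to any chunk shows up in every later link. A minimal sketch under that assumption, using SHA-256; the function names and the all-zero starting value are illustrative choices.

```python
import hashlib

GENESIS = "0" * 64  # arbitrary, documented starting value for the chain


def chain_hashes(chunks, prev=GENESIS):
    """Return one hex digest per chunk; each digest covers the chunk
    and the digest that precedes it."""
    digests = []
    for chunk in chunks:
        prev = hashlib.sha256(prev.encode("ascii") + chunk).hexdigest()
        digests.append(prev)
    return digests


def first_broken_link(chunks, recorded):
    """Return the index of the first chunk that disagrees with the
    recorded digests, or None if the whole chain verifies."""
    for i, (got, want) in enumerate(zip(chain_hashes(chunks), recorded)):
        if got != want:
            return i
    if len(chunks) != len(recorded):
        # Chunks were appended or removed at the end.
        return min(len(chunks), len(recorded))
    return None


chunks = [b"line 1\n", b"line 2\n", b"line 3\n"]
recorded = chain_hashes(chunks)  # stored away from the log host

tampered = [chunks[0], b"edited\n", chunks[2]]
print(first_broken_link(chunks, recorded), first_broken_link(tampered, recorded))
```

The scheme only helps if the recorded digests are kept somewhere an attacker on the log host cannot rewrite them.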

Chain of custody for log evidence

When logs are produced as evidence, the same chain-of-custody discipline as other digital evidence applies:

The logs were generated by the system in normal operation (foundation).
The logs were retained without modification (integrity).
The export of logs for evidence was done by a documented person at a documented time using documented tools (chain of custody).
The export hash is computed and recorded.
The export is preserved unchanged through analysis to presentation.
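The export steps above can be captured in a small record written at export time. This is a sketch only: the record layout, the function names, and the examiner and tool values are hypothetical, and an organisation's own evidence-handling procedure determines what must actually be recorded.

```python
import hashlib
import json
import os
import tempfile
from datetime import datetime, timezone


def sha256_file(path, bufsize=1 << 20):
    """Hash a file in blocks so that large exports do not need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(bufsize), b""):
            h.update(block)
    return h.hexdigest()


def custody_record(path, examiner, tool):
    """Describe who exported which file, when, with what tool, and its hash."""
    return {
        "file": str(path),
        "sha256": sha256_file(path),
        "exported_by": examiner,
        "tool": tool,
        "exported_at_utc": datetime.now(timezone.utc).isoformat(),
    }


# Demonstration on a throwaway file standing in for a real export.
with tempfile.TemporaryDirectory() as d:
    export = os.path.join(d, "security-export.log")
    with open(export, "wb") as f:
        f.write(b"sample export\n")
    record = custody_record(export, examiner="analyst01",
                            tool="hypothetical-export-tool 1.0")

print(json.dumps(record, indent=2))
```

Recomputing the hash before analysis and again before presentation, and comparing it with the recorded value, demonstrates that the export has not changed in between.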

Centralisation

A central log management system simplifies many things:

Single point of search. Query across all systems from one place.
Tamper resistance. Attackers who compromise a host cannot easily modify logs that have already been forwarded.
Retention management. Standard retention policies applied uniformly.
Access control. Restricted access to log data; separation of duties between system administrators and security analysts.

In Nepal:

Major commercial banks centralise logs to SIEM platforms; logs are subject to retention requirements under NRB directives.
Telecom operators (NTC, Ncell) maintain centralised logging for regulatory and operational purposes.
Government agencies vary; the Government Integrated Data Centre has central logging capability with varying coverage.
Many smaller organisations have minimal centralised logging — a major weakness exposed in 2024-25 incidents.

Regulatory frameworks for log retention in Nepal

Sector-specific:

Nepal Rastra Bank directives include log retention periods for banks. Specific periods vary by data type but are typically multi-year.
Nepal Telecommunications Authority has retention requirements for telecom operators.
Electronic Transactions Act has broad requirements for record retention.
Privacy Act 2075 affects what can be logged about individuals.

For an MSc student or practitioner, the operational rule: consult sector-specific regulation and legal counsel for retention requirements; default to longer rather than shorter when in doubt; document the retention policy and demonstrate compliance.

Reconstructing from incomplete logs

Real investigations frequently encounter incomplete logs:

The log was not enabled at the time.
The log rotated out before the incident was discovered.
The host was compromised and logs were deleted.
The central log server crashed and a window of data was lost.

Reconstruction from indirect sources:

Other systems' logs that mention the missing one.
Backups of the host or logs taken before the gap.
Memory artefacts (Chapter 3) if memory was captured during the gap.
Network captures (Chapter 4) if captures exist for the time.
Application data (Chapter 2) — file modification times, registry, browser history.

A complete reconstruction is rarely possible. The analyst documents what was determined, what could not be determined, and the limitations.
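Before reconstructing, it helps to establish where the gaps actually are. The sketch below flags unusually long silences in a sequence of event timestamps; the 10-minute threshold and the function name are assumptions, and a silence is only a lead, since a genuinely idle system also produces no events.

```python
from datetime import datetime, timedelta, timezone


def find_gaps(timestamps, max_silence=timedelta(minutes=10)):
    """Return (last_seen, next_seen) pairs where consecutive events are
    further apart than max_silence."""
    ordered = sorted(timestamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > max_silence]


base = datetime(2026, 5, 15, 2, 0, tzinfo=timezone.utc)
# Events every minute, then nothing between 02:03 and 02:45.
stamps = [base + timedelta(minutes=m) for m in (0, 1, 2, 3, 45, 46)]

gaps = find_gaps(stamps)
for last_seen, next_seen in gaps:
    print("no events between", last_seen.isoformat(), "and", next_seen.isoformat())
```

The threshold should be set per source from its normal event rate: a busy web server that is silent for a minute is suspicious, while a badge reader silent overnight is not.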

8.4 Common log formats, applications, and tools

Common log formats

Syslog (RFC 3164 and RFC 5424). The traditional Unix log format. RFC 3164 documents the historical BSD syslog; RFC 5424 is the modern structured version, with timestamps in the RFC 3339 profile of ISO 8601.

Example RFC 5424:

<165>1 2026-05-21T14:30:15.123+05:45 server01 sshd 1234 - - Failed password for invalid user admin from 203.0.113.42 port 54321 ssh2

Components: priority, version, timestamp, hostname, application, process ID, message ID, structured data, free-form message.
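The components can be pulled apart with a regular expression, and the priority value decodes as facility × 8 + severity. The following is a simplified sketch: the pattern and the function name parse_rfc5424 are illustrative, and structured data containing escaped brackets is not handled.

```python
import re
from datetime import datetime

# Simplified pattern for the RFC 5424 header fields listed above.
SYSLOG_RE = re.compile(
    r'<(?P<pri>\d{1,3})>(?P<version>\d{1,2}) '
    r'(?P<timestamp>\S+) (?P<hostname>\S+) (?P<app>\S+) '
    r'(?P<procid>\S+) (?P<msgid>\S+) '
    r'(?P<sd>-|(?:\[[^\]]*\])+) ?(?P<msg>.*)'
)


def parse_rfc5424(line):
    """Return a dict of header fields plus decoded facility and severity."""
    m = SYSLOG_RE.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    # Priority = facility * 8 + severity.
    rec["facility"], rec["severity"] = divmod(int(rec.pop("pri")), 8)
    ts = rec["timestamp"]
    if ts != "-":  # "-" is the nil value for a missing timestamp
        if ts.endswith("Z"):
            ts = ts[:-1] + "+00:00"
        # Caveat: before Python 3.11, fromisoformat accepts only 3 or 6
        # fractional digits, while RFC 5424 allows 1 to 6.
        rec["timestamp"] = datetime.fromisoformat(ts)
    return rec


line = ('<165>1 2026-05-21T14:30:15.123+05:45 server01 sshd 1234 - - '
        'Failed password for invalid user admin from 203.0.113.42 port 54321 ssh2')
rec = parse_rfc5424(line)
print(rec["facility"], rec["severity"], rec["hostname"], rec["app"], rec["msg"])
```

For the example line, priority 165 decodes to facility 20 (local4) and severity 5 (notice).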

Apache Common Log Format / Combined Log Format. Web server logs as shown in Section 8.1.

Windows Event Log (EVTX). Binary XML structure. Each event has an ID, timestamp, source, level, computer, user, and event-specific data fields.

JSON logs. Modern applications increasingly emit JSON. Each event is a JSON object with named fields. Easy to parse programmatically. Common in cloud-native applications.

Common Event Format (CEF). ArcSight-developed; widely supported. Structured text format.

Log Event Extended Format (LEEF). IBM QRadar's format. Similar to CEF.

Sigma rules. Not a log format but a generic detection-rule format that can be translated to SIEM-specific queries.

Log management tools

The Elastic Stack (formerly the ELK Stack).

The dominant open-source log-management platform.

Elasticsearch. Distributed search engine. Stores and indexes logs for fast queries.
Logstash. Log ingestion and processing pipeline.
Kibana. Visualisation and dashboarding.
Beats (Filebeat, Winlogbeat, Auditbeat). Lightweight agents on the source systems forwarding logs to Logstash or Elasticsearch.

The Elastic Stack is the standard self-hosted SIEM alternative. Commercial Elastic offers managed services and additional features (Elastic Security with detection and response capabilities).

OpenSearch.

A fork of Elasticsearch created by Amazon after Elastic's licence changes in 2021 and, since 2024, governed by the OpenSearch Software Foundation under the Linux Foundation. Functionally similar to Elasticsearch. Used in AWS-hosted log analytics and by organisations preferring an Apache-licensed alternative.

Splunk.

The commercial leader in log management. Powerful query language (SPL); strong analytics; expensive at scale. Used by many large enterprises and security teams. Splunk Enterprise Security adds SIEM features on top of the base platform.

Graylog.

Open-source SIEM-style log management. Lighter weight than ELK. Used by smaller organisations.

Wazuh.

Open-source SIEM and XDR. Originally built on the Elastic Stack; current releases ship their own indexer and dashboard (derived from OpenSearch) alongside security-specific agents and rule sets. Increasingly adopted by Nepali organisations as a cost-effective SIEM option.

SIEM and SOAR platforms

Commercial SIEM:

Splunk Enterprise Security.
Microsoft Sentinel. Cloud-native SIEM on Azure.
IBM QRadar. Long-established enterprise SIEM.
Sumo Logic.
LogRhythm.
Securonix.
Google Chronicle (Google Security Operations). Hyperscale log analytics.

Open-source / lower-cost:

Wazuh.
OSSIM (open-source SIEM, from AT&T Cybersecurity / now LevelBlue).
TheHive + Cortex for case management + analyzer orchestration (incident-response complement to SIEM).

SOAR (Security Orchestration, Automation and Response). Adjacent to SIEM. Automates response workflows.

Splunk SOAR (formerly Phantom).
Palo Alto Cortex XSOAR.
IBM SOAR.
TheHive's Cortex (some SOAR-like features).

Building a SIEM with open-source tools

The syllabus for this subject specifically mentions developing a SIEM based on open-source tools — a common laboratory project. A typical reference architecture:

Log collection. Filebeat / Winlogbeat / Auditbeat on endpoints; rsyslog or syslog-ng on network devices; cloud-log streaming.
Ingestion and parsing. Logstash or direct ingestion into Elasticsearch / OpenSearch.
Storage and indexing. Elasticsearch / OpenSearch cluster.
Dashboarding. Kibana / OpenSearch Dashboards.
Detection rules. Sigma rules translated to detection queries; ElastAlert or built-in Elastic Security for alerting.
Investigation. TheHive for case management.

For an MSc project at IOE Pulchowk, this is a feasible 1-3 month exercise: build the stack on a few VMs, configure sample sources, import Sigma rules, and generate sample events to trigger detections. It produces both learning and a usable artefact.

Sigma rules — vendor-agnostic detection

Sigma is a generic, YAML-based rule format for log-based detection. A Sigma rule looks like:

title: PowerShell Encoded Command
description: Detects PowerShell launched with encoded command
logsource:
    category: process_creation
    product: windows
detection:
    selection:
        Image|endswith: '\powershell.exe'
        CommandLine|contains:
            - '-EncodedCommand'
            - '-enc '
    condition: selection
level: medium

A converter translates a Sigma rule into a specific SIEM's native query language; the current tooling is sigma-cli, built on pySigma, which replaced the older sigmac. Sigma's value: write the detection logic once; deploy across multiple SIEMs.

The SigmaHQ project on GitHub hosts a large community-maintained rule repository.
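To make the semantics of the example rule concrete, the sketch below evaluates its selection directly against an event dictionary. It is a toy evaluator written for illustration, not pySigma and not a substitute for a real converter: it handles only the contains, startswith, and endswith modifiers, the selection is transcribed by hand rather than parsed from YAML, and the sample events are invented.

```python
def _match(value, pattern, modifier):
    """Compare one value with one pattern; Sigma string matching is
    case-insensitive by default."""
    value, pattern = str(value).lower(), pattern.lower()
    if modifier == "contains":
        return pattern in value
    if modifier == "startswith":
        return value.startswith(pattern)
    if modifier == "endswith":
        return value.endswith(pattern)
    return value == pattern


def selection_matches(selection, event):
    """All fields in a selection must match (AND); a list of values
    under one field matches if any value matches (OR)."""
    for key, expected in selection.items():
        field, _, modifier = key.partition("|")
        if field not in event:
            return False
        patterns = expected if isinstance(expected, list) else [expected]
        if not any(_match(event[field], p, modifier) for p in patterns):
            return False
    return True


# The selection from the rule above, transcribed by hand.
selection = {
    "Image|endswith": "\\powershell.exe",
    "CommandLine|contains": ["-EncodedCommand", "-enc "],
}

suspicious = {
    "Image": "C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe",
    "CommandLine": "powershell.exe -NoP -enc SQBFAFgA",
}
benign = {
    "Image": "C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe",
    "CommandLine": "powershell.exe -File report.ps1",
}
print(selection_matches(selection, suspicious), selection_matches(selection, benign))
```

Working through a rule this way is a useful check before deployment: it makes explicit which fields the log source must provide and which events the rule would and would not match.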

A worked log-analysis exercise

A bank suspects unauthorised access to a sensitive database. Investigation begins with logs.

Phase 1 — Authentication logs (Active Directory).

Query Windows Security log for logon events to the database server. Find: a successful logon to the server at 2026-05-15 02:34 UTC from an account alice that does not normally log on at this hour.

Phase 2 — Workstation logs.

Identify the source workstation IP from the AD logon event. Pull the workstation's Security log around the same time. Find: alice's account logged on to the workstation at 02:31 from a different internal address, and EDR shows that the logon was an inbound remote-desktop session.

Phase 3 — Network logs.

Pull firewall and VPN logs for the source IP at the time. Find: the IP traces back to a VPN session from outside the corporate network. VPN logs show the session was authenticated as alice from an unusual geographic location.

Phase 4 — VPN authentication.

Pull VPN authentication logs. Find: the VPN session used the correct username and password, and no MFA challenge was ever issued. A configuration audit reveals that MFA was misconfigured on the VPN concentrator for one user group, which included alice's account.

Phase 5 — Database logs.

Pull the database server's audit logs for the session. Find: queries against several customer tables; large result sets returned; possibly exported via the database export functionality.

Phase 6 — Application logs.

Pull the database management application's logs. Find: a CSV export was generated and downloaded.

Phase 7 — Correlation and reconstruction.

Combined timeline:

02:30 UTC — VPN connection from external IP authenticated as alice without MFA challenge.
02:31 UTC — Remote-desktop to alice's workstation.
02:34 UTC — RDP from workstation to database server.
02:35-02:50 UTC — Database queries against customer tables.
02:48 UTC — CSV export generated.
02:50 UTC — Disconnect.

Phase 8 — Verification.

Contact alice. She was in Kathmandu at the time, was not using the VPN, and did not authorise the access. The compromise is confirmed: her credentials were stolen, and the MFA misconfiguration let the attacker log in without a second factor.

Phase 9 — Scope.

Pull all logon events from the same source IP — additional accounts may have been targeted. Pull all VPN connections without MFA challenge in the past month — identify other potentially-compromised sessions.
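Both scoping queries are simple filters once the records are normalised. The sketch below assumes VPN sessions and logon events are available as dictionaries; the field names (start, src_ip, mfa_challenged), the function names, and the sample data are all hypothetical.

```python
from datetime import datetime, timedelta, timezone


def sessions_without_mfa(sessions, now, lookback=timedelta(days=30)):
    """Return sessions inside the lookback window with no MFA challenge
    recorded. A missing field is treated as 'no challenge' so that
    incomplete records are reviewed rather than skipped."""
    cutoff = now - lookback
    return [s for s in sessions
            if s["start"] >= cutoff and not s.get("mfa_challenged", False)]


def accounts_from_ip(logons, ip):
    """Return the distinct accounts seen logging on from one source IP."""
    return sorted({e["user"] for e in logons if e["src_ip"] == ip})


utc = timezone.utc
now = datetime(2026, 5, 15, 9, 0, tzinfo=utc)
sessions = [
    {"user": "alice", "start": datetime(2026, 5, 15, 2, 30, tzinfo=utc),
     "src_ip": "203.0.113.42", "mfa_challenged": False},
    {"user": "bob", "start": datetime(2026, 5, 10, 6, 0, tzinfo=utc),
     "src_ip": "198.51.100.7", "mfa_challenged": True},
    {"user": "carol", "start": datetime(2026, 3, 1, 6, 0, tzinfo=utc),
     "src_ip": "198.51.100.9", "mfa_challenged": False},  # outside the window
]
logons = [
    {"user": "alice", "src_ip": "203.0.113.42"},
    {"user": "svc_backup", "src_ip": "203.0.113.42"},
    {"user": "bob", "src_ip": "198.51.100.7"},
]

suspect = sessions_without_mfa(sessions, now)
print([s["user"] for s in suspect], accounts_from_ip(logons, "203.0.113.42"))
```

In practice the same filters would be written in the SIEM's query language; the point of the sketch is the logic, including the decision to treat missing MFA fields as suspicious.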

Phase 10 — Remediation.

Force password reset for alice and other potentially-affected users.
Fix MFA configuration on VPN concentrator.
Notify affected customers about potential data exposure.
File regulatory report to NRB.

Phase 11 — IOC sharing.

VPN source IP and any related IPs shared with sector peers and npCERT.
Detection rules added to SIEM for similar future patterns.

The exercise illustrates how logs, properly collected and analysed, reconstruct an incident in detail. Without the logs, the investigation would be limited to whatever the attacker forgot to clean up.

Common pitfalls in log analysis

Time-zone confusion. Logs from different systems in different time zones; the analyst converts incorrectly.
Identity mapping errors. Account alice on one system may not be the same alice on another (separate local accounts, federated identities); likewise, one IP address may stand for many hosts behind NAT.
Missing context. A log entry alone is rarely sufficient; an event looks abnormal only against a baseline of what normal activity looks like on that system.
Volume overwhelm. Too many events; the analyst gives up before finding the relevant ones. Strategic filtering and query design are essential.
Tool gaps. A SIEM may not have parsers for every log source; raw logs may need custom processing.
Confirmation bias. Finding what is sought; ignoring contradicting data.

The discipline that overcomes these is methodical work — define the timeline of interest, define the identities of interest, define the actions of interest, search systematically across all relevant sources, document findings, validate interpretations, and remain open to alternative explanations.

Learning path

For an MSc student:

Hands-on with system logs. Set up Sysmon on a Windows test system; explore the Application and Service Logs.
Set up a small SIEM. Wazuh or Elastic Stack on VMs. Forward logs from a few sources.
Practice queries. Build dashboards for common patterns. Translate Sigma rules.
Solve CTF challenges. Many CTFs include log-analysis tracks. SANS DFIR challenges; CyberDefenders.
Read real incident reports. Mandiant, CrowdStrike, and other public reports describe what was found in logs; reverse-engineer the queries that would have produced the findings.

Log analysis is unglamorous but indispensable. A forensic-and-incident-response programme that does logs well will resolve incidents that programmes lacking log discipline never even notice. The discipline rewards the practitioner who develops it: where intuition takes the investigator to a hypothesis, logs provide the evidence that confirms or refutes it.

The syllabus of this subject ends here, but the practice of digital forensics and incident response begins. The skills covered across these eight chapters — the legal and procedural foundations, the techniques for disk, memory, network, mobile, and cloud evidence, the analysis of malware and the analysis of logs — combine in real investigations. Few cases use just one chapter's techniques; most use several together. The career that an MSNCS graduate enters in 2026 — whether as a forensic examiner at the Cyber Bureau or a Nepali commercial bank, as an incident responder at npCERT or an ISP, as a security analyst at a consultancy, or as a researcher continuing into doctoral work — will draw on every chapter, often in the same case.
