Chapter 7 — Malware Analysis

Malware is at the centre of most cybersecurity incidents. A ransomware attack starts with the deployment of an encryptor. A data-exfiltration campaign uses a custom tool. A banking trojan targets eSewa, Khalti, or mobile-banking apps in Nepal. An advanced threat actor maintains persistence with a custom backdoor. In every case, understanding the malware (what it does, how it works, what indicators it leaves) is central to investigation, containment, and prevention of the next attack. This chapter covers what malware analysis is, the static, dynamic, and hybrid analytical approaches, and the standard tools (Cuckoo Sandbox, Ghidra, Volatility) that make the work practical.

7.1 Introduction and importance of malware analysis in digital forensics

Malware analysis

Malware analysis is the discipline of examining malicious software to understand its behaviour, capabilities, origin, and impact, producing technical findings that inform detection, incident response, threat intelligence, and prevention.

The discipline sits at the intersection of digital forensics, reverse engineering, and threat intelligence. The malware itself may be a compiled executable, a script, a document with an embedded macro, a kernel driver, or firmware. The analyst's job is to determine what that code does and what its presence means for the affected organisation.

Categories of malware

Modern malware spans many functional categories. Distinguishing them helps the analyst focus.

Viruses. Code that attaches to other programs and replicates when those programs run. Mostly historical; the term is sometimes used loosely for any malware.

Worms. Self-propagating malware that spreads across networks without user action. WannaCry (2017) was a worm exploiting SMBv1.

Trojans. Code that appears legitimate but executes malicious functions. Banking trojans (Zeus, Emotet, TrickBot, Qakbot historically; newer families continuously emerging) target financial credentials.

Ransomware. Encrypts victim files and demands payment for the decryption key. Conti, LockBit, BlackCat/ALPHV, and many others have dominated criminal-ransomware landscapes in recent years.

Remote Access Trojans (RATs). Provide remote control of infected systems. Used by both criminal and nation-state actors.

Spyware. Covertly monitors user activity. Includes keyloggers, screen capture, microphone/camera activation.

Adware. Displays unwanted advertisements. Often bundled with legitimate software; rarely the focus of forensic work unless tied to broader compromise.

Cryptominers. Use the victim's compute resources to mine cryptocurrency. Common but typically low-impact per host; aggregate impact across many hosts can be substantial.

Rootkits. Hide their presence using OS-level tricks (process hiding, file hiding, hooking). Difficult to detect.

Bootkits / firmware malware. Infect firmware or boot sectors. Persist below the operating-system level.

Mobile malware. Banking trojans for Android (Anubis, FluBot, Cerberus, Brata, and continuously new variants). iOS malware (rare due to platform security) when it exists is usually targeted and sophisticated.

Fileless malware. Operates entirely in memory; does not write executable files to disk. Detection requires memory forensics (Chapter 3).

In Nepal, observed threats include banking-trojan APKs targeting eSewa, Khalti, and mobile-banking app users; phishing-delivered Office-macro malware against organisations; ransomware (with varying degrees of attribution); and occasional nation-state targeting reported in security advisories.

Goals of malware analysis

The analyst typically produces several specific outputs:

Behavioural understanding. What does the malware do when run? Which files does it touch, which network destinations does it contact, which credentials does it harvest, which persistence mechanisms does it install?

Indicators of compromise (IOCs). Specific artefacts that other organisations can use to detect the same malware: file hashes (MD5, SHA-1, SHA-256), file paths, registry keys, mutex names, IP addresses, domain names, URLs, network signatures.

Capabilities. What the malware can do beyond what it did this time — additional commands the C2 can issue, lateral-movement capabilities, data-targeting patterns.

Attribution support. Tactics, techniques, and procedures (TTPs) that may align with known threat groups. Final attribution requires more than malware analysis but malware findings contribute.

Detection signatures. YARA rules, Snort/Suricata signatures, Sigma rules — used by SOC tooling to detect related activity.

Remediation guidance. What to clean from affected systems; what to look for elsewhere.

Role in incident response

Within the incident-response phases (Chapter 1):

  • Detection. Malware signatures enable detection. EDR products use behavioural signatures to flag suspicious processes.
  • Analysis. Once malware is found, analysis determines the scope — what does it do, who else is affected.
  • Containment. Identification of malicious IPs and domains allows network-level blocking.
  • Eradication. Knowing the malware's persistence mechanisms allows complete removal.
  • Recovery. Confidence that the system is clean depends on complete analysis.
  • Lessons learned. New IOCs and TTPs feed back into detection.

For npCERT and the Cyber Bureau in Nepal, malware analysis is a core capability. International collaboration (with regional CERTs, with vendor threat-intelligence teams) supplements the in-country capability — particularly for novel or sophisticated malware.

7.2 Static, dynamic, and hybrid analysis

This section covers the two foundational analytical approaches and their combination.

Static analysis

Static analysis is the examination of malware without executing it: the analyst inspects the file contents, structure, and embedded artefacts to understand its capabilities and likely behaviour. Because the sample never runs, the risk of triggering its actions is minimal, provided it is still handled in a controlled environment.

The analyst inspects the binary, the strings, the imports, the disassembled code, and any embedded resources.

What static analysis can extract:

Strings. Embedded text — error messages, hard-coded paths, URLs, IPs, configuration data, encryption keys, build artefacts. The strings utility extracts printable ASCII and Unicode sequences. Many investigations begin here — what strings does the file contain?
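As a rough illustration of what a strings-style pass does, the sketch below pulls printable ASCII and UTF-16LE runs out of a byte buffer. The function name and the default minimum length of 4 are assumptions chosen for the example. It is not a reimplementation of the real strings utility.

```python
import re


def extract_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return printable ASCII and UTF-16LE strings of at least min_len characters."""
    # Runs of printable ASCII bytes (space through tilde).
    ascii_pat = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    # The same characters stored as UTF-16LE: each byte followed by a zero byte.
    utf16_pat = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)

    found = [m.group().decode("ascii") for m in ascii_pat.finditer(data)]
    found += [m.group().decode("utf-16-le") for m in utf16_pat.finditer(data)]
    return found
```

In practice the input would be the raw bytes of the sample, read in a safe analysis environment, and the output is a starting list to scan for URLs, paths, and configuration fragments.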

File hash. The MD5, SHA-1, and SHA-256 digests of the file. Used for IOC sharing and threat-intelligence lookups (VirusTotal, MalwareBazaar, Hybrid Analysis).
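A minimal sketch of hashing a sample for IOC purposes, using only Python's standard hashlib module. The function name and chunk size are illustrative choices. The file is read in chunks so that large samples do not need to fit in memory.

```python
import hashlib


def file_hashes(path: str, chunk_size: int = 1 << 20) -> dict[str, str]:
    """Compute MD5, SHA-1 and SHA-256 of a file in a single pass."""
    hashers = {name: hashlib.new(name) for name in ("md5", "sha1", "sha256")}
    with open(path, "rb") as fh:
        # Read fixed-size chunks until the file is exhausted.
        while chunk := fh.read(chunk_size):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}
```

The SHA-256 value is usually the one submitted to lookup services. MD5 and SHA-1 are kept mainly because older feeds and tools still index on them.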

File metadata. Timestamps, version information, digital signatures, language. The compile timestamp (PE COFF header) is sometimes intentionally manipulated by attackers but is forensically useful otherwise.

Imports and exports. What APIs the file uses. A program that imports CryptEncrypt, WriteFile, and InternetOpenUrl likely encrypts data, writes to files, and contacts the network — the broad capability outline.

Sections. PE files have sections (.text, .data, .rdata, .rsrc). Unusual section names (such as UPX0 and UPX1 left by the UPX packer, or other non-standard names) often indicate packing or obfuscation.
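The metadata and section checks described above can be sketched with the standard struct module, following the documented PE layout: the offset of the PE signature is stored at 0x3C, the 20-byte COFF header follows the signature, and each section header is 40 bytes with the name in its first 8 bytes. The function name and the allowlist of "expected" section names are assumptions made for this example. Real triage tools apply richer heuristics.

```python
import struct
from datetime import datetime, timezone

# Assumed allowlist of common section names; anything else is merely "worth a look".
EXPECTED_SECTIONS = {".text", ".data", ".rdata", ".rsrc", ".reloc",
                     ".idata", ".edata", ".pdata", ".bss", ".tls"}


def pe_summary(data: bytes) -> dict:
    """Extract the COFF compile timestamp and section names from PE bytes."""
    if data[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("PE signature not found")

    coff = e_lfanew + 4
    # Machine, NumberOfSections, TimeDateStamp, PointerToSymbolTable,
    # NumberOfSymbols, SizeOfOptionalHeader, Characteristics
    _machine, n_sections, timestamp, _symtab, _nsyms, opt_size, _chars = \
        struct.unpack_from("<HHIIIHH", data, coff)

    table = coff + 20 + opt_size  # section table follows the optional header
    names = []
    for i in range(n_sections):
        raw = data[table + 40 * i: table + 40 * i + 8]
        names.append(raw.rstrip(b"\x00").decode("ascii", errors="replace"))

    return {
        "compiled_utc": datetime.fromtimestamp(timestamp, tz=timezone.utc),
        "sections": names,
        "unusual": [n for n in names if n not in EXPECTED_SECTIONS],
    }
```

The timestamp is reported as stored in the header, so a forged value will be reported just as faithfully as a genuine one.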

Resources. Icons, configuration data, embedded executables. Trojans often carry payload as a resource that the main code drops and executes.

Disassembly. Translation of machine code to assembly instructions. Reveals the actual logic.

Decompilation. Higher-level reconstruction — translating machine code or bytecode into approximated C, C++, or other source code. Easier to read than raw assembly.

Limitations of static analysis

Modern malware actively defeats static analysis:

  • Packing. The actual code is compressed or encrypted within the file. Static inspection sees the unpacker plus a blob of unintelligible data. Unpacking is needed before meaningful static analysis.
  • Obfuscation. Code that is intentionally complex — meaningless branches, unused variables, manipulated control flow.
  • Anti-disassembly. Constructs that confuse disassemblers (junk bytes placed where execution never reaches, opposite conditional jumps that both lead to the same target, etc.).
  • Anti-decompilation. Specific patterns that decompilers handle poorly.
  • Self-modifying code. Code that rewrites itself at runtime. Static analysis sees the initial code; the runtime code is different.

For unpacked, unobfuscated malware, static analysis can recover much. For modern commodity malware (with off-the-shelf packers) and especially for targeted/APT malware (custom obfuscation), static analysis alone is often insufficient.
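One common way to spot the "blob of unintelligible data" that packing leaves behind is to measure byte entropy. This heuristic is not named in the text above and is offered here only as an illustrative sketch. Values close to 8 bits per byte suggest compressed or encrypted content. That is a hint rather than proof, because legitimate compressed resources score high as well.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    # Sum -p * log2(p) over the observed byte values.
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(data).values())
```

Applied per section rather than to the whole file, the measure helps decide whether unpacking is needed before deeper static work.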

Dynamic analysis

Dynamic analysis is the examination of malware by executing it in a controlled environment and observing its behaviour — files created, registry modifications, network connections, processes spawned, memory changes — providing direct evidence of what the malware actually does at runtime.

The malware is run in a sandbox — an isolated environment instrumented to record activity.

What dynamic analysis observes:

  • File system activity. What files are created, modified, deleted. Where the malware installs itself.
  • Registry activity (Windows). Keys created or modified. Persistence mechanisms (Run keys, Services, Scheduled Tasks).
  • Process activity. Processes started, parent-child relationships, command-line arguments. Process injection.
  • Network activity. DNS queries, HTTP requests, TCP/UDP connections, payload contents (when not encrypted).
  • Memory activity. Allocations, written executable regions (indicative of self-modifying code or injection).
  • API calls. Specific OS functions invoked, with parameters.
  • Mutexes and synchronisation. Named mutexes often serve as the malware's "I'm already running here" check; useful IOCs.
  • User-interface interaction. Window titles, screenshots of activity.

Sandbox environments

The infrastructure for safe dynamic analysis:

  • Virtual machines. VirtualBox, VMware, Hyper-V VMs with snapshots so the environment can be reset after each analysis. Standard.
  • Specialised sandbox products. Cuckoo Sandbox (and successors), CAPE Sandbox, Joe Sandbox, Any.run, Hybrid Analysis — each provides automated execution and reporting.
  • Bare-metal sandboxes. Physical machines (with imaging-based reset) used when VM-detection evasion is suspected.
  • Network simulation. Tools like INetSim or FakeNet-NG simulate Internet services (DNS, HTTP, FTP, SMTP) for malware to talk to without exposing real Internet.

Isolation requirements. Sandbox networks must not allow malware to reach real systems, leak data, or participate in further attacks. Standard precautions: isolated network segments, no production credentials, no real PII, careful monitoring of outbound connections.

Limitations of dynamic analysis

Dynamic analysis also has limitations:

  • Time-limited observation. A 5-minute sandbox run sees only what the malware does in 5 minutes. Malware with sleep delays, scheduled activation, or specific trigger conditions may not exhibit interesting behaviour in the observation window.
  • Sandbox detection. Modern malware checks for sandbox indicators (virtual-machine artefacts, debugging tools, mouse movement, system uptime, common analysis tool processes) and behaves differently or terminates if detected.
  • Trigger-based behaviour. Malware that only runs malicious code when contacted by a specific C2 server with a specific command. The sandbox sees nothing without that trigger.
  • Environmental dependency. Malware that needs specific software (a particular browser, a specific Office version, a domain-joined system) may not detonate in a generic sandbox.
  • Anti-analysis techniques. Active counter-measures against analysts and analysis tools.

Hybrid analysis

Hybrid analysis combines static and dynamic techniques in an integrated workflow, using each to inform the other — static analysis identifies what to look for in dynamic; dynamic results illuminate what to focus on in static.

The practical workflow:

  1. Initial triage with static analysis. Hashes, strings, basic file structure. Quick.
  2. Dynamic sandbox run. Behaviour summary in 5-10 minutes.
  3. Targeted static analysis. With dynamic findings to guide — disassemble specific functions that were called, decode strings that were obfuscated, examine specific code paths that were executed.
  4. Deeper dynamic analysis. With debugger attached, single-step through interesting routines. Manipulate inputs to trigger paths not seen in automated runs.
  5. Iteration. Repeat as needed.

This is the standard approach in serious malware analysis. Neither static nor dynamic alone is sufficient against modern malware.

Anti-analysis techniques

A glossary of techniques malware uses to defeat analysis:

Packing and crypting. Compressing or encrypting the actual code so static analysis sees only the unpacker.

Code obfuscation. Intentionally complex code structures.

Anti-VM. Checks for VM indicators — MAC addresses (VMware, VirtualBox have characteristic prefixes), registry keys, running processes (VMtools, vboxservice), hardware characteristics (CPU model, manufacturer strings).
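To see the MAC-address check from the analyst's side, the sketch below flags addresses whose vendor prefix is registered to VMware or VirtualBox, which is the kind of artefact a sandbox guest should not expose. The prefix table is a small, non-exhaustive selection and the function name is hypothetical.

```python
from typing import Optional

# Non-exhaustive selection of vendor prefixes (OUIs) used by default
# virtual network adapters.
VM_OUI_PREFIXES = {
    "00:05:69": "VMware",
    "00:0C:29": "VMware",
    "00:1C:14": "VMware",
    "00:50:56": "VMware",
    "08:00:27": "VirtualBox",
}


def vm_vendor_for_mac(mac: str) -> Optional[str]:
    """Return the hypervisor vendor if the MAC has a known VM prefix, else None."""
    normalised = mac.strip().upper().replace("-", ":")
    return VM_OUI_PREFIXES.get(normalised[:8])
```

Running such a check against a sandbox guest before detonation shows whether the adapter's MAC address should be changed to something less recognisable.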

Anti-debugger. Checks for debugger presence (IsDebuggerPresent, exception-based detection, timing checks).

Anti-emulator. Checks for emulation by exploiting differences between real and emulated CPUs.

Anti-disassembly. Code structures that confuse static disassemblers.

Environmental keying. Code that only runs on the intended target system — fingerprinting hardware, domain, location, language settings.

Time bombs. Activation conditional on a specific date or after a delay.

Trigger-based execution. Activation conditional on a specific C2 command, user interaction, file presence, or other condition.

Polymorphism and metamorphism. The malware changes its own code on each replication, defeating signature-based detection.

For analysts, defeating these is part of the work. Standard counter-measures include using realistic sandbox environments, patching anti-VM checks, running for longer durations, and using bare-metal infrastructure when necessary.

7.3 Using tools — Cuckoo Sandbox, Ghidra, and Volatility for malware analysis

The three named tools cover the workflow: dynamic analysis (Cuckoo), static analysis (Ghidra), and memory analysis (Volatility).

Cuckoo Sandbox

Cuckoo Sandbox is an open-source automated malware analysis system that executes suspicious files in instrumented virtual machines, collecting behavioural traces — file activity, network traffic, API calls, memory dumps — and generating detailed reports for analyst review.

Cuckoo was created in 2010 by Claudio Guarnieri. It became the de facto open-source sandbox for many years. The original Cuckoo project's main development slowed; the modern lineage is CAPE Sandbox (the name stands for Config And Payload Extraction), which continues active development and improves on Cuckoo's foundations.

Cuckoo (and CAPE) architecture:

  • A host system orchestrates analysis.
  • One or more guest VMs are instrumented (Windows, Linux, Android variants).
  • The user submits a sample to the host.
  • The host transfers the sample to a guest, executes it, and collects telemetry.
  • A report is generated.

Telemetry collected:

  • API call traces. Calls to the Windows API functions hooked by the monitor, with parameters and return values.
  • File system activity. Files created, modified, deleted.
  • Registry activity. Keys read or written.
  • Network activity. Packets, DNS queries, HTTP requests.
  • Process activity. Processes spawned.
  • Screenshots. Periodic screen captures during execution.
  • Memory dumps. Process memory at termination.

Reporting. Cuckoo / CAPE produce HTML and JSON reports summarising findings. Many SIEM and threat-intelligence platforms can ingest Cuckoo JSON.
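A minimal sketch of pulling network indicators out of a JSON report follows. The key names used here (network, dns, request, http, uri) are assumptions based on the Cuckoo 2.x-style report layout. Report structure differs between Cuckoo and CAPE versions, so check the names against an actual report before relying on them.

```python
import json


def load_report(path: str) -> dict:
    """Load a sandbox JSON report from disk (the path is supplied by the caller)."""
    with open(path, "r", encoding="utf-8") as fh:
        return json.load(fh)


def network_iocs(report: dict) -> dict[str, list[str]]:
    """Collect unique DNS names and HTTP URLs from an assumed report layout."""
    network = report.get("network", {})
    domains = {q["request"] for q in network.get("dns", []) if q.get("request")}
    urls = {h["uri"] for h in network.get("http", []) if h.get("uri")}
    return {"domains": sorted(domains), "urls": sorted(urls)}
```

Missing keys simply produce empty lists, which keeps the helper usable across reports where the sample made no network connections.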

Use cases for an MSc student:

  • Run sample malware (from public corpora like MalwareBazaar, theZoo, or CTF challenges) in a Cuckoo / CAPE instance.
  • Examine the resulting reports.
  • Cross-reference findings with manual analysis.

Setup considerations. Cuckoo/CAPE deployment is non-trivial — requires multiple VMs, host instrumentation, network isolation. Public web-based alternatives (Any.run, Hybrid Analysis, VirusTotal) provide similar capability without local setup, though with the trade-off of submitting samples to third-party services (a concern for sensitive samples).

Ghidra

Ghidra is an open-source software reverse engineering framework developed by the US National Security Agency and released publicly in 2019, providing disassembly, decompilation, and analysis capabilities across many processor architectures, used for malware analysis, vulnerability research, and software-engineering tasks.

Ghidra was a surprise public release — most NSA tools never see daylight. Its release democratised access to capability previously requiring expensive commercial tools (IDA Pro being the established market leader at high cost).

Ghidra capabilities:

  • Disassembly. For Intel x86/x64, ARM, MIPS, PowerPC, and many other architectures. Cross-references, function detection, calling-convention identification.
  • Decompilation. Produces C-like pseudocode from machine code. Critical for understanding logic at a glance. The Ghidra decompiler is widely respected; some analysts prefer it to commercial decompilers.
  • Code-graph visualisation. Control-flow graphs, call graphs.
  • Symbol management. Renaming functions, variables, types as the analyst learns the code.
  • Scripting. Java and Python interfaces for automation. Useful for batch analysis or for specific extraction tasks.
  • Collaboration. Shared projects — multiple analysts can work on the same binary.
  • File format support. PE (Windows), ELF (Linux), Mach-O (macOS), various firmware formats.

Typical Ghidra workflow:

  1. Import the file. Ghidra auto-detects format.
  2. Analyse. Ghidra runs initial automatic analysis — identifying functions, references, strings.
  3. Browse. Examine the entry point, exported functions, suspicious-looking code.
  4. Decompile. Look at functions in C-like form.
  5. Rename and annotate. Replace auto-generated names with meaningful ones as understanding develops.
  6. Cross-reference. Find where strings are used, where functions are called.
  7. Iterate. Build up an understanding of the malware's structure.

Other static analysis tools complementary to Ghidra:

  • IDA Pro / IDA Free. Commercial; the long-standing market leader.
  • radare2 / Rizin / Cutter. Open-source alternatives.
  • Binary Ninja. Modern commercial.
  • PEStudio. PE-file analysis (Windows).
  • CFF Explorer. PE editing and inspection.
  • dnSpy / ILSpy. For .NET malware.
  • strings, file, hexdump. Basic command-line utilities.

For an MSc student, Ghidra is the right primary tool — free, capable, and widely used.

Volatility for malware analysis

Volatility (introduced in Chapter 3) is not just for memory forensics in general; it is specifically powerful for malware analysis.

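Malware-relevant Volatility plugins (names as used in Volatility 2; Volatility 3 provides most equivalents under a windows. prefix, such as windows.malfind):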

  • malfind. Detects executable memory regions in processes that do not correspond to legitimately-loaded modules. Classic injected-shellcode detection.
  • hollowfind. A community-contributed plugin that detects process hollowing (replacing a legitimate process's memory with malicious code).
  • ldrmodules. Compares the loader's list of loaded modules against memory-scan results. Discrepancies indicate hidden or unlinked modules.
  • apihooks. Detects API hooks that malware uses to intercept system calls.
  • callbacks. Lists kernel callbacks; rootkits register here to hide.
  • ssdt. System Service Descriptor Table — Windows kernel hooking point.
  • modscan and driverscan. Find kernel modules that have been hidden.
  • yarascan. Run YARA rules against memory contents. Find specific byte patterns indicating known malware.

A typical Volatility-based malware investigation:

  1. Acquire memory from the suspect system.
  2. pslist and psscan — identify processes; compare results to find hidden ones.
  3. malfind — check each process for injected code.
  4. netscan — identify network connections (potential C2).
  5. dlllist — examine loaded DLLs in suspect processes.
  6. cmdline for each process's launch arguments, cmdscan for console command history.
  7. procdump and memdump — extract process executables and memory for static analysis.
  8. yarascan — apply known-malware signatures.

The extracted process executables go to Ghidra or another disassembler for deeper analysis.
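The pslist and psscan comparison in step 2 reduces to a set difference once both outputs have been parsed. The sketch below assumes each plugin's output has already been turned into a mapping from PID to process name. The parsing step and the function name are hypothetical and depend on the Volatility version and output format in use.

```python
def hidden_process_candidates(pslist: dict[int, str],
                              psscan: dict[int, str]) -> dict[int, str]:
    """Processes found by the memory scan but absent from the active process list."""
    return {pid: name for pid, name in sorted(psscan.items())
            if pid not in pslist}
```

Hits are leads rather than proof: a pool scan also recovers processes that have already exited, so each candidate still needs checking by hand.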

YARA rules

YARA is a rule-language and tool for identifying and classifying malware based on textual or binary patterns, developed by Victor Alvarez (originally at VirusTotal), used by analysts, vendors, and CERTs to specify and share detection signatures.

A YARA rule:

rule sample_banking_trojan {
    meta:
        description = "Detects sample banking trojan"
        author = "researcher"
        date = "2026-05-21"
    strings:
        $a = "config.bin"
        $b = { 4D 5A }    // "MZ" magic of a DOS/PE executable
        $c = "esewa" nocase
    condition:
        // Require the MZ header at offset 0 plus both strings. "MZ" followed
        // by wildcards anywhere in the file would match almost any executable.
        $b at 0 and $a and $c
}

YARA rules are used in:

  • Memory scanning (Volatility's yarascan).
  • Disk scanning (find-malicious-files style).
  • Threat intelligence sharing.
  • Sample classification.

A growing collection of public YARA rules covers known malware families. Maintained by ReversingLabs, Florian Roth, and many security researchers; collected at YARA-Forge and GitHub repositories.

Threat intelligence integration

Malware analysis feeds threat intelligence:

  • IOCs collected during analysis are shared with industry sharing groups, CERTs, and threat-intelligence platforms.
  • Indicators of compromise from external feeds are used to scan for additional affected systems.

In Nepal:

  • npCERT disseminates threat advisories.
  • Cyber Bureau maintains some internal threat data; international cooperation extends it.
  • Banking sector has informal sharing (and is moving toward more formal ISAC-style arrangements through 2024-26).

International feeds:

  • MISP (Malware Information Sharing Platform). Open-source platform for threat-intelligence sharing.
  • AlienVault OTX, ThreatFox, AbuseIPDB, URLhaus. Public threat-intelligence sources.
  • Commercial threat intelligence. Mandiant, CrowdStrike, Recorded Future, Group-IB.

A worked malware-analysis exercise

A Nepali bank receives a phishing email with an attached Excel file. The file is suspected to be malicious. The forensic team performs analysis.

Phase 1 — Triage (5 minutes).

  • Compute SHA-256 hash. Search VirusTotal — 32 of 70 engines detect; classified as a downloader.
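  • File format: Office Open XML. The workbook contains VBA macros, so it is macro-enabled (.xlsm) content; a plain .xlsx file cannot carry macros, whatever extension the attachment was given.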
  • Extract embedded macros with olevba (from oletools).
  • Static reading of the macro shows obfuscated PowerShell invocation.

Phase 2 — Static (30 minutes).

  • Decode the obfuscated PowerShell. Reveals a download-and-execute command targeting a remote payload.
  • The download URL is logged as an IOC.
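One frequent form of obfuscated PowerShell is the -EncodedCommand argument, which carries the script as Base64 over UTF-16LE text. The sketch below assumes that form; the macro in this exercise may use a different scheme, such as string concatenation or character codes. The function names and the URL pattern are illustrative.

```python
import base64
import re

# Deliberately simple URL pattern for triage; it stops at whitespace and quotes.
URL_RE = re.compile(r"https?://[^\s'\"]+")


def decode_encoded_command(b64: str) -> str:
    """Decode a PowerShell -EncodedCommand value (Base64 of UTF-16LE text)."""
    return base64.b64decode(b64).decode("utf-16-le")


def extract_urls(script: str) -> list[str]:
    """Return URLs found in a decoded script, in order of appearance."""
    return URL_RE.findall(script)
```

Decoding is done offline on the extracted string, so the command itself is never executed during this phase.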

Phase 3 — Dynamic (15 minutes).

  • Submit the file to CAPE Sandbox.
  • Observe: PowerShell invocation, network connection to the URL, download of a binary, execution of the binary, additional network connections to a different C2 server.

Phase 4 — Static on the dropped payload (1-3 hours).

  • Acquire the dropped binary from the sandbox memory dumps.
  • Analyse in Ghidra. Functions: keylogging, screen capture, file enumeration in user directories, encrypted communication with C2.
  • Classify as a remote-access trojan / banking trojan.

Phase 5 — Memory analysis on the test sandbox (1 hour).

  • Volatility plugins on the sandbox memory dump.
  • Confirm process injection patterns.
  • Extract C2 communication artefacts.

Phase 6 — IOC compilation and detection.

  • Generate YARA rule from observed strings and code patterns.
  • Compile IOC list: file hash, URLs, C2 IPs, registry keys for persistence, mutex names.
  • Push IOCs to EDR and network detection tooling.
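The IOC list from this phase can be held in a small structured bundle before it is pushed to tooling. The sketch below is one possible shape, with hypothetical field names and no particular sharing standard implied. It validates the hash and IP formats and removes duplicates.

```python
import ipaddress
import json
import re

SHA256_RE = re.compile(r"^[0-9a-fA-F]{64}$")


def build_ioc_bundle(sha256: str, urls: list[str], c2_ips: list[str],
                     registry_keys: list[str], mutexes: list[str]) -> str:
    """Validate and serialise a set of indicators as a JSON document."""
    if not SHA256_RE.match(sha256):
        raise ValueError("sha256 must be 64 hexadecimal characters")
    for ip in c2_ips:
        ipaddress.ip_address(ip)  # raises ValueError on malformed addresses

    bundle = {
        "sha256": sha256.lower(),
        "urls": sorted(set(urls)),
        "c2_ips": sorted(set(c2_ips)),
        "registry_keys": sorted(set(registry_keys)),
        "mutexes": sorted(set(mutexes)),
    }
    return json.dumps(bundle, indent=2)
```

Rejecting malformed values at this point avoids pushing a mistyped hash or address into detection tooling, where it would silently match nothing.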

Phase 7 — Scope assessment.

  • Search internal systems for the IOCs.
  • Identify whether any user opened the attachment or enabled its macros.
  • Initiate cleanup on any affected systems.

Phase 8 — Reporting.

  • Internal report with findings, IOCs, and recommendations.
  • Share IOCs with npCERT and banking-sector contacts if appropriate.

The exercise represents a few hours' to days' work for a competent analyst; the lessons feed into the bank's detection capability for similar future campaigns.

REMnux

REMnux is a Linux distribution specifically configured for malware analysis and reverse engineering, pre-packaged with hundreds of relevant tools, maintained by Lenny Zeltser and the SANS DFIR community.

REMnux includes Cuckoo / CAPE prerequisites, Ghidra, radare2, oletools, network-simulation tools (INetSim, FakeNet), debuggers, and many specialised utilities. Standard environment for malware analysts who want a ready-built toolkit.

For MSc students at IOE Pulchowk, REMnux is a practical learning environment — set up in a VM, follow tutorials, work through sample malware. The combination of REMnux for the analyst's workstation and a separate Cuckoo / CAPE sandbox for safe execution covers the full workflow.

Learning path

For an MSc student wanting to develop malware-analysis capability:

  1. Foundations. Read Practical Malware Analysis (Sikorski and Honig, 2012) — still the canonical introductory text.
  2. Tools. Get Ghidra running. Work through Ghidra tutorials. Install REMnux in a VM.
  3. Practice. Analyse samples from public corpora — MalwareBazaar, ANY.RUN public submissions, vx-underground. Start with simple samples; progress to harder ones.
  4. CTF challenges. Pick up reverse-engineering and malware challenges from CTFs (Flare-On is the annual high-bar event; many smaller CTFs include relevant challenges).
  5. Industry resources. Follow blogs of vendor threat-intelligence teams (Mandiant, Microsoft, CrowdStrike, Trend Micro, Kaspersky GReAT, and many others). The publicly-shared technical writeups are tutorials in current malware techniques.
  6. Develop a specialty. Different analysts focus on different malware families, architectures (x86, ARM, mobile), or attacker categories (criminal, nation-state). Specialising deeply on one area is often more valuable than spreading broadly.

The next chapter turns to the data source that increasingly underpins every other forensic domain — the structured logs that systems and applications generate.
