Chapter 7 — Emerging Trends in Cryptography and Security

The previous chapters covered the established foundations — symmetric and asymmetric cryptography, hash functions, PKI, data security, cybersecurity, and data privacy. This final chapter looks forward. It covers the technical and operational directions the field is moving in over the late 2020s: threat intelligence as a defensive discipline, the post-quantum transition that is the most consequential cryptographic event since RSA's invention, homomorphic encryption as the most ambitious attempt to compute on encrypted data, supply-chain security as a response to a decade of upstream compromises, federated learning as the privacy-preserving alternative to centralised AI training, AI/ML as both a defensive tool and a new attack surface, and several other trends shaping the discipline's near future.

7.1 Threat intelligence and predictive analytics

Threat intelligence is the discipline of collecting, analysing, and operationalising information about adversaries — their tools, infrastructure, techniques, motivations, and targets — to inform defensive decisions, with the goal of moving from reactive incident response to proactive risk reduction.

Threat intelligence shifts defensive posture from "respond when something happens" to "anticipate what is likely and prepare for it." It is now a standard function in mature security organisations, with dedicated teams, commercial feeds, and frameworks for sharing.

Types of threat intelligence

Threat intelligence has different audiences and time horizons.

Strategic. High-level, business-context-aware analysis. Threat actors targeting the industry, geopolitical risks affecting the supply chain, regulatory shifts. Consumed by executives and boards. Time horizon: months to years.

Operational. Information about specific campaigns and adversaries. Which threat actor is currently active, what tools they use, what their target profile looks like. Consumed by security leadership. Time horizon: weeks to months.

Tactical. Tactics, techniques, and procedures (TTPs): how attackers actually operate. The MITRE ATT&CK framework is the standard taxonomy. Consumed by security analysts and engineers designing detections. Time horizon: months.

Technical. Specific indicators of compromise (IoCs) — IP addresses, domain names, file hashes, URLs, email subjects. Consumed by detection systems. Time horizon: days to weeks.

A mature programme operates at all four levels.

Indicators of compromise vs indicators of attack

Indicators of Compromise (IoC). Specific artefacts that suggest a system has been compromised. A known-bad IP address appearing in firewall logs, a known-bad file hash on disk, a specific registry key created by malware. IoCs are atomic and detectable.

Indicators of Attack (IoA). Behavioural patterns that suggest an attack is in progress, independent of specific artefacts. A process spawning a command shell from an Office document, a user authenticating from two locations too far apart to travel between within an hour, a service account suddenly performing administrative actions. IoAs are pattern-based and harder for attackers to vary.

IoCs go stale quickly — attackers rotate infrastructure constantly. IoAs are more durable — the underlying tactics change more slowly. Modern threat-intelligence programmes prioritise IoAs while still consuming IoC feeds for known-bad detection.
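The "impossible travel" indicator of attack mentioned above can be sketched as a speed check between two logins. This is a minimal illustration, not a production detection: the event fields (`lat`, `lon`, `time`), the 900 km/h threshold, and the sample logins are all assumptions made for the example.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def is_impossible_travel(login_a, login_b, max_speed_kmh=900.0):
    """Flag two logins whose implied travel speed exceeds max_speed_kmh."""
    distance = haversine_km(login_a["lat"], login_a["lon"], login_b["lat"], login_b["lon"])
    hours = abs((login_b["time"] - login_a["time"]).total_seconds()) / 3600
    if hours == 0:
        return distance > 0
    return distance / hours > max_speed_kmh

# Hypothetical logins for the same account, 45 minutes apart.
kathmandu = {"lat": 27.7172, "lon": 85.3240, "time": datetime(2026, 1, 10, 9, 0)}
london = {"lat": 51.5074, "lon": -0.1278, "time": datetime(2026, 1, 10, 9, 45)}
print(is_impossible_travel(kathmandu, london))
```

The rule keys on behaviour (implied speed) rather than on any particular IP address, which is why rotating infrastructure does not defeat it. In practice VPN egress points and geolocation errors produce false positives, so a real detection would need allow-lists and tuning.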

MITRE ATT&CK

MITRE ATT&CK is a globally accessible knowledge base of adversary tactics and techniques based on real-world observations, organised into a matrix that maps the techniques attackers use across the phases of an attack lifecycle, maintained by the MITRE Corporation as an open framework that defenders use for threat modelling, gap analysis, and detection engineering.

ATT&CK has become the dominant taxonomy for talking about attacker behaviour. The current version (v17 as of 2026) covers enterprise environments (Windows, macOS, Linux, cloud, network, containers), mobile (iOS, Android), and industrial control systems.

The matrix organises techniques by tactic (the attacker's goal at each stage):

  1. Reconnaissance. Gathering information about the target.
  2. Resource Development. Building or acquiring infrastructure.
  3. Initial Access. Getting into the target environment.
  4. Execution. Running attacker code.
  5. Persistence. Maintaining access.
  6. Privilege Escalation. Getting higher privileges.
  7. Defense Evasion. Avoiding detection.
  8. Credential Access. Stealing credentials.
  9. Discovery. Learning about the environment.
  10. Lateral Movement. Moving to other systems.
  11. Collection. Gathering data of interest.
  12. Command and Control. Communicating with attacker infrastructure.
  13. Exfiltration. Removing data.
  14. Impact. Disrupting or destroying.

Each tactic contains specific techniques and sub-techniques. T1566 (Phishing), T1190 (Exploit Public-Facing Application), T1078 (Valid Accounts) are among the most-used techniques in current real-world attacks.

Uses of ATT&CK.

  • Threat modelling. Map the techniques relevant to the organisation's threat landscape.
  • Detection coverage. Map current detection capabilities to ATT&CK techniques; identify gaps.
  • Threat-hunting hypotheses. "Are attackers using technique X in our environment?"
  • Red-team scope definition. Plan exercises around specific techniques.
  • Incident-response context. Map observed behaviour to known techniques.
  • Vendor evaluation. Compare detection products by ATT&CK coverage.
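The detection-coverage use can be sketched as a set difference between the techniques in the threat model and the techniques that existing rules claim to detect. The rule names and the choice of relevant techniques below are hypothetical; only the technique IDs come from ATT&CK.

```python
# Techniques the (hypothetical) threat model considers relevant.
relevant_techniques = {
    "T1566": "Phishing",
    "T1190": "Exploit Public-Facing Application",
    "T1078": "Valid Accounts",
}

# Hypothetical detection rules mapped to the techniques they cover.
detections = {
    "mail-gateway-attachment-rule": ["T1566"],
    "waf-exploit-signature": ["T1190"],
}

def coverage_gaps(relevant, rules):
    """Return the relevant technique IDs that no detection rule covers."""
    covered = {technique for techniques in rules.values() for technique in techniques}
    return sorted(t for t in relevant if t not in covered)

for technique_id in coverage_gaps(relevant_techniques, detections):
    print(f"No detection for {technique_id} ({relevant_techniques[technique_id]})")
```

A mapping like this only records claimed coverage; whether a rule actually fires against a given technique still has to be validated by testing.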

The ATT&CK Evaluations programme tests commercial endpoint products against scripted attack scenarios from real threat groups, with results published publicly.

STIX and TAXII

Standards for sharing threat intelligence:

STIX (Structured Threat Information eXpression). A standardised format for describing threat-intelligence objects — indicators, threat actors, campaigns, malware, attack patterns. JSON-based since STIX 2. Maintained by OASIS.

TAXII (Trusted Automated Exchange of Intelligence Information). A protocol for transmitting STIX data between organisations and platforms. REST-based.

Together, STIX/TAXII let organisations share intelligence automatically. Information Sharing and Analysis Centres (ISACs) — industry-specific bodies for sharing threat data among members — use STIX/TAXII.
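To make the format concrete, the sketch below builds a STIX 2.1 indicator and wraps it in a bundle using only the Python standard library. The helper name `make_indicator`, the indicator name, and the IP address (taken from a documentation-only range) are assumptions for illustration; a real programme would more likely use a dedicated STIX library and publish the bundle over TAXII.

```python
import json
import uuid
from datetime import datetime, timezone

def make_indicator(name, pattern):
    """Build a minimal STIX 2.1 indicator object as a plain dict."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"
    return {
        "type": "indicator",
        "spec_version": "2.1",
        "id": f"indicator--{uuid.uuid4()}",
        "created": now,
        "modified": now,
        "name": name,
        "pattern": pattern,
        "pattern_type": "stix",
        "valid_from": now,
    }

# 198.51.100.0/24 is reserved for documentation, so this is not a real IoC.
indicator = make_indicator(
    "Known-bad IP (illustrative)",
    "[ipv4-addr:value = '198.51.100.7']",
)
bundle = {"type": "bundle", "id": f"bundle--{uuid.uuid4()}", "objects": [indicator]}
print(json.dumps(bundle, indent=2))
```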

Threat intelligence sources

Commercial feeds. Recorded Future, Mandiant Advantage, CrowdStrike Falcon Intelligence, Intel 471, Flashpoint, Group-IB. Subscription-based, with curated intelligence and analyst services.

Open-source feeds. Abuse.ch (URLhaus, ThreatFox, MalwareBazaar), AlienVault OTX, the SANS Internet Storm Center, MISP communities. Free, varying quality.

Government feeds. CISA's Automated Indicator Sharing (AIS), the FBI's InfraGard, national CERT feeds. Free for qualified organisations.

ISACs. Industry-specific sharing. FS-ISAC (financial services), H-ISAC (health), E-ISAC (electricity), and so on.

Internal intelligence. The most valuable source for any one organisation — what attackers actually do to this organisation. SIEM correlations, EDR detections, incident-response findings, honeypot data.

Dark-web monitoring. Commercial services monitor cybercriminal forums and dark-web markets for mentions of the organisation, exposed credentials, leaked data.

Threat hunting

Threat hunting is the proactive process of searching through environments for adversary activity that has evaded existing detections, based on hypotheses about what attackers might be doing, performed by skilled analysts rather than triggered by alerts.

Hunting is the human-driven counterpart to automated detection. The hunter forms a hypothesis ("attackers using technique X would leave residue Y in our environment"), queries logs and telemetry to test the hypothesis, and either confirms compromise, eliminates the hypothesis, or develops the finding into a new detection rule.
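A hunt for the hypothesis "an Office application spawned a command shell" might look like the sketch below, run over process-creation telemetry. The event schema (`parent_image`, `image`, `host`) and the process lists are assumptions; real telemetry field names depend on the EDR or logging product.

```python
OFFICE_PARENTS = {"winword.exe", "excel.exe", "powerpnt.exe"}
SHELL_CHILDREN = {"cmd.exe", "powershell.exe", "wscript.exe"}

def hunt_office_spawned_shells(process_events):
    """Return events where an Office process is the parent of a shell or script host."""
    hits = []
    for event in process_events:
        parent = event["parent_image"].lower()
        child = event["image"].lower()
        if parent in OFFICE_PARENTS and child in SHELL_CHILDREN:
            hits.append(event)
    return hits

# Hypothetical telemetry records.
events = [
    {"host": "ws-014", "parent_image": "explorer.exe", "image": "WINWORD.EXE"},
    {"host": "ws-014", "parent_image": "WINWORD.EXE", "image": "powershell.exe"},
    {"host": "ws-022", "parent_image": "explorer.exe", "image": "cmd.exe"},
]

for hit in hunt_office_spawned_shells(events):
    print(f"{hit['host']}: {hit['parent_image']} spawned {hit['image']}")
```

If the query returns hits that turn out to be malicious, the same logic can be promoted to a standing detection rule; if it returns nothing, the hypothesis is eliminated for the period covered by the data.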

Mature SOC operations include dedicated hunters who develop and execute hunt missions on a regular cadence.

Predictive analytics in security

The aspirational frontier of threat intelligence: not just describing what has happened but predicting what will happen.

Vulnerability prioritisation. Models predict which CVEs are likely to be exploited based on factors like CVSS score, technical characteristics, exploit code availability, threat-actor interest, and other contextual data about the vulnerability. The EPSS (Exploit Prediction Scoring System), maintained by FIRST, is the standard open model; it outputs a probability that a CVE will be exploited in the wild within the next 30 days. Commercial alternatives include Tenable VPR and Rapid7's risk scoring.
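As a sketch of how an exploit-probability score changes a patch queue, the snippet below ranks findings by EPSS-style probability first and CVSS second. The CVE labels, scores, and the 0.1 urgency threshold are placeholders invented for the example, not real data or a recommended policy.

```python
def prioritise(findings, epss_threshold=0.1):
    """Rank findings by exploit probability, then severity; mark urgent ones."""
    ranked = sorted(findings, key=lambda f: (f["epss"], f["cvss"]), reverse=True)
    return [dict(f, urgent=f["epss"] >= epss_threshold) for f in ranked]

# Placeholder findings: identifiers and scores are illustrative only.
findings = [
    {"cve": "CVE-A (placeholder)", "cvss": 9.8, "epss": 0.02},
    {"cve": "CVE-B (placeholder)", "cvss": 7.5, "epss": 0.91},
    {"cve": "CVE-C (placeholder)", "cvss": 5.3, "epss": 0.30},
]

for finding in prioritise(findings):
    flag = "URGENT" if finding["urgent"] else "queue"
    print(f"{finding['cve']}: epss={finding['epss']:.2f} cvss={finding['cvss']} -> {flag}")
```

The point of the example is that the finding with the highest CVSS score lands last, because its estimated likelihood of exploitation is low.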

Attack-likelihood modelling. Quantitative risk models that estimate probability of compromise given current controls and threat environment. Used in cyber-insurance pricing and in board-level reporting.

Anomaly forecasting. ML models that learn seasonal patterns and predict expected behaviour, flagging deviations. Closely related to UBA (Chapter 6.10).

Threat-actor trajectory analysis. Tracking specific threat actors over time and predicting next targets, sectors, or techniques.

The reliability of predictions varies. The field is genuinely useful for prioritisation but cannot give precise predictions about specific events.

Threat intelligence in Nepal

The current landscape:

  • npCERT is the national CERT and central coordinator. Issues advisories and consumes international feeds.
  • Major banks subscribe to commercial threat-intelligence services through their security operations.
  • The Nepali financial sector exchanges some informal intelligence through the Bankers' Association and informal CISO networks.
  • Cross-border sharing with regional CERTs (APCERT membership) provides regional context.
  • The lack of mature internal SOCs at most organisations limits the actionability of intelligence — having a threat feed is useful only if the organisation can detect and respond.

7.2 Quantum, post-quantum, and quantum-safe cryptography

Quantum-safe cryptography is the broad field encompassing both quantum key distribution (QKD), which uses quantum physics to detect eavesdropping on key exchange, and post-quantum cryptography (PQC), meaning classical algorithms designed to resist attacks by quantum computers. The practical urgency comes from the threat that a sufficiently large quantum computer would break the public-key algorithms (RSA, ECC, classical Diffie-Hellman) on which most modern Internet security depends.

This section was previewed in the Next Generation Networks subject (Chapter 6) and partially covered in this subject's earlier chapters. Here we treat it as a comprehensive topic.

The quantum threat

The threat comes from two specific quantum algorithms.

Shor's algorithm (1994). Peter Shor's algorithm factors integers and computes discrete logarithms in polynomial time on a quantum computer. Direct consequence: RSA, classical Diffie-Hellman, ECDH, ECDSA, and EdDSA all become breakable in feasible time on a sufficiently large quantum computer.

A 2048-bit RSA modulus would take a classical computer essentially forever. Shor's algorithm on a quantum computer with millions of error-corrected qubits would factor it in hours. Recent resource estimates (Gidney and Ekerå 2021) suggest factoring 2048-bit RSA requires around 20 million physical qubits — far beyond current capability but plausibly reachable in 10-20 years.

Grover's algorithm (1996). A quantum search algorithm that provides a quadratic speed-up for unstructured search. Effect on symmetric cryptography: the effective security of an $n$-bit key reduces to $n/2$ bits. AES-128 against Grover has 64-bit equivalent security (no longer adequate); AES-256 has 128-bit equivalent security (still adequate).

Hash functions face similar partial speed-ups (pre-image to $2^{n/2}$, collisions to $2^{n/3}$).
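The arithmetic behind these figures is simple enough to write down. The helper names below are invented for this sketch, and the exponents are asymptotic query counts only: they ignore constant factors and the practical cost of running long quantum computations.

```python
def grover_effective_key_bits(key_bits):
    """Quadratic speed-up: searching 2**n keys takes about 2**(n/2) quantum steps."""
    return key_bits / 2

def quantum_preimage_bits(digest_bits):
    """Grover applied to pre-image search on an n-bit hash."""
    return digest_bits / 2

def quantum_collision_bits(digest_bits):
    """Collision search exponent n/3, as quoted in the text."""
    return digest_bits / 3

for name, bits in [("AES-128", 128), ("AES-256", 256)]:
    print(f"{name}: about {grover_effective_key_bits(bits):.0f}-bit security against Grover")

print(f"SHA-256 pre-image: about 2^{quantum_preimage_bits(256):.0f}")
print(f"SHA-256 collision: about 2^{quantum_collision_bits(256):.1f}")
```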

The "harvest now, decrypt later" threat

The quantum threat to current systems is not just future-tense. Adversaries can capture and store encrypted traffic today and decrypt it years later, once quantum computers exist. For long-lived secrets (government communications, business secrets, financial records, health data, identity information) the threat is effectively immediate.

This is the urgency driving post-quantum migration before sufficiently capable quantum computers exist.

NIST post-quantum standardisation

NIST ran a public competition starting in 2016 to select post-quantum cryptographic algorithms. The process mirrored the AES competition of the late 1990s but was longer (eight years) because the algorithms are more complex and less well-understood.

The standardised algorithms (FIPS 203, 204, 205, published August 2024):

ML-KEM (Module Lattice-Based Key Encapsulation Mechanism), formerly known as Kyber. FIPS 203.

A key encapsulation mechanism: the post-quantum counterpart of Diffie-Hellman key exchange. Based on the Module Learning With Errors (MLWE) problem on structured lattices. Variants:

  • ML-KEM-512: 128-bit security, public key 800 bytes, ciphertext 768 bytes.
  • ML-KEM-768: 192-bit security, public key 1184 bytes, ciphertext 1088 bytes.
  • ML-KEM-1024: 256-bit security, public key 1568 bytes, ciphertext 1568 bytes.

Compared to ECDH at 32-byte public keys, ML-KEM uses 24-50× more bandwidth — a significant cost for high-volume protocols.
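The bandwidth comparison follows directly from the sizes listed above. A short calculation, using the 32-byte ECDH (X25519) public key as the baseline:

```python
X25519_PUBLIC_KEY_BYTES = 32

# (public key bytes, ciphertext bytes) as listed for each ML-KEM variant.
ML_KEM_SIZES = {
    "ML-KEM-512": (800, 768),
    "ML-KEM-768": (1184, 1088),
    "ML-KEM-1024": (1568, 1568),
}

def size_ratios(sizes, baseline=X25519_PUBLIC_KEY_BYTES):
    """Return (public key ratio, ciphertext ratio) relative to the baseline."""
    return {name: (pk / baseline, ct / baseline) for name, (pk, ct) in sizes.items()}

for name, (pk_ratio, ct_ratio) in size_ratios(ML_KEM_SIZES).items():
    print(f"{name}: public key {pk_ratio:.0f}x, ciphertext {ct_ratio:.0f}x")
```

The ratios run from 24 (the ML-KEM-512 ciphertext) to 49 (both ML-KEM-1024 values), which is where the quoted range comes from.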

ML-DSA (Module Lattice-Based Digital Signature Algorithm), formerly Dilithium. FIPS 204.

A digital signature algorithm. Lattice-based, similar mathematical foundation to ML-KEM.

  • ML-DSA-44: 128-bit security, public key 1312 bytes, signature 2420 bytes.
  • ML-DSA-65: 192-bit security, public key 1952 bytes, signature 3309 bytes.
  • ML-DSA-87: 256-bit security, public key 2592 bytes, signature 4627 bytes.

Compared to Ed25519 at 64-byte signatures, ML-DSA signatures are 38-72× larger.

SLH-DSA (Stateless Hash-Based Digital Signature Algorithm), formerly SPHINCS+. FIPS 205.

A hash-based signature scheme. Security depends only on hash function properties — no lattice assumptions, no algebraic structure that could be vulnerable to future algorithmic advances. Slower than ML-DSA, with larger signatures (around 8-50 KB), but mathematically conservative.

SLH-DSA is included specifically as a backup against potential cryptanalytic advances against lattice schemes.

HQC (Hamming Quasi-Cyclic). Selected as backup KEM, 2025.

Code-based key encapsulation. Different mathematical foundation from lattices — its security rests on the hardness of decoding random linear codes. Selected as the second KEM standard to provide diversity against the possibility of a lattice-specific cryptanalytic breakthrough. Standardisation expected through 2026-27.

Algorithms that did not make the cut

Several candidates that reached the later rounds of the competition were eliminated by cryptanalytic discoveries during the process:

  • SIKE (Supersingular Isogeny Key Encapsulation). An alternate candidate that advanced to the fourth round. Broken in 2022 by Castryck and Decru in about an hour of computation on a single classical core. A dramatic late-stage break.
  • Rainbow. A multivariate signature scheme. Broken in 2022 by Beullens.
  • GeMSS. Another multivariate scheme, also broken.

The dramatic breaks during standardisation underscore that post-quantum algorithms are less mature than the classical algorithms they replace. Continued cryptanalytic scrutiny remains essential.

Deployment

Real-world deployments began in 2024-25, primarily in hybrid modes, which combine a classical algorithm with a post-quantum algorithm so that the combination stays secure as long as either one remains unbroken.
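The hybrid idea can be sketched as deriving one session key from both shared secrets, so an attacker must recover both to learn the key. The code below implements HKDF (RFC 5869) with the standard library and feeds it the concatenated secrets. It is an illustration under stated assumptions, not the construction of any specific protocol: the function names and the `info` label are invented, and random bytes stand in for the outputs of a real X25519 exchange and a real ML-KEM decapsulation.

```python
import hashlib
import hmac
import os

def hkdf_sha256(ikm, salt, info, length=32):
    """HKDF with SHA-256: extract a pseudorandom key, then expand it."""
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()  # extract step
    okm, block, counter = b"", b"", 1
    while len(okm) < length:  # expand step
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]

def combine_shared_secrets(classical_ss, pq_ss):
    """Derive one session key from the classical and post-quantum secrets."""
    return hkdf_sha256(classical_ss + pq_ss, salt=b"\x00" * 32, info=b"hybrid-kex-sketch")

# Stand-ins for the two shared secrets (assumption: both are 32 bytes).
classical_secret = os.urandom(32)
post_quantum_secret = os.urandom(32)
session_key = combine_shared_secrets(classical_secret, post_quantum_secret)
print(session_key.hex())
```

Because both secrets enter the key derivation, breaking only the classical exchange (for example with a future quantum computer) or only the post-quantum scheme (for example through new cryptanalysis) is not enough to recover the session key.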

TLS hybrid key exchange.

  • Google has deployed X25519+ML-KEM-768 in Chrome and on Google services since 2024.
  • Cloudflare has deployed X25519+ML-KEM-768 for connections to its edge.
  • AWS has deployed hybrid KEM in s2n-tls.
  • Apple has announced PQC plans for iMessage (PQ3 protocol since 2024) and Safari.

SSH hybrid key exchange. OpenSSH has used sntrup761x25519, a hybrid of Streamlined NTRU Prime and X25519, as its default key exchange since version 9.0. An ML-KEM hybrid (mlkem768x25519-sha256) was added in OpenSSH 9.9.

WireGuard. PQC variants are in development; not yet in mainline.

IPsec. IKEv2 hybrid extensions (RFC 9370, building on the intermediate exchange defined in RFC 9242) define how to combine classical and PQC key exchange.

Signal Protocol. The Signal-protocol-based messengers (Signal, WhatsApp) have begun moving to PQC. Signal's PQXDH protocol (2023) adds a Kyber-based exchange.

Code signing and certificates. First experimental PQC certificate authorities exist. Full deployment requires PQC algorithms in browser root stores, certificate-handling libraries, and protocol stacks — a multi-year effort.

QKD (Quantum Key Distribution)

A different approach. QKD uses quantum physics (typically polarised photons over fibre or free-space links) to distribute key material in a way that detects eavesdropping. Eavesdropping necessarily perturbs the quantum state and is therefore detectable.
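Why eavesdropping is detectable can be illustrated with a small classical simulation of a BB84-style exchange. This is a toy model under simplifying assumptions (ideal single photons, no channel noise, an intercept-and-resend attacker); the function name and parameters are invented for the sketch.

```python
import random

def bb84_error_rate(n_photons, eavesdrop, seed=0):
    """Simulate BB84 sifting and return the error rate in the sifted key."""
    rng = random.Random(seed)
    sifted = errors = 0
    for _ in range(n_photons):
        bit = rng.randint(0, 1)
        alice_basis = rng.randint(0, 1)
        photon_bit, photon_basis = bit, alice_basis
        if eavesdrop:
            eve_basis = rng.randint(0, 1)
            if eve_basis != photon_basis:
                photon_bit = rng.randint(0, 1)  # wrong basis gives a random result
            photon_basis = eve_basis  # Eve resends in the basis she measured in
        bob_basis = rng.randint(0, 1)
        bob_bit = photon_bit if bob_basis == photon_basis else rng.randint(0, 1)
        if bob_basis == alice_basis:  # sifting: keep rounds where the bases agree
            sifted += 1
            errors += bob_bit != bit
    return errors / sifted if sifted else 0.0

print(f"no eavesdropper:   {bb84_error_rate(20000, eavesdrop=False):.3f}")
print(f"with eavesdropper: {bb84_error_rate(20000, eavesdrop=True):.3f}")
```

In this model the sifted key is error-free on a clean channel, while intercept-and-resend introduces errors in about a quarter of the sifted bits. Alice and Bob detect the attacker by comparing a sample of their sifted bits.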

QKD offers information-theoretic security for the key-distribution problem but has practical limitations:

  • Distance. Fibre-based QKD is limited to about 100 km without trusted-node relays.
  • Cost. Specialised hardware on both ends.
  • Throughput. Key rates are modest (kilobits per second over typical links).
  • Trust assumptions. QKD does not authenticate the parties by itself; the protocol requires an authenticated classical channel, which is usually authenticated with classical cryptography that is itself quantum-vulnerable.

NSA and similar agencies have publicly stated that PQC is the preferred approach over QKD for most use cases. QKD finds niche deployment in some financial-network links and government communications.

Migration challenges

The post-quantum transition involves several practical difficulties:

  • Algorithmic agility. Systems must support both classical and post-quantum algorithms during transition and must be able to switch as PQC algorithms mature or are replaced.
  • Performance impact. Larger keys and signatures slow handshakes and add bandwidth. Some constrained environments (IoT, embedded) struggle to fit PQC.
  • Standardisation lag. Beyond NIST, standards for using PQC in TLS, IPsec, S/MIME, code signing, and PKI take additional years.
  • Library and toolchain support. OpenSSL, BoringSSL, libsodium, and similar libraries are adding PQC support gradually.
  • Hardware support. HSMs and TPMs need to support new algorithms; existing hardware may need replacement.
  • Browser and OS support. Trust stores and TLS implementations across browsers, mobile OSes, and embedded devices need updates.
  • Cost. The migration is expected to take a decade and consume substantial engineering resources across the industry.

Most expert estimates suggest full transition through the early 2030s, with hybrid deployments dominant for several years before pure PQC takes over.

7.3 Homomorphic encryption

Homomorphic encryption is a class of encryption schemes that allow computations to be performed on ciphertexts such that the decrypted result equals the result of equivalent computations performed on the plaintexts, enabling computation on encrypted data without revealing the data to the computing party.

The idea is striking: a cloud server can compute on a customer's data without ever seeing the plaintext. The customer encrypts the data, sends it to the cloud, the cloud computes (on ciphertext), and the customer decrypts to get the answer. Privacy and utility coexist.

Fully homomorphic encryption was long considered theoretically interesting but practically impossible, until Craig Gentry's 2009 breakthrough showed it was achievable. Since then, performance has improved by many orders of magnitude, and limited deployments are now in production.

Classification

Homomorphic encryption schemes are classified by the operations they support:

Partially Homomorphic Encryption (PHE). Supports unlimited evaluation of one operation type. Examples:

  • RSA without padding is multiplicatively homomorphic: $E(m_1) \cdot E(m_2) = E(m_1 \cdot m_2)$.
  • ElGamal is multiplicatively homomorphic.
  • Paillier (1999) is additively homomorphic.

PHE has been known for decades and used in narrow applications.
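Both kinds of partial homomorphism can be checked with toy parameters. The sketch below uses textbook RSA with the primes 61 and 53, and a Paillier instance with the primes 11 and 13 and generator n + 1. These key sizes are far too small to be secure and the function names are invented for the example; it only demonstrates the algebra.

```python
import math

# Textbook (unpadded) RSA: n = 61 * 53, e = 17, d = 2753.
RSA_N, RSA_E, RSA_D = 3233, 17, 2753

def rsa_enc(m):
    return pow(m, RSA_E, RSA_N)

def rsa_dec(c):
    return pow(c, RSA_D, RSA_N)

# Toy Paillier: n = 11 * 13, g = n + 1, lambda = lcm(10, 12).
P_N = 11 * 13
P_N2 = P_N * P_N
P_LAMBDA = (10 * 12) // math.gcd(10, 12)
P_MU = pow(P_LAMBDA, -1, P_N)  # modular inverse (Python 3.8+)

def paillier_enc(m, r):
    """Encrypt m with randomness r, which must be coprime to n."""
    assert math.gcd(r, P_N) == 1
    return (pow(P_N + 1, m, P_N2) * pow(r, P_N, P_N2)) % P_N2

def paillier_dec(c):
    u = pow(c, P_LAMBDA, P_N2)
    return ((u - 1) // P_N * P_MU) % P_N

# Multiplying RSA ciphertexts multiplies the plaintexts (mod n).
print(rsa_dec(rsa_enc(7) * rsa_enc(6) % RSA_N))
# Multiplying Paillier ciphertexts adds the plaintexts (mod n).
print(paillier_dec(paillier_enc(20, 5) * paillier_enc(22, 7) % P_N2))
```

The same malleability that makes unpadded RSA homomorphic is the reason it is never used without padding for ordinary encryption.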

Somewhat Homomorphic Encryption (SHE). Supports both addition and multiplication, but only for a bounded number of operations. Each operation increases the ciphertext's "noise"; too much noise corrupts the result. SHE schemes can evaluate low-degree polynomials.
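Noise growth can be seen in a toy symmetric scheme over the integers, in the style of the construction by van Dijk, Gentry, Halevi and Vaikuntanathan: a bit is hidden as a small noise term added to a multiple of a secret odd number. The parameters below are tiny and fixed so the behaviour is reproducible; the names and values are assumptions for illustration and offer no security.

```python
import math

SECRET_P = 10007  # secret odd modulus (toy size)

def encrypt(bit, noise, multiple):
    """Ciphertext = multiple * p + 2 * noise + bit. The term 2*noise + bit is the noise."""
    return SECRET_P * multiple + 2 * noise + bit

def decrypt(ciphertext):
    """Correct only while the accumulated noise stays below p."""
    return (ciphertext % SECRET_P) % 2

# Five encryptions of the bit 1, each with noise term 2*4 + 1 = 9.
ones = [encrypt(1, noise=4, multiple=k) for k in (3, 5, 7, 11, 13)]

# Addition of ciphertexts is XOR of bits; the noise terms add.
print(decrypt(ones[0] + ones[1]))

# Multiplication is AND of bits; the noise terms multiply.
product_of_four = math.prod(ones[:4])  # noise 9**4 = 6561, still below p
product_of_five = math.prod(ones)      # noise 9**5 = 59049, exceeds p
print(decrypt(product_of_four), decrypt(product_of_five))
```

Four multiplications decrypt correctly to 1, but the fifth pushes the noise past the modulus and the result is wrong. Bounding the multiplicative depth is what makes a scheme "somewhat" homomorphic.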

Fully Homomorphic Encryption (FHE). Supports arbitrary computation. Achieved by bootstrapping — a technique that refreshes ciphertexts to reduce noise, allowing unlimited operations. Gentry's 2009 thesis was the first FHE construction.

Modern FHE schemes

The current production-grade schemes:

BFV (Brakerski-Fan-Vercauteren). Operates on integer plaintexts. Supports exact arithmetic. Used in scenarios needing precise integer computation.

BGV (Brakerski-Gentry-Vaikuntanathan). Similar to BFV. Both are common choices for integer-style computation.

CKKS (Cheon-Kim-Kim-Song). Operates on approximate (floating-point-style) values. Suited to machine learning and statistical computation where small numerical errors are acceptable. The de facto standard for FHE-based ML.

TFHE (Torus FHE). Specialised for very fast Boolean-circuit evaluation. Suitable for arbitrary computation where the function is expressed as a circuit.

Performance

The current state of FHE performance, approximate as of 2026:

  • A single homomorphic multiplication: milliseconds to seconds depending on parameters.
  • Bootstrapping: hundreds of milliseconds.
  • Compared to plaintext computation, FHE imposes 4-6 orders of magnitude overhead.
  • A neural-network inference on FHE-encrypted input takes seconds to minutes for small networks; large modern networks remain impractical.

Performance has improved by several orders of magnitude since Gentry's original construction, but FHE remains expensive enough that it is used only when the privacy benefit clearly justifies the cost.

Major libraries and frameworks

  • Microsoft SEAL. Open-source. BFV, BGV, CKKS implementations. One of the most widely used FHE libraries.
  • IBM HElib. Open-source. BGV and CKKS implementations.
  • PALISADE / OpenFHE. Open-source library supporting multiple schemes.
  • Zama TFHE-rs. Rust implementation of TFHE.
  • Concrete. Higher-level FHE compiler from Zama.
  • Google FHE Transpiler. Compiles ordinary code to FHE-evaluable form.

Applications

Privacy-preserving machine learning. Train or serve models on encrypted data. A bank can have an ML model evaluate encrypted customer features for fraud scoring without seeing the features. A healthcare provider can run analytics on encrypted patient data. Microsoft, IBM, and several startups have demonstrated production-scale FHE-based ML inference for restricted use cases.

Encrypted databases. Query encrypted data without the database server learning either the query or the result. CryptDB (MIT, 2011) was an early demonstration; commercial products like Duality SecurePlus operate in this space.

Secure aggregation. Sum encrypted values from many parties without revealing any individual value. Useful in surveys, voting, federated learning.

Private set intersection. Two parties find which elements they have in common without revealing their full sets. Used in contact-discovery, fraud-detection partnerships between banks, advertising-attribution computations.

Genomic privacy. Compute on encrypted genome data — DNA matching, variant frequency analysis — without revealing the genome.

Limitations

  • Performance. Still slow enough that FHE is impractical for general use.
  • Communication overhead. Ciphertexts are much larger than plaintexts (typically 10-100×).
  • Function restrictions. Some operations are easier than others in FHE; non-linear functions (activation functions in neural networks, conditional logic) often require polynomial approximations.
  • Key management. FHE keys are large and need careful management.
  • Composability. Building large applications from FHE primitives is non-trivial; libraries help but the engineering challenge is substantial.
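The function-restriction point can be made concrete with the sigmoid activation. FHE schemes evaluate additions and multiplications, so a sigmoid has to be replaced by a polynomial. The sketch below uses the degree-3 Taylor expansion around zero, which is only one possible choice (deployed systems often fit polynomials over a wider interval instead); the function names are invented for the example.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_poly3(x):
    """Degree-3 Taylor approximation of the sigmoid around 0."""
    return 0.5 + x / 4 - x ** 3 / 48

def max_abs_error(lo, hi, steps=200):
    """Largest absolute error of the approximation on a grid over [lo, hi]."""
    points = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return max(abs(sigmoid(x) - sigmoid_poly3(x)) for x in points)

print(f"max error on [-1, 1]: {max_abs_error(-1, 1):.4f}")
print(f"max error on [-4, 4]: {max_abs_error(-4, 4):.4f}")
```

The approximation is accurate near zero and fails badly for larger inputs. Encrypted inference therefore needs inputs scaled into the interval where the polynomial is valid, or a higher-degree polynomial at the cost of more homomorphic multiplications.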

The broader privacy-preserving computation landscape

FHE is one of several technologies for computing on private data:

  • Secure Multi-Party Computation (MPC). Multiple parties jointly compute without revealing inputs. Often faster than FHE for specific protocols.
  • Trusted Execution Environments (TEE). Hardware-based isolation. Much faster than FHE but trusts the hardware.
  • Differential privacy. Adds calibrated noise to outputs. Provides statistical privacy rather than computational.
  • Federated learning. Distributed training without centralising data. Section 7.5.
  • Zero-knowledge proofs. Prove statements about data without revealing the data.

The right choice depends on the threat model, performance requirements, and trust assumptions. FHE provides the strongest theoretical guarantees at the highest performance cost.

7.4 Supply-chain security

Supply-chain security is the discipline of securing the software, hardware, and services that an organisation depends on but does not build itself, recognising that a compromise in any upstream component can affect every downstream consumer.

A decade of incidents has made this discipline an industry priority. The pattern: attackers compromise a vendor, the vendor's product or service is widely deployed, and the compromise propagates to thousands of downstream organisations simultaneously.

Major supply-chain incidents

SolarWinds / SUNBURST (2020). Russian SVR-attributed attackers compromised SolarWinds' Orion network-monitoring software at the build-system level. Malicious code was inserted into legitimate signed updates. Around 18,000 organisations downloaded the compromised update; the attackers selected a much smaller set of high-value targets for active exploitation, reported by US officials as roughly 100 companies and nine federal agencies and including Microsoft and FireEye. The incident was discovered in December 2020 after FireEye detected its own compromise.

Kaseya VSA (2021). REvil ransomware operators compromised the Kaseya VSA managed-service-provider platform and used it to push ransomware to about 1,500 downstream businesses through their MSPs.

Codecov (2021). A bash uploader script used by thousands of CI/CD pipelines was modified to exfiltrate environment variables (which often contained credentials) from build environments.

3CX (2023). The 3CX desktop application was compromised. A trojanised version was distributed via the official update channel. The attack was attributed to North Korean Lazarus Group.

XZ Utils backdoor (March 2024). A co-maintainer of the open-source XZ Utils compression library, who had built up trust over roughly two years of contributions, introduced a sophisticated backdoor into releases 5.6.0 and 5.6.1. The backdoor would have let the attacker run commands through the SSH daemon on systems running affected versions of the library. It was discovered by chance by a Microsoft engineer who noticed unusually slow SSH logins, and is likely the closest the open-source ecosystem has come to a Stuxnet-class supply-chain compromise. Demonstrated that long-term, patient social engineering against open-source maintainers is a real threat model.

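Snowflake customer breaches (2024). A series of breaches at Snowflake customers (Ticketmaster, AT&T, Santander, and many others) caused by customer credentials that had been stolen by infostealer malware and were then used against Snowflake accounts without multi-factor authentication. Snowflake's own platform was not breached. Demonstrated how a widely used multi-tenant cloud platform concentrates risk: one credential-theft campaign reached many organisations at once.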

Software Bill of Materials (SBOM)

A Software Bill of Materials is a formal, machine-readable inventory of the software components — libraries, modules, dependencies — that compose a software product, including their versions, licences, and supplier information.

SBOMs let downstream consumers know exactly what is in the software they use. When a vulnerability is disclosed in a specific library (Log4j in December 2021 being the canonical example), every consumer needs to determine whether their software is affected. SBOMs make this answerable.
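Answering "are we affected?" from an SBOM can be sketched as a lookup over the component list. The example below parses a small CycloneDX-style JSON document and reports components older than a fixed version. The SBOM content is made up for illustration, the helper names are hypothetical, and the version comparison is deliberately simplified (purely numeric dotted versions only).

```python
import json

SBOM_JSON = """
{
  "bomFormat": "CycloneDX",
  "specVersion": "1.5",
  "components": [
    {"type": "library", "name": "log4j-core", "version": "2.14.1",
     "purl": "pkg:maven/org.apache.logging.log4j/log4j-core@2.14.1"},
    {"type": "library", "name": "jackson-databind", "version": "2.15.2",
     "purl": "pkg:maven/com.fasterxml.jackson.core/jackson-databind@2.15.2"}
  ]
}
"""

def parse_version(version):
    """Simplified parser: handles only numeric dotted versions such as 2.14.1."""
    return tuple(int(part) for part in version.split("."))

def find_affected(sbom, name, fixed_version):
    """Return components called `name` that are older than `fixed_version`."""
    fixed = parse_version(fixed_version)
    return [
        component for component in sbom.get("components", [])
        if component.get("name") == name and parse_version(component["version"]) < fixed
    ]

sbom = json.loads(SBOM_JSON)
# Log4Shell (CVE-2021-44228) was first addressed in log4j-core 2.15.0.
for component in find_affected(sbom, "log4j-core", "2.15.0"):
    print(f"affected: {component['purl']}")
```

Real SBOM tooling matches on package URLs and vulnerability databases rather than on a bare name and a single threshold, but the principle is the same: with an inventory the question becomes a query instead of an investigation.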

US Executive Order 14028 (May 2021) and subsequent implementation requirements mandated SBOMs for federal software acquisitions. The EU Cyber Resilience Act (CRA), in force from 2024 with full obligations from 2027, requires SBOMs for many products.

Standard SBOM formats:

  • SPDX (Software Package Data eXchange). Linux Foundation. ISO standard.
  • CycloneDX. OWASP-led. Widely used in security tools.
  • SWID (Software Identification) tags. Defined in ISO/IEC 19770-2 and promoted by NIST; less common.

SLSA — Supply-chain Levels for Software Artifacts

SLSA (pronounced "salsa") is a framework, originated at Google and now maintained under the OpenSSF, that defines progressive levels of supply-chain integrity controls for software artifacts, centred on verifiable provenance and build-process integrity, providing a maturity model for organisations to assess and improve their supply-chain security.

SLSA version 1.0 (2023) defines a Build track with levels 0 to 3, where Level 0 means no guarantees (the earlier 0.1 draft had levels 1-4):

  • Build Level 1. The build process produces provenance: a record of how the artifact was built. The provenance need not be signed.
  • Build Level 2. The build runs on a hosted build platform that generates and signs the provenance.
  • Build Level 3. The build platform is hardened: it isolates builds from each other and keeps signing material out of reach of user-defined build steps.

The stricter requirements of the earlier draft's Level 4 (two-person review, hermetic and reproducible builds) were left out of version 1.0 and deferred to future versions.

Reproducible builds

A reproducible build is a build process in which, given the same source code, build environment, and instructions, anyone can produce an identical (bit-for-bit) output, enabling independent verification that a binary corresponds to its claimed source.

Reproducibility is the strongest defence against build-system compromise — an attacker who inserts malicious code at build time produces a binary that does not match the reproducible result, and anyone rebuilding can detect the discrepancy.
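The verification step itself is a digest comparison between the published artifact and an independent rebuild. In the sketch below, byte strings stand in for the two binaries and the function names are invented for the example; in practice the inputs would be the contents of files produced by separate builders.

```python
import hashlib

def sha256_hex(artifact: bytes) -> str:
    """Hex SHA-256 digest of an artifact's contents."""
    return hashlib.sha256(artifact).hexdigest()

def rebuild_matches(published: bytes, rebuilt: bytes) -> bool:
    """True if the independent rebuild is bit-for-bit identical."""
    return sha256_hex(published) == sha256_hex(rebuilt)

# Stand-ins for binaries: an honest rebuild and a tampered published build.
honest_build = b"binary produced from the tagged source"
tampered_build = b"binary produced from the tagged source" + b"\x90implant"

print(rebuild_matches(honest_build, honest_build))
print(rebuild_matches(tampered_build, honest_build))
```

The comparison only carries weight if the build really is deterministic. Embedded timestamps, file ordering, and build paths are typical sources of spurious differences that reproducible-build projects work to eliminate.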

The Debian Reproducible Builds project, NixOS, and Bazel-based build systems have pushed reproducibility forward. Most major Linux distributions have substantial reproducible coverage now. Mobile-app stores still depend on signed builds from authoritative sources rather than reproducibility.

Signed artifacts

A standard technique: every artifact (package, container image, binary) is cryptographically signed by its producer. Consumers verify the signature before use.

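Sigstore. Open-source project providing free signing for open-source software. A certificate authority (Fulcio) issues short-lived certificates bound to an OpenID Connect identity, and signatures are recorded in a public transparency log (Rekor). Since its 2021 launch it has been adopted by package ecosystems such as npm (build provenance) and PyPI (attestations).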

The Update Framework (TUF). Specification and reference implementation for secure software update systems. Used in Docker Content Trust, Notary, and several Linux distribution update systems.

Code-signing certificates. Traditional CA-issued certificates for signing executables and updates. Long history of theft (the 2010 RealTek and JMicron certificates used in Stuxnet); modern code-signing increasingly uses HSMs and tighter procedures.

Regulatory response

  • US Executive Order 14028 (May 2021). Improving the nation's cybersecurity. Created the requirement for SBOMs in federal acquisitions, mandated zero-trust architecture for federal systems, established a Cyber Safety Review Board.
  • EU Cyber Resilience Act (CRA, 2024). Imposes cybersecurity requirements on products with digital elements. Manufacturer obligations for vulnerability handling, SBOM provision, and incident reporting. Phased entry into force through 2027.
  • NIST SP 800-218 (Secure Software Development Framework). Practices for secure software development, increasingly referenced in compliance requirements.

Supply-chain security practices

For organisations consuming software:

  • Inventory. Know what software is in use. Use SBOMs.
  • Vulnerability management. Subscribe to vulnerability feeds. Patch promptly. Use SCA (Software Composition Analysis) tools to detect vulnerable dependencies.
  • Update verification. Cryptographically verify update signatures. Use repositories that support TUF or Sigstore.
  • Vendor risk management. Assess vendor security posture. Require SBOMs and security attestations. Include supply-chain clauses in contracts.
  • Network segmentation. Limit the blast radius of any compromised vendor system.
  • Monitoring. Detect anomalous behaviour from third-party software (the 3CX malware, for example, was eventually detected by behavioural EDR even though it bypassed signature detection).
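The update-verification practice in the list above can be illustrated in its simplest form: comparing a downloaded artifact's SHA-256 digest with a digest pinned in advance. This is a minimal sketch, not signature verification; TUF and Sigstore add signed metadata and identity on top of it. The artifact bytes and the function names are hypothetical.

```python
import hashlib
import hmac

def sha256_hex(data: bytes) -> str:
    """Return the hex SHA-256 digest of an artifact's bytes."""
    return hashlib.sha256(data).hexdigest()

def verify_artifact(data: bytes, pinned_digest: str) -> bool:
    """True only if the artifact matches the digest pinned in advance.

    compare_digest avoids leaking the mismatch position through timing.
    """
    return hmac.compare_digest(sha256_hex(data), pinned_digest.lower())

# Hypothetical artifact and pin, for illustration only.
artifact = b"example-package-1.0.0 contents"
pinned = sha256_hex(artifact)   # in practice read from a lock file or signed metadata
tampered = artifact + b" plus injected code"

print(verify_artifact(artifact, pinned))
print(verify_artifact(tampered, pinned))
```

A digest pin only helps if the pin itself arrives through a channel the attacker does not control, which is the problem signed metadata is meant to solve.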

For organisations producing software:

  • Source-code security. Code review, signed commits, branch protection.
  • Build-system security. Hardened CI/CD systems. SLSA-compliant build processes. Reproducible builds.
  • Dependency hygiene. Audit dependencies. Pin versions. Use lock files. Detect typosquatting (packages with names similar to legitimate ones).
  • Signing. Sign all releases. Publish SBOMs.
  • Incident readiness. Plan for the case where your build system is compromised. Have a way to revoke and re-issue.
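The typosquatting check mentioned under dependency hygiene can be approximated with a string-similarity comparison against the packages an organisation actually uses. The sketch below uses Python's standard `difflib`; the allow-list and the 0.85 cutoff are assumptions chosen for illustration, and a real tool would also consider download counts, package age, and maintainer history.

```python
import difflib

# Hypothetical allow-list of packages the organisation depends on.
KNOWN_PACKAGES = ["requests", "numpy", "cryptography", "urllib3"]

def possible_typosquat(name: str, known=KNOWN_PACKAGES, cutoff: float = 0.85):
    """Return the known package that `name` closely resembles, or None.

    An exact match is legitimate; a near miss is flagged for human review.
    """
    name = name.lower()
    if name in known:
        return None
    matches = difflib.get_close_matches(name, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None

for candidate in ("requests", "reqeusts", "flask"):
    print(candidate, "->", possible_typosquat(candidate))
```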

7.5 Security in federated learning

Federated learning is a distributed machine-learning approach in which a model is trained across many clients (devices or organisations) holding local data, with each client computing updates locally and a central server aggregating those updates into a global model, so that the raw training data never leaves the clients.

Federated learning emerged from Google's research in 2016, motivated by the goal of training language models on mobile-device data (keyboard suggestions, smart-reply) without sending the data to Google's servers. It has since become a major paradigm for privacy-preserving ML.

How federated learning works

A standard federated-learning round:

  1. The server selects a subset of clients to participate in this round.
  2. The server sends the current global model to selected clients.
  3. Each client trains the model on its local data for some number of steps.
  4. Each client sends the resulting model update (the gradient or the changed weights) back to the server.
  5. The server aggregates the updates (typically by averaging) into a new global model.
  6. Repeat until convergence.

The raw data never leaves the client. Only the model updates do. This addresses concerns about data centralisation, regulatory restrictions on data transfer, and the security risk of holding large data centrally.
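The six steps above can be sketched for a deliberately tiny model, y = w * x with a single weight, so that the whole round fits in a few lines of plain Python. The client data, learning rate, and function names are assumptions made for illustration; real systems exchange large weight tensors and handle client dropouts.

```python
import random

def local_train(w: float, data, lr: float = 0.05, steps: int = 5) -> float:
    """Step 3: a client refines the global weight on its own (x, y) pairs."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def federated_round(w_global: float, clients, fraction: float, rng) -> float:
    """Steps 1 to 5 of one round for the one-parameter model y = w * x."""
    k = max(1, int(fraction * len(clients)))
    selected = rng.sample(clients, k)                                   # step 1
    results = [(local_train(w_global, d), len(d)) for d in selected]   # steps 2-4
    total = sum(n for _, n in results)
    return sum(w * n for w, n in results) / total                      # step 5

def make_client(rng, n: int, true_w: float = 3.0):
    """Hypothetical local dataset; it never leaves this client's list."""
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    return [(x, true_w * x) for x in xs]

rng = random.Random(0)
clients = [make_client(rng, n) for n in (20, 50, 30, 40)]
w = 0.0
for _ in range(100):                                                    # step 6
    w = federated_round(w, clients, fraction=0.5, rng=rng)
print(round(w, 4))
```

The server weights each client's result by its number of examples, which is the averaging rule used by the original federated-averaging algorithm; the server only ever sees the returned weights, never the `(x, y)` pairs.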

Use cases

Mobile keyboard prediction. Google's Gboard, Apple's QuickType. Models trained across millions of devices, learning from user typing patterns without uploading the text.

Health data analytics. Hospitals jointly train models on patient data without sharing the data. The MELLODDY project trained drug-discovery models across pharmaceutical companies without revealing their proprietary chemistry libraries.

Financial fraud detection. Banks jointly train fraud-detection models without sharing customer data. Several proof-of-concept and limited-production deployments exist.

Edge AI. Industrial IoT devices learn from their local sensor data without centralising the data.

Cross-silo collaboration. Organisations with complementary data jointly train models on the combined data without raw-data exchange.

Privacy threats in federated learning

Despite raw data not leaving the device, federated learning has its own privacy risks.

Model-update leakage. The gradients or weight updates a client sends back can leak information about the training data. A 2019 paper by Zhu et al. showed that individual training examples can be reconstructed from gradients in some configurations.
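A much simpler case than the optimisation-based attack of Zhu et al. already shows why updates leak: for a linear model with a bias term trained on a single example, the weight gradient is the input scaled by the bias gradient, so the input can be read off by division. The sketch below assumes squared-error loss and hypothetical feature values; it illustrates the principle and is not the attack from the paper.

```python
def client_gradient(w, b, x, y):
    """Gradient of the squared error (w.x + b - y)^2 for ONE private example."""
    residual = sum(wi * xi for wi, xi in zip(w, x)) + b - y
    grad_w = [2 * residual * xi for xi in x]
    grad_b = 2 * residual
    return grad_w, grad_b

def reconstruct_input(grad_w, grad_b):
    """What a curious server can do: divide weight gradients by the bias gradient."""
    if grad_b == 0:
        raise ValueError("zero residual: this example leaks nothing this way")
    return [g / grad_b for g in grad_w]

private_x = [0.25, -1.5, 4.0]   # hypothetical private features
grad_w, grad_b = client_gradient(w=[0.1, 0.2, -0.3], b=0.05, x=private_x, y=1.0)
recovered = reconstruct_input(grad_w, grad_b)
print(recovered)
```

Averaging over a batch of examples blurs this exact relationship, which is one reason larger local batches leak less, although they do not remove the risk.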

Membership inference attacks. An attacker who can submit queries to the trained model can determine whether specific data points were in the training set. Federated learning does not by itself prevent this; the global model still memorises training data to some extent.

Model inversion attacks. An attacker can use the model's outputs to reconstruct properties of the training data — for example, recovering face images from a face-recognition model.

Property inference. An attacker can determine aggregate properties of the training data — for example, whether the training data included examples of a particular minority group.

Poisoning attacks. A malicious client can submit deliberately crafted updates to corrupt the global model: degrading its accuracy, biasing its outputs, or planting backdoors. Backdoor attacks in federated learning have been demonstrated in research.

Defences

Differential privacy in federated learning. Clip each client update to a bounded norm and add calibrated noise, either on the client or at aggregation. Provides quantifiable privacy guarantees but reduces model accuracy. Google's federated training of Gboard uses differential privacy.
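In practice the noise is calibrated to a bound on each update's size, so updates are usually clipped to a maximum L2 norm before noise is added. The following is a minimal sketch of that clip-then-noise step with Gaussian noise; the parameter names `max_norm` and `noise_multiplier` are assumptions, and the resulting privacy budget would have to be computed separately with a privacy accountant, which is not shown.

```python
import math
import random

def clip_update(update, max_norm: float):
    """Scale the update down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(u * u for u in update))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [u * scale for u in update]

def privatise_update(update, max_norm: float, noise_multiplier: float, rng):
    """Clip, then add Gaussian noise with std = noise_multiplier * max_norm."""
    clipped = clip_update(update, max_norm)
    sigma = noise_multiplier * max_norm
    return [u + rng.gauss(0.0, sigma) for u in clipped]

rng = random.Random(7)
raw_update = [3.0, 4.0]                      # hypothetical update with L2 norm 5
print(clip_update(raw_update, max_norm=1.0))
print(privatise_update(raw_update, max_norm=1.0, noise_multiplier=1.0, rng=rng))
```

Clipping bounds how much any one client can move the sum, and the noise hides what remains; a larger `noise_multiplier` gives stronger privacy and a less accurate aggregate.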

Secure aggregation. A cryptographic protocol (introduced by Bonawitz et al. 2017) where the server learns only the sum of client updates, not any individual update. Uses secret sharing and pairwise masking. Prevents the server from inspecting individual updates.
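The pairwise-masking idea can be sketched on its own: every pair of clients shares a seed, one adds the mask derived from it and the other subtracts it, so the masks cancel in the sum. This is a heavily simplified illustration, assuming integer-quantised updates, no client dropouts, and Python's `random` module as a stand-in for a cryptographic pseudorandom generator; the full protocol derives seeds by key agreement and uses secret sharing to recover from dropouts.

```python
import random

MODULUS = 2**32   # updates are assumed to be quantised to integers mod 2^32

def pairwise_mask(seed: int, length: int):
    """Expand a seed shared by two clients into a pseudorandom mask vector."""
    rng = random.Random(seed)   # stand-in only; NOT a cryptographic PRG
    return [rng.randrange(MODULUS) for _ in range(length)]

def mask_update(client_id, update, shared_seeds):
    """Add the mask for each higher-numbered peer, subtract it for each lower one."""
    masked = list(update)
    for peer_id, seed in shared_seeds[client_id].items():
        sign = 1 if client_id < peer_id else -1
        mask = pairwise_mask(seed, len(update))
        masked = [(m + sign * k) % MODULUS for m, k in zip(masked, mask)]
    return masked

def aggregate(masked_updates):
    """Server side: the pairwise masks cancel, leaving only the sum."""
    return [sum(col) % MODULUS for col in zip(*masked_updates)]

updates = {0: [5, 10, 15], 1: [1, 2, 3], 2: [7, 7, 7]}   # hypothetical quantised updates
seed_rng = random.Random(42)
shared_seeds = {i: {} for i in updates}
for i in updates:
    for j in updates:
        if i < j:
            shared_seeds[i][j] = shared_seeds[j][i] = seed_rng.getrandbits(64)

masked = [mask_update(i, u, shared_seeds) for i, u in updates.items()]
total = aggregate(masked)
print(total)
```

Each masked vector looks random to the server, yet `total` equals the element-wise sum of the original updates.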

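Homomorphic encryption. Client updates are encrypted before sending; the server aggregates on ciphertext. Because aggregation is a sum, an additively homomorphic scheme is sufficient, and the fully homomorphic schemes covered in Section 7.3 are not strictly required. Practical for federated learning with small models.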

Byzantine-robust aggregation. Aggregation rules (Krum, trimmed mean, median) that are resilient to a fraction of malicious clients. Replace simple averaging.
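The difference between plain averaging and two of these rules is easy to see on a toy example. The sketch below implements coordinate-wise median and trimmed mean with the standard library (Krum is omitted); the update values and the single malicious client are hypothetical.

```python
import statistics

def mean_aggregate(updates):
    """Plain averaging: one extreme update can drag every coordinate."""
    return [statistics.fmean(col) for col in zip(*updates)]

def median_aggregate(updates):
    """Coordinate-wise median of the client updates."""
    return [statistics.median(col) for col in zip(*updates)]

def trimmed_mean_aggregate(updates, trim: int):
    """Drop the `trim` smallest and largest values per coordinate, then average."""
    if 2 * trim >= len(updates):
        raise ValueError("trim removes every update")
    return [statistics.fmean(sorted(col)[trim:len(col) - trim])
            for col in zip(*updates)]

honest = [[1.0, -2.0], [1.1, -1.9], [0.9, -2.1], [1.0, -2.0]]
poisoned = honest + [[100.0, 100.0]]   # one malicious client

print(mean_aggregate(poisoned))
print(median_aggregate(poisoned))
print(trimmed_mean_aggregate(poisoned, trim=1))
```

The mean is pulled far from the honest consensus by a single client, while the median and trimmed mean stay close to it, provided the malicious fraction stays below what the rule tolerates.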

Anomaly detection on updates. Detect malicious or anomalous updates before aggregating.

Trusted execution. Run the aggregation in a TEE so even the server operator cannot inspect individual updates.

Federated learning frameworks

  • TensorFlow Federated (TFF). Google's framework. Research and production support.
  • PySyft. OpenMined's privacy-preserving ML library, including federated learning.
  • Flower. Framework-agnostic federated-learning library.
  • NVIDIA FLARE. Enterprise-focused, with privacy primitives.

Limitations

Federated learning is powerful but not a complete privacy solution:

  • Statistical learning still happens. The global model has learned from the aggregated data; its outputs reveal aggregate patterns. Whether this is acceptable depends on the use case.
  • Communication costs. Each round sends model updates between clients and server. For large models, this is substantial bandwidth.
  • Client heterogeneity. Different clients have different data distributions; the global model averages across them, sometimes producing a model that works less well for any specific client than a locally-trained model would.
  • Computation on clients. Edge devices must have enough computational capacity to train the model. Server-side computation is reduced; client-side computation is increased.

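Federated learning works best when combined with differential privacy and secure aggregation. Together they provide strong protection against privacy attacks on individual updates, at the cost of accuracy and complexity. They do not address poisoning, which needs robust aggregation or anomaly detection, and secure aggregation makes that harder because the server can no longer inspect individual updates.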

7.6 AI/ML in cryptography and data security

The intersection of machine learning and security goes both ways: ML defends security systems, and security defends ML systems.

ML as a defensive tool

ML has become standard in security operations.

Network anomaly detection. Models learn baselines of network traffic and flag deviations. Used in NIDS, NDR (Network Detection and Response) platforms, and cloud-traffic security.
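The baseline-and-deviation idea can be shown with the simplest possible statistical model: learn the mean and standard deviation of one traffic metric and flag values more than a few standard deviations away. This is a sketch of the principle only; the metric, the sample values, and the threshold of 3 are assumptions, and production systems model many features, seasonality, and peer groups.

```python
import statistics

def fit_baseline(samples):
    """Learn the mean and standard deviation of a traffic metric."""
    return statistics.fmean(samples), statistics.stdev(samples)

def is_anomalous(value, baseline, threshold: float = 3.0) -> bool:
    """Flag values more than `threshold` standard deviations from the mean."""
    mean, stdev = baseline
    if stdev == 0:
        return value != mean
    return abs(value - mean) / stdev > threshold

# Hypothetical bytes-per-minute observations from a quiet period.
normal_traffic = [980, 1010, 1005, 995, 1020, 990, 1000, 1015]
baseline = fit_baseline(normal_traffic)

for observed in (1010, 1100, 5000):
    print(observed, is_anomalous(observed, baseline))
```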

Endpoint malware detection. Classifiers identify malicious files based on features extracted from the binary, behaviour at execution, or both. Modern endpoint products (CrowdStrike, SentinelOne, Microsoft Defender) all use ML extensively.

Phishing detection. Models classify emails, URLs, and websites as phishing or legitimate. Common features: URL structure, domain age, page content, sender reputation.

User behaviour analytics. Covered in Chapter 6.10. ML baselines for user activity.

Spam filtering. One of the oldest and most successful ML-in-security applications. Bayesian filters and modern deep-learning models maintain near-zero false-positive rates while catching the vast majority of spam.
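A Bayesian filter of the kind mentioned here can be written in a few lines. The sketch below is a minimal naive Bayes classifier over whitespace-separated words with add-one smoothing; the class name and the four training messages are hypothetical, and real filters use far larger corpora, better tokenisation, and tuned decision thresholds.

```python
import math
from collections import Counter

class NaiveBayesSpamFilter:
    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, text: str, label: str) -> None:
        self.word_counts[label].update(text.lower().split())
        self.doc_counts[label] += 1

    def _log_score(self, words, label: str) -> float:
        counts = self.word_counts[label]
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total = sum(counts.values())
        # Log prior: fraction of training messages with this label.
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for w in words:
            # Add-one smoothing so unseen words do not zero out the score.
            score += math.log((counts[w] + 1) / (total + len(vocab)))
        return score

    def classify(self, text: str) -> str:
        words = text.lower().split()
        return max(("spam", "ham"), key=lambda label: self._log_score(words, label))

spam_filter = NaiveBayesSpamFilter()
spam_filter.train("win free money now", "spam")
spam_filter.train("free prize claim now", "spam")
spam_filter.train("meeting agenda for monday", "ham")
spam_filter.train("lunch on monday", "ham")

print(spam_filter.classify("free money prize"))
print(spam_filter.classify("monday meeting"))
```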

Insider threat detection. ML on user activity, file access patterns, and other behavioural signals to detect data-theft or sabotage by insiders.

Vulnerability prioritisation. Models predict which vulnerabilities are likely to be exploited.

Threat intelligence enrichment. ML extracts entities and relationships from threat reports, links indicators to campaigns, and predicts attack trajectories.

Generative AI in security operations

LLMs (Large Language Models) have entered security operations:

Alert triage and summarisation. An LLM analyses an alert, gathers context from related logs, and presents a summarised explanation to the analyst. Microsoft Security Copilot, Google's Sec-PaLM, and similar products offer this.

Incident response assistance. Suggests next steps, generates queries, drafts incident reports.

Threat-hunting query generation. Analyst describes what they want to find; the LLM generates the SQL/KQL/Splunk query.

Code review for security. LLMs review code for security issues, complementing static analysis.

SOC playbook automation. Generate or refine SOAR playbooks based on incident patterns.

The benefits are speed and accessibility — junior analysts can produce work approaching senior-analyst quality, and senior analysts can move faster. The risks are LLM hallucinations (confident-but-wrong outputs), over-reliance, and the operational cost of validating LLM suggestions.

ML as an attack surface

Where ML is deployed, attacks on ML follow.

Adversarial examples. Inputs deliberately crafted to fool an ML model. The classic example: an image of a panda imperceptibly perturbed so that a classifier confidently labels it as a gibbon. Adversarial examples exist for nearly every ML system; defences are partial.
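The mechanism behind such examples can be shown on a linear classifier, where the fast gradient sign method reduces to nudging every feature by a small amount against the sign of its weight. The sketch below uses hypothetical weights and a four-feature input; in image models the same small per-feature change is summed over many thousands of pixels, which is why the perturbation can be imperceptible.

```python
def score(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def predict(w, b, x) -> int:
    return 1 if score(w, b, x) > 0 else 0

def sign(v: float) -> int:
    return (v > 0) - (v < 0)

def fgsm_linear(w, x, true_label: int, epsilon: float):
    """Move every feature by epsilon in the direction that increases the loss.

    For a linear model the loss gradient with respect to x has sign -sign(w)
    when the true label is 1 and +sign(w) when it is 0.
    """
    direction = -1 if true_label == 1 else 1
    return [xi + epsilon * direction * sign(wi) for xi, wi in zip(x, w)]

# Hypothetical classifier and input, for illustration only.
w, b = [0.5, -0.25, 0.75, -0.5], 0.0
x = [0.2, 0.1, 0.1, 0.1]
x_adv = fgsm_linear(w, x, true_label=1, epsilon=0.1)

print(predict(w, b, x), predict(w, b, x_adv))
```

No feature moves by more than `epsilon`, yet the score shifts by `epsilon` times the sum of the absolute weights, enough here to flip the predicted class.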

Evasion attacks. Generalisation of adversarial examples. Malware authors craft binaries to evade ML-based detection. Phishing operators craft emails that bypass ML filters.

Model extraction. An attacker with query access to a model can train a substitute model that approximates it, often with sufficient fidelity to be a competitive replacement. Threat to commercial ML services.

Model inversion. Reconstruct training data from a deployed model. Particularly concerning for models trained on sensitive data.

Membership inference. Already covered in the federated-learning section (7.5). Determine whether specific data points were in the training set.

Data poisoning. Insert crafted training data to corrupt the model. Particularly relevant for systems that retrain on user-contributed data — spam filters, recommendation systems, content moderators. Backdoor attacks (model behaves normally except on specific trigger inputs) are an advanced form.

Prompt injection. Specific to LLMs. Attacker text instructs the LLM to ignore its prior instructions and do something different. The 2023-25 wave of incidents where LLM-based applications could be manipulated through user input — including indirect prompt injection through documents the LLM was asked to summarise — has made this an active area of research and defence.

Generative AI as a weapon

The same generative AI that helps defenders also helps attackers:

LLM-generated phishing. Native-quality language, easy customisation per target. The cost of crafting convincing phishing has dropped dramatically.

Voice and video deepfakes. Already covered. The Hong Kong CFO incident (2024) demonstrated production-scale fraud. Deepfake phishing calls and video calls are now within the budget of organised criminals.

Malware generation. LLMs can generate or modify malware code. The state of practice is mixed: LLMs are useful for prototyping malware but cannot yet write top-tier malware. The bar is lowering.

Disinformation and influence operations. AI-generated content for political manipulation, market manipulation, harassment campaigns, at a scale that human-driven operations could not achieve.

Vulnerability research acceleration. Both attackers and defenders use LLMs to accelerate finding vulnerabilities in code. The arms race continues.

Defences for ML systems

Several emerging practices:

Adversarial training. Train models on adversarial examples to improve robustness. Partial protection.

Input validation. Detect anomalous inputs before they reach the model.

Output filtering. Detect anomalous model behaviour. For LLMs, filter outputs that suggest the model has been manipulated.

Differential privacy in training. Limits what can be recovered from the model about specific training data.

Federated learning + secure aggregation. Prevents centralisation of training data.

Model watermarking. Embed identifiable patterns so that extracted models can be detected.

Provenance for training data. Track where training data came from and verify its integrity.

LLM guardrails. Output classifiers, input filters, prompt-engineering best practices, retrieval-augmentation with verified sources.

AI/ML in cryptography research

Less mature but interesting:

ML for cryptanalysis. Researchers explore whether ML can find weaknesses in ciphers. Limited progress against modern ciphers; some success against simplified or historical ciphers.

Side-channel analysis. ML processes side-channel measurements (timing, power, EM) more effectively than older statistical methods. Has improved both attacks and defences.

Random-number testing. ML detects bias in RNGs.

Protocol verification. ML-assisted formal verification of protocols.

The interaction between ML and cryptography is still finding its shape. The defining trend of the late 2020s is the integration of AI throughout security operations, with the consequent need to secure the AI systems themselves.

The rest of this chapter is a miscellany of important directions that did not fit the earlier sections.

Zero-trust maturity

Zero trust (introduced in the Next Generation Networks subject) has moved from buzzword to enforced practice. The US OMB Memorandum M-22-09 (January 2022) mandated federal agencies move to zero-trust architecture; many private organisations have followed. The challenges in 2026 are operational — how to move from vision to fully-deployed architecture without breaking existing applications. The CISA Zero Trust Maturity Model (current version 2.0) is the leading reference.

Confidential computing

Already touched on in Chapter 5. The use of hardware TEEs to protect data in use is moving from research to commercial deployment. Major cloud providers offer confidential-computing services (Azure Confidential VMs, AWS Nitro Enclaves, GCP Confidential VM). The Confidential Computing Consortium maintains shared specifications. Critical use cases: regulated workloads, multi-party computation, sensitive ML inference.

Decentralised identity

Self-sovereign identity (SSI) — where individuals control their own credentials without depending on centralised authorities — has been a long-standing aspiration. W3C standards (Decentralized Identifiers, Verifiable Credentials) are now stable. Real deployments are starting:

  • EU Digital Identity Wallet (EUDIW) — rolling out from 2024 through 2027 under eIDAS 2.0.
  • Apple Wallet driver's license — selective disclosure of identity attributes.
  • Government-issued credentials with privacy-preserving presentation.

The privacy benefit is selective disclosure — proving "over 21" without revealing date of birth.
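One common way to get selective disclosure, used in salted-hash designs such as SD-JWT, is for the issuer to sign only salted hashes of the claims, so the holder can reveal a chosen claim and its salt without exposing the others. The sketch below follows that idea under loud simplifications: an HMAC with a hypothetical shared key stands in for the issuer's asymmetric signature, the data layout is invented for illustration, and "over_21" is a claim the issuer asserts directly rather than a zero-knowledge proof over the date of birth.

```python
import hashlib
import hmac
import json
import secrets

ISSUER_KEY = b"hypothetical-issuer-key"   # stand-in: real issuers sign with a private key

def _digest(salt: str, name: str, value) -> str:
    return hashlib.sha256(json.dumps([salt, name, value]).encode()).hexdigest()

def _tag(digests) -> str:
    return hmac.new(ISSUER_KEY, json.dumps(digests).encode(), hashlib.sha256).hexdigest()

def issue(claims: dict):
    """Issuer: salt and hash every claim, then authenticate the sorted digest list."""
    disclosures = {name: (secrets.token_hex(16), value) for name, value in claims.items()}
    digests = sorted(_digest(s, n, v) for n, (s, v) in disclosures.items())
    return {"digests": digests, "tag": _tag(digests)}, disclosures

def present(credential, disclosures, reveal):
    """Holder: hand over the credential plus only the chosen disclosures."""
    return credential, {name: disclosures[name] for name in reveal}

def verify(credential, revealed) -> bool:
    """Verifier: check the issuer's tag, then that each revealed claim hashes into the list."""
    if not hmac.compare_digest(_tag(credential["digests"]), credential["tag"]):
        return False
    return all(_digest(s, n, v) in credential["digests"] for n, (s, v) in revealed.items())

credential, disclosures = issue(
    {"over_21": True, "date_of_birth": "1990-01-01", "name": "A. Example"}
)
cred, shown = present(credential, disclosures, ["over_21"])
print(verify(cred, shown), sorted(shown))
```

The verifier learns that the issuer vouched for "over_21" and sees only opaque digests for the other claims; the random salts stop it from guessing a date of birth and checking the guess against a digest.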

Regulatory convergence

The regulatory landscape is converging across jurisdictions:

  • GDPR as the global baseline.
  • US state laws building toward something like a federal privacy law.
  • Asia-Pacific frameworks (PDPA Singapore, APPI Japan, PIPL China, India DPDP Act) sharing concepts.
  • Sectoral regulations layering on top.

The compliance challenge is moving from "comply with the home country law" to "comply with all the jurisdictions where data subjects live."

AI governance

The EU AI Act (in force from 2024, full obligations through 2027) is the most comprehensive AI regulation. Risk-based classification:

  • Unacceptable risk (social scoring, exploitative AI) — prohibited.
  • High risk (employment, education, law enforcement, infrastructure) — extensive obligations.
  • Limited risk (chatbots, deepfakes) — transparency obligations.
  • Minimal risk — no specific obligations.

Other jurisdictions are following with their own frameworks. NIST AI Risk Management Framework provides a US-focused alternative. ISO/IEC 42001 is the AI management system standard.

The intersection of AI governance with security is rich:

  • Security requirements for high-risk AI systems.
  • Documentation requirements (model cards, data sheets).
  • Bias assessment and mitigation.
  • Human oversight requirements.

Quantum key distribution and the quantum internet

Already discussed briefly. QKD deployments continue in narrow niches; broader quantum-network research (entanglement-based protocols, quantum repeaters, quantum-network protocols) is moving forward in research labs but is not yet practical for general use.

Cybersecurity workforce

The persistent gap between cybersecurity demand and qualified personnel. Estimates suggest 3-4 million unfilled cybersecurity positions globally. Initiatives:

  • Government training programmes.
  • University curricula expansion.
  • Industry certification ecosystem (SANS, ISC2, ISACA, vendor-specific).
  • AI-assisted security operations (Section 7.6) as partial substitution.

The workforce challenge is particularly acute in countries like Nepal where the demand is rising rapidly but the supply of skilled professionals lags. IOE programmes like MSNCS exist specifically to address this.

Resilience and continuity

A shift in emphasis from "prevent breaches" to "be resilient to breaches." The recognition that determined attackers will eventually succeed leads to focus on:

  • Detection time (mean-time-to-detect).
  • Response time (mean-time-to-respond).
  • Recovery time (mean-time-to-recover).
  • Business continuity (operating during compromise).
  • Disaster recovery (rebuilding after compromise).

The increasing prevalence of ransomware has made backup integrity and rapid recovery operational priorities.

Privacy-enhancing computation as commodity

Several PETs are crossing from research to production:

  • Differential privacy in commercial and official analytics (Apple, Google, the US Census Bureau).
  • MPC in fraud-sharing among banks.
  • FHE in limited ML inference.
  • Zero-knowledge proofs in blockchain applications and selective disclosure.

The commoditisation trajectory parallels what happened to TLS over the 2000s and 2010s, which went from specialised technology to default infrastructure.

Convergence of cyber, physical, and information security

The old separation between cyber security (digital), physical security (locks, guards), and information security (people, processes) is dissolving. Modern threats span the categories — a phishing email leads to an insider exfiltrating data on a USB drive that ends up in a competitor's hands. Modern defence integrates the categories. Many large organisations have unified "security" functions that report to a single executive.

What is not changing

Some things will not change:

  • Confidentiality, integrity, availability — the CIA triad endures.
  • Defence in depth — no single control is enough.
  • People remain the largest factor in security outcomes — for good and ill.
  • The economics of attack and defence — attackers go where the value is and where the defence is weakest.
  • Cryptographic algorithms have lifecycles, and migrations take longer than expected.

The discipline is in continuous motion, but the underlying logic is stable. A student finishing this course in 2026 is equipped with the foundations — the specific algorithms, frameworks, and threats will evolve, but the principles of how cryptography and data security work remain the substrate on which the future is built.
