Chapter 5 — Data Security

Cryptography protects data in motion through the network. But data spends most of its life not in motion — sitting on disks, in databases, in object stores, on backup tapes, in mobile devices, in cloud buckets. Securing data through its full lifecycle is the broader discipline of data security, and it draws on cryptography but extends well beyond it into classification, access control, monitoring, and operational practice. This chapter covers data security as an end-to-end concern: what the field is trying to protect, what threatens it, the standard techniques for protecting data at rest and in transit, how organisations classify and label their data, the supporting operations of obfuscation and tokenisation, the data-loss-prevention discipline, and the specific challenges of mobile and cloud environments.

5.1 Data security concepts, terminology, and principles

Data security is the set of policies, procedures, and technical controls that protect data from unauthorised access, disclosure, modification, destruction, or unavailability throughout its lifecycle: creation, use, storage, transmission, archival, and eventual destruction.

The discipline is broader than cryptography. Encryption is one tool; access control, network segmentation, monitoring, training, and incident response are others. A working data-security programme uses many tools coherently rather than relying on any single one.

The data lifecycle

Data goes through stages, and each stage has different security requirements:

Creation. Data is generated by users, applications, sensors, or external systems. At creation it must be classified, labelled, and assigned an owner.

Use. Data is being actively processed — read, modified, computed on. Encryption-in-use (homomorphic encryption, secure enclaves) is a developing area; most data is decrypted into clear form for processing.

Storage. Data sits in databases, file systems, object stores, backups. Encryption-at-rest is the standard control.

Transmission. Data moves between systems. Encryption-in-transit (TLS, IPsec) is the standard control.

Archival. Data is moved to long-term storage with limited access. Special attention to key rotation over multi-year lifetimes and to media-decay risks.

Destruction. Data reaches end-of-life and is destroyed. Secure deletion is its own discipline — overwriting, cryptographic erasure (destroying the key so the encrypted data becomes unreadable), and physical destruction of storage media.

A data-security programme covers all six stages.

Core security principles

Several principles inform every data-security decision.

The CIA triad. Already introduced in Chapter 3 of the Next Generation Networks subject. Confidentiality, Integrity, Availability — the three properties any security control aims to preserve.

Least privilege. A user, process, or system should have only the access necessary for its function. A database service does not need shell access; a developer should not have production read access by default; a backup system needs read-only access and never writes to production.

Defence in depth. Multiple layers of control. If encryption-at-rest fails, the database still has access control. If access control fails, the network still has segmentation. If segmentation fails, the application still has authentication. No single layer is trusted to be the last line.

Separation of duties. No single person can perform a high-impact action alone. The person who issues a payment is different from the person who approves it. The person who has the encryption key is different from the person who has the encrypted backup. The person who configures the firewall is different from the person who audits it.

Zero trust. Never trust based on network location. Always verify identity. Continuously re-verify. NIST SP 800-207 (2020) is the canonical zero-trust framework.

Need-to-know. Beyond least privilege — even authorised users see only the data they need for their current task. A customer-service representative resolves a billing query without seeing the customer's purchase history beyond the disputed transaction.

Privacy by design. Privacy considerations are built into system designs from the start, not bolted on. Data minimisation (collect only what's needed), purpose limitation (use data only for the disclosed purposes), retention limits (delete when no longer needed).

Accountability. Every access is logged. Every change is auditable. Every privileged action has a record that survives the actor and the moment.
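One common way to make an audit record outlive the actor is to chain entries by hash, so that editing an earlier entry breaks every later one. The following is a minimal sketch under that assumption; the field names, the `GENESIS` placeholder, and the helper names are illustrative rather than taken from any particular logging product.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry (illustrative)

def _digest(body):
    # Canonical JSON so the same body always hashes to the same value.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

def append_entry(log, actor, action, resource):
    """Append an entry whose hash also covers the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    body = {"actor": actor, "action": action, "resource": resource, "prev": prev_hash}
    log.append({**body, "hash": _digest(body)})

def verify_chain(log):
    """Return True if no entry has been altered or reordered.

    Truncation at the tail is not detected without an external anchor.
    """
    prev_hash = GENESIS
    for entry in log:
        body = {k: entry[k] for k in ("actor", "action", "resource", "prev")}
        if entry["prev"] != prev_hash or entry["hash"] != _digest(body):
            return False
        prev_hash = entry["hash"]
    return True

audit_log = []
append_entry(audit_log, "alice", "read", "customers/42")
append_entry(audit_log, "bob", "export", "customers/*")
```

In practice the latest hash would also be copied to a system the logged actors cannot write to, since anyone who can rewrite the whole log can rebuild the chain.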

Key terminology

The vocabulary of data security:

  • Data subject — the natural person to whom personal data relates.
  • Data owner / data steward — the organisational role responsible for a particular data set's classification, access, and lifecycle.
  • Data custodian — the technical role responsible for implementing controls on the data set.
  • Controller — under GDPR and similar laws, the entity that determines the purposes and means of processing personal data.
  • Processor — the entity that processes personal data on behalf of the controller.
  • PII (Personally Identifiable Information) — data that identifies or can identify an individual.
  • PHI (Protected Health Information) — health-related PII, regulated specially in many jurisdictions (HIPAA in the US, similar elsewhere).
  • Sensitive data — broader category covering financial, biometric, government-ID, religious, sexual-orientation, and similar especially-protected categories.
  • Encryption-at-rest — encryption applied to stored data.
  • Encryption-in-transit — encryption applied to data moving over networks.
  • Encryption-in-use — encryption that lets computation happen on the data without decrypting it (homomorphic encryption, secure enclaves).
  • DLP (Data Loss Prevention) — the discipline of detecting and preventing unauthorised data exfiltration.
  • SIEM (Security Information and Event Management) — the centralised log-and-alert system.
  • IRM (Information Rights Management) — control over how documents can be used after distribution (open, print, copy, screenshot restrictions).
  • Tokenisation — replacing sensitive values with surrogates while preserving format and referential integrity.
  • Pseudonymisation — replacing identifying values with placeholders, with the original-to-placeholder mapping kept separately. Under GDPR, pseudonymised data is still personal data.
  • Anonymisation — irreversibly removing identifying information so that the data no longer relates to an identifiable person.
  • Data residency — the requirement that data be physically stored within a specific geography.
  • Data sovereignty — the legal jurisdiction that has authority over data, often tied to where it is processed and stored.

5.2 Data security risks, challenges, and threats

Breach categories

Data breaches happen through several distinct mechanisms.

Cyber intrusion. An attacker gains access to internal systems and exfiltrates data. The largest category in numerical terms. Vectors include phishing, exploitation of unpatched vulnerabilities, compromised credentials, supply-chain attacks.

Insider threat. A legitimate user abuses their access. May be malicious (an employee selling customer data) or negligent (an administrator misconfiguring access controls). According to multiple industry reports, insiders cause roughly 30–35% of breaches in any given year.

Physical theft or loss. Stolen laptops, lost backup tapes, misplaced mobile devices. Encryption-at-rest reduces but does not eliminate this risk.

Misconfiguration. Servers, databases, cloud buckets exposed to the Internet without authentication. The 2017–2024 series of leaks via misconfigured AWS S3 buckets, MongoDB instances, and Elasticsearch clusters has exposed billions of records collectively.

Third-party / supply-chain breach. A vendor, partner, or service provider is compromised; the breach reaches the organisation's data through the connection. The 2013 Target breach (HVAC vendor compromised), the 2020 SolarWinds breach (software vendor compromised), and the 2024 Snowflake-customer breaches (stolen customer credentials used against many tenants that lacked MFA) are major examples.

Threat actors

Different actors pose different risks:

  • Opportunistic criminals. Mass scanners, ransomware operators, commodity malware. Target whoever they can reach. Largely automated.
  • Targeted criminals. Organised crime groups with specific targets — banks, healthcare systems, large retailers. More patient, more sophisticated.
  • Nation-state attackers. Government-affiliated groups conducting espionage or destructive operations. Long timelines, multiple operations, custom tooling. Already discussed in the Next Generation Networks chapter.
  • Hacktivists. Politically motivated actors. Lower technical sophistication on average but unpredictable targets.
  • Insiders. Already discussed. Often the highest-impact category for confidentiality breaches because they bypass perimeter defences.
  • Competitors. Industrial espionage. Less common in dollar terms but significant for specific industries (defence, semiconductors, pharmaceuticals).

Specific threat patterns

Credential-based attacks. The most common entry point in 2025–26 breach reports. Categories:

  • Phishing and its variants (spear-phishing, whaling, smishing, vishing).
  • Credential stuffing — using leaked passwords from one site to log in to others where users have reused them.
  • Password spraying — trying a small number of common passwords against many accounts.
  • Pass-the-hash and pass-the-ticket — using stolen authentication artefacts within an internal network.
  • SIM swapping — taking over a victim's phone number to defeat SMS-based MFA.
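The patterns above leave different traces in authentication logs: spraying and stuffing touch many accounts from one source, while classic brute force hammers a single account. A rough sketch of that distinction follows; the event shape `(source_ip, username)` and both thresholds are assumptions chosen for illustration, not recommended values.

```python
from collections import defaultdict

def flag_sources(failed_logins, many_accounts=5, many_attempts=10):
    """Group failed logins by source and label suspicious sources.

    failed_logins: iterable of (source_ip, username) tuples.
    """
    per_source = defaultdict(lambda: defaultdict(int))
    for source_ip, username in failed_logins:
        per_source[source_ip][username] += 1

    findings = {}
    for source_ip, users in per_source.items():
        if len(users) >= many_accounts:
            # One source failing against many distinct accounts.
            findings[source_ip] = "spray-or-stuffing"
        elif max(users.values()) >= many_attempts:
            # One source failing repeatedly against the same account.
            findings[source_ip] = "brute-force"
    return findings

events = (
    [("10.0.0.1", f"user{i}") for i in range(6)]
    + [("10.0.0.2", "admin")] * 12
    + [("10.0.0.3", "carol")]
)
print(flag_sources(events))
```

Attackers who spread attempts over many source addresses defeat a per-source count, so real detections usually also group by target account and time window.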

Ransomware. Already discussed in the Next Generation Networks chapter. The 2023–25 wave of double-extortion ransomware (exfiltrate, encrypt, threaten publication) makes ransomware also a data-security event, not just an availability one.

Data exfiltration via DNS or other covert channels. Some malware exfiltrates data slowly through DNS queries, HTTPS requests to attacker-controlled domains, or steganographic techniques in image uploads. Detection requires careful traffic analysis.
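One heuristic used in such traffic analysis is to look for DNS labels that are unusually long and random-looking, since encoded payloads tend to have higher character entropy than human-chosen hostnames. The sketch below assumes that heuristic; the function names and the two thresholds are illustrative and would need tuning against real traffic.

```python
import math
from collections import Counter

def shannon_entropy(text):
    """Shannon entropy of the characters in text, in bits per character."""
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def looks_like_exfil(qname, min_length=30, min_entropy=3.5):
    """Flag a query whose leftmost label is both long and high-entropy."""
    label = qname.split(".")[0]
    return len(label) >= min_length and shannon_entropy(label) >= min_entropy

print(looks_like_exfil("www.example.com"))
print(looks_like_exfil("a9f3k2z8q1x7m4c6v0b5n2l8j3h7g1d4s6.tunnel.example.net"))
```

Legitimate services (CDNs, some anti-malware lookups) also produce long random labels, so a per-query flag like this is normally combined with per-domain volume and allow-lists.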

API abuse. Increasingly common in 2026. Attackers find an API endpoint that returns more data than intended (lack of authorisation checks, IDOR vulnerabilities, missing rate limits) and scrape it. The 2023 23andMe genealogy data scrape (about 7 million users' relative-matching data extracted through credential stuffing into the regular UI) and the 2024 AT&T call-records breach via a compromised Snowflake account are examples.

Cloud misconfiguration. Public S3 buckets, world-readable Elasticsearch indexes, unauthenticated Kubernetes APIs. The Vianet ISP breach in Nepal in 2020 (about 170,000 customer records leaked) and the Foodmandu breach in 2020 (about 50,000 records) both appear to have involved web-application or database exposures that better hygiene would have prevented.

The Nepal context

Specific risks particularly visible in Nepal:

  • Banking and fintech. eSewa, Khalti, IME Pay, and the major banks hold sensitive financial data on millions of customers. Targeted by both cybercrime (banking trojans, phishing) and supply-chain (KYC-provider compromise).
  • Government data. The Government Integrated Data Centre hosts hundreds of .gov.np portals. The 2024 DDoS hit them; the July 2025 Ministry of Education portal compromise exposed student and employee PII; the reported late-2025 Nepal Police breach claimed 2 million-plus records.
  • Health data. Major hospitals (Bir, Patan, Teaching Hospital) digitise records gradually; security maturity lags adoption.
  • Telecom data. NTC and Ncell hold call records, location histories, and identity data on millions. The 2024–25 disclosed appearance of NRB internal documents on dark-web markets — and similar reports from other sectors — suggests insider exposure remains a recurring concern.
  • E-commerce. Daraz and the smaller players hold customer addresses, payment details, and order histories. The 2020 Foodmandu incident is the canonical Nepali e-commerce breach example.

Compliance and regulatory risks

Beyond the direct harm of a breach, organisations face regulatory consequences. The major frameworks:

  • GDPR (EU). Fines up to 4% of global annual turnover or €20 million, whichever is higher. Applies to any organisation processing EU residents' data, regardless of where the organisation is based.
  • CCPA / CPRA (California). US state-level privacy law with extraterritorial scope for businesses meeting size thresholds.
  • HIPAA (US). Health-data-specific regulation with civil and criminal penalties.
  • PCI-DSS. The payment-card industry's security standard. Not government-imposed but mandatory for processing card payments.
  • SOX (US). Financial controls for public companies — affects how financial data is secured.
  • Nepal Individual Privacy Act 2075 (2018). Establishes the right to privacy and creates obligations on data collectors. Enforcement has been limited so far.
  • Nepal Electronic Transactions Act 2063 (2008). Covers electronic records, digital signatures, computer-related offences. Most cybercrime prosecutions in Nepal currently use this Act.

The 2025 IT Bill in Nepal, still in legislative process as of mid-2026, would broaden the regulatory framework and introduce more specific cybersecurity obligations.

5.3 Securing data at rest and in transit

The two halves of cryptographic data protection. Each has standard mechanisms and standard pitfalls.

Encryption at rest

Encryption at rest is the application of cryptographic protection to data while it is stored on persistent media — disks, databases, backup tapes, object stores — so that the storage medium alone, without the decryption key, reveals no information about the data.

Several layers exist; serious deployments combine them.

Full-disk encryption (FDE). The entire storage device is encrypted. The data-encryption key is unlocked at boot time (by a passphrase, a TPM, or a network key server). Once unlocked, the OS sees a normal file system.

  • BitLocker (Windows) — uses AES-128 or AES-256 in XTS or CBC+Elephant Diffuser. TPM-bound. The default for Windows 11 Pro and Enterprise.
  • FileVault (macOS) — AES-XTS-128. Bound to a recovery key and to the user's password.
  • LUKS / dm-crypt (Linux) — AES-XTS or AES-CBC-ESSIV, with passphrase or keyfile unlock. The default for distributions like Ubuntu (when chosen during install) and most server deployments.
  • Hardware-based SED (Self-Encrypting Drives). The drive itself implements XTS-AES; the OS unlocks it via the OPAL protocol. Common in enterprise SSDs.

FDE protects against device theft. It does not protect against an attacker who can run code on the unlocked machine — once decrypted, the data is in clear inside the OS.

File-level encryption. Individual files are encrypted, often with per-file keys. Used in:

  • macOS FileVault (under the hood, per-file keys).
  • Windows EFS (Encrypting File System, less commonly deployed than BitLocker).
  • VeraCrypt, BoxCryptor, Cryptomator — third-party tools for encrypting specific folders or container files.

Database encryption. Several layers:

  • Transparent Data Encryption (TDE) — the database engine encrypts data files on disk. SQL Server, Oracle, MySQL Enterprise, PostgreSQL (via extensions) support TDE.
  • Column-level encryption — specific sensitive columns (credit-card numbers, salaries) are encrypted by the application before storage. The database stores ciphertext; only authorised application code can decrypt.
  • Searchable encryption — research-grade techniques that allow queries over encrypted data. Less common in production.
  • Application-managed encryption — the application maintains keys externally (in a KMS) and decrypts on the fly. Best practice for highly sensitive fields.

Cloud-storage encryption. Every major cloud provider offers encryption-at-rest by default:

  • AWS S3 server-side encryption (SSE-S3, SSE-KMS, SSE-C).
  • Azure Storage Service Encryption.
  • Google Cloud Storage default encryption.

The provider's default uses provider-managed keys. Customer-managed keys (in a KMS) give the customer more control over key rotation and access. Customer-supplied keys (SSE-C) and client-side encryption (CSE) give the customer full control, with the cost that losing the key loses the data permanently.

Backup encryption. Backups often live longer than primary data, on cheaper, less-protected media, in third-party offsite facilities. Encrypting backups is essential. AES-256 with keys managed in an HSM is standard.

Cryptographic erasure

A useful consequence of encryption-at-rest: cryptographic erasure — destroying the key effectively destroys the data without needing to overwrite or physically destroy the media. A cloud customer wanting to fully wipe their data deletes the KMS key; even though the ciphertext remains on the provider's disks, no one can recover the plaintext.
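The idea can be illustrated with a toy model. In this sketch a one-time pad (XOR with a random key as long as the data) stands in for AES so that the example needs only the standard library, and the `KeyStore` class is a hypothetical stand-in for a KMS; neither reflects how a production system is built.

```python
import secrets

class KeyStore:
    """Hypothetical stand-in for a KMS: holds keys by id and can destroy them."""

    def __init__(self):
        self._keys = {}

    def create(self, key_id, length):
        self._keys[key_id] = secrets.token_bytes(length)

    def get(self, key_id):
        return self._keys[key_id]  # raises KeyError once the key is destroyed

    def destroy(self, key_id):
        del self._keys[key_id]

def xor_bytes(data, key):
    # One-time pad: the same operation encrypts and decrypts.
    return bytes(d ^ k for d, k in zip(data, key))

kms = KeyStore()
record = b"citizenship-no: 12-34-56-78901"
kms.create("tenant-a", len(record))
ciphertext = xor_bytes(record, kms.get("tenant-a"))

# While the key exists, the stored ciphertext can be read back.
print(xor_bytes(ciphertext, kms.get("tenant-a")) == record)
```

After `kms.destroy("tenant-a")` the ciphertext is still on disk but `kms.get` fails, which is the whole point. The guarantee only holds if every copy of the key, including backups and cached copies, is destroyed as well.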

Encryption in transit

Encryption in transit is the application of cryptographic protection to data while it moves over networks — between client and server, between data centres, between services in a distributed system.

The dominant protocol is TLS. Already covered in the Next Generation Networks chapter (Chapter 6, where it appears as part of zero-trust architecture). Key points for data security:

TLS 1.3 (RFC 8446) is the current standard. Removes most legacy options (no static RSA, no CBC modes, no MD5/SHA-1, no RC4). Mandatory forward secrecy via (EC)DHE. Faster handshake than TLS 1.2.

TLS 1.2 is still widely deployed and is acceptable when configured well (only AEAD cipher suites such as AES-GCM or ChaCha20-Poly1305, (EC)DHE key exchange for forward secrecy, no compression).

TLS 1.1 and earlier are deprecated; browsers and operating systems remove support over time.
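As an illustration of these version rules on the client side, the sketch below uses Python's standard `ssl` module to refuse anything older than TLS 1.2, with an option to require TLS 1.3. The function name `make_client_context` is an assumption made for this example.

```python
import ssl

def make_client_context(require_tls13=False):
    """Build a client-side TLS context that rejects deprecated protocol versions."""
    # create_default_context() enables certificate and hostname verification.
    ctx = ssl.create_default_context()
    if require_tls13:
        ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    else:
        ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx

ctx = make_client_context()
print(ctx.minimum_version, ctx.verify_mode, ctx.check_hostname)
```

A connection would then be made with `ctx.wrap_socket(sock, server_hostname=...)`; the handshake fails if the server offers only TLS 1.1 or earlier.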

Mutual TLS (mTLS). Both client and server present certificates. Standard in service-to-service authentication in microservice architectures, in modern IoT device authentication, and in zero-trust deployments.

HTTPS. The HTTP-over-TLS combination. By 2026 over 95% of web traffic is HTTPS. HSTS (HTTP Strict Transport Security, RFC 6797) tells browsers to use only HTTPS for a given domain.

Email transport encryption. SMTP STARTTLS upgrades a plaintext SMTP connection to TLS. Used between mail servers. Opportunistic (not strictly enforced) by default; MTA-STS (RFC 8461) and DANE add enforcement.

IPsec. Network-layer encryption, covered in the Next Generation Networks chapter. Used in site-to-site VPNs and in some specialised application scenarios.

WireGuard. Modern VPN protocol, simpler than IPsec. It uses a fixed cipher suite built around ChaCha20-Poly1305, with no cipher negotiation. Increasingly common in operator-grade and consumer VPNs.

Securing data in use

The newest area — protecting data while it is being processed.

Trusted Execution Environments (TEEs). Intel SGX, ARM TrustZone, AMD SEV-SNP, AWS Nitro Enclaves. The CPU or platform provides an isolated execution environment whose memory is isolated (and in most designs encrypted) and whose state is opaque to even the operating system. Sensitive operations can run inside the TEE; the outside world sees only encrypted requests and responses. SGX has had several side-channel and fault-injection weaknesses (Spectre-class attacks, Foreshadow, Plundervolt); the technology is improving but is not perfect.

Homomorphic encryption. Encryption that lets computation happen on the ciphertext, producing an encrypted result that decrypts to the answer that the same computation on plaintext would have produced. Partially homomorphic schemes, which support a single operation such as addition or multiplication, are mature; fully homomorphic encryption supporting arbitrary computation exists but is currently slow. Chapter 7 covers homomorphic encryption.
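To make the idea concrete, here is a toy version of the Paillier cryptosystem, a well-known additively homomorphic scheme: multiplying two ciphertexts yields an encryption of the sum of the plaintexts. The primes are deliberately tiny so the arithmetic is easy to follow, which makes this sketch completely insecure; the function names are chosen for this example. It assumes Python 3.8 or later for the modular inverse via `pow`.

```python
import math
import secrets

P, Q = 1009, 1013                  # toy primes; real keys use primes of 1024+ bits
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)   # lcm(P-1, Q-1)
MU = pow(LAM, -1, N)               # valid because the generator is g = N + 1

def encrypt(m):
    """Encrypt an integer 0 <= m < N under the public key (N, g = N + 1)."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (pow(N + 1, m, N2) * pow(r, N, N2)) % N2

def decrypt(c):
    """Recover m using the private values LAM and MU."""
    return ((pow(c, LAM, N2) - 1) // N * MU) % N

def add_encrypted(c1, c2):
    """Homomorphic addition: the product of ciphertexts encrypts the sum."""
    return (c1 * c2) % N2

c_total = add_encrypted(encrypt(1200), encrypt(345))
print(decrypt(c_total))
```

The party performing `add_encrypted` never sees 1200, 345, or their sum; only the holder of the private values can decrypt the result.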

Secure multiparty computation (MPC). Several parties jointly compute a function on their private inputs without any party learning the others' inputs. Used in joint analytics (banks computing fraud signals without sharing customer data), threshold signing (multiple parties jointly produce a signature without any one having the key), federated learning (training models without centralising data).
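The simplest building block behind many MPC protocols is additive secret sharing: each input is split into random shares that individually reveal nothing but sum to the original value. The sketch below computes a joint total for three parties under that scheme; the modulus, the party count, and the example inputs are all illustrative assumptions, and real protocols add authentication and protection against misbehaving parties.

```python
import secrets

MODULUS = 2**61 - 1   # all arithmetic is done modulo this value (illustrative choice)

def share(value, parties):
    """Split value into `parties` random shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares

def reconstruct(shares):
    return sum(shares) % MODULUS

# Three banks each hold a private count of suspicious transactions.
inputs = [12, 7, 30]
all_shares = [share(v, 3) for v in inputs]

# Party j receives one share of every input and publishes only their sum.
partial_sums = [sum(s[j] for s in all_shares) % MODULUS for j in range(3)]

total = reconstruct(partial_sums)
print(total)
```

Each published partial sum is uniformly random on its own, so the parties learn the total without learning any other party's input.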

Confidential computing. The marketing term that bundles TEE-based approaches into a coherent industry direction. The Confidential Computing Consortium under the Linux Foundation governs the space.

5.4 Data classification and data labelling

Not all data is equal. Classifying data by sensitivity is the foundation of every other data-security decision.

Classification levels

Most organisations use 3–5 levels. A common scheme:

  • Public. Information that can be released to anyone, such as marketing materials, published research, and anonymous statistics. Disclosure has no impact.
  • Internal. Information for use within the organisation that should not be public, but whose disclosure would cause only minor harm. Internal directories, meeting minutes, non-financial reports.
  • Confidential. Information whose disclosure would cause significant harm — financial details, business plans, customer lists, internal designs.
  • Restricted. Information whose disclosure would cause severe harm — trade secrets, regulatory filings before disclosure, security architectures, PII, PHI, payment-card data.
  • Top Secret. The most sensitive. Government contexts and very specific corporate cases.

Government classification follows its own scheme (Unclassified / Restricted / Confidential / Secret / Top Secret in many countries). Military and intelligence systems extend this with compartments and caveats.

Why classify

Classification drives:

  • Which controls apply. Restricted data requires encryption at rest and in transit, strict access control, audit logging, geographic restrictions. Public data needs basic integrity protection (so it can't be defaced) but little confidentiality.
  • Who can access. Need-to-know controls map to classification levels.
  • How long to retain. Classification often correlates with retention requirements — financial records seven years, medical records longer, personal data only as long as needed.
  • How to dispose. Restricted data must be securely destroyed; public data can be ordinary-deleted.
  • Where it can go. Geographic restrictions, cross-border transfer rules, allowed third-party processing.
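The mapping from classification level to required controls can be expressed as data, which makes gaps easy to check automatically. In this sketch the controls for Public and Restricted follow the text above, while the intermediate levels and all control names are assumptions made for illustration; an organisation's own policy would define the real mapping.

```python
from enum import IntEnum

class Level(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3

# Controls introduced at each level; higher levels inherit everything below.
# The INTERNAL and CONFIDENTIAL rows are illustrative assumptions.
_ADDED_AT = {
    Level.PUBLIC: {"integrity-protection"},
    Level.INTERNAL: {"access-control"},
    Level.CONFIDENTIAL: {"encryption-at-rest", "encryption-in-transit"},
    Level.RESTRICTED: {"audit-logging", "geo-restriction"},
}

def required_controls(level):
    """Return the cumulative set of controls required at this level."""
    required = set()
    for lvl in Level:
        if lvl <= level:
            required |= _ADDED_AT[lvl]
    return required

def missing_controls(level, in_place):
    """Return required controls that are not yet in place for a data set."""
    return required_controls(level) - set(in_place)

print(sorted(missing_controls(Level.RESTRICTED, {"access-control", "encryption-at-rest"})))
```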

Labelling

Once data is classified, the classification must be visible — on documents, on database fields, on file system metadata, on transmission headers. Labelling is the practice of attaching the classification to the data itself.

Document labelling. Microsoft Information Protection (Azure Information Protection) lets users (or automated rules) apply a label like "Confidential" to a document. The label is embedded in the file's metadata, visible in document headers and footers, and can drive technical controls (encryption, sharing restrictions, watermarking).

Database column labelling. Modern data platforms (Snowflake, Databricks, BigQuery) support column-level tags. A tag like pii.email or confidential.financial is attached to columns; downstream queries can be controlled based on the tags.

Network labelling. Some enterprise networks tag traffic with the classification of the data it carries (using DSCP bits in the IP header, or VLAN tags). Switches and firewalls treat traffic differently based on labels.

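File-system labelling. Linux's SELinux attaches mandatory access control labels to files and processes; AppArmor enforces mandatory access control through per-program profiles based on file paths rather than labels. Windows has MIC (Mandatory Integrity Control).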

Automated classification

Manual classification does not scale. Modern data-security platforms include automated classifiers:

  • Pattern matching. Regular expressions and structured matchers detect formats — credit-card numbers (Luhn-validated), social-security numbers, Nepali citizenship-card numbers, passport numbers, IBAN, email addresses.
  • Dictionary matching. Lists of restricted terms — drug names for healthcare, project codenames for development, customer names for legal contexts.
  • Machine-learning classifiers. Models trained to recognise sensitive content from broader context. Used by enterprise DLP platforms.
  • Document-fingerprint matching. A hash or partial hash of a known restricted document; matching documents are flagged.
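A minimal sketch of the pattern-matching approach: a regular expression finds candidate card numbers, and the Luhn checksum filters out digit runs that merely look like one. The regular expressions are simplified and the `pci.card_number` tag name is an assumption for this example; production classifiers use far more careful patterns.

```python
import re

CARD_RE = re.compile(r"\b(?:\d[ -]?){13,19}\b")          # 13-19 digits, optional separators
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")  # simplified email pattern

def luhn_valid(digits):
    """Luhn checksum: double every second digit from the right, sum, check mod 10."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def classify(text):
    """Return the set of sensitivity tags detected in text."""
    tags = set()
    for match in CARD_RE.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if luhn_valid(digits):
            tags.add("pci.card_number")
    if EMAIL_RE.search(text):
        tags.add("pii.email")
    return tags

print(classify("Card 4111 1111 1111 1111, contact a.b@example.com"))
```

The checksum step matters because order numbers, timestamps, and phone numbers produce many 13 to 19 digit runs; without it the false-positive rate makes the classifier unusable.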

The 2024 generation of enterprise classification platforms — Microsoft Purview, Google Cloud DLP, AWS Macie, Varonis — apply these techniques across email, file shares, cloud storage, and SaaS applications.

5.5 Basic operations: obfuscation and tokenisation

Two related techniques for protecting sensitive data when encryption is impractical or insufficient.

Obfuscation (data masking)

Obfuscation, also called data masking, is the process of replacing sensitive data values with realistic-but-fake substitutes that preserve the data's format and statistical properties for testing, analytics, or display while removing the ability to recover the original values.

Use cases:

  • Development and test environments. Developers need realistic data to build and test, but should not see production customer details. Production data is masked on the way to the dev environment.
  • Display in user interfaces. A customer-service representative sees XXXX-XXXX-XXXX-1234 rather than the full credit-card number. The last four digits are useful for verification; the rest is unnecessary.
  • Analytics and BI. A business analyst studies aggregate trends without seeing individual identities.
  • Outsourced processing. A vendor processes data on the organisation's behalf without ever seeing the cleartext.

Static masking. Production data is copied to a non-production environment with sensitive fields replaced at the time of copy. The non-production database contains masked values permanently.

Dynamic masking. The database stores cleartext, but specific users see masked values when they query. Implemented in the database (Microsoft SQL Server Dynamic Data Masking, Oracle Data Redaction) or in a proxy.

Masking techniques.

  • Substitution — replace with a fixed value, or with a value from a lookup list (real-looking but unrelated names).
  • Shuffling — keep the values but reassign them to different rows. Statistical distributions preserved; identifying matches destroyed.
  • Generalisation — replace specific values with broader categories (exact age → age range, exact salary → salary band).
  • Perturbation — add small random noise to numeric values.
  • Truncation — show only the last few digits (the credit-card display example).
  • Pseudonymisation — replace identifiers with consistent fake identifiers; the same person always gets the same pseudonym.
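Several of these techniques fit in a few lines each. The sketch below shows truncation, generalisation, shuffling, and keyed pseudonymisation; the function names, the `user_` prefix, and the use of HMAC-SHA-256 for pseudonyms are illustrative choices rather than a prescribed method.

```python
import hashlib
import hmac
import random

def truncate_card(pan):
    """Truncation: show only the last four digits."""
    digits = [c for c in pan if c.isdigit()]
    return "XXXX-XXXX-XXXX-" + "".join(digits[-4:])

def generalise_age(age, width=10):
    """Generalisation: replace an exact age with a range."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def shuffle_column(values, seed=None):
    """Shuffling: keep the same values but reassign them to different rows."""
    shuffled = list(values)
    random.Random(seed).shuffle(shuffled)
    return shuffled

def pseudonymise(identifier, secret_key):
    """Pseudonymisation: the same identifier always maps to the same pseudonym."""
    mac = hmac.new(secret_key, identifier.encode(), hashlib.sha256)
    return "user_" + mac.hexdigest()[:12]

print(truncate_card("4111 1111 1111 1234"))
print(generalise_age(37))
```

Anyone holding `secret_key` can recompute a pseudonym from a known identifier, which is one reason pseudonymised data is still treated as personal data.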

Masking is irreversible in most uses. The original data, if needed later, must come from a separately protected source.

Tokenisation

Tokenisation is the process of replacing a sensitive data value with a randomly generated token that has the same format but no mathematical relationship to the original value, with the mapping between tokens and originals stored in a secure separate system (the token vault).

Tokenisation differs from obfuscation in that it is reversible — given a token, an authorised system can look up the original. The token vault is the trusted store; everything else only deals with tokens.

Standard use case: payment card data. PCI-DSS requires that systems handling cardholder data meet strict security controls. Tokenisation moves card numbers out of most systems:

  1. A customer's card number arrives at the payment processor.
  2. The processor stores the card number in a PCI-DSS-compliant vault and returns a token.
  3. The merchant's systems store and process only the token.
  4. When the merchant needs to charge the card again (recurring billing, refund), they submit the token to the processor, which dereferences to the actual card.

The merchant's systems no longer hold cardholder data — only tokens. The PCI-DSS audit scope shrinks dramatically.
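The flow above can be sketched as a small in-memory vault. The class and method names are hypothetical, and a real vault is a hardened, audited service rather than a pair of dictionaries; the point is only that tokens are random and the mapping lives in one place.

```python
import secrets

class TokenVault:
    """Hypothetical vault: maps random same-length digit tokens to card numbers."""

    def __init__(self):
        self._token_to_pan = {}
        self._pan_to_token = {}

    def tokenise(self, pan):
        # Return the existing token so one card always maps to one token.
        if pan in self._pan_to_token:
            return self._pan_to_token[pan]
        while True:
            token = "".join(secrets.choice("0123456789") for _ in range(len(pan)))
            if token != pan and token not in self._token_to_pan:
                break
        self._token_to_pan[token] = pan
        self._pan_to_token[pan] = token
        return token

    def detokenise(self, token):
        # Only the processor side should ever be able to call this.
        return self._token_to_pan[token]

vault = TokenVault()
token = vault.tokenise("4111111111111111")
print(len(token), token.isdigit())
```

The merchant stores only `token`; because it is drawn at random, nothing about the card number can be computed from it without access to the vault.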

Format-preserving tokenisation. The token has the same format as the original — a 16-digit string for a card number, a Nepali citizenship-card-style string for a citizenship number. This lets the token flow through legacy systems that validate the format.

Format-preserving encryption (FPE). A specific cryptographic technique (NIST SP 800-38G's FF1 and FF3-1) that encrypts a value to another value of the same format. FPE is one way to implement format-preserving tokenisation deterministically — the same input always produces the same token, useful when the same identifier must be recognised across systems.

Vault-based tokenisation. The mapping is stored in a database. Tokens are unrelated to originals; lookups go through the vault.

Vaultless tokenisation. The token is derived deterministically from the original (often via FPE). No central vault to compromise, but anyone with the FPE key can detokenise.

Tokenisation vs encryption

When to choose which:

| Property | Encryption | Tokenisation |
|---|---|---|
| Reversibility | Yes (with key) | Yes (with vault lookup) |
| Format preservation | Not by default; FPE preserves format | Yes by design |
| Storage of original | Encrypted ciphertext is the data | Original is in the vault; everything else has tokens |
| Compliance scope | Encrypted data may still be in scope | Tokens are typically out of scope |
| Best for | Bulk data, structured and unstructured | Specific high-value fields (PAN, SSN) |
| Operational risk | Lose the key → lose all data | Lose the vault → lose all originals |

A mature deployment uses both. Bulk data at rest is encrypted. Specific high-value fields are tokenised in addition.

5.6 Data Loss Prevention (DLP)

Data Loss Prevention is the security discipline and set of technologies that detect, monitor, and prevent unauthorised transmission, copying, or storage of sensitive data, applied at endpoints, networks, and cloud services to keep data within organisational boundaries.

DLP is the operational answer to "how do we stop sensitive data from leaving?" It is one of the most mature data-security disciplines, with a well-developed product market and a well-known set of strengths and limitations.

DLP enforcement points

DLP runs at three points where data flows can be inspected:

Endpoint DLP. An agent runs on every laptop and workstation. It monitors:

  • File copies to USB drives, optical media, network shares.
  • Print operations.
  • Clipboard operations.
  • Screen captures.
  • Uploads via browsers and applications.
  • Email attachments.

The agent applies policies — block, warn, log — based on what is being moved and where.

Network DLP. A device on the network inspects traffic for sensitive content.

  • Email gateway DLP inspects outbound email.
  • Web gateway DLP inspects HTTP/HTTPS uploads (requires TLS interception).
  • Database activity monitoring detects unusual queries against sensitive tables.

Cloud DLP. Cloud-native DLP scans data stored in SaaS applications and cloud storage.

  • Microsoft 365 DLP inspects email, OneDrive, SharePoint, Teams.
  • Google Workspace DLP for Gmail, Drive, Docs.
  • CASB (Cloud Access Security Broker) products like Netskope, Zscaler, and Skyhigh Security (formerly McAfee MVISION Cloud) provide cross-cloud DLP.

What DLP can detect

DLP relies on the same classification techniques as automated classification (Section 5.4), applied to data in transit or about to leave:

  • Pattern matching for structured identifiers.
  • Dictionary matching for known restricted terms.
  • Document fingerprinting for known restricted documents.
  • Statistical and ML-based classifiers for less-structured content.
  • Exact data matching against known sensitive databases.
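Exact data matching is typically done against fingerprints of the sensitive values rather than the values themselves, so the DLP engine never has to hold the cleartext database. A minimal sketch under that assumption follows; the normalisation rule, the tokenising regular expression, and the function names are all illustrative.

```python
import hashlib
import hmac
import re

def fingerprint(value, key):
    """Keyed hash of a normalised value (punctuation stripped, lower-cased)."""
    normalised = re.sub(r"\W", "", value).lower()
    return hmac.new(key, normalised.encode(), hashlib.sha256).hexdigest()

def build_index(sensitive_values, key):
    """Precompute fingerprints of known sensitive values."""
    return {fingerprint(v, key) for v in sensitive_values}

def find_matches(text, index, key):
    """Return the tokens in text whose fingerprints appear in the index."""
    tokens = re.findall(r"[\w@.+-]+", text)
    return [t for t in tokens if fingerprint(t, key) in index]

key = b"demo-key"  # illustrative; a real key would come from a secrets manager
index = build_index(["ram@example.com", "0123-4567-89"], key)
print(find_matches("refund account 0123456789 for Ram@Example.com today", index, key))
```

Normalising before hashing lets the account number match whether or not it is written with separators, at the cost of occasionally matching unrelated strings that normalise to the same value.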

DLP actions

When DLP detects a potential leak, several responses are possible:

  • Block. Stop the transfer.
  • Warn. Show the user a message asking them to confirm or justify.
  • Log. Record the event for later review without interfering.
  • Encrypt. Apply encryption to the outgoing data automatically.
  • Quarantine. Hold the data for security-team review.
  • Notify. Alert the security team in real time.

Heavy-handed blocking causes user frustration and workarounds. Most mature deployments combine warning, logging, and selective blocking — blocking only the highest-confidence, highest-impact cases.
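That graduated approach can be written down as a small decision function. The inputs, the 0.9 confidence threshold, and the rule ordering are assumptions for illustration; real products expose this logic as configurable policy rather than code.

```python
def decide_action(confidence, classification, destination_trusted):
    """Pick a DLP response for a detected transfer.

    confidence: detector confidence between 0 and 1.
    classification: 'public', 'internal', 'confidential', or 'restricted'.
    destination_trusted: True if the destination is an approved location.
    """
    if destination_trusted:
        return "log"
    if classification == "restricted" and confidence >= 0.9:
        # Block only the highest-confidence, highest-impact cases.
        return "block"
    if classification in ("restricted", "confidential"):
        return "warn"
    return "log"

print(decide_action(0.95, "restricted", destination_trusted=False))
print(decide_action(0.60, "restricted", destination_trusted=False))
print(decide_action(0.95, "internal", destination_trusted=False))
```

Keeping the block rule narrow is a deliberate trade-off: a warning that asks for a justification still creates an audit record without pushing users towards workarounds.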

DLP limitations

DLP has known weaknesses:

  • Encryption defeats inspection. A user who encrypts a file before uploading it bypasses content-based DLP. Some products inspect for the act of encryption itself (a file that suddenly becomes high-entropy is suspicious) but it is a cat-and-mouse race.
  • Image-based exfiltration. Sensitive text screenshotted into an image bypasses text-based DLP. OCR-based DLP exists but is imperfect.
  • Slow exfiltration. A user copying small amounts of data over many weeks may stay below detection thresholds.
  • Insider creativity. An employee determined to steal data has many channels — personal email accounts, personal cloud storage, photographing the screen with a phone, memorising and re-typing. No DLP product covers all of them.
  • TLS interception costs. Inspecting HTTPS traffic requires intercepting it, which requires installing a custom root CA on every device. This carries both operational and privacy costs.
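The entropy heuristic mentioned in the first limitation can be sketched as follows. This is a minimal illustration, assuming a threshold of 7.5 bits per byte and a minimum sample size of 1024 bytes; both values are hypothetical, and compressed formats (ZIP, JPEG, video) also score high, so a real product would combine this with file-type checks.

```python
import math
import os
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    counts = Counter(data)
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_encrypted(data: bytes, threshold: float = 7.5) -> bool:
    """Flag samples whose byte distribution is close to uniform.

    Short samples are skipped because their entropy estimate is unreliable.
    """
    return len(data) >= 1024 and shannon_entropy(data) >= threshold


plain = b"Quarterly revenue figures for the board meeting. " * 50
random_like = os.urandom(4096)  # stands in for ciphertext

print(looks_encrypted(plain))        # False: natural-language text has low entropy
print(looks_encrypted(random_like))  # True with overwhelming probability
```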

DLP is part of a defence-in-depth strategy, not a complete answer. It catches mistakes and many casual leaks; it does not stop a determined attacker.

DLP in practice

The major DLP vendors as of 2026: Microsoft Purview DLP, Google Workspace DLP, Symantec DLP (Broadcom), Forcepoint DLP, Digital Guardian, Netskope, Zscaler, Trellix (formerly McAfee Enterprise). The market has consolidated significantly in the last few years; most enterprise deployments use either the major cloud vendor's native DLP or a third-party CASB.

For Nepali enterprises, full DLP deployments are uncommon outside the largest banks and a few multinationals. Most organisations rely on email-gateway DLP at most. The 2024 government data centre incident and other recent breaches have raised awareness, and procurement of DLP solutions is increasing.

5.7 Mobile data security and cloud data security

Mobile and cloud are two areas where data security has dynamics distinct from those of traditional enterprise data centres.

Mobile data security

Mobile devices are now the primary computing endpoint for most users. They carry sensitive data — corporate email, financial information, photos, messages — through unsecured networks (public Wi-Fi, hotel networks) and physical environments (cafés, airports, taxis) outside organisational control.

Device-level controls.

  • Storage encryption. iOS and Android both encrypt user data by default. iOS uses AES with a hardware-bound key; Android has used File-Based Encryption (FBE) since Android 7, with per-file keys derived from credentials.
  • Screen lock. Mandatory passcode, biometric (fingerprint, face), or pattern. Determines the trust boundary for everything else.
  • Remote wipe. The user (or the organisation, for managed devices) can remotely erase the device's data if it is lost or stolen.
  • Tracking. Apple's Find My and Google's Find My Device let owners locate (and wipe) lost devices.
  • Secure boot. Verifies the firmware and OS at startup; refuses to boot if tampered.
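The idea of per-file keys derived from credentials can be shown with a conceptual sketch. It is not the actual iOS or Android design: real devices also mix in a hardware-bound secret held in a secure element, which plain Python cannot model. The function names, the iteration count, and the "file-key:" label are assumptions for illustration.

```python
import hashlib
import hmac
import os


def derive_master_key(passcode: str, device_salt: bytes) -> bytes:
    """Stretch the passcode into a 32-byte master key with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac(
        "sha256", passcode.encode("utf-8"), device_salt, 200_000
    )


def derive_file_key(master_key: bytes, file_id: bytes) -> bytes:
    """Derive a distinct 32-byte key per file by keying HMAC-SHA256 with the master key."""
    return hmac.new(master_key, b"file-key:" + file_id, hashlib.sha256).digest()


device_salt = os.urandom(16)  # stored on the device; not secret by itself
master = derive_master_key("482913", device_salt)

key_a = derive_file_key(master, b"photos/001.jpg")
key_b = derive_file_key(master, b"mail/inbox.db")
print(key_a != key_b)  # True: each file gets its own key
```

Because every file key depends on the passcode-derived master key, nothing can be decrypted until the user unlocks the device, which is why the screen lock sets the trust boundary for the other controls.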

Application-level controls.

  • App sandboxing. Each app runs in its own isolated container. Apps cannot read each other's data without explicit permission.
  • Permission model. Apps must request permission for sensitive resources (camera, microphone, location, contacts). The user grants or denies per-app.
  • Keychain / Keystore. Hardware-backed storage for cryptographic keys and credentials. iOS Keychain, Android Keystore.
  • App-specific encryption. Banking apps, secure messengers, and similar apps add their own encryption on top of OS-level protections. WhatsApp's end-to-end encryption (Signal Protocol) is the most-deployed example.

Mobile Device Management (MDM).

For enterprise-issued devices, MDM platforms let the organisation enforce policies:

  • Mandatory passcode complexity.
  • Forced encryption.
  • Application allowlists/blocklists.
  • Containerisation — corporate data in a separate, managed container that the user cannot exfiltrate to personal apps.
  • Selective wipe — erase corporate data while leaving personal data intact.

Major MDM platforms: Microsoft Intune, Jamf, Omnissa Workspace ONE (formerly VMware Workspace ONE), Google Workspace mobile management, and Apple Business Manager paired with an MDM provider.
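A compliance check of the kind an MDM platform performs can be sketched as a comparison between a device's reported state and a policy. The field names and policy values below are hypothetical and do not correspond to any vendor's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    min_passcode_length: int = 6
    require_encryption: bool = True
    blocked_apps: frozenset[str] = frozenset()


@dataclass(frozen=True)
class Device:
    name: str
    passcode_length: int
    encrypted: bool
    installed_apps: frozenset[str]


def violations(device: Device, policy: Policy) -> list[str]:
    """Return a human-readable list of policy violations for one device."""
    found = []
    if device.passcode_length < policy.min_passcode_length:
        found.append("passcode too short")
    if policy.require_encryption and not device.encrypted:
        found.append("storage not encrypted")
    # Sort so the report is stable regardless of set ordering.
    for app in sorted(device.installed_apps & policy.blocked_apps):
        found.append(f"blocked app installed: {app}")
    return found


policy = Policy(blocked_apps=frozenset({"com.example.filesharing"}))
phone = Device("field-phone-07", 4, True,
               frozenset({"com.example.filesharing", "com.example.mail"}))
print(violations(phone, policy))
# ['passcode too short', 'blocked app installed: com.example.filesharing']
```

A platform would typically act on a non-empty result, for example by withholding corporate email until the device is compliant or by triggering a selective wipe.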

Mobile-specific threats.

  • Malicious apps. Side-loaded apps (outside the official store) or compromised apps in the store. Android is more susceptible because of side-loading; iOS's curated App Store limits this. The 2025 wave of banking trojans targeting eSewa and Khalti customers in Nepal — distributed through Facebook Messenger and Telegram — illustrates the threat.
  • Phishing on mobile. Smaller screens, hidden URLs, push-notification-based phishing. Higher click-through rates than desktop.
  • Network attacks. Man-in-the-middle (MITM) attacks on public Wi-Fi. TLS protects most traffic; users should still use a VPN on untrusted networks.
  • Physical access. Stolen or lost devices. Encryption + remote wipe is the defence.
  • App permissions creep. Apps with more permissions than they need. Each granted permission is potential leakage.
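Permission creep can be made concrete by comparing what an app requests with what its function plausibly needs. The permission strings below are real Android permission names, but the per-category baseline is a hypothetical assumption for illustration.

```python
# Hypothetical baseline: permissions each app category plausibly needs.
EXPECTED_PERMISSIONS = {
    "flashlight": {"android.permission.CAMERA"},
    "messenger": {
        "android.permission.CAMERA",
        "android.permission.RECORD_AUDIO",
        "android.permission.READ_CONTACTS",
    },
}


def excess_permissions(category: str, requested: set[str]) -> set[str]:
    """Return requested permissions that the category baseline does not justify.

    An unknown category has an empty baseline, so everything it requests is flagged.
    """
    return set(requested) - EXPECTED_PERMISSIONS.get(category, set())


requested = {
    "android.permission.CAMERA",
    "android.permission.READ_SMS",
    "android.permission.ACCESS_FINE_LOCATION",
}
print(sorted(excess_permissions("flashlight", requested)))
# ['android.permission.ACCESS_FINE_LOCATION', 'android.permission.READ_SMS']
```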

Best practices.

  • Keep the OS up to date so that security patches are applied promptly. iOS devices typically receive updates for many years; Android support periods have historically been shorter.
  • Use device encryption (default on modern devices; not always default on cheap Android devices).
  • Use a strong screen lock.
  • Enable remote wipe.
  • Install apps only from the official store.
  • Review and minimise app permissions.
  • Avoid sideloading and jailbreaking/rooting for security-sensitive use.
  • Use end-to-end encrypted messaging for sensitive conversations.

Cloud data security

Cloud security is its own discipline with several distinct concerns.

The shared-responsibility model.

Every major cloud provider publishes a shared-responsibility model. The provider is responsible for the security of the cloud; the customer is responsible for security in the cloud.

| Service model | Provider responsible | Customer responsible |
| --- | --- | --- |
| IaaS (VMs, storage) | Hypervisor, hardware, physical, network | OS, applications, configurations, data |
| PaaS (managed databases, runtimes) | Underlying OS, runtime, patches | Application code, data, access control |
| SaaS (Office 365, Salesforce) | Almost everything | Access control, data classification, user behaviour |

Misunderstanding the model is a leading cause of cloud breaches. Customers assume the provider protects something the provider does not; the configuration that requires customer action is left at insecure defaults.

Cloud-native data security controls.

  • Encryption at rest by default. Every major provider encrypts stored data by default with provider-managed keys. Customer-managed keys add control.
  • Encryption in transit. All API calls go over TLS. Inter-service traffic in the provider's network is typically encrypted.
  • IAM (Identity and Access Management). Cloud-specific permission systems (AWS IAM, Azure RBAC, GCP IAM). Critical and complex; the source of many misconfiguration breaches.
  • Network controls. VPCs, security groups, network ACLs, private endpoints. Limiting exposure of cloud resources to the internet.
  • Logging and monitoring. CloudTrail (AWS), Activity Log (Azure), Cloud Audit Logs (GCP). Centralised logging of every API call.

Common cloud data-security failures.

  • Publicly exposed storage. S3 buckets, Azure Blob containers, GCS buckets configured for public read. More than a decade of breaches have involved this pattern.
  • Over-privileged IAM. Service accounts with broader permissions than they need. Compromise of any one becomes catastrophic.
  • Lost or compromised access keys. API keys leaked through Git commits, exposed configuration files, compromised developer laptops.
  • Inadequate logging. When something goes wrong, the records to investigate are missing.
  • Cross-tenant attacks. Rare but increasingly documented — vulnerabilities in the provider's multi-tenancy isolation allow one customer to access another's data. The 2021 "ChaosDB" Azure Cosmos DB vulnerability is an example.
  • Supply-chain through cloud. In the 2024 Snowflake-customer breaches, attackers used credentials stolen by infostealer malware to log in to many customers' accounts that lacked multi-factor authentication, showing how a shared cloud platform can amplify credential compromises.
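The first two failures in this list, public storage and over-privileged IAM, can often be spotted by reading the policy document itself. The sketch below parses an AWS-style JSON policy and flags two patterns: an Allow statement open to any principal, and an Allow statement granting all actions on all resources. It is deliberately simplified: it treats any Condition as a mitigation and ignores NotPrincipal, ACLs, and account-level public-access settings, so it should be read as an illustration of what CSPM tools look for, not as a complete check.

```python
import json


def _as_list(value):
    """Policy fields may hold a single value or a list; normalise to a list."""
    return value if isinstance(value, list) else [value]


def is_public_principal(principal) -> bool:
    """Return True if the principal matches anyone ("*" or {"AWS": "*"})."""
    if principal == "*":
        return True
    if isinstance(principal, dict):
        return any("*" in _as_list(v) for v in principal.values())
    return False


def find_risky_statements(policy_json: str) -> list[str]:
    """Flag Allow statements that are public or grant full wildcard access."""
    policy = json.loads(policy_json)
    findings = []
    for index, stmt in enumerate(_as_list(policy.get("Statement", []))):
        if stmt.get("Effect") != "Allow":
            continue
        if is_public_principal(stmt.get("Principal")) and "Condition" not in stmt:
            findings.append(f"statement {index}: allows any principal")
        actions = _as_list(stmt.get("Action", []))
        resources = _as_list(stmt.get("Resource", []))
        if "*" in actions and "*" in resources:
            findings.append(f"statement {index}: allows all actions on all resources")
    return findings


public_bucket_policy = json.dumps({
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": "*",
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::example-bucket/*",
    }],
})
print(find_risky_statements(public_bucket_policy))
# ['statement 0: allows any principal']
```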

Cloud-specific security technologies.

  • CSPM (Cloud Security Posture Management). Continuously scans cloud configurations for misconfigurations. Wiz, Lacework, Prisma Cloud, Microsoft Defender for Cloud.
  • CIEM (Cloud Infrastructure Entitlement Management). Analyses cloud permissions and detects over-privilege. Often merged with CSPM products now.
  • CWPP (Cloud Workload Protection Platforms). Endpoint-style protection for cloud VMs and containers.
  • CNAPP (Cloud-Native Application Protection Platforms). The 2023-onward marketing umbrella for combined CSPM + CIEM + CWPP + code-scanning.
  • CASB (Cloud Access Security Broker). Sits between users and SaaS applications, applying DLP, logging, and policy.

Cloud key management.

A central decision in cloud data security is who holds the keys.

  • Provider-managed keys. Simplest. The provider holds and rotates the keys. Convenient, lower control. The provider can technically read the data (though contractually does not).
  • Customer-managed keys (CMK). The customer creates and manages keys in the cloud KMS (AWS KMS, Azure Key Vault, GCP Cloud KMS). The customer controls rotation and revocation; the provider can still see plaintext in their service planes.
  • Hold-Your-Own-Key (HYOK) / External Key Management. The customer holds the master key outside the cloud, often in an on-premise HSM. The cloud KMS can request key operations from the external HSM but cannot store the key. Strongest control; operationally complex.
  • Bring-Your-Own-Key (BYOK). A variation: the customer generates the key, exports it (under wrap), and imports it into the cloud KMS. Once imported, similar to CMK.

For very sensitive workloads, HYOK with confidential computing (TEE-based processing) gives the cloud customer end-to-end protection — the cloud provider never sees the data or the key in plaintext.

Multi-cloud and hybrid

Most large enterprises use multiple clouds and combine cloud with on-premise. This raises additional data-security considerations:

  • Consistent classification across environments. Data classified Restricted in one cloud should be treated the same in another.
  • Consistent identity. Federated identity (Microsoft Entra ID, formerly Azure AD; Okta; Ping) bridges identities across clouds.
  • Encryption-key portability. Keys managed externally to any single cloud (in an enterprise HSM or in a multi-cloud KMS like HashiCorp Vault) avoid cloud lock-in for cryptographic operations.
  • Data sovereignty. Different clouds have different jurisdictional implications. EU-based customers may require EU-only data residency; certain government workloads may require national-borders-only deployment.
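Consistent classification and data-sovereignty rules can be enforced mechanically once they are written down. The sketch below checks where each dataset is stored against a residency policy keyed by classification label. The policy table, the dataset names, and the "on-prem-ktm" location are hypothetical; "ap-south-1" and "ap-southeast-1" are the AWS region codes for Mumbai and Singapore.

```python
from typing import NamedTuple, Optional


class Dataset(NamedTuple):
    name: str
    classification: str
    region: str


# Hypothetical residency policy: label -> allowed regions (None means anywhere).
ALLOWED_REGIONS: dict[str, Optional[frozenset[str]]] = {
    "Public": None,
    "Internal": frozenset({"ap-south-1", "ap-southeast-1", "on-prem-ktm"}),
    "Restricted": frozenset({"on-prem-ktm"}),
}


def placement_violations(datasets: list[Dataset]) -> list[str]:
    """Return one message per dataset stored somewhere its label does not allow."""
    problems = []
    for ds in datasets:
        if ds.classification not in ALLOWED_REGIONS:
            # Fail closed: unlabelled or mislabelled data is treated as a problem.
            problems.append(f"{ds.name}: unknown classification {ds.classification!r}")
            continue
        allowed = ALLOWED_REGIONS[ds.classification]
        if allowed is not None and ds.region not in allowed:
            problems.append(
                f"{ds.name}: {ds.classification} data not allowed in {ds.region}"
            )
    return problems


inventory = [
    Dataset("marketing-site", "Public", "eu-west-1"),
    Dataset("customer-kyc", "Restricted", "ap-south-1"),
    Dataset("hr-handbook", "Internal", "ap-southeast-1"),
]
print(placement_violations(inventory))
# ['customer-kyc: Restricted data not allowed in ap-south-1']
```

The same table applies regardless of which cloud holds the data, which is the practical meaning of treating a label the same way in every environment.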

For Nepali organisations using cloud services, the practical landscape:

  • Major banks and fintechs use a mix of AWS Mumbai, Azure regions in Singapore/Mumbai, and on-premise infrastructure.
  • Government workloads remain heavily on-premise (Government Integrated Data Centre), with limited cloud adoption pending data-residency policy clarification.
  • SaaS (Microsoft 365, Google Workspace) is widely adopted by enterprises.
  • The lack of in-country cloud regions means most data sits in India or Singapore, creating cross-border data-protection considerations.

The unifying theme of this chapter: data security is a discipline broader than any single technology. Cryptography is essential — encryption at rest, in transit, in use — but only one of many tools. Classification, tokenisation, DLP, mobile management, cloud configuration, and operational practices all contribute. A descriptive answer should reflect this breadth — listing AES and TLS without mentioning classification, DLP, and key management misses what makes a data-security programme actually work.
