Cloud Storage Glossary
Mechanisms that define who can access data and what actions they can perform, ensuring data security and compliance.
Fine-grained permissioning using Identity and Access Management. Example: AWS IAM policies for bucket-level access.
Storage tuned for workload patterns: sequential reads, random access, big data analytics, or AI training.
High-throughput, low-latency storage designed for training/inference pipelines with GPUs or HPC clusters.
Application Programming Interfaces that allow developers to interact programmatically with cloud storage services.
Data is first written to the primary location, then propagated to replicas. Reduces write latency but introduces a small risk of data loss if the primary fails.
Tracks access and modifications for regulatory adherence. Example: S3 access logs, CloudTrail, Azure Monitor.
The uptime and accessibility of data, expressed as a percentage. SLAs differ by storage class; e.g., S3 Standard offers 99.99% availability.
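An availability percentage translates directly into allowed downtime per year; a minimal sketch of the arithmetic in Python (the SLA figures shown are illustrative):

    # Convert an availability SLA into maximum downtime per year.
    MINUTES_PER_YEAR = 365 * 24 * 60

    def downtime_minutes_per_year(availability: float) -> float:
        """Minutes per year a service may be down at this availability."""
        return (1 - availability) * MINUTES_PER_YEAR

    print(downtime_minutes_per_year(0.9999))   # 99.99%  -> ~52.6 minutes/year
    print(downtime_minutes_per_year(0.99999))  # 99.999% -> ~5.3 minutes/year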
Time to attach volumes to nodes and mount behavior (lazy vs immediate mounts); impacts pod startup time.
Cloud-based service that provides backup solutions, allowing businesses to back up their data to the cloud.
Storage that divides data into fixed-size blocks, each with its own address; commonly used for databases and virtual machines.
Memory-efficient membership tests to reduce disk lookups for absent keys (used in object stores / indexes).
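A minimal Bloom-filter sketch in Python; the bit-array size and hash count are illustrative, not tuned for a real false-positive target:

    import hashlib

    class BloomFilter:
        """Probabilistic set: no false negatives, tunable false-positive rate."""
        def __init__(self, size_bits=1024, num_hashes=3):
            self.size = size_bits
            self.num_hashes = num_hashes
            self.bits = bytearray(size_bits // 8)

        def _positions(self, key: bytes):
            for i in range(self.num_hashes):
                h = hashlib.sha256(bytes([i]) + key).digest()
                yield int.from_bytes(h[:8], "big") % self.size

        def add(self, key: bytes):
            for p in self._positions(key):
                self.bits[p // 8] |= 1 << (p % 8)

        def might_contain(self, key: bytes) -> bool:
            # False means definitely absent: the disk lookup can be skipped.
            return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))

    bf = BloomFilter()
    bf.add(b"object-123")
    print(bf.might_contain(b"object-123"))  # True
    print(bf.might_contain(b"object-999"))  # Almost certainly False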
Automatic redistribution of data when nodes join/leave and self-healing after corruption or disk failures.
Standard tools (e.g., fio) to measure IOPS, throughput, and latency under controlled workloads.
Tools/operators (e.g., Velero) for orchestrated backups, restores, and policy-driven protection in Kubernetes environments.
Validates stored data using checksums or hash algorithms to detect corruption.
A model in which digital data is stored in logical pools across multiple servers, often in different locations, managed by a hosting company.
Integration with compute, analytics, AI/ML, and serverless services. Example: AWS S3 + EMR for big data processing, GCS + Vertex AI for ML training.
A storage tier optimized for rarely accessed data, offering lower costs at the price of higher retrieval times; commonly used for archival purposes.
Adherence to regulatory requirements and standards (e.g., GDPR, HIPAA) governing data storage and protection.
Reducing data size to save storage space and bandwidth. Applied in block or object storage for efficiency.
Rules governing how and when data updates are visible across distributed systems. Types: strong consistency, eventual consistency, read-after-write consistency.
Cross-zone replication stays within the same region (low latency); cross-region replication spans regions for DR and geo-redundancy.
Algorithms that ensure distributed metadata/state agreement (used by etcd, Ceph MONs) for leader election and consistent cluster state.
How large objects are split into chunks/segments—affects parallelism, repair, and I/O performance.
Background process that rewrites data structures to reduce fragmentation and reclaim space (common in LSM-tree storage engines).
Snapshot/clone implementation patterns that avoid full copies by sharing/redirecting blocks on write.
Controller-side (control-plane operations like create/delete) and node-side (attach/mount) components implementing the CSI spec.
Local caches (LRU, TTL-based) for hot data/metadata to reduce backend IOPS and lower latency.
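A minimal LRU cache sketch in Python using OrderedDict; the capacity is illustrative:

    from collections import OrderedDict

    class LRUCache:
        """Evicts the least-recently-used entry once capacity is reached."""
        def __init__(self, capacity: int):
            self.capacity = capacity
            self.items = OrderedDict()

        def get(self, key):
            if key not in self.items:
                return None  # Cache miss: caller falls back to the backend.
            self.items.move_to_end(key)  # Mark as most recently used.
            return self.items[key]

        def put(self, key, value):
            self.items[key] = value
            self.items.move_to_end(key)
            if len(self.items) > self.capacity:
                self.items.popitem(last=False)  # Evict the LRU entry.

    cache = LRUCache(capacity=2)
    cache.put("a", 1); cache.put("b", 2); cache.get("a"); cache.put("c", 3)
    print(list(cache.items))  # ['a', 'c'] -- 'b' was least recently used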
Choice impacts CPU cost and collision risk; CRC32C is common for performance, SHA-256 for cryptographic integrity.
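The tradeoff is easy to see in Python. Note that the standard library ships plain CRC32 rather than CRC32C (the Castagnoli variant typically comes from a separate library such as google-crc32c), so CRC32 stands in for it here:

    import hashlib
    import zlib

    data = b"some object payload" * 1000

    # Fast, short, non-cryptographic: good for detecting random corruption.
    crc = zlib.crc32(data)

    # Slower, 256-bit, collision-resistant: good for cryptographic integrity.
    digest = hashlib.sha256(data).hexdigest()

    print(f"crc32:  {crc:#010x}")
    print(f"sha256: {digest}")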
Inject faults (disk loss, network partition) to validate healing, redundancy, and SRE runbooks.
Indexing/search and audit stores for governance, discovery, and compliance across the object namespace.
The process of moving data that is no longer actively used to a separate storage device for long-term retention.
Reducing the size of data to save storage space and improve transfer speeds, often used in cloud storage solutions.
Eliminating duplicate copies of repeating data to save storage space and improve efficiency.
Ensuring that data is accurate, consistent, and unaltered during storage and transmission.
The process of transferring data between storage systems or locations, often during cloud adoption or infrastructure upgrades.
Transferring data to the cloud or between storage systems. Example: AWS DataSync, Google Transfer Service.
The practice of storing copies of data across multiple locations to ensure availability and durability in case of hardware failure.
Periodic process to verify and correct data integrity across storage nodes.
The concept that data is subject to the laws and regulations of the country in which it is stored, impacting cloud storage decisions.
The practice of moving data between different storage types based on access patterns and cost considerations.
Charges for moving data out of cloud regions; critical for global enterprise workloads.
Removing duplicate data to save storage space and reduce costs. Common in backup and archival systems.
Strategies and tools to recover data and applications in the event of a catastrophe, ensuring business continuity.
Strategies to switch operations to backup systems during failure (failover) and revert after recovery (failback).
Storage system that spreads data across multiple nodes for horizontal scalability and high availability. Used in cloud-native and big data workloads. Example: Ceph, Amazon FSx.
Probability that data remains intact and uncorrupted over time. Enterprise-grade storage targets 11–16 nines of durability. Example: S3 Standard offers 99.999999999% durability.
Periodic verification of checksums to detect silent corruption and trigger repairs.
Columnar / table formats enabling efficient analytics on object storage with partitioning and ACID semantics.
Scheduling pods or jobs near data replicas to reduce network hops and maximize throughput.
Storing data closer to the location where it is needed to reduce latency and bandwidth usage, often used in IoT applications.
Caches data closer to users/devices for low-latency access. Used in IoT, streaming, and AI inference.
The process of converting data into a coded format to prevent unauthorized access, both at rest and in transit.
Protecting data using cryptographic methods. At rest, AES-256 with KMS-managed keys; in transit, TLS/SSL. Required for compliance standards (HIPAA, PCI, GDPR).
Advanced redundancy technique splitting data into fragments with parity across nodes. Reduces storage overhead while maintaining durability.
Trigger actions when objects are created, updated, or deleted. Example: S3 events triggering Lambda.
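A minimal AWS Lambda handler sketch in Python for an S3 object-created notification; the downstream processing is a placeholder:

    def handler(event, context):
        """Triggered by an S3 event notification (e.g., s3:ObjectCreated:*)."""
        for record in event["Records"]:
            bucket = record["s3"]["bucket"]["name"]
            key = record["s3"]["object"]["key"]
            size = record["s3"]["object"].get("size", 0)
            # Placeholder: kick off downstream processing for the new object.
            print(f"New object s3://{bucket}/{key} ({size} bytes)")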
Redundancy scheme that slices data into k data + m parity shards to reduce storage overhead vs replication while allowing reconstruction.
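The storage-overhead argument is simple arithmetic; a Python sketch comparing 3x replication with a hypothetical k=10, m=4 scheme:

    def storage_overhead(k_data: int, m_parity: int) -> float:
        """Raw bytes stored per logical byte for a k+m erasure code."""
        return (k_data + m_parity) / k_data

    print(storage_overhead(1, 2))   # 3x replication: 3.0x raw per logical byte
    print(storage_overhead(10, 4))  # EC 10+4: 1.4x, tolerating any 4 lost shards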
Weaker models where replicas converge over time; useful for high-availability geo-replication.
Reconstructing lost shards optimally using local parity to minimize cross-rack/regional traffic.
Hierarchical storage system using directories and files. Accessed via protocols like NFS or SMB. Ideal for shared enterprise applications or CI/CD pipelines.
Whether write calls are flushed to stable storage immediately—critical for databases (fsync cost vs durability).
Data is stored in multiple geographic regions to prevent data loss due to regional disasters. Critical for global SaaS and DR strategies. Example: AWS S3 Cross-Region Replication.
A storage tier designed for frequently accessed data, providing high performance and low latency, suitable for active applications.
Combines on-premises and cloud storage for latency-sensitive or regulatory-bound workloads. Example: Azure Arc-enabled storage.
Mixed-media storage architectures where the media class (e.g., HDD, SSD, NVMe) influences latency, throughput, and tiering policies.
Prevents modification or deletion after writing, protecting against accidental or malicious changes. Used for compliance logs, financial and audit data, and critical archives.
On-the-fly compression reduces storage but adds CPU; choose algorithm as tradeoff between ratio and latency.
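The ratio-vs-CPU tradeoff shows up even within a single algorithm; a Python sketch comparing zlib levels (timings vary by machine, and the highly repetitive sample data inflates the ratios):

    import time
    import zlib

    data = b"log line: request handled in 12ms\n" * 50_000

    for level in (1, 6, 9):
        start = time.perf_counter()
        compressed = zlib.compress(data, level)
        elapsed = time.perf_counter() - start
        ratio = len(data) / len(compressed)
        print(f"level {level}: ratio {ratio:.1f}x, {elapsed * 1000:.1f} ms")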
Removing duplicate blocks/objects (fingerprinting via hashes) to save capacity—requires metadata index.
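A minimal fixed-size-chunk dedup sketch in Python. The chunk size and in-memory index are illustrative; production systems typically use content-defined chunking and a persistent metadata index:

    import hashlib

    CHUNK_SIZE = 4096
    chunk_store = {}   # fingerprint -> chunk bytes (the metadata index)

    def write_dedup(data: bytes) -> list[str]:
        """Store data as chunks, keeping only one copy of each unique chunk."""
        recipe = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            fp = hashlib.sha256(chunk).hexdigest()   # Fingerprint the chunk.
            chunk_store.setdefault(fp, chunk)        # Skip if already stored.
            recipe.append(fp)                        # File = list of fingerprints.
        return recipe

    recipe = write_dedup(b"A" * 8192 + b"B" * 4096)
    print(len(recipe), "chunks referenced,", len(chunk_store), "stored")  # 3, 2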
Centralized management of encryption keys for enterprise-grade storage. Example: AWS KMS, Azure Key Vault.
The time delay between a request and its response; crucial for performance-sensitive applications.
Key performance metrics: Latency = response time; IOPS = read/write operations/sec; Throughput = data volume/sec.
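The three metrics are directly related; a back-of-the-envelope sketch in Python (the workload numbers are illustrative):

    # Throughput (bytes/sec) = IOPS * I/O size.
    iops = 20_000          # read/write operations per second
    io_size = 64 * 1024    # 64 KiB per operation

    throughput_mib_s = iops * io_size / (1024 * 1024)
    print(f"{throughput_mib_s:.0f} MiB/s")  # 20k IOPS * 64 KiB = 1250 MiB/s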
Automating the movement of data between different storage classes based on predefined rules, optimizing cost and access speed.
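A sketch of such a rule using boto3; the bucket name, prefix, and day counts are placeholders:

    import boto3

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket="example-bucket",
        LifecycleConfiguration={
            "Rules": [{
                "ID": "tier-then-expire-logs",
                "Filter": {"Prefix": "logs/"},
                "Status": "Enabled",
                # Move to cheaper classes as the data cools, then delete.
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": 365},
            }]
        },
    )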
Storage engine data structures used for indexing (LSM for write-heavy workloads; B-tree for balanced reads/writes).
Distributing data across multiple geographic locations to enhance availability and disaster recovery capabilities.
Dedicated service or sharded store for object/file metadata (namespaces, inodes, object indices) to scale lookups.
Breaking large uploads into parts for parallel transmission and resumability (S3 multipart semantics).
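A minimal multipart upload sketch with boto3; the bucket, key, and file path are placeholders, and real code should abort the upload on failure (or use the higher-level upload_file, which handles multipart automatically):

    import boto3

    s3 = boto3.client("s3")
    PART_SIZE = 8 * 1024 * 1024  # 8 MiB; S3 requires >= 5 MiB per part (except the last)

    mpu = s3.create_multipart_upload(Bucket="example-bucket", Key="big-object")
    parts, part_number = [], 1
    with open("/tmp/bigfile.bin", "rb") as f:
        while chunk := f.read(PART_SIZE):
            resp = s3.upload_part(
                Bucket="example-bucket", Key="big-object",
                UploadId=mpu["UploadId"], PartNumber=part_number, Body=chunk,
            )
            parts.append({"PartNumber": part_number, "ETag": resp["ETag"]})
            part_number += 1

    s3.complete_multipart_upload(
        Bucket="example-bucket", Key="big-object",
        UploadId=mpu["UploadId"], MultipartUpload={"Parts": parts},
    )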
Architectures combining fast in-memory caches and persistent sharded index for scalable metadata ops.
A storage tier that balances cost and access speed, used for data that is accessed less frequently but still requires quick retrieval.
A protocol that allows file access over a network, enabling a system to share directories and files with other systems.
High-performance transport stacks (NVMe over Fabrics, RDMA, SPDK) to reduce latency and CPU overhead for NVMe devices.
Low-level NVMe constructs used to slice devices for isolation or QoS.
OS/kernel-level and service-level controls to isolate noisy tenants and enforce fairness.
A storage architecture that manages data as objects, each containing the data itself, metadata, and a unique identifier, ideal for unstructured data like media files.
Prometheus exporters and metrics (ops/sec, avg latency, queue depth, rebuild backlog) for monitoring.
Policy primitives for retention, legal holds, and immutable retention (WORM) with enforcement at store level.
Billing model based on actual usage instead of provisioned capacity.
Measures storage performance: IOPS for block storage, throughput for bulk transfer, latency for real-time apps.
Persistent storage retains data after system shutdown (e.g., EBS, S3); ephemeral storage is temporary and is deleted with the instance (e.g., EC2 instance store).
Logical groupings that control how objects/chunks are distributed across racks/hosts to avoid correlated failures.
POSIX (strong namespace, byte-level updates) vs object (immutable objects with PUT/GET) — choose based on app needs.
Placement group mechanics (e.g., Ceph PGs) for mapping objects to OSDs and how rebalancing is triggered.
Byte-addressable storage (e.g., Intel Optane) for ultra-low latency persistence or fast metadata stores.
Tail-latency metrics and objectives (percentile-based) used to drive SLOs and error budgets.
Minimum set of replicas required to accept reads/writes to guarantee consistency under failures.
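The usual rule of thumb is R + W > N, so any read quorum overlaps any write quorum; a sketch in Python:

    def quorums_consistent(n_replicas: int, read_quorum: int, write_quorum: int) -> bool:
        """True if any read quorum overlaps any write quorum (sees latest write)."""
        return read_quorum + write_quorum > n_replicas

    print(quorums_consistent(3, 2, 2))  # True:  R=2/W=2 of N=3 always overlap
    print(quorums_consistent(3, 1, 1))  # False: a read may miss the latest write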
Mechanisms to cap or reserve throughput and IOPS per tenant/volume to guarantee SLAs.
Storage compliance with HIPAA, PCI DSS, SOC 2, GDPR, etc.
The process of copying data from one location to another to enhance data availability and fault tolerance.
An architectural style for designing networked applications, using HTTP requests to access and use data, commonly used in cloud storage services.
Maximum tolerable data loss, measured in time. Defines how frequently backups or replication must occur. Example: a 15-minute RPO means at most 15 minutes of data can be lost.
Maximum tolerable downtime before systems must be restored. Example: 2-hour RTO means systems must be recoverable within 2 hours.
Number of full copies maintained for data; affects durability, read throughput, and storage overhead.
Guarantee that a write is immediately visible to subsequent reads—important for many transactional workloads.
Kubernetes PV reclaim behavior after PVC deletion—controls data lifecycle and accidental deletion protection.
Network and disk bandwidth available for reconstructing lost shards—affects recovery time (RTO).
Traffic shaping algorithm often used to enforce egress limits or burst control.
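A minimal token-bucket sketch in Python; the rate and burst capacity are illustrative:

    import time

    class TokenBucket:
        """Allows sustained `rate` ops/sec with bursts up to `capacity`."""
        def __init__(self, rate: float, capacity: float):
            self.rate, self.capacity = rate, capacity
            self.tokens = capacity
            self.last = time.monotonic()

        def allow(self, cost: float = 1.0) -> bool:
            now = time.monotonic()
            # Refill proportionally to elapsed time, capped at capacity.
            self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= cost:
                self.tokens -= cost
                return True
            return False  # Over the limit: throttle or queue the request.

    bucket = TokenBucket(rate=100.0, capacity=200.0)  # 100 ops/s, bursts of 200
    print(bucket.allow())  # True while tokens remain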
The ability of a storage service to support Amazon S3’s API, enabling interoperability with S3 tools and applications.
Formal guarantee on storage uptime, durability, and support response. Crucial for mission-critical enterprise apps.
A network file sharing protocol that allows applications to read and write to files and request services from server programs.
Read-only copies of storage volumes at a specific point in time. Used for backups, versioning, rollback, and disaster recovery.
A protocol for exchanging structured information in the implementation of web services, used in some cloud storage APIs.
A cloud computing model where a service provider rents out storage resources to customers on a subscription basis.
Data is written simultaneously to multiple locations before confirming success. Ensures zero data loss, used for mission-critical workloads.
Guarantees that operations appear instantaneous and globally ordered, so every read sees the latest completed write.
Block-level snapshot and cloning APIs; implementation can be hardware-assisted or software-copy-on-write.
Techniques (inline metadata, packed objects) to reduce overhead for many small objects and improve performance.
Engineering-level objectives with documented remediation steps and playbooks for breaches.
Server-side querying of objects (e.g., Parquet/CSV) to reduce data egress and speed analytics.
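A server-side query sketch using boto3's select_object_content; the bucket, key, and SQL expression are placeholders:

    import boto3

    s3 = boto3.client("s3")
    resp = s3.select_object_content(
        Bucket="example-bucket",
        Key="data/records.csv",
        ExpressionType="SQL",
        # Only matching rows cross the network, not the whole object.
        Expression="SELECT s.id, s.amount FROM s3object s "
                   "WHERE CAST(s.amount AS FLOAT) > 100",
        InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
        OutputSerialization={"CSV": {}},
    )
    for event in resp["Payload"]:
        if "Records" in event:
            print(event["Records"]["Payload"].decode())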
The amount of data transferred over a network in a given time period, affecting the speed of data access and transfer.
Storage architecture that automatically moves data between hot, warm, and cold tiers based on access patterns and lifecycle policies. Optimizes cost and performance.
Marking deleted objects and later reclaiming storage; GC latency impacts storage footprint and consistency.
Thin provisioning allocates on demand (saves capacity); thick provisioning allocates up front (predictable performance).
Distributed tracing of storage operations (client → metadata → data shards) to debug performance issues.
High-performance storage optimized for low latency and high IOPS, often backed by NVMe SSDs for AI/ML and HPC workloads.
The ability to keep multiple versions of an object or file, allowing recovery of previous states and protection against accidental deletions.
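A sketch of enabling versioning and listing an object's versions with boto3; the bucket name and key prefix are placeholders:

    import boto3

    s3 = boto3.client("s3")

    # Once enabled, overwrites and deletes create new versions instead of
    # destroying data, so earlier states remain recoverable.
    s3.put_bucket_versioning(
        Bucket="example-bucket",
        VersioningConfiguration={"Status": "Enabled"},
    )

    versions = s3.list_object_versions(Bucket="example-bucket", Prefix="report.csv")
    for v in versions.get("Versions", []):
        print(v["Key"], v["VersionId"], v["IsLatest"])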
Kubernetes storage topology rules (Immediate vs WaitForFirstConsumer) and node/zone-aware provisioning for locality.
Storage tier for moderately accessed data. Balanced between cost and access speed. Example: weekly reports or infrequently queried logs.
Storage type that prevents modification/deletion after writing. Used for compliance or financial archives.
Extra physical I/O incurred per logical write/read due to metadata, replication, or compaction; a key performance metric.
Write-back buffers writes and acknowledges them early (higher performance, more risk); write-through writes synchronously for safety.