Database Glossary
ACID: Guarantees that transactions are Atomic, keep data Consistent, are Isolated from each other, and that changes are Durable.
Active–active: A DR pattern where multiple regions or sites serve traffic simultaneously with replication between them. It offers lower RPO/RTO but is more complex because of conflict resolution and consistency.
Active–passive: A DR pattern where one environment is primary and a secondary environment stays on warm or cold standby, used only during failover in a disaster. The passive side usually has lower ongoing cost but higher RTO.
Application-consistent backup: A backup taken with coordination from the database or application (e.g., flushing logs, quiescing writes) to ensure no in-flight transactions result in corruption or lengthy recovery.
Archive storage: A low-cost storage tier for long-term backup retention (months/years) where access is infrequent and restore times are longer, used for compliance and historical recovery.
Asynchronous replication: Replication where the primary confirms the write first and ships changes to replicas afterwards, improving performance but allowing a small data-loss window.
Audit logging: Recording connection attempts, queries, schema changes, and permission changes for forensics and compliance reporting.
Authentication: Verifying who or what is connecting to the database (users, apps, services), using passwords, IAM identities, tokens, or certificates.
Authorization: Defining what an authenticated principal is allowed to do (e.g., SELECT on table A, no DELETE on table B).
Automated backups: Provider-managed scheduled backups of your database, retained for a configurable period and often enabling PITR.
Autoscaling (compute): Automatically adjusting compute capacity based on load within configured min–max limits.
Autoscaling (storage): Automatically growing storage as the database size approaches thresholds, often without downtime.
Backup encryption: Applying encryption specifically to backup artifacts (snapshots, archive copies, offsite storage) to protect data if backup media or locations are compromised.
Backup retention: How long backups and logs are kept before deletion or archival, driven by RPO, DR, and compliance requirements.
Backup window: The time period during which backups are taken (e.g., nightly 01:00–03:00), often chosen to reduce impact on peak workloads and align with compliance rules.
Blue–green deployment: Running a new database version or cluster in parallel with the old one and switching traffic over once the new one is validated.
Buffer pool: An in-memory cache of recently accessed pages, used to avoid going to disk for every read.
Cloud database: A database that runs on cloud infrastructure and is accessed over the network, often offering elastic capacity, built-in backups, and managed availability.
Cluster: A set of database nodes working together as a single logical database, often including a primary and multiple replicas.
Clustered index: An index that defines the physical order of rows in storage, often aligned with the primary key.
Cold standby: A DR strategy where infrastructure is provisioned only after a disaster is declared, using backups and templates. It is cheaper day to day but has a higher RTO.
Compliance-certified databases: Managed databases that have been audited against standards such as SOC 2, ISO 27001, PCI DSS, or HIPAA for regulated workloads.
Connection pooling: Reusing established database connections rather than creating new ones per request, improving performance and resource usage.
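The pooling idea above can be sketched in a few lines. This is a minimal illustration, not a production pool (no health checks, timeouts, or thread-safety guarantees beyond the queue itself); `sqlite3.connect` stands in for an expensive network connection, and all names are illustrative.

```python
import queue
import sqlite3

class ConnectionPool:
    """Hand out already-open connections instead of opening a new one per request."""

    def __init__(self, size: int):
        # LIFO so the most recently released (warmest) connection is reused first.
        self._pool = queue.LifoQueue(maxsize=size)
        for _ in range(size):
            # Stand-in for an expensive network connection to a real database.
            self._pool.put(sqlite3.connect(":memory:"))

    def acquire(self) -> sqlite3.Connection:
        return self._pool.get()   # blocks if every connection is checked out

    def release(self, conn: sqlite3.Connection) -> None:
        self._pool.put(conn)      # return the connection for reuse

pool = ConnectionPool(size=2)
c1 = pool.acquire()
pool.release(c1)
c2 = pool.acquire()
print(c1 is c2)  # True: the same connection object was reused, not reopened
```

Real drivers and sidecars (e.g., HikariCP, pgbouncer) add validation, idle eviction, and max-lifetime handling on top of this basic check-out/check-in pattern.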
Consistency model: How quickly and in what order data changes become visible to clients across replicas: strong, eventual, session, or tunable.
Control plane: The managed layer that handles provisioning, configuration, patching, scaling, and monitoring of databases via APIs, UIs, and automation.
Crash-consistent backup: A backup that captures data as if the power were suddenly cut: storage is consistent, but in-flight operations may require normal crash recovery on restore; common with storage-level snapshots.
Cross-region backups: Storing backups in a different region from the primary to survive regional failures or meet regulatory needs.
Customer-managed keys: Using encryption keys that you create and control (often via a KMS), which the DBaaS uses to encrypt your data.
Data masking: Hiding or obfuscating sensitive fields (e.g., masking PANs, hashing emails), especially in non-prod or shared environments.
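The two masking styles mentioned above (partial masking and one-way hashing) can be sketched as follows; the function names and the salt value are illustrative, and real deployments would keep the salt in a secret store.

```python
import hashlib

def mask_pan(pan: str) -> str:
    """Partial masking: show only the last four digits of a card number (PAN)."""
    return "*" * (len(pan) - 4) + pan[-4:]

def pseudonymize_email(email: str, salt: str = "demo-salt") -> str:
    """One-way hash: stable for joins and dedup, but not reversible to the original."""
    return hashlib.sha256((salt + email.lower()).encode()).hexdigest()[:16]

print(mask_pan("4111111111111111"))          # ************1111
print(pseudonymize_email("Alice@example.com"))
```

Hashing keeps referential integrity across masked datasets (the same email always maps to the same token), which partial masking does not.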
Data plane: The actual database instances or storage nodes that hold customer data and process queries.
Database: Software that stores and organizes data so it can be queried, updated, and managed reliably. It underpins things like user accounts, orders, logs, and telemetry.
Database firewall rules: Network-level rules that control which IPs, ranges, or networks can initiate connections to a database.
Database migration: Moving a database from one environment or engine to another (on-prem to cloud, self-managed to DBaaS), often including schema conversion and data sync.
Database migration service: A managed tool that automates data movement and ongoing replication from source to target during a migration.
DBaaS (Database as a Service): A cloud service model that delivers databases on demand, abstracting hardware, installation, and routine admin so teams consume a database endpoint and API rather than managing servers.
Disaster recovery (DR): The set of capabilities to recover databases in another region or environment after a major outage, using replicas, snapshots, and backups.
Distributed database: A database whose data and processing are spread across multiple nodes or locations, usually for scalability, availability, or geo-distribution.
Distributed SQL: Relational databases that keep SQL and strong consistency but scale horizontally across nodes and regions (e.g., Spanner-like or modern distributed SQL engines).
Document database: A database that stores flexible, semi-structured documents (often JSON), letting each record differ in fields.
DR drill: A planned exercise where teams intentionally simulate a disaster (e.g., a region failure) and test the end-to-end DR plan (failover, restore, verification, and communication) to validate RPO/RTO.
DR runbook: A documented, step-by-step procedure specifically for disaster scenarios, detailing how to declare a disaster, fail over, validate data, communicate status, and fail back.
Elastic pool: A shared pool of compute resources for many small databases, letting each burst as needed without being sized individually.
Encryption at rest: Encrypting data stored on disk and in backups so raw storage is unreadable without keys, often enabled by default in managed databases.
Encryption in transit: Encrypting traffic between clients and database endpoints using TLS/SSL to prevent eavesdropping.
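On the client side, encryption in transit usually comes down to handing the driver a strict TLS configuration. A minimal sketch using Python's `ssl` module (real drivers expose this differently, e.g., `sslmode=verify-full` in PostgreSQL connection strings; the settings shown are the generic ones):

```python
import ssl

# The kind of client-side TLS context a database driver would use.
ctx = ssl.create_default_context()            # verifies the server certificate chain
ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse older, weaker protocol versions

# Defaults worth keeping: hostname checking and mandatory certificate verification.
print(ctx.check_hostname, ctx.verify_mode == ssl.CERT_REQUIRED)
```

Disabling `check_hostname` or certificate verification "to make the connection work" silently downgrades you to encryption without authentication, which is vulnerable to man-in-the-middle attacks.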
Execution plan: The step-by-step strategy the database uses to run a query; analyzing it is core to performance tuning.
Failback: The process of returning traffic from the DR or standby environment to the original primary environment once it is healthy, usually after validation and data resync.
Failover: The process of switching database traffic from a failed or degraded primary to a standby instance, replica, or secondary site, often automated by the platform.
Free tier: A small, limited managed database offering with no or low cost, used for learning, development, or trials.
Geo-redundancy: Storing copies of data in multiple geographically separated locations or regions so that a regional disaster doesn’t cause total data loss.
Global table: A table or database automatically replicated across regions so applications can read and write locally in multiple geographies.
Graph database: A database that stores entities and their relationships as nodes and edges, tuned for traversals like recommendations or fraud graphs.
Heterogeneous migration: Migrating to a different engine (e.g., Oracle to PostgreSQL or MySQL), requiring schema and code changes plus deeper testing.
High availability (HA): The ability of a database to remain accessible through hardware or node failures, usually via redundancy and automatic failover.
Homogeneous migration: Migrating to the same database engine in the cloud (e.g., SQL Server to managed SQL Server), usually simpler and lower risk.
Horizontal scaling: Adding or removing nodes (shards, replicas) to handle more load or larger datasets.
HTAP: Hybrid transactional/analytical processing: workloads that mix transactional and analytical queries on the same platform, often using modern distributed databases or columnstore features.
Index: A data structure that accelerates lookups and sorting on certain columns at the cost of extra storage and write overhead.
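The effect of an index shows up directly in the execution plan. A small demonstration using Python's built-in `sqlite3` (table and index names are made up for the example); the plan switches from a full scan to an index search once the index exists:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany("INSERT INTO users (email) VALUES (?)",
                 [(f"user{i}@example.com",) for i in range(1000)])

def plan(sql: str) -> str:
    # The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT id FROM users WHERE email = 'user42@example.com'"
before = plan(query)
conn.execute("CREATE INDEX idx_users_email ON users (email)")
after = plan(query)

print(before)  # a full scan of users
print(after)   # a search using idx_users_email
```

The trade-off in the definition is also visible here: every future INSERT into `users` now has to maintain `idx_users_email` as well.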
Instance pool: A shared host pool for multiple managed instances, typically used to consolidate smaller workloads and reduce cost.
Instance-based pricing: Paying per hour or month for a fixed-size database instance (vCPU, RAM) while it’s running, regardless of actual utilization.
IOPS: A measure of how many read/write operations the storage layer can sustain, particularly important for I/O-bound workloads.
Isolation levels: The rules that define how concurrent transactions see each other’s changes (e.g., read committed, snapshot, serializable).
Key–value database: A database that stores data as key–value pairs, ideal for fast lookups, sessions, and simple aggregates.
Latency: The time between sending a query and receiving a response, including network and execution time.
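Latency is usually reported as percentiles rather than averages, because a few slow outliers matter more to users than the mean. A minimal measurement sketch against a local `sqlite3` table (the table and row counts are arbitrary; a real benchmark would also include network time and warm-up):

```python
import sqlite3
import statistics
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (k INTEGER PRIMARY KEY, v TEXT)")
conn.executemany("INSERT INTO kv VALUES (?, ?)", [(i, "x") for i in range(1000)])

samples = []
for i in range(200):
    start = time.perf_counter()
    conn.execute("SELECT v FROM kv WHERE k = ?", (i,)).fetchone()
    samples.append((time.perf_counter() - start) * 1000)  # per-query latency in ms

# p50 (median) and p95: quantiles(n=20) yields 19 cut points; index 18 is the 95th.
p50 = statistics.median(samples)
p95 = statistics.quantiles(samples, n=20)[18]
print(f"p50={p50:.3f} ms  p95={p95:.3f} ms")
```

The same loop divided by wall-clock time would give throughput (queries per second); the two metrics are related but not interchangeable.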
Locking: Mechanisms that block other operations from reading or writing certain rows or tables until a transaction completes.
Logical server: A control-plane container that groups databases under a common endpoint, configuration, and security context.
Managed database: A database where the provider handles infrastructure, patching, backups, and base availability, while you manage schema, data, and queries. Typical examples are AWS RDS, Azure SQL, and Cloud SQL.
Multi-AZ deployment: A managed configuration where the provider synchronously replicates data across availability zones in a region and fails over automatically on failure.
Multi-model database: A database engine that supports more than one data model (e.g., relational + JSON documents, key–value + graph) under one service.
Multi-region database: A database that replicates data across regions to reduce latency for global users and provide disaster recovery.
Multi-tenant database: A database that serves multiple customers or teams from a shared engine, isolating them logically (schemas, databases, row-level) instead of by instance.
Network isolation: Placing managed databases inside private cloud networks, reachable only from specific subnets or peered networks.
Non-clustered index: An index stored separately from the data that points back to rows, used to speed up alternative access paths.
NoSQL: A family of non-relational databases (key–value, document, wide-column, graph) designed for scale, flexibility, or specific workloads like time-series or recommendations.
Observability: The combination of metrics, logs, and traces that gives insight into database health, performance, and query behavior.
OLAP: Workloads running large, complex, read-heavy queries for BI and reporting, often on a separate analytical engine or warehouse.
OLTP: Workloads with many small, latency-sensitive reads and writes (e.g., checkout flows, banking transactions).
Optimistic concurrency control: A strategy where transactions proceed without locks but are checked for conflicts at commit, retrying when conflicts occur.
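A common application-level form of optimistic concurrency is a version column: the update only applies if the version is unchanged since the read, and a zero row count signals a conflict to retry. A sketch with `sqlite3` (schema and helper name are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER, version INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 100, 0)")

def try_update(new_balance: int, expected_version: int) -> bool:
    # The UPDATE applies only if nobody bumped the version since we read it.
    cur = conn.execute(
        "UPDATE accounts SET balance = ?, version = version + 1 "
        "WHERE id = 1 AND version = ?",
        (new_balance, expected_version))
    return cur.rowcount == 1       # 0 rows touched means a concurrent writer won

balance, version = conn.execute("SELECT balance, version FROM accounts").fetchone()
ok_first = try_update(balance - 30, version)   # succeeds and bumps version to 1
ok_stale = try_update(balance - 50, version)   # fails: version 0 no longer matches
print(ok_first, ok_stale)  # True False
```

On `False` the caller re-reads the row and retries, which is cheap when conflicts are rare; under heavy contention, pessimistic locking often performs better.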
Pilot light: A DR pattern where only the critical core services (such as databases and a minimal control plane) are always running in the DR region, and the rest of the environment is scaled up when needed.
Point-in-time recovery (PITR): Restoring a database to a chosen timestamp within the retention period, often by replaying transaction logs on top of a full backup.
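The "full backup plus log replay" mechanics behind PITR can be shown with a toy model: start from the backup state and apply only the logged changes up to the target timestamp (the data, timestamps, and dict-based "database" are all invented for the illustration).

```python
# Toy point-in-time recovery: restore the full backup, then replay the
# transaction log only up to the requested timestamp.
backup_taken_at = 100
backup = {"alice": 10}           # full backup state captured at t=100
log = [                          # (timestamp, key, new_value)
    (110, "alice", 25),
    (120, "bob", 5),
    (130, "alice", 99),          # the bad write we want to recover from
]

def restore_to(target_time: int) -> dict:
    state = dict(backup)
    for ts, key, value in log:
        if backup_taken_at < ts <= target_time:
            state[key] = value   # replay committed changes in timestamp order
    return state

print(restore_to(125))  # {'alice': 25, 'bob': 5} -- just before the bad write
```

This is why PITR granularity is bounded by log retention: once the logs between the backup and the target timestamp are gone, only the discrete backup points remain restorable.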
Post-migration optimization: The phase after moving to DBaaS where you tune sizing, indexes, and configuration and adopt more managed features (HA, read replicas, serverless tiers).
Primary: The main read-write node or endpoint of a database, where all authoritative updates are applied.
Private endpoint: A private IP endpoint that exposes a managed database inside your VPC/VNet instead of at a public internet address.
Provider-managed keys: Default encryption keys managed entirely by the cloud provider, with minimal customer involvement.
Provisioned IOPS: Paying for a fixed IOPS level to guarantee storage performance, independent of instance size.
Public endpoint: A database endpoint reachable from the internet, typically restricted by firewalls and authentication but more exposed than a private endpoint.
Query optimizer: The engine component that turns a SQL statement into an execution plan, choosing join order, indexes, and algorithms.
Read replica: An asynchronously updated copy used to scale reads, run heavy reports, or serve regional traffic without hitting the primary.
Recovery testing: Regularly performing test restores of backups into a non-production environment to ensure that backups are usable and that recovery procedures actually work.
Relational database: A database that stores data in tables with rows and columns, uses schemas and constraints, and is typically queried with SQL (e.g., PostgreSQL, MySQL, SQL Server).
Replica: A copy of data kept in sync with the primary (synchronously or asynchronously), typically used for read-only traffic, analytics, or DR.
Replication lag: The time difference between data being committed on the primary and becoming visible on a replica; critical for read-after-write expectations.
Request-unit pricing: Pricing based on a unit that represents the cost of reads/writes or operations (e.g., “request units”), decoupling capacity from instance size.
Reserved capacity: Lower pricing in exchange for committing to a certain instance size or spend over one to three years, compared with pure on-demand.
Retention policy: A formal policy specifying how long different classes of backups and logs are kept (daily/weekly/monthly copies), when they are archived, and when they are deleted, balancing cost, DR, and compliance.
Role-based access control (RBAC): Grouping permissions into roles (e.g., db_reader, db_owner) and assigning those roles to users or services.
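The indirection RBAC adds (principal → roles → permissions, never principal → permissions directly) can be sketched as a lookup; the role names echo the examples above, while the users and permission sets are invented for the illustration.

```python
# Permissions attach to roles; principals (users, services) are granted roles.
ROLE_PERMISSIONS = {
    "db_reader": {"SELECT"},
    "db_writer": {"SELECT", "INSERT", "UPDATE"},
    "db_owner":  {"SELECT", "INSERT", "UPDATE", "DELETE", "GRANT"},
}
USER_ROLES = {
    "reporting_app": {"db_reader"},
    "api_service":   {"db_writer"},
}

def is_allowed(user: str, action: str) -> bool:
    # Allowed if any of the user's roles carries the requested permission.
    return any(action in ROLE_PERMISSIONS[role]
               for role in USER_ROLES.get(user, ()))

print(is_allowed("reporting_app", "SELECT"))  # True
print(is_allowed("reporting_app", "DELETE"))  # False
```

The payoff is administrative: revoking write access from every service means editing one role, not auditing every individual grant.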
Row-level security: Policies that restrict which rows a given user or role can see or modify, even within the same table.
RPO (Recovery Point Objective): The maximum acceptable amount of data loss, measured in time. It answers: “How much data (in minutes or hours) can we afford to lose if we have to restore?” and is driven by backup and replication strategy.
RTO (Recovery Time Objective): The maximum acceptable time to restore service after an outage. It answers: “How long can this database be down before it hurts the business?” and guides DR architecture and automation.
Runbook: A documented, step-by-step guide for routine tasks such as failover, restore, scale-up, or upgrade.
Schema migration: Applying controlled changes to a database schema using tools or migration scripts, typically versioned alongside application code.
Self-managed database: A database you install and operate yourself on VMs or bare metal (on-prem or cloud), fully owning the OS, patches, HA, backups, and performance tuning.
Serverless pricing: Paying for compute used over time (e.g., per second or per request unit) instead of fixed instances, with automatic scale-to-zero or pause on idle.
Sharding: Splitting data across multiple shards, each responsible for a subset of keys or tenants, to scale capacity and throughput horizontally.
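The simplest shard-routing scheme is hash-based: hash the key and take it modulo the shard count, so the same key always lands on the same shard. A sketch (the key format and shard count are arbitrary):

```python
import hashlib

NUM_SHARDS = 4

def shard_for(key: str) -> int:
    # A stable hash (not Python's randomized hash()) keeps routing consistent
    # across processes and restarts.
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# Every lookup for the same key routes to the same shard.
print(shard_for("customer:42") == shard_for("customer:42"))  # True
# Different keys spread across the shard range.
print(sorted({shard_for(f"customer:{i}") for i in range(100)}))
```

The known weakness of plain modulo routing is resharding: changing `NUM_SHARDS` remaps almost every key, which is why production systems often use consistent hashing or a range/directory mapping instead.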
Shared-nothing architecture: A design where each node owns its own compute and storage, coordinating via the network; common in horizontally scalable and sharded databases.
Single-tenant instance: A database instance dedicated to one customer or workload, used for stronger isolation or predictable performance.
SLA: The provider’s formal commitment on availability and sometimes durability (e.g., “99.99% available”), which architects use when designing HA and DR.
Slow query log: Features that record slow or heavy queries with statistics, helping teams find bottlenecks and tune performance.
Snapshot: A point-in-time capture of the database or its storage volume, used for fast restores or cloning.
Storage pricing: Charges based on GB-month for database storage, plus GB-month or tiered pricing for backups and long-term snapshots.
Synchronous replication: Replication that only acknowledges a write after it is safely stored on both the primary and synchronous replicas, minimizing data loss at the cost of extra latency.
Throughput: How many operations a database can handle over time, such as queries per second or transactions per second.
Time-series database: A database specialized for timestamped data such as metrics, IoT events, or telemetry, optimized for windowed queries over time.
Transaction: A group of one or more operations treated as a single unit that either fully commits or fully rolls back.
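The all-or-nothing behavior of a transaction is easy to demonstrate with `sqlite3`, whose connection object commits on a clean exit from a `with` block and rolls back on an exception (the transfer scenario and account data are invented for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()

try:
    with conn:  # opens a transaction; commits on success, rolls back on exception
        conn.execute("UPDATE accounts SET balance = balance - 50 WHERE name = 'alice'")
        raise RuntimeError("crash between the two legs of the transfer")
        # never reached: the credit to bob would have completed the transfer
        conn.execute("UPDATE accounts SET balance = balance + 50 WHERE name = 'bob'")
except RuntimeError:
    pass

# Atomicity: the half-finished debit was rolled back, so no money vanished.
print(conn.execute("SELECT balance FROM accounts WHERE name = 'alice'").fetchone())  # (100,)
```

Without the transaction, the simulated crash would have left alice debited and bob never credited, exactly the inconsistency the Atomicity guarantee rules out.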
vCore pricing: Pricing based on virtual cores and memory, sometimes separate from storage, common in modern managed SQL offerings.
Vertical scaling: Increasing or decreasing CPU, RAM, or IOPS on a single instance (e.g., moving from a medium to a large class).
Warm standby: A DR setup where a smaller or partially scaled-down copy of the production environment is running and kept in sync, allowing faster recovery than a cold backup at lower cost than full active–active.
Wide-column database: A database optimized for huge, sparse tables with flexible columns, often used for time-series, logs, and big data pipelines.