Latency Glossary
Delay between sending an API request and receiving a response.
Delay caused by application logic and processing paths.
Latency where processing happens in the background.
Delay during identity verification.
Delay while validating permissions.
Mean response time across all requests.
Delay acceptable in non-interactive workloads.
Operation that waits for completion, increasing perceived latency.
Time required for systems or VMs to become operational.
Time required to retrieve data from CPU caches (L1/L2/L3).
Latency measured at the client boundary, including DNS, connection setup, network, and server processing, as observed by the end user or calling service.
Delay introduced when services or functions initialize after being idle.
Delay caused by CPU execution, scheduling, or contention.
Reducing latency by reusing existing network connections.
Delay incurred while distributed systems reach agreement.
Time taken by the CPU to switch between tasks.
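Measurement error in which a load generator waits for a slow response before sending its next request, so the delays that would have hit the skipped requests go unrecorded and true tail latency is hidden.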
Longest dependency chain determining total response time.
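As a minimal sketch of the longest-dependency-chain idea, the snippet below computes the earliest finish time of a request and the chain of stages that determines it. The stage names, durations, and dependencies are hypothetical, and the model assumes that stages with no dependency on each other run fully in parallel.

```python
from functools import lru_cache

# Hypothetical per-stage latencies in milliseconds (illustrative values only).
DURATION_MS = {"auth": 5, "cache": 2, "db": 30, "render": 10}

# Hypothetical dependencies: a stage starts once every stage listed here has finished.
DEPENDS_ON = {"auth": [], "cache": ["auth"], "db": ["auth"], "render": ["cache", "db"]}


@lru_cache(maxsize=None)
def finish_time(stage):
    """Earliest finish time of a stage, assuming independent stages run in parallel."""
    start = max((finish_time(dep) for dep in DEPENDS_ON[stage]), default=0)
    return start + DURATION_MS[stage]


def critical_path(stage):
    """Walk backwards from the final stage, following the slowest dependency each time."""
    path = [stage]
    while DEPENDS_ON[stage]:
        stage = max(DEPENDS_ON[stage], key=finish_time)
        path.append(stage)
    return path[::-1]


print(finish_time("render"))    # 45
print(critical_path("render"))  # ['auth', 'db', 'render']
```

In this example, speeding up the cache stage would not change the total, because it is not on the chain that determines the response time.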
Delay between cloud regions.
Time taken for database queries or transactions to complete.
Delay introduced by storage hardware such as HDDs or SSDs.
Latency between services inside a data center or cluster.
Reduced latency achieved by serving requests closer to users.
Additional latency introduced by encryption and decryption.
Total delay from request initiation to final response, including all network and processing stages.
Delay caused by forcing data to be persisted to storage.
Latency caused by geographic distance between regions.
Delay where one slow request blocks others behind it.
Reduced latency achieved by sending multiple requests concurrently over a single connection.
Delay caused by VM scheduling on physical CPUs.
Delay associated with input/output operations.
Delay between a hardware interrupt and its handling by the CPU.
Variation in packet latency affecting real-time workloads.
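One simple way to put a number on this variation is the mean absolute difference between consecutive latency samples. This is only one of several possible jitter measures, and the function name and sample values below are illustrative assumptions.

```python
def mean_jitter_ms(latencies_ms):
    """Mean absolute difference between consecutive latency samples, in milliseconds."""
    if len(latencies_ms) < 2:
        return 0.0
    deltas = [abs(b - a) for a, b in zip(latencies_ms, latencies_ms[1:])]
    return sum(deltas) / len(deltas)


# Hypothetical per-packet latencies in milliseconds.
samples_ms = [20, 22, 19, 25, 21]
print(mean_jitter_ms(samples_ms))  # 3.75
```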
Time taken for a request to travel from source to destination and for the response to travel back.
Small delays causing disproportionately large end-to-end latency.
Component that dominates end-to-end delay.
Maximum acceptable latency allocated across system components.
Spread of latency values across requests.
Visual representation of latency distribution over time.
Masking latency using parallelism or prefetching.
Continuous tracking of response times.
Techniques used to reduce response time.
Latency value below which a given percentage of requests complete (e.g., p50, p90, p95, p99), used to characterize typical and tail behavior beyond averages.
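To illustrate why percentiles are reported alongside averages, the sketch below uses the nearest-rank method, which is one of several common percentile definitions. The function name and the sample data are assumptions made for this example.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical response times in milliseconds, with one slow outlier.
latencies_ms = [12, 15, 11, 14, 13, 16, 12, 250, 13, 14]

print(percentile(latencies_ms, 50))            # 13
print(percentile(latencies_ms, 90))            # 16
print(percentile(latencies_ms, 99))            # 250
print(sum(latencies_ms) / len(latencies_ms))   # 37.0
```

Here the mean of 37 ms describes none of the requests well: the typical request takes about 13 ms, while the slowest takes 250 ms.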
Measuring latency contributions of individual components.
Performance degradation introduced by changes.
Contractual guarantee for maximum response time.
Target latency threshold defined for reliability.
Sudden increase in response time due to contention or failures.
Balancing latency against cost, consistency, or throughput.
Fluctuation in latency over time.
Application where small delays significantly impact performance.
Application that can handle higher response times.
Time required to select a new leader after failure.
Delay added by traffic routing and health checks.
Middle value of observed latency measurements.
Time required to access data from main memory (RAM).
Intermediate device or service that adds latency to a request path.
Delay introduced while data travels across a network.
Operation that allows processing to continue without waiting.
Latency between external users and internal services.
Additional delay when accessing memory attached to another NUMA node.
Latency incurred while waiting in NVMe submission and completion queues.
Time taken for data to travel in one direction only.
Delay caused when data must be fetched from disk into memory.
Cost or output achieved per millisecond of latency.
Overlapping execution stages to reduce total latency.
Loading data in advance to reduce perceived latency.
Time spent executing logic at any system component.
Latency caused by the physical distance data must travel.
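A rough lower bound for this delay can be estimated from distance alone. The sketch below assumes a signal speed of about 200,000 km/s, roughly two-thirds of the speed of light, which is a commonly used approximation for optical fiber. The distance is a hypothetical example, and real routes are longer than the straight-line distance.

```python
# Assumed signal speed in optical fiber: roughly two-thirds of the speed of light.
SPEED_IN_FIBER_KM_PER_S = 200_000


def propagation_delay_ms(distance_km):
    """One-way delay in milliseconds from distance alone, ignoring routing and queuing."""
    return distance_km * 1000 / SPEED_IN_FIBER_KM_PER_S


# Hypothetical 6,000 km path between two regions.
one_way_ms = propagation_delay_ms(6000)
print(one_way_ms)      # 30.0
print(2 * one_way_ms)  # 60.0 (round trip)
```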
Time required to allocate infrastructure resources.
Delay added by intermediate proxy layers.
Time a request waits in a queue before being processed.
Lower connection setup latency enabled by QUIC over UDP.
Delay before newly written data can be read consistently.
Application requiring consistently low latency.
Latency observed under production workloads.
Delay between primary and replica data synchronization.
User-perceived time to receive a response, including queuing and processing delays.
Delay incurred while scaling systems up or down.
Delay introduced by OS task scheduling decisions.
Latency measured at the server boundary (from request arrival to response send), excluding network transit, often used to isolate application and storage performance.
Response time of a backend service.
Time a system actively spends processing a request, excluding waiting time.
Reduced network latency achieved through direct device access.
Time taken to complete a read or write operation on storage.
Delay introduced when memory pages are swapped to disk.
Latency experienced when callers wait for completion.
Artificial test used to measure baseline latency.
Latency experienced by the slowest requests.
Delay incurred during TCP connection establishment.
Initial phase of a TCP connection in which the sender starts with a small congestion window and grows it each round trip, adding latency to short transfers.
Elapsed time from a client sending a request until the first response byte is received, capturing connection setup, server think time, and initial network latency.
Elapsed time from a client sending a request until the entire response body is received, representing full end-to-end response latency for that request.
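The two measurements above, time to the first response byte and time to the full response body, can be sketched against a local test server using only the Python standard library. The handler, its 50 ms artificial delay, and the `measure` helper are hypothetical. Time to first byte is approximated here as the moment `getresponse()` returns, which is once the status line and headers have been read.

```python
import http.client
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer


class SlowBodyHandler(BaseHTTPRequestHandler):
    """Hypothetical test server: sends headers immediately, then a delayed body."""

    def do_GET(self):
        body = b"x" * 1024
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.flush()
        time.sleep(0.05)  # simulate slow body generation
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example output quiet


def measure(host, port, path="/"):
    """Return (ttfb, ttlb) in seconds, both including connection setup."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    start = time.perf_counter()
    conn.request("GET", path)
    response = conn.getresponse()  # returns once the status line and headers arrive
    ttfb = time.perf_counter() - start
    response.read()                # blocks until the whole body is received
    ttlb = time.perf_counter() - start
    conn.close()
    return ttfb, ttlb


server = HTTPServer(("127.0.0.1", 0), SlowBodyHandler)  # port 0 picks a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
ttfb, ttlb = measure("127.0.0.1", server.server_address[1])
server.shutdown()
server.server_close()

print(f"first byte: {ttfb * 1000:.1f} ms, last byte: {ttlb * 1000:.1f} ms")
```

Because the body is delayed, the time to the last byte is noticeably larger than the time to the first byte, which is the gap these two measurements are meant to expose.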
Maximum waiting time before a request is considered failed.
Time taken to establish an encrypted connection.
Time required to push data onto the network link.
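This delay depends only on the amount of data and the link rate, not on distance. A minimal sketch, with a hypothetical function name and example values:

```python
def transmission_delay_ms(size_bytes, link_bits_per_s):
    """Time in milliseconds to place size_bytes onto a link of the given bit rate."""
    return size_bytes * 8 * 1000 / link_bits_per_s


# Hypothetical example: a 1,500-byte packet on a 100 Mbit/s link.
print(transmission_delay_ms(1500, 100_000_000))  # 0.12
```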
Latency introduced by hypervisors or virtual machines.
Latency impact when a VM waits for CPU time due to contention.
Lower latency when execution environments are already initialized.