Performance Bottleneck Glossary

A

Admission Control

Limiting incoming traffic to avoid overload.
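
One simple form of admission control caps the number of requests in flight and rejects the rest immediately instead of queueing them. A minimal Python sketch, assuming a single process and a fixed limit; `ConcurrencyLimiter` and `OverloadedError` are hypothetical names used only for illustration.

```python
import threading


class OverloadedError(Exception):
    """Raised when admission control rejects a request (hypothetical name)."""


class ConcurrencyLimiter:
    """Admit at most `max_in_flight` concurrent calls and reject the rest."""

    def __init__(self, max_in_flight: int):
        self._slots = threading.BoundedSemaphore(max_in_flight)

    def try_run(self, handler, *args):
        # Non-blocking acquire: if every slot is taken, reject right away
        # instead of queueing behind work that is already in progress.
        if not self._slots.acquire(blocking=False):
            raise OverloadedError("too many requests in flight")
        try:
            return handler(*args)
        finally:
            self._slots.release()  # free the slot even if the handler fails
```

A rejected caller would typically answer with an error such as HTTP 503 or retry later with backoff; the limit itself has to be tuned to the capacity of the protected resource.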

Apdex Score

A standardized index (0–1) that converts a response-time distribution into a single user-satisfaction score. Given a target time T, each request is “satisfied” (≤ T), “tolerating” (between T and 4T) or “frustrated” (> 4T).
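
The score counts satisfied samples (t ≤ T) fully, tolerating samples (T < t ≤ 4T) as half and frustrated samples (t > 4T) as zero, then divides by the total. A minimal Python sketch, assuming response times and T share a unit (seconds here); `apdex` is an illustrative name.

```python
def apdex(response_times: list[float], target: float) -> float:
    """Return the Apdex score (0-1) for the samples and target time T."""
    if not response_times:
        raise ValueError("need at least one sample")
    satisfied = sum(1 for t in response_times if t <= target)
    tolerating = sum(1 for t in response_times if target < t <= 4 * target)
    # Frustrated samples (t > 4T) add nothing to the numerator.
    return (satisfied + tolerating / 2) / len(response_times)


# Hypothetical samples with T = 0.5 s: three satisfied, two tolerating, one frustrated.
samples = [0.1, 0.2, 0.4, 0.9, 1.8, 3.0]
print(apdex(samples, target=0.5))  # (3 + 2/2) / 6, about 0.667
```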

API Rate Limiting

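Throttling imposed by an external API provider, which caps the number of requests allowed per time window and typically rejects excess calls (for example with HTTP 429 Too Many Requests).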
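
Clients often stay under such a limit by pacing themselves with a token bucket: tokens refill at a steady rate up to a burst capacity, and each call spends one. A minimal sketch, assuming the limit is expressed as a sustained rate plus a burst; `TokenBucket` and its parameters are illustrative, not any provider's API, and the injectable clock exists only to make the example deterministic.

```python
import time


class TokenBucket:
    """Client-side token bucket: refills at `rate` tokens/s up to `capacity`."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so an initial burst is allowed
        self.clock = clock
        self.last = clock()

    def try_acquire(self, cost: float = 1.0) -> bool:
        """Spend `cost` tokens if available; False means wait before calling."""
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


fake_now = [0.0]  # hypothetical fake clock for a reproducible example
bucket = TokenBucket(rate=2.0, capacity=3.0, clock=lambda: fake_now[0])
print([bucket.try_acquire() for _ in range(4)])  # [True, True, True, False]
fake_now[0] = 0.5  # half a second at 2 tokens/s refills one token
print(bucket.try_acquire())  # True
```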

Application Bottleneck

Inefficient application logic limiting performance.

Autoscaling Lag

Delay between load increase and resource availability.

B

Backpressure

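Signaling upstream producers to slow down when a downstream component cannot keep up, rather than letting work accumulate in unbounded queues.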
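
The simplest backpressure mechanism is a bounded buffer: once it is full, the producer is told to wait or back off. A minimal sketch using Python's standard `queue.Queue`; the `offer` helper is a hypothetical name, and a blocking `put()` would be the alternative when the producer should simply wait.

```python
import queue


def offer(buf: queue.Queue, item) -> bool:
    """Try to enqueue without blocking; False tells the producer to back off."""
    try:
        buf.put_nowait(item)
        return True
    except queue.Full:
        return False


buf = queue.Queue(maxsize=2)  # the bound is what creates backpressure
print([offer(buf, i) for i in range(3)])  # [True, True, False]
buf.get()                                 # the consumer drains one item...
print(offer(buf, 3))                      # ...so the producer may proceed: True
```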

Bandwidth Saturation

Network links operating at maximum throughput.

Baseline Performance

Expected normal system behavior used for comparison during troubleshooting.

Batch Size Limitation

Throughput capped because batches cannot grow larger without exceeding memory limits or latency targets.

Blocking I/O

Threads waiting on slow I/O operations.

Burst Credit Exhaustion

Depletion of the burst credits that some cloud instance and storage types accumulate, after which performance falls back to a lower baseline.

C

Cache Hit Ratio

The proportion of requests served from cache vs backing store; low hit ratio often correlates with higher latency and backend saturation.
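
Hit ratio is hits / (hits + misses). Python's `functools.lru_cache` exposes both counters through `cache_info()`, which makes a convenient illustration; the `lookup` function is a hypothetical stand-in for a slow backing-store read, and the tiny cache size is chosen to force an eviction.

```python
from functools import lru_cache


@lru_cache(maxsize=2)
def lookup(key: str) -> str:
    # Hypothetical stand-in for a slow read from the backing store.
    return key.upper()


for key in ["a", "b", "a", "c", "a", "b"]:
    lookup(key)  # "c" evicts "b" (least recently used), so the last "b" misses

info = lookup.cache_info()
hit_ratio = info.hits / (info.hits + info.misses)
print(info.hits, info.misses, round(hit_ratio, 2))  # 2 4 0.33
```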

Cache Miss

Required data not found in cache, forcing slower access.

Cascading Failure

Failure spreading across dependent services.

Chatty Interface

Excessive small requests between services.

Checkpointing Overhead

Performance impact of frequent state persistence.

Circuit Breaker Trigger

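An automated cutoff: after repeated failures the breaker opens, so calls to the failing dependency fail fast instead of piling up and propagating overload.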
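
A common breaker design has three states: closed (calls pass through), open (calls fail fast) and half-open (a trial call decides whether to close again). A minimal single-threaded sketch, assuming a consecutive-failure threshold and a fixed cool-down; the class and parameter names are hypothetical, and a production breaker would also need locking and would often count failures over a time window.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is not attempted."""


class CircuitBreaker:
    def __init__(self, threshold: int = 3, reset_after: float = 30.0,
                 clock=time.monotonic):
        self.threshold = threshold      # consecutive failures before opening
        self.reset_after = reset_after  # seconds to stay open before a trial call
        self.clock = clock
        self.failures = 0
        self.opened_at = None           # None means the breaker is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("failing fast")
            # Cool-down elapsed: half-open, so let this call through as a probe.
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed probe, or too many failures while closed, (re)opens it.
            if self.opened_at is not None or self.failures >= self.threshold:
                self.opened_at = self.clock()
            raise
        # Any success closes the breaker and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```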

Cloud Quota Limit

Provider-enforced caps restricting scaling.

Cold Cache

Cache that has not yet been populated.

Cold Resource Provisioning

Delay in spinning up new cloud resources.

Cold Start

Delay when an application instance starts from zero state.

Concurrency Level

The number of in-flight requests, sessions or operations being processed at a given time; tightly linked to throughput and latency via Little’s Law.

Connection Churn

Excessive connection creation and teardown.

Connection Pool Exhaustion

No available connections for new requests.

Connection Pool Saturation

All database or service connections in use.

Control Plane Bottleneck

Kubernetes API server limiting cluster operations.

Control Plane Rate Limits

Cloud or Kubernetes control plane limiting operations.

Coordinated Omission

Measurement bias that occurs when load generators stop sending new requests while waiting on slow ones, underreporting true latency.
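
A small simulation makes the bias visible. It assumes a load generator that intends to send one request every 10 ms through a single client that waits for each reply, and a hypothetical service that answers in 1 ms except for one 1-second stall. Timing from the actual send records a single slow sample; timing from the intended send (the correction used by tools that account for coordinated omission) also charges the requests that queued behind the stall. All numbers are made up for illustration.

```python
def simulate(service_ms, interval_ms):
    """Replay a fixed-rate schedule through one blocking client.

    Returns (naive, corrected) latency lists in milliseconds.
    """
    naive, corrected = [], []
    free_at = 0.0  # when the client has its previous reply and can send again
    for i, service in enumerate(service_ms):
        intended = i * interval_ms         # send time according to the schedule
        actual = max(intended, free_at)    # the client cannot send while blocked
        done = actual + service
        naive.append(done - actual)        # timer starts at the actual send
        corrected.append(done - intended)  # timer starts at the intended send
        free_at = done
    return naive, corrected


# Hypothetical service: 1 ms per request, except one 1-second stall.
service = [1.0] * 200
service[10] = 1000.0
naive, corrected = simulate(service, interval_ms=10.0)
print(sum(x > 100 for x in naive), sum(x > 100 for x in corrected))  # 1 100
```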

Cost Bottleneck

Performance constrained intentionally to control spend.

CPU Bottleneck

CPU capacity limiting overall system performance.

CPU Steal Time

Time a virtual CPU was ready to run but had to wait because the hypervisor was serving other virtual machines on the same host.

CPU Throttling

Reduction in available CPU time or clock speed imposed by scheduler quotas, power caps or thermal limits.

CPU Throttling (CFS)

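Linux CFS bandwidth control pausing a container's threads once they have used the CPU quota for the current period (100 ms by default); this is how Kubernetes enforces CPU limits, so bursty workloads can be throttled even when average utilization is below the limit.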

Critical Path

The longest dependency chain that determines minimum execution time.
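
For a set of tasks with durations and dependencies, the critical path is the longest path through the dependency DAG, found by processing tasks in topological order. A minimal sketch using Python's standard `graphlib.TopologicalSorter`; the task names and durations describe a hypothetical request handler.

```python
from graphlib import TopologicalSorter


def critical_path(durations, deps):
    """Return (total_time, tasks) for the longest chain in a dependency DAG.

    durations: task -> duration; deps: task -> set of tasks it waits for.
    Every task named in deps must also have a duration.
    """
    graph = {task: deps.get(task, set()) for task in durations}
    finish, prev = {}, {}
    for task in TopologicalSorter(graph).static_order():
        # A task starts when its slowest dependency finishes.
        slowest = max(graph[task], key=finish.__getitem__, default=None)
        start = finish[slowest] if slowest is not None else 0.0
        finish[task] = start + durations[task]
        prev[task] = slowest
    # Walk back from the task that finishes last.
    task = max(finish, key=finish.__getitem__)
    total, path = finish[task], []
    while task is not None:
        path.append(task)
        task = prev[task]
    return total, path[::-1]


# Hypothetical request: auth, then user and orders lookups in parallel, then render.
durations = {"auth": 20, "user": 30, "orders": 50, "render": 10}
deps = {"user": {"auth"}, "orders": {"auth"}, "render": {"user", "orders"}}
print(critical_path(durations, deps))  # (80.0, ['auth', 'orders', 'render'])
```

Speeding up `user` here would not shorten the request at all; only work on the critical path reduces the total.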

D

Data Loading Bottleneck

Accelerators such as GPUs sitting idle while they wait for input data to be loaded and preprocessed.

Database Bottleneck

Database limiting application throughput or latency.

Disk Latency

Delay in completing storage operations.

Distributed Lock Bottleneck

Centralized locking limiting scalability.

Distributed Tracing

Tracking request flow across services to identify slow spans.

E

East-West Traffic Bottleneck

Congestion between internal services.

Error Rate

The percentage of requests that fail (typically 4xx/5xx or application-level failures), used to detect performance-related faults and SLO violations.

Exponential Backoff with Jitter

Retry strategy that progressively increases wait times and randomizes delays to reduce retry storms and synchronized thundering herds.
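
One widely used variant, often called “full jitter”, draws each delay uniformly between zero and an exponentially growing, capped bound. A minimal sketch; `retry_with_backoff` and its defaults are illustrative assumptions, and `sleep`/`rng` are injectable only so the behavior can be checked deterministically.

```python
import random
import time


def retry_with_backoff(fn, max_attempts=5, base=0.1, cap=5.0,
                       sleep=time.sleep, rng=random.random):
    """Call fn(), retrying failures with capped exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:  # in practice, retry only errors known to be transient
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # The bound doubles each attempt (base * 2**attempt), capped at `cap`;
            # the actual delay is drawn uniformly from [0, bound).
            sleep(rng() * min(cap, base * 2 ** attempt))
```

Randomizing the whole delay, rather than adding a little noise to a fixed schedule, spreads retries from many clients across the window so they do not arrive in synchronized waves.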

F

Fan-In Bottleneck

A scalability limit caused when many upstream services or clients depend on a single downstream service or resource.

Fan-Out Bottleneck

Latency amplified when a request depends on many downstream calls, because it must wait for the slowest of them.
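
The amplification follows from simple probability: if each downstream call is slow (beyond its own p99) 1% of the time, a request that waits on all N calls is slow with probability 1 − 0.99^N. A minimal sketch, assuming the calls are independent and the request waits for every one of them; `p_request_slow` is an illustrative name.

```python
def p_request_slow(n_calls: int, p_call_slow: float = 0.01) -> float:
    """Chance that at least one of n independent parallel calls is slow."""
    return 1 - (1 - p_call_slow) ** n_calls


for n in (1, 10, 100):
    print(n, round(p_request_slow(n), 3))  # 1 0.01, 10 0.096, 100 0.634
```

With 100 calls per request, roughly two out of three requests hit at least one downstream p99 outlier.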

Flame Graph

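A visualization of sampled stack traces in which each frame's width is proportional to the time spent in that function and its callees, making execution hotspots easy to spot.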

Full Table Scan

A query that reads every row of a table, typically because no suitable index exists.

G

Garbage Collection (GC) Pause

Application stall caused by memory cleanup.

GC Pressure

High object allocation rate increasing GC frequency.

Golden Signals

The four monitoring signals popularized by Google's Site Reliability Engineering book: latency, traffic, errors and saturation.

GPU Bottleneck

GPU compute or memory limiting ML workloads.

GPU Underutilization

Idle GPU cycles due to inefficient pipelines.

H

Headroom

Available spare capacity before a system hits a bottleneck.

Heap Fragmentation

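Free heap memory split into many small, non-contiguous blocks, so large allocations fail or trigger collections even though total free space would be sufficient.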

Heap Promotion Failure

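A young-generation collection that cannot move surviving objects into the old generation because it lacks space (often due to premature promotion or fragmentation), forcing a long full collection.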

High Context Switching

Excessive task switching that wastes CPU cycles.

Hot Disk

Disk receiving disproportionate I/O traffic.

Hot Key

A single key or small set of keys that receive disproportionate traffic, creating localized hotspots in caches, databases or partitions.

Hot Partition

A single partition or shard receiving a disproportionate share of reads or writes because of skewed keys or access patterns.

I

I/O Bottleneck

Performance limited by disk read/write operations.

Image Pull Latency

Delay caused by slow container image downloads.

Interrupt Storm

Excessive hardware or network interrupts consuming CPU.

IOPS Saturation

Storage hitting maximum operations per second.

J

Jank

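Visible UI stutter when frames miss their rendering deadline (about 16.7 ms at 60 Hz), often because of long main-thread work, slow rendering or blocking on backend calls.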

K

Kernel Time Saturation

CPU time dominated by kernel operations instead of application work.

L

Latency

The time taken to complete a single operation or request.

Latency Budget

The maximum allowable end-to-end latency for a request, subdivided across services on a critical path for performance budgeting.

Latency Percentiles (p50/p95/p99)

Metrics describing how long a given percentage of requests take to complete, used to understand median vs tail latency behavior.
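
A minimal sketch of the nearest-rank method, one of several percentile definitions; interpolating methods (such as linear interpolation) give different values on small samples. The latency values are hypothetical.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples <= it."""
    if not samples or not 0 < p <= 100:
        raise ValueError("need at least one sample and 0 < p <= 100")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical latencies in ms: mostly fast, with two slow outliers.
latencies = [12, 15, 11, 14, 13, 250, 16, 12, 14, 900]
for p in (50, 95, 99):
    print(f"p{p} = {percentile(latencies, p)} ms")  # p50 = 14, p95 = 900, p99 = 900
```

The median hides the outliers entirely; with only ten samples, p95 and p99 both land on the single worst value, which is why tail percentiles need many samples to be meaningful.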

Little’s Law

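The relationship L = λW: the average number of requests in a system (concurrency) equals the arrival rate (throughput) multiplied by the average time each request spends in the system (latency).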
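
Two common uses of the law, assuming a stable system and long-run averages: estimating how many requests are in flight, and rearranging it to find the throughput ceiling that a fixed concurrency limit (such as a connection pool) imposes. The numbers and function names are hypothetical.

```python
def in_flight(throughput_per_s: float, latency_s: float) -> float:
    """Little's Law, L = lambda * W: average number of requests in the system."""
    return throughput_per_s * latency_s


def max_throughput(concurrency_limit: float, latency_s: float) -> float:
    """Rearranged, lambda = L / W: the ceiling a fixed concurrency limit imposes."""
    return concurrency_limit / latency_s


# Hypothetical numbers: 200 req/s at 50 ms average latency.
print(in_flight(200, 0.050))        # about 10 requests in flight
# A pool of 20 connections, each held 50 ms per request, caps throughput near 400 req/s.
print(max_throughput(20, 0.050))
```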

Load Shedding

Intentionally dropping requests to protect stability.

Lock Contention

Multiple transactions competing for database locks.

Lock Contention (OS-level)

Threads blocked waiting for mutexes or spinlocks.

M

Manual Scaling Bottleneck

Delays caused by human-driven scaling.

Memory Bottleneck

Insufficient RAM limiting workload execution.

Memory Leak

Gradual memory consumption due to unreleased objects.

Memory Thrashing

Constant swapping between RAM and disk due to pressure.

Metadata Bottleneck

File system or object store metadata becoming the limiting factor.

Microburst Traffic

Very short, intense spikes in traffic or I/O that briefly exceed capacity and cause packet loss, queue buildup or jitter.

Model Inference Latency

Delay during real-time prediction workloads.

Multi-AZ Latency

Cross-zone communication overhead.

N

N+1 Query Problem

Issuing one query to fetch N parent rows and then one more query per row (N + 1 in total), instead of a single join or batched query.
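
The pattern is easy to reproduce with Python's built-in `sqlite3` module. This sketch uses a hypothetical authors/books schema and counts statements through a small `run` helper (also hypothetical) to show N + 1 round trips versus one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
    INSERT INTO books (author_id, title)
        VALUES (1, 'Notes'), (2, 'COBOL'), (3, 'Kernel');
""")
issued = []  # every SQL statement sent, to count round trips


def run(sql, params=()):
    issued.append(sql)
    return conn.execute(sql, params).fetchall()


def titles_n_plus_one():
    # 1 query for the parents, then 1 more per parent row: N + 1 in total.
    return {
        name: [title for (title,) in run(
            "SELECT title FROM books WHERE author_id = ?", (author_id,))]
        for author_id, name in run("SELECT id, name FROM authors")
    }


def titles_joined():
    # One JOIN returns the same data in a single round trip.
    result = {}
    for name, title in run(
            "SELECT a.name, b.title FROM authors a JOIN books b ON b.author_id = a.id"):
        result.setdefault(name, []).append(title)
    return result


issued.clear()
a = titles_n_plus_one()
print(len(issued))  # 4
issued.clear()
b = titles_joined()
print(len(issued))  # 1
print(a == b)       # True
```

The inner JOIN drops authors without books; a LEFT JOIN would keep them. ORMs frequently produce the N + 1 pattern through lazily loaded relationships, and most offer an eager-loading option that batches the child query.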

NAT Exhaustion

A NAT gateway running out of source ports (or translation entries), so new outbound connections fail or stall.

Network Bottleneck

Performance constrained by bandwidth, latency, or packet loss.

Node Pressure

Resource exhaustion (memory, disk or process IDs) on a Kubernetes node, which can trigger pod eviction.

Noisy Neighbor (Containers)

One container consuming disproportionate resources.

North-South Traffic Bottleneck

Congestion between users and backend systems.

O

Out-of-Memory (OOM) Kill

Termination of a process or container by the OS or runtime due to memory exhaustion, often triggered by leaks or undersized limits.

Over-Provisioning

Excess resources causing inefficiency without gains.

Over-Synchronization

Excessive locking reducing parallelism.

P

Packet Loss

Dropped packets causing retransmissions and delays.

Page Fault Storm

Frequent page faults causing CPU stalls.

PCIe Bottleneck

Limited PCIe bandwidth slowing data transfers between host (CPU) memory and the GPU.

Perceived Latency

The latency users actually experience, which can differ from server-side metrics because of network, rendering and client-side delays.

Performance Bottleneck

The component or constraint that limits overall system performance, regardless of optimization elsewhere.

Performance Profiling

Analyzing system behavior to locate bottlenecks.

Pod Resource Limits

CPU or memory caps restricting container performance.

Pod Scheduling Delay

Pods waiting due to insufficient cluster resources.

Priority Inversion

A condition where low-priority work holds a shared resource needed by high-priority work, degrading performance and responsiveness.

Process Bottleneck

Organizational delays impacting system performance.

Provisioning Bottleneck

Slow infrastructure creation blocking scale-out.

Q

Queue Depth

Number of requests waiting to be processed.

Queueing Delay

Time a request or job spends waiting in a queue (thread pool, message queue, DB connection pool) before being executed.

R

Replication Lag

Delay between a write committing on the primary and becoming visible on a replica.

Resource Saturation

A state where a resource is fully utilized and cannot serve additional load.

Response Time

End-to-end time between a request being sent and a response being received.

Retry Amplification

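Retries at several layers multiplying load: if each of three layers makes up to three attempts, one failing request can turn into 27 calls to the bottom dependency.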

Retry Storm

Excessive retries worsening load during failures.

Retry Storm Anti-Pattern

Aggressive retries worsening outages.

Run Queue Saturation

Too many runnable threads waiting for CPU time.

Runtime Warm-Up

Reduced performance before JIT or runtime optimizations stabilize.

S

Serialization Overhead

Cost of encoding or decoding data formats.

Service Bottleneck

A slow service limiting overall system throughput.

Service Discovery Latency

Delay in locating service endpoints.

Service Mesh Latency

Added latency from sidecars or proxies.

Service-Level Indicator (SLI)

A precise, measurable metric (e.g., p95 latency, success rate) used to quantify system performance or reliability.

Service-Level Objective (SLO)

The agreed target for an SLI (e.g., p95 latency < 200 ms, 99.9% of the time) that drives performance and capacity decisions.
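
An SLO implies an error budget: the amount of time the SLI may miss its target without breaching the objective. A minimal sketch, assuming a rolling 30-day window and a time-based SLO (the share of minutes in which the SLI meets its target); `error_budget_minutes` is a hypothetical helper.

```python
def error_budget_minutes(slo: float, window_days: float = 30.0) -> float:
    """Minutes per window in which the SLI may miss its target without breaching the SLO."""
    return (1 - slo) * window_days * 24 * 60


print(round(error_budget_minutes(0.999), 1))   # 43.2 minutes per 30 days
print(round(error_budget_minutes(0.9999), 1))  # 4.3
```

Each additional nine cuts the budget by a factor of ten, which is why tighter objectives sharply raise the cost of capacity headroom and incident response.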

Shared Infrastructure Bottleneck

Performance impact from shared cloud hardware.

Single-Threaded Limitation

Performance capped because workload cannot parallelize.

SLA-Induced Bottleneck

Performance capped to meet contractual guarantees.

Slow Query

Query consuming excessive time or resources.

Slow Start Penalty

TCP's congestion window starts small and grows each round trip, so short-lived connections often finish before reaching full bandwidth.

Small I/O Penalty

Performance loss from many small read/write operations.

Stop-the-World (STW) Pause

Full runtime pause halting all application threads.

Swap Thrashing

Severe degradation caused by heavy swap usage.

Synchronous Fan-Out

Downstream calls made one after another instead of in parallel, so their latencies add up.

Synchronous Processing

Blocking operations delaying downstream execution.

T

Tail Latency

High-percentile latency (p95, p99) that often defines user experience.

TCP Head-of-Line Blocking

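A lost segment delays delivery of all data behind it on the connection, including data belonging to unrelated multiplexed streams, until it is retransmitted.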

TCP Retransmission

Re-sending packets due to loss or congestion.

Thread Affinity Issues

Threads moving across cores, causing cache inefficiency.

Thread Pool Exhaustion

No threads available to handle new requests.

Thread Starvation

Situation where there are not enough available threads to handle incoming work, causing requests to wait indefinitely or time out.

Throughput

The amount of work a system can process per unit of time.

Throughput-Limited Storage

Bottleneck caused by bandwidth limits rather than IOPS.

Thundering Herd Problem

Many processes waking simultaneously and overwhelming systems.

Time to First Byte (TTFB)

Time until the first byte of a response is received.

Tracing Bottleneck

The slowest span in a distributed trace.

Transaction Contention

Overlapping transactions blocking progress.

U

Under-Provisioning

Insufficient resources limiting performance.

W

Warm Cache

State where frequently accessed data is already populated in cache, leading to lower latency and higher throughput vs a cold cache.

Write Contention

Concurrent writes blocking each other.

Write Stall

Writes blocked or slowed by the storage engine's internal backpressure, for example when flushing or compaction falls behind in an LSM-tree store.
