NUMA Architecture Glossary
Kernel-driven dynamic adjustment of memory placement.
Limits throughput scaling due to serial workload portions.
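The serial-fraction scaling limit described above is the one Amdahl's law models. A minimal sketch (the function name is mine, for illustration):

```python
def amdahl_speedup(parallel_fraction: float, n_cores: int) -> float:
    """Maximum speedup when `parallel_fraction` of the work scales
    across `n_cores` while the remainder stays serial (Amdahl's law)."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_cores)

# With 95% parallel work, 8 cores deliver well under 8x:
print(round(amdahl_speedup(0.95, 8), 2))   # 5.93
# Even infinitely many cores cap out at 1/serial = 20x.
```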
Number of API requests handled per second; a throughput metric that NUMA placement can measurably affect.
Mechanism ensuring CPU caches remain consistent across cores.
Smallest unit of data transferred between memory and cache.
Performance penalty caused by cache invalidation across NUMA nodes.
CPU mode that splits a socket into multiple NUMA domains.
Workload limited by CPU execution.
Individual execution unit within a CPU socket.
Binding processes or threads to specific CPU cores.
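On Linux, this binding can be done from userspace with the standard library alone. A hedged sketch using `os.sched_setaffinity` (a Linux-only API, so the helper is guarded to degrade gracefully elsewhere):

```python
import os

def pin_to_cpus(cpus: set[int]) -> bool:
    """Pin the calling process to the given CPU set.

    Uses the Linux-only os.sched_setaffinity; returns False on
    platforms that do not expose it."""
    if not hasattr(os, "sched_setaffinity"):
        return False
    os.sched_setaffinity(0, cpus)  # 0 = the current process
    return os.sched_getaffinity(0) == set(cpus)

# Example: restrict this process to CPUs 0-1 (assumes both exist):
# pin_to_cpus({0, 1})
```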
Kubernetes policy providing exclusive CPU allocation.
Linux mechanism restricting CPU and memory node usage.
Data movement between CPU sockets, increasing latency.
Aligning CPUs, memory, and accelerators in the same NUMA node.
Performance issue when threads modify data on the same cache line.
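Whether two variables collide this way is simple arithmetic: they fall on the same cache line when their byte offsets land in the same line-sized block. A sketch, assuming the common 64-byte line size:

```python
LINE = 64  # assumed cache-line size in bytes

def share_line(offset_a: int, offset_b: int, line: int = LINE) -> bool:
    """True if two byte offsets fall within the same cache line."""
    return offset_a // line == offset_b // line

# Two adjacent 8-byte counters share a line, so writes from different
# cores bounce the line between their caches:
print(share_line(0, 8))    # True
# Padding each counter out to its own line avoids the bounce:
print(share_line(0, 64))   # False
```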
Memory allocated on the NUMA node of the CPU that first accesses it.
Direct CPU-GPU data path avoiding cross-node traffic.
Aligning GPU workloads with nearby CPUs and memory.
Preferred NUMA node for a process or thread, typically determined by its CPU affinity and initial memory allocation policy (e.g., first-touch).
Large memory pages that reduce TLB pressure and NUMA overhead.
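The TLB-pressure benefit is easy to quantify: fewer, larger pages mean fewer translations competing for TLB entries. A back-of-the-envelope sketch:

```python
def pages_needed(region_bytes: int, page_bytes: int) -> int:
    """Number of page-table entries needed to map a region."""
    return -(-region_bytes // page_bytes)  # ceiling division

GIB = 1 << 30
# Mapping 4 GiB with 4 KiB pages versus 2 MiB huge pages:
print(pages_needed(4 * GIB, 4 * 1024))      # 1048576 entries
print(pages_needed(4 * GIB, 2 * 1024**2))   # 2048 entries
```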
Hypervisor’s ability to align vCPUs and memory with physical nodes.
High-speed link connecting NUMA nodes (e.g., UPI, Infinity Fabric).
Loss of NUMA locality after VM migration.
Memory access to RAM attached to the same NUMA node as the executing CPU.
Metric describing the proportion of local memory accesses versus remote memory accesses for a NUMA node, often exposed via perf counters and used to evaluate NUMA tuning effectiveness.
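This ratio feeds directly into an average-latency estimate. A minimal model, with the 80/140 ns figures as illustrative assumptions rather than measurements:

```python
def avg_latency_ns(local_ratio: float, local_ns: float = 80.0,
                   remote_ns: float = 140.0) -> float:
    """Expected memory latency given the fraction of local accesses.

    The default local/remote latencies are illustrative, not measured."""
    return local_ratio * local_ns + (1.0 - local_ratio) * remote_ns

print(avg_latency_ns(1.0))   # 80.0  (perfect locality)
print(avg_latency_ns(0.5))   # 110.0 (half the accesses go remote)
```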
Memory reclaim that can break NUMA locality.
Distribution of memory accesses across DIMM channels for bandwidth.
Competition for memory bandwidth within or across nodes.
Hardware component managing memory access, often integrated per socket.
Degree to which a thread or process accesses memory that is local to its NUMA node; high locality implies few remote accesses and lower average memory latency.
Fixed-size unit of memory managed by the OS.
Explicit binding of memory regions to specific NUMA nodes.
Workload limited by memory latency or bandwidth.
Server with multiple CPU sockets, usually implementing NUMA.
Memory architecture where CPUs access local memory faster than memory attached to other CPUs.
Binding workloads to CPUs and memory within a specific NUMA node.
Dependency of GPU performance on CPU and memory proximity.
Ignoring NUMA topology in multi-socket systems.
System design that groups CPUs and memory into nodes to improve scalability while accepting non-uniform access latency.
Ability of software to detect and optimize for NUMA topology.
OS feature that migrates memory to improve locality.
Memory throughput available within or across NUMA nodes.
Measuring performance impact of NUMA configurations.
Guidelines for optimizing applications on NUMA systems.
Firmware settings that enable, disable, or modify NUMA behavior.
Performance penalty when VMs cross physical NUMA limits.
Performance degradation caused by excessive remote memory access.
Tools and methods to analyze NUMA behavior.
Relative cost metric representing latency between NUMA nodes.
OS-exposed matrix of NUMA distances between nodes (e.g., /sys/devices/system/node/node*/distance), used by schedulers and tooling to reason about relative latency and placement decisions.
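Those sysfs rows can be parsed and used to rank nodes by proximity (10 conventionally means local; larger values mean farther). A sketch with made-up distances, not readings from real hardware:

```python
def parse_distance_matrix(rows: list[str]) -> list[list[int]]:
    """Parse per-node distance rows in the format exposed under
    /sys/devices/system/node/node*/distance (one row per node)."""
    return [[int(tok) for tok in row.split()] for row in rows]

def nearest_remote_node(matrix: list[list[int]], node: int) -> int:
    """Index of the cheapest node other than `node` itself."""
    row = matrix[node]
    candidates = [i for i in range(len(row)) if i != node]
    return min(candidates, key=lambda i: row[i])

# Example 4-node topology (illustrative numbers):
m = parse_distance_matrix(["10 21 31 31",
                           "21 10 31 31",
                           "31 31 10 21",
                           "31 31 21 10"])
print(nearest_remote_node(m, 0))  # 1
```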
Database performance optimization using local memory access.
NUMA optimization for high-performance computing workloads.
NUMA tuning regarded as critical for workloads that require low-latency memory access.
Importance of locality for large-model training.
Overloaded NUMA node while others remain underutilized.
Uneven memory usage across NUMA nodes.
NUMA behavior in large cloud VMs and bare-metal instances.
Managing CPU and memory locality for containerized workloads.
Policy that spreads memory pages across nodes to balance bandwidth.
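At its core this is round-robin placement of successive pages across nodes; a sketch of the behaviour:

```python
def interleave_pages(n_pages: int, nodes: list[int]) -> list[int]:
    """Assign successive pages to NUMA nodes round-robin, mirroring
    the behaviour of an interleave memory policy."""
    return [nodes[i % len(nodes)] for i in range(n_pages)]

placement = interleave_pages(8, nodes=[0, 1])
print(placement)  # [0, 1, 0, 1, 0, 1, 0, 1]
# Bandwidth is balanced: each node serves half the pages.
print(placement.count(0), placement.count(1))  # 4 4
```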
Additional delay introduced by remote memory access.
Kernel reclaim of remote memory pages.
Moving memory pages between NUMA nodes at runtime.
Tracking memory locality, latency, and bandwidth.
Logical unit consisting of CPUs and directly attached memory.
Performance overhead incurred during memory relocation.
Direct mapping of physical NUMA nodes to VMs.
Latency and throughput loss due to remote memory access.
OS rule defining how memory is allocated across NUMA nodes.
Point beyond which adding cores degrades performance.
OS scheduling that keeps workloads close to their memory.
Hypervisor feature allowing VMs to span multiple NUMA nodes.
Excessive memory page migrations between nodes.
Physical and logical layout of NUMA nodes, CPUs, memory, and interconnects.
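On Linux this layout is visible under sysfs. A guarded sketch that lists nodes and their CPU ranges, returning an empty mapping on systems without `/sys/devices/system/node`:

```python
import glob
import os
import re

def numa_nodes() -> dict[int, str]:
    """Map node id -> raw cpulist string (e.g. '0-7,16-23') from sysfs.

    Returns {} on systems that do not expose /sys/devices/system/node."""
    nodes = {}
    for path in glob.glob("/sys/devices/system/node/node[0-9]*"):
        node_id = int(re.search(r"node(\d+)$", path).group(1))
        cpulist = os.path.join(path, "cpulist")
        if os.path.exists(cpulist):
            with open(cpulist) as f:
                nodes[node_id] = f.read().strip()
    return nodes

print(numa_nodes())  # e.g. {0: '0-7'} on a single-node machine
```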
Balancing scalability, flexibility, and memory latency.
Manual optimization of CPU and memory placement.
Application designed to allocate memory close to executing CPUs.
Plugin exposing NUMA topology for accelerators.
Kubernetes configurations that respect NUMA topology.
MPI implementations optimized for NUMA locality.
Application limited by NUMA memory access patterns.
Tool for controlling NUMA placement at runtime.
Systems designed to minimize remote memory access.
Increased TLB misses due to cross-node memory access.
Application whose performance heavily depends on locality.
Application that ignores NUMA topology, often causing performance loss.
Placement of PCIe devices relative to NUMA nodes.
Memory locked to avoid migration and latency spikes.
Scheduling rules based on hardware topology.
Fixing processes to cores to maintain locality.
NUMA proximity requirements for optimal RDMA performance.
Memory access to RAM attached to a different NUMA node, incurring higher latency.
Sudden surge in remote memory access causing latency spikes.
Page fault resolved by fetching memory from another NUMA node.
Physical CPU package typically associated with one or more NUMA nodes.
Relationship between CPU sockets and attached memory/controllers.
Feature that creates smaller NUMA nodes within a single socket.
Binding threads to specific cores.
Cache storing virtual-to-physical address mappings.
Event where address translation is not found, increasing latency.
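That added latency shows up in the effective access time. A standard textbook model, with a single-level page table and illustrative timings as assumptions:

```python
def effective_access_ns(hit_ratio: float, tlb_ns: float = 1.0,
                        mem_ns: float = 80.0) -> float:
    """Effective memory access time with a single-level page table:
    a TLB miss costs one extra memory access for the page-table walk.

    The default timings are illustrative assumptions."""
    hit = tlb_ns + mem_ns            # translation found in the TLB
    miss = tlb_ns + 2 * mem_ns       # walk the page table, then access
    return hit_ratio * hit + (1.0 - hit_ratio) * miss

print(effective_access_ns(0.99))  # ~81.8
print(effective_access_ns(0.90))  # ~89.0
```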
Kubernetes component aligning CPU, memory, and devices.
OS feature that automatically uses large pages.
Architecture where all CPUs access memory with equal latency.
Mapping determining NUMA locality in VMs.
Virtual NUMA topology exposed to virtual machines.