NUMA Architecture Glossary
Kernel-driven dynamic adjustment of memory placement.
Limits throughput scaling due to serial workload portions.
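The serial-fraction scaling limit described above is the one Amdahl's law models. A minimal sketch (the function name is mine, for illustration):

```python
def amdahl_speedup(parallel_fraction: float, n_cores: int) -> float:
    """Maximum speedup when `parallel_fraction` of the work scales
    across `n_cores` while the remainder stays serial (Amdahl's law)."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_cores)

# With 95% parallel work, 8 cores deliver well under 8x:
print(round(amdahl_speedup(0.95, 8), 2))   # 5.93
# Even infinitely many cores cap out at 1/serial = 20x.
```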
Number of API requests handled per second; a throughput metric that NUMA placement can measurably affect.
Mechanism ensuring CPU caches remain consistent across cores.
Smallest unit of data transferred between memory and cache.
Performance penalty caused by cache invalidation across NUMA nodes.
CPU mode that splits a socket into multiple NUMA domains.
Workload limited by CPU execution.
Individual execution unit within a CPU socket.
Binding processes or threads to specific CPU cores.
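On Linux, this binding can be done from userspace with the standard library alone. A hedged sketch using `os.sched_setaffinity` (a Linux-only API, so the helper is guarded to degrade gracefully elsewhere):

```python
import os

def pin_to_cpus(cpus: set[int]) -> bool:
    """Pin the calling process to the given CPU set.

    Uses the Linux-only os.sched_setaffinity; returns False on
    platforms that do not expose it."""
    if not hasattr(os, "sched_setaffinity"):
        return False
    os.sched_setaffinity(0, cpus)  # 0 = the current process
    return os.sched_getaffinity(0) == set(cpus)

# Example: restrict this process to CPUs 0-1 (assumes both exist):
# pin_to_cpus({0, 1})
```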
Kubernetes policy providing exclusive CPU allocation.
Linux mechanism restricting CPU and memory node usage.
Data movement between CPU sockets, increasing latency.
Aligning CPUs, memory, and accelerators in the same NUMA node.
Performance issue when threads modify data on the same cache line.
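Whether two variables collide this way is simple arithmetic: they fall on the same cache line when their byte offsets land in the same line-sized block. A sketch, assuming the common 64-byte line size:

```python
LINE = 64  # assumed cache-line size in bytes

def share_line(offset_a: int, offset_b: int, line: int = LINE) -> bool:
    """True if two byte offsets fall within the same cache line."""
    return offset_a // line == offset_b // line

# Two adjacent 8-byte counters share a line, so writes from different
# cores bounce the line between their caches:
print(share_line(0, 8))    # True
# Padding each counter out to its own line avoids the bounce:
print(share_line(0, 64))   # False
```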
Memory allocated on the NUMA node of the CPU that first accesses it.
Direct CPU-GPU data path avoiding cross-node traffic.
Aligning GPU workloads with nearby CPUs and memory.
Preferred NUMA node for a process or thread, typically determined by its CPU affinity and initial memory allocation policy (e.g., first-touch).
Large memory pages that reduce TLB pressure and NUMA overhead.
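The TLB-pressure benefit is easy to quantify: fewer, larger pages mean fewer translations competing for TLB entries. A back-of-the-envelope sketch:

```python
def pages_needed(region_bytes: int, page_bytes: int) -> int:
    """Number of page-table entries needed to map a region."""
    return -(-region_bytes // page_bytes)  # ceiling division

GIB = 1 << 30
# Mapping 4 GiB with 4 KiB pages versus 2 MiB huge pages:
print(pages_needed(4 * GIB, 4 * 1024))      # 1048576 entries
print(pages_needed(4 * GIB, 2 * 1024**2))   # 2048 entries
```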
Hypervisor’s ability to align vCPUs and memory with physical nodes.
High-speed link connecting NUMA nodes (e.g., UPI, Infinity Fabric).
Loss of NUMA locality after VM migration.
Memory access to RAM attached to the same NUMA node as the executing CPU.
Metric describing the proportion of local memory accesses versus remote memory accesses for a NUMA node, often exposed via perf counters and used to evaluate NUMA tuning effectiveness.
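This ratio feeds directly into an average-latency estimate. A minimal model, with the 80/140 ns figures as illustrative assumptions rather than measurements:

```python
def avg_latency_ns(local_ratio: float, local_ns: float = 80.0,
                   remote_ns: float = 140.0) -> float:
    """Expected memory latency given the fraction of local accesses.

    The default local/remote latencies are illustrative, not measured."""
    return local_ratio * local_ns + (1.0 - local_ratio) * remote_ns

print(avg_latency_ns(1.0))   # 80.0  (perfect locality)
print(avg_latency_ns(0.5))   # 110.0 (half the accesses go remote)
```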
Memory reclaim that can break NUMA locality.
Distribution of memory accesses across DIMM channels for bandwidth.
Competition for memory bandwidth within or across nodes.
Hardware component managing memory access, often integrated per socket.
Degree to which a thread or process accesses memory that is local to its NUMA node; high locality implies few remote accesses and lower average memory latency.
Fixed-size unit of memory managed by the OS.
Explicit binding of memory regions to specific NUMA nodes.
Workload limited by memory latency or bandwidth.
Server with multiple CPU sockets, usually implementing NUMA.
Memory architecture where CPUs access local memory faster than memory attached to other CPUs.
Binding workloads to CPUs and memory within a specific NUMA node.
Dependency of GPU performance on CPU and memory proximity.
Ignoring NUMA topology in multi-socket systems.
System design that groups CPUs and memory into nodes to improve scalability while accepting non-uniform access latency.
Ability of software to detect and optimize for NUMA topology.
OS feature that migrates memory to improve locality.
Memory throughput available within or across NUMA nodes.
Measuring performance impact of NUMA configurations.
Guidelines for optimizing applications on NUMA systems.
Firmware settings that enable, disable, or modify NUMA behavior.
Performance penalty when VMs cross physical NUMA limits.
Performance degradation caused by excessive remote memory access.
Tools and methods to analyze NUMA behavior.
Relative cost metric representing latency between NUMA nodes.
OS-exposed matrix of NUMA distances between nodes (e.g., /sys/devices/system/node/node*/distance), used by schedulers and tooling to reason about relative latency and placement decisions.
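Those sysfs rows can be parsed and used to rank nodes by proximity (10 conventionally means local; larger values mean farther). A sketch with made-up distances, not readings from real hardware:

```python
def parse_distance_matrix(rows: list[str]) -> list[list[int]]:
    """Parse per-node distance rows in the format exposed under
    /sys/devices/system/node/node*/distance (one row per node)."""
    return [[int(tok) for tok in row.split()] for row in rows]

def nearest_remote_node(matrix: list[list[int]], node: int) -> int:
    """Index of the cheapest node other than `node` itself."""
    row = matrix[node]
    candidates = [i for i in range(len(row)) if i != node]
    return min(candidates, key=lambda i: row[i])

# Example 4-node topology (illustrative numbers):
m = parse_distance_matrix(["10 21 31 31",
                           "21 10 31 31",
                           "31 31 10 21",
                           "31 31 21 10"])
print(nearest_remote_node(m, 0))  # 1
```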
Database performance optimization using local memory access.
NUMA optimization for high-performance computing workloads.
NUMA tuning regarded as critical for workloads that require low-latency memory access.
Importance of locality for large-model training.
Overloaded NUMA node while others remain underutilized.
Uneven memory usage across NUMA nodes.
NUMA behavior in large cloud VMs and bare-metal instances.
Managing CPU and memory locality for containerized workloads.
Policy that spreads memory pages across nodes to balance bandwidth.
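At its core this is round-robin placement of successive pages across nodes; a sketch of the behaviour:

```python
def interleave_pages(n_pages: int, nodes: list[int]) -> list[int]:
    """Assign successive pages to NUMA nodes round-robin, mirroring
    the behaviour of an interleave memory policy."""
    return [nodes[i % len(nodes)] for i in range(n_pages)]

placement = interleave_pages(8, nodes=[0, 1])
print(placement)  # [0, 1, 0, 1, 0, 1, 0, 1]
# Bandwidth is balanced: each node serves half the pages.
print(placement.count(0), placement.count(1))  # 4 4
```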
Additional delay introduced by remote memory access.
Kernel reclaim of remote memory pages.
Moving memory pages between NUMA nodes at runtime.
Tracking memory locality, latency, and bandwidth.
Logical unit consisting of CPUs and directly attached memory.
Performance overhead incurred during memory relocation.
Direct mapping of physical NUMA nodes to VMs.
Latency and throughput loss due to remote memory access.
OS rule defining how memory is allocated across NUMA nodes.
Point beyond which adding cores degrades performance.
OS scheduling that keeps workloads close to their memory.
Hypervisor feature allowing VMs to span multiple NUMA nodes.
Excessive memory page migrations between nodes.
Physical and logical layout of NUMA nodes, CPUs, memory, and interconnects.
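On Linux this layout is visible under sysfs. A guarded sketch that lists nodes and their CPU ranges, returning an empty mapping on systems without `/sys/devices/system/node`:

```python
import glob
import os
import re

def numa_nodes() -> dict[int, str]:
    """Map node id -> raw cpulist string (e.g. '0-7,16-23') from sysfs.

    Returns {} on systems that do not expose /sys/devices/system/node."""
    nodes = {}
    for path in glob.glob("/sys/devices/system/node/node[0-9]*"):
        node_id = int(re.search(r"node(\d+)$", path).group(1))
        cpulist = os.path.join(path, "cpulist")
        if os.path.exists(cpulist):
            with open(cpulist) as f:
                nodes[node_id] = f.read().strip()
    return nodes

print(numa_nodes())  # e.g. {0: '0-7'} on a single-node machine
```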
Balancing scalability, flexibility, and memory latency.
Manual optimization of CPU and memory placement.
Application designed to allocate memory close to executing CPUs.
Plugin exposing NUMA topology for accelerators.
Kubernetes configurations that respect NUMA topology.
MPI implementations optimized for NUMA locality.
Application limited by NUMA memory access patterns.
Tool for controlling NUMA placement at runtime.
Systems designed to minimize remote memory access.
Increased TLB misses due to cross-node memory access.
Application whose performance heavily depends on locality.
Application that ignores NUMA topology, often causing performance loss.
Placement of PCIe devices relative to NUMA nodes.
Memory locked to avoid migration and latency spikes.
Scheduling rules based on hardware topology.
Fixing processes to cores to maintain locality.
NUMA proximity requirements for optimal RDMA performance.
Memory access to RAM attached to a different NUMA node, incurring higher latency.
Sudden surge in remote memory access causing latency spikes.
Page fault resolved by fetching memory from another NUMA node.
Physical CPU package typically associated with one or more NUMA nodes.
Relationship between CPU sockets and attached memory/controllers.
Feature that creates smaller NUMA nodes within a single socket.
Binding threads to specific cores.
Cache storing virtual-to-physical address mappings.
Event where address translation is not found, increasing latency.
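That added latency shows up in the effective access time. A standard textbook model, with a single-level page table and illustrative timings as assumptions:

```python
def effective_access_ns(hit_ratio: float, tlb_ns: float = 1.0,
                        mem_ns: float = 80.0) -> float:
    """Effective memory access time with a single-level page table:
    a TLB miss costs one extra memory access for the page-table walk.

    The default timings are illustrative assumptions."""
    hit = tlb_ns + mem_ns            # translation found in the TLB
    miss = tlb_ns + 2 * mem_ns       # walk the page table, then access
    return hit_ratio * hit + (1.0 - hit_ratio) * miss

print(effective_access_ns(0.99))  # ~81.8
print(effective_access_ns(0.90))  # ~89.0
```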
Kubernetes component aligning CPU, memory, and devices.
OS feature that automatically uses large pages.
Architecture where all CPUs access memory with equal latency.
Mapping determining NUMA locality in VMs.
Virtual NUMA topology exposed to virtual machines.