Memory Bandwidth Glossary
Machine learning training tasks that rely heavily on memory bandwidth to process large datasets.
Ratio of compute operations to memory operations in a workload.
Encryption approach that ensures both confidentiality and data integrity.
Effective increase in usable bandwidth through caching or compression.
Memory bandwidth available to a CPU socket in multi-socket systems.
State where memory bandwidth is fully utilized and additional requests cannot be served efficiently.
Performance ceiling reached when memory bandwidth becomes the limiting factor.
Workload that requires very high memory data transfer rates.
Performance penalty when multiple accesses target the same memory bank.
Multi-level cache structure (L1, L2, L3) used to reduce memory latency.
When requested data is found in cache memory.
Small, high-speed memory located close to the CPU that stores frequently used data.
When requested data is not found in cache and must be retrieved from main memory.
GPU memory-access pattern in which neighboring threads access addresses that can be combined into fewer memory transactions, improving effective bandwidth.
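To make the coalescing idea above concrete, the following sketch counts how many aligned memory segments one group of threads touches. It is a simplified model, not a description of any particular GPU: the 32-thread warp, the 128-byte segment size, the 4-byte element size, and the function name are all assumptions chosen for illustration.

```python
def transactions_per_warp(addresses, segment_bytes=128):
    """Count the distinct aligned segments touched by one warp's addresses.

    Simplified model: each distinct segment costs one memory transaction.
    """
    return len({addr // segment_bytes for addr in addresses})


WARP_SIZE = 32   # assumed number of threads per warp
ELEM_BYTES = 4   # assumed element size (e.g. a 32-bit float)

# Neighboring threads read neighboring elements: addresses 0, 4, 8, ...
coalesced = [tid * ELEM_BYTES for tid in range(WARP_SIZE)]

# Each thread reads an element 32 positions away from its neighbor's.
scattered = [tid * 32 * ELEM_BYTES for tid in range(WARP_SIZE)]

print(transactions_per_warp(coalesced))  # 1
print(transactions_per_warp(scattered))  # 32
```

Under these assumptions the contiguous pattern needs a single transaction, while the scattered pattern needs one per thread for the same amount of useful data.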
Use of Compute Express Link (CXL) to attach additional memory capacity to a host while maintaining coherency semantics. (Compute Express Link)
Use of CXL fabric/switching to aggregate memory capacity into a pool that can be allocated across multiple hosts or logical devices. (Compute Express Link)
Applications that continuously process large volumes of data from memory.
Double Data Rate memory technology that transfers data on both the rising and falling edges of the clock signal, giving two transfers per clock cycle.
Fourth generation DDR memory standard widely used in servers.
DDR memory standard succeeding DDR4, providing higher bandwidth and efficiency.
Primary system memory used in servers and computers.
Periodic operation required to maintain stored data in DRAM cells.
Memory configuration using two parallel channels to increase bandwidth.
Memory with error-correcting capability that can detect and correct certain bit errors, improving reliability for servers, HPC, and AI systems.
Practical bandwidth available after accounting for system overheads and inefficiencies.
Graphics memory used in GPUs designed for high throughput data access.
Main GPU memory accessible by all GPU threads.
Rate at which data moves between GPU processors and GPU memory.
CPU unit that predicts future memory access patterns.
Advanced stacked memory technology used in GPUs and accelerators for extremely high bandwidth.
Second-generation high bandwidth memory used in many GPU architectures.
HBM generation succeeding HBM2, offering extremely high bandwidth for AI workloads.
Enhanced HBM3 generation that increases capacity and/or bandwidth over baseline HBM3 while keeping the stacked-memory model. (Samsung Semiconductor Global)
Next-generation High Bandwidth Memory succeeding HBM3e, designed for higher interface width and higher bandwidth in next-generation accelerators. (NVIDIA Developer)
3D stacked layers of DRAM used to deliver very high memory throughput.
High-performance computing tasks requiring large-scale parallel memory access.
Memory controller built directly into the CPU for lower latency and higher bandwidth.
Substrate layer used to connect stacked memory and processor/accelerator dies with very wide, high-density signaling.
CPU component responsible for executing memory read and write operations.
Time required for memory hardware to respond to a request.
Ability to process multiple memory requests simultaneously.
The way applications read and write data in memory.
Binding applications to specific NUMA nodes to reduce latency and increase bandwidth efficiency.
Process of assigning memory to applications or processes.
Hypervisor technique used to reclaim memory from virtual machines.
The rate at which data can be transferred between system memory and the processor, typically measured in GB/s.
Calculation used to estimate theoretical bandwidth based on bus width and clock speed.
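A minimal sketch of such a calculation follows. The function name, its parameters, and the example figures (a 1600 MHz clock, two transfers per clock, a 64-bit bus, two channels) are assumptions for illustration, and the result is an ideal upper bound rather than an achievable rate.

```python
def theoretical_bandwidth_gbs(clock_hz, bus_width_bits,
                              transfers_per_clock=2, channels=1):
    """Estimate peak bandwidth in GB/s (1 GB = 1e9 bytes), ideal conditions."""
    transfers_per_sec = clock_hz * transfers_per_clock
    bytes_per_transfer = bus_width_bits / 8
    return transfers_per_sec * bytes_per_transfer * channels / 1e9


# Assumed example: 1600 MHz clock, double data rate (3200 MT/s),
# 64-bit bus per channel, two channels.
print(theoretical_bandwidth_gbs(1600e6, 64, transfers_per_clock=2, channels=2))
# 51.2
```

Doubling the channel count or the bus width doubles the estimate, which is why wide interfaces and multi-channel configurations dominate peak-bandwidth figures.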
Portion of available memory bandwidth allocated to each CPU core.
Percentage of total memory bandwidth actively used by workloads.
Independent DRAM section that allows parallel access to memory rows.
Subdivision of memory banks designed to improve parallel access.
Performance limitation caused when memory bandwidth cannot keep up with compute demand.
Hardware pathway that transfers data between the CPU and system memory.
Number of bits that can be transferred simultaneously across the memory bus.
Independent communication path between the memory controller and RAM modules that allows parallel memory access.
Frequency at which memory modules operate.
GPU optimization technique that merges multiple memory accesses into fewer transactions.
Situation where multiple processes compete for limited memory bandwidth.
Hardware component that manages communication between the CPU and system memory.
Identifying identical memory pages across workloads and storing only one copy.
Individual semiconductor layer within stacked memory.
Performance inefficiency caused when threads access unrelated memory addresses.
High-speed interconnect system connecting processors and memory modules.
Inefficient memory usage caused by scattered allocations.
Structured arrangement of storage layers including registers, cache, RAM, and persistent storage.
Technique that distributes memory accesses across channels to increase bandwidth efficiency.
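The distribution described above can be sketched with a simple modulo mapping. Real memory controllers often use more elaborate address hashing, so the four channels, the 64-byte interleave granularity, and the function name here are assumptions for illustration only.

```python
def channel_for_address(addr, num_channels=4, granularity=64):
    """Map an address to a channel by rotating fixed-size blocks across channels."""
    return (addr // granularity) % num_channels


# A sequential 1 KiB read, issued in 64-byte blocks, lands evenly on all channels.
counts = [0] * 4
for addr in range(0, 1024, 64):
    counts[channel_for_address(addr)] += 1

print(counts)  # [4, 4, 4, 4]
```

Because consecutive blocks go to different channels, a sequential stream can keep all channels busy at once instead of queuing on one.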
The delay between requesting data from memory and receiving it.
Principle describing how frequently nearby memory locations are accessed together.
Techniques used to improve memory utilization and performance.
Allocating more virtual memory than physically available.
Performance cost introduced by memory management and access operations.
Running multiple memory operations simultaneously to increase throughput.
Tracking metrics to evaluate memory bandwidth usage.
Hardware pipeline responsible for processing memory requests efficiently.
Aggregating memory resources across systems for flexible allocation.
Technique where processors load data into cache before it is requested.
Analysis of application memory usage to identify inefficiencies.
Group of memory chips accessed simultaneously by a memory controller.
Number of memory requests generated by a processor or workload.
Ability of systems to maintain performance as memory capacity increases.
Time required for memory hardware to complete a request.
Group of vertically integrated memory dies connected through high-speed interconnects.
CPU idle cycles caused by waiting for memory access.
Combined hardware components responsible for memory operations.
Moving data between RAM and disk when memory capacity is exceeded.
The effective data transfer rate achieved during real workloads accessing memory.
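One rough way to observe an achieved rate like this is to time a large in-memory copy. The sketch below is only indicative: interpreter overhead, caches, and other running processes all affect the number, and the function name, buffer size, and repeat count are assumptions for illustration.

```python
import time


def measure_copy_bandwidth_gbs(size_bytes=32 * 1024 * 1024, repeats=5):
    """Time a buffer copy and report the best observed rate in GB/s.

    Counts both the bytes read from the source and the bytes written to
    the destination, so the traffic per copy is 2 * size_bytes.
    """
    src = bytearray(size_bytes)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst = bytes(src)  # one full read of src, one full write of dst
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
        del dst
    return 2 * size_bytes / best / 1e9


print(f"approx. copy bandwidth: {measure_copy_bandwidth_gbs():.1f} GB/s")
```

The figure printed will vary from run to run and from machine to machine, and will normally sit below the theoretical peak of the memory hardware.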
Using multiple memory types such as DRAM and persistent memory for efficiency.
Configuration parameters controlling DRAM access speed and latency.
A single read or write request issued to memory hardware.
Number of data transfers memory can perform per second.
Performance limit where CPU speed improvements outpace memory bandwidth growth.
Application whose performance is limited by memory bandwidth rather than CPU performance.
Ability of a processor or accelerator to keep multiple memory operations in flight simultaneously to improve throughput and hide latency.
Memory architecture using multiple channels (quad, octa, etc.) to improve parallel memory access.
Memory architecture where access latency varies depending on which CPU socket owns the memory.
Logical grouping of CPU cores and memory in NUMA systems.
Event where requested memory is not present in RAM and must be loaded from disk.
Maximum theoretical data transfer rate supported by memory hardware.
Non-volatile memory technology that bridges the gap between RAM and storage.
Access pattern where memory addresses are accessed unpredictably.
Proportion of read traffic versus write traffic in a workload, which can materially affect achieved memory throughput and bank/row behavior.
Performance model that relates compute capability to memory bandwidth limits.
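The relationship can be sketched as the minimum of two ceilings: peak compute, and memory bandwidth multiplied by the workload's operations per byte. The machine figures below (1000 GFLOP/s and 100 GB/s) are hypothetical, and expressing intensity in FLOPs per byte is an assumption of this sketch.

```python
def roofline_gflops(peak_gflops, peak_bandwidth_gbs, arithmetic_intensity):
    """Attainable GFLOP/s = min(compute peak, bandwidth * FLOPs per byte)."""
    return min(peak_gflops, peak_bandwidth_gbs * arithmetic_intensity)


# Hypothetical machine, for illustration only.
PEAK_GFLOPS = 1000.0
PEAK_BW_GBS = 100.0

# Intensity at which the bandwidth ceiling meets the compute ceiling.
ridge_point = PEAK_GFLOPS / PEAK_BW_GBS

for intensity in (0.25, 2.0, 10.0, 50.0):
    attainable = roofline_gflops(PEAK_GFLOPS, PEAK_BW_GBS, intensity)
    bound = "memory-bound" if intensity < ridge_point else "compute-bound"
    print(f"{intensity:>5} FLOP/byte -> {attainable:7.1f} GFLOP/s ({bound})")
```

Workloads to the left of the ridge point gain from more bandwidth or better data reuse; workloads to the right gain only from more compute.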
DRAM structure that temporarily stores an active row of memory.
Situation where a new row must replace the currently open row in DRAM.
Memory access to an already open DRAM row.
Memory access requiring activation of a new DRAM row.
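The difference between accesses that find their row already open and accesses that need a new row activated can be illustrated with a toy model of a single bank. The 8 KiB row size, the keep-the-row-open policy, and the function name are assumptions for this sketch; real DRAM timing and scheduling are considerably more involved.

```python
def row_buffer_stats(addresses, row_bytes=8192):
    """Classify accesses to one bank as row-buffer hits or misses.

    Assumes the most recently activated row stays open until another
    row is requested.
    """
    open_row = None
    hits = misses = 0
    for addr in addresses:
        row = addr // row_bytes
        if row == open_row:
            hits += 1
        else:
            misses += 1      # a different row must be activated
            open_row = row
    return hits, misses


sequential = list(range(0, 16384, 64))  # walks through two rows in order
alternating = [0, 8192] * 128           # ping-pongs between two rows

print(row_buffer_stats(sequential))   # (254, 2)
print(row_buffer_stats(alternating))  # (0, 256)
```

Both patterns issue 256 accesses to the same two rows, yet the ordering alone decides whether nearly all of them or none of them are served from the open row.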
Access pattern where memory addresses are read or written in order.
On-chip GPU memory shared by threads within a processing block for faster access.
Memory configuration using one channel between CPU and RAM.
Access pattern where neighboring memory addresses are accessed frequently.
Access pattern in which successive memory operations are separated by a fixed stride, often reducing cache efficiency and effective bandwidth when poorly aligned.
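The cache-efficiency cost of a large stride can be estimated by comparing the bytes a program actually uses with the bytes fetched in whole cache lines. The 64-byte line, the 8-byte element, and the function name are assumptions for this sketch, which also ignores prefetching and cache capacity.

```python
def cache_line_efficiency(stride_bytes, elem_bytes=8, line_bytes=64, n=1024):
    """Fraction of fetched cache-line bytes that the access pattern uses."""
    addresses = [i * stride_bytes for i in range(n)]
    lines_touched = {addr // line_bytes for addr in addresses}
    useful_bytes = n * elem_bytes
    fetched_bytes = len(lines_touched) * line_bytes
    return useful_bytes / fetched_bytes


for stride in (8, 16, 64):
    print(stride, cache_line_efficiency(stride))
# 8 1.0
# 16 0.5
# 64 0.125
```

With a stride equal to the line size, every access pulls in a full line to use one element, so only one eighth of the transferred data is useful in this model.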
Bandwidth achieved during long-running workloads under real conditions.
Access pattern where recently accessed data is likely to be reused.
Peak bandwidth calculated from transfer rate, bus width, and channel/interface count, assuming ideal conditions. (Micron Technology)
Vertical electrical interconnect passing through silicon dies, used in stacked-memory technologies such as HBM. (Samsung Semiconductor Global)
Cache that stores recent virtual-to-physical address translations to reduce address-translation overhead.
Event where a required virtual-to-physical address translation is not found in the TLB, forcing a page-table walk and increasing memory-access overhead.
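Unified Memory: memory model that gives CPUs and GPUs a single shared address space, so either processor can access the same data without explicit copies.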
Memory management technique using disk storage to extend RAM capacity.
Pattern in which groups of GPU threads access memory simultaneously.