Memory Bandwidth Archives

A

AI Training Workload

Machine learning training tasks that rely heavily on memory bandwidth to process large datasets.

Arithmetic Intensity

Ratio of arithmetic operations to bytes of memory traffic in a workload, typically expressed in FLOPs per byte.
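
As a rough illustration, the ratio can be computed by counting operations and memory traffic for a kernel. The sketch below measures traffic in bytes (a common convention, assumed here) and uses a hypothetical kernel, `y[i] = a * x[i] + y[i]` on 8-byte floats, that is not taken from this glossary.

```python
def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """Return arithmetic operations per byte of memory traffic."""
    if bytes_moved <= 0:
        raise ValueError("bytes_moved must be positive")
    return flops / bytes_moved


# Hypothetical kernel: y[i] = a * x[i] + y[i] on 8-byte floats.
# Per element: 2 operations (multiply, add) and
# 24 bytes of traffic (read x, read y, write y).
n = 1_000_000
ai = arithmetic_intensity(flops=2 * n, bytes_moved=24 * n)
print(f"{ai:.4f} FLOP/byte")
```

A value this far below 1 FLOP/byte suggests the kernel is likely to be limited by memory bandwidth rather than compute on most hardware.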

Authenticated Encryption

Encryption approach that ensures both confidentiality and data integrity.

B

Bandwidth Amplification

Effective increase in usable bandwidth through caching or compression.

Bandwidth per Socket

Memory bandwidth available to a CPU socket in multi-socket systems.

Bandwidth Saturation

State where memory bandwidth is fully utilized and additional requests cannot be served efficiently.

Bandwidth Wall

Performance ceiling reached when memory bandwidth becomes the limiting factor.

Bandwidth-Intensive Workload

Workload that requires very high memory data transfer rates.

Bank Conflict

Performance penalty incurred when multiple simultaneous accesses target the same memory bank and must be serialized.

C

Cache Hierarchy

Multi-level cache structure (L1, L2, L3) used to reduce memory latency.

Cache Hit

When requested data is found in cache memory.

Cache Memory

Small, high-speed memory located close to the CPU that stores frequently used data.

Cache Miss

When requested data is not found in cache and must be retrieved from main memory.
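
The combined effect of cache hits and misses is often summarized as an average memory access time. The sketch below uses the standard textbook formula (hit time plus miss rate times miss penalty); the function name and the example numbers are illustrative assumptions, not measurements.

```python
def average_access_time_ns(hit_time_ns: float,
                           miss_rate: float,
                           miss_penalty_ns: float) -> float:
    """Average memory access time for a single cache level."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return hit_time_ns + miss_rate * miss_penalty_ns


# Assumed values: 1 ns cache hit, 5% miss rate, 80 ns main-memory penalty.
amat = average_access_time_ns(hit_time_ns=1.0, miss_rate=0.05, miss_penalty_ns=80.0)
print(f"{amat:.1f} ns")
```

Even a small miss rate dominates the average here, because the miss penalty is far larger than the hit time.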

Coalesced Memory Access

GPU memory-access pattern in which neighboring threads access addresses that can be combined into fewer memory transactions, improving effective bandwidth.
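
A minimal sketch of why coalescing matters: count how many fixed-size memory segments one warp touches for a given access stride. The warp size of 32 threads, the 32-byte segment size, and the function name are assumptions for illustration; real hardware details vary by GPU.

```python
def transactions_per_warp(stride_elems: int,
                          elem_bytes: int = 4,
                          warp_size: int = 32,
                          segment_bytes: int = 32) -> int:
    """Count distinct memory segments touched by one warp.

    Thread t is assumed to access the element at byte address
    t * stride_elems * elem_bytes.
    """
    segments = set()
    for t in range(warp_size):
        first = t * stride_elems * elem_bytes
        last = first + elem_bytes - 1
        segments.update(range(first // segment_bytes, last // segment_bytes + 1))
    return len(segments)


for stride in (1, 2, 8):
    print(stride, transactions_per_warp(stride))
```

Under these assumptions, unit stride needs 4 transactions per warp, while a stride of 8 elements needs 32, one per thread, for the same amount of useful data.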

CXL Memory Expansion

Use of Compute Express Link (CXL) to attach additional memory capacity to a host while maintaining coherency semantics. (Compute Express Link)

CXL Memory Pooling

Use of CXL fabric/switching to aggregate memory capacity into a pool that can be allocated across multiple hosts or logical devices. (Compute Express Link)

D

Data Streaming Workload

Applications that continuously process large volumes of data from memory.

DDR Memory

Double Data Rate memory technology that transfers data twice per clock cycle, on both the rising and falling clock edges.

DDR4

Fourth generation DDR memory standard widely used in servers.

DDR5

Fifth-generation DDR memory standard, providing higher bandwidth and efficiency than DDR4.

DRAM (Dynamic Random Access Memory)

Primary system memory used in servers and computers.

DRAM Refresh

Periodic operation required to maintain stored data in DRAM cells.

Dual-Channel Memory

Memory configuration using two parallel channels to increase bandwidth.

E

ECC Memory

Memory with error-correcting capability that can detect and correct certain bit errors, improving reliability for servers, HPC, and AI systems.

Effective Memory Bandwidth

Practical bandwidth available after accounting for system overheads and inefficiencies.

F

G

GDDR Memory

Graphics memory used in GPUs designed for high throughput data access.

Global Memory

Main GPU memory accessible by all GPU threads.

GPU Memory Bandwidth

Rate at which data moves between GPU processors and GPU memory.

H

Hardware Prefetcher

CPU unit that predicts future memory access patterns and fetches the expected data into cache before it is requested.

HBM (High Bandwidth Memory)

Advanced stacked memory technology used in GPUs and accelerators for extremely high bandwidth.

HBM2

Second-generation high bandwidth memory used in many GPU architectures.

HBM3

Third-generation High Bandwidth Memory offering extremely high bandwidth for AI workloads.

HBM3e

Enhanced HBM3 generation that increases capacity and/or bandwidth over baseline HBM3 while keeping the stacked-memory model. (Samsung Semiconductor Global)

HBM4

Next-generation High Bandwidth Memory succeeding HBM3e, designed for higher interface width and higher bandwidth in next-generation accelerators. (NVIDIA Developer)

High Bandwidth Memory Stack

3D stacked layers of DRAM used to deliver very high memory throughput.

HPC Workload

High-performance computing tasks requiring large-scale parallel memory access.

I

Integrated Memory Controller (IMC)

Memory controller built directly into the CPU for lower latency and higher bandwidth.

Interposer

Substrate layer used to connect stacked memory and processor/accelerator dies with very wide, high-density signaling.

J

K

L

Load/Store Unit

CPU component responsible for executing memory read and write operations.

M

Memory Access Latency

Time required for memory hardware to respond to a request.

Memory Access Parallelism

Ability to process multiple memory requests simultaneously.

Memory Access Pattern

The way applications read and write data in memory.

Memory Affinity

Binding applications to specific NUMA nodes to reduce latency and increase bandwidth efficiency.

Memory Allocation

Process of assigning memory to applications or processes.

Memory Ballooning

Hypervisor technique used to reclaim memory from virtual machines.

Memory Bandwidth

The rate at which data can be transferred between system memory and the processor, typically measured in GB/s.

Memory Bandwidth Formula

Calculation used to estimate theoretical bandwidth from transfer rate (clock speed times transfers per clock), bus width, and channel count.
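
One common form of the calculation multiplies the transfer rate by the bus width in bytes and by the number of channels. The sketch below assumes the rate is given in megatransfers per second (for DDR memory, twice the I/O clock frequency); the function and parameter names are illustrative.

```python
def theoretical_bandwidth_gbs(transfer_rate_mts: float,
                              bus_width_bits: int,
                              channels: int = 1) -> float:
    """Peak bandwidth in GB/s (1 GB = 1e9 bytes) under ideal conditions."""
    bytes_per_transfer = bus_width_bits / 8
    return transfer_rate_mts * 1e6 * bytes_per_transfer * channels / 1e9


# Example: 3200 MT/s on a 64-bit channel, with one and two channels.
single = theoretical_bandwidth_gbs(3200, 64)
dual = theoretical_bandwidth_gbs(3200, 64, channels=2)
print(f"{single:.1f} GB/s, {dual:.1f} GB/s")
```

These are upper bounds; refresh, bank conflicts, and read/write turnarounds keep achieved bandwidth below them.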

Memory Bandwidth per Core

Portion of available memory bandwidth allocated to each CPU core.

Memory Bandwidth Utilization

Percentage of total memory bandwidth actively used by workloads.

Memory Bank

Independent DRAM section that allows parallel access to memory rows.

Memory Bank Group

Subdivision of memory banks designed to improve parallel access.

Memory Bottleneck

Performance limitation caused when memory bandwidth cannot keep up with compute demand.

Memory Bus

Hardware pathway that transfers data between the CPU and system memory.

Memory Bus Width

Number of bits that can be transferred simultaneously across the memory bus.

Memory Channel

Independent communication path between the memory controller and RAM modules that allows parallel memory access.

Memory Clock Speed

Frequency at which memory modules operate.

Memory Coalescing

GPU optimization technique that merges multiple memory accesses into fewer transactions.

Memory Contention

Situation where multiple processes compete for limited memory bandwidth.

Memory Controller

Hardware component that manages communication between the CPU and system memory.

Memory Deduplication

Identifying identical memory pages across workloads and storing only one copy.

Memory Die

Individual semiconductor layer within stacked memory.

Memory Divergence

Performance inefficiency caused when threads access unrelated memory addresses.

Memory Fabric

High-speed interconnect system connecting processors and memory modules.

Memory Fragmentation

Inefficient memory usage caused by scattered allocations.

Memory Hierarchy

Structured arrangement of storage layers including registers, cache, RAM, and persistent storage.

Memory Interleaving

Technique that distributes memory accesses across channels to increase bandwidth efficiency.
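
A simplified picture of interleaving maps each fixed-size block of the address space to a channel in round-robin order. The 64-byte granularity, the four channels, and the modulo mapping below are assumptions for illustration; real controllers often use more elaborate address hashing.

```python
from collections import Counter


def channel_for_address(addr: int,
                        num_channels: int = 4,
                        granularity: int = 64) -> int:
    """Map a physical address to a channel by round-robin interleaving."""
    return (addr // granularity) % num_channels


# A sequential scan over 1 KiB spreads evenly across all channels.
sequential = Counter(channel_for_address(a) for a in range(0, 1024, 64))
# A 256-byte stride lands on the same channel every time.
strided = Counter(channel_for_address(a) for a in range(0, 4096, 256))
print(dict(sequential), dict(strided))
```

The strided case shows how an unlucky access pattern can defeat interleaving and concentrate traffic on one channel.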

Memory Latency

The delay between requesting data from memory and receiving it.

Memory Locality

Principle describing how frequently nearby memory locations are accessed together.

Memory Optimization

Techniques used to improve memory utilization and performance.

Memory Overcommitment

Allocating more virtual memory than physically available.

Memory Overhead

Performance cost introduced by memory management and access operations.

Memory Parallelism

Running multiple memory operations simultaneously to increase throughput.

Memory Performance Monitoring

Tracking metrics to evaluate memory bandwidth usage.

Memory Pipeline

Hardware pipeline responsible for processing memory requests efficiently.

Memory Pooling

Aggregating memory resources across systems for flexible allocation.

Memory Prefetching

Technique where processors load data into cache before it is requested.

Memory Profiling

Analysis of application memory usage to identify inefficiencies.

Memory Rank

Group of memory chips accessed simultaneously by a memory controller.

Memory Request Rate

Number of memory requests generated by a processor or workload.

Memory Scalability

Ability of systems to maintain performance as memory capacity increases.

Memory Service Time

Time required for memory hardware to complete a request.

Memory Stack

Group of vertically integrated memory dies connected through high-speed interconnects.

Memory Stall

CPU idle cycles caused by waiting for memory access.

Memory Subsystem

Combined hardware components responsible for memory operations.

Memory Swapping

Moving data between RAM and disk when memory capacity is exceeded.

Memory Throughput

The effective data transfer rate achieved during real workloads accessing memory.

Memory Tiering

Using multiple memory types such as DRAM and persistent memory for efficiency.

Memory Timing

Configuration parameters controlling DRAM access speed and latency.

Memory Transaction

A single read or write request issued to memory hardware.

Memory Transfer Rate

Number of data transfers memory can perform per second, typically expressed in megatransfers per second (MT/s).

Memory Wall

Performance limit where CPU speed improvements outpace memory bandwidth growth.

Memory-Bound Workload

Application whose performance is limited by memory bandwidth rather than CPU performance.

Memory-Level Parallelism (MLP)

Ability of a processor or accelerator to keep multiple memory operations in flight simultaneously to improve throughput and hide latency.
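
The link between requests in flight, latency, and throughput can be estimated with Little's Law: bandwidth is roughly the number of outstanding requests times the request size, divided by latency. The sketch below is a back-of-the-envelope model with assumed numbers, not a description of any specific processor.

```python
import math


def bandwidth_from_mlp_gbs(outstanding_requests: int,
                           line_bytes: int,
                           latency_ns: float) -> float:
    """Estimate sustained bandwidth; bytes per nanosecond equals GB/s."""
    return outstanding_requests * line_bytes / latency_ns


def requests_needed(target_gbs: float, line_bytes: int, latency_ns: float) -> int:
    """Outstanding requests required to sustain a target bandwidth."""
    return math.ceil(target_gbs * latency_ns / line_bytes)


# Assumed: 10 outstanding 64-byte requests at 80 ns latency.
print(bandwidth_from_mlp_gbs(10, 64, 80.0))
# Assumed target: 100 GB/s at the same latency and line size.
print(requests_needed(100.0, 64, 80.0))
```

The second result shows why a single core with a small number of miss buffers usually cannot saturate a memory system on its own.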

Multi-Channel Memory

Memory architecture using multiple channels (quad, octa, etc.) to improve parallel memory access.

N

NUMA (Non-Uniform Memory Access)

Memory architecture where access latency varies depending on which CPU socket owns the memory.

NUMA Node

Logical grouping of CPU cores and memory in NUMA systems.

O

P

Page Fault

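Event raised when a program accesses a page that is not currently mapped in physical RAM; the operating system must resolve it, in the costliest case by loading the page from disk.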

Peak Memory Bandwidth

Maximum theoretical data transfer rate supported by memory hardware.

Persistent Memory

Non-volatile memory technology that bridges the gap between RAM and storage.

Q

R

Random Memory Access

Access pattern where memory addresses are accessed unpredictably.

Read/Write Ratio

Proportion of read traffic versus write traffic in a workload, which can materially affect achieved memory throughput and bank/row behavior.

Roofline Model

Performance model that relates compute capability to memory bandwidth limits.
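
In its basic form, the model bounds attainable performance by the lower of peak compute and the product of peak bandwidth and arithmetic intensity. The machine figures below (1000 GFLOP/s, 100 GB/s) are hypothetical values chosen only to make the arithmetic easy to follow.

```python
def attainable_gflops(peak_gflops: float,
                      peak_bandwidth_gbs: float,
                      arithmetic_intensity: float) -> float:
    """Roofline bound: min(compute roof, bandwidth roof)."""
    return min(peak_gflops, peak_bandwidth_gbs * arithmetic_intensity)


PEAK_GFLOPS = 1000.0      # hypothetical compute peak
PEAK_BW_GBS = 100.0       # hypothetical memory bandwidth peak

# The ridge point is the intensity at which the two roofs meet.
ridge = PEAK_GFLOPS / PEAK_BW_GBS

for ai in (0.5, 10.0, 20.0):
    bound = attainable_gflops(PEAK_GFLOPS, PEAK_BW_GBS, ai)
    kind = "memory-bound" if ai < ridge else "compute-bound"
    print(f"AI={ai}: {bound} GFLOP/s ({kind})")
```

Kernels to the left of the ridge point benefit from more bandwidth or better data reuse; kernels to the right benefit from more compute.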

Row Buffer

DRAM structure that temporarily stores an active row of memory.

Row Conflict

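Situation where an access targets a different row than the one currently open in a DRAM bank, so the open row must be closed (precharged) before the new row can be activated.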

Row Hit

Memory access to an already open DRAM row.

Row Miss

Memory access requiring activation of a new DRAM row.
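
The three row-buffer outcomes can be illustrated with a toy model of a single bank that keeps the last accessed row open. The convention used here (a "miss" when no row is open, a "conflict" when a different row is open) is one common usage and is an assumption of this sketch; some sources use the terms differently.

```python
def classify_accesses(rows):
    """Classify each access to one DRAM bank under an open-row policy."""
    open_row = None          # no row is open initially
    outcomes = []
    for row in rows:
        if open_row is None:
            outcomes.append("miss")       # activate only
        elif row == open_row:
            outcomes.append("hit")        # served from the row buffer
        else:
            outcomes.append("conflict")   # precharge, then activate
        open_row = row
    return outcomes


print(classify_accesses([3, 3, 7, 7, 3]))
```

Access streams that stay within one row produce mostly hits, which is one reason sequential access tends to achieve higher DRAM throughput than random access.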

S

Sequential Memory Access

Access pattern where memory addresses are read or written in order.

Shared Memory

On-chip GPU memory shared by threads within a processing block for faster access.

Single-Channel Memory

Memory configuration using one channel between CPU and RAM.

Spatial Locality

Access pattern where neighboring memory addresses are accessed frequently.

Strided Memory Access

Access pattern in which successive memory operations are separated by a fixed stride, often reducing cache efficiency and effective bandwidth when poorly aligned.

Sustained Memory Bandwidth

Bandwidth achieved during long-running workloads under real conditions.
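
A very rough way to observe this is to time repeated large buffer copies. The sketch below is single-threaded, uses only the Python standard library, and counts both the bytes read and the bytes written; buffer size and repeat count are arbitrary assumptions. Dedicated benchmarks such as STREAM are the usual tool for serious measurement.

```python
import time


def measure_copy_bandwidth_gbs(size_bytes: int = 32 * 1024 * 1024,
                               repeats: int = 5) -> float:
    """Estimate copy bandwidth in GB/s over several back-to-back copies."""
    src = bytearray(size_bytes)
    dst = bytearray(size_bytes)
    start = time.perf_counter()
    for _ in range(repeats):
        dst[:] = src                      # bulk copy of the whole buffer
    elapsed = time.perf_counter() - start
    if elapsed <= 0:
        raise RuntimeError("timer resolution too coarse for this buffer size")
    # Each copy reads size_bytes and writes size_bytes.
    return 2 * size_bytes * repeats / elapsed / 1e9


print(f"{measure_copy_bandwidth_gbs():.1f} GB/s")
```

The buffer should be much larger than the last-level cache; otherwise the figure reflects cache bandwidth rather than memory bandwidth.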

T

Temporal Locality

Access pattern where recently accessed data is likely to be reused.

Theoretical Memory Bandwidth

Peak bandwidth calculated from transfer rate, bus width, and channel/interface count, assuming ideal conditions. (Micron Technology)

Through-Silicon Via (TSV)

Vertical electrical interconnect passing through silicon dies, used in stacked-memory technologies such as HBM. (Samsung Semiconductor Global)

TLB (Translation Lookaside Buffer)

Cache that stores recent virtual-to-physical address translations to reduce address-translation overhead.

TLB Miss

Event where a required virtual-to-physical address translation is not found in the TLB, forcing a page-table walk and increasing memory-access overhead.
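
How often TLB misses occur depends on how much memory the TLB can cover at once, often called TLB reach: the number of entries times the page size. The entry count below is a hypothetical figure, and the page sizes are the common 4 KiB and 2 MiB values.

```python
KIB = 1024
MIB = 1024 * KIB


def tlb_reach_bytes(entries: int, page_bytes: int) -> int:
    """Memory covered by the TLB without a page-table walk."""
    return entries * page_bytes


ENTRIES = 1536  # hypothetical TLB size

small = tlb_reach_bytes(ENTRIES, 4 * KIB)
huge = tlb_reach_bytes(ENTRIES, 2 * MIB)
print(f"{small // MIB} MiB with 4 KiB pages, {huge // MIB} MiB with 2 MiB pages")
```

Working sets larger than the reach incur frequent misses, which is why large pages are often used for bandwidth-intensive workloads.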

U

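Unified Memory

Memory model in which the CPU and GPU share a single address space, so both can access the same data without explicit copies by the programmer.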

V

Virtual Memory

Memory management technique that gives each process its own address space mapped onto physical memory, and that can use disk storage to extend apparent RAM capacity.

W

Warp Memory Access

Pattern in which groups of GPU threads access memory simultaneously.

X

Y

Z

Memory Bandwidth Glossary