CSI snapshots are storage-level, point-in-time copies exposed through Kubernetes APIs. They work only for PersistentVolumes provisioned by CSI drivers; in-tree volume plugins cannot use the CSI snapshot APIs.
But reliable recovery also needs workload consistency and repeatable restore testing. According to the CNCF Annual Survey, Kubernetes production use reached 80% in 2024, so snapshot mistakes now affect many real customer workloads.
Therefore, before you automate anything, inventory your StatefulSets and map every PVC that must be protected together within each application boundary. Also record the owner, criticality tier and data dependency order for each StatefulSet, because restore steps usually follow those relationships.
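As a starting point for that inventory, the sketch below derives the PVC names a StatefulSet owns from its manifest. It relies on the Kubernetes naming rule that each claim is called `<volumeClaimTemplate name>-<StatefulSet name>-<ordinal>` and assumes ordinals start at 0. The `owner` and `tier` label keys and the `postgres` example are hypothetical conventions, not something Kubernetes defines.

```python
def inventory_statefulset(sts: dict) -> dict:
    """Summarize one StatefulSet manifest: owner, tier and the PVCs it owns."""
    meta, spec = sts["metadata"], sts["spec"]
    labels = meta.get("labels", {})
    replicas = spec.get("replicas", 1)  # Kubernetes defaults replicas to 1
    templates = spec.get("volumeClaimTemplates", [])
    # PVC name = <template name>-<StatefulSet name>-<ordinal>
    pvcs = [
        f'{tpl["metadata"]["name"]}-{meta["name"]}-{ordinal}'
        for ordinal in range(replicas)
        for tpl in templates
    ]
    return {
        "statefulset": f'{meta.get("namespace", "default")}/{meta["name"]}',
        "owner": labels.get("owner", "unknown"),  # hypothetical label key
        "tier": labels.get("tier", "unknown"),    # hypothetical label key
        "pvcs": pvcs,
    }


# Hypothetical StatefulSet with separate data and WAL volumes.
postgres = {
    "metadata": {"name": "postgres", "namespace": "myapp",
                 "labels": {"owner": "data-team", "tier": "tier-1"}},
    "spec": {"replicas": 2,
             "volumeClaimTemplates": [{"metadata": {"name": "pgdata"}},
                                      {"metadata": {"name": "pgwal"}}]},
}
print(inventory_statefulset(postgres))
```

The resulting list is the set of claims that must share a recovery point for that application.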
What Is a CSI Volume Snapshot, and How Is It Different from a Backup?
A CSI snapshot gives you a consistent API surface for asking storage systems to capture volume content at a specific moment. In Kubernetes, a VolumeSnapshot is a namespaced custom resource that represents a request for a snapshot of a CSI-backed PersistentVolume. The actual backend snapshot reference lives in a cluster-scoped VolumeSnapshotContent object, and the snapshot API standardizes the request shape, not the storage implementation.
However, snapshots are not full backups because they usually lack portability guarantees, long-term retention workflows and immutability governance outside the storage backend.
Additionally, most snapshot implementations only cover the volume blocks, while application objects, secrets and cluster state require separate protection and restore plans. Therefore, you should treat snapshots as one building block in a recovery strategy that also includes metadata backup, access control and routine restore drills.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: pgdata-snap-2025-12-15
  namespace: myapp
spec:
  volumeSnapshotClassName: csi-snapshots-retain
  source:
    persistentVolumeClaimName: pgdata
Uptime Institute reports that 54% of operators said their most recent significant outage cost more than $100,000, which raises the stakes for recoverability decisions.
Action step: Write down per-application RPO and RTO targets, then tie snapshot frequency and restore procedures directly to those targets.
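One way to tie frequency to the target is simple arithmetic, sketched below. The reasoning is an assumption about the worst case: a failure just before the next snapshot becomes usable puts one full interval plus the snapshot duration at risk. The `RecoveryTarget` type and the example numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RecoveryTarget:
    app: str
    rpo_minutes: int  # maximum tolerable data loss
    rto_minutes: int  # maximum tolerable time to restore service


def max_snapshot_interval(target: RecoveryTarget, snapshot_minutes: int = 0) -> int:
    """Longest gap between snapshot starts that still meets the RPO.

    Worst case, the data at risk is one interval plus the time a snapshot
    takes to become usable, so the interval must leave room for that time.
    """
    interval = target.rpo_minutes - snapshot_minutes
    if interval <= 0:
        raise ValueError(f"{target.app}: RPO is not longer than snapshot duration")
    return interval


orders = RecoveryTarget(app="orders-db", rpo_minutes=60, rto_minutes=30)
print(max_snapshot_interval(orders, snapshot_minutes=5))
```

If the function raises, snapshots alone cannot meet the RPO for that application and another mechanism, such as log shipping, is needed.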
How Do CSI Snapshots Work in Kubernetes?
CSI snapshotting works well when you understand the Kubernetes objects involved and the controller sidecars that reconcile them.
- VolumeSnapshot is the namespaced request
- VolumeSnapshotContent is the cluster-scoped backing record
- VolumeSnapshotClass defines the administrator policy
Kubernetes documents that these snapshot API objects are CRDs rather than core APIs, which means your cluster must include the snapshot CRDs and controllers.
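Additionally, snapshot support is available only for CSI drivers. The snapshot controller watches VolumeSnapshot and VolumeSnapshotContent objects, while the csi-snapshotter sidecar, deployed alongside the CSI driver, issues the CreateSnapshot and DeleteSnapshot calls to that driver.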
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: csi-snapshots-retain
  annotations:
    snapshot.storage.kubernetes.io/is-default-class: "true"
driver:  # set to your CSI driver's name
deletionPolicy: Retain
parameters:
  # Driver-specific parameters go here
  # e.g. snapshotType: "crash-consistent"
The CNCF Annual Survey reports Helm is preferred by 75% of respondents, which matters because snapshot CRDs and controllers are often installed through packaged deployments.
Pro Tip: List your CSI drivers per cluster and confirm snapshot support, then document the matching VolumeSnapshotClass for each driver.
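The driver-to-class mapping can be checked mechanically. The sketch below takes the driver names found in a cluster and the VolumeSnapshotClass manifests, and reports drivers that have no class pointing at them through the class's top-level `driver` field. The driver names in the example are illustrative, and a matching class does not by itself prove that the driver implements snapshots.

```python
def classes_by_driver(driver_names: list, snapshot_classes: list) -> dict:
    """Map each CSI driver name to the VolumeSnapshotClass names that target it."""
    mapping = {name: [] for name in driver_names}
    for cls in snapshot_classes:
        driver = cls.get("driver")
        if driver in mapping:
            mapping[driver].append(cls["metadata"]["name"])
    return mapping


def drivers_without_class(driver_names: list, snapshot_classes: list) -> list:
    """Drivers that no VolumeSnapshotClass references, sorted by name."""
    mapping = classes_by_driver(driver_names, snapshot_classes)
    return sorted(name for name, classes in mapping.items() if not classes)


# Illustrative inputs: in practice these come from the CSIDriver and
# VolumeSnapshotClass objects in the cluster.
drivers = ["ebs.csi.aws.com", "nfs.csi.k8s.io"]
classes = [{"metadata": {"name": "csi-snapshots-retain"},
            "driver": "ebs.csi.aws.com"}]
print(drivers_without_class(drivers, classes))
```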
What Are Snapshot Lifecycle States like readyToUse and restoreSize?
Snapshot creation is asynchronous, so you should gate every restore and promotion workflow on observable status fields. A snapshot request can exist before the backend snapshot is actually created, which is why you should monitor status before relying on the snapshot for recovery.
The readyToUse field indicates whether the snapshot is ready to be used for restoring a volume, while restoreSize reports the minimum volume size required to restore from the snapshot. Additionally, you should capture any reported errors (status.error) in your runbook, because a failed snapshot can look like a success until you check the bound content and events.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: pgdata-snap-2025-12-15
  namespace: myapp
spec:
  volumeSnapshotClassName: csi-snapshots-retain
  source:
    persistentVolumeClaimName: pgdata
status:
  readyToUse: true
  creationTime: "2025-12-15T10:12:35Z"
  restoreSize: 20Gi
  boundVolumeSnapshotContentName: snapcontent-1234abcd
The CNCF Annual Survey reports 71% of organizations check in code multiple times per day, which increases the chance snapshots occur during changes and migrations.
Action Step: Add a runbook rule that no restore proceeds unless readyToUse is true and the bound content exists.
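That runbook rule can be expressed as a small gate. The sketch below inspects a VolumeSnapshot object that has already been fetched and parsed into a dictionary, and returns the reasons a restore should not proceed. It only checks that a bound content name is recorded; confirming that the VolumeSnapshotContent object itself exists would need a separate lookup. The function name is hypothetical.

```python
def restore_blockers(snapshot: dict) -> list:
    """Return reasons a VolumeSnapshot should not be restored yet (empty = OK)."""
    status = snapshot.get("status") or {}
    blockers = []
    if status.get("readyToUse") is not True:
        blockers.append("readyToUse is not true")
    if not status.get("boundVolumeSnapshotContentName"):
        blockers.append("no bound VolumeSnapshotContent")
    error = status.get("error")
    if error:
        blockers.append("snapshot error: " + str(error.get("message", "unknown")))
    return blockers


ready = {"status": {"readyToUse": True,
                    "boundVolumeSnapshotContentName": "snapcontent-1234abcd"}}
pending = {"status": {"readyToUse": False}}
print(restore_blockers(ready))
print(restore_blockers(pending))
```

Returning reasons instead of a boolean makes the gate's decision easy to log during an incident.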
How Do CSI Snapshots Provide Consistency for StatefulSets and Databases?
CSI snapshots usually reflect what the storage system captured, so you should plan for crash consistency unless you actively coordinate database I/O.
Crash-consistent vs Application-consistent
Many storage systems aim for crash-consistent snapshots, which means data resembles an abrupt power loss rather than a clean application checkpoint.
Kubernetes does not automatically pause your database, which means you should quiesce, flush or checkpoint using database-native commands before taking snapshots.
Additionally, you should document post-restore validation queries because crash-consistent recovery can succeed technically while leaving application-level corruption undetected.
Use a short-lived Job to run a safe checkpoint or flush command before snapshot creation, since each database family requires different steps. In practice, you should trigger the VolumeSnapshot only after this Job has completed successfully (for example, via an operator or CI pipeline), otherwise the snapshot may be taken before the checkpoint reaches disk.
apiVersion: batch/v1
kind: Job
metadata:
  name: postgres-checkpoint
  namespace: myapp
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: psql
          image: postgres:16
          env:
            - name: PGHOST
              value: postgres.myapp.svc.cluster.local
            - name: PGUSER
              valueFrom:
                secretKeyRef:
                  name: pg-secret
                  key: username
            - name: PGPASSWORD
              valueFrom:
                secretKeyRef:
                  name: pg-secret
                  key: password
          command: ["bash", "-lc"]
          args:
            - |
              psql -c "CHECKPOINT;"
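The ordering between the checkpoint Job and the snapshot can be enforced in whatever tool creates the VolumeSnapshot. The sketch below is one hedged way to do it: it reads the Job's `status.conditions`, looks for a `Complete` condition with status `"True"`, and only then builds the VolumeSnapshot manifest. The function names are hypothetical, and submitting the manifest to the cluster is left out. A checkpoint does not pause writes, so the snapshot is still crash-consistent; it simply has less to replay.

```python
def job_succeeded(job: dict) -> bool:
    """True when the Job reports a Complete condition with status "True"."""
    conditions = (job.get("status") or {}).get("conditions") or []
    return any(
        c.get("type") == "Complete" and c.get("status") == "True"
        for c in conditions
    )


def snapshot_after_checkpoint(job: dict, name: str, pvc: str,
                              snapshot_class: str) -> dict:
    """Build a VolumeSnapshot manifest only if the checkpoint Job completed."""
    if not job_succeeded(job):
        raise RuntimeError("checkpoint Job has not completed; refusing to snapshot")
    return {
        "apiVersion": "snapshot.storage.k8s.io/v1",
        "kind": "VolumeSnapshot",
        "metadata": {"name": name, "namespace": job["metadata"]["namespace"]},
        "spec": {
            "volumeSnapshotClassName": snapshot_class,
            "source": {"persistentVolumeClaimName": pvc},
        },
    }


done_job = {
    "metadata": {"name": "postgres-checkpoint", "namespace": "myapp"},
    "status": {"conditions": [{"type": "Complete", "status": "True"}]},
}
manifest = snapshot_after_checkpoint(
    done_job, "pgdata-snap-2025-12-15", "pgdata", "csi-snapshots-retain"
)
print(manifest["metadata"])
```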
Veeam reports that roughly seven out of ten organizations experienced a cyber-attack and that, among those attacked, only 10% recovered more than 90% of their data.
Pro Tip: For each database, document the quiesce command and at least one validation query that proves business tables and indexes are usable.
How Do You Snapshot Multi-PVC Apps Safely with VolumeGroupSnapshot?
Multi-PVC applications need coordinated recovery points, and group snapshots can reduce the risk of cross-volume inconsistency for stateful designs.
Independent per-PVC snapshots are taken at slightly different moments, so each volume can reflect a different point in the write sequence, which is risky when you split data, WAL and logs across multiple volumes.
Additionally, an operator can restore “the right snapshot” for one PVC and still boot a broken system because related PVCs may be from different moments.
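A quick way to spot that situation before a restore is to compare the `status.creationTime` of the snapshots you intend to restore together. The sketch below computes the spread in seconds; the function name and sample timestamps are hypothetical. A small spread does not prove cross-volume consistency, but a large one is a clear warning that the set was not captured as one recovery point.

```python
from datetime import datetime


def creation_skew_seconds(snapshots: list) -> float:
    """Seconds between the earliest and latest status.creationTime in a set."""
    if not snapshots:
        raise ValueError("need at least one snapshot")
    times = [
        # Replace the trailing "Z" so older Python versions can parse the value.
        datetime.fromisoformat(s["status"]["creationTime"].replace("Z", "+00:00"))
        for s in snapshots
    ]
    return (max(times) - min(times)).total_seconds()


data_snap = {"status": {"creationTime": "2025-12-15T10:12:35Z"}}
wal_snap = {"status": {"creationTime": "2025-12-15T10:12:50Z"}}
print(creation_skew_seconds([data_snap, wal_snap]))
```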
Introduce Group Snapshots
Kubernetes v1.32 moved volume group snapshots to beta, and the design uses a label selector to group multiple PVCs for snapshotting.
However, you should confirm driver support because group snapshots are only supported for CSI volume drivers that implement the group snapshot extension APIs; having the CRDs installed is not sufficient on its own.
Label your PVCs with a shared selector that represents an application-consistency group.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pgdata
  namespace: myapp
  labels:
    snapshot-group: myapp-db
spec:
  # …
---
apiVersion: groupsnapshot.storage.k8s.io/v1beta1
kind: VolumeGroupSnapshotClass
metadata:
  name: csi-groupsnap-retain
driver:  # set to your CSI driver's name
deletionPolicy: Retain
---
apiVersion: groupsnapshot.storage.k8s.io/v1beta1
kind: VolumeGroupSnapshot
metadata:
  name: myapp-db-groupsnap-2025-12-15
  namespace: myapp
spec:
  volumeGroupSnapshotClassName: csi-groupsnap-retain
  source:
    selector:
      matchLabels:
        snapshot-group: myapp-db
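Because membership is decided purely by labels, a PVC that was never labeled is silently left out of the group. The sketch below previews which claims a `matchLabels` selector would pick up, so a missing label shows up before the snapshot is taken. It handles only `matchLabels` (not `matchExpressions`) and assumes the selector applies to PVCs in the VolumeGroupSnapshot's own namespace; the `pgwal` claim is a hypothetical second volume.

```python
def labels_match(labels: dict, match_labels: dict) -> bool:
    """True when every key/value in match_labels is present in labels."""
    return all(labels.get(key) == value for key, value in match_labels.items())


def select_pvcs(pvcs: list, namespace: str, match_labels: dict) -> list:
    """Names of PVCs in the namespace that the selector would include."""
    return sorted(
        pvc["metadata"]["name"]
        for pvc in pvcs
        if pvc["metadata"].get("namespace") == namespace
        and labels_match(pvc["metadata"].get("labels") or {}, match_labels)
    )


pvcs = [
    {"metadata": {"name": "pgdata", "namespace": "myapp",
                  "labels": {"snapshot-group": "myapp-db"}}},
    # Hypothetical WAL volume whose label was forgotten.
    {"metadata": {"name": "pgwal", "namespace": "myapp"}},
    {"metadata": {"name": "pgdata", "namespace": "staging",
                  "labels": {"snapshot-group": "myapp-db"}}},
]
print(select_pvcs(pvcs, "myapp", {"snapshot-group": "myapp-db"}))
```

Comparing this list with the application's full PVC inventory shows which claims still need the shared label.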
The CNCF Annual Survey reports that 60% of organizations use CI/CD for most or all applications, which increases the value of coordinated snapshot automation.
Pro Tip: Identify every StatefulSet with more than one PVC, then decide whether it needs group snapshots based on recovery dependencies.
How Do You Restore from Snapshots and Prove Recovery Works?
Restoring from snapshots is only trustworthy when you can repeat it, validate it and measure the time required under realistic constraints.
Restore mechanics
Kubernetes restore typically means creating a new PVC that references a VolumeSnapshot through spec.dataSource, then mounting it into verification workloads.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pgdata-restore
  namespace: myapp
spec:
  storageClassName:  # set to a StorageClass served by the snapshot's CSI driver
  dataSource:
    name: pgdata-snap-2025-12-15
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
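The requested size of the restore PVC matters: the snapshot's `status.restoreSize` is the minimum volume size needed to restore from it, and a smaller request is typically rejected by the provisioner. The sketch below compares the two values before the PVC is submitted. The quantity parser is a deliberate simplification that handles only plain numbers and the common binary (Ki, Mi, Gi, Ti) and decimal (k, M, G, T) suffixes, not the full Kubernetes quantity grammar.

```python
from decimal import Decimal

# Simplified subset of Kubernetes quantity suffixes.
_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def to_bytes(quantity: str) -> int:
    """Convert a quantity such as "20Gi" to bytes (simplified parser)."""
    # Check two-character suffixes first so "Gi" is not misread as "G".
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if quantity.endswith(suffix):
            return int(Decimal(quantity[: -len(suffix)]) * _SUFFIXES[suffix])
    return int(Decimal(quantity))


def request_covers_snapshot(pvc: dict, snapshot: dict) -> bool:
    """True when the PVC's storage request is at least the snapshot's restoreSize."""
    requested = pvc["spec"]["resources"]["requests"]["storage"]
    return to_bytes(requested) >= to_bytes(snapshot["status"]["restoreSize"])


restore_pvc = {"spec": {"resources": {"requests": {"storage": "20Gi"}}}}
snap = {"status": {"restoreSize": "20Gi"}}
print(request_covers_snapshot(restore_pvc, snap))
```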
You should verify the restored data before promotion, since validation isolates storage restore issues from application rollout mistakes during incidents.
apiVersion: v1
kind: Pod
metadata:
  name: restore-verify
  namespace: myapp
spec:
  restartPolicy: Never
  containers:
    - name: verify
      image: busybox:1.36
      command: ["sh", "-c", "ls -lah /data && sleep 3600"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: pgdata-restore
Uptime Institute reports four in five respondents said their most recent serious outage could have been prevented with better management, processes and configuration.
Action Tip: Schedule restore drills, then track time-to-restore and validation outcomes per tier because those metrics drive practical improvements.
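Tracking those metrics does not need much tooling. The sketch below summarizes drill records per tier and compares them with an RTO for that tier. The `Drill` record, the tier names and the RTO values are all hypothetical; real numbers would come from your own drill logs.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Drill:
    tier: str
    restore_minutes: float  # measured time from start of restore to validated data
    validated: bool         # did the post-restore validation pass?


def summarize_drills(drills: list, rto_by_tier: dict) -> dict:
    """Per-tier restore timing and validation results, compared with the RTO."""
    summary = {}
    for tier, rto in rto_by_tier.items():
        runs = [d for d in drills if d.tier == tier]
        if not runs:
            summary[tier] = None  # no drill recorded for this tier
            continue
        summary[tier] = {
            "drills": len(runs),
            "median_restore_minutes": median(d.restore_minutes for d in runs),
            "worst_restore_minutes": max(d.restore_minutes for d in runs),
            "validation_failures": sum(not d.validated for d in runs),
            "within_rto": all(d.restore_minutes <= rto for d in runs),
        }
    return summary


history = [
    Drill("tier-1", 12, True),
    Drill("tier-1", 18, True),
    Drill("tier-1", 40, False),
]
print(summarize_drills(history, {"tier-1": 30, "tier-2": 240}))
```

A tier that reports `None` has never been drilled, which is itself a finding worth recording.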
What Policies Keep Snapshots Safe, Cheap and Compliant?
Snapshot safety depends on policy, because the same API can create recoverable history or delete the only usable restore point.
Keep deletion and retention intentional
deletionPolicy controls whether backend snapshots are preserved when Kubernetes snapshot objects are deleted, which directly affects retention and incident survivability.
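With a Delete policy, removing the VolumeSnapshot object also removes the backend snapshot, so any pruning job should be written so that it cannot remove the last usable restore point. The sketch below is one hedged approach: it considers only snapshots that report readyToUse, always protects the newest `keep_last` of them, and proposes the rest for deletion only when they are older than `max_age`. The function name, parameters and sample data are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def _created(snapshot: dict) -> datetime:
    return datetime.fromisoformat(
        snapshot["status"]["creationTime"].replace("Z", "+00:00")
    )


def prunable(snapshots: list, now: datetime, max_age: timedelta,
             keep_last: int) -> list:
    """Names of ready snapshots that are safe to prune under the policy."""
    ready = sorted(
        (s for s in snapshots if (s.get("status") or {}).get("readyToUse") is True),
        key=_created,
        reverse=True,  # newest first
    )
    # The newest keep_last ready snapshots are never candidates.
    candidates = ready[keep_last:]
    return [s["metadata"]["name"] for s in candidates if now - _created(s) > max_age]


def _snap(name, created, ready=True):
    return {"metadata": {"name": name},
            "status": {"readyToUse": ready, "creationTime": created}}


snaps = [
    _snap("snap-a", "2025-12-01T00:00:00Z"),
    _snap("snap-b", "2025-12-10T00:00:00Z"),
    _snap("snap-c", "2025-12-14T00:00:00Z"),
    {"metadata": {"name": "snap-d"}, "status": {"readyToUse": False}},
]
now = datetime(2025, 12, 15, tzinfo=timezone.utc)
print(prunable(snaps, now, timedelta(days=7), keep_last=1))
```

Snapshots that never became ready are left alone here so that a person can investigate them rather than have them disappear unnoticed.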
Additionally, you should scope permissions tightly because snapshot create, delete and restore operations can expose sensitive data or destroy recovery options.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: snapshot-operator
  namespace: myapp
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: snapshot-operator
  namespace: myapp
rules:
  - apiGroups: ["snapshot.storage.k8s.io"]
    resources: ["volumesnapshots"]
    verbs: ["create", "get", "list", "watch", "delete"]
  - apiGroups: ["groupsnapshot.storage.k8s.io"]
    resources: ["volumegroupsnapshots"]
    verbs: ["create", "get", "list", "watch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: snapshot-operator
  namespace: myapp
subjects:
  - kind: ServiceAccount
    name: snapshot-operator
    namespace: myapp  # required for ServiceAccount subjects
roleRef:
  kind: Role
  name: snapshot-operator
  apiGroup: rbac.authorization.k8s.io
The Uptime Institute finding that 54% of operators' most recent significant outages cost more than $100,000 supports least-privilege controls and audit trails for snapshot operations.
Action Step: Lock down snapshot create, delete and restore permissions per namespace, then audit usage events as part of your regular resilience review.
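As part of that review, Roles can be scanned for verbs that allow snapshot objects to be removed. The sketch below walks a Role's rules and lists the resource and verb pairs in the snapshot API groups that grant `delete`, `deletecollection` or a wildcard. The function name is hypothetical, and the check is a starting point that does not cover ClusterRoles or aggregated roles.

```python
SNAPSHOT_GROUPS = {"snapshot.storage.k8s.io", "groupsnapshot.storage.k8s.io"}
RISKY_VERBS = {"delete", "deletecollection", "*"}


def destructive_snapshot_grants(role: dict) -> list:
    """(resource, verb) pairs in a Role that allow removing snapshot objects."""
    grants = []
    for rule in role.get("rules") or []:
        groups = set(rule.get("apiGroups") or [])
        if not (groups & SNAPSHOT_GROUPS or "*" in groups):
            continue  # rule does not touch the snapshot API groups
        for resource in rule.get("resources") or []:
            for verb in rule.get("verbs") or []:
                if verb in RISKY_VERBS:
                    grants.append((resource, verb))
    return grants


operator_role = {
    "rules": [
        {"apiGroups": ["snapshot.storage.k8s.io"],
         "resources": ["volumesnapshots"],
         "verbs": ["create", "get", "list", "watch", "delete"]},
        {"apiGroups": ["groupsnapshot.storage.k8s.io"],
         "resources": ["volumegroupsnapshots"],
         "verbs": ["create", "get", "list", "watch", "delete"]},
    ]
}
print(destructive_snapshot_grants(operator_role))
```

Each pair in the output is a permission that should have a named owner and an audit trail.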
Key Takeaways
CSI snapshots provide a standardized API, yet recovery reliability comes from consistency planning, permissions hygiene and frequent restore drills. If an application spans multiple volumes, you should prefer group snapshots when supported by your CSI driver and cluster version.
Frequently Asked Questions
A VolumeSnapshot is a namespaced request for a point-in-time volume snapshot, and it requires a Bound CSI-backed PVC on a StorageClass whose CSI driver supports snapshots.
VolumeSnapshot is the user request, while VolumeSnapshotContent is the cluster-scoped record that binds to the underlying snapshot.
To use CSI snapshots, you need the snapshot CRDs plus the snapshot controller, and the CSI driver deployment must include the csi-snapshotter sidecar.
CSI snapshots are often crash-consistent. Therefore, you should run DB-native flush or checkpoint steps before the snapshot and validate with queries after restore.
To restore, you create a new PVC from the snapshot using spec.dataSource, then mount it into a verification Pod or Job.
You should use VolumeGroupSnapshot when multiple PVCs need one recovery point; Kubernetes groups the claims using a label selector.