On-Demand vs Reserved vs Spot GPU Pricing in India: Which Costs Less?

Jason Karlin

Last Updated: Jun 26, 2026

A GPU priced at ₹125 per hour looks easy to budget. We multiply the rate by the expected runtime and assume we know what the workload will cost. Then the bill arrives with idle commitments, checkpoint storage, interrupted jobs, network charges, and a few notebooks that someone forgot to stop before the weekend.

The real difference between on-demand, reserved, and Spot GPU pricing in India is not the advertised hourly rate. It is what each model costs once idle time, interruptions, and supporting infrastructure are counted.

Quick Answer:

On-demand GPUs usually cost less for uncertain or short-term workloads. Reserved or committed GPUs become economical when usage stays consistently high. Spot GPUs can offer the lowest compute rate, but interruptions, reruns, and engineering overhead can reduce the saving.

The best pricing model is the one that produces the lowest cost per completed training run, inference request, rendered frame, or simulation. The lowest hourly price does not always win.

What Is the Difference Between On-Demand, Reserved, and Spot GPUs?

Each pricing model makes us pay for a different kind of certainty.

Pricing model	What we pay for	Main benefit	Main risk
On-demand GPU	Actual running time without a long commitment	Maximum flexibility	Highest standard hourly rate
Reserved or committed GPU	A fixed level of usage or spending for a defined term	Lower and more predictable pricing	Paying for unused capacity
Spot GPU	Spare capacity that the provider may reclaim	Lowest potential compute rate	Interruptions and uncertain availability

On-demand pricing lets us start and stop capacity without committing to months or years of use.
Reserved or committed pricing gives the provider predictable revenue. In return, we receive a lower rate for making a longer commitment.
Spot pricing gives us temporary access to unused capacity. The provider can reclaim that GPU when the capacity is needed elsewhere.

Spot is not simply cheaper on-demand compute. It requires a workload that can pause, restart, or move without creating serious disruption.

In this article, reserved GPU pricing refers broadly to term-based discounts such as reservations, Savings Plans, committed-use discounts, and monthly commitments. These products are not identical. A lower rate also does not always include guaranteed GPU capacity.

What Maximum Discounts Do Cloud Providers Advertise?

Cloud providers advertise substantial discounts, but the maximum percentages are not direct quotes for a particular GPU or Indian region.

Provider model	Provider-wide advertised maximum	Main condition
AWS Savings Plans	Up to 72 percent	One-year or three-year spending commitment
AWS Spot Instances	Up to 90 percent	Interruptible workloads using spare capacity
Azure Reserved VM Instances	Up to 72 percent	One-year or three-year commitment
Azure Spot Virtual Machines	Up to 90 percent	Price and availability vary by region and VM type
Google Cloud GPU commitments	Up to 55 percent	One-year or three-year commitment
Google Cloud Spot VMs	Up to 91 percent	Interruptible capacity with variable availability
AceCloud Spot Instances	Up to 80 percent	Flexible AI, batch, rendering, and compute workloads

The phrase ‘up to’ matters.

A 90 percent Spot discount does not mean every GPU in every Indian location will always be available at one-tenth of its on-demand price. A 72 percent commitment discount does not mean every GPU model qualifies for the maximum reduction.

A useful comparison keeps these variables consistent:

GPU model and memory
Cloud region
CPU and RAM allocation
Storage
Network capacity
Runtime
Operating system
Software stack
Support level
Tax treatment

Without that consistency, we may be comparing three different products that happen to use the same GPU name.

How Do the Three GPU Cost Models Compare?

The pricing models become easier to compare when we focus on effective cost rather than advertised rates.

Model	Effective cost formula
On-demand	Hourly rate multiplied by actual runtime, plus supporting infrastructure
Reserved or committed	Monthly commitment divided by actual used hours, plus supporting infrastructure
Spot	Spot rate multiplied by total runtime including reruns, plus checkpointing, restart, storage, and engineering costs

These formulas measure the cost of consumed or useful work rather than the price shown at the top of a product page.
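The three formulas can be sketched in Python as follows. This is a minimal illustration, not a billing calculator: the function names are hypothetical, and dividing each total by useful hours is a normalization added here so the three models share one unit. The rates are the ones used in this article's examples, and the 480 Spot hours assume 20 percent of extra runtime from reruns.

```python
def on_demand_total(hourly_rate, runtime_hours, supporting=0.0):
    """Hourly rate times actual runtime, plus supporting infrastructure."""
    return hourly_rate * runtime_hours + supporting


def reserved_total(monthly_commitment, supporting=0.0):
    """The commitment is owed in full, however many hours are used."""
    return monthly_commitment + supporting


def spot_total(spot_rate, total_runtime_hours, overhead=0.0, supporting=0.0):
    """Spot rate times total runtime (reruns included), plus checkpointing,
    restart, storage, and engineering costs lumped into `overhead`."""
    return spot_rate * total_runtime_hours + overhead + supporting


def cost_per_useful_hour(total_cost, useful_hours):
    """Normalize a total bill by the hours of work that actually counted."""
    if useful_hours <= 0:
        raise ValueError("useful_hours must be positive")
    return total_cost / useful_hours


# Illustrative month: 400 useful hours; Spot needs 480 hours including reruns.
useful = 400
print(cost_per_useful_hour(on_demand_total(125, 400), useful))   # on-demand
print(cost_per_useful_hour(reserved_total(81_000), useful))      # reserved
print(cost_per_useful_hour(spot_total(62.5, 480), useful))       # Spot
```

With supporting infrastructure and overhead left at zero, the sketch prints 125.0, 202.5, and 75.0. Passing real storage, network, and engineering figures into `supporting` and `overhead` moves all three numbers upward.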

Where Reserved GPU Pricing Reaches Break-Even

Reserved GPU pricing becomes cheaper only when we use enough of the capacity we committed to buy.

We list a one-GPU NVIDIA A100 80GB configuration in our Noida data center at ₹125 per hour or ₹90,000 per month, excluding taxes.

The same page lists starting monthly prices of:

₹85,500 with a six-month term
₹81,000 with a twelve-month term

The six-month price is 5 percent lower than the listed monthly rate. The twelve-month price is 10 percent lower.

The more useful calculation is the number of on-demand hours required to reach the same monthly cost.

Pricing option	Monthly cost	Break-even hours at ₹125 per hour	Equivalent daily usage over 30 days
Six-month term	₹85,500	684 hours	22.8 hours per day
Twelve-month term	₹81,000	648 hours	21.6 hours per day

The example uses a simplified 30-day month and assumes that the hourly and monthly configurations include the same resources.
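The break-even figures in the table can be reproduced with a few lines of Python. This is a sketch under the same simplifications as the example (a 30-day month and identical resources in the hourly and monthly configurations); the function name `break_even_hours` is illustrative.

```python
ON_DEMAND_RATE = 125   # INR per hour, from the listed A100 price
DAYS_IN_MONTH = 30     # simplified month used in the example


def break_even_hours(monthly_cost, hourly_rate):
    """On-demand hours that cost exactly as much as the monthly commitment."""
    return monthly_cost / hourly_rate


for label, monthly_cost in [("Six-month term", 85_500),
                            ("Twelve-month term", 81_000)]:
    hours = break_even_hours(monthly_cost, ON_DEMAND_RATE)
    print(f"{label}: {hours:.0f} hours per month, "
          f"{hours / DAYS_IN_MONTH:.1f} hours per day")
```

Below the computed number of hours, on-demand is the cheaper option at these rates. Above it, the commitment is.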

If we use the GPU for only 400 hours, the listed on-demand cost is ₹50,000 (400 × ₹125). A twelve-month commitment still costs ₹81,000 for the month.

The committed rate is lower, but the bill is higher because unused capacity absorbs the discount.

Reserved or committed GPU pricing works best when predictable monthly usage remains above the break-even level.

Where the Cost Difference Appears

The final cost difference usually appears in four places: idle commitments, interruption overhead, capacity availability, and supporting infrastructure.

1. Idle Commitment Cost

A reserved GPU produces savings only while we use enough of the capacity.

Steady training pipelines, persistent inference endpoints, and always-on virtual workstations may use a commitment efficiently.

Early experiments, irregular fine-tuning jobs, seasonal projects, and uncertain product launches may not.

A reserved GPU sitting idle is not discounted compute. It is fully paid capacity producing no useful output.

The practical comparison is therefore the effective hourly rate: reserved monthly cost divided by the hours actually used.
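To see how quickly idle time erodes a commitment, the effective hourly rate can be computed for several usage levels. The sketch below uses the ₹81,000 twelve-month price from the break-even table; the usage levels are illustrative assumptions, and `effective_hourly_rate` is a hypothetical helper name.

```python
def effective_hourly_rate(monthly_cost, used_hours):
    """Monthly commitment spread over the hours that were actually used."""
    if used_hours <= 0:
        raise ValueError("used_hours must be positive")
    return monthly_cost / used_hours


MONTHLY_COMMITMENT = 81_000  # INR, twelve-month term

for used_hours in (200, 400, 648, 720):
    rate = effective_hourly_rate(MONTHLY_COMMITMENT, used_hours)
    print(f"{used_hours} hours used: {rate:.2f} per used hour")
```

At 400 used hours the committed GPU effectively costs ₹202.50 per hour, well above the ₹125 on-demand rate. Only at 648 hours does it match on-demand, and at a full 720 hours it drops to ₹112.50.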

2. Spot Interruption Cost

Spot pricing lowers the compute rate, but interruptions create work that never appears in the headline price.

AWS generally provides a two-minute Spot interruption notice. Azure Spot VMs may be evicted with 30 seconds of notice and do not receive an availability guarantee.

Checkpointing can make Spot practical, but it also creates costs.

We may pay for:

Time spent saving model state
Persistent checkpoint storage
Reloading models and optimizer state
Repeated preprocessing
Lost work since the previous checkpoint
Engineering for automatic restart
Monitoring and orchestration

Suppose an on-demand A100 costs ₹125 per hour.

At a 50 percent Spot discount, the listed Spot rate becomes ₹62.50 per hour. If interruptions create 20 percent additional compute work, one useful hour costs ₹62.50 × 1.2 = ₹75.

The realized compute saving is 40 percent rather than 50 percent.

At a 30 percent Spot discount, the listed Spot rate is ₹87.50 per hour. With the same 20 percent recompute overhead, one useful hour costs ₹87.50 × 1.2 = ₹105.

The realized compute saving falls to 16 percent rather than 30 percent.

The numerical example measures only additional compute. Actual savings can be lower after checkpoint storage, data loading, monitoring, and engineering time are included.

A more complete formula is: effective Spot cost per useful hour equals the Spot rate multiplied by (1 + recompute overhead), plus the per-hour share of checkpoint storage, orchestration, and engineering cost.
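The same formula can be written as a short Python sketch that reproduces the two worked cases. The function and parameter names are illustrative, and `extra_per_hour` is an assumed way of folding checkpoint storage, orchestration, and engineering cost into a single per-hour figure.

```python
def effective_spot_rate(on_demand_rate, spot_discount, recompute_overhead,
                        extra_per_hour=0.0):
    """Cost of one useful Spot hour after reruns and per-hour extras."""
    spot_rate = on_demand_rate * (1 - spot_discount)
    return spot_rate * (1 + recompute_overhead) + extra_per_hour


def realized_saving(on_demand_rate, effective_rate):
    """Fraction actually saved against the on-demand rate."""
    return 1 - effective_rate / on_demand_rate


for discount in (0.5, 0.3):
    rate = effective_spot_rate(125, discount, recompute_overhead=0.2)
    saving = realized_saving(125, rate)
    print(f"{discount:.0%} discount: {rate:.2f} per useful hour, "
          f"{saving:.0%} realized saving")
```

Setting `extra_per_hour` above zero shows how storage and engineering cost narrow the saving further, which is the effect the compute-only example leaves out.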

3. Capacity Certainty

A reserved price does not necessarily reserve a GPU.

Some products only reduce the bill. Separate capacity-reservation products guarantee that infrastructure will be available.

Microsoft states that Azure Reserved VM Instances provide a pricing benefit but do not guarantee capacity. AWS offers separate Capacity Blocks for ML for customers that need scheduled access to accelerated instances.

This distinction matters when a project requires eight identical GPUs to start together.

A low-priced GPU that cannot be provisioned on time may delay training, product releases, customer delivery, or research deadlines.

For time-sensitive workloads, availability can be more valuable than the maximum advertised discount.

4. Supporting Infrastructure Cost

The GPU is only one part of the bill.

We may also pay for:

CPU and RAM
Persistent disks
Checkpoint storage
Object storage
Dataset transfer
Public IP addresses
Network traffic
Data egress
Kubernetes resources
Monitoring
Support
Software licenses
Taxes

Storage can continue generating charges after a Spot GPU is reclaimed. Multi-GPU training may also require faster networking and higher storage throughput.

We provide GPU clusters for Kubernetes workloads and supporting cloud infrastructure, but those resources still belong in the full estimate.

A useful summary formula is: total GPU workload cost equals GPU compute plus supporting infrastructure plus idle commitment cost plus interruption overhead.

What Changes When We Buy GPU Infrastructure in India?

GPU pricing in India depends on more than whether a workload uses on-demand, reserved, or Spot capacity.

We also need to check:

Whether billing is in INR or USD
Whether the required GPU is available in an Indian location
Whether taxes are excluded from the advertised rate
How data transfer and egress are billed
Whether local latency matters
Whether data-location requirements apply
What support response is included
Whether CPU, RAM, and storage are bundled

Our published A100 pricing is specific to the Noida data center and excludes taxes. That makes it useful for an India-based comparison, but not automatically comparable with a hyperscaler quote that bundles or separates resources differently.

INR billing can reduce foreign exchange uncertainty. It does not remove utilization risk, taxes, storage charges, or capacity constraints.

Which GPU Pricing Model Fits Each Workload?

The best starting model depends on workload predictability and tolerance for interruption.

Workload	Likely starting model	Why
Early experiments	On-demand	Runtime and GPU requirements remain uncertain
Short fine-tuning jobs	On-demand or Spot	Flexible timing can make Spot economical
Hyperparameter sweeps	Spot	Independent jobs can be retried
Batch inference	Spot or mixed capacity	Tasks can often be queued and restarted
Production inference	Reserved with on-demand fallback	Stable baseline with burst protection
Distributed training	Reserved capacity or capacity blocks	Deadlines and simultaneous GPU access matter
Rendering and simulation	Spot	Frames and tasks are usually restartable
Persistent AI workstations	Reserved or monthly commitment	Usage remains consistent

Spot works best when the workload can pause or restart.
Reserved pricing works best when the GPU remains busy.
On-demand pricing works best when flexibility costs less than unused commitment.

Why a Mixed GPU Strategy Often Costs Less

Many teams can reduce cost by combining all three models.

A practical architecture may use:

Reserved GPUs for predictable production demand
On-demand GPUs for experiments and urgent bursts
Spot GPUs for checkpointed training, simulations, and batch work

This reduces idle commitment without making production depend entirely on interruptible capacity.
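A mixed plan can be costed with a small Python sketch. Everything here is an assumption for illustration: the `Rates` class and `mixed_monthly_cost` function are hypothetical, the split of hours is invented, and the rates reuse this article's A100 examples (₹81,000 per month committed, ₹125 on-demand, ₹62.50 Spot with 20 percent recompute overhead). Supporting infrastructure is left out.

```python
from dataclasses import dataclass


@dataclass
class Rates:
    reserved_monthly: float   # INR per reserved GPU per month
    on_demand_hourly: float   # INR per GPU hour
    spot_hourly: float        # INR per GPU hour
    spot_recompute: float     # 0.2 means 20 percent extra work from interruptions


def mixed_monthly_cost(rates, reserved_gpus, on_demand_hours, spot_useful_hours):
    """Split a month's compute bill across the three pricing models."""
    reserved = rates.reserved_monthly * reserved_gpus
    on_demand = rates.on_demand_hourly * on_demand_hours
    # Spot is billed for reruns too, so scale useful hours by the overhead.
    spot = rates.spot_hourly * spot_useful_hours * (1 + rates.spot_recompute)
    return {
        "reserved": reserved,
        "on_demand": on_demand,
        "spot": spot,
        "total": reserved + on_demand + spot,
    }


rates = Rates(reserved_monthly=81_000, on_demand_hourly=125,
              spot_hourly=62.5, spot_recompute=0.2)
bill = mixed_monthly_cost(rates, reserved_gpus=1, on_demand_hours=100,
                          spot_useful_hours=300)
for part, amount in bill.items():
    print(f"{part}: {amount:,.0f}")
```

Changing the split of hours between the three arguments shows which part of the bill responds most to utilization and which to interruption overhead.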

GPU selection also affects the result.

An NVIDIA L40S GPU may provide a better price-performance fit for inference, visual computing, and graphics workloads. An A100 or H100 may suit heavier training and large-memory workloads.

Useful comparison metrics include:

Cost per completed training run
Cost per million tokens
Cost per inference request
Cost per generated image
Cost per rendered frame
Cost per simulation
Cost per successful experiment

Cost per GPU hour is only an input. Cost per completed result is the business measure.
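As one illustration of a per-result metric, cost per million tokens can be derived from the hourly rate and measured throughput. The two GPUs and their throughput figures below are hypothetical and are not benchmarks of any real hardware; only the ₹125 rate comes from this article.

```python
def cost_per_million_tokens(hourly_rate, tokens_per_second):
    """Convert an hourly GPU rate and sustained throughput into cost per 1M tokens."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return hourly_rate / tokens_per_hour * 1_000_000


# Hypothetical comparison: GPU B is cheaper per hour but slower.
gpu_a = cost_per_million_tokens(hourly_rate=125, tokens_per_second=2500)
gpu_b = cost_per_million_tokens(hourly_rate=80, tokens_per_second=1200)

print(f"GPU A: {gpu_a:.2f} per million tokens")
print(f"GPU B: {gpu_b:.2f} per million tokens")
```

Under these assumed numbers the GPU with the higher hourly price produces tokens at a lower cost, which is why throughput has to be measured on the real workload before rates are compared.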

The clearest comparison comes from running the real workload. New AceCloud customers can use ₹20,000 in free GPU credits for up to 30 days, without a credit card, to test training, inference, utilization, and supporting infrastructure.

The Five-Question GPU Pricing Reality Test

Five questions usually reveal the true economics of GPU pricing in India.

How many GPU hours will we reliably consume each month?
What percentage of the workload can restart after interruption?
Does the commitment include guaranteed capacity or only a lower rate?
Which storage, network, CPU, support, and tax costs sit outside the GPU price?
What is the cost per completed result after idle time and reruns?

A lower hourly rate matters only when it reduces the cost of useful work.

Where the Lowest GPU Cost Usually Comes From

On-demand, reserved, and Spot GPUs solve different cost problems. On-demand protects us from uncertain usage. Reserved pricing rewards predictable demand. Spot rewards workloads that can survive interruption.

For GPU pricing in India, the largest saving rarely comes from selecting the biggest advertised discount.

It usually comes from matching the pricing model to actual utilization, accounting for interruption and infrastructure overhead, and measuring the cost of completed work.

Frequently Asked Questions

Does Reserved GPU Pricing Guarantee Capacity?

Not always. Some commitments provide a lower rate without reserving physical capacity. Guaranteed access may require a separate capacity reservation or capacity block.

Are Reserved GPUs Always Cheaper Than On-Demand GPUs?

No. On-demand pricing can cost less when monthly usage remains below the commitment break-even point.

Can Spot GPUs Be Used for LLM Training?

Yes. Spot works best when the framework supports regular checkpoints, automatic restarts, and flexible scheduling.

Are Spot GPUs Suitable for Real-Time Inference?

Usually not as the only capacity source. Production inference commonly needs reserved or on-demand fallback.

What Happens to Storage When a Spot GPU Stops?

Persistent disks, snapshots, and checkpoints may continue generating charges after the GPU instance stops.

How Often Should We Checkpoint a Spot Training Job?

The best interval depends on checkpoint duration, interruption frequency, model size, storage cost, and acceptable repeated work.
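One common starting point, which is not part of any provider's documentation, is Young's first-order approximation: the interval is the square root of twice the checkpoint duration multiplied by the mean time between interruptions. The sketch below assumes a 60-second checkpoint and an interruption every four hours on average; both figures are illustrative, and the approximation holds only when checkpoints are short relative to the time between interruptions.

```python
import math


def young_checkpoint_interval(checkpoint_seconds, mean_seconds_between_interruptions):
    """Young's approximation for the compute time between checkpoints, in seconds."""
    if checkpoint_seconds <= 0 or mean_seconds_between_interruptions <= 0:
        raise ValueError("both durations must be positive")
    return math.sqrt(2 * checkpoint_seconds * mean_seconds_between_interruptions)


# Assumed values: 60 s to write a checkpoint, one interruption every 4 hours.
interval = young_checkpoint_interval(60, 4 * 3600)
print(f"Checkpoint roughly every {interval / 60:.1f} minutes")
```

The result is a first estimate to refine against observed interruption rates, storage cost, and how much repeated work the job can tolerate.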

Does INR Billing Remove GPU Pricing Risk?

INR billing reduces foreign exchange uncertainty. It does not remove utilization, tax, storage, networking, or availability risk.

Is Monthly GPU Pricing the Same as On-Demand Pricing?

Not necessarily. Monthly pricing may assume continuous access or a term commitment. On-demand pricing charges only for consumed runtime.

Do Reserved GPU Prices Include Storage, CPU, RAM, and Networking?

Not always. A term discount may apply only to eligible compute. Storage, networking, licenses, support, and taxes can remain separate.

What Is the Best Metric for Comparing GPU Cost?

Cost per completed training run, token, inference request, image, or rendered frame is more useful than hourly price alone.

Jason Karlin

author

Industry veteran with over 10 years of experience architecting and managing GPU-powered cloud solutions. Specializes in enabling scalable AI/ML and HPC workloads for enterprise and research applications. Former lead solutions architect for top-tier cloud providers and startups in the AI infrastructure space.