On-Demand vs Reserved vs Spot GPU Pricing in India: Which Costs Less?

Jason Karlin
Last Updated: Jun 26, 2026

A GPU priced at ₹125 per hour looks easy to budget. We multiply the rate by the expected runtime and assume we know what the workload will cost. Then the bill arrives with idle commitments, checkpoint storage, interrupted jobs, network charges, and a few notebooks that someone forgot to stop before the weekend.

The real difference between on-demand, reserved, and Spot GPU pricing in India is not simply the advertised hourly rate.

Quick Answer:

On-demand GPUs usually cost less for uncertain or short-term workloads. Reserved or committed GPUs become economical when usage stays consistently high. Spot GPUs can offer the lowest compute rate, but interruptions, reruns, and engineering overhead can reduce the saving.

The best pricing model is the one that produces the lowest cost per completed training run, inference request, rendered frame, or simulation. The lowest hourly price does not always win.

What Is the Difference Between On-Demand, Reserved, and Spot GPUs?

Each pricing model makes us pay for a different kind of certainty.

Pricing model | What we pay for | Main benefit | Main risk
On-demand GPU | Actual running time without a long commitment | Maximum flexibility | Highest standard hourly rate
Reserved or committed GPU | A fixed level of usage or spending for a defined term | Lower and more predictable pricing | Paying for unused capacity
Spot GPU | Spare capacity that the provider may reclaim | Lowest potential compute rate | Interruptions and uncertain availability

  • On-demand pricing lets us start and stop capacity without committing to months or years of use.
  • Reserved or committed pricing gives the provider predictable revenue. In return, we receive a lower rate for making a longer commitment.
  • Spot pricing gives us temporary access to unused capacity. The provider can reclaim that GPU when the capacity is needed elsewhere.

Spot is not simply cheaper on-demand compute. It requires a workload that can pause, restart, or move without creating serious disruption.

In this article, reserved GPU pricing refers broadly to term-based discounts such as reservations, Savings Plans, committed-use discounts, and monthly commitments. These products are not identical. A lower rate also does not always include guaranteed GPU capacity.

What Maximum Discounts Do Cloud Providers Advertise?

Cloud providers advertise substantial discounts, but the maximum percentages are not direct quotes for a particular GPU or Indian region.

Provider model | Provider-wide advertised maximum | Main condition
AWS Savings Plans | Up to 72 percent | One-year or three-year spending commitment
AWS Spot Instances | Up to 90 percent | Interruptible workloads using spare capacity
Azure Reserved VM Instances | Up to 72 percent | One-year or three-year commitment
Azure Spot Virtual Machines | Up to 90 percent | Price and availability vary by region and VM type
Google Cloud GPU commitments | Up to 55 percent | One-year or three-year commitment
Google Cloud Spot VMs | Up to 91 percent | Interruptible capacity with variable availability
AceCloud Spot Instances | Up to 80 percent | Flexible AI, batch, rendering, and compute workloads

The phrase ‘up to’ matters.

A 90 percent Spot discount does not mean every GPU in every Indian location will always be available at one-tenth of its on-demand price. A 72 percent commitment discount does not mean every GPU model qualifies for the maximum reduction.

A useful comparison keeps these variables consistent:

  • GPU model and memory
  • Cloud region
  • CPU and RAM allocation
  • Storage
  • Network capacity
  • Runtime
  • Operating system
  • Software stack
  • Support level
  • Tax treatment

Without that consistency, we may be comparing three different products that happen to use the same GPU name.

How Do the Three GPU Cost Models Compare?

The pricing models become easier to compare when we focus on effective cost rather than advertised rates.

Model | Effective cost formula
On-demand | Hourly rate multiplied by actual runtime, plus supporting infrastructure
Reserved or committed | Monthly commitment divided by actual used hours, plus supporting infrastructure
Spot | Spot rate multiplied by total runtime including reruns, plus checkpointing, restart, storage, and engineering costs

These formulas measure the cost of consumed or useful work rather than the price shown at the top of a product page.

Where Does Reserved GPU Pricing Reach Break-Even?

Reserved GPU pricing becomes cheaper only when we use enough of the capacity we committed to buy.

We list a one-GPU NVIDIA A100 80GB configuration in our Noida data center at ₹125 per hour or ₹90,000 per month, excluding taxes.

The same page lists starting monthly prices of:

  • ₹85,500 with a six-month term
  • ₹81,000 with a twelve-month term

The six-month price is 5 percent lower than the listed monthly rate. The twelve-month price is 10 percent lower.

The more useful calculation is the number of on-demand hours required to reach the same monthly cost.

Pricing option | Monthly cost | Break-even hours at ₹125 per hour | Equivalent daily usage over 30 days
Six-month term | ₹85,500 | 684 hours | 22.8 hours per day
Twelve-month term | ₹81,000 | 648 hours | 21.6 hours per day

The example uses a simplified 30-day month and assumes that the hourly and monthly configurations include the same resources.

If we use the GPU for only 400 hours, the listed on-demand cost is ₹50,000 (400 × ₹125). A twelve-month commitment still costs ₹81,000 for the month.

The committed rate is lower, but the bill is higher because unused capacity absorbs the discount.

Reserved or committed GPU pricing works best when predictable monthly usage remains above the break-even level.
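The break-even arithmetic above can be reproduced in a few lines. This is a minimal sketch that uses only the listed ₹125 hourly rate, the two term prices, and the simplified 30-day month. The function names are ours, chosen for illustration.

```python
def break_even_hours(monthly_commitment: float, on_demand_rate: float) -> float:
    """On-demand hours that cost the same as the monthly commitment."""
    return monthly_commitment / on_demand_rate


def cheaper_option(used_hours: float, monthly_commitment: float,
                   on_demand_rate: float) -> str:
    """Compare one month's on-demand bill with the fixed commitment."""
    on_demand_bill = used_hours * on_demand_rate
    return "on-demand" if on_demand_bill < monthly_commitment else "committed"


ON_DEMAND_RATE = 125  # INR per hour, excluding taxes

for label, monthly in [("six-month", 85_500), ("twelve-month", 81_000)]:
    hours = break_even_hours(monthly, ON_DEMAND_RATE)
    print(f"{label}: {hours:.0f} h per month, {hours / 30:.1f} h per day")
# six-month: 684 h per month, 22.8 h per day
# twelve-month: 648 h per month, 21.6 h per day

print(cheaper_option(400, 81_000, ON_DEMAND_RATE))  # on-demand
```

The sketch ignores supporting infrastructure and assumes the hourly and monthly configurations include the same resources, as the example above does.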

Where Does the Cost Difference Appear?

The final cost difference usually appears in four places. These are idle commitments, interruption overhead, capacity availability, and supporting infrastructure.

1. Idle Commitment Cost

A reserved GPU produces savings only while we use enough of the capacity.

Steady training pipelines, persistent inference endpoints, and always-on virtual workstations may use a commitment efficiently.

Early experiments, irregular fine-tuning jobs, seasonal projects, and uncertain product launches may not.

A reserved GPU sitting idle is not discounted compute. It is fully paid capacity producing no useful output.

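The practical measure is therefore the effective rate: reserved monthly cost divided by actual used hours. That figure, not the listed monthly price, is what we compare against the on-demand hourly rate.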
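To see how quickly idle hours erode a commitment, we can compute that ratio at a few utilization levels. The sketch below assumes the ₹81,000 twelve-month price from the earlier example. The usage levels are illustrative, and the function name is hypothetical.

```python
def effective_hourly_cost(monthly_commitment: float, used_hours: float) -> float:
    """Reserved monthly cost divided by the hours actually used."""
    if used_hours <= 0:
        raise ValueError("used_hours must be positive")
    return monthly_commitment / used_hours


MONTHLY_COMMITMENT = 81_000  # INR, twelve-month term, excluding taxes

# Illustrative usage levels: a full 30-day month, break-even, and two idle-heavy months.
for used in (720, 648, 400, 200):
    rate = effective_hourly_cost(MONTHLY_COMMITMENT, used)
    print(f"{used} h used: INR {rate:.2f} per used hour")
# 720 h used: INR 112.50 per used hour
# 648 h used: INR 125.00 per used hour
# 400 h used: INR 202.50 per used hour
# 200 h used: INR 405.00 per used hour
```

Below 648 used hours, the effective rate rises above the ₹125 on-demand price, even though the committed rate on paper is lower.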

2. Spot Interruption Cost

Spot pricing lowers the compute rate, but interruptions create work that never appears in the headline price.

AWS generally provides a two-minute Spot interruption notice. Azure Spot VMs may be evicted with 30 seconds of notice and do not receive an availability guarantee.

Checkpointing can make Spot practical, but it also creates costs.

We may pay for:

  • Time spent saving model state
  • Persistent checkpoint storage
  • Reloading models and optimizer state
  • Repeated preprocessing
  • Lost work since the previous checkpoint
  • Engineering for automatic restart
  • Monitoring and orchestration

Suppose an on-demand A100 costs ₹125 per hour.

At a 50 percent Spot discount, the listed Spot rate becomes ₹62.50 per hour. If interruptions create 20 percent additional compute work, one useful hour costs approximately ₹75.

The realized compute saving is 40 percent rather than 50 percent.

At a 30 percent Spot discount with the same 20 percent recompute overhead, one useful hour costs approximately ₹105.

The realized compute saving falls to 16 percent.

The numerical example measures only additional compute. Actual savings can be lower after checkpoint storage, data loading, monitoring, and engineering time are included.

A more complete formula is: Effective Spot cost per useful hour equals Spot rate multiplied by one plus recompute overhead, plus checkpoint storage, orchestration, and engineering cost
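The two numerical cases above follow directly from that formula. The sketch below is one way to express it. The parameter `extra_cost_per_useful_hour` is our own placeholder for checkpoint storage, orchestration, and engineering cost, and it defaults to zero because the article's figures measure compute only.

```python
def effective_spot_cost(on_demand_rate: float, spot_discount: float,
                        recompute_overhead: float,
                        extra_cost_per_useful_hour: float = 0.0) -> float:
    """Cost of one useful hour on Spot, including repeated work."""
    spot_rate = on_demand_rate * (1 - spot_discount)
    return spot_rate * (1 + recompute_overhead) + extra_cost_per_useful_hour


def realized_saving(on_demand_rate: float, effective_cost: float) -> float:
    """Fraction saved against on-demand after overhead is included."""
    return 1 - effective_cost / on_demand_rate


ON_DEMAND_RATE = 125  # INR per hour

for discount in (0.50, 0.30):
    cost = effective_spot_cost(ON_DEMAND_RATE, discount, recompute_overhead=0.20)
    saving = realized_saving(ON_DEMAND_RATE, cost)
    print(f"{discount:.0%} discount: INR {cost:.2f} per useful hour, "
          f"{saving:.0%} realized saving")
# 50% discount: INR 75.00 per useful hour, 40% realized saving
# 30% discount: INR 105.00 per useful hour, 16% realized saving
```

Passing a non-zero `extra_cost_per_useful_hour` shows how storage and engineering overhead narrow the saving further.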

3. Capacity Certainty

A reserved price does not necessarily reserve a GPU.

Some products reduce the bill. Separate capacity-reservation products guarantee that infrastructure will be available.

Microsoft states that Azure Reserved VM Instances provide a pricing benefit but do not guarantee capacity. AWS offers separate Capacity Blocks for ML for customers that need scheduled access to accelerated instances.

This distinction matters when a project requires eight identical GPUs to start together.

A low-priced GPU that cannot be provisioned on time may delay training, product releases, customer delivery, or research deadlines.

For time-sensitive workloads, availability can be more valuable than the maximum advertised discount.

4. Supporting Infrastructure Cost

The GPU is only one part of the bill.

We may also pay for:

  • CPU and RAM
  • Persistent disks
  • Checkpoint storage
  • Object storage
  • Dataset transfer
  • Public IP addresses
  • Network traffic
  • Data egress
  • Kubernetes resources
  • Monitoring
  • Support
  • Software licenses
  • Taxes

Storage can continue generating charges after a Spot GPU is reclaimed. Multi-GPU training may also require faster networking and higher storage throughput.

We provide GPU clusters for Kubernetes workloads and supporting cloud infrastructure, but those resources still belong in the full estimate.

A useful summary formula is: Total GPU workload cost equals GPU compute plus supporting infrastructure plus idle time plus interruption overhead
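A small record type is one way to keep those four components visible in an estimate. In the sketch below, only the ₹125 hourly rate comes from this article. The infrastructure and idle figures are hypothetical values chosen for illustration, and the class and method names are ours.

```python
from dataclasses import dataclass, asdict


@dataclass
class WorkloadCost:
    """Monthly cost components in INR, excluding taxes."""
    gpu_compute: float
    supporting_infrastructure: float
    idle_time: float
    interruption_overhead: float

    def total(self) -> float:
        """GPU compute + supporting infrastructure + idle time + interruption overhead."""
        return sum(asdict(self).values())

    def non_gpu_share(self) -> float:
        """Fraction of the bill that sits outside useful GPU compute."""
        return 1 - self.gpu_compute / self.total()


estimate = WorkloadCost(
    gpu_compute=400 * 125,            # 400 useful on-demand hours
    supporting_infrastructure=9_000,  # hypothetical storage, network, support
    idle_time=20 * 125,               # hypothetical: a notebook left running for 20 hours
    interruption_overhead=0,          # on-demand, so no reruns assumed
)
print(f"Total: INR {estimate.total():,.0f}")
print(f"Outside GPU compute: {estimate.non_gpu_share():.0%}")
```

Tracking the components separately makes it easier to see which one grows when the pricing model changes.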

What Changes When We Buy GPU Infrastructure in India?

GPU pricing in India depends on more than whether a workload uses on-demand, reserved, or Spot capacity.

We also need to check:

  • Whether billing is in INR or USD
  • Whether the required GPU is available in an Indian location
  • Whether taxes are excluded from the advertised rate
  • How data transfer and egress are billed
  • Whether local latency matters
  • Whether data-location requirements apply
  • What support response is included
  • Whether CPU, RAM, and storage are bundled

Our published A100 pricing is specific to the Noida data center and excludes taxes. That makes it useful for an India-based comparison, but not automatically comparable with a hyperscaler quote that bundles or separates resources differently.

INR billing can reduce foreign exchange uncertainty. It does not remove utilization risk, taxes, storage charges, or capacity constraints.

Which GPU Pricing Model Fits Each Workload?

The best starting model depends on workload predictability and tolerance for interruption.

Workload | Likely starting model | Why
Early experiments | On-demand | Runtime and GPU requirements remain uncertain
Short fine-tuning jobs | On-demand or Spot | Flexible timing can make Spot economical
Hyperparameter sweeps | Spot | Independent jobs can be retried
Batch inference | Spot or mixed capacity | Tasks can often be queued and restarted
Production inference | Reserved with on-demand fallback | Stable baseline with burst protection
Distributed training | Reserved capacity or capacity blocks | Deadlines and simultaneous GPU access matter
Rendering and simulation | Spot | Frames and tasks are usually restartable
Persistent AI workstations | Reserved or monthly commitment | Usage remains consistent

  • Spot works best when the workload can pause or restart.
  • Reserved pricing works best when the GPU remains busy.
  • On-demand pricing works best when flexibility costs less than unused commitment.
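The three rules of thumb above can be written as a first-pass selector. This is a deliberately simple sketch, not a sizing tool. The function name, its inputs, and the order in which it checks the rules are our assumptions, and real decisions also depend on deadlines and capacity availability.

```python
def starting_model(can_restart: bool, expected_monthly_hours: float,
                   break_even_hours: float) -> str:
    """Suggest a starting pricing model from two workload properties."""
    if can_restart:
        # The workload can pause or restart, so interruptions are tolerable.
        return "spot"
    if expected_monthly_hours >= break_even_hours:
        # The GPU stays busy enough to absorb the commitment.
        return "reserved"
    # Flexibility costs less than an unused commitment.
    return "on-demand"


# 648 hours is the twelve-month break-even from the A100 example.
print(starting_model(can_restart=True, expected_monthly_hours=300, break_even_hours=648))
print(starting_model(can_restart=False, expected_monthly_hours=700, break_even_hours=648))
print(starting_model(can_restart=False, expected_monthly_hours=200, break_even_hours=648))
```

The three calls return "spot", "reserved", and "on-demand" in that order.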

Why Does a Mixed GPU Strategy Often Cost Less?

Many teams can reduce cost by combining all three models.

A practical architecture may use:

  • Reserved GPUs for predictable production demand
  • On-demand GPUs for experiments and urgent bursts
  • Spot GPUs for checkpointed training, simulations, and batch work

This reduces idle commitment without making production depend entirely on interruptible capacity.
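A rough estimate shows how such a mix might compare with running everything on demand. The sketch below reuses figures from the earlier A100 and Spot examples (₹125 per hour, ₹81,000 per month, a 50 percent Spot discount, 20 percent recompute overhead). The hour counts are hypothetical, supporting infrastructure is left out, and the function name is ours.

```python
def mixed_monthly_cost(reserved_gpus: int, reserved_monthly_price: float,
                       on_demand_hours: float, on_demand_rate: float,
                       spot_useful_hours: float, spot_rate: float,
                       recompute_overhead: float) -> float:
    """Monthly GPU compute cost for a reserved + on-demand + Spot mix."""
    reserved = reserved_gpus * reserved_monthly_price
    on_demand = on_demand_hours * on_demand_rate
    spot = spot_useful_hours * spot_rate * (1 + recompute_overhead)
    return reserved + on_demand + spot


ON_DEMAND_RATE = 125      # INR per hour
BASELINE_HOURS = 700      # hypothetical production hours on the reserved GPU
BURST_HOURS = 100         # hypothetical experiments and urgent bursts
BATCH_USEFUL_HOURS = 300  # hypothetical checkpointed batch work

mixed = mixed_monthly_cost(
    reserved_gpus=1, reserved_monthly_price=81_000,
    on_demand_hours=BURST_HOURS, on_demand_rate=ON_DEMAND_RATE,
    spot_useful_hours=BATCH_USEFUL_HOURS, spot_rate=62.50,
    recompute_overhead=0.20,
)
all_on_demand = (BASELINE_HOURS + BURST_HOURS + BATCH_USEFUL_HOURS) * ON_DEMAND_RATE

print(f"Mixed: INR {mixed:,.0f}")
print(f"All on-demand: INR {all_on_demand:,.0f}")
```

With these inputs, the mix comes to about ₹116,000 against ₹137,500 on demand. The gap shrinks if the reserved GPU sits idle or if Spot interruptions are more frequent than assumed.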

GPU selection also affects the result.

An NVIDIA L40S GPU may provide a better price-performance fit for inference, visual computing, and graphics workloads. An A100 or H100 may suit heavier training and large-memory workloads.

Useful comparison metrics include:

  • Cost per completed training run
  • Cost per million tokens
  • Cost per inference request
  • Cost per generated image
  • Cost per rendered frame
  • Cost per simulation
  • Cost per successful experiment

Cost per GPU hour is only an input. Cost per completed result is the business measure.
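A short comparison illustrates why the hourly rate alone can mislead. In the sketch below, the two GPU options, their rates, and their run times are hypothetical numbers chosen for illustration. They are not benchmarks or quotes for any real GPU.

```python
def cost_per_result(hourly_rate: float, hours_per_result: float) -> float:
    """GPU compute cost of one completed result, such as a training run."""
    if hours_per_result <= 0:
        raise ValueError("hours_per_result must be positive")
    return hourly_rate * hours_per_result


# Hypothetical options: (hourly rate in INR, hours to finish one training run)
options = {
    "faster GPU": (125, 10),
    "cheaper GPU": (90, 16),
}

for name, (rate, hours) in options.items():
    print(f"{name}: INR {cost_per_result(rate, hours):,.0f} per completed run")
# faster GPU: INR 1,250 per completed run
# cheaper GPU: INR 1,440 per completed run
```

In this made-up case, the option with the lower hourly rate costs more per completed run because it takes longer to finish.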

The clearest comparison comes from running the real workload. New AceCloud customers can use ₹20,000 in free GPU credits for up to 30 days, without a credit card, to test training, inference, utilization, and supporting infrastructure.

The Five-Question GPU Pricing Reality Test

Five questions usually reveal the true economics of GPU pricing in India.

  1. How many GPU hours will we reliably consume each month?
  2. What percentage of the workload can restart after interruption?
  3. Does the commitment include guaranteed capacity or only a lower rate?
  4. Which storage, network, CPU, support, and tax costs sit outside the GPU price?
  5. What is the cost per completed result after idle time and reruns?

A lower hourly rate matters only when it reduces the cost of useful work.

Where Does the Lowest GPU Cost Usually Come From?

On-demand, reserved, and Spot GPUs solve different cost problems. On-demand protects us from uncertain usage. Reserved pricing rewards predictable demand. Spot rewards workloads that can survive interruption.

For GPU pricing in India, the largest saving rarely comes from selecting the biggest advertised discount.

It usually comes from matching the pricing model to actual utilization, accounting for interruption and infrastructure overhead, and measuring the cost of completed work.

Frequently Asked Questions

Does reserved GPU pricing guarantee capacity?

Not always. Some commitments provide a lower rate without reserving physical capacity. Guaranteed access may require a separate capacity reservation or capacity block.

Is reserved GPU pricing always cheaper than on-demand?

No. On-demand pricing can cost less when monthly usage remains below the commitment break-even point.

Can Spot GPUs be used for model training?

Yes. Spot works best when the framework supports regular checkpoints, automatic restarts, and flexible scheduling.

Are Spot GPUs suitable for production inference?

Usually not as the only capacity source. Production inference commonly needs reserved or on-demand fallback.

Which charges can continue after a GPU instance stops?

Persistent disks, snapshots, and checkpoints may continue generating charges after the GPU instance stops.

How often should a Spot workload save checkpoints?

The best interval depends on checkpoint duration, interruption frequency, model size, storage cost, and acceptable repeated work.

Does INR billing remove GPU pricing risk?

INR billing reduces foreign exchange uncertainty. It does not remove utilization, tax, storage, networking, or availability risk.

Is monthly GPU pricing always cheaper than hourly pricing?

Not necessarily. Monthly pricing may assume continuous access or a term commitment. On-demand charges only for consumed runtime.

Does a term discount apply to the whole bill?

Not always. A term discount may apply only to eligible compute. Storage, networking, licenses, support, and taxes can remain separate.

Which metric should we use to compare GPU pricing?

Cost per completed training run, token, inference request, image, or rendered frame is more useful than hourly price alone.

Jason Karlin
author
Industry veteran with over 10 years of experience architecting and managing GPU-powered cloud solutions. Specializes in enabling scalable AI/ML and HPC workloads for enterprise and research applications. Former lead solutions architect for top-tier cloud providers and startups in the AI infrastructure space.
