GPU cost calculator.

Drag the sliders or pick a model. We'll show how much VRAM the workload needs, how many GPUs that maps to, and what the cloud bill looks like.

Pick a domain
Team size: 5 people (× 2.6 cost multiplier). Slider range: 1–100.

Scales costs for parallel dev, staging, and experimentation environments. Assumes sub-linear growth (team^0.6): shared infrastructure absorbs some of the added headcount.
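
To see what that exponent does in practice: a team of 5 pays about 2.6× the single-dev cost, not 5×. A minimal sketch (the function name is ours, not the calculator's):

    def team_multiplier(team_size: int, exponent: float = 0.6) -> float:
        # sub-linear scaling: doubling headcount does not double the bill
        return team_size ** exponent

    print(round(team_multiplier(5), 2))    # 2.63, shown in the UI as "x 2.6"
    print(round(team_multiplier(100), 1))  # 15.8, far below a naive 100x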

Model size: 70B params. Slider range: 10M–1T.
Workload

Full-precision (FP16) serving. The default for most production stacks. ~2 GB per billion params.

GPU hourly rate*

* On-demand cloud rates as of 2026 — actual prices vary by provider, region, and reservation term.

Pick a model

How the math works

VRAM = params × bytes-per-param × 1.25, where the 1.25 covers overhead for activations, KV cache, and fragmentation. Bytes-per-param by workload: FP16 inference 2, INT8 1, INT4 0.5, QLoRA ~1.2, LoRA ~2.5, full fine-tuning ~16, pretraining ~20.
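
Applied to the defaults above (70B at FP16), the formula gives 70 × 2 × 1.25 = 175 GB. A short sketch; the 80 GB card size in the GPU-count step is our assumption, not something the page pins down:

    import math

    def vram_gb(params_b: float, bytes_per_param: float) -> float:
        # params (billions) x bytes-per-param x 1.25 overhead -> GB
        return params_b * bytes_per_param * 1.25

    def gpus_needed(total_gb: float, gb_per_card: float = 80.0) -> int:
        # 80 GB per card (e.g. an A100/H100 80GB) is an assumption; round up
        return math.ceil(total_gb / gb_per_card)

    need = vram_gb(70, 2)           # 70B at FP16 -> 175.0 GB
    print(need, gpus_needed(need))  # 175.0 3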

Non-compute costs are modeled as a fraction of compute. Inference: storage ~2%, network ~10%, IO ~1%. LoRA/QLoRA: storage ~5%, network ~3%, IO ~2%. Full fine-tuning: storage ~8%, network ~5%, IO ~3%. Pretraining: storage ~12%, network ~8%, IO ~5%. Real bills depend heavily on dataset size, egress patterns, checkpoint frequency, and parallel-filesystem choices.
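
As a sketch, those fractions just scale the compute line item (the dict keys are ours; the percentages are the ones quoted above):

    # (storage, network, io) as fractions of compute, per the paragraph above
    OVERHEAD = {
        "inference":        (0.02, 0.10, 0.01),
        "lora_qlora":       (0.05, 0.03, 0.02),
        "full_fine_tuning": (0.08, 0.05, 0.03),
        "pretraining":      (0.12, 0.08, 0.05),
    }

    def total_cost(compute_cost: float, workload: str) -> float:
        storage, network, io = OVERHEAD[workload]
        return compute_cost * (1 + storage + network + io)

    print(round(total_cost(100.0, "inference"), 2))  # 113.0: compute plus 13% overhead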

GPU prices shown are typical on-demand cloud rates as of 2026 and may vary by provider, region, and reservation term. Reserved or spot capacity is typically 30–80% cheaper.
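
Putting the pieces together, one plausible composition of the numbers above (the multiplication order and the $4/hr rate are our assumptions, not the page's):

    def monthly_bill(gpus: int, hourly_rate: float, team_size: int,
                     overhead_mult: float, discount: float = 0.0) -> float:
        # 730 hours/month; discount approximates reserved/spot (0.3-0.8)
        compute = gpus * hourly_rate * 730 * (1 - discount)
        return compute * overhead_mult * team_size ** 0.6

    # 3 GPUs at an assumed $4/hr, team of 5, inference overhead x1.13
    print(round(monthly_bill(3, 4.0, 5, 1.13)))       # ~26000 on-demand
    print(round(monthly_bill(3, 4.0, 5, 1.13, 0.5)))  # ~13000 at a 50% discount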