The Complete Guide to Kubernetes GPU Scheduling
How to configure Kubernetes for GPU workloads — from device plugins to topology-aware scheduling. Plus, where default K8s scheduling falls short and how to fix it.
GPU Scheduling in Kubernetes: The Basics
Kubernetes supports GPU scheduling through device plugins. NVIDIA's k8s-device-plugin is the most common, but AMD and Intel have their own implementations.
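The NVIDIA plugin ships as a DaemonSet. One common way to deploy it is via its Helm chart (repo URL and chart name per the plugin's own documentation — verify against the version you target):

```shell
# Add NVIDIA's device-plugin chart repo and install the DaemonSet.
helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
helm repo update
helm install nvdp nvdp/nvidia-device-plugin \
  --namespace nvidia-device-plugin --create-namespace
```

Once the DaemonSet is running, GPU nodes advertise `nvidia.com/gpu` as an allocatable resource.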
The basic flow:
```yaml
resources:
  limits:
    nvidia.com/gpu: 2
```
This requests 2 GPUs for a pod. Kubernetes finds a node with 2 available GPUs and schedules the pod there. Simple — and insufficient.
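For context, that `resources` fragment sits inside a container spec. A minimal complete manifest looks like the following (pod name, container name, and image are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cuda-job            # placeholder name
spec:
  containers:
    - name: trainer         # placeholder name
      image: nvcr.io/nvidia/cuda:12.3.1-base-ubuntu22.04  # illustrative image
      resources:
        limits:
          nvidia.com/gpu: 2 # whole GPUs only; no fractions by default
```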
Where Default Scheduling Fails
No Topology Awareness
Default K8s scheduling doesn't consider GPU topology. Two GPUs connected by NVLink deliver far higher peer-to-peer bandwidth than two GPUs that communicate only over PCIe, yet the scheduler treats the placements as interchangeable. For distributed training, this matters enormously.
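On NVIDIA nodes you can inspect this topology directly; `nvidia-smi topo -m` prints the interconnect matrix (NV# entries for NVLink, PIX/PHB/SYS for various PCIe paths) — exactly the information the default scheduler ignores:

```shell
# Print the GPU interconnect matrix on a node (requires NVIDIA drivers).
nvidia-smi topo -m
```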
No Fractional GPU Support
A pod requesting 1 GPU gets an entire GPU, even if it only needs 4GB of an 80GB A100. Multi-Instance GPU (MIG) helps, but configuration is manual and static.
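With MIG enabled and the device plugin running with a MIG-aware strategy, a pod can request a specific slice instead of a whole GPU. A sketch (the exact resource name depends on the GPU model and the MIG profile and strategy you configure):

```yaml
resources:
  limits:
    # One 1g.5gb slice, as exposed by the device plugin's "mixed"
    # MIG strategy; available profile names vary by GPU model.
    nvidia.com/mig-1g.5gb: 1
```

The slices themselves must still be partitioned on the node ahead of time, which is the "manual and static" part.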
No Workload-Aware Placement
K8s doesn't know that your inference workload would perform identically on a cheaper GPU, or that your training job needs high-bandwidth interconnect. All GPUs are treated as equivalent.
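The closest built-in workaround is pinning workloads to GPU models by node label, e.g. the `nvidia.com/gpu.product` label published by NVIDIA's GPU Feature Discovery (assuming that component is installed). This is still manual, not workload-aware:

```yaml
spec:
  nodeSelector:
    # Label set by NVIDIA GPU Feature Discovery; the value varies by model.
    nvidia.com/gpu.product: NVIDIA-A10
  containers:
    - name: inference             # placeholder name
      image: my-inference:latest  # placeholder image
      resources:
        limits:
          nvidia.com/gpu: 1
```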
No Cross-Node Optimization
Scheduling decisions are per-pod. There's no global view of fleet utilization or ability to rebalance running workloads.
Better GPU Scheduling
DeepLM's scheduler integrates with K8s as a secondary scheduler:
```yaml
spec:
  schedulerName: deeplm-scheduler
```
It adds:
- Topology-aware placement — respects NVLink, PCIe, and cross-node fabric
- Workload profiling — learns GPU utilization patterns per job type
- Dynamic rebalancing — suggests or executes workload migration
- Multi-vendor support — schedules across NVIDIA, AMD, Intel from one API
Getting Started
Install the DeepLM scheduler on any K8s cluster with GPU nodes:
```shell
helm install deeplm-scheduler deeplm/scheduler
```
Or start with DeepLM Insights for observability before switching schedulers.