Splitting One GPU Across Multiple Kubernetes Pods — Without MIG, Without Enterprise Licenses

A years-old GPU frustration, a conference discovery, and a 2AM PoC that actually worked

The Problem I've Been Carrying for Years

If you work with AI or video at scale and you're not at one of the big hyperscalers, you've probably hit this wall before: you have GPUs, and you're wasting them. Not because your workloads don't need a GPU — they do. But because individually, each workload is small. AI inference services rarely saturate a whole card. Processing jobs spin up, eat some compute, and die. Embedding models, classifiers, lightweight LLMs — they each need a slice of a GPU, not the whole thing. None of them come close to maxing out the hardware on their own. And yet, in a typical Kubernetes setup, each one claims an entire GPU card and sits there hoarding it while the rest goes to waste.

I've been building a platform that runs multiple AI and video processing workloads in parallel — inference services, enrichment pipelines, on-demand processing jobs. The kind of system where a lot of