Ollama in Docker Compose with GPU and Persistent Model Storage

Source: DEV Community
Ollama works great on bare metal. It gets even more interesting when you treat it like a service: a stable endpoint, pinned versions, persistent storage, and a GPU that is either available or not. This post focuses on one goal: a reproducible local or single-node Ollama "server" using Docker Compose, with GPU acceleration and persistent model storage. It intentionally skips generic Docker and Compose basics.

When you need a compact list of the commands you reach for most often (images, containers, volumes, docker compose), the Docker Cheatsheet is a good companion. When you want HTTPS in front of Ollama, correct streaming and WebSocket proxying, and edge controls (auth, timeouts, rate limits), see Ollama behind a reverse proxy with Caddy or Nginx for HTTPS streaming. For how Ollama fits alongside vLLM, Docker Model Runner, LocalAI, and cloud hosting trade-offs, see LLM Hosting in 2026: Local, Self-Hosted & Cloud Infrastructure Compared.

When Compose beats a bare metal install
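As a concrete reference point for the goal stated above, here is a minimal sketch of what such a Compose file can look like. The image tag, volume name, and service name are illustrative assumptions, not the article's final configuration; the GPU block uses the standard Compose device-reservation syntax and requires the NVIDIA Container Toolkit on the host.

```yaml
# Sketch of a compose.yaml for a single-node Ollama service (names and tag are assumptions).
services:
  ollama:
    image: ollama/ollama:latest   # in practice, pin a specific version tag for reproducibility
    ports:
      - "11434:11434"             # Ollama's default API port
    volumes:
      - ollama_models:/root/.ollama   # persist pulled models across container restarts
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia      # expose the host GPU via the NVIDIA Container Toolkit
              count: all
              capabilities: [gpu]
    restart: unless-stopped

volumes:
  ollama_models:                  # named volume so models survive `docker compose down`
```

With a file like this, `docker compose up -d` gives you a stable endpoint on port 11434, and models pulled with `docker compose exec ollama ollama pull <model>` land in the named volume rather than the container's writable layer.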