Demystifying GPU Sizing for LLMs
The Context

Large open‑source (OSS) language models are springing up everywhere: Mistral, Llama, surprise releases like Kimi K2 and, this month, OpenAI's OSS models. Each promises open‑source autonomy and fine‑tuned intelligence, but a very practical question lurks in the background: how much hardware does it take to run them? The answer isn't just about having "some GPUs"; it hinges on understanding the interplay between model size, memory bandwidth and inference throughput.

One of my hobby projects is a GPU infrastructure estimator that addresses exactly this question. Initially built as an unwieldy spreadsheet, it has evolved into a simple web tool designed to answer a surprisingly common question: "Can I run the latest open‑source models on‑prem for tens or hundreds of thousands of euros?"

What the tool does

At its core, the estimator calculates three things: memory requirements, compute throughput, and latency. To make the experience approachable, it...
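To give a feel for the arithmetic behind the first two numbers, here is a minimal sketch in Python using the standard back‑of‑the‑envelope formulas: weights take parameters × bytes per parameter, the KV cache grows with layers, KV heads, head dimension and context length, and decode throughput is roughly bounded by memory bandwidth. The function names and the example figures below are my own illustrations, not the estimator's actual internals.

```python
# Back-of-the-envelope GPU sizing for a dense transformer.
# Formulas are standard approximations; names and defaults are
# illustrative, not taken from the estimator itself.

def weight_memory_gb(params_b: float, bytes_per_param: float = 2.0) -> float:
    """Weights: parameters (in billions) x bytes per parameter.
    2.0 bytes/param is FP16/BF16; 1.0 is INT8; 0.5 is INT4."""
    return params_b * bytes_per_param

def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                context_len: int, batch: int = 1,
                bytes_per_elem: float = 2.0) -> float:
    """KV cache: 2 (K and V) x layers x kv_heads x head_dim x tokens x batch."""
    elems = 2 * layers * kv_heads * head_dim * context_len * batch
    return elems * bytes_per_elem / 1e9

def decode_tokens_per_s(weights_gb: float, bandwidth_gb_s: float) -> float:
    """Single-stream decode is usually memory-bandwidth bound: each new
    token streams roughly the full weights once, so tok/s ~ bandwidth / weights."""
    return bandwidth_gb_s / weights_gb

# Example: a 70B-parameter model in FP16 on an 80 GB, ~2 TB/s GPU
# (layer/head counts here are in the ballpark of Llama-70B-class models).
w = weight_memory_gb(70)                                  # ~140 GB: more than one GPU
kv = kv_cache_gb(layers=80, kv_heads=8, head_dim=128, context_len=8192)
print(f"weights: {w:.0f} GB, KV cache: {kv:.1f} GB")
print(f"upper-bound decode rate at 2 TB/s: {decode_tokens_per_s(w, 2000):.0f} tok/s")
```

Numbers like these are why the answer is never just "some GPUs": the same model can need one accelerator or four depending on quantisation and context length, which is exactly the kind of trade-off the estimator is meant to surface.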