Gemma 4’s smallest model runs on 3GB of VRAM, and it’s the one I actually reach for
Local AI used to ask one big question of you: what can your hardware actually take? The answer determined which models you could run, how far you could push the context length, and whether vision or audio were even on the table. The question remains, but it’s no longer as black and white because smaller models…
