I have 16 GB of RAM and recently tried running local LLMs. Turns out my RAM is a bigger limiting factor than my GPU.
And, yeah, docker’s always taking up 3-4 GB.
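Yeah, the numbers get tight quickly. Here's a rough back-of-the-envelope sketch in Python - the parameter counts, bits-per-weight, and overhead figure are illustrative assumptions, not exact sizes for any particular model:

```python
# Rough RAM estimate for running a quantized model on CPU.
# All numbers below are illustrative guesses, not measured figures.

def model_ram_gb(n_params_billion: float, bits_per_weight: float,
                 overhead_gb: float = 1.5) -> float:
    """Approximate RAM: quantized weights plus a rough allowance
    for KV cache / runtime overhead (overhead_gb is a guess)."""
    weights_gb = n_params_billion * 1e9 * bits_per_weight / 8 / 1e9
    return weights_gb + overhead_gb

for name, params, bits in [("8B @ ~Q4", 8, 4.5),
                           ("12B @ ~Q4", 12, 4.5),
                           ("12B @ ~Q8", 12, 8.5)]:
    print(f"{name}: ~{model_ram_gb(params, bits):.1f} GB")
```

So a 12B quant plus Docker and the OS already puts 16 GB under noticeable pressure.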
VRAM would help even more, I think.
Either you use your CPU and RAM, or your GPU and VRAM.
Fair, I didn’t realize that. My GPU is a 1060 6 GB so I won’t be running any significant LLMs on it. This PC is pretty old at this point.
You can run a very decent LLM with that tbh
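fwiw, you also don't have to choose strictly between CPU and GPU - llama.cpp can offload some layers to the card and keep the rest in RAM. A minimal sketch with llama-cpp-python (assumes a CUDA-enabled build; the model filename and the n_gpu_layers value are placeholders you'd tune for the 6 GB card):

```python
# Partial GPU offload with llama-cpp-python - a sketch, not a recipe.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/some-12b-q4_k_m.gguf",  # placeholder filename
    n_gpu_layers=20,  # layers pushed to VRAM; the rest stay in system RAM
    n_ctx=2048,       # context window; bigger means more memory used
)

out = llm("Explain mixture-of-experts in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```

Raising n_gpu_layers until VRAM is nearly full usually gives the best speed; setting it to 0 falls back to pure CPU.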
You could potentially run some smaller MoE models - since only a fraction of the parameters are active per token, they stay reasonably fast even on CPU. I'd also suspect the deepseek r1 8B distill with some quantization would work well.
I tried out the 8B deepseek distill and found it pretty underwhelming - the responses were borderline unrelated to the prompts at times. The smallest model I got respectable output from was a 12B, and I could even run it at a somewhat usable speed.
Ah, that’s probably fair, I haven’t run many of the smaller models yet.