You could potentially run some smaller MoE models, since they don't take up too much memory while running. I'd suspect the DeepSeek R1 8B distill with some quantization would work well.
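For what it's worth, something like this is roughly how I'd try it with llama-cpp-python (just a sketch; the GGUF filename and Q4_K_M quant are placeholders for whatever build you actually download):

```python
from llama_cpp import Llama

# Load a quantized GGUF of the R1 8B distill from a local path.
# Path and quant level are assumptions -- substitute your own file.
llm = Llama(
    model_path="./DeepSeek-R1-Distill-Llama-8B-Q4_K_M.gguf",
    n_ctx=4096,       # context window; shrink it to save memory
    n_gpu_layers=-1,  # offload all layers to GPU if one is available
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain KV caching in one paragraph."}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```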
I tried out the 8B DeepSeek and found it pretty underwhelming; the responses were borderline unrelated to the prompts at times. The smallest model I got respectable output from was the 12B, which I was even able to run at a somewhat usable speed.
Ah, that's probably fair; I haven't run many of the smaller models yet.