It also depends on your use case. I have 10 GB of VRAM and local LLMs work OK for spell/style checking, idea generation, and name generation (naming planet clusters thematically in Star Ruler 2).
A local model gives me functional code snippets for around 3 out of 4 questions in any major language and most minor ones. I also get good summaries of what code does if I paste up to around 1k lines into the context, and fun collaborative story writing in different formats using themes from my own science fiction universe. I explored smaller models in hopes of fine-tuning before I discovered how useful a much larger but quantized model can be. I never use anything smaller than a 70B or 8×7B because, for my uses, there is no real comparison. On my hardware, these generate text at close to my reading pace.
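The code-summarization part needs almost no tooling. A minimal sketch of what that looks like, assuming a local server with an OpenAI-compatible API (llama.cpp's server or Ollama, for example); the port, model name, and file path here are placeholders for whatever your own setup uses:

    # Sketch: ask a local OpenAI-compatible server to summarize a source file.
    # Endpoint, model name, and file path are assumptions; substitute your own.
    from pathlib import Path
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

    source = Path("some_module.py").read_text()  # ~1k lines fits in context

    resp = client.chat.completions.create(
        model="local-model",  # many local servers ignore or remap this name
        messages=[
            {"role": "system",
             "content": "Summarize what this code does, function by function."},
            {"role": "user", "content": source},
        ],
        temperature=0.2,
    )
    print(resp.choices[0].message.content)

Same pattern works for the spell/style checking and naming prompts; only the system message changes.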
Image and video generation is where you really hit the VRAM bottleneck. I have a 12 GB 4070 and it cannot generate any video despite my best efforts and tweaks.