For about half a year I stuck with using 7B models and got a strong 4 bit quantisation on them, because I had very bad experiences with an old qwen 0.5B model.

But recently I tried running a smaller model like llama3.2 3B with 8bit quant and qwen2.5-1.5B-coder on full 16bit floating point quants, and those performed super good aswell on my 6GB VRAM gpu (gtx1060).

So now I am wondering: Should I pull strong quants of big models, or low quants/raw 16bit fp versions of smaller models?

What are your experiences with strong quants? I saw a video by that technovangelist guy on youtube and he said that sometimes even 2bit quants can be perfectly fine.

  • @j4k3
    link
    English
    214 hours ago

    I prefer a middle ground. My favorite model is still the 8 x 7b mixtral and specifically the flat/dolphin/maid uncensored model. Llama 3 can be better in some areas but alignment is garbage in many areas.

    • Smorty [she/her]OP
      link
      fedilink
      English
      214 hours ago

      Yeaaa those models are just too large for most people… You gotta have 56GB of VRAM to run an 8bit quant, which most people don’t have a quarter of.

      Also, what specifically do you mean by alignment? Are you talking about finetuning or instruction alignment?

      • @j4k3
        link
        English
        1
        edit-2
        13 hours ago

        deleted by creator

        • Smorty [she/her]OP
          link
          fedilink
          English
          212 hours ago

          Another user @[email protected] commented about there being a way to split it between GPU and CPU. Are you talking about this nvidia only and windows only thingy, which only works with the proprietary driver? If so, I’m really not gonna use that…

          Have you tried some of the abliterated models? They work really nicely even for the spiciest of topics. They literally can’t refuse your instruction, so they just go ahead and do what you want. But maybe even these models are too narrow for your specific application…

          • @j4k3
            link
            English
            19 hours ago

            deleted by creator