• @[email protected]
      link
      fedilink
      English
      21 day ago

      Are they, though? You need shitloads of VRAM, or at least RAM, to get a usable experience.

      • @[email protected]
        link
        fedilink
        English
        51 day ago

        Not really. You can run the 8B model with like 6-8 GB of VRAM… It’s literally in the realm where people can run it on their phone.
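
        For reference, a minimal sketch of what running an 8B model locally can look like, assuming the `ollama` Python client (pip install ollama) and an Ollama install that already has an 8B tag pulled - the "deepseek-r1:8b" name below is an assumption:

        ```python
        import ollama

        # Ask a locally hosted ~8B model a question. Ollama serves
        # quantized weights, which is why ~6-8 GB of VRAM is enough.
        response = ollama.chat(
            model="deepseek-r1:8b",  # assumed tag; any ~8B model works the same way
            messages=[{"role": "user", "content": "What is VRAM? One sentence."}],
        )
        print(response["message"]["content"])
        ```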

    • @AdrianTheFrog · 3 points · 2 days ago

      the distilled, lighter models you can run easily; for the original it looks like you need at least 260 GB of RAM (napkin math below)

      this video gets a semi-usable experience with a $5500 CPU: https://www.youtube.com/watch?v=o1sN1lB76EA

      you could get the Thelio Astra to run it for like $6900 total and probably get similar performance, still cheaper than the base model Mac Pro lol

      for better speed you could probably buy a bunch of old Tesla GPUs on eBay, that might work
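
      To sanity-check the 260 GB figure: weight memory scales roughly as parameter count × bits per weight, plus runtime overhead (KV cache, activations). A quick Python sketch - pure arithmetic, and the quantization levels are illustrative assumptions:

      ```python
      def weight_gb(params_billions: float, bits_per_weight: float) -> float:
          """Approximate weight-only memory footprint in GB."""
          return params_billions * 1e9 * bits_per_weight / 8 / 1e9

      for label, params in [("8B distill", 8), ("70B distill", 70), ("671B original", 671)]:
          for bits in (16, 8, 4):
              print(f"{label:>13} @ {bits:2d}-bit: ~{weight_gb(params, bits):6.0f} GB")
      # The 671B original at 4-bit works out to roughly 335 GB of weights;
      # a ~260 GB figure would imply an even lower-bit quantization.
      ```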

      • Pup Biru · 2 points · 8 hours ago

        you don’t actually need to fit the whole model in RAM at once: the 70B, for example, “requires” something like 120 GB of VRAM, but I’m running it on my 64 GB M1 MBP - it just runs a bit slower (still very usable; I reckon about a word every 300 ms)
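
        That works because runtimes like llama.cpp memory-map the weights and let you offload only as many layers as fit on the GPU; the rest run from CPU/unified memory, which is where the slowdown comes from. A minimal sketch with llama-cpp-python, where the model path and layer count are assumptions:

        ```python
        from llama_cpp import Llama

        llm = Llama(
            model_path="models/70b-q4_k_m.gguf",  # hypothetical GGUF path
            n_gpu_layers=40,  # offload what fits; remaining layers stay on CPU
            n_ctx=4096,
        )
        out = llm("Say hello in five words.", max_tokens=16)
        print(out["choices"][0]["text"])
        ```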

      • @[email protected]
        link
        fedilink
        English
        21 day ago

        True, but who cares about the base models? Usefulness is what matters - the 8B model is pretty useful, better than the free tier of anything I’ve tried.

        Maybe the paid models are better… but just like adaptive cruise control, I refuse to rely on it until I can actually rely on it. I’m still the one driving - I know the top models still need me to drive them, so I’m happy with what I have… Why depend on something that could be taken away?

        • @AdrianTheFrog · 1 point · 16 hours ago

          I was trying the 14B model (q4_k_m quantized) on my 3060 recently, and while it is clearly stupider than ChatGPT (I tried asking it some things from old ChatGPT chats), it is much faster (20 tokens per second) and at least doesn’t suddenly become dumber once OpenAI decides you’ve had enough 4o time today on the free plan and switches the rest of your chat to whatever earlier model was available.
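
          For anyone wanting to reproduce that tokens-per-second number: Ollama’s generate response includes its own eval timing stats. A quick sketch with the `ollama` Python client, where the 14B tag is an assumption:

          ```python
          import ollama

          resp = ollama.generate(
              model="deepseek-r1:14b",  # assumed q4_k_m-quantized 14B tag
              prompt="Briefly explain what q4_k_m quantization means.",
          )
          # eval_count = tokens generated, eval_duration = nanoseconds spent
          print(f"~{resp['eval_count'] / resp['eval_duration'] * 1e9:.1f} tokens/s")
          ```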