• @GamingChairModel
      English
      57
      1 month ago

      “Whistleblows” as if he’s some kind of NVIDIA insider.

    • Eager Eagle
      English
      49
      1 month ago

      I bet he just wants a card to self-host models and not give companies his data, but the amount of VRAM is indeed ridiculous.

      • Jeena
        English
        25
        1 month ago

        Exactly, I’m in the same situation now, and the 8 GB on those cheaper cards doesn’t even let you run a 13B model. I’m trying to find out whether I can run a 13B one on a 3060 with 12 GB.
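
The question of whether a 13B model fits in 8 GB or 12 GB can be roughed out with simple arithmetic. The sketch below is only a back-of-the-envelope estimate: the figure of about 4.85 bits per weight for a Q4_K_M-style quantization and the 1.5 GB allowance for context and runtime overhead are assumptions, and `estimate_weights_gb` and `fits` are hypothetical helper names, not part of any tool mentioned in this thread.

```python
def estimate_weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (10^9 bytes)."""
    total_bits = params_billion * 1e9 * bits_per_weight
    return total_bits / 8 / 1e9


def fits(params_billion: float, bits_per_weight: float,
         vram_gb: float, overhead_gb: float = 1.5) -> bool:
    """True if weights plus an assumed overhead fit in the given VRAM."""
    needed = estimate_weights_gb(params_billion, bits_per_weight) + overhead_gb
    return needed <= vram_gb


if __name__ == "__main__":
    # Assumption: roughly 4.85 bits per weight for a Q4_K_M-style quantization.
    for vram in (8, 12):
        verdict = "fits" if fits(13, 4.85, vram) else "does not fit"
        print(f"13B model on a {vram} GB card: {verdict}")
```

Under these assumptions the weights alone come to just under 8 GB, which is why an 8 GB card falls short while a 12 GB card has headroom. Real usage depends on the context length and the runtime.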

        • The Hobbyist
          English
          15
          1 month ago

          You can. I’m running a 14B deepseek model on mine. It achieves 28 t/s.

          • Jeena
            English
            6
            1 month ago

            Oh nice, that’s faster than I imagined.

          • @levzzz
            English
            4
            1 month ago

            You need a pretty large context window to fit all the reasoning. Ollama uses a 2048-token context by default, and a larger window uses more memory.
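
For anyone wanting to raise the context window mentioned above, here is a minimal sketch that sets the `num_ctx` option on a request to Ollama's `/api/generate` endpoint. It assumes a local Ollama server on its default port and that the `deepseek-r1:14b` tag is already pulled. `build_payload` and `generate` are hypothetical helper names, and 8192 is an arbitrary example value, not a recommendation from the thread.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(model: str, prompt: str, num_ctx: int = 8192) -> dict:
    """Build a non-streaming generate request with a custom context window."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"num_ctx": num_ctx},  # context length in tokens
    }


def generate(model: str, prompt: str, num_ctx: int = 8192) -> dict:
    """Send the request and return the decoded JSON response."""
    data = json.dumps(build_payload(model, prompt, num_ctx)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

Calling `generate("deepseek-r1:14b", "Why is the sky blue?", num_ctx=8192)` would request the larger window. As noted above, a bigger window costs more memory, so on a 12 GB card it is worth checking that the model still fits entirely on the GPU.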

          • @[email protected]
            English
            2
            1 month ago

            I also have a 3060. Can you detail which framework (SGLang, Ollama, etc.) you are using and how you got that speed? I’m having trouble reaching that level of performance. Thx

            • The Hobbyist
              English
              4
              1 month ago

              Ollama, latest version. I have it set up with Open-WebUI (though that shouldn’t matter). The 14B is around 9GB, which easily fits in the 12GB.

              I’m repeating the 28 t/s from memory, but even if I’m wrong it’s easily above 20.

              Specifically, I’m running this model: https://ollama.com/library/deepseek-r1:14b-qwen-distill-q4_K_M

              Edit: I confirmed I do get 27.9 t/s, using default ollama settings.
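
One way to check a tokens-per-second figure like this yourself is to compute it from the timing fields Ollama returns with a non-streaming response: `eval_count` (generated tokens) and `eval_duration` (in nanoseconds). The sketch below assumes those two fields are present. `tokens_per_second` is a hypothetical helper name, and the sample numbers are made up for illustration, not measurements from this thread.

```python
def tokens_per_second(response: dict) -> float:
    """Generation speed from an Ollama response.

    eval_count is the number of generated tokens and
    eval_duration is the generation time in nanoseconds.
    """
    return response["eval_count"] / response["eval_duration"] * 1e9


if __name__ == "__main__":
    # Illustrative values only: 500 tokens generated in 20 seconds.
    sample = {"eval_count": 500, "eval_duration": 20_000_000_000}
    print(f"{tokens_per_second(sample):.1f} t/s")
```

Prompt processing is reported separately from generation, so this number reflects generation speed only.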

              • @[email protected]
                English
                2
                1 month ago

                Ty. I’ll try Ollama with the Q4_K_M quantization. I wouldn’t expect to see a difference between Ollama and SGLang.

              • Jeena
                English
                2
                1 month ago

                Thanks for the additional information, that helped me decide to get the 3060 12G instead of the 4060 8G. They cost almost the same, but from what I gather the 3060 12G fits my use cases better even though it is a generation older: the memory bus is wider and it has more VRAM. Both video editing and the smaller LLMs should work well enough.

        • @[email protected]
          English
          4
          1 month ago

          I’m running deepseek-r1:14b on a 12GB rx6700. It just about fits in memory and is pretty fast.

      • newcockroach
        English
        8
        1 month ago

        “Some hentai games are good” -Edward Snowden

        • @Siegfried
          English
          2
          30 days ago

          Note that this is from 2003

    • ඞmir
      English
      10
      1 month ago

      I’ll keep believing this is a post from The Onion