• theunknownmuncher
    7 days ago

    “I don’t think any of that is true. Show me data.” *is shown data* “I won’t accept that data!” Lol. Lmao even.

    Yeah, I’m not going to play this game of trying to anticipate which numbers you’re willing to accept and which you aren’t. You have the same access to a search engine as I do. All of the results I have seen align with the numbers that Qwen released and are well within the margin of error.

    This model’s release caused such a stir because it reproducibly meets or beats Claude Opus 4.5 while being locally runnable. If you won’t believe it, okay, I don’t care. 🤷

      • theunknownmuncher
        7 days ago

        I run 27b at q8 with an unquantized KV cache and 256k context on two Instinct MI60 GPUs. It’s definitely the best model I have been able to run locally at a reasonable speed. 35b generates tokens about as fast as you’d expect from a cloud provider. 27b is slower than 35b, of course, but token generation is still faster than my reading speed and fast enough for use with coding agents.
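For anyone wondering how a setup like this fits in VRAM, here is a rough back-of-the-envelope sketch. It is only an estimate under stated assumptions: q8 is taken as about 8.5 bits per weight (llama.cpp-style Q8_0, which the comment does not actually specify), the unquantized KV cache is taken as 16-bit, and the layer count, KV head count, and head dimension are hypothetical placeholders rather than the real model config. Models that use full attention in only some layers would need less cache than this formula suggests.

```python
# Rough VRAM estimate for a locally hosted LLM: weights plus KV cache.
# The architecture numbers in the example call are HYPOTHETICAL placeholders,
# not the actual configuration of any specific model.

GIB = 2**30


def weights_gib(n_params: float, bits_per_weight: float) -> float:
    """Memory for the weights alone, in GiB."""
    return n_params * bits_per_weight / 8 / GIB


def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 context_tokens: int, bytes_per_elem: int = 2) -> float:
    """KV cache for one sequence, in GiB. The leading 2 covers keys and values."""
    total_bytes = 2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_elem
    return total_bytes / GIB


if __name__ == "__main__":
    # Assumption: ~8.5 bits per weight for an 8-bit block quantization.
    w = weights_gib(27e9, 8.5)
    # Assumption: placeholder architecture, 16-bit cache, 256k-token context.
    kv = kv_cache_gib(n_layers=64, n_kv_heads=4, head_dim=128,
                      context_tokens=256 * 1024)
    print(f"weights ~{w:.1f} GiB, KV cache ~{kv:.1f} GiB, total ~{w + kv:.1f} GiB")
```

Swap in the numbers from the model's own config file to get a figure worth comparing against the combined memory of the two cards.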

          • theunknownmuncher
            7 days ago

            The wattage is actually relatively low compared to a lot of current-gen GPUs (mainly NVIDIA ones). They are software-capped to 225W, but the GPUs can handle 300W. Compare that to the 5090, which is around 600W.
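To put those figures side by side, a tiny sketch of the arithmetic follows. It assumes the two-card setup mentioned earlier in the thread and uses only the per-card limits quoted above. The function name is made up for illustration, and these are power limits, not measured draw.

```python
def total_power_limit_w(gpu_count: int, per_gpu_limit_w: float) -> float:
    """Combined power limit across identical cards, in watts (a ceiling, not measured draw)."""
    return gpu_count * per_gpu_limit_w


if __name__ == "__main__":
    capped = total_power_limit_w(2, 225)    # two cards at the software cap
    uncapped = total_power_limit_w(2, 300)  # two cards at the limit they can handle
    print(f"capped: {capped:.0f} W, uncapped: {uncapped:.0f} W")
```

So two capped cards together stay under the roughly 600W quoted for a single 5090, and match it only with the cap lifted.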