• @brucethemoose
    11 days ago

    IMO it's not really “enough” until the bus is 256-bit. That's when 32B–72B-class models start to look even theoretically runnable at decent speeds.
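    Why bus width matters: token generation is mostly memory-bandwidth bound, since each generated token streams roughly the whole weight set from memory. A rough sketch of the arithmetic (the bus widths, memory speed, and quantized model size below are illustrative assumptions, not measurements):

```python
# Rough, bandwidth-bound estimate of LLM decode speed.
# All hardware/model numbers are illustrative assumptions.

def decode_tokens_per_sec(bus_bits: int, mem_mts: float, model_gb: float) -> float:
    """Each generated token reads ~all weights once, so
    tokens/s ~= memory bandwidth / model size."""
    bandwidth_gbs = (bus_bits / 8) * mem_mts / 1e3  # bytes/transfer * MT/s -> GB/s
    return bandwidth_gbs / model_gb

# 128-bit vs 256-bit bus, LPDDR5X-8533, 32B model quantized to ~18 GB
narrow = decode_tokens_per_sec(128, 8533, 18)  # roughly 7-8 tok/s
wide = decode_tokens_per_sec(256, 8533, 18)    # roughly 15 tok/s
```

    Doubling the bus roughly doubles the ceiling, which is what pushes a 32B-class model from sluggish into usable territory.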

      • @brucethemoose
        1 day ago

        Also that is a very low context test. A longer context will bog it down, even setting aside the prompt processing time.

        On the other hand, you could probably squeeze out a bit more by running OpenVINO instead of llama.cpp, so it's still respectable.
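        One reason longer context bogs things down is the KV cache: every token adds keys and values for every layer, on top of the extra attention compute. A back-of-envelope sketch (the layer/head counts are assumptions for a Llama-3-70B-style model with grouped-query attention, not measured values):

```python
# Back-of-envelope KV-cache size vs. context length.
# Architecture numbers below are assumptions for a 70B-class
# model with grouped-query attention, not measured values.

def kv_cache_bytes(ctx_len, n_layers=80, n_kv_heads=8, head_dim=128, elem_bytes=2):
    per_token = 2 * n_layers * n_kv_heads * head_dim * elem_bytes  # K and V
    return ctx_len * per_token

# fp16 cache: ~320 KB per token, so ~2.5 GiB at 8k context
print(kv_cache_bytes(8192) / 2**30)
```

        That extra traffic competes with the weights for the same limited bandwidth, so a short-context benchmark flatters the real-world number.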

        • @[email protected]
          1 day ago

          > …a very low context test. A longer context will bog it down…

          Yeah, it's definitely not good enough for user-facing work, but when I'm developing something like translations, being able to see the 70B output and compare it to other models is super useful before I send the job off to something that costs more money to run.

          9/10 times, the bigger model isn’t significantly better for what I’m trying to do, but it’s really nice to confirm that.