• @brucethemoose
    27 days ago

    You can’t offload a “usable” LLM to an integrated GPU without tons of memory bandwidth and plenty of RAM. It’s just not physically possible.
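
A rough way to see why bandwidth and RAM are the hard limits: in a dense model, generating each token means streaming essentially all of the weights from memory once. So the weights have to fit in RAM, and decode speed is capped at roughly bandwidth divided by model size. This is a back-of-envelope sketch under stated assumptions. It assumes a dense model, ignores KV-cache traffic and compute limits, and uses a hypothetical 100 GB/s bandwidth figure rather than a measurement of any real chip. The helper names are illustrative only.

```python
# Back-of-envelope estimate for local LLM inference (hypothetical helpers).
# Assumptions: dense model, weights read once per generated token,
# KV cache and compute limits ignored, so results are upper bounds.

def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate RAM needed just to hold the weights, in GB."""
    return params_billion * bits_per_weight / 8


def max_tokens_per_sec(bandwidth_gb_s: float, model_gb: float) -> float:
    """Ceiling on decode speed: bandwidth divided by bytes read per token."""
    return bandwidth_gb_s / model_gb


if __name__ == "__main__":
    bandwidth = 100.0  # GB/s, an assumed illustrative figure
    for params in (7, 70):
        size = weights_gb(params, bits_per_weight=4)
        speed = max_tokens_per_sec(bandwidth, size)
        print(f"{params}B @ 4-bit: {size} GB, <= {round(speed, 1)} tokens/s")
```

With those assumed numbers, a 4-bit 7B model needs about 3.5 GB and tops out near 28.6 tokens/s, while a 4-bit 70B model needs about 35 GB and caps out near 2.9 tokens/s. Doubling bandwidth doubles the ceiling, which is why memory bandwidth, not raw compute, is usually the bottleneck here.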

    You can run small models like Phi pretty quickly, but I don’t think people will be satisfied with them for Copilot, even as basic autocomplete.

    IMO, offloading becomes viable at roughly 2x the performance of Intel’s current iGPUs, and that’s exactly what AMD and Apple are producing.