• @brucethemoose

    It turns out these clusters are being used very inefficiently: Qwen 2.5 was trained on a fraction of the GPUs and is clobbering models that came out of much larger clusters.

    One could say Facebook, OpenAI, X and such are “hoarding” H100s, but they feel no pressure to use them efficiently because they are so unconstrained on GPUs.

    Google is an interesting case, as Gemini is improving quickly, but they presumably train on their own much cheaper, more efficient TPUs.