• @PlutoniumAcid · 20 points · 1 month ago

    So if the Chinese version is so efficient and open source, couldn't OpenAI and Anthropic run the same thing on their huge hardware and get enormous capacity out of it?

    • @Jhex · 9 points · 1 month ago

      Not necessarily… if I gave you my "faster car" to run on your private seven-lane highway, you could squeeze out every last bit of speed the car gives, but no more.

      DeepSeek works as intended on 1% of the hardware the others allegedly "require" (allegedly — remember, this is all a super hype bubble). If you run it on more powerful machines, it will perform better, but only to a certain extent; it will not suddenly develop more or better capabilities just because the hardware it runs on is better.

      • @PlutoniumAcid · 3 points · 1 month ago

        This makes sense, but it would still allow a hundred times more people to use the model without running into limits, no?
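The arithmetic behind this can be sketched in a few lines (all numbers below are hypothetical, and this assumes serving is purely compute-bound, which is a simplification — real inference is often memory-bandwidth-bound):

```python
# If each query needs 1/100th of the compute, a fixed fleet serves
# ~100x the concurrent users at the same per-query quality.
fleet_flops = 1e18          # hypothetical total fleet throughput, FLOP/s
flops_per_query_old = 1e15  # hypothetical cost per query, old model
efficiency_gain = 100       # the "1% of the hardware" claim

queries_per_sec_old = fleet_flops / flops_per_query_old
queries_per_sec_new = fleet_flops / (flops_per_query_old / efficiency_gain)

print(queries_per_sec_old)  # 1000.0
print(queries_per_sec_new)  # 100000.0
```

So yes: the gain shows up as throughput (more users served), not as a smarter model.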

      • @merari42 · 2 points · 1 month ago

        Didn't DeepSeek solve some of the data-wall problems by creating good chain-of-thought data with an intermediate RL model? That approach should combine with the tried-and-tested scaling laws — just throw much more compute at it.
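The core of that data-generation idea can be sketched as rejection sampling: have the intermediate model produce reasoning traces and keep only those whose final answer verifies. This is a toy sketch with a stand-in model, not DeepSeek's actual pipeline:

```python
# Hypothetical sketch: sample chain-of-thought traces from a stand-in
# "intermediate RL model", verify the final answers, and keep the
# survivors as supervised training data.

def draft_model(problem, seed):
    # stand-in for an RL-tuned reasoning model; returns (chain, answer)
    return f"reasoning trace for {problem} (sample {seed})", eval(problem)

problems = ["2+2", "3*7"]
dataset = []
for p in problems:
    for seed in range(4):              # sample several traces per problem
        chain, answer = draft_model(p, seed)
        if answer == eval(p):          # keep only verified traces
            dataset.append({"problem": p, "cot": chain, "answer": answer})

print(len(dataset))  # 8
```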

    • @AdrianTheFrog · 9 points · 1 month ago

      OpenAI could use less hardware to get similar performance if they used the Chinese version, but they already have enough hardware to run their model.

      Theoretically, the best move for them would be to train their own, larger model using the same techniques (so as to still fully utilize their hardware), but this is easier said than done.
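One way to size such a larger run is the Chinchilla-style rule of thumb (training compute C ≈ 6·N·D, with roughly 20 training tokens per parameter). A rough approximation, not anyone's actual recipe:

```python
import math

def compute_optimal(c_flops):
    """Chinchilla-style rough sizing: C = 6*N*D with D ~ 20*N,
    so C ~ 120*N^2. Returns (params, training tokens)."""
    n_params = math.sqrt(c_flops / 120)
    d_tokens = 20 * n_params
    return n_params, d_tokens

# e.g. Chinchilla's own budget, ~5.76e23 FLOPs -> ~70B params, ~1.4T tokens
n, d = compute_optimal(5.76e23)
print(f"{n:.2e} params, {d:.2e} tokens")
```

The point being: a more efficient architecture shifts where the optimum lands, but picking the model size and training it are two very different problems.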

    • @Yggnar · 1 point · 1 month ago

      It's not multimodal, so I'd imagine it wouldn't be worth pursuing in that regard.

      • @merari42 · 1 point · 1 month ago

        Doesn't DeepSeek work on that, though, with their Janus models?