Office space meme:

“If y’all could stop calling an LLM ‘open source’ just because they published the weights… that would be great.”

  • @randon31415
    22 days ago

    If the Source is Open to copying, and I won’t get sued for doing it, well, then…

    • Fushuan [he/him]
      22 days ago

      The source OP is referring to is the training data they used to compute those weights, meaning petabytes of text. Without that, we don’t know which content they used for training the model.

      The running/training engines might be open source, but the pretrained model isn’t, and claiming otherwise is wrong.

      Nothing wrong with it being this way; most commercial models obviously operate the same way. Just don’t claim that the model itself is open source, because a big part of open source is that people can reproduce your training to verify that there’s no foul play in the input data. We literally can’t. That’s it.