• lurch (he/him)
    English
    19 days ago

    a list of references to the training data, plus what they added, would be the bare minimum to call it open source, in my opinion, but a lot of people are stricter about this than i am.

    • archomrade [he/him]
      English
      29 days ago

      None of the flagship models publish their training data because they’re all trained on less-than-legal datasets.

      It’s a little like complaining that Jellyfin doesn’t publish any media with their code: not only would that be illegal, but it’s implied that you’re responsible for obtaining your own.

      If you’re someone who can and does compile and re-train your own 64B-parameter LLMs, you almost certainly have your own dataset for that purpose (in fact, Hugging Face hosts many).

      • lurch (he/him)
        English
        18 days ago

        still doesn’t make it magically open source.

        debian would probably split the package into a non-free part and an open source part, for this reason.