• @AustralianSimon
    English
    5
    1 year ago

    You can scrape Lemmy instances for training data without even running an instance.

    • @[email protected]
      English
      0
      edit-2
      1 year ago

      Yeah, sorry if I’m not great at communicating. That’s exactly what I’m trying to point out when I said:

      Even if we don’t federate with them, Meta can still harvest the data so we should add these protections regardless.

      • @AustralianSimon
        English
        1
        1 year ago

        That’s the thing, anything public is fair game. This is why Reddit is ruining their API.

        • @[email protected]
          English
          0
          1 year ago

          It’s not fair game for for-profit businesses training LLMs. That’s part of why Reddit made the move: so that companies would need to pay Reddit for access to the data to train models legally.

          • @AustralianSimon
            English
            1
            1 year ago

            They changed the terms and made the API paid for high-volume use. People using it to train models have already pillaged what they need, and you can get the pre-APIgeddon data elsewhere.

            • @[email protected]
              English
              0
              edit-2
              1 year ago

              Sure, but it’s still true that there are legal protections we can add that make it not fair game for Lemmy. At best it would be unfair game (illegal scraping of Lemmy).

              • @AustralianSimon
                English
                1
                1 year ago

                A rule for one Lemmy instance, or even the Lemmy app, doesn’t mean the same rule applies across the ActivityPub federation. If your data federated to my instance, it’s mine too.