Wanting to profit from AI companies' hunt for training data (over and above the community that created that data) is a big part of what set the stage for the recent migration away from Reddit. How will the fediverse approach this problem?

  • @sachasageOP
    1 year ago

    Fair, but then there’s a line between scraping through ordinary traffic and using API access to gather large data sets.

    • key
      1 year ago

      Is there? The effect is the same. Use machine learning to parse HTML generically and throw hardware and a pool of IPs at it. That's a lot more efficient than coding an API client for every service out there. It’s the same approach search engines use.

      I don’t see anything being done effectively without legal protections.