• @buddascrayon
    35 days ago

    when the data used to train the AI is copyrighted, how do you make it open source? it’s a valid question.

    It is actually possible to reveal the source of training data without showing the data itself. But I think this goes a bit deeper, since I’ll bet all of my teeth that the training data they’ve used is literally the 20 years of Facebook interactions and entries that they have just chilling on their servers. Literally 3+ billion people’s lives are the training data.

    • @kava
      15 days ago

      Literally 3+ billion people’s lives are the training data.

      yep. I never thought about it but you’re absolutely right. that is Facebook’s “competitive advantage” that the other AI companies don’t have.

      although that’s only part of it. I’m sure they also use scraped web pages, novels, movie transcripts, college textbooks, research papers, newspapers, etc.