With how prevalent the AI and data scraping conversation has become
You realize that “conversation” is fake, right? There is no increased load on Twitter, Reddit, or other web services due to “AI data scraping”. That was made up to distract from the material causes of Twitter’s failure, namely:
most of their engineers were laid off or quit
they don’t pay their bills
Big tech companies that already run search engines already have a copy of all public Web pages, which they use for search engine indexing. They don’t need to make a second copy for AI training; they can just use the same one.
Google can train Bard with the same copy of the public Web that they use to create Google Search; same with Microsoft, Baidu, or any other big company that runs a search engine.
And for everyone else, there’s Common Crawl.
“Fake” from the side of data load, sure, I can see that, but there’s plenty of interest in trying to stave off the “dead internet” by incorporating new systems where bots and AI-generated content aren’t profitable. That’s more what I was referring to.