A new web crawler launched by Meta last month is quietly scraping the web for AI training data

lemme in · 6 months ago


@[email protected] · edited 6 months ago

The AI cat is out of the bag. How do they know they’re not feeding AI generated garbage into their models?

Actually, I think I’m gonna go onto my personal website and add 200 pages of locally generated LLM garbage, with hidden links to those pages that only bots should follow.
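Something like this rough sketch would probably do it. Big caveats: the word list is just a stand-in for real LLM output, the `/junk/` path and every function name here are made up for illustration, and there's no guarantee any particular crawler actually follows links hidden with `display:none`.

```python
from pathlib import Path
import random

# Hypothetical word list standing in for locally generated LLM output.
WORDS = ["quantum", "banana", "ledger", "therefore", "moist", "paradigm", "gravel", "sings"]


def filler_text(n_words: int, rng: random.Random) -> str:
    """Return n_words of random filler; swap in real model output if you have it."""
    return " ".join(rng.choice(WORDS) for _ in range(n_words))


def build_pages(out_dir: Path, n_pages: int = 200, seed: int = 0) -> list[str]:
    """Write n_pages filler pages into out_dir and return their file names."""
    rng = random.Random(seed)
    out_dir.mkdir(parents=True, exist_ok=True)
    names = []
    for i in range(n_pages):
        name = f"page-{i:03d}.html"
        body = filler_text(300, rng)
        (out_dir / name).write_text(
            f"<!doctype html><html><body><p>{body}</p></body></html>",
            encoding="utf-8",
        )
        names.append(name)
    return names


def hidden_links(names: list[str], prefix: str = "/junk/") -> str:
    """Build an HTML snippet of links that browsers will not render for visitors."""
    # tabindex="-1" and aria-hidden keep keyboard and screen-reader users out of it.
    links = "".join(f'<a href="{prefix}{n}" tabindex="-1">{n}</a>' for n in names)
    return f'<div style="display:none" aria-hidden="true">{links}</div>'
```

Run `build_pages(Path("site/junk"))` once, then paste the output of `hidden_links(...)` into a page template. Human visitors never see the links, but anything that parses the raw HTML and follows every `href` might.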

@[email protected] · 6 months ago

How do they know they’re not feeding AI generated garbage into their models?

They don’t. Any popular place on the internet that lets users type text for people to publicly view is now full of AI trash. They’ve fucked it. This shit is just gonna spiral into progressively worse garbage.

@[email protected] · 6 months ago

They screwed the artificial pooch, in a manner of speaking.