Project Analyzing Human Language Usage Shuts Down Because ‘Generative AI Has Polluted the Data’

Stopthatgirl7 · 6 months ago

@T156 · 6 months ago

At least in theory, you could still do NLP on online sources, but the sheer amount of work needed to make sure you've filtered out the bots makes it infeasible.

Not just that: a growing number of sites block the tools researchers use, or deploy countermeasures against them, which adds even more work.

Several years ago, it would have been easy and cheap to noodle up a quick Twitter or Reddit bot to churn through posts and spit them out the other side. These days, you need to pay for that, and in some cases, pay quite a lot.

X (formerly known as Twitter), for example, wants to charge $100/month for basic API access, and Reddit wants $0.24 per 1,000 API calls.
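
To get a rough sense of scale for prices like these, here is a minimal Python sketch of the arithmetic. The function names and the one-million-call workload are made up for illustration, and the rate passed in ($0.24 per 1,000 calls, which is what Reddit announced) is just an argument you can swap for whatever rate applies. It ignores rate limits, tier caps, and free allowances, so treat the numbers as ballpark only.

```python
def metered_cost(calls: int, price_per_batch: float, batch_size: int) -> float:
    """Dollar cost of `calls` API calls billed at `price_per_batch` per `batch_size` calls."""
    return calls / batch_size * price_per_batch


def break_even_calls(flat_fee: float, price_per_batch: float, batch_size: int) -> float:
    """Number of metered calls that would cost the same as a flat monthly fee."""
    return flat_fee / price_per_batch * batch_size


# Hypothetical workload: one million calls in a month (an assumption, not a measured figure).
monthly_calls = 1_000_000

# Metered pricing at $0.24 per 1,000 calls.
print(f"Metered cost: ${metered_cost(monthly_calls, 0.24, 1_000):.2f}")

# How many metered calls add up to a $100/month flat fee.
print(f"Break-even vs. $100 flat: {break_even_calls(100.0, 0.24, 1_000):,.0f} calls")
```

A language-analysis project that wants to sample posts continuously can burn through call volumes like that quickly, which is where the "quite a lot" comes from.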

You can scrape, of course, but that risks getting you banned, assuming you don't run into other barriers first. The website formerly known as Twitter, for example, no longer lets you see parent tweets or replies unless you're logged in.