Thanks to Samantha Cole at 404 Media, we are now aware that Automattic plans to sell user data from Tumblr and WordPress.com (which is the host for my blog) for “AI” products. In respon…
So I guess there are two paths to training data: a company selling it explicitly, and companies just scraping publicly accessible data. Not that either is “good”, but at least with public data, only the AI company profits.
Yep. That’s why two of the things I say Automattic MUST do to make things right are about proper consent controls for Automattic’s use of data and its sale to AI vendors, while the third is a proposed proactive defense against scrapers.