Someone Made a Dataset of One Million Bluesky Posts for 'Machine Learning Research'

Stopthatgirl7 · 2 months ago


@gcheliotis · 2 months ago

The real question here is why the researcher “librarian” didn’t even attempt to anonymize the dataset before making it available. Full anonymization isn’t a trivial task, but at least removing unique identifiers or replacing them with randomly generated ones would be good practice.
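To make the "replace them with randomly generated ones" suggestion concrete, here is a minimal sketch of that step. It assumes each post is a dict and that the author identifier lives in a field called `author_did`; that field name, the `user_` prefix, and the sample records are all hypothetical and not taken from the actual dataset. This only pseudonymizes the identifier field. Post text can still contain handles, mentions, or other identifying details, which is part of why full anonymization is harder.

```python
import secrets


def pseudonymize(posts, id_field="author_did"):
    """Return copies of the posts with each distinct identifier replaced
    by a random token. The same author always gets the same token, so
    per-author grouping still works, but the original ID is dropped.

    `id_field` is a hypothetical field name used for illustration.
    """
    mapping = {}  # original identifier -> random replacement
    result = []
    for post in posts:
        original = post[id_field]
        if original not in mapping:
            # 8 random bytes -> 16 hex characters, not derived from the original ID
            mapping[original] = "user_" + secrets.token_hex(8)
        # Build a new dict so the input records are left unmodified
        result.append({**post, id_field: mapping[original]})
    return result


# Made-up sample records, for illustration only
sample = [
    {"author_did": "did:plc:aaa", "text": "first post"},
    {"author_did": "did:plc:bbb", "text": "another author"},
    {"author_did": "did:plc:aaa", "text": "second post"},
]

for row in pseudonymize(sample):
    print(row)
```

The mapping is kept only in memory and discarded afterwards; if it were saved alongside the data, the replacement would be trivially reversible.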