Source: https://front-end.social/@fox/110846484782705013
Text in the screenshot from Grammarly says:
We develop data sets to train our algorithms so that we can improve the services we provide to customers like you. We have devoted significant time and resources to developing methods to ensure that these data sets are anonymized and de-identified.
To develop these data sets, we sample snippets of text at random, disassociate them from a user’s account, and then use a variety of different methods to strip the text of identifying information (such as identifiers, contact details, addresses, etc.). Only then do we use the snippets to train our algorithms, and the original text is deleted. In other words, we don’t store any text in a manner that can be associated with your account or used to identify you or anyone else.
We currently offer a feature that permits customers to opt out of this use for Grammarly Business teams of 500 users or more. Please let me know if you might be interested in a license of this size, and I’ll forward your request to the corresponding team.
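For anyone wondering what "sample at random, disassociate from the account, strip identifying information" could look like in practice, here is a minimal sketch. It is not Grammarly's actual pipeline, which isn't public in the quoted text. The function names, the placeholder tokens, and the two regex patterns are all assumptions made up for illustration, and real de-identification needs far more than this.

```python
import random
import re

# Hypothetical patterns for illustration only; they catch just the most
# obvious emails and phone numbers.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def scrub(text):
    """Replace obvious contact details with placeholder tokens."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)


def build_training_snippets(records, k, seed=None):
    """records is a list of (account_id, text) pairs.

    Samples k records at random, drops the account id, and scrubs the text.
    """
    rng = random.Random(seed)
    sampled = rng.sample(records, k)
    return [scrub(text) for _account_id, text in sampled]


records = [
    ("acct-1", "Mail me at jane.doe@example.com tomorrow."),
    ("acct-2", "Call +1 (555) 010-9999 after lunch."),
    ("acct-3", "Their going to the store."),
]
snippets = build_training_snippets(records, k=2, seed=0)
print(snippets)
```

Note that this only removes what the patterns recognize. A snippet like "my neighbour at 12 Elm Street" would pass through untouched, which is roughly why people in this thread are skeptical of "anonymized".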
so “we know businesses would leave in droves so of course we don’t train on their data, but you customers are stupid and so we’ll train on all of your private shit”
A true classic
Gang, I hate to tell you this but this is what we mean when we say “you are the product” especially with free offerings.
But if you hate that I have a worse thing to introduce you to: the internet. If you respond to this comment, or any comment on any Lemmy instance or other federated service or website or blog… your words can be consumed, copied and used to train whatever anyone wants. It is trivially easy to create web scrapers with just a bit of coding knowledge. These days it’s pretty easy to then use that data to train AI models. To a computer, it’s just data.
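To give a sense of how little code the "scraper" part takes, here is a rough sketch using only Python's standard library. It pulls paragraph text out of an HTML string; the sample page and the `ParagraphText` / `extract_paragraphs` names are made up for illustration, and fetching real pages (plus respecting robots.txt and site terms) is left out on purpose.

```python
from html.parser import HTMLParser


class ParagraphText(HTMLParser):
    """Collects the text found inside <p> elements."""

    def __init__(self):
        super().__init__()
        self._in_p = False
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_p = False

    def handle_data(self, data):
        # Text inside nested tags such as <b> is still part of the paragraph.
        if self._in_p:
            self.paragraphs[-1] += data


def extract_paragraphs(html):
    parser = ParagraphText()
    parser.feed(html)
    parser.close()
    return [p.strip() for p in parser.paragraphs if p.strip()]


# Hypothetical page content standing in for a downloaded comment thread.
page = (
    "<html><body><h1>Thread</h1>"
    "<p>First comment.</p>"
    "<p>Second <b>bold</b> comment.</p>"
    "</body></html>"
)
print(extract_paragraphs(page))
```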
Grammarly is a product where you give it bad grammar and it gives you good grammar. Grammarly, like many products, gets better over time when it can understand what went wrong so its teams can make it right. This can often include any text entered into the program. I don’t know the specifics but they should be outlined in the privacy policy. A company using data it already has to train AI makes sense, especially if it anonymizes that data. It may not be ethical given that users weren’t aware of AI at the time they accepted the privacy policy, but with American capitalism a company can change a privacy policy and you can opt out if you don’t like it.
That’s why we all have lawyers on retainer to read and translate all privacy policies for all websites and applications we interact with on a daily basis. Right? That’s normal, right?
I will say, could this support person have meant that an organization with 500+ employees gets a custom AI model trained on only the organization’s 500+ accounts? Because that would be better, and likely more ethical too.
If that’s not the case and any content you have put into Grammarly is being used to train AI, then I guess it’s time to stop using Grammarly then, huh? But it’s also time to stop posting anything on the web, too. Oh, and don’t publish anything, ever.
Or, you could go with the flow. This data is mixed with millions of other accounts… sort of like what happened when ChatGPT trained on anything you’ve already put out there. The only real concern I could see is if you discussed a very specific thing or invented your own personal coded style of writing and used it so much that, even among the millions of other users, it dominated the corpus and skewed the training model. Say there are only 5 Grammarly users and you are number 5… you keep talking about “procorpia” being “mass sledge”, generating hundreds of entries with thousands of tokens (“words”). By contrast let’s say the other 4 Grammarly users only used it a few times a month to send short emails. Now, after training, the 6th Grammarly user misspells a word as “procorpia” and Grammarly generates “procorpia is totes mass sledge brah”. Suddenly, your secret is out.
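The five-user scenario above can be put into numbers with a toy sketch. This is nothing like how a real language model is trained: it just counts each user's share of the tokens and which word most often follows "procorpia". The corpus contents and function names are invented for illustration.

```python
from collections import Counter


def token_share_by_user(corpus):
    """corpus maps user -> list of texts. Returns each user's fraction of all tokens."""
    counts = Counter()
    for user, texts in corpus.items():
        counts[user] = sum(len(t.split()) for t in texts)
    total = sum(counts.values())
    return {user: n / total for user, n in counts.items()}


def most_common_follower(corpus, word):
    """Most frequent token that follows `word` anywhere in the corpus (a crude bigram count)."""
    followers = Counter()
    for texts in corpus.values():
        for t in texts:
            tokens = t.lower().split()
            for a, b in zip(tokens, tokens[1:]):
                if a == word:
                    followers[b] += 1
    return followers.most_common(1)[0][0] if followers else None


# Four light users and one heavy user with a private vocabulary.
corpus = {
    "user1": ["see you at noon"],
    "user2": ["thanks for the update"],
    "user3": ["please find the report attached"],
    "user4": ["running late today"],
    "user5": ["procorpia is mass sledge"] * 200,
}
shares = token_share_by_user(corpus)
print(round(shares["user5"], 2))
print(most_common_follower(corpus, "procorpia"))
```

With millions of users instead of five, one person's share shrinks toward zero, which is the "you are probably fine" case.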
If, on the other hand, you speak the same broken English as the rest of us, you are probably fine.
Good assessment. I am leery of companies using my data for AI training in some ways, but overall I understand that data is data to the AI model and it neither knows nor cares who I am and what I say. Also, after anonymization, sanitization, and cleaning, most data sets look like noise to the casual observer. Even someone who knows what they are looking at usually has to take some time to get their head around a format, so it isn’t like I’m really worried about some human looking at my data in the dataset.
My issue is, and will always be, data brokers. The instant that data set is sold to some broker who can rather trivially de-anonymize it by cross-referencing it with other data in their possession, and then turn around and sell it to god only knows who (fuck politicians using data brokers to target specific people), I have some serious problems. So I guess the collection and internal utilization of data is not a problem for me, it is what some greedy little shit looking to make his Q2 report bottom line decides to do with it that worries me.
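The cross-referencing worry is usually called a linkage attack, and a bare-bones version is just a join on shared attributes. The sketch below assumes the "anonymized" rows still carry quasi-identifiers (here a zip code and birth year, both chosen arbitrarily for the example) and that the broker holds the same fields next to names. All data and field names are hypothetical.

```python
def link_records(anonymized, broker):
    """Re-attach names to rows whose quasi-identifiers match exactly one broker entry."""
    index = {}
    for person in broker:
        key = (person["zip"], person["birth_year"])
        index.setdefault(key, []).append(person["name"])

    linked = []
    for row in anonymized:
        names = index.get((row["zip"], row["birth_year"]), [])
        if len(names) == 1:  # a unique match singles out one person
            linked.append({**row, "name": names[0]})
    return linked


# "Anonymized" data set: no names, but quasi-identifiers remain.
anonymized = [
    {"zip": "90210", "birth_year": 1984, "snippet": "draft resignation letter"},
    {"zip": "10001", "birth_year": 1990, "snippet": "note to landlord"},
]
# Data the broker already holds from other sources.
broker = [
    {"name": "A. Example", "zip": "90210", "birth_year": 1984},
    {"name": "B. Sample", "zip": "10001", "birth_year": 1990},
    {"name": "C. Person", "zip": "10001", "birth_year": 1990},
]
print(link_records(anonymized, broker))
```

Only the first row is re-identified here, because two broker entries share the second row's zip and birth year. The more attributes a data set leaks, the fewer people share any given combination.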
Great call-out! Probably because most organizations with over 500 users can’t allow exfiltration of data for security reasons (banks, gov, corp secrets). From my understanding most, if not all, AI data sets are built from our data.
Google and OpenAI were caught using (or openly admitted to using) web spiders to build AI datasets, crawling public data from Wikipedia, Facebook, Reddit and more. Google is even using data from Google Assistant and other opt-in data “to improve their products”.
I haven’t looked into Microsoft, but they are also using OpenAI, as they are a major investor in OpenAI.
The whole privacy thing started ten years ago already. People talk about it, protest about it, but at the end of the day they log into Facebook and use Messenger “because it’s easy”. Why would they take us seriously?
The only social media I’ve ever had is Reddit and Lemmy. Whenever I tell people that I don’t have the others they always say, “WhAt dO YoU hAvE tO HiDe,” as if I’m a criminal mastermind plotting to take over the world or something 🙄
Privacy is meant to be a human right, but it is increasingly obvious that privacy is actually user-pays in modern society.
This might be a good time to push open source alternatives like LanguageTool
Just scrolled past a post mentioning that not all parts of it are in fact open source
From my understanding, their free tier is fully open source, but their paid features are closed source. I could be wrong.
I think it’s still the better alternative out there, so I’m still recommending it. (the free tier is more than enough for most people)
Maybe, something about a new update putting things behind closed doors. It’s still a viable free option imo. I hate that our data is fair game in so many places
I wonder if this violates GDPR or something similar