Source: https://front-end.social/@fox/110846484782705013

Text in the screenshot from Grammarly says:

We develop data sets to train our algorithms so that we can improve the services we provide to customers like you. We have devoted significant time and resources to developing methods to ensure that these data sets are anonymized and de-identified.

To develop these data sets, we sample snippets of text at random, disassociate them from a user’s account, and then use a variety of different methods to strip the text of identifying information (such as identifiers, contact details, addresses, etc.). Only then do we use the snippets to train our algorithms, and the original text is deleted. In other words, we don’t store any text in a manner that can be associated with your account or used to identify you or anyone else.

We currently offer a feature that permits customers to opt out of this use for Grammarly Business teams of 500 users or more. Please let me know if you might be interested in a license of this size, and I’ll forward your request to the corresponding team.

    • @fcSolar · 14 points · 1 year ago

      Per their website, premium includes “Unlimited sentence paraphrasing powered by A.I.”, so I’m not sure they’re an appropriate alternative to avoid the “AI” bullshit.

      • @[email protected] · 18 points · 1 year ago · edited

        You can’t avoid the AI “bullshit”. It’s like saying you want to avoid this portable phone craze. It’s a tool.

        • @fcSolar · 6 points · 1 year ago

          I can avoid it like I’ve avoided cryptocurrency and NFTs. And it may be a “tool,” but it’s one built on the theft from and unpaid labor of tens of thousands of independent creators, and is nigh wholly controlled by corporate interests bent on eliminating those same independent creators whose data they stole to make their “tools.” It should not exist. Not until it can be made in an ethical manner without harming the creatives necessary to make it.

          • arglebargle · 6 points · 1 year ago · edited

            I don’t buy the theft argument. Was reading books to my daughter to help them learn how to read theft? When we were working on parameters in the 60s to help a computer identify a balloon vs. a dog, was that theft? The corpulent side of me says if you put something out in public spaces, people are going to learn from it. If you don’t want that, don’t share it.

            But even beyond that, parameters of learning are not copying, they are examples to develop data points on. Or in the case of imagery and something like stable diffusion it is math formulas developed in the 40s on how to make noise and then reverse that. Is that copying or theft?

            I am willing to have the argument that AI is full of pitfalls. And that corporate control is not a good thing. I am struggling to see this theft.

            • @[email protected] · 7 points · 1 year ago

              It isn’t theft because the technology fundamentally steals. It’s theft because the people in control of the technology fundamentally steal.

              I’m not talking about basement dwellers with a 3090 either. People using their M2 to generate lactating Joe Biden fanfic aren’t the problem; multi-billion-dollar companies taking advantage of the web’s openness to train models that will be used to sell generative services replacing the creators of the stuff those models were trained on are.

              It’s the enclosures all over again.

              Now when people speak out about it they’re called luddites and we don’t have the historical literacy to say “yes, I will smash this and any mill used to oppress me”.

              • arglebargle · -2 points · 1 year ago · edited

                I still do not see that as theft. Or at least no different than theft of labor like a company store.

                Corporate dominance, commercialization, exploitation, something along those lines. But that is the same as everything else, AI is not specifically the issue.

                Then again I was listening to Knowledge Fight and frankly the fact that people will believe DNA has antennas, or that a team of people on Real World cannot solve “what is 27 divided by 3” does not leave me much hope for us anyways. They tried and ran out of time saying it was unsolvable. Maybe we get what we deserve.

                • @[email protected] · 2 points · 1 year ago

                  people struggling for a way to express how massive incredibly powerful companies are literally building technology that will take their livelihoods away aren’t gonna develop a vocabulary for the exact thing that’s happening.

                  they’re stealing. it’s theft.

                  it doesn’t matter that the precise thing happening isn’t what we would legally call theft. the people saying ai is theft believe that their only chance to keep the corporate users of the technology from destroying their future and leaving them with nothing is to mobilize now using language that everyone understands. so they’re calling it theft.

                  those people are wrong btw, they can’t win even a tiny victory against the entire economic system that’s decided the way itll extract profit from the tech sector is by automating labor.

                  thats the difference between which side of the spear youre on.

                  if you got the point youre yelling “ahh, dont, youre literally killing me! please, if you have any humanity in you save me from this monster!” if you have the stick end youre yelling “oh fuck off, the severe hemorrhage is killing you, stop lying! check out what happens when i twist this sucker!” when youre on the sidelines youre just eating popcorn and arguing about minutiae like us.

                  to your last point, i’d be more worried about historical literacy. a calculator can act as a bandage for lack of numeracy, no machine will bridge the gap in understanding that pedantry mobilized against justice sails through.

          • @[email protected] · -12 points · 1 year ago · edited

            The whole system is built on exploitation. I don’t see you boycotting luxury clothes, diamonds, or rare metals that are made by exploiting someone from a third-world country to inhuman levels. Ah, yes, now that it could affect people you know, it’s immoral. I’m tired of this hypocrisy.

      • @[email protected] · 3 points · 1 year ago · edited

        I’m pretty sure most tools like this have to use ai to some degree to be more effective than something like Microsoft Word. I think the issue is more whether it’s opt in or not to include your own data.

      • @[email protected] · 9 points · 1 year ago

        I have. It’s pretty short and to the point. They’re based out of Germany so their requirements for clarity are pretty high by law. They go into quite a lot of detail about what is sent.

        In this case they send the date, time, language, processing time, and the number and type of errors, but not the text itself.

        However, they do have an optional feature that uses OpenAI to rephrase sentences so that might be training through the back door.

        I’ve been using it for years and have been very happy with the service.

    • @[email protected] · 3 points · 1 year ago

      I took a quick look at this and it seems that the server portion of this product is open source but the apps such as extensions are not. I’m not saying it’s bad or even that it’s a red flag. I just felt like I should point it out.

    • Frog-Brawler · -19 points · 1 year ago

      I appreciate you spreading open source alternatives, but this is one of those things that needs an HR solution; not IT.

        • Frog-Brawler · 7 points · 1 year ago

          I actually have no idea what I was talking about when I wrote that. That’s a bit troubling.

  • Michael · 93 points · 1 year ago

    Yeah Grammarly was selling all your data LONG before the AI showed up.

    Funny how some people are only nervous now that their data might be used to train a language model. I was always more worried about spooks! :)

    • Poggervania · 37 points · 1 year ago · edited

      Companies selling consumer data for profit and marketeering: i sleep

      Companies using consumer data to train AI models:
      R E A L S H I T

    • Eager Eagle · 17 points · 1 year ago

      True. Companies have been selling our data to third parties since forever, but some people are worried about it being used to train machine learning models? I’m far more concerned by people using it than by AI.

    • @[email protected] · 10 points · 1 year ago

      It’s because certain companies are stirring the pot and manipulating. They want people mad so they can put restrictions on training AI, to stifle the open source scene.

        • @[email protected] · 4 points · 1 year ago

          They even named their company specifically to make it harder for open source AI projects to name themselves. That’s some dedication.

          • cloaker · 2 points · 1 year ago

            I honestly thought they were foss hearing the name. Pretty awful lol

  • @[email protected] · 79 points · 1 year ago · edited

    I see you posted this article to 4 communities. According to the comments on this post, if you use the cross-post function (in the default web frontend), it will only show once in the feeds instead of 4 times (which can be a bit annoying).

    Thanks

    EDIT: post link and clarification regarding the UI

    • @[email protected] (OP) · 65 points · 1 year ago

      I did use the cross-post function. Most apps do not currently acknowledge this function which might explain why the article has appeared to you multiple times.

        • PorkRollWobbly · 18 points · 1 year ago

          What is this healthy communication?! Aren’t you supposed to go into the “what the fuck did you just say to me” ramble?

      • @[email protected] · 3 points · 1 year ago

        I see content from many servers in the lemmy federation. My understanding, which could be wrong, was that like email, you can post to any domain and see posts from other domains. What’s the advantage of posting to many instances?

  • @[email protected] · 23 points · 1 year ago

    How much do you have to pay for them to not monitor your every keystroke, including all your IP and passwords?

    Oh, that’s their business model, right.

  • fiat_lux · 21 points · 1 year ago

    Even as someone who declines all cookies where possible on every site, I have to ask: how do you think they are going to improve their language-based services without using language models or other algorithmic evaluation of user data?

    I get that the combination of AI and privacy has huge consequences, and that Grammarly’s opt-out limits are genuinely shit. But it seems like everyone is so scared of the concept of AI that we’re harming research on tools that can help us, while the tools that hurt us are developed without consequence, because their makers don’t bother with any transparency or announcements.

    Not that I’m any fan of Grammarly; I don’t use it. I think that might be self-evident though.

    • harmonea · 23 points · 1 year ago

      Framing this solely as fear is extremely disingenuous. Speaking only for myself: I’m not against the development of AI or LLMs in general. I’m against the trained models being used for profit with no credit or cut given to the humans who trained it, willing or unwilling.

      It’s not even a matter of “if you aren’t the paying customer, you’re the product” - massive swaths of text used to train AIs were scraped without permission from sources whose platforms never sought to profit from users’ submissions, like AO3. Until this is righted (which is likely never, I admit, because the LLM owners have no incentive whatsoever to change this behavior), I refuse to work with any site that intends to use my work to train LLMs.

      • @[email protected] · -2 points · 1 year ago · edited

        Models need vast amounts of data. Paying individual users isn’t feasible, and, like you said, most of it can be scraped.

        The only way I see this working is if scraped content is a no-go and you instead pay the website, publishing house, record company, etc., which kills any open source solution and doesn’t really help the users or creators that much. It also paves the way for certain companies owning a lot of our economy as we move towards an AI-driven society.

        It’s definitely a hot mess but the way I see it, the more restrictive we are with it, the more gross monopolies we create for no real gains.

        • harmonea · 9 points · 1 year ago

          Paying individual users isnt feasible

          Sounds like their problem to solve, not mine.

        • @[email protected] · 2 points · 1 year ago

          I don’t see why those are the only two options.

          We could update GPL, CC, etc. licensing so that it specifies whether the author intends to allow their work to be used for LLM training. And you could still put a non-commercial or share-alike constraint on it.

          Hooray, open source is saved while greedy grubby hands are thwarted.

          • @[email protected] · 0 points · 1 year ago

            What happens when every corporation and website closes their doors to AI? There isn’t any open source if we can’t use scraped information from Stack Overflow, GitHub, Reddit, etc.

            Sure, some users will opt out, but most won’t. Every single website will restrict access though, and then they will sell it to Google and Microsoft, who will be the only companies able to build AIs.

            • @[email protected] · 0 points · 1 year ago

              If I could predict what happens to the tech market when XYZ policy is enacted, I wouldn’t be posting on Lemmy during my tea breaks. Whatever policies end up sticking around, success is gonna require a lot of us having ideas, trying them out, and recombining them.

              But I’ll claim this about my personal metric of “success”: If the future of open source looks like copying the extractive data-mining model of big tech and hoping we can shove the entire history of human thought into a blender faster than them, I think we’ve failed.

        • @[email protected] · 1 point · 1 year ago · edited

          I mean, they’re not even giving credit or asking permission, which both cost nothing. Make a site where people can volunteer their own work; program the AI to generate a list of citations of all the works whose data it used when it provides output (I know this might be lengthy; that’s fine); and if you implement it into any sites or software, make it so that people can opt out of having their data used. It’s not that hard.

          • @[email protected] · 1 point · 1 year ago · edited

            Most of the data is scraped; it’s not up to the website. You can’t give a list of citations since it isn’t a search engine: it doesn’t know where the information comes from, and it’s highly transformative, melding information from hundreds if not thousands of different sources.

            If it worked only with volunteer work, there would simply be not enough data.

            Any law restricting data use in AI is only going to benefit corporations; there isn’t a solution for individual content creators. You can’t pay them for the drop in the bucket they add, the logistics are insane. You can let them opt out, but then you need to do the same for whole websites, which leads to a corporate hellscape where three companies own our whole economy since they are the only ones who can train AIs.

            • @[email protected] · 1 point · 1 year ago · edited

              Most of the data is scraped, it’s not up to the website.

              It is up to whoever runs the AI, and those are the people I’m addressing for the most part, though plenty of websites do have control over what data is fed to the AI they’re using. In Grammarly’s case it’s absolutely up to them whether there’s an option to opt out of having your work used to train the AI, as shown by the fact that they offer one with the business license. They just choose not to offer that option to other users.

              You can’t give a list of citation since it isn’t a search engine, it doesn’t know where the information comes from and it’s highly transformative, it melds information from hundreds if not thousand of different sources.

              It’s all code, the people coding it are 100% capable of programming it to keep track of where the information comes from. Even if it’s transformative, that doesn’t prevent it from keeping track of what was transformed.

              If it worked only with volunteer work, there would simply be not enough data.

              According to who? There are plenty of ways to get data from voluntary sources, just like we do for any number of studies. It’s just up to whoever runs the AI to put in the legwork to gather enough data that way, and there are lots of methods. You don’t have to just sit and wait for people to come to you and sign up, though based on the AI frenzy I bet they could have gotten plenty of data that way from people who are curious and want to contribute to AI training as a novel new concept. Making AI data gathering something people can opt in to or out of on websites is just one way of making it more ethical than forcibly taking that data without permission.

              Any law restricting data use in AI is only going to benefit corporations,

              I fail to see how requiring permission and offering the option to opt out of having your data used would benefit corporations. That just sounds like an excuse to not even try to regulate them.

              You can let them opt out, but then you need to do the same for whole websites which leads to a corporate hellscape where three companies own our whole economy since they are the only ones who can train ais.

              I don’t understand how part A leads to part B here. Why would those corporations have an advantage just because everyone with AIs, including them, has to offer the option to opt out? Also, it’s entirely possible to restrict the scope of an AI or regulate AI monopolies alongside regulating basic consent. Historically, a lack of regulation is what causes corporate hellscapes, because without something keeping them in check the larger companies will take advantage of their reach to do whatever they want on a larger scale, pushing out or merging with competitors. It’s not like requiring permission and providing opt-out would give them more of an advantage than they already have.

              • @QuaternionsRock · 1 point · 1 year ago

                It’s all code, the people coding it are 100% capable of programming it to keep track of where the information comes from. Even if it’s transformative, that doesn’t prevent it from keeping track of what was transformed.

                This is a fundamental misunderstanding of how LLMs actually work. Given a list of previous tokens, a complicated set of linear algebra and normalization operations is applied to yield the “probability” (in quotes because this is a dubious application of the word, imo) that each known token will follow it. The model is trained using an equally complicated regression algorithm that slowly adjusts the billions of linear algebra coefficients to more closely match the training data. RLHF is then used to make more adjustments that allow the AI to fulfill its intended purpose (e.g., to reinforce the question-answer format expected of ChatGPT).

                You may recall regression from your first statistics class. Even in the case of simple linear regression, when the input consists of millions of data points, it is essentially impossible to determine which point should be “credited” for any given aspect of the output line. The same is true for AI: you could maybe compile a list of training data that makes a token “likely” to appear after another token, but nothing more complex than that. It is very rare for a small set of sources to be responsible for a sequence longer than a few tokens.

                I do, however, believe they should be required to provide a very specific list of the sources used to train the model. I think it’s ridiculous to claim that generative AI is transformative in a practical sense: I can’t imagine it would be legal for companies to make endless photocopies of copyrighted material and have a computer make fancy scrapbooks out of it, even if “it’s a fledgling industry” or whatever.
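
The regression point in the comment above can be made concrete with a toy example. This is a deliberately simplified sketch (ordinary least squares on synthetic data, not an LLM): every data point contributes one term to two aggregate sums, the fitted slope and intercept are computed from those sums alone, and the individual contributions cannot be recovered from the finished model.

```python
import random

# Synthetic data: y = 2x + 1 plus Gaussian noise.
random.seed(0)
xs = [random.uniform(0, 10) for _ in range(100_000)]
ys = [2.0 * x + 1.0 + random.gauss(0, 0.5) for x in xs]

# Ordinary least squares. Each point contributes one term to the sums
# below; only the totals survive into the fitted coefficients.
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
    (x - mean_x) ** 2 for x in xs
)
intercept = mean_y - slope * mean_x

# The fit recovers roughly 2.0 and 1.0, but nothing in (slope, intercept)
# records which of the 100,000 points "deserves credit" for the result.
print(slope, intercept)
```

Asking which single point is responsible for the slope is like asking which drop filled the bucket; the analogous question for the billions of coefficients in an LLM is harder still.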

              • @[email protected] · -1 point · 1 year ago

                It depends on the kind of AI, but no, giving sources and building with only volunteer data is just not possible at our current technological level. I’m mostly talking about large LLMs because that’s what’s really at stake, and they train on huge amounts of data: like ALL of Stack Overflow, GitHub, Reddit, etc. Just fine-tuning them on a consumer level takes more than 50,000 question-and-answer pairs, and that’s just one tiny superficial layer added on top.

                Grammarly should absolutely add an opt-out option to gain consumers’ trust, but forcing the whole industry to do so would be a disaster.

                If individuals can opt out, so will websites, to “protect their users.” Then we get data hoarding, where Stack Overflow and GitHub close off all open source options but sell the data to the only ones who can still afford to build AIs, Microsoft and Google. It won’t include the data of certain individuals, the few who opt out, but I’m guessing eventually the opt-in will go directly into the terms of service of websites: you opt in or you fuck off.

                How does anyone except corporations benefit from this kind of circus? In 10 years, AI will be doing most office work. Google isn’t dumb and wants that profit. They and OpenAI have all the data, and they can strong-arm or buy whatever they’re missing. Restricting and legislating only widens their moat.

      • @[email protected] · -3 points · 1 year ago

        I’m against the trained models being used for profit with no credit or cut given to the humans who trained it.

        Sorry mate, hell’s gonna get cold before this happens. We’re talking about the biggest moth******ers on earth since always. Do you think Meta/[insert big tech company name here] will start to behave all of a sudden? These people literally KILL people every day for a profit (looking at you, Instagram).

        The only way to get something from these scumbags is fining them something like 100k per hour until they start respecting people’s privacy.

        • harmonea · 7 points · 1 year ago · edited

          I did already say I don’t expect this to ever change, so “sorry mate,” but you’re not exactly telling me anything I don’t know here.

          But I suspect this was a knee-jerk rant typed before bothering to read past what you quoted. Oh well. Good thing I can still stand against something even if I don’t expect it to change much.

          • @[email protected] · 3 points · 1 year ago

            Sorry if it sounded rude (and yeah, it was kind of a rant, sorry). What I’m trying to say is: these people do much worse things and don’t bother to say “sorry” publicly. The only way to make them behave is to fine them by a huge amount, just like Norway did.

            • harmonea · 5 points · 1 year ago

              Well, we can agree on that! Make paying contributors the cheaper option.

              I won’t hold my breath though. :')

  • @[email protected] · 13 points · 1 year ago

    They’re honestly doing you a favor. Grammarly is terrible. I’ve seen some of my friends whose first language isn’t English use it to try to clean their grammar up and it makes some really weird, often totally mistaken choices. Usually they would have been better off leaving it as they wrote it.

  • Cryptic Fawn · 10 points · 1 year ago

    I wonder if ProWritingAid is doing the same now. I always preferred them over Grammarly.

    • Eochaid · 8 points · 1 year ago · edited

      They have a free tier and a $10/mo tier and prominently advertise their AI without any information about privacy. Guaranteed you and your text are the product being used to train their AI.

  • @damnthefilibuster · 8 points · 1 year ago

    Any scope of privacy conscious users banding together to create a shell corp to pay for a business account? 500 users sounds doable. More the merrier, yeah?

    • @[email protected] · 11 points · 1 year ago

      Alternatively, you switch to LanguageTool because it does the same thing but it is privacy minded.

    • @[email protected] · 22 points · 1 year ago

      I am perfectly fine with providing training data for AI, and have actually spent hours contributing to various projects. However, it is super scummy for a company to collect and use sensitive user data (literally a keylogger) not only without any form of communication or consent, but where the only way to opt out is to pay.

    • @[email protected] · 18 points · 1 year ago

      Stuff like this should always be opt-in. It looks better on the company and builds trust.

      Ideally, offer payment for users who opt-in to have their writing scraped and used to train AI.

      Seems like this could easily be a win-win situation if they gave it a few seconds of thought.

    • @[email protected] · 18 points · 1 year ago

      Uhh umm… You are the product! Aaand… Shill for greedy corporations!

      I remember when Google said quite openly that they’d give us email addresses with more storage than we’d ever dreamed for life and in return, they’d scan the first few sentences of all messages and use them to target ads at us and we were all like, “Sounds fair.”

    • @xantoxis · 17 points · 1 year ago

      Why do you assume everyone wants this garbage? We were fine without it.

      • @[email protected] · 4 points · 1 year ago · edited

        That’s basically what people said about:

        • mobile phones
        • The web
        • computers
        • calculators

        I’d also argue that the customers of Grammarly want this, because they are paying for it. At least in the extension or app.

  • @[email protected] · 7 points · 1 year ago

    Think about this every time you or a project you contribute to uses Microsoft GitHub instead of an open source (or self-hosted) offering, or when folks contributing to your permissively licensed project live elsewhere while using Microsoft GitHub Copilot. All your projects, and that force-push history clean-up, now belong to the Microsoft-owned AI that sells itself back to the developers who wrote all the code it trained on: no compensation, no recognition.