Oof… At work we deal with clients whose projects are covered by NDAs and confidentiality agreements, among other things. This is bad enough if the information scanned is siloed per organization, as it could create a situation where somebody not under NDA could access confidential client info leaked by an LLM that ingested every PDF in Adobe’s cloud service without regard to distribution. Even worse if they’re feeding everything back into a single global LLM – corporate espionage becomes as simple as a bit of prompt engineering!
I highly doubt that they would be able to use private user data for training. Using data available on the internet is a bit legally grey, but using data that is not publicly available would surely be illegal. When the document is “read” by the LLM it is no longer training, so it won’t store the data and be able to regurgitate it.*
* that is, if they have designed this in an ethical and legal way 🙃
They will use every scrap of data you haven’t explicitly told them not to use, and they will make it so that the method to disable these ‘features’ is little known, difficult to understand/access, and automatically re-enabled every release cycle. When they are sued, they will point to announcements like this and the one or two paragraphs in their huge EULA to discourage, dismiss, and slow down lawsuits.
I suspect that they will explicitly advertise that they won’t be using any data for training. Just like Microsoft Copilot enterprise (or whatever it’s called) and Bing chat enterprise.
Companies absolutely know the risk with these systems and will never allow or buy a system that scans and saves their data.
I had a second part of my comment that I left off because I felt like I was hitting the point too hard, but…
I have firsthand knowledge of an organization that’s a GCC tenant. That’s the government cloud, and in mid-2022 Microsoft rolled out a product called Microsoft Viva without first consulting platform admins. They just pushed it out into M365, activated and enabled. A personalized, automated email was sent to every person in the org, from Microsoft.com, with snippets of emails deemed to be follow-up items by “Cortana”, which platform admins had disabled on every computer in the org. It was pretty clear that Microsoft had exfiltrated government data, analyzed it, and then sent emails to users regarding their analysis.
Platform admins did find a way to disable it within a few days, and leadership sent out an email characterizing the episode as a misconfigured, early-release feature to assuage concerns. They promised to get to the bottom of it with Microsoft, and nothing was ever heard about it again.

Then earlier this year: multiple pushes of consumer apps and features that are not on the GCC roadmap. Automatic installs of New Teams, which thankfully displays a message that the user isn’t licensed for it, but that creates IT tickets because it auto-launches and stops classic Teams from auto-launching. Lots of user confusion there. New Outlook, which didn’t support data classification, multiple mailboxes, or many of the features that make Outlook useful. It’s been a huge boondoggle as users enable New Outlook and then don’t know how to switch back to a working version of Outlook. Recently everyone’s Power BI began failing to launch because Microsoft rolled out a OneDrive/SharePoint integration without testing it. Same with HP Print Manager.
My point in all that is not just to compile a laundry list of Microsoft failures (I have a list for Adobe, too); it’s to establish that updates are not vetted, and are often just pushed into the wrong update channels.
When pressed, it’s always a ‘configuration error’ or an accidental early release. A bug, or what have you. The line from annoying to dangerous will be crossed quickly once these companies start training AI on the PII and government data they’ve procured through the sloppy deployment practices they’re already engaged in.
I guarantee you that rogue hackers and nation states alike are working on fuzzing every AI dataset they can, to see if it picked up anything juicy. Once Adobe gets their hands on everyone’s scanned health record, classified documents, and credit card application, we’re going to see an endless stream of ‘whoopsies.’
All the ones I’ve seen that are aimed at companies have explicit terms that protect your data and don’t allow it to be shared anywhere.
But that’s just like, a suggestion, man.
And it’s kind of predicated on their admins being highly proactive about data protection, because the vendors certainly aren’t.
- that is, if they have designed this in an ethical and legal way 🙃
This is Adobe we’re talking about…
A corporation that charges a monthly subscription for products it could sell outright, offers them to students at the age when they’re most likely to build habits around them, and uses a proprietary storage format that only works well with its own products.
Once you get a customer addicted, you’ve got them for life.
Does the AI include a feature that converts the bloated, non-functional hulk of an application that is Adobe Acrobat into a usable, fit-for-purpose PDF viewer/writer/editor with a consistent interface? Ooh, I really hope it does, that would be really helpful.
Check out SumatraPDF. When I started a job with a ton of random PDF paperwork to fill out, I needed to find something to use. It’s awesome. And free.
If you just need to fill out forms, you can just use Firefox (and probably Chrome).
I need to do drawing markups. Bluebeam does a good job, but my current company refuses to get it and insists that Acrobat Pro is functional. I feel like that’s a decision made by someone who never has to use Acrobat Pro.
Wow, that’s stupid. I just looked it up and it costs a few hundred per year, which is probably way less than the time you waste using a bad tool. If I were your manager, I’d get it for you.
zathura or evince ftw
“To learn more about the capabilities, and whether or not we’ll allow you to disable them,…”
I hope governments around the world punish this kind of espionage by publicly banning these “AI assisted” products in government organizations for being a (national) security threat. Their PR needs to be in the dirt.
Paperless-ngx is an awesome way to self host your documents.
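For anyone curious, getting it running is basically two containers. A minimal sketch based on the project’s Docker deployment; the port, volume paths, and image tags here are example values you’d adjust (and the docs recommend adding PostgreSQL for anything beyond a small archive):

```yaml
# Minimal Paperless-ngx stack: the web app plus the Redis broker it requires.
services:
  broker:
    image: docker.io/library/redis:7
    restart: unless-stopped

  webserver:
    image: ghcr.io/paperless-ngx/paperless-ngx:latest
    restart: unless-stopped
    depends_on:
      - broker
    ports:
      - "8000:8000"            # web UI at http://localhost:8000
    environment:
      PAPERLESS_REDIS: redis://broker:6379
    volumes:
      - ./data:/usr/src/paperless/data        # index + database
      - ./media:/usr/src/paperless/media      # original and archived documents
      - ./consume:/usr/src/paperless/consume  # drop PDFs here to ingest them
```

Drop a PDF into the consume folder and it gets OCR’d, tagged, and made searchable, all on your own hardware, with nothing leaving your machine.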
Brian Krebs, one of the most well known security bloggers:
Ya know, AI has really pushed the Cyber Crime field years into the future! Adobe made an excellent decision adding it to their suite of technology used by businesses around the world!
It’s almost like they don’t have enough money already.
At work, this is actually a very useful thing.
Useful for exposing PII, sure
For the past 24 hours, I’ve been arguing with somebody who insists AI should be given our political, religious, and brand biases in order to tailor a search engine that will only show us results we are comfortable with.
In a privacy community.
Things are going to get dumber before they get better.
I admire your optimism about it ever getting better. Are we sure that things won’t keep getting dumber before we all get used to it?
It’s a win-win… If I’m wrong, nobody will be smart enough to remember. 😉
Oh look what I stumbled upon: a blatant lie. Misrepresenting other people’s opinions to score virtual internet points is a habit, I see. Dude, there isn’t even karma here…
Nobody claimed that you “should” do anything. However, you don’t understand the concept of privacy and how it relates to one’s agency, so maybe read up before crying about things getting dumber.
Nobody attacked your sovereignty. I take issue with your attempts to seduce other people into using a product from a corporation that wants to harvest incredibly valuable private data.
You aren’t unique.
The privacy community recently clowned on a similar piece of propaganda: it says “I don’t care about my privacy,” but the argument is really “you shouldn’t care about your privacy, either.”
You still completely misrepresent my opinion. Let’s also remember that the product you base your whole argument on doesn’t currently harvest any data, and doesn’t intend to do so in the foreseeable future.
You:
a corporation that wants to harvest incredibly valuable private data.
Them: https://m.youtube.com/watch?v=DRVY-74lkBA&pp=ygUUa2FnaSBjb21tdW5pdHkgZXZlbnQ%3D
Minute 50:30
I quote:
what is your opinion on opt-in tracking?
we don’t need to do any tracking so I don’t have an opinion. I don’t know why you would track users.
Your whole argument is based on a product that does not exist yet, and if it existed it could absolutely exist in a privacy-respecting way, depending on who designs the technology and what incentives they have (what if I could run it locally? What if I could self-host? Etc.).
So I take issue, first of all, with your dishonest way of representing my opinion and smearing my reputation, especially since I care about and invest a lot into privacy. I also take issue with the fact that you are basing your whole criticism on your own interpretation of one sentence in a manifesto while ignoring all the data points that contradict your view.
You called “VC funded” a company that is (according to them; maybe you have better sources) 98% owned by employees, for example.
So yeah, I take issue with your gross disregard for reality and for facts. At this point you are basically a tinfoil-hat conspiracy theorist who is purposefully twisting my opinion to make your position seem reasonable. I would never use a product that ‘harvests’ my data to build a bubble around me, for example. You asked exactly this in your last comment in the conversation, I answered it directly, and yet you still misrepresented my opinion as “suggests that we should give data,” which is completely false.
And if I care, it’s because I can’t wait for more companies to adopt business models that don’t require them to fuck users over for their data, and that can afford to have privacy as one of their core values. If companies that do this succeed, there’s hope that more will do the same.
So ultimately I don’t give a fuck about the trend you think you’re seeing within the privacy world. It doesn’t give you any right to take my opinion and twist and misrepresent it so that you can feel like your statistics or perception is right.
“I don’t care about my privacy” but the argument is “you shouldn’t care about your privacy, either.”
This is not an argument I believe, it’s not an argument I made nor support. So why would I care?
You want to know what I believe? I believe that privacy means that people can choose to give pieces of their data, whichever they want, whenever they want. If that data is used only for purposes they agree with and for nothing else, and if they understand fully the implications, and if that data is not given, sold, or accessed by other parties which were not intended, then this is a privacy respecting service.
Compare it with the very same screenshot in this thread: you want to use this service? You need to give data to us to train AI. I say, fuck this. The lack of agency and the fact that the data is given for the benefit of the company exclusively makes it completely different.
I can’t be clearer than this, so if you fail to acknowledge the fact that you grossly misrepresented my opinion, I need to conclude that you are discussing purely in bad faith.
P.S. I am now blocking you. I have no interest in discussing further with someone whose ego is bigger than their intellectual honesty. You seem to care more about being the white knight than about understanding. Everything that had to be said has been said, and anybody who wants to form a proper opinion can check our history and read what was actually said.
Holy shit lol
Yeah, at this point I think you should raise a white flag.
If the AI software is on-premises/offline or completely turned off by default. Anything else is espionage.
Ideally, consider using open-source software if possible to avoid potential bait-and-switches. But understandably, this is not always a practical option.
I will let our IT/legal/cyber security department deal with that.