AI chat privacy question

@JRL · 1 year ago

AI chat privacy question

@perchance · edit-2 3 months ago

Prompts and responses are not stored - i.e. once the server sends back the response, the text you sent and the generated response text should exist on your computer only. However, note that, as @[email protected] says, it is a bad idea to put sensitive personal info (i.e. anything more than perhaps your first name) into any online service like this, even if you fully trust the person/company running it and they’ve assured you that it’s 100% private.

If you want to run AI text models (“LLMs”) completely privately, this is probably the best place right now to learn more: https://www.reddit.com/r/LocalLLaMA

You can stop reading now if the point is taken, but if not, here is (what accidentally became) a wall of text to scare you:

I recently realised that if there happens to be an error during inference, then the server “leaks” the prompt into the (temporary) server logs (this was actually fixed a few hours ago: https://github.com/huggingface/text-generation-inference/releases/tag/v1.2.0 ). This particular issue is basically harmless because the server logs themselves obviously aren’t public, and they’re not saved to a persistent drive, but it gives you an idea of the sort of thing that can occur even if you fully trust the provider and they’re acting benevolently.
If there’s a bug where e.g. someone’s requests are bringing down the server (tokenizer issues and stop-sequence handling have caused problems in the past), either accidentally or maliciously, investigating it often requires temporarily logging the problematic requests. It’s impossible for a provider to claim with 100% certainty that your data will never be seen by a human. This is just the nature of building a complex platform, debugging it, fending off malicious users, etc.
If you’re using a generator on Perchance that someone else made (i.e. not one that you have coded yourself), then note that generator authors can code up arbitrary interfaces with arbitrary logic, so they could send your inputs off to their own server. This is of course the same as any other page on the internet that accepts user input, although Perchance does have the slight advantage that all generators have publicly-viewable and non-obfuscated code (via the edit button in the top-right of the page), so it’s not as simple for a coder on Perchance to get away with this, especially if the generator has become decently popular - someone would notice. Either way, you’re running some random person’s code - by default you should assume it is unfriendly.
Services can get hacked, and the more popular they become, the bigger the target painted on them. If a service associates user accounts with chat data, then anyone who gets access to the database has a list of emails (or phone numbers, or whatever) with all the associated user data. All the benevolence in the world won’t make you immune to this. To be clear, Perchance does not associate user accounts with AI plugin requests - Perchance accounts are purely for people who want to build generators on Perchance - i.e. the only data that’s associated with a Perchance user account is the generators/pages that they’ve created. All requests to the AI plugin servers are anonymous, regardless of whether you’re logged in or not. But if you embed your personal info within those AI requests (i.e. in the text prompt) then nothing can save you, since a compromised server would mean the attacker could see all the data flowing through the server - i.e. prompts that include your sensitive personal info.
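
To make the first point above (prompts leaking into logs when inference errors out) a bit more concrete, here is a minimal Python sketch of error logging that records only metadata about a failed request. This is purely illustrative and is not Perchance's or text-generation-inference's actual code: the names `describe_request` and `run_inference`, the logger name, and the `generate` callable are all hypothetical.

```python
import hashlib
import logging

logger = logging.getLogger("inference")  # hypothetical logger name


def describe_request(prompt: str) -> dict:
    """Return loggable metadata about a prompt without the prompt text itself."""
    return {
        "prompt_chars": len(prompt),
        # A short digest lets repeated failures be correlated. Note that very
        # short prompts could still be guessed from a hash, so this is optional.
        "prompt_sha256_prefix": hashlib.sha256(prompt.encode("utf-8")).hexdigest()[:12],
    }


def run_inference(prompt: str, generate) -> str:
    """Call generate(prompt); on failure, log metadata only and re-raise."""
    try:
        return generate(prompt)
    except Exception as exc:
        # Log the exception *type* only: the exception message itself may
        # contain the prompt, which is the kind of leak described above.
        logger.error("inference failed: %s %s", type(exc).__name__, describe_request(prompt))
        raise
```

Even with something like this in place, the second point still applies: debugging a request that crashes the server may require looking at the request itself.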

TL;DR: You should limit the personal info that you put into web pages/apps on the internet, regardless of any assurances given, even if you trust the dev/company. Swap out personal info for fake stuff, and if that’s not possible, then it’s time to save up for a second-hand RTX 3090.
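
For the "swap out personal info for fake stuff" part, a rough Python sketch of scrubbing a prompt on your own machine before pasting it anywhere might look like this. The function name `scrub`, the two regexes, and the placeholder values are assumptions for illustration only; patterns like these catch just the most obvious emails and phone numbers, so treat the result as a first pass and not as a guarantee.

```python
import re
from typing import Dict, Optional

# Deliberately simple patterns (assumptions, not a complete PII detector).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def scrub(text: str, known_names: Optional[Dict[str, str]] = None) -> str:
    """Replace emails, phone-like numbers, and listed names with fake values."""
    text = EMAIL_RE.sub("user@example.com", text)
    text = PHONE_RE.sub("555-0100", text)
    # known_names maps a real name to the fake name that should replace it.
    for real, fake in (known_names or {}).items():
        pattern = rf"\b{re.escape(real)}\b"
        # A function replacement avoids backslash interpretation in `fake`.
        text = re.sub(pattern, lambda _m: fake, text, flags=re.IGNORECASE)
    return text
```

Example usage: `scrub("Mail jane.doe@corp.io, says Jane Doe.", {"Jane Doe": "Alex Smith"})` returns `"Mail user@example.com, says Alex Smith."`.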

While I’m here, @[email protected], IIRC someone asked a similar question (here or maybe on the /hub) about the text-to-image-plugin and I think you answered it, but just to be clear, the answer is the same as above: images are only stored if the user saves them to the gallery; otherwise they’re gone forever. I’ll link this comment on both the image and text AI plugins.

Some minor notes:

The (currently undocumented) responseObj.submitUserRating feature of the ai-text-plugin allows users to rate a response to help improve the AI. If you submit a rating on a generator that uses this feature (like the thumbs up/down on perchance.org/ai-chat), then the response that you’re rating is temporarily stored as part of a scoring process to determine which of several “candidate variations” of an LLM is best (varying settings, model, etc.). The generator author should make it clear that ratings should not be submitted for text that contains personal data - I may make the plugin automatically show an informational message about this in a future update.
Some aggregate statistics are collected about prompts that are flowing through the manager server - e.g. I started collecting the ratio of PG-13 to non-PG-13 requests on the image generation server so that I’d know if I accidentally broke my detection algorithm (such that e.g. it was flagging everything as NSFW, which happened once due to a regex error). Nothing in these aggregate statistics is even remotely private/personal. They’re just extremely high-level numerical statistics aggregated from literally millions of requests.
There’s also a script which tracks how many requests each IP has made in the past 2 days, which allows me to do rate limiting, and track abusive IPs. Again, prompts are not stored so there’s no association between IPs and prompts - it’s literally just a counter that says this IP made this many requests, and it’s cleared every 2 days.
As you probably know, the AI plugin servers are funded by ads. Perchance in general doesn’t have any ads, and has always been completely free, but the AI plugins are way too expensive to fund out of my own bank account. So if you’re not logged in, you’ll see ads on generators that use AI plugins. I figured it’s worth mentioning here that unlike basically every other ad-funded site on the internet, Perchance does not trust ads. Perchance has a sand-boxed separation between the actual generator/page contents (which live in an “iframe” - it’s basically like a separate browser tab embedded within the page) and the place where ad code runs, so ads cannot look at your chat/text/image/etc. data in order to guess at more relevant ads. Perchance uses a very reputable advertising company (the same one used by Reuters, Al Jazeera, and several other large companies), so the likelihood of shady ad tech is already extremely low, but there’s no need for any trust here, thanks to the sand-boxing that Perchance has. So, in terms of showing you more relevant ads, all they can possibly see is the URL of the page that you’re on. That’s the only thing that’s exposed to ad serving algorithms by visiting a Perchance page, no matter how much information you input into a Perchance generator/page.
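
The per-IP request counter mentioned in the notes above ("this IP made this many requests", cleared every 2 days) can be sketched in a few lines of Python. This is an assumption-laden illustration and not the real script: the class name `IpRequestCounter`, the `MAX_REQUESTS` limit, and the choice of a single fixed window that resets for everyone at once are all hypothetical. The point is only that such a counter needs nothing but an IP and a number, with no prompt text involved.

```python
import time
from collections import Counter

WINDOW_SECONDS = 2 * 24 * 60 * 60  # "cleared every 2 days", per the note above
MAX_REQUESTS = 10_000              # hypothetical limit, not a real Perchance value


class IpRequestCounter:
    """Counts requests per IP and wipes all counts once the window elapses."""

    def __init__(self, window=WINDOW_SECONDS, limit=MAX_REQUESTS, clock=time.monotonic):
        self._window = window
        self._limit = limit
        self._clock = clock          # injectable so the behaviour can be tested
        self._counts = Counter()     # maps IP string -> request count, nothing else
        self._window_start = clock()

    def allow(self, ip: str) -> bool:
        """Record one request from `ip`; return False once it exceeds the limit."""
        now = self._clock()
        if now - self._window_start >= self._window:
            # Window elapsed: forget every counter and start a new window.
            self._counts.clear()
            self._window_start = now
        self._counts[ip] += 1
        return self._counts[ip] <= self._limit
```

A server would call `allow(ip)` once per incoming request and reject the request (or flag the IP) when it returns `False`.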

@JRL · 1 year ago

Thank you! That answers my every question and more. I wish every service provider would be as transparent as this.

april · 1 year ago

Assume any service offering AI is logging everything. The only way to be sure is to run the model on your own hardware.

VioneT · 1 year ago

Pinging dev @[email protected].