What is Cara, the Instagram alternative that gained 600k users in a week?

ekZepp · 8 months ago

@doodledup · 8 months ago

I don’t understand how this Glaze thing is supposed to stop AI being trained on the art.

@General_Effort · 8 months ago

It’s not. It’s supposed to target certain open source AIs (Stable Diffusion specifically).

Latent diffusion models work on compressed images, which takes fewer resources. The compression is handled by a type of neural network called a VAE (variational autoencoder). For this attack to work, you must have access to the specific VAE that you are targeting.

The image is subtly altered so that the compressed image looks completely different from the original. You can only do that if you know what the compression AI does. Stable Diffusion is a necessary part of the Glaze software. It is ineffective against any closed source image generators that have trained their own VAE (or equivalent).
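A toy sketch of the idea, not Glaze's actual code: two random linear maps stand in for two independently trained VAE encoders (`W_known`, `W_other`, `cloak`, `latent_dist` and all sizes are made up for illustration). Signed-gradient projected gradient descent nudges each pixel by at most `eps` so that the known encoder's latent moves toward a decoy image's latent. The same perturbation is then checked against the encoder it was never optimized for.

```python
import numpy as np

rng = np.random.default_rng(0)
D, K = 256, 16  # toy sizes: pixels per "image", values per latent

# Assumption: random linear maps stand in for two separately trained encoders.
W_known = rng.normal(size=(K, D)) / np.sqrt(D)
W_other = rng.normal(size=(K, D)) / np.sqrt(D)

def cloak(x, decoy, W, eps=0.05, steps=200, lr=0.01):
    """Signed-gradient PGD: pull W @ (x + delta) toward W @ decoy, keeping |delta| <= eps."""
    z_decoy = W @ decoy
    delta = np.zeros_like(x)
    for _ in range(steps):
        # Gradient of the squared latent distance with respect to delta.
        grad = 2.0 * W.T @ (W @ (x + delta) - z_decoy)
        # Take a signed step, then project back into the eps box.
        delta = np.clip(delta - lr * np.sign(grad), -eps, eps)
    return x + delta

def latent_dist(W, a, b):
    """Euclidean distance between the latents of a and b under encoder W."""
    return float(np.linalg.norm(W @ a - W @ b))

x = rng.uniform(size=D)      # the "artwork"
decoy = rng.uniform(size=D)  # what the encoder should "see" instead
x_cloaked = cloak(x, decoy, W_known)

# How much closer to the decoy each encoder's latent got.
gain_known = latent_dist(W_known, x, decoy) - latent_dist(W_known, x_cloaked, decoy)
gain_other = latent_dist(W_other, x, decoy) - latent_dist(W_other, x_cloaked, decoy)
print("max pixel change:", float(np.abs(x_cloaked - x).max()))
print("moved toward decoy (targeted encoder):", gain_known)
print("moved toward decoy (other encoder):", gain_other)
```

In this toy setup the latent should move clearly under the targeted encoder and hardly at all under the other one. Real encoders are nonlinear, so take this as intuition only.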

This kind of attack is notoriously fickle and thwarted by even small changes. It’s probably not even very effective against the intended target.

If you’re all about intellectual property, it kinda makes sense that freely shared AI is your main enemy.

@[email protected] · edit-2 8 months ago

deleted by creator

@go_go_gadget · 8 months ago

“if it’s human-viewable it’ll also be computer-viewable”

Sort of. If you raise a person to look at thousands of pictures of random pixels and say “that’s a fox” or “that’s not a fox”, eventually they’ll make up a pattern to say if the random pixels are a fox or not. Meanwhile someone raised normally will take one look and go “that’s just random pixels, it’s not a picture of anything”. AI is still in that impressionable stage. So you feed it garbage and it doesn’t know it’s garbage.
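A toy version of that thought experiment, under made-up assumptions: random pixels, coin-flip labels, and a plain least-squares linear model (all names and sizes below are illustrative). With more pixels than training pictures, the model can fit the meaningless labels perfectly, yet it should do no better than chance on fresh random pictures.

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, n_pixels = 50, 2000, 200

def random_pictures(n):
    """Random pixels with coin-flip 'fox' (+1) / 'not a fox' (-1) labels."""
    return rng.uniform(size=(n, n_pixels)), rng.choice([-1.0, 1.0], size=n)

X_train, y_train = random_pictures(n_train)
X_test, y_test = random_pictures(n_test)

# Minimum-norm least-squares fit; with more pixels than pictures it can
# reproduce every training label even though the labels mean nothing.
w, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

train_acc = float(np.mean(np.sign(X_train @ w) == y_train))
test_acc = float(np.mean(np.sign(X_test @ w) == y_test))
print("train accuracy:", train_acc)
print("test accuracy:", test_acc)
```

The model "made up a pattern" for the training set, but the pattern says nothing about new pictures.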

@General_Effort · 8 months ago

I’m sure it works fine in the lab. But it really only targets one specific AI model; that one specific Stable Diffusion VAE. I know that there are variants of that VAE around, which may or may not be enough to make it moot. The “Glaze” on an image may not survive common transformations, such as rescaling the image. It certainly will not survive intentional efforts to remove it, such as appropriate smoothing.
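A rough illustration of the smoothing point, assuming the perturbation behaves like small per-pixel noise (that is an assumption; the real Glaze perturbation is not described here, and `box_blur`, `art` and `cloak` are made-up names). A crude 3x3 mean filter wipes out most of such noise while leaving a smooth image almost untouched.

```python
import numpy as np

rng = np.random.default_rng(0)

def box_blur(img, k=3):
    """k x k mean filter with edge padding (a deliberately crude smoother)."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

# Smooth toy "artwork": a diagonal gradient with values in [0, 1].
h = w = 64
yy, xx = np.mgrid[0:h, 0:w]
art = (xx + yy) / (h + w - 2)

# Assumed stand-in for a cloak: random +/- eps noise on every pixel.
eps = 0.03
cloak = eps * rng.choice([-1.0, 1.0], size=art.shape)

# Blurring is linear, so this difference is exactly the blurred cloak.
residual = box_blur(art + cloak) - box_blur(art)

noise_kept = float(np.linalg.norm(residual) / np.linalg.norm(cloak))
art_changed = float(np.linalg.norm(box_blur(art) - art) / np.linalg.norm(art))
print("fraction of cloak surviving the blur:", noise_kept)
print("relative change to the artwork itself:", art_changed)
```

A perturbation built from lower-frequency structure would survive a blur like this better, so this only shows why noise-like cloaks are fragile.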

In my opinion, there is no point in bothering in the first place. There are literally billions of images on the net. One locks up gems because they are rare. This is like locking up pebbles on the beach. It doesn’t matter if the lock is bad.

“Saw a post on Bluesky from someone in tech saying that eventually, if it’s human-viewable it’ll also be computer-viewable, and there’s simply no working around that, wonder if you agree on that or not.”

Sort of. The VAE, the compression, means that the image generation takes less compute; i.e., cheaper hardware and less energy. You can have an image generator that works on the same pixels that are visible to humans. Actually, that’s simpler and existed earlier.
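To put a rough number on that saving: as far as I know, the Stable Diffusion 1.x VAE shrinks each side of the image by a factor of 8 and keeps 4 latent channels. Taking those figures as assumptions (the function name is made up), the arithmetic is:

```python
def values_ratio(height, width, channels=3, downscale=8, latent_channels=4):
    """How many times more numbers a pixel-space model handles than a latent-space one."""
    pixel_values = height * width * channels
    latent_values = (height // downscale) * (width // downscale) * latent_channels
    return pixel_values / latent_values

# 512x512 RGB image versus its 64x64x4 latent.
print(values_ratio(512, 512))  # 48.0
```

Compute does not scale exactly with the number of values, so this is only a ballpark for why working in latent space is cheaper.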

By Moore’s law, it would be many years, even decades, before that efficiency gain is something we can do without. But I think, maybe, this becomes moot once special accelerator chips for neural nets are designed.

What makes it obsolete is the proliferation of open models. E.g., today Stable Diffusion 3 becomes available for download. This attack targets one specific model and may work on variants of it. But as more and more rather different models become available, the whole thing becomes increasingly pointless. Maybe you could target more than one, but it would be more and more effort for less and less effect.

@[email protected] · 8 months ago

Not only is this kind of attack notoriously unstable, finding out what images have been glazed is a fantastic indicator for finding high-quality art that is the stuff you want to train on.

@General_Effort · 8 months ago

I doubt that. Having a very proprietary attitude towards one’s images and making good images are not related at all.

Besides, good training data is to a large extent about the labels.

@Etterra · 8 months ago

It pollutes the data pool. The rule of GIGO (garbage in, garbage out) is used to turn the AI’s results into garbage.

Basically, it puts some imperceptible stuff in the image file’s data (somebody else should explain how because I don’t know) so that what the AI sees and the human looking at the picture sees are rather different. So you try and train it to draw a photorealistic car and instead it creates a lumpy weird face or something. Then the AI uses that defective nonsense to learn what “photorealistic car” means and reproduce it - badly.

If you feed a bunch of this trash into an AI and tell it that this is how to paint like, say, Rembrandt, and then somebody uses it to try to paint a picture like Rembrandt, they’ll end up getting something that looks like it was scrawled by a 10-year-old, or the dogs playing poker went through a teleporter malfunction, or whatever nonsense data was fed into the AI instead.

If you tell an AI that 2+2=🥔, that pi=9, or that the speed of light is Kevin, then nobody can use that AI to do math.

If you trained ChatGPT to explain history by feeding it descriptions of games of Civ6, then nobody could use it to cheat on their history term paper. The AI would go on about how Gandhi attacked Mansa Musa in 1686 with all-out nuclear war. It’s the same thing here, but with pictures.

@egeres · 8 months ago

Right, but AFAIK Glaze is targeting the CLIP model inside diffusion models, which means any new version of CLIP would remove the effect of the protection.