Why Can’t ChatGPT Draw a Full Glass of Wine? - Alex O'Conner

Theo · 3 months ago

Why Can’t ChatGPT Draw a Full Glass of Wine? - Alex O'Conner

remotelove@lemmy.ca · 3 months ago

This video was nice and simple.

It really drives home the point that chat bots aren’t actually creative and, in simple terms, just spit out averages and probabilities.

benignintervention · 3 months ago

Sounds like Francis Galtin’s Ox

“The classic wisdom-of-the-crowds finding involves point estimation of a continuous quantity. At a 1906 country fair in Plymouth, 800 people participated in a contest to estimate the weight of a slaughtered and dressed ox. Statistician Francis Galton observed that the median guess, 1207 pounds, was accurate within 1% of the true weight of 1198 pounds.This has contributed to the insight in cognitive science that a crowd’s individual judgments can be modeled as a probability distribution of responses with the median centered near the true value of the quantity to be estimated.”

Theo · 3 months ago

It is just an over-powered autocorrect, that same concept applies to images, just repeating patterns of pixels.

webghost0101@sopuli.xyz · edit-2 3 months ago

Image generation uses DallE and is not Baked into the model.

All it does is give it a prompt. Generates and shows you the results for that prompt.

You can click the pictures to see that prompt and you will see that it verbosely requested it overflowing but dalle does not always interpret that prompt the same, actually i found llms rather suck at prompting image generation models because they behave very strongly on certain words.

A similar experiment was done with “street with no lanterns” always resulting in a lantern.

curiousaur@reddthat.com · 3 months ago

If I ask you to imagine a street with no lanterns, are you imagining lanterns or no lanterns?

webghost0101@sopuli.xyz · edit-2 3 months ago

Lanterns of course.

Would take an image generation model at least 3 steps which it doesn’t have right now.

A review step to see if the output matches the prompt.

A identification step to detect elements that don’t match

A redo step to mix that area in the background image (remove) or regenerates an improvement.

Right now you cant iterate on images. Every minor tweak is a completely new image. At least not with dalle because you cant control the seed.

Theo · 3 months ago

It might be more accurate to use something like Leonardo.AI rather than ChatGPT because you can edit existing images as needed and set the seed. You can even keep a consistent ‘character’ and reuse it in many pictures. Its dreamshaper model is based on SD. I have had the most accurate results with Leonardo. I don’t use ChatGPT/Dall-E for images, it uses too much on a free plan.

billwashere · 3 months ago

And this is a prime example of why these trained models will never be AGIs. It only knows what it’s been trained on and can’t make inferences or extrapolations. It’s not really generating an image’s much as really quickly photoshopping and merging images it already knows about.

kakihara123 · 3 months ago

How sure are you that human brains don’t work in a similar way? Can you create something that you have never seen before without using known images you have in your head to create the result?

billwashere · 3 months ago

Well from a cognitive standpoint you’re absolutely correct, we aren’t a 100% sure how intelligence works or even how to properly define it. But I can absolutely think up things I’ve never seen before. And it’s easy to see how that’s possible. Look at any fantasy or science fiction art. That was all created without ever seeing it before because it doesn’t exist. In my opinion current AI completely lacks imagination. It only knows what it’s been trained on. And since people are pretty good with imagining things, we’ve created lots of art that doesn’t exist in real life and now the AI has been trained on this extensive art and is now pretty good at faking imagination.

I am by now means a cognitive specialist so I could be completely wrong.

Theo · 3 months ago

It’s just patterns of pixels. It recognizes an apple as just a bunch of reddish pixels etc, then when given an image of a similar colored red ball, or something, it is corrected until it ceases to recognize something not an apple as an apple. It really does not know what an apple looks like to begin with. It’s like declaring a variable. The computer does not know what the variable really means just what to equate it to.

paraphrand · edit-2 3 months ago

This makes me ponder the assertions that these are exotic compression algorithms.

I’m down with compression being the mechanism asserted for legal reasons.