Feed it an image, get the closest matching prompt out of a set of written alternatives.

Try it here: https://huggingface.co/spaces/pharmapsychotic/CLIP-Interrogator

Example

Input image from Google

Result when running the output prompt on the perchance text-to-image model

Source code for this version: https://github.com/pharmapsychotic/clip-interrogator

In this source code you can find the lists of pre-made “prompt fragments” that this module can spit out.

You can write any prompt fragments in a list, and the output will be the “closest matching result”.

There are other online variants that use CLIP, but they sample a different prompt library to find the “closest match”: https://imagetoprompt.com/tools/i2p

The difference between these two is just which “pre-written prompt fragments” they have chosen to match with the image. Both use the CLIP model.

What is it?

The CLIP model is a part of the Stable Diffusion model, but it is a “standalone” thing that can be used for other stuff than just image generation.

It would be nice to have the CLIP model available as a standalone thing on perchance.

The CLIP model: https://github.com/openai/CLIP/blob/main/CLIP.png

Practical use cases

Making, for example, a “fantasy character” image-to-prompt generator (note the order) by feeding it “fantasy prompt fragments” to match against, as sketched below.
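
A purpose-built fragment list could look like the one below (the fragments are made up for illustration). These become the candidate texts that CLIP scores against the uploaded image, as sketched in the steps further down.

```python
# Hypothetical "fantasy character" prompt fragments, invented for illustration.
# Each entry is one candidate text that CLIP will score against the input image.
FANTASY_FRAGMENTS = [
    "a battle-worn dwarven warrior with a braided beard",
    "an elven archer in a moonlit forest",
    "a hooded necromancer surrounded by green flames",
    "a royal paladin in gilded plate armor",
    "a sea-witch with coral-crowned hair",
]
```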

How does this work?

The CLIP model is where the “magic” happens.

The image below describes how CLIP is trained: it matches texts with images in a grid-style format.
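
As a rough illustration of that grid (toy numbers, not real training code): for a batch of N image/text pairs you get an NxN similarity matrix, and training pushes the diagonal (each image paired with its own caption) to be the highest value in its row and column.

```python
import numpy as np

# Toy illustration of the CLIP training "grid" (not real training code).
# Pretend we have a batch of 4 images and their 4 matching captions,
# each already encoded into a unit-length 768-dim vector.
rng = np.random.default_rng(0)
image_vecs = rng.normal(size=(4, 768))
text_vecs = rng.normal(size=(4, 768))
image_vecs /= np.linalg.norm(image_vecs, axis=1, keepdims=True)
text_vecs /= np.linalg.norm(text_vecs, axis=1, keepdims=True)

# The "grid": cosine similarity of every image against every text.
grid = image_vecs @ text_vecs.T          # shape (4, 4)

# Training nudges the diagonal (image i with its own caption i) to be the
# highest value in each row/column; off-diagonal pairs get pushed down.
print(grid.shape, grid.diagonal())
```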

The CLIP model can process either an image or a text, and both will generate a 1x768 vector in “the same” embedding space.

That’s the “magic”.
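
A minimal sketch of that claim, assuming the Hugging Face transformers port of the ViT-L/14 CLIP model (the one Stable Diffusion 1.x builds on); the image path is a placeholder:

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model_id = "openai/clip-vit-large-patch14"      # ViT-L/14 CLIP, 768-dim projections
model = CLIPModel.from_pretrained(model_id)
processor = CLIPProcessor.from_pretrained(model_id)

image = Image.open("some_image.jpg")            # placeholder path
text = "an elven archer in a moonlit forest"

inputs = processor(text=[text], images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    image_vec = model.get_image_features(pixel_values=inputs["pixel_values"])
    text_vec = model.get_text_features(input_ids=inputs["input_ids"],
                                       attention_mask=inputs["attention_mask"])

print(image_vec.shape, text_vec.shape)          # both torch.Size([1, 768])

# Because both vectors live in the same space, they can be compared directly:
similarity = torch.nn.functional.cosine_similarity(image_vec, text_vec)
print(similarity.item())                        # closer to 1 = better match
```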

How to make your own purpose-built CLIP interrogator on perchance (assuming this is a feature)

The tokenizer creates a 77-token prompt chunk from any kind of text (“the prompt”).
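
For example, with the Hugging Face CLIPTokenizer for that same model (padding/truncating to CLIP’s 77-token context window):

```python
from transformers import CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
chunk = tokenizer("a hooded necromancer surrounded by green flames",
                  padding="max_length", max_length=77, truncation=True,
                  return_tensors="pt")
print(chunk["input_ids"].shape)   # torch.Size([1, 77]) -- the 77-token prompt chunk
```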

CLIP’s text encoder processes the 77-token prompt chunk into a 1x768 text encoding.

The image (any kind) is run through CLIP’s image encoder and becomes a 1x768 image encoding (this is the step that requires GPU resources).

The “match” between image encoding A and text encoding B is calculated using cosine similarity:

cosine_similarity(A, B) = (A · B) / (‖A‖ × ‖B‖)

A value of 1 means a 100% match between the image A and text B encodings, and values near 0 mean no match at all. (Mathematically cosine similarity can go as low as -1, but in practice the scores land between 0 and 1.)
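
In code, that formula is just a dot product divided by the two vector lengths (toy stand-in vectors below instead of real CLIP encodings):

```python
import torch
import torch.nn.functional as F

# Stand-in encodings; in practice these come from CLIP as shown above.
image_vec = torch.randn(1, 768)
text_vec = torch.randn(1, 768)

# Cosine similarity = (A . B) / (|A| * |B|)
manual = (image_vec @ text_vec.T) / (image_vec.norm() * text_vec.norm())
builtin = F.cosine_similarity(image_vec, text_vec)

print(manual.item(), builtin.item())   # same value; higher = better match
```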

Do this for 1000 text encodings and pick the “text” that gives the highest cosine similarity.

That’s it. Now you have a “prompt” from an image.
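
Putting the whole recipe together, here is a minimal sketch of a purpose-built interrogator; the model choice, fragment list, and image path are all just example values:

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"      # GPU makes this much faster
model_id = "openai/clip-vit-large-patch14"                    # example CLIP model
model = CLIPModel.from_pretrained(model_id).to(device).eval()
processor = CLIPProcessor.from_pretrained(model_id)

# Candidate "prompt fragments" -- swap in your own purpose-built list.
fragments = [
    "a battle-worn dwarven warrior with a braided beard",
    "an elven archer in a moonlit forest",
    "a hooded necromancer surrounded by green flames",
]

def interrogate(image_path: str) -> str:
    """Return the fragment whose CLIP text encoding best matches the image."""
    image = Image.open(image_path)

    with torch.no_grad():
        # 1x768 image encoding
        pixels = processor(images=image, return_tensors="pt")["pixel_values"].to(device)
        image_vec = model.get_image_features(pixel_values=pixels)

        # One 1x768 text encoding per fragment (77-token chunks under the hood)
        tokens = processor(text=fragments, return_tensors="pt",
                           padding="max_length", max_length=77, truncation=True).to(device)
        text_vecs = model.get_text_features(input_ids=tokens["input_ids"],
                                            attention_mask=tokens["attention_mask"])

    # Cosine similarity of the image against every fragment, then pick the best one
    sims = torch.nn.functional.cosine_similarity(image_vec, text_vecs)
    return fragments[sims.argmax().item()]

print(interrogate("some_image.jpg"))   # placeholder path
```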

//—//