I understand that when we generate images, the prompt is first split into tokens, and the model then uses those tokens to nudge the image generation in a certain direction. My impression is that some tokens have more impact on the result than others (although I don’t know if I can call that a weight). I mean internally, not the explicit weighting we can force on a token in the prompt itself.

Is it possible to know how much a certain token was ‘used’ in the generation? I could deduce that empirically by taking a generation, keeping the same prompt, seed, sampling method, etc., and gradually removing words to see what the impact is, but perhaps there is a way to just ask the model? Or to adjust the Python code a bit and retrieve it there?
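
A minimal sketch of that empirical approach, assuming the prompt is a comma-separated list of parts. The helper name `ablation_prompts` is hypothetical, and the rendering step is left out because it depends on your setup: each variant would be generated with the same seed and sampler and compared against the full-prompt image. Note that this measures the effect of prompt chunks, not of individual tokenizer tokens.

```python
def ablation_prompts(prompt: str, sep: str = ",") -> list[tuple[str, str]]:
    """Return (removed_part, remaining_prompt) pairs, dropping one part at a time."""
    # Split the prompt into its parts and discard empty fragments.
    parts = [p.strip() for p in prompt.split(sep) if p.strip()]
    variants = []
    for i, removed in enumerate(parts):
        # Rebuild the prompt without part i.
        remaining = parts[:i] + parts[i + 1:]
        variants.append((removed, (sep + " ").join(remaining)))
    return variants


if __name__ == "__main__":
    for removed, variant in ablation_prompts("dog, orange hair, studio light"):
        print(f"without {removed!r}: {variant}")
```

If removing a part barely changes the image across several seeds, that part probably contributes little; a single seed can be misleading.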

I’d like to know which parts of my prompt have little or no impact on the image.

  • @randon31415
    52 years ago

    At the bottom of the Stable Diffusion web UI, there is a script called X/Y/Z plot. One of its axis types is Prompt S/R (search/replace). Let’s say your prompt is ‘dog with orange hair’. You put ‘dog, cat, mouse, human’ into the S/R text box. This searches your prompt for the first word (or set of words: I like switching out artists’ names) and generates one picture per entry in the list, swapping the first word for each of the others in turn. You can add a second axis where you type ‘orange, red, blue, green’, and you get a grid of pictures with an orange dog in one corner and a green human in the opposite corner.
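
The search-and-replace idea described above can be sketched in a few lines of Python. This is only an illustration of how the grid of prompts is formed, not the web UI's actual code; the function name `prompt_sr_grid` is hypothetical, and it assumes the first entry of each list appears literally in the prompt.

```python
def prompt_sr_grid(prompt: str, x_values: list[str], y_values: list[str]) -> list[list[str]]:
    """Build a grid of prompts: columns vary x_values, rows vary y_values."""
    x_search, y_search = x_values[0], y_values[0]
    for term in (x_search, y_search):
        # The first entry of each list is the text to search for.
        if term not in prompt:
            raise ValueError(f"{term!r} not found in prompt")
    return [
        [prompt.replace(x_search, x).replace(y_search, y) for x in x_values]
        for y in y_values
    ]


if __name__ == "__main__":
    grid = prompt_sr_grid(
        "dog with orange hair",
        ["dog", "cat", "mouse", "human"],
        ["orange", "red", "blue", "green"],
    )
    for row in grid:
        print(" | ".join(row))
```

Each prompt in the grid would then be rendered with the same seed and settings, so that differences between cells come only from the swapped words.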