I don’t know if this community is intended for posts like this. If not, I’m sorry and I’ll delete this post ASAP…

So, I play TTRPGs (mostly online) and I’m a big fan of visual aids, so I wanted to create some character images for my character in the new campaign I’m playing in. I don’t need perfect consistency, as humans usually change a little over time. I only needed the character to be recognizable across a couple of images that are usually viewed on their own and not side by side, so nothing like the consistency you’d need for a comic book or something similar. So I decided to create a Textual Inversion following this tutorial, and it worked way better than expected. After fewer than 6 epochs I had consistency that was good enough for my use case, and it hadn’t started to overfit by the time I stopped the training around epoch 50.

Generated image of a character wearing a black hoodie standing in a rundown neighborhood at night
Generated image of the character wearing a black hoodie standing on a street
Generated image of the character cosplaying as Iron Man
Generated image of the character cosplaying as Amos from The Expanse

Then my SO, who’s playing in the same campaign, asked me to do the same for their character. So we went through the motions and created and filtered the images. A first training attempt had the TI starting to overfit halfway through the second epoch, so I lowered the learning rate by a factor of five and started another round. This time the TI started overfitting somewhere around epoch 8 without reaching consistency first. The generated images alternate between a couple of similar yet distinguishable faces. To my eye, the training images seem to be of similar or higher quality than the images I used in the first set. Was I just lucky with my first TI and unlucky with the other two, so that I should simply keep on trying, or is there something I should change (like the learning rate, which at 0.0002 still seems high to me, judging from other machine learning topics)?
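
To make the numbers concrete, here is a minimal Python sketch of the adjustment described above. It assumes that 0.0002 is the rate after the reduction, which the post does not state outright. The function name `reduce_learning_rate` is made up for illustration and is not part of any training tool.

```python
def reduce_learning_rate(rate: float, factor: float) -> float:
    """Return the learning rate divided by the given factor."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return rate / factor


# Assumption: 0.0002 is the rate used in the second training round.
second_round = 0.0002
# Working backwards, the first round would then have used five times that.
first_round = second_round * 5

print(f"first round:  {first_round:g}")
print(f"second round: {reduce_learning_rate(first_round, 5):g}")
```

If 0.0002 was instead the rate before the reduction, the same function gives the second-round value directly via `reduce_learning_rate(0.0002, 5)`.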

  • @SaucyGoodness
    1 year ago

    I can’t really help with training textual inversions as I’ve never done it (and I think LoRAs are better anyway), but the absolute easiest way to get consistent faces is to just use a mix of celebrities in the prompt. If you have (David Tennant | Keanu Reeves) in there, it’ll give you a pretty consistent character without having to bother with training anything. It’s all a little bit dependent on the model used and the style, but realistically, it’s the fastest and easiest way to do it.

    Edit: not what you asked for, of course, but since you didn’t seem too fussy about it, I figured I’d suggest it anyway.

    • @deathxbyxtaxes
      1 year ago

      That’s a great tip, thanks for posting it. Seems to me both methods are useful, just use case dependent.