I don’t know if this community is intended for posts like this. If not, I’m sorry and I’ll delete this post ASAP…

So, I play TTRPGs (mostly online) and I’m a big fan of visual aids, so I wanted to create some character images for my character in the new campaign I’m playing in. I don’t need perfect consistency, as humans usually change a little over time. I only needed the character to be recognizable across a couple of images that are usually viewed on their own and not side by side, so nothing like the consistency you’d need for a comic book or something similar. So I decided to create a Textual Inversion following this tutorial, and it worked way better than expected. After fewer than 6 epochs I had consistency that was good enough for my use case, and it hadn’t started to overfit when I stopped the training around epoch 50.

[Image: Generated image of a character wearing a black hoodie standing in a rundown neighborhood at night]
[Image: Generated image of the character wearing a black hoodie standing on a street]
[Image: Generated image of the character cosplaying as Ironman]
[Image: Generated image of the character cosplaying as Amos from The Expanse]

Then my SO, who’s playing in the same campaign, asked me to do the same for their character. So we went through the motions and created and filtered the images. A first training attempt had the TI starting to overfit halfway through the second epoch, so I lowered the learning rate by a factor of five and started another round. This time the TI started overfitting somewhere around epoch 8 without reaching consistency first. The generated images alternate between a couple of similar yet distinguishable faces. To my eye, the training images seem to be of similar or higher quality than the images I used in the first set. Was I just lucky with my first TI and unlucky with the other two, so I should simply keep on trying? Or is there something I should change (like the learning rate, which at 0.0002 still seems high to me, judging from other machine learning topics)?

  • @BrianTheeBiscuiteer
    11 year ago

    I think TI is really a hit-or-miss method, and I don’t believe everything in existence can be represented as a TI. I ran 14 different sessions trying to train on a face, and at best the result looked like a first cousin who was having an allergic reaction. I tried a hypernetwork and got much better accuracy on my first attempt, although it was very overfitted.

    I’ve heard that Dreambooth is still the best for accuracy, so I’ll be trying that next (you can make a DB model, then extract a LoRA from that).