Today, we’re headed to the frozen north. Despite the snow on the ground, the sun is out and the light is perfect for a brisk shoot at the weather-worn cabins of Colter.

Two months ago, I fell into the trap that is Stable Diffusion. Today, I released my first trained model, based on the snowbound town of Colter from Red Dead Redemption 2. For anyone interested in SD image generation, you can grab a copy at CivitAI: https://civitai.com/models/137327. I’d appreciate you taking a look and giving it a like or a rating if you’re so inclined. The LoRA model is stylistically versatile, and there are a bunch of SFW examples I made to show its range.

As always, images link to full-size PNGs that contain prompt metadata.


  • CavendishOP

    There’s not much out there on training LoRAs that aren’t anime characters, and that just isn’t my thing. I don’t know a chibi from a booru, and most of those tutorials sound like gibberish to me. So I’m kind of just pushing buttons and seeing what happens over lots of iterations.

    For this, I settled on “place” as the class prompt. I tried “location,” but it gave me strange results, like lots of pictures of maps and GPS-type screens. I didn’t use any regularization images; like you mentioned, I couldn’t think of what to use. I think regularization would be more useful for face training anyway.

    I read that a batch size of one gave more detailed results, so I set it there and never changed it. I also didn’t use any repeats since I had 161 images.
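
    For anyone trying to reproduce this in kohya_ss, the dataset layout works out to something roughly like the sketch below (folder and trigger names are just illustrative, not my exact paths; the leading number on the image folder is the repeat count):

        training_data/
          img/
            1_colter place/        <- 1 repeat, "colter" trigger word, "place" class
              colter_001.png
              colter_001.txt       <- BLIP caption, cleaned up by hand
              ...                  <- 161 image/caption pairs total
          reg/                     <- left empty, since I skipped regularization images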

    I did carefully tag each photo with a caption .txt file using Utilities > BLIP Captioning in Kohya_ss. That improved results over the versions I made with no tags. Results improved dramatically again when I went back and manually cleaned up the captions to be more consistent, for instance consolidating “building,” “structure,” “barn,” “church,” and “house” all down to just “cabin.”
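
    If anyone wants to do that cleanup without hand-editing 161 caption files, a small script along these lines does the trick (a rough sketch; the folder path and synonym list are just examples, swap in your own):

        # Rough sketch: consolidate synonyms in kohya-style caption .txt files.
        # The caption folder and the synonym map are examples; adjust to your dataset.
        from pathlib import Path

        CAPTION_DIR = Path("training_data/img/1_colter place")
        SYNONYMS = {"building": "cabin", "structure": "cabin", "barn": "cabin",
                    "church": "cabin", "house": "cabin"}

        for txt_file in CAPTION_DIR.glob("*.txt"):
            caption = txt_file.read_text(encoding="utf-8")
            for word, replacement in SYNONYMS.items():
                caption = caption.replace(word, replacement)
            txt_file.write_text(caption, encoding="utf-8")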

    Epochs were set to 150, which gave me 24,150 steps. Is that high or low? I have no idea. They say 2,000 steps or so for a face, and a full location is way more complex than a single face… It seems to work, but it took me eight different versions to get a model I was happy with.
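
    For anyone checking the math, the step count seems to come straight from the usual accounting (the sketch below assumes the standard formula, with no gradient accumulation):

        # Sanity check on the reported step count (standard formula assumed).
        images, repeats, epochs, batch_size = 161, 1, 150, 1
        steps = (images * repeats * epochs) // batch_size
        print(steps)  # 24150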

    Let me know what ends up working for you. I’d love to have more discussions about this stuff. As a reward for reading this far, here’s a sneak peek at my next LoRA, based on RDR2’s Guarma island: https://files.catbox.moe/w1jdya.png. Still a work in progress.

    • @[email protected]

      Oof. Dude. You’re not wrong about what is and isn’t available online. But it’s okay. New frontier or whatever. Haha.

      I’ve been mulling over the regularization image thing, so I created a Reddit post asking about it. I basically asked: are these images supposed to represent what the model thinks “this” thing is, in which case regularization images would serve the role of being “this, but not this”? Or is it more like they fill in the gaps when the LoRA is lacking?

      I suspect it’s more like the first. That said, it might actually make sense to include all the defective and diverse images for the purpose of basically instructing the LoRA/model to be like, “I know you think I’m asking for ‘this,’ but in reality, that’s not what I want.”

      If that’s the case, it might make sense to ENSURE your regularization images are way off base and messed up or whatever. Or at least anything in the class that you know you def don’t want.
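
      For what it’s worth, the way I currently picture it (a simplified sketch of DreamBooth-style prior preservation, not the actual kohya training loop; the names are just illustrative) is something like:

          # Simplified sketch of prior-preservation training, as I understand it.
          # The reg/prior term pulls the model back toward whatever it already
          # thinks the bare class ("place") looks like.
          import torch.nn.functional as F

          def training_step(pred_instance, target_instance,
                            pred_reg, target_reg, prior_loss_weight=1.0):
              # Loss on your own images, captioned with trigger + class ("colter place")
              instance_loss = F.mse_loss(pred_instance, target_instance)
              # Loss on the regularization images, captioned with just the class ("place")
              prior_loss = F.mse_loss(pred_reg, target_reg)
              return instance_loss + prior_loss_weight * prior_loss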

      I don’t have confirmation of any of this. I’m VERY new here (like ran my first LoRA training yesterday).

      I like the idea behind your batch size of one.

      Ah. The captioning is something I REALLY need to think about. I’m guessing that with the cabin-caption approach you used, you basically lost flexibility but gained accuracy? I wonder if you could tag it ‘cabin, church’ and retain some of both?

      The step count sounds very high to me, but I can’t say for sure. Ahaha. Because for people, I’ve heard 1,500 to 3,000.

      I’ll be sure to come back and share findings once I have more. I think to really “do this right” you HAVE to train some of your own shit, but to do it well, as you’ve quickly realized, you’ve got to understand the methodology/philosophy of how it’s done.