Today, we’re headed to the frozen north. Despite the snow on the ground, the sun is out and the light is perfect for a brisk shoot at the weather-worn cabins of Colter.

Two months ago, I fell into the trap that is Stable Diffusion. Today, I released my first trained model, based on the snowbound town of Colter from Red Dead Redemption 2. For anyone interested in SD image generation, you can grab a copy at CivitAI: https://civitai.com/models/137327. I’d appreciate you taking a look, and giving it a like or a rating if you’re so inclined. The LoRA model is stylistically versatile, and I made a bunch of SFW examples to show its range.

As always, images link to full-size PNGs that contain prompt metadata.


  • @[email protected]
    1 year ago

    Thanks for your contributions to the community!

    I have questions if you don’t mind.

    I’m really trying to get into LoRA training in general, and there are a lot of things I can’t intuitively work out or find solid answers to.

    For example, with this, what is your “class”? And did you use regularization images? (I want to make a habit of using them). If you did, what did you use for them? Like places that aren’t this? Like deserts and forests, etc?

    Would you consider elaborating on batch size, repeats, epochs, etc, too?

    Thanks again!

    • Cavendish (OP)
      1 year ago

      There’s not much out there on training LoRAs that aren’t anime characters, and that just isn’t my thing. I don’t know a chibi from a booru, and most of those tutorials sound like gibberish to me. So I’m kind of just pushing buttons and seeing what happens over lots of iterations.

      For this, I settled on the class “place”. I tried “location”, but it gave me strange results, like lots of pictures of maps and GPS-type screens. I didn’t use any regularization images. Like you mentioned, I couldn’t think of what to use. I think regularization would be more useful in face training anyway.

      I read that a batch size of one gave more detailed results, so I set it there and never changed it. I also didn’t use any repeats since I had 161 images.

      I did carefully tag each photo with a caption .txt file using Utilities > BLIP Captioning in Kohya_ss. That improved results over the versions I made with no tags. Results improved again dramatically when I went back and manually cleaned up the captions to be more consistent. For instance, consolidating building, structure, barn, church, house all to just cabin.
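The caption cleanup described above could be partly automated. What follows is only a rough sketch, not the workflow actually used for this model: it assumes one plain-text `.txt` caption per image in a single folder, and the helper names `normalize_caption` and `clean_caption_dir` are made up for illustration. It swaps the listed synonyms (and their simple "-s" plurals) for a single target word.

```python
import re
from pathlib import Path

# Words to collapse into one tag, taken from the example in the comment above.
SYNONYMS = ("building", "structure", "barn", "church", "house")

# Match whole words only, with an optional trailing "s" for simple plurals.
# Irregular plurals such as "churches" are NOT matched and stay unchanged.
PATTERN = re.compile(r"\b(?:" + "|".join(SYNONYMS) + r")(s?)\b", re.IGNORECASE)


def normalize_caption(text, target="cabin"):
    """Replace every synonym in one caption string with the target word."""
    return PATTERN.sub(lambda m: target + m.group(1), text)


def clean_caption_dir(folder, target="cabin"):
    """Rewrite each .txt caption in `folder` in place; return how many changed."""
    changed = 0
    for path in Path(folder).glob("*.txt"):
        original = path.read_text(encoding="utf-8")
        cleaned = normalize_caption(original, target)
        if cleaned != original:
            path.write_text(cleaned, encoding="utf-8")
            changed += 1
    return changed
```

Since this overwrites files, it would be safest to run it on a copy of the caption folder first and skim the results by hand, because blanket replacement can mangle captions where a word like "structure" means something else.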

      Epochs was 150, which gave me 24,150 steps. Is that high or low? I have no idea. They say 2000 steps or so for a face, and a full location is way more complex than a single face… It seems to work, but it took me 8 different versions to get a model I was happy with.
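For anyone checking the arithmetic, the step count above is consistent with the commonly quoted relationship steps = images × repeats ÷ batch size × epochs. The sketch below assumes that formula, treats "no repeats" as a repeat count of 1, and ignores regularization images (none were used here). The function name `total_steps` is hypothetical and not part of Kohya_ss.

```python
import math


def total_steps(num_images, repeats, epochs, batch_size):
    """Estimate training steps, assuming each epoch sees every image
    `repeats` times and a partial final batch still counts as a step."""
    steps_per_epoch = math.ceil(num_images * repeats / batch_size)
    return steps_per_epoch * epochs


# Figures from the comment above: 161 images, no extra repeats,
# 150 epochs, batch size 1.
print(total_steps(161, 1, 150, 1))  # 24150
```

Under the same assumption, raising the batch size or cutting epochs shrinks the total proportionally, which makes it easier to compare a run like this against the "2000 steps for a face" rule of thumb.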

      Let me know what ends up working for you. I’d love to have more discussions about this stuff. As a reward for reading this far, here’s a sneak peek at my next lora based on RDR2’s Guarma island. https://files.catbox.moe/w1jdya.png. Still a work in progress.

      • @[email protected]
        1 year ago

        Oof. Dude. You’re not wrong about what is and isn’t available online. But it’s okay. New frontier or whatever. Haha.

        I’ve been mulling over the regularization image thing, so I created a Reddit post asking about it. Basically, I asked: are these images supposed to represent what the model thinks “this” thing is, in which case regularization images would serve the role of being “this, but not this”? Or is it more like they fill in the gaps where the LoRA is lacking?

        I suspect it’s more like the first. That said, it might actually make sense to include all the defective and diverse images for the purpose of basically instructing the LoRA/model to be like, “I know you think I’m asking for ‘this,’ but in reality, that’s not what I want.”

        If that’s the case, it might make sense to ENSURE your regularization images are way off base and messed up or whatever. Or at least anything in the class that you know you def don’t want.

        I don’t have confirmation of any of this. I’m VERY new here (like ran my first LoRA training yesterday).

        I like the idea of your batch size.

        Ah. The captioning is something I REALLY need to think about. I’m guessing that with the cabin caption approach, you basically lost some flexibility but gained accuracy? I wonder if you could tag it ‘cabin, church’ and retain some of both?

        The steps, to me, sound very high, but I can’t say for sure. Ahaha. Because for people, I’ve heard 1500 to 3000.

        I’ll be sure to come back and share findings once I have more. I think to really “do this right” you HAVE to train some of your own shit, but to do it well, as you’ve quickly realized, you’ve got to understand the methodology/philosophy of how it’s done.