A sex offender convicted of making more than 1,000 indecent images of children has been banned from using any “AI creating tools” for the next five years in the first known case of its kind.

Anthony Dover, 48, was ordered by a UK court “not to use, visit or access” artificial intelligence generation tools without the prior permission of police as a condition of a sexual harm prevention order imposed in February.

The ban prohibits him from using tools such as text-to-image generators, which can make lifelike pictures based on a written command, and “nudifying” websites used to make explicit “deepfakes”.

Dover, who was given a community order and £200 fine, has also been explicitly ordered not to use Stable Diffusion software, which has reportedly been exploited by paedophiles to create hyper-realistic child sexual abuse material, according to records from a sentencing hearing at Poole magistrates court.

  • Tippon
    16 points · 7 months ago

    Where does the training data come from to create indecent images of children?

    • Dran
      51 points · 7 months ago (edited)

      It doesn’t need CSAM data for training; it just needs to know what a boob looks like, and what a child looks like. I run some SDXL-based models at home and I’ve observed it can be difficult to avoid more often than you’d think. There are keywords in porn that blur the lines across datasets (“teen”, “petite”, “young”, “small”, etc.). The word “girl” in particular, I’ve found, gives you a small chance of inadvertently creating the undesirable if you add it to basically any porn prompt. You have to be really careful and use words like “woman”, “adult”, etc. instead to convince your image model not to make things that look like children. If you’ve ever wondered why internet-based porn generators are on super heavy guardrails, this is why.

        • Dran
          2 points · 7 months ago

          I’m not going to say that csam in training sets isn’t a problem. However, even if you remove it, the model remains largely the same, and its capabilities remain functionally identical.

          • @PotatoKat
            0 points · 7 months ago

            At that point it’s still using photos of children to generate csam, even if you could somehow ensure the model is 100% free of csam

            • Dran
              1 point · 7 months ago

              That would be true, it’d be pretty difficult to build a model without any pictures of children at all, and then try and describe to the model how to alter an adult to make a child. Is anyone asking for that though? To make it illegal to have regular pictures of children in these datasets?

              • @PotatoKat
                0 points · 7 months ago

                No but it is a reason why generating csam should be illegal. You’re using data trained on pictures of real kids

                • Dran
                  0 points · 7 months ago

                  I’m not arguing whether or not it should be legal, I was just offering my first hand experience in regards to the capabilities of these local models since people seem to be confused as to how this actually works.

                  • @PotatoKat
                    0 points · 7 months ago

                    “Is anyone asking for that though? To make it illegal to have regular pictures of children in these datasets?”

                    I was responding to this part of your comment which directly refers to legality

      • Tippon
        3 points · 7 months ago

        Thanks for the reply, it’s given me a good idea of what’s most likely happening :)

        It’s a shame that the rest of the thread went to shit, but unfortunately it’s an emotional topic, and brings out emotional responses

        • Dran
          2 points · 7 months ago

          Always happy to try and productively add to someone’s learning.

      • @[email protected]
        -47 points · 7 months ago

        It is true, a 10 year old naked woman is just a 30 year old naked woman scaled down by 40%. /s

        No buddy, there isn’t some vector of “this is the distance between kid and adult” that a model can apply to generate what a hypothetical child looks like. The base model was almost certainly trained on more than just anatomical drawings from Wikipedia - it ate some csam.

        If you’ve seen stuff about “Hitler - Germany + Italy = Mussolini”, for models where that’s true (which is not universal) it takes an awful lot of training data to establish and strengthen those vectors. Unless the generated images were comically inaccurate, a lot of training went into this too.
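
        For readers who have not seen the word-vector arithmetic referred to here, a minimal sketch follows. The 3-D vectors are hypothetical toy values picked by hand so that the arithmetic works out; they are not taken from any trained model, and the `analogy` helper is made up for this illustration. Whether a real model has directions this clean depends on its training, as noted above.

```python
import math

# Hypothetical toy embeddings, hand-picked for illustration only.
# Components loosely mean: (German-ness, Italian-ness, "is a leader").
EMBEDDINGS = {
    "Germany":   (1.0, 0.0, 0.0),
    "Italy":     (0.0, 1.0, 0.0),
    "Hitler":    (1.0, 0.0, 1.0),
    "Mussolini": (0.0, 1.0, 1.0),
    "Churchill": (0.3, 0.3, 1.0),  # distractor
}

def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def analogy(a, b, c):
    """Return the word whose vector is closest to a - b + c.

    The three input words are excluded from the candidates, which is
    the usual convention for this kind of query.
    """
    target = tuple(
        x - y + z
        for x, y, z in zip(EMBEDDINGS[a], EMBEDDINGS[b], EMBEDDINGS[c])
    )
    candidates = [w for w in EMBEDDINGS if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(EMBEDDINGS[w], target))

result = analogy("Hitler", "Germany", "Italy")
print(result)
```

        With these made-up vectors, subtracting "Germany" and adding "Italy" lands exactly on the "Mussolini" vector. Learned embeddings only approximate this, and only for relationships the training data covers well.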

        • @[email protected]
          37 points · 7 months ago

          Right, and the google image ai gobbled up a bunch of images of black george washington, right? They must have been in the data set, there’s no way to blend a vector from one value to another, like you said. That would be madness. Nope, must have been copious amounts of asian nazis in the training set, since the model is incapable of blending concepts.

          • @[email protected]
            -36 points · 7 months ago

            You’re incorrect and you should fucking know better.

            I have no idea why my comment above was downvoted to hell but AI can’t “dream up” what a naked young person looks like. An AI can figure that adults wear different clothes and put a black woman in a revolutionary war outfit. These are totally different concepts.

            You can downvote me if you like but your AI generated csam is based on real csam so fuck off. I’m disappointed there is such a large proportion of people defending csam here especially since lemmy should be technically oriented - I expect to see more input from fellow AI fluent people.

                • @[email protected]
                  2 points · 7 months ago

                  Ok? Hundreds of images of anything isn’t going to necessarily train a model based on billions of images. Have you ever tried to get Stable Diffusion to draw a bow and arrow? Just because it has ever seen something doesn’t mean that it has learned it, nor, more importantly, does that mean that is the way it learned it, since we can see that it can infer many concepts from related concepts- pregnant old women, asian nazis, black george washingtons (NONE OF WHICH actually have ever existed or been photographed)… is unclothed children really more of a leap than any of those?

                  • @[email protected]
                    -1 point · 7 months ago

                    It is, yes. A black George Washington is one known visual motif (a George Washington costume) combined with another known visual motif. A naked prepubescent child isn’t just the combination of “naked adult” and “child”; naked children don’t look like naked adults simply scaled down.

                    AI can’t tell us what something we’ve never seen looks like… a kid who knows what George Washington and a black woman look like can imagine a black George Washington. That’s probably a helpful analogy: AI can combine simple concepts but it can’t innovate. It can dream, but it can’t know something that we haven’t told it about.

                • @[email protected]
                  23 points · 7 months ago (edited)

                  The misinformation you’re spreading is related to how it works. A generative AI system will (without prompting away from it) create people with 3 heads, 8 fingers on each hand and multiple legs connecting to each other. Do you think it was trained on that? This argument of “it can generate it, therefore it was trained on it” is ridiculous. You clearly don’t understand how it works.

                  • @[email protected]
                    -9 points · 7 months ago

                    You’re extremely correct when it comes to combining different aspects of existing works to generate something new - but AI can’t generate something it doesn’t know about. If a generative model knows what a prepubescent naked body looks like it has been exposed to them before. The most generous way to excuse this is that medical diagrams exist and supplied the majority of inputs for any prompts about cp to work off of. A much more realistic view is that some cp made it into the training set.

                    I don’t disagree with any of your assessments but if you wanted a Van Gogh painting of a Glorp from Omicron Persei 8, you’ll get out… something, but because the model has no reference for Glorps it’ll be hallucinations or guesses based on other terms it can find.

                    To be clear, I’m coming at this from the angle of someone who has trained and evaluated models in a company that’s used them for the better part of a decade.

                    I understand I’m going up against your earnestly held belief, but I’ve seen behind the curtain on a lot of this stuff and hopefully in time the way it works becomes demystified for more people.

            • @[email protected]
              -2 points · 7 months ago (edited)

              No, they’re referring to the internal workings of AI models, which are essentially a series of incredibly high-dimensional matrices with extra bits around them to make them work. Individual concepts are embedded as vectors in the space that these models work in. That’s why linear algebra is brought up so frequently in discussions of AI.

              • @The_Vampire
                4 points · 7 months ago

                While it’s true that linear algebra and vectors are used in learning models, they’re not using the term correctly in a way that says they know something about the subject (at least, the modern subject). Concepts aren’t embedded as vectors. In older models (before the craze), concepts were manually embedded as numbers or a collection of numbers, which could be a vector (but could be something else as well), and the machine would learn by modifying weights. However, in current models (and by current, I mean at least more than a couple years), concepts are learnt by the machine (weights are still modified by the machine as well) and the machine makes its own connections between features presented to it.

                For example, you give it a dataset of 10x10 pixel images (with text descriptions) and it reads that as 100 pixels split into 3 numbers (RGB) and then looks for connections between those numbers and in which pixels. It’s not identifying what a boob is, but knows that when an image has ‘boob’ in the text description then there’s a very high likelihood that there will be a circular collection of pixels with lots of red somewhere in the image that are also connected to other pixels that are often also lots of red. That’s me breaking down what a human would think given the same task/information, but the reality is the machine will come up with its own connections/concepts which are both often far better than humans (when the model works, at least) and far more ineffable to humans.
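
                To make the “100 pixels split into 3 numbers” part concrete, here is a minimal sketch of how a 10x10 RGB image becomes a flat list of numbers. The image here is random noise and the helper names are made up for this illustration; real pipelines add their own preprocessing on top (Stable Diffusion, for instance, works on a compressed latent representation rather than raw pixels).

```python
import random

WIDTH, HEIGHT, CHANNELS = 10, 10, 3  # 10x10 pixels, RGB

def random_image(seed=0):
    """Build a HEIGHT x WIDTH x CHANNELS nested list of 0-255 values.

    Random noise stands in for a real picture; only the shape matters here.
    """
    rng = random.Random(seed)
    return [
        [[rng.randrange(256) for _ in range(CHANNELS)] for _ in range(WIDTH)]
        for _ in range(HEIGHT)
    ]

def flatten(image):
    """Flatten rows -> pixels -> channels into one list of numbers."""
    return [value for row in image for pixel in row for value in pixel]

image = random_image()
flat = flatten(image)

# 10 * 10 pixels * 3 channels = 300 numbers for the model to look at.
print(len(flat))
```

                The model never sees “a picture”, only those 300 numbers alongside whatever numbers the text description was turned into, and whatever patterns it finds are patterns among them.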

                • @[email protected]
                  2 points · 7 months ago

                  From my perspective as an algebraist, you seem to be splitting hairs when you’re making a distinction between vectors and n-tuples of real numbers. Furthermore, they’re referencing a specific 3blue1brown video. I’m not saying their conclusion is correct; they’re dead wrong, but that doesn’t mean their understanding is so shallow that they’re simply repeating a word they heard to sound smart.

    • @[email protected]
      28 points · 7 months ago

      The whole point of diffusion models is that you can generate new concepts using training data. Models trained on any nsfw images can combine those concepts with any of their non-nsfw concepts. Of course, that’s not to say there isn’t CSAM in any training data, because there objectively has been in the past, but there doesn’t need to be any to generate it.

    • Turun
      10 points · 7 months ago

      AI is able to fill in the last field in a table like “Old / young” vs “Clothed / naked” when given three of the four fields.