• @BetaDoggo_
    English
    1
    1 year ago

    Between 0.00002% and 0.00006%

      • @[email protected]
        English
        8
        1 year ago

        While I agree with the sentiment, that’s 2-6 in 10,000,000 images. Even if someone were personally reviewing every image that went into these data sets, which I strongly doubt, that’s a pretty easy mistake to make when looking at that many images.

        • @[email protected]
          English
          8
          1 year ago

          “Known CSAM” suggests researchers ran it through automated detection tools which the dataset authors could have used.

        • Sapphire Velvet (OP)
          English
          1
          1 year ago

          They’re not looking at the images, though. They’re scraping. And their own legal defenses rely on them not looking too carefully, or else they cede their position to the copyright holders.

          • snooggums
            4
            1 year ago

            Technically they violated the copyright of the CSAM creators!