• @[email protected]
    English
    score -1, 10 months ago

    I read the whole thing. I understand it’s for detecting use of nightshade, not bypassing it. What other even slightly ethical use for this is there besides trying to make sure you don’t train on a poisoned image? These models are clearly not asking for permission first, else you’d never need to do this, so they’re just taking an image, assuming they’re allowed to use it, and then using this tool to detect if it’s going to poison their model.

    • @elliot_crane
      English
      score 3, 10 months ago

      I don’t think most people are collecting images by hand and saying “ah yes I’m just gonna yoink this and use it in my model”. There are a plethora of sites for sharing repositories of training data, and therefore it’s pretty easy for someone training a model to unknowingly pull down some data they don’t actually have permission to use. It’s completely infeasible to check licensing by hand on what could be millions of images, so this tool makes it easy to simply not train on images that have gone through Nightshade. I fail to see how that’s unethical, as not training on the image is the whole reason the original image was put through Nightshade in the first place.

      • @[email protected]
        English
        score 0, 10 months ago

        “It’s completely infeasible”

        Then it shouldn’t be done. That’s the unethical part. Trying to just avoid the problem by continuing to scrape large data sets for images that you shouldn’t be using is the entire problem. Either get permission for each image or don’t build your image model. Doing otherwise is unethical.

        • @elliot_crane
          English
          score 3, 10 months ago

          Again, in many instances, folks training models are using repositories of images that have been publicly shared. In many cases the people who assembled the image repositories are not the same people using them. I agree that reckless scraping is not responsible, but if you’re using a repository of images that’s presented as ok to use for AI training, I’d argue it’s even more ethical to strip out the Nightshaded images, because the presence of Nightshade clearly means you shouldn’t use that image. I guess we’re just going to have to agree to disagree here, because I see this as a helpful tool specifically for avoiding training on images you shouldn’t be using.