• AlexKingstonsGigolo
    12 · 1 year ago

    @generalpotato Ish. I read the technical write-up, and they actually came up with a very clever, privacy-focused way of scanning for child porn.

    First, only photos were scanned and only if they were stored in iCloud.

    Then, only hashes of the photos were collected. These were perceptual hashes (Apple called its algorithm NeuralHash) rather than cryptographic ones, so a resized or recompressed copy of an image would still match.

    Those hashes were compared against hashes of known child porn images, images which had to be in the databases of multiple non-governmental organizations operating in separate jurisdictions; so, if an image was only in the database of, say, the National Center for Missing & Exploited Children or only in the database of China’s equivalent, its hash couldn’t be used. This requirement would make it harder for a dictator to slip in a hash to look for dissidents, because getting an image into enough databases is substantially more difficult.

    Even then, an Apple employee would step in to verify that actual child porn was being stored in iCloud only after about 30 separate images were flagged. (The odds of any innocent account even making it to this stage were estimated at something like one in a trillion per year, I think, because of all of the safeguards Apple had.)

    Only after an Apple employee confirmed the existence of child porn would the iCloud account be frozen and the relevant non-governmental organizations alerted.
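
    To make the two safeguards concrete, here is a minimal sketch of the logic as I described it: only hashes present in several independent databases count, and nothing goes to human review until enough photos match. The function names, the toy hash strings, and the small threshold are all made up for illustration. As I understand it, the real design wrapped this in cryptographic protocols so that individual matches stayed hidden below the threshold, which this sketch does not attempt.

```python
from collections import Counter

def eligible_hashes(databases, min_sources=2):
    """Keep only hashes that appear in at least `min_sources` databases."""
    # set(db) guards against one database listing the same hash twice.
    counts = Counter(h for db in databases for h in set(db))
    return {h for h, n in counts.items() if n >= min_sources}

def needs_human_review(photo_hashes, eligible, threshold):
    """True once the number of matching photos reaches the threshold."""
    matches = sum(1 for h in photo_hashes if h in eligible)
    return matches >= threshold

# Hypothetical databases from two organizations; "h9" is in only one.
org_a = {"h1", "h2", "h3", "h9"}
org_b = {"h1", "h2", "h3"}
eligible = eligible_hashes([org_a, org_b])

# A hash slipped into a single database never becomes eligible,
# so it cannot contribute to a match.
few_matches = needs_human_review(["h9", "h1", "x1"], eligible, threshold=3)
many_matches = needs_human_review(["h1", "h2", "h3", "x1"], eligible, threshold=3)
print(few_matches, many_matches)
```

    The threshold is a parameter here on purpose; the point is only that a single match, or a hash from a single source, does nothing by itself.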

    Honestly, I have a better chance of getting a handjob from Natalie Portman in the next 24 hours than an innocent person has of being incorrectly reported to any government authority.

    • HelixDab
      3 · 1 year ago

      From a technical perspective, how much would an image need to be changed before the hash no longer matched? I’ve heard of people including junk .txt files in repacked and zipped pirated games, movies, etc., so that they aren’t automatically flagged for removal from file sharing sites.

      I am not a technical expert by any means, and I don’t even use Apple products, so this is just curiosity.

      • conciselyverbose
        3 · 1 year ago

        It depends on the type of hash. For the type of hashing used by checksums, changing a single byte is enough, because they’re cryptographic hashes, and the intent is to identify whether files are exact matches.

        However, the type of hashing used for CSAM is called a semantic hash (more often called a perceptual hash). The intent of this type of hash is that similar content results in a similar (or identical) output. I can’t walk you through exactly how the hash is done, but it is designed specifically so that minor alterations do not prevent identification.
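
        A toy comparison may help show the difference. This is only a sketch: `average_hash` below is a very simple perceptual hash (one bit per pixel, set when the pixel is brighter than the image mean) that I am using as a stand-in, not the algorithm any real CSAM scanner uses, and the 4x4 "image" is made-up data.

```python
import hashlib

def average_hash(pixels):
    """Toy perceptual hash: one bit per pixel, 1 if brighter than the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return [1 if p > mean else 0 for p in flat]

def hamming(a, b):
    """Number of bit positions where two hashes differ."""
    return sum(x != y for x, y in zip(a, b))

def sha256_of(pixels):
    """Cryptographic hash of the raw pixel bytes (values must be 0-255)."""
    return hashlib.sha256(bytes(p for row in pixels for p in row)).hexdigest()

# Made-up 4x4 grayscale image: dark on the left, bright on the right.
image = [
    [10, 20, 200, 210],
    [15, 25, 205, 215],
    [10, 20, 200, 210],
    [15, 25, 205, 215],
]

# Same image with a single pixel nudged by one brightness level.
tweaked = [row[:] for row in image]
tweaked[0][0] += 1

crypto_changed = sha256_of(image) != sha256_of(tweaked)
perceptual_distance = hamming(average_hash(image), average_hash(tweaked))
print(crypto_changed, perceptual_distance)
```

        The cryptographic hash changes completely after the one-pixel tweak, while the toy perceptual hash does not move at all. Real perceptual hashes are far more elaborate, but the matching idea (small distance means "same picture") is the same.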

        • HelixDab
          1 · 1 year ago

          If, for instance, I were pirating a video game, would packing it in an encrypted container along with a GB or two of downloaded YouTube videos be sufficient to defeat semantic hashing? What about taking that encrypted volume and spanning it across multiple files?

          • conciselyverbose
            1 · 1 year ago

            Encrypting it should be enough to defeat either hash.

            Without encryption, I think it would depend on the implementation. I’m not aware of the specific limitations of the tools they use, but semantic hashing is built for photos and video and shouldn’t meaningfully generalize to other formats.

      • MisuseCase
        1 · 1 year ago

        That’s a good question. First, it’s important to understand that the hash functions used on games or other programs are actually different from the hash functions used to detect media like pictures, movies, and sound recordings.

        With code or text, altering the file in any way means its hash no longer matches the original. Typically those hashes are supposed to match, and some kind of alarm gets tripped if they don’t.

        With media files like music, movies, or pictures, it works the other way around. Detection tools are looking for something that is not necessarily an exact match, but a very close match, and when such a match is found, alarms get tripped (because it’s CSAM, or a copyright violation, or something like that).

        As to the technique you mentioned, concealing a pirated game in a ZIP file with a bunch of junk TXT files, that’s not going to work against anything that looks inside the archive. Zipping just compresses the files in a predictable, reversible way, so a scanner can unpack the ZIP and hash each file inside on its own; the junk is easy to detect and compensate for. Now, if you use your ZIP/compression tool to actually encrypt the file with a good algorithm and a strong password, that’s different, but then you don’t need to pack it with junk. (And distributing the password securely will be a problem.)
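
        Here is a rough sketch of what I mean, using only Python’s standard library. The file names and contents are invented, and I am assuming a scanner that unpacks archives and checks each member against a list of known hashes; I don’t know how any particular file sharing site actually does it.

```python
import hashlib
import io
import zipfile

def make_zip(files):
    """Build an unencrypted ZIP in memory from a {name: bytes} mapping."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()

def member_hashes(zip_bytes):
    """SHA-256 of every file inside the archive, after decompression."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return {hashlib.sha256(zf.read(name)).hexdigest() for name in zf.namelist()}

# Stand-in for the file someone is trying to hide.
payload = b"pretend this is a game binary" * 100
known_hash = hashlib.sha256(payload).hexdigest()

plain = make_zip({"game.bin": payload})
padded = make_zip({
    "game.bin": payload,
    "junk1.txt": b"lorem ipsum",
    "junk2.txt": b"dolor sit amet",
})

# The archive as a whole hashes differently once junk is added...
archive_hash_changed = (
    hashlib.sha256(plain).hexdigest() != hashlib.sha256(padded).hexdigest()
)
# ...but the hidden file is still found when each member is hashed.
still_detected = known_hash in member_hashes(padded)
print(archive_hash_changed, still_detected)
```

        So junk files only beat a check that hashes the archive as one blob. Encrypting the archive is what actually stops a scanner from reading the members.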

        Please, people who know more about hashing and media detection with hashing, let me know if I got something wrong; I probably did.

    • @generalpotato
      3 · 1 year ago

      Haha! Thanks for the excellent write-up. Yes, I recall Apple handling CSAM this way. It went out of its way to try to convince users it was still a good idea, but it still faced a lot of criticism for it.

      I doubt this bill will be as thorough, which is why I was posing the question I asked. Apple could technically comply using some of the work it did, but it’s sort of moot if things are end-to-end encrypted.

    • ansik
      3 · 1 year ago

      Great write-up! I tried searching but came up short. Do you have a link to the technical documentation?

    • MisuseCase
      2 · 1 year ago

      It would have worked, and it would have protected privacy, but most people don’t understand the difference between having a hash of known CSAM on your phone and having actual CSAM on your phone for comparison purposes, and it freaked people out.

      I understand the difference and I’m still uncomfortable with it, not because of the proximity to CSAM but because I don’t like the precedent of anyone scanning my encrypted messages. Give them an inch, etc.