• conciselyverbose
    1 year ago

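    It depends on the type of hash. For the kind of hashing used by checksums, changing a single byte is enough to break a match. These are typically cryptographic hashes, where any change to the input produces a completely different output, because the intent is to identify whether files are exact matches.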
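
    To illustrate the exact-match behavior, here is a minimal sketch that assumes SHA-256 as a stand-in for whatever cryptographic hash a given checksum tool uses. The sample text and the `differing_bits` helper are made up for the example.

```python
import hashlib

original = b"The quick brown fox jumps over the lazy dog"
# Change a single byte: the final "g" becomes "h".
altered = original[:-1] + b"h"

digest_a = hashlib.sha256(original).hexdigest()
digest_b = hashlib.sha256(altered).hexdigest()


def differing_bits(hex_a, hex_b):
    """Count the bits that differ between two equal-length hex digests."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


print(digest_a)
print(digest_b)
print(differing_bits(digest_a, digest_b), "of 256 bits differ")
```

    The two digests share no useful resemblance, so a one-byte edit is enough to make a file look unrelated to this kind of hash.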

    However, the type of hashing used for CSAM detection is called a semantic hash (more commonly known as a perceptual hash). The intent of this type of hash is that similar content results in a similar (or identical) output. I can’t walk you through exactly how the hash is computed, but it is designed specifically so that minor alterations do not prevent identification.
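
    As a rough illustration of the idea only, here is a toy "average hash" over an 8x8 grayscale grid. This is an assumption-laden sketch, not the algorithm any real detection tool uses. The images and the `average_hash` and `hamming` names are invented for the example.

```python
def average_hash(pixels):
    """Toy perceptual hash: one bit per pixel, 1 if brighter than the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return [1 if p > mean else 0 for p in flat]


def hamming(a, b):
    """Number of positions where two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


# A simple 8x8 gradient standing in for a downscaled grayscale image.
image = [[(r * 8 + c) * 4 for c in range(8)] for r in range(8)]

# A minor alteration: brighten every pixel slightly and tweak one pixel.
altered = [[min(255, p + 3) for p in row] for row in image]
altered[0][0] = 10

# Unrelated content: a checkerboard pattern.
unrelated = [[255 if (r + c) % 2 else 0 for c in range(8)] for r in range(8)]

near = hamming(average_hash(image), average_hash(altered))
far = hamming(average_hash(image), average_hash(unrelated))
print("distance to altered copy:", near)
print("distance to unrelated image:", far)
```

    Matching is then a question of whether the distance falls under some threshold rather than whether the hashes are identical, which is why small edits do not break it.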

    • HelixDab
      1 year ago

      If, for instance, I were pirating a video game, would packing it in an encrypted container along with a GB or two of downloaded YouTube videos be sufficient to defeat semantic hashing? What about taking that encrypted volume and spanning it across multiple files?

      • conciselyverbose
        1 year ago

        Encrypting it should be enough to defeat either hash.

        Without encryption, I think it would depend on the implementation. I’m not aware of the specific limitations of the tools they use, but they’re built for photos and video and shouldn’t meaningfully generalize to other formats.