• Primarily0617
    link
    fedilink
    311 months ago

    you can’t “anonymize” data

    ask the people outed as lgbt by netflix’s anonymized data set

    • @[email protected]
      link
      fedilink
      English
      111 months ago

      You absolutely can anonymise data.

      However it’s also true that of you don’t do it correctly users can be identified. Sounds like Netflix didn’t do it properly. I don’t know, do you have a link I could look at?

      • Primarily0617
        link
        fedilink
        311 months ago

        anonymising data is a treadmill problem

        what might work now won’t hold up to the de-anonymising techniques of a few years from now

        so no, you can’t really

        • @[email protected]
          link
          fedilink
          English
          111 months ago

          Create anonymous UUID, store interactions against this in a separate table, ensure PII is removed prior to storing. So instead of Max Reboo has purchased a subscription to jugs and hooters it’s user 12345678901234576 has purchased jugs and hooters. How can a future treadmill de-anonymise this? For sure if the storage is done badly then you can track back to a particular user.

          Also, once again, can you link to the netflix issue you quoted above please. Thanks.

          • Primarily0617
            link
            fedilink
            3
            edit-2
            11 months ago

            Create anonymous UUID, store interactions against this in a separate table, ensure PII is removed prior to storing

            which is more or less exactly what netflix did -> the whole thing’s not that hard to find on google

            but you need something to distinguish users at least a bit or the data’s equivalent to sales figures

            you combine that “not-quite-pii” with other independent data sources that have similar “not-quite-pii” and build a complete picture

            the treadmill effect comes from active research in this exact area trying to de-anonymise data sets finding new techniques to get around old ones