Before I left Reddit, I used a plugin through the API to replace all of my comments with random gibberish and then delete them. Part of this was because (mandatory) fuck spez. But more importantly, it was to protect the anonymity of my account. After years of posting, there is likely enough personal information shared to potentially connect my Reddit habits to my online identity. I wasn’t planning on using that account again, but I left it open in order to maintain some security control over it. I’m not really sure what to do at this point, because I still consider it a somewhat concerning security vector. There’s no way I can manually edit and delete all of my content with the snail’s-pace Reddit UI, and I have no way to ensure that my content will remain unavailable or at least not publicly displayed.

  • @[email protected]
    English
    4
    2 months ago

    So, I’m gonna be honest. I don’t think that mass deanonymization via text analysis is in the immediate future.

    Is it a theoretical risk? Yes. My doubt isn’t about whether it’s technically doable. It’s for a rather more depressing reason: there’s lower-hanging fruit if someone is trying to build a deanonymized database. I just don’t think that it’s presently worth the kind of effort required to mass-deanonymize text, in general.

    Any time you have an account with some company that persists for a long time, if they retain a persistent IP address log, then whenever you log in, you’re linking your identity and the IP address at that time. If one cross-correlates logs at a few companies, a data-miner could do a reasonably reliable job of deanonymizing someone. Maybe it’s not perfect, maybe there are several people in a household or something, maybe some material is suspect. But if you’re watching cookies in a browser on a phone crossing from one network to another and such, my guess is that you can typically map an IP address to a fairly limited number of people.
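
    To make the cross-correlation idea concrete, here’s a toy sketch in Python. Everything in it is made up for illustration: the two log formats, the account and customer names, the one-hour window, and the `link_candidates` helper are all my own assumptions, and the IP addresses come from the reserved documentation ranges. It just counts how often a pseudonymous account and a named account show up on the same IP at around the same time.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical login logs from two unrelated services (all data invented).
forum_log = [  # (pseudonymous account, IP, login time)
    ("throwaway42", "192.0.2.10", datetime(2023, 6, 1, 8, 0)),
    ("throwaway42", "198.51.100.7", datetime(2023, 6, 3, 19, 30)),
    ("throwaway42", "192.0.2.10", datetime(2023, 6, 5, 8, 5)),
]
shop_log = [  # (real-name customer, IP, login time)
    ("Alice Example", "192.0.2.10", datetime(2023, 6, 1, 8, 20)),
    ("Bob Example", "192.0.2.10", datetime(2023, 6, 2, 22, 0)),
    ("Alice Example", "198.51.100.7", datetime(2023, 6, 3, 19, 45)),
    ("Alice Example", "192.0.2.10", datetime(2023, 6, 5, 7, 50)),
]


def link_candidates(pseudo_log, named_log, window=timedelta(hours=1)):
    """Count how often each (pseudonym, name) pair shares an IP within `window`."""
    hits = Counter()
    for account, ip_a, t_a in pseudo_log:
        for name, ip_b, t_b in named_log:
            if ip_a == ip_b and abs(t_a - t_b) <= window:
                hits[(account, name)] += 1
    return hits


hits = link_candidates(forum_log, shop_log)
for (account, name), n in hits.most_common():
    print(f"{account} <-> {name}: {n} co-occurrences")
```

    Bob shares a household IP here but never lines up in time, so he drops out, while the repeated matches across two networks point at one person. That’s the “several people in a household” noise getting filtered by nothing fancier than a join.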

    I mean, there are ways to help obfuscate that, like Tor. But virtually nobody is doing that sort of thing. And even through something like Tor, browsers tend to leak an awful lot of bits of unique information.
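
    A rough back-of-the-envelope way to think about those “bits”: singling out one person among N takes about log2(N) bits, and each attribute that only a fraction p of people share gives away about -log2(p) bits. The attribute names and fractions below are invented round numbers, not measurements, and adding the bits together assumes the attributes are independent, which real ones usually aren’t.

```python
import math


def bits_to_single_out(population):
    """Bits of information needed to pick one individual out of `population`."""
    return math.log2(population)


def surprisal_bits(fraction):
    """Bits revealed by an attribute shared by `fraction` of the population."""
    return -math.log2(fraction)


# Hypothetical shares of users having each observed value (illustrative only).
leaked = {
    "time zone": 1 / 4,
    "language setting": 1 / 8,
    "screen size": 1 / 16,
    "installed font list": 1 / 1024,
}

needed = bits_to_single_out(8_000_000_000)
total = sum(surprisal_bits(p) for p in leaked.values())  # assumes independence

print(f"needed to single out one person on Earth: {needed:.1f} bits")
print(f"leaked by these four attributes: {total:.1f} bits")
```

    With these made-up numbers, four fairly mundane attributes already cover more than half of the bits needed, which is why hiding only the IP address doesn’t get you very far.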

    And if someone’s downloading an app to their phone that’s intentionally transmitting a unique identifier, then it’s pretty much game over anyway, absent something like XPrivacyLua that can forge information. Companies want to get people using their phone apps.

    An individual person might be subject to doxxing from someone who wants to try to identify their real-life persona from an online persona. But I don’t think that companies are likely to go that route in the near future to try to deanonymize users en masse, because they’ve already got easier, more-reliable ways to track people that people are vulnerable to.

    All that being said, once text is out there, it’s potentially not going away, so keeping in mind that it might be deanonymized one day via future analysis might be a good idea. The Federalist Papers were deanonymized via Bayesian statistical analysis nearly two centuries after they were written, using technologies that their authors could not have dreamed of.
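
    For a feel of how that kind of analysis works, here’s a very small sketch of the general idea: compare how often a disputed text uses common function words against each candidate author’s known writing. This is not the actual method used on the Federalist Papers, just a naive multinomial comparison with add-one smoothing. The word list, the sample sentences, and the author labels are all invented for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical marker words; real stylometry uses a much longer list.
FUNCTION_WORDS = ["upon", "while", "whilst", "on", "by", "to", "the", "of"]


def counts(text):
    """Count occurrences of each function word in `text`."""
    words = re.findall(r"[a-z']+", text.lower())
    c = Counter(w for w in words if w in FUNCTION_WORDS)
    return [c[w] for w in FUNCTION_WORDS]


def log_likelihood(unknown, known):
    """Log-likelihood of the unknown text under the known author's word rates."""
    k = counts(known)
    u = counts(unknown)
    total = sum(k) + len(FUNCTION_WORDS)  # add-one (Laplace) smoothing
    return sum(n * math.log((kc + 1) / total) for n, kc in zip(u, k))


def attribute(unknown, samples):
    """Return the best-scoring author and all scores."""
    scores = {a: log_likelihood(unknown, t) for a, t in samples.items()}
    return max(scores, key=scores.get), scores


# Invented writing samples for two hypothetical authors.
samples = {
    "author_a": "Upon reflection the plan depends upon the consent of the "
                "people, while the states wait upon the result.",
    "author_b": "Whilst the plan is debated by the states, the powers granted "
                "by the people to the union are limited by the charter.",
}
disputed = ("Whilst the powers are exercised by the union, the states are "
            "restrained by the people.")

best, scores = attribute(disputed, samples)
print(best, scores)
```

    Function words are useful for this because people pick them without thinking, so they carry habit rather than topic. With real text you’d want far more data per author than a sentence or two.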

    Robert Hanssen – a Soviet mole in the FBI who had counterintelligence expertise and could reasonably expect to be dealing with state-level intelligence agencies going after him – was caught because he used the unique phrase “the purple-pissing Japanese” on two occasions: once where it was known that the speaker was a spy but not who he was, and once where his real-life identity was known but not that he was a spy. That deanonymization was done manually, via human effort, but if you figure that the same sorts of approaches could be used to link accounts at different services and across accounts on one service… *shrugs* I mean, I just don’t have the tools to try to resist something like that, to keep what I’m saying intact but present ideas in a way that I’d be confident would be strong against that kind of analysis.

    • Lvxferre
      English
      2
      2 months ago

      While I don’t think that text analysis (TA) is going to replace those techniques that you mentioned, I do think that it is a threat to anonymity in the immediate future, because it’ll likely be used alongside those techniques to improve their accuracy and lower their overall costs.

      The key here is machine “learning” lowering the TA fruit by quite a bit. People attribute almost supernatural abilities to ML, but here it’s right at home, as it’s literally made to find correlations between sets of data. And, well, TA is basically that.

      Another reason why I think that it’s a threat is that even a partial result is useful. TA doesn’t just identify you; it profiles you. And even without knowing your exact name and address, info like age, sex, gender, location, social class, educational background etc. is still useful for advertisers and similar.

      (Besides the Federalist Papers and Robert Hanssen, another interesting example would be how the Unabomber was captured. It illustrates better how the analysis almost never relies on a single piece of info, but rather multiple pieces that are then glued together into a coherent profile.)

      (Also sorry for nerding out about this, it’s just a topic that I happen to enjoy.)