Reddit Undeleted all my posts and comments

@gedaliyah · 6 months ago


@[email protected] · edit-2 6 months ago

So, I’m gonna be honest. I don’t think that mass deanonymization via text analysis is in the immediate future.

Is it a theoretical risk? Yes. My doubt isn’t because I think that it’s technically infeasible. It’s for a rather-more-depressing reason: there’s lower-hanging fruit if someone is trying to build a deanonymized database. I just don’t think that it’s presently worth the kind of effort required to mass-deanonymize text, in general.

Any time you have an account with some company that persists for a long time, if they retain a persistent IP address log, then whenever you log in, you’re linking your identity and the IP address at that time. If one cross-correlates logs at a few companies, a data-miner could do a reasonably reliable job of deanonymizing someone. Maybe it’s not perfect, maybe there are several people in a household or something, maybe some material is suspect. But if you’re watching cookies in a browser on a phone crossing from one network to another and such, my guess is that you can typically map an IP address to a fairly limited number of people.
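To make the cross-correlation idea concrete, here is a minimal sketch of joining two services' login logs on IP address and time proximity. Everything in it is an assumption for illustration: the record layout, the account names, the documentation-range IP addresses, and the 10-minute window are all made up, and real logs would be far noisier.

```python
from collections import defaultdict

# Hypothetical log records: (account, ip, unix_timestamp). All values are invented.
SERVICE_A = [
    ("alice_real_name", "203.0.113.7", 1000),
    ("bob_real_name", "198.51.100.4", 1500),
    ("alice_real_name", "203.0.113.9", 5000),
]
SERVICE_B = [
    ("anon_user_42", "203.0.113.7", 1100),
    ("anon_user_42", "203.0.113.9", 5200),
    ("throwaway_9", "198.51.100.4", 90000),
]

def link_accounts(logs_a, logs_b, window=600):
    """Count how often each (account_a, account_b) pair shares an IP within `window` seconds."""
    # Index service A's logins by IP so each service B login needs only one lookup.
    by_ip = defaultdict(list)
    for account, ip, ts in logs_a:
        by_ip[ip].append((account, ts))

    hits = defaultdict(int)
    for account_b, ip, ts_b in logs_b:
        for account_a, ts_a in by_ip.get(ip, []):
            if abs(ts_a - ts_b) <= window:
                hits[(account_a, account_b)] += 1
    return dict(hits)

links = link_accounts(SERVICE_A, SERVICE_B)
print(links)
```

A pair that keeps showing up together as the IP changes is a much stronger signal than a single match, which could easily be a shared household or a coffee-shop network.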

I mean, there are ways to help obfuscate that, like Tor. But virtually nobody is doing that sort of thing. And even through something like Tor, browsers tend to leak an awful lot of bits of unique information.

And if someone’s downloading an app to their phone that’s intentionally transmitting a unique identifier, then it’s pretty much game over anyway, absent something like XPrivacyLua that can forge information. Companies want to get people using their phone apps.

An individual person might be subject to doxxing from someone who wants to identify their real-life persona from an online persona. But I don’t think that companies are likely to go that route in the near future to deanonymize users en masse, because they’ve already got easier, more-reliable ways to track people that people are vulnerable to.

All that being said, once text is out there, it’s potentially not going away, so it might be a good idea to keep in mind that it could be deanonymized one day via future analysis. The disputed Federalist Papers were attributed to their authors via Bayesian statistical analysis well over a century after they were written, using techniques that their authors could not have dreamed of.
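For a rough feel of what Bayesian authorship attribution looks like, here is a toy sketch: a multinomial naive Bayes score over a handful of function words, with add-one smoothing. This is a swapped-in simplification, not the model used in the actual Federalist Papers study, and the word list, author labels, and texts are all invented for illustration.

```python
import math
from collections import Counter

# Assumed marker vocabulary; a real analysis would choose these words from data.
FUNCTION_WORDS = ["upon", "while", "whilst", "on", "by", "to"]

def counts(text):
    """Count occurrences of the marker words in whitespace-separated text."""
    return Counter(w for w in text.lower().split() if w in FUNCTION_WORDS)

def log_likelihood(sample, training):
    """Log-probability of the sample's marker words under one author's word rates."""
    train = counts(training)
    total = sum(train.values())
    score = 0.0
    for word, k in counts(sample).items():
        # Add-one (Laplace) smoothing so unseen words do not zero out the score.
        p = (train[word] + 1) / (total + len(FUNCTION_WORDS))
        score += k * math.log(p)
    return score

def attribute(sample, candidates):
    """Return the best-scoring candidate and all scores (equal priors assumed)."""
    scores = {name: log_likelihood(sample, text) for name, text in candidates.items()}
    return max(scores, key=scores.get), scores

# Invented training texts for two hypothetical authors.
candidates = {
    "author_x": "upon reflection we rely upon reason while others act upon impulse",
    "author_y": "whilst we rest on reason others rest on impulse by habit",
}
best, scores = attribute("they act upon instinct while we wait", candidates)
print(best, scores)
```

The point is that boring words like "upon" or "whilst" are hard to consciously control, which is what makes them useful as a fingerprint.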

Robert Hanssen – a Soviet (and later Russian) mole in the FBI who had counterintelligence expertise and could reasonably expect state-level intelligence agencies to be going after him – was caught in part because he used the unique phrase “the purple-pissing Japanese” on two occasions: once where it was known that the writer was a spy but not who he was, and once where his real-life identity was known but not that he was a spy. That deanonymization was done manually, via human effort, but if you figure that the same sorts of approaches could be used to link accounts at different services and across accounts on one service… *shrugs* I mean, I just don’t have the tools to resist something like that: to keep what I’m saying intact while presenting ideas in a way that I’d be confident would hold up against that kind of analysis.
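The automated version of that kind of phrase matching could be sketched roughly as follows: find word n-grams that two texts share but that never appear in a background corpus of ordinary writing. The function names, the trigram size, and the sample texts are all hypothetical, and a real system would need a far larger background corpus to decide what counts as "rare".

```python
import re

def ngrams(text, n=3):
    """Return the set of lowercase word n-grams in the text."""
    words = re.findall(r"[a-z'-]+", text.lower())
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def shared_rare_phrases(doc_a, doc_b, background, n=3):
    """N-grams present in both documents but absent from every background document."""
    common = set()
    for doc in background:
        common |= ngrams(doc, n)
    return (ngrams(doc_a, n) & ngrams(doc_b, n)) - common

# Invented example texts; the odd phrase stands in for a writer's verbal tic.
known = "As I said in the meeting, the turquoise tap-dancing walrus strikes again."
anonymous = "Beware the turquoise tap-dancing walrus, my friend."
background = [
    "As I said in the meeting, we need more budget.",
    "Beware the deadline, my friend.",
]

matches = shared_rare_phrases(known, anonymous, background)
print(matches)
```

One shared oddity like this doesn't prove anything on its own, but it narrows the field a lot, and it's cheap to run across millions of accounts.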

Lvxferre [he/him] · 6 months ago

While I don’t think that text analysis (TA) is going to replace those techniques that you mentioned, I do think that it is a threat to anonymity in the immediate future, because it’ll likely be used alongside those techniques to improve their accuracy and lower their overall costs.

The key here is machine “learning” lowering the TA fruit by quite a bit. People attribute almost supernatural abilities to ML, but here it’s right at home, as it’s literally made to find correlations between sets of data. And, well, TA is basically that.

Another reason why I think that it’s a threat is that even a partial result is useful. TA doesn’t just identify you; it profiles you. And even without knowing your exact name and address, info like age, sex, gender, location, social class, academic background etc. is still useful for advertisers and similar.

(Besides the Federalist Papers and Robert Hanssen, another interesting example would be how the Unabomber was captured. It illustrates better how the analysis almost never relies on a single piece of info, but rather multiple pieces that are then glued together into a coherent profile.)

(Also sorry for nerding out about this, it’s just a topic that I happen to enjoy.)