If it were possible to run LLMs without a significant investment in GPU hardware, this problem wouldn’t be very relevant. However, the larger FOSS LLMs take a lot of compute to run.
Is there any automated technique (scripts, lookups, etc.) that can warn a user before the content is posted online? I’m asking specifically about textual content.
Thanks
I didn’t mention what I wanted clearly enough, so here goes:
I am looking to scan my own posts/comments mainly for stylometric statistics, though PII detection would be nice too. I’ll deal with the user agent, cookies, IP, etc. myself.
The threat model is someone trying to link my real identity with my online persona. Obviously, the government is excluded, since they can just get the IP from Lemmy mods and trace me. This is someone who is interested in my identity and will use FOSS or some proprietary tools to link my accounts.
Edit: it seems there are packages available in Python and R to parse through text and try to infer identity from stylometric data. I’ll have to look into those, but it seems doable at a basic level.
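In the meantime, a rough first pass seems doable with just the standard library. Something like the sketch below, where the PII patterns and the feature set are my own illustrative guesses, not any particular package’s actual output:

```python
# Rough sketch, standard library only: a pre-post pass that flags
# obvious PII and reports basic stylometric statistics. Real
# stylometry packages compute far richer feature sets; this just
# shows the shape of the approach.
import re
from collections import Counter

# Hypothetical PII patterns; extend with whatever matters to you.
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "phone": re.compile(r"\b\+?\d[\d\s().-]{7,}\d\b"),
}

# Function words are classic stylometric signals: used
# unconsciously and largely independent of topic.
FUNCTION_WORDS = {"the", "of", "and", "a", "to", "in", "that", "it",
                  "is", "i", "you", "for", "on", "with", "as", "but"}

def scan_draft(text: str) -> dict:
    pii_hits = {name: pat.findall(text) for name, pat in PII_PATTERNS.items()}
    words = re.findall(r"[a-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    total = len(words) or 1
    counts = Counter(words)
    return {
        "pii": {k: v for k, v in pii_hits.items() if v},
        "avg_sentence_len": total / max(len(sentences), 1),
        "avg_word_len": sum(len(w) for w in words) / total,
        "type_token_ratio": len(counts) / total,  # vocabulary richness
        "function_word_freq": {w: counts[w] / total for w in FUNCTION_WORDS},
    }

if __name__ == "__main__":
    print(scan_draft("Mail me at alice@example.org. This is my draft comment."))
```

The idea would be to diff these stats against a corpus of my known writing before hitting post, so anything that correlates too strongly gets flagged.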
Do you want to do scans before the content is posted, or do you want to scan existing content online that you already posted?
You could self-host LanguageTool for its paraphrasing capability, which would vastly reduce stylometric correlations:
https://github.com/languagetool-org/languagetool
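The self-hosted server also exposes an HTTP API you could wire into a pre-post script. A rough sketch, assuming the server is running locally on the default port 8081 (the `/v2/check` endpoint does grammar/style checks; I’m not sure how much of the paraphrasing feature ships in the FOSS build):

```python
# Rough sketch: query a self-hosted LanguageTool server via its
# public HTTP API (/v2/check). Assumes the server is running
# locally on the default port 8081. Standard library only.
import json
import urllib.parse
import urllib.request

LT_URL = "http://localhost:8081/v2/check"  # assumed local instance

def check_text(text: str, lang: str = "en-US") -> None:
    data = urllib.parse.urlencode({"text": text, "language": lang}).encode()
    req = urllib.request.Request(LT_URL, data=data)
    with urllib.request.urlopen(req) as resp:
        result = json.load(resp)
    # Each match is one flagged issue with suggested replacements.
    for m in result["matches"]:
        suggestions = [r["value"] for r in m["replacements"][:3]]
        print(f'{m["message"]} -> {suggestions}')

if __name__ == "__main__":
    check_text("This are a example sentence that I has written.")
```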
Thank you, I’ll take a look! That’s a great idea!