Bots are running rampant. How do we stop them from ruining Lemmy?

@Buttflapper · 4 months ago

Dark Arc · 4 months ago

I’ve been thinking that postcard-based account validation for online services might be a strategy to fight bots.

As in, rather than an email address, you register with a physical address and get mailed a postcard.

A server operator would then have to approve mailing 1,000 postcards to whatever address the bot operator was working out of. The cost of starting and maintaining a bot farm skyrockets as a result: you not only have to pay to get the postcards, you also have to maintain a physical presence somewhere … and potentially a lot of them if you get banned or caught with any frequency.
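For anyone wondering what the server side of this would involve, here’s a minimal Python sketch of the code-on-a-postcard step. It assumes the postcard carries a short one-time code rather than a link; `PostcardVerifier`, `issue_code`, `verify` and the 30-day expiry are hypothetical choices, and printing and mailing the card are left out entirely.

```python
import hashlib
import hmac
import secrets
import time
from typing import Dict, Optional, Tuple

# Assumption: a mailed code stays valid for 30 days.
CODE_TTL_SECONDS = 30 * 24 * 3600


class PostcardVerifier:
    """Hypothetical server-side flow: mail a one-time code, then check it."""

    def __init__(self) -> None:
        # username -> (SHA-256 of the mailed code, time it was issued)
        self._pending: Dict[str, Tuple[str, float]] = {}

    def issue_code(self, username: str, now: Optional[float] = None) -> str:
        # The returned code is what gets printed on the postcard. Only its
        # hash is stored, so a database leak doesn't expose unredeemed codes.
        code = secrets.token_hex(4).upper()  # 8 hex characters
        issued = time.time() if now is None else now
        self._pending[username] = (hashlib.sha256(code.encode()).hexdigest(), issued)
        return code

    def verify(self, username: str, code: str, now: Optional[float] = None) -> bool:
        entry = self._pending.get(username)
        if entry is None:
            return False
        digest, issued = entry
        now = time.time() if now is None else now
        if now - issued > CODE_TTL_SECONDS:
            del self._pending[username]  # expired: a new postcard is needed
            return False
        candidate = hashlib.sha256(code.strip().upper().encode()).hexdigest()
        if hmac.compare_digest(candidate, digest):
            del self._pending[username]  # one-time use
            return True
        return False
```

There’s no cap on guessing attempts here; a real deployment would need one, since 8 hex characters only give about 4.3 billion possible codes.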

Similarly, most operators would presumably only mail to folks within their nation’s mail system. So if Russia wanted to create a bunch of US accounts on “mainstream” US-hosted services, they’d have to physically put agents inside the United States to receive these postcards … and now the FBI can treat this like any other organized domestic crime syndicate.

@[email protected] · 4 months ago

I am absolutely not giving some Lemmy admin my address.

Dark Arc · 4 months ago

How would you feel if it were an independent third party (kind of an OAuth flow) with a well-established presence and data policy?

(i.e., one with a face and name that you could sue if they did something bad with your address?)

@[email protected] · 4 months ago

Am I missing something? I thought you weren’t required to put a return address on postcards. Just put your username and email.

@[email protected] · 4 months ago

They are sending the card to you.

@QuadratureSurfer · 4 months ago

Easy way to get around that with “virtual” addresses: https://ipostal1.com/virtual-address.php

Just pay $10 for every account that you want to create… you may as well just go with the solution of charging everyone $10 to create an account. At least that way the instance owner is getting supported and it would have the same effect.

@[email protected] · 4 months ago

Just pay $10 for every account that you want to create

So, making identities expensive helps. It’d probably filter out some. But look at the bot in OP’s image. The bot’s operator clearly paid for a blue checkmark. That’s (checks) $8/mo, so the operator paid at least $8, and it clearly wasn’t enough to deter them. In fact, they chose the blue checkmark because the additional credibility was worth it; X doesn’t mandate that they get one.

And it also will deter humans. I don’t personally really care about the $10 because I like this environment, but creating that kind of up-front barrier is going to make a lot of people not try a system. And a lot of times financial transactions come with privacy issues, because a lot of governments get really twitchy about money-laundering via anonymous transactions.

EDIT: I think a better route may be to give users a “credibility score”. So it’s not a binary “in” or “out”, but other people can see some kind of automated assessment of, for example, how likely a given user is to be a bot.

thinks more

I mean, this is just spitballing, but it could even be done not at a global level but at a per-user level. Like, okay, suppose you have what amounts to a small neural network, right? The instance computes a bunch of statistics about each user, like account age, and provides them to the client. But it doesn’t decide how much those metrics matter for whether another user should see a post; it just provides the raw data. Those become the inputs to a neural net.

Each user can then have their own set of classifications. Maybe just “hide”, but maybe also something like “bot” or “political activism” or whatever. The client takes the input metrics from the instances, trains the neural net to produce those classifications, and auto-tags users accordingly. That’s gonna be a pain to defeat, because the bot operator can’t even see how they’re being scored; there’s no single hurdle they’ve either cleared or not.
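A rough sketch of the client-side part, assuming the instance exposes a few raw per-user metrics. For brevity it swaps the “small neural network” for a single neuron (plain logistic regression) trained on the reader’s own bot/not-bot labels; the metric names (`account_age_days`, `posts_per_day`, `distinct_communities`) and the functions are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical raw metrics an instance might expose for each user. The instance
# only reports them; it does not decide how much each one matters.
FEATURES = ["account_age_days", "posts_per_day", "distinct_communities"]


def to_vector(metrics: Dict[str, float]) -> List[float]:
    # log1p keeps large counts from dominating; the leading 1.0 is a bias input.
    return [1.0] + [math.log1p(metrics[name]) for name in FEATURES]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(examples: List[Tuple[Dict[str, float], int]],
          epochs: int = 2000, lr: float = 0.1) -> List[float]:
    """Fit a one-neuron classifier (logistic regression) to the reader's own labels."""
    weights = [0.0] * (len(FEATURES) + 1)
    for _ in range(epochs):
        for metrics, label in examples:  # label: 1 = "bot", 0 = "not a bot"
            x = to_vector(metrics)
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
            # Stochastic gradient step on the log-loss.
            weights = [w + lr * (label - p) * xi for w, xi in zip(weights, x)]
    return weights


def bot_probability(weights: List[float], metrics: Dict[str, float]) -> float:
    return sigmoid(sum(w * xi for w, xi in zip(weights, to_vector(metrics))))
```

Other classifications (“hide”, “political activism”) would each get their own weights, trained the same way.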

But you don’t want to make every end user train a neural net from scratch. Hmm.

So maybe what you do is let users create their own scores and expose those to other users, right? I think I read that BlueSky does something like that, or was working on letting users create “curated feeds” for other users. They’re doing something simpler, with no machine learning, but that has drawbacks: it means you have to spend more time deciding whether a score is any good. So, okay. Say I’m gonna score users on whether or not I think they’re bots. I have the option to make that score publicly available. Other users can “subscribe” to that metric, and when they do, a new input node is added to their local classifier. Like, “Don’s Bot List”.

But I don’t have to subscribe to Don’s Bot List, and even if I do, it doesn’t mean that I automatically consider that other user a bot. Don’s rating is just an input into whether my own classifier considers them a bot. If I regularly disagree with Don, even if I’m subscribed to his list, my local neural net will slash the importance of his rating. If I agree with Don unless some other input to my classifier’s neural net is triggered, then the classifier can learn that.
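The subscription idea can be sketched the same way, as a minimal Python example. Each subscribed list is just another input whose weight is learned from the reader’s own verdicts, so a list the reader keeps disagreeing with ends up with a negative weight. `LocalClassifier` and the `dons_bot_list` input are hypothetical names.

```python
import math
from typing import Dict


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class LocalClassifier:
    """Hypothetical per-reader "is this a bot?" classifier whose inputs grow with subscriptions."""

    def __init__(self) -> None:
        self.weights: Dict[str, float] = {}
        self.bias = 0.0

    def subscribe(self, list_name: str) -> None:
        # A subscribed list becomes a new input node. It starts at weight 0,
        # so it has no influence until the reader's own labels train it.
        self.weights.setdefault(list_name, 0.0)

    def score(self, ratings: Dict[str, float]) -> float:
        # ratings: list name -> that list's rating of a user (1.0 = "bot", 0.0 = not flagged)
        z = self.bias + sum(w * ratings.get(name, 0.0) for name, w in self.weights.items())
        return sigmoid(z)

    def learn(self, ratings: Dict[str, float], my_label: int, lr: float = 0.5) -> None:
        # One gradient step toward the reader's own verdict (1 = bot, 0 = not a bot).
        err = my_label - self.score(ratings)
        self.bias += lr * err
        for name in self.weights:
            self.weights[name] += lr * err * ratings.get(name, 0.0)
```

Feeding it a run of your own verdicts where Don is wrong drives `weights["dons_bot_list"]` below zero, which is the “slash the importance of his rating” behaviour; when you mostly agree with him, the weight climbs instead.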

@QuadratureSurfer · 4 months ago

Yep, exactly this. It might deter some small time bot creators, but it won’t stop larger operations and may even help them to seem more legitimate.

If anything, my favorite idea comes from this xkcd:

https://xkcd.com/810/

Dark Arc · 4 months ago

Yeah, BlueSky has this concept of user moderation lists. It’s effectively like subscribing to an adblock filter. Some lists might block things by pattern (e.g., you could have one that blocks anything involving spiders), and others might block specific accounts (e.g., you could have one that blocks users known to cause problems, be prone to vulgar language, etc.).
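As a rough illustration of the adblock-filter analogy, here’s a Python sketch of a subscribable list that can block either specific accounts or content patterns. `ModerationList` and `is_visible` are hypothetical; this is not Bluesky’s actual data model or API.

```python
import re
from dataclasses import dataclass, field
from typing import Iterable, List, Set


@dataclass
class ModerationList:
    """Hypothetical subscribable list, loosely modeled on an adblock filter list."""
    name: str
    blocked_accounts: Set[str] = field(default_factory=set)    # block specific users
    blocked_patterns: List[str] = field(default_factory=list)  # block by content (regex)

    def blocks(self, author: str, text: str) -> bool:
        if author in self.blocked_accounts:
            return True
        return any(re.search(p, text, re.IGNORECASE) for p in self.blocked_patterns)


def is_visible(author: str, text: str, subscriptions: Iterable[ModerationList]) -> bool:
    # A post is hidden if any list the reader subscribes to blocks it.
    return not any(lst.blocks(author, text) for lst in subscriptions)
```

For example, a “No spiders” list with the pattern `r"\bspiders?\b"` hides “Look at this Spider!” from anyone subscribed to it, regardless of who posted it.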

I think the problem with credibility scores in general, though, is that they’re sort of like the “social score” from Black Mirror. Real people can get caught in the “you look like a bot” net, and bots could similarly be designed to game the metrics so they don’t look like bots (possibly even more convincingly than some real people).

This is kind of what led me down the route of bringing things back into the physical world. Once things go back through the normal systems … you arguably do lose some anonymity, but you also regain some guarantee of humanity.

It doesn’t need to be the level of “you’ve got a government ID and you’re verified to be exactly you with no other accounts” … just “hey, some number of people in the real world, that are subject to the respective nation’s laws, had to have come into contact with a real piece of mail.”

Maybe that just turns into the world’s slowest UDP network. However, I think it has a real chance of making real people easier to identify (i.e., folks who share an address with only a few other accounts). The virtual mailbox service the other person linked has 3,000 addresses… if you assume 5 people per mailing address is normal, that’s 15,000 bots total before things start looking fishy, and only if you’ve distributed the bots evenly across all of those addresses. If you’ve got 3,000 accounts at the same address, that’s very fishy. Addresses also change a lot less frequently than IP addresses, so a physical address ban is a much stronger deterrent.
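The “fishy address” check could look something like this minimal Python sketch, assuming the server stores one postal address per account. The normalization is deliberately crude and the threshold of 5 is the assumption from the comment above; `suspicious_addresses` and `max_unflagged_accounts` are hypothetical helpers.

```python
from collections import Counter
from typing import Dict, List

# The thread's assumption: up to ~5 accounts per mailing address is normal.
NORMAL_ACCOUNTS_PER_ADDRESS = 5


def normalize(address: str) -> str:
    # Very rough normalization (case, commas, periods, whitespace only);
    # real postal-address normalization is much harder than this.
    cleaned = address.upper().replace(",", " ").replace(".", " ")
    return " ".join(cleaned.split())


def suspicious_addresses(account_addresses: Dict[str, str],
                         limit: int = NORMAL_ACCOUNTS_PER_ADDRESS) -> List[str]:
    """Return normalized addresses shared by more than `limit` accounts."""
    counts = Counter(normalize(addr) for addr in account_addresses.values())
    return sorted(addr for addr, n in counts.items() if n > limit)


def max_unflagged_accounts(num_addresses: int,
                           limit: int = NORMAL_ACCOUNTS_PER_ADDRESS) -> int:
    # Best case for a bot operator: spread accounts evenly and sit exactly at the limit.
    return num_addresses * limit
```

`max_unflagged_accounts(3000)` gives 15,000, the figure from the comment: the most accounts an operator could spread across 3,000 addresses without any single address standing out.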

Dark Arc · 4 months ago

Hm… I’m not sure if this is enough to defeat the strategy.

It looks like even with that service, you have to fill out USPS Form 1583.

Even if they’re willing to incur the cost, there’s a real paper trail pointing back to a real person or organization. In other words, the bot operator can be identified.

As you note, this is yet another cost. So you’d have, say, $2–3 for the card plus the cost of an address for the account. If you require every unique address to have no more than 1 account, that’s roughly $13 per bot, plus a paper trail to set everything up.
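The arithmetic, as a quick Python sketch. It treats the $10 virtual-address price from earlier in the thread as a one-off charge and ignores labour and any recurring mailbox fees, so the totals are a floor rather than an estimate; `bot_farm_cost` is a hypothetical helper.

```python
from typing import Tuple


def bot_farm_cost(num_bots: int, postcard_cost: float,
                  address_cost: float = 10.0) -> Tuple[float, float]:
    """Return (cost per bot, total cost) under a one-account-per-address rule."""
    # Each bot needs its own address plus its own mailed postcard.
    per_bot = postcard_cost + address_cost
    return per_bot, per_bot * num_bots


for card in (2.0, 3.0):
    per_bot, total = bot_farm_cost(1000, card)
    print(f"${card:.0f} card: ${per_bot:.2f} per bot, ${total:,.2f} for 1,000 bots")
```

For the 1,000-postcard farm from the original proposal, that works out to roughly $12,000–13,000 before anyone is caught and banned.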

That certainly wouldn’t stop every bot out there … but it seems like large-scale bot farms would be significantly deterred, no?

@QuadratureSurfer · 4 months ago

That’s a good point. I didn’t know about USPS Form 1583 for virtual mailboxes… Although that’s a U.S.-specific thing, so finding a similar service in a country that doesn’t care as much might be the way around that.

Dark Arc · 4 months ago

True, though presumably users in those places would be stuck with the “less trustworthy” instances (and ideally would be able to get their local laws changed to make themselves more trustworthy).

It’s definitely not perfectly moral… but little in the world is, and maybe it’s sufficiently pragmatic.

@QuadratureSurfer · 4 months ago

Yeah, the other thing I could see happening is a tactic like the one scammers use, where mules pick up mail from various Airbnbs throughout whatever country. Still, this would definitely limit most bot operations… unless some organization specializes in this and offers a service that creates a bunch of accounts for anyone willing to pay.

Also, how many accounts would you allow per address, and how long would you lock an address before it could be used again (given that people do move from time to time)?

edit:typo.
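One way to handle both questions at once, as a minimal Python sketch: cap live accounts per address and keep a freed slot locked for a cooldown period after someone moves out. `AddressPolicy` is hypothetical, and the 1-account cap and roughly six-month cooldown are placeholder values, not recommendations.

```python
from dataclasses import dataclass, field
from typing import Dict, List

DAY = 24 * 3600


@dataclass
class AddressPolicy:
    """Hypothetical policy: cap live accounts per address and hold freed slots for a while."""
    max_accounts: int = 1                # "no more than 1 account" per address, as proposed earlier in the thread
    cooldown_seconds: float = 180 * DAY  # assumption: ~6 months before a freed slot is reusable
    _active: Dict[str, int] = field(default_factory=dict, init=False, repr=False)
    _released: Dict[str, List[float]] = field(default_factory=dict, init=False, repr=False)

    def _slots_in_use(self, address: str, now: float) -> int:
        # Forget releases whose cooldown has already expired.
        recent = [t for t in self._released.get(address, []) if now - t < self.cooldown_seconds]
        self._released[address] = recent
        return self._active.get(address, 0) + len(recent)

    def can_register(self, address: str, now: float) -> bool:
        return self._slots_in_use(address, now) < self.max_accounts

    def register(self, address: str, now: float) -> bool:
        if not self.can_register(address, now):
            return False
        self._active[address] = self._active.get(address, 0) + 1
        return True

    def release(self, address: str, now: float) -> None:
        # Call when an account moves away or is deleted: the slot stays
        # blocked until the cooldown runs out.
        if self._active.get(address, 0) > 0:
            self._active[address] -= 1
            self._released.setdefault(address, []).append(now)
```

A longer cooldown makes it harder to recycle an address through many accounts, at the cost of making life harder for the next real tenant.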

Scribble902 · 4 months ago

I was thinking physical mail too. But I think it would definitely require some sort of third-party or government-backed system that anonymises you, like the way the COVID Bluetooth contact-tracing system worked (stupidly called track and trace in the UK). Plus you’d have to interact with someone at a post office to legitimise it. I’m just talking about a worker at a counter.

So you’d get a one-time, unique, anonymised postal address. You go to a post office and hand your letter over to someone. You, and perhaps they, won’t know the real address, but the system will. Maybe a process down the line re-envelopes the letter with the real address on it.
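The re-envelope step is essentially an alias table held by the trusted third party. A minimal Python sketch, assuming the alias stands in for the recipient’s real address and is forgotten after a single forward; `ForwardingService` and the `ALIAS-` prefix are hypothetical.

```python
import secrets
from typing import Dict, Optional


class ForwardingService:
    """Hypothetical trusted third party that hands out one-time postal aliases."""

    def __init__(self) -> None:
        self._aliases: Dict[str, str] = {}  # alias -> real address

    def create_alias(self, real_address: str) -> str:
        # The sender and the counter clerk only ever see this random token.
        alias = "ALIAS-" + secrets.token_urlsafe(8)
        self._aliases[alias] = real_address
        return alias

    def forward(self, alias: str) -> Optional[str]:
        # Re-envelope step: look up the real address once, then forget it.
        return self._aliases.pop(alias, None)
```

Because `forward` removes the mapping, a second letter sent to the same alias can’t be delivered, which keeps each alias single-use.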

This way, you’ve kept the server owner’s address private, and you’ve had to involve some form of person-to-person interaction, meaning: not a bot!

This system could be used for all sorts of verification beyond social media, so governments or third parties may have enough incentive to set it up.

Could it be abused, though, and if so, are there ways to mitigate that?