It would be a shame to lose the wealth of knowledge, with easy-ish search, that subreddits like datahoarder provide if the subreddit is taken down or stays locked forever. Sure, it is currently accessible, but will it stay that way?
I know it is being archived, but the accessibility part is the problem.
There’s actually already a Reddit-to-Lemmy importer that lets you bring over threads, including comments: https://github.com/rileynull/RedditLemmyImporter
Yeah, that appears to be what I had in mind. Good find!
If you’re just interested in searching:
http://redarc.basedbin.org/search
/r/datahoarder is indexed and searchable
Good to know that exists, thanks! Although I still personally believe having it in a forum like Lemmy is the best way to preserve it in its original format.
I have wondered: is there an easy way to search the Wayback Machine for archived Reddit data?
And for comments that people back up to CSV with tools like Power Delete Suite, is there a nice way to display them other than Excel?
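On the CSV question, one low-effort option is a few lines of Python that print each row as a readable block instead of a spreadsheet row. This is only a rough sketch: `render_rows` is a made-up helper, and since I don't know which columns the export actually contains, it just prints whatever headers it finds.

```python
import csv
import io
import textwrap


def render_rows(csv_text, width=80):
    """Turn CSV text into plain-text blocks, one block per row.

    Column names are taken from the header row, whatever they are.
    Note that textwrap.fill collapses line breaks inside a cell.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    blocks = []
    for row in reader:
        lines = []
        for column, value in row.items():
            if column is None:
                # Extra cells beyond the header row; skipped in this sketch.
                continue
            wrapped = textwrap.fill(
                value or "",
                width=width,
                initial_indent="  ",
                subsequent_indent="  ",
            )
            lines.append(f"{column}:\n{wrapped}")
        blocks.append("\n".join(lines))
    return "\n\n---\n\n".join(blocks)
```

To use it on a real export, read the file with `open(path, newline="", encoding="utf-8").read()` and print the result of `render_rows` on that text.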
It could be done, but that really isn’t the best possible solution in my opinion. What I was thinking was having a bot migrate all the comments and posts here (or to another instance). The bot would post everything under its own name (instead of trying to create new users on Lemmy) and put the old usernames in the comment text, like “Bread commented” followed by their comment. That way we still know who said it.
If the bot maker had control of the instance, we might be able to put everything in chronological order by timestamp, so it would look like the comments were all made here originally. The only indicator that they weren’t would be the bot name as the username. Search would then work on it just like it does on Reddit.
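Roughly, the attribution and ordering part could look like the sketch below. It only prepares the text a bot would post and does not talk to Lemmy at all. `ArchivedComment`, `to_bot_body` and `in_original_order` are made-up names for illustration; the `author`, `body` and `created_utc` fields are assumed to match what Reddit's comment JSON provides.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class ArchivedComment:
    author: str         # original Reddit username
    body: str           # original comment text
    created_utc: float  # Unix timestamp of the original comment


def to_bot_body(comment):
    """Build the text the bot would post, crediting the original author."""
    when = datetime.fromtimestamp(comment.created_utc, tz=timezone.utc)
    header = f"{comment.author} commented ({when:%Y-%m-%d %H:%M} UTC):"
    return f"{header}\n\n{comment.body}"


def in_original_order(comments):
    """Return the comments sorted oldest first by original timestamp."""
    return sorted(comments, key=lambda c: c.created_utc)
```

Keeping the original timestamp in the text means the date survives even if the instance can't backdate the post itself.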
I believe the best way to archive a forum-style website is on a forum, where everything has a one-to-one equivalent.
As for moving Datahoarder to a new instance, that sure would make backups a lot nicer if a datahoarder ran it. I am surprised that it isn’t on its own already considering the topic. Same thing with self-hosted.
I love this idea. It raises some issues to think about, too. Like, who “owns” that data? Would Reddit file a lawsuit against the Lemmy instance arguing that the data belongs to Reddit? Does the data belong to the users who posted? What TOS do we agree to when signing up for a Reddit account? Are we giving them ownership of all content we post?
I think it would be very hard to argue in court that someone’s ideas and thoughts that they made belong to reddit just because they posted them there. That is also why you can request reddit delete all your data and they must comply.
As for the legality of taking those comments and posts, I don’t know for certain. The Internet Archive already does it, though. If anything, they would have to remove any content that its author wants removed, like with a DMCA request.
Like with most things on the internet, if it is illegal and nobody is enforcing it, it might as well be legal.
I’m not sure if it’s possible to backdate posts, even on your own instance. But otherwise I think this might be the way.
The-eye has a nice archive of Reddit: https://the-eye.eu/redarcs/
While it’s easy enough to get the sub contents as JSON, I don’t believe there is an API for posting to Lemmy yet, so there’s no way to easily push it back up.
But it is still possible. The question now is: should it be done?
Hmmm…. I’d be more inclined to ensure it was archived with the wayback machine, and then refer back to that.