Some subreddits may never come back online. Some subreddits are also very valuable because of their old posts and responses. Alas, the intersection isn’t empty (I’m personally anxious about r/suggestmeabook and r/TrueLit).
Naturally, one would like to download all posts and comments to offline storage. Unfortunately, the usual methods are useless when the subreddit is private.
Are there any good options for the pessimistic scenario? Scraping the web archive? Filtering ML datasets? Anything else?
You could probably try Pushshift.
Is it going to work on a private subreddit?
Also, Pushshift is effectively dead as far as I can see.
I think they wouldn’t have the posts if the subreddit was already private when they were posted. Otherwise they store the posts, so those should still be available even though the subreddit is private now. Also, the live archive might be dead, but the data up to 2023 is available as a torrent on Academic Torrents.
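If you go the torrent route, here is a rough sketch of how you might pull a couple of subreddits out of a dump. It assumes the dump files are zstd-compressed, newline-delimited JSON with a "subreddit" field on each record, which is how I understand the Pushshift dumps to be laid out; check a few lines of your file first. The function name filter_subreddits is made up for this example, and decompression is left to the zstd command-line tool so the script only needs the standard library.

```python
import json
from typing import Iterable, Iterator


def filter_subreddits(lines: Iterable[str], wanted: set[str]) -> Iterator[dict]:
    """Yield records whose "subreddit" field matches one of `wanted`.

    Assumption: each line is one JSON object with a "subreddit" key.
    Matching is case-insensitive; malformed or blank lines are skipped.
    """
    wanted_lower = {name.lower() for name in wanted}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if not isinstance(record, dict):
            continue
        subreddit = record.get("subreddit")
        if isinstance(subreddit, str) and subreddit.lower() in wanted_lower:
            yield record


# Example use when reading decompressed text from standard input:
#   import sys
#   for record in filter_subreddits(sys.stdin, {"suggestmeabook", "TrueLit"}):
#       sys.stdout.write(json.dumps(record) + "\n")
```

With the example loop uncommented and the script saved as, say, filter.py (a hypothetical name), you would pipe a decompressed dump into it, something like `zstd -dc --long=31 <dump file> | python filter.py > kept.ndjson`. The `--long=31` flag is there because, as far as I know, these dumps were compressed with a large window; drop it if your files decompress without it.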