Some subreddits may never come back online. Some are also very valuable because of their old posts and responses. Alas, the intersection isn’t empty (I am personally anxious about r/suggestmeabook and r/TrueLit).

Naturally, one would like to download all posts and comments to offline storage. Unfortunately, the usual methods are useless when the subreddit is private.

Are there any good options for the pessimistic scenario? Scraping the web archive? Filtering ML datasets? Anything else?

    • @AudalinOP
      32 years ago

      Is it going to work on a private subreddit?

    • @AudalinOP
      22 years ago

      Also, pushshift is effectively dead as far as I can see.

      • @[email protected]
        12 years ago

        I think they wouldn’t have any posts that were private at the time of posting. Otherwise they store the posts, so those should be available even though the subreddit is private now. Also, the archives might be dead, but the data up to 2023 is available as a torrent on Academic Torrents.
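
For the "filter a big dump" route, here is a rough sketch of pulling one subreddit out of such a torrent. It assumes the files are zstd-compressed, newline-delimited JSON with a `subreddit` field on every record, which is how Pushshift dumps have generally been distributed; check the actual files before relying on it. The names `filter_subreddit` and `open_dump` are made up for this example, and `open_dump` needs the third-party `zstandard` package.

```python
import io
import json
from typing import Iterable, Iterator


def filter_subreddit(lines: Iterable[str], subreddit: str) -> Iterator[dict]:
    """Yield JSON records whose "subreddit" field matches (case-insensitive)."""
    wanted = subreddit.lower()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines instead of aborting a long run
        if not isinstance(record, dict):
            continue
        if str(record.get("subreddit", "")).lower() == wanted:
            yield record


def open_dump(path: str) -> io.TextIOWrapper:
    """Open a .zst dump as a text stream (assumption: zstd + UTF-8 NDJSON)."""
    import zstandard  # third-party: pip install zstandard

    fh = open(path, "rb")
    # Large window size is an assumption; big dumps often need it to decompress.
    dctx = zstandard.ZstdDecompressor(max_window_size=2**31)
    return io.TextIOWrapper(dctx.stream_reader(fh), encoding="utf-8")


if __name__ == "__main__":
    # Tiny in-memory stand-in for a dump file, so this runs without a download.
    sample = [
        '{"subreddit": "TrueLit", "id": "a1"}',
        '{"subreddit": "books", "id": "b2"}',
        "not json at all",
        '{"subreddit": "truelit", "id": "c3"}',
    ]
    for rec in filter_subreddit(sample, "TrueLit"):
        print(rec["id"])
```

With a real file you would pass `open_dump("some_dump.zst")` (hypothetical path) as `lines` and write the matching records to your own file. The stream is read line by line, so memory use stays small even for very large dumps.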