Some subreddits may never come back online. Some subreddits are also very valuable because of their old posts and responses. Alas, the intersection isn’t empty (personally, I’m anxious about r/suggestmeabook and r/TrueLit).

Naturally, one would like to download all posts and comments to offline storage. Unfortunately, the usual methods are useless when the subreddit is private.

Are there any good options for the pessimistic scenario? Scraping the web archive? Filtering ML datasets? Anything else?

  • @Audalin (OP)
    1 year ago

    Also, Pushshift is effectively dead as far as I can see.

    • @[email protected]
      1 year ago

      I think they wouldn’t have the posts if the posts were private at the time of posting. Otherwise they store the posts, so those should still be available even though the subreddit is private now. Also, the archives might be dead, but the data up to 2023 is available as a torrent on Academic Torrents.
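
For anyone going the dump route, here is a minimal sketch of pulling one subreddit out of such an archive. It assumes the files decompress to newline-delimited JSON with a `subreddit` field on each record, which is how I understand the Pushshift dumps to be laid out; check a few lines of your download first. The function name `filter_subreddit` is made up for illustration.

```python
import json
from typing import Iterable, Iterator


def filter_subreddit(lines: Iterable[str], subreddit: str) -> Iterator[dict]:
    """Yield records whose "subreddit" field matches, ignoring case.

    Assumption: each input line is one JSON object (NDJSON), as in the
    decompressed dump files. Blank or malformed lines are skipped.
    """
    wanted = subreddit.lower()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if not isinstance(record, dict):
            continue  # skip JSON values that are not objects
        if str(record.get("subreddit", "")).lower() == wanted:
            yield record


if __name__ == "__main__":
    # Tiny in-memory stand-in for a decompressed dump file.
    sample = [
        '{"subreddit": "TrueLit", "id": "a1"}',
        '{"subreddit": "books", "id": "b2"}',
    ]
    for post in filter_subreddit(sample, "truelit"):
        print(post["id"])
```

Because it takes any iterable of lines, the same function works on an open text file or on a streaming decompressor, so the whole dump never has to fit in memory.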