I recently came across a torrent that appears to be an archive of Reddit, and it got me wondering whether it could be made locally browsable. Then again, someone may have already solved this by setting up a public Lemmy instance, which would make the content accessible from any federated instance.

  • qprimed
    English
    14 · 1 year ago

    I can actually see some merit to a lemmy API accessible reddit corpus. it would be interesting to reference old reddit info in a lemmy compatible way with zero reference to reddit itself.

    doing so for the entire corpus properly (link fixups, etc) would be… challenging, but doable.
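
    As a rough illustration of what one "link fixup" could look like, here is a minimal sketch. The URL pattern, the `fix_links` helper, and the `/archive/c/<sub>/post/<id>` path scheme are all assumptions made up for this example, not anything taken from the archive or from Lemmy itself.

```python
import re

# Hypothetical pattern for Reddit post permalinks. Real archived links vary
# (short links, comment permalinks, user pages), so this covers one case only.
REDDIT_POST = re.compile(
    r"https?://(?:www\.|old\.)?reddit\.com/r/(?P<sub>\w+)/comments/(?P<post_id>\w+)"
    r"(?:/[^\s)\]]*)?"
)

def fix_links(text: str, base: str = "/archive") -> str:
    """Rewrite Reddit post permalinks to an assumed local path scheme."""
    def repl(match: re.Match) -> str:
        # Keep only the community name and post id; drop the title slug.
        return f"{base}/c/{match.group('sub')}/post/{match.group('post_id')}"
    return REDDIT_POST.sub(repl, text)
```

    Doing this across the whole corpus would also need a lookup from old post ids to whatever ids the new instance assigns, which is where the "challenging" part comes in.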

  • Andy
    English
    7 · 1 year ago

    Letting the LLMs source from there for free, completely invalidating the proposed licensing model at Reddit.

  • @vintprox
    English
    1 · 1 year ago

    I suppose, having an easy 2-click solution to migrate your own content from Reddit to a Lemmy instance would be nice, instead of being shamelessly copied by someone (who will most likely forget to even mention the author). Keep posting 'em on the alternative.

  • @[email protected]
    English
    1 · 1 year ago (edited)

    Am I really going to buy a 2TB drive to hold all of reddit…

    Actually, I’m pretty surprised that it’s only 2TB.

    Edit: and it looks like it’s only captured data up until about six months ago.

    Edit 2: turns out I have an available 2TB drive. Now I’m tempted. Should I bother with a VPN for this or nah?

    • @[email protected] (OP)
      English
      2 · 1 year ago

      It would be helpful if there were an instance that migrated all of this to Lemmy so that we could access it from any other instance, instead of having to download it for local browsing.

      • @[email protected]
        English
        1 · 1 year ago

        I haven’t downloaded it. Looks like a collection of compressed files, but I don’t know exactly what’s inside of them. Do you know what format they’re in?