Is there an open-source package that the Internet Archive runs? If so, what is it? I assume sites like archive.is run the same thing. I’d like to know whether I can also run it for self-hosted archiving.

  • @[email protected]
    10 months ago

    I believe they used Heritrix at one point. The important bit is that they use a special archive format, which is a standard. Several tools support it, both for capturing to it and for viewing it. It allows capturing a website in a ‘working’ condition, with history or something. I’m a bit fuzzy on it, since it’s been some time since I looked into it.
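
To make the idea of a standard archive format a little more concrete, here is a minimal sketch. It assumes the format the comment has in mind is WARC (Web ARChive, ISO 28500); the comment does not name it, so treat that as a guess. The function names `build_warc_record` and `parse_warc_record` are made up for illustration, and the code uses only the Python standard library to show the basic record layout (a version line, named header fields, a blank line, then the captured bytes). It is not how Heritrix or the Internet Archive actually write their files.

```python
import uuid
from datetime import datetime, timezone


def build_warc_record(target_uri: str, payload: bytes,
                      record_type: str = "resource",
                      content_type: str = "text/html") -> bytes:
    """Serialize one WARC/1.0-style record (illustrative helper, not a full writer)."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    header_lines = [
        "WARC/1.0",                                    # version line
        f"WARC-Type: {record_type}",                   # kind of record
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",  # unique ID per record
        f"WARC-Date: {now}",                           # capture time in UTC
        f"WARC-Target-URI: {target_uri}",              # what was captured
        f"Content-Type: {content_type}",
        f"Content-Length: {len(payload)}",             # size of the block in bytes
    ]
    head = "\r\n".join(header_lines).encode("utf-8") + b"\r\n\r\n"
    # A record ends with two CRLF pairs after the block.
    return head + payload + b"\r\n\r\n"


def parse_warc_record(raw: bytes) -> tuple[dict[str, str], bytes]:
    """Split a single record back into its header fields and its block."""
    head, _, rest = raw.partition(b"\r\n\r\n")
    lines = head.decode("utf-8").split("\r\n")
    fields = dict(line.split(": ", 1) for line in lines[1:])  # skip version line
    length = int(fields["Content-Length"])
    return fields, rest[:length]


if __name__ == "__main__":
    record = build_warc_record("https://example.org/", b"<html>hello</html>")
    fields, body = parse_warc_record(record)
    print(fields["WARC-Type"], fields["WARC-Target-URI"], len(body))
```

For actual self-hosted archiving you would more likely rely on an existing crawler or library than hand-roll records like this. Real WARC files also usually store full HTTP request and response records and are often gzip-compressed per record, which this sketch leaves out.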