cross-posted from: https://lemmy.world/post/17748238

TIL That the entirety of Wikipedia is only ~100Gb and you can download it for offline use

In light of the recent CrowdStrike crash revealing how weak points in IT infrastructure can have wide-ranging effects, I figured this might be an interesting one.

The entirety of Wikipedia is periodically uploaded here, along with many other useful wikis and how-to websites (e.g. iFixit tutorials and wikiHow): https://download.kiwix.org/zim

You select the archive you want, then the language and archive version (for example, you can get an archive with no pictures to save on space). For the totality of the English Wikipedia you’d select “wikipedia_en_all_maxi_2024-01.zim”.
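The naming pattern in that example filename can be sketched in a few lines of Python. This is only an illustration: the field names (`project`, `language`, `selection`, `flavour`, `period`) are my own labels for the parts of the filename, and the per-project subdirectory under the base URL is an assumption, so check the directory listing before relying on it.

```python
import shutil
import urllib.request
from urllib.parse import urljoin

BASE_URL = "https://download.kiwix.org/zim/"  # base URL given in the post


def zim_filename(project, language, selection, flavour, period):
    """Assemble a filename shaped like wikipedia_en_all_maxi_2024-01.zim."""
    return f"{project}_{language}_{selection}_{flavour}_{period}.zim"


def zim_url(project, language, selection, flavour, period):
    """Build the download URL for one archive.

    Assumption: each project's archives sit in a subdirectory named
    after the project. Verify against the actual listing.
    """
    name = zim_filename(project, language, selection, flavour, period)
    return urljoin(BASE_URL, f"{project}/{name}")


def download(url, destination):
    """Stream the archive to disk so it is never held in memory at once."""
    with urllib.request.urlopen(url) as response, open(destination, "wb") as out:
        shutil.copyfileobj(response, out)


if __name__ == "__main__":
    # Only prints the URL; call download() yourself once you have the space.
    print(zim_url("wikipedia", "en", "all", "maxi", "2024-01"))
```

The full English archive is on the order of 100 GB, so in practice a download manager or torrent client that can resume is probably a better fit than a single streamed request.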

The archives are packed as .zim files, which can be read with the Kiwix app completely offline.
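If you want a quick sanity check that a finished download really is a .zim file before copying it to a USB drive, something like the sketch below could work. It relies on the ZIM header starting with a 4-byte magic number followed by two 2-byte version fields; the function name `read_zim_header` is made up for this example, and it does not verify the rest of the file.

```python
import struct

# ZIM magic number (72173914). Stored little-endian, so the first
# four bytes of a ZIM file are b"ZIM\x04".
ZIM_MAGIC = 0x044D495A


def read_zim_header(path):
    """Return (major_version, minor_version) if the file starts like a ZIM archive.

    Raises ValueError when the file is too short or the magic number
    does not match. This checks the header only, not the contents.
    """
    with open(path, "rb") as f:
        raw = f.read(8)
    if len(raw) < 8:
        raise ValueError("file too short to be a ZIM archive")
    magic, major, minor = struct.unpack("<IHH", raw)
    if magic != ZIM_MAGIC:
        raise ValueError("not a ZIM archive (bad magic number)")
    return major, minor
```

A truncated download can still pass this check, so it only catches the "wrong file entirely" kind of mistake.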

I keep several USB drives with some of these archives along with the app installer. In the event of some major catastrophe I’d at least be able to access some potentially useful information. I have no stake in Kiwix, and don’t know whether there are alternative apps and schemes; I just thought it was neat.

  • @seaQueue
    English
    16
    5 months ago

    Compression ratios on plaintext are magical

  • Silverchase
    English
    6
    5 months ago

    Hi, unwanted internet pedant here. Gb is gigabits. GB is gigabytes. All of English Wikipedia compressed is about 100 GB.

  • astrsk
    5
    5 months ago

    Is there any tool that can keep an updated model automatically, kinda like the open-source Steam cache? I’d love to self-host Wikipedia that syncs daily or weekly with changes.

    • @[email protected]
      English
      5
      edit-2
      5 months ago

      If you’re on nearly any flavor of UNIX (Linux, macOS, etc.), you can run rsync on a scheduled basis via crontab.

      The only limitation, as far as I am aware, is that the download is provided as a single compressed file, so you’ll have to download the whole thing, uncompress it locally, and then do a selective update.

  • @thirteene
    English
    1
    5 months ago

    Open question: is there a “high quality” static version that people prefer to use, similar to the avocado prices data set? I have to imagine that anything pre-2023 without AI data is considered to be more accurate. Potentially a date before/after an impactful policy change.
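
For anyone wanting to try the scheduled-rsync idea from the reply above, here is a rough Python sketch that only builds the rsync command and a matching crontab line; it does not run anything. The mirror address and local directory are placeholders I made up, so substitute an rsync source you have confirmed actually exists. The helper names are hypothetical too.

```python
import shlex

# Hypothetical placeholders: replace with a confirmed rsync source and your own path.
RSYNC_SOURCE = "rsync://mirror.example.org/kiwix/zim/wikipedia/"
LOCAL_DIR = "/srv/kiwix/zim/"


def build_rsync_command(source, dest, pattern):
    """Return an rsync argument list that fetches only files matching `pattern`."""
    return [
        "rsync",
        "--archive",             # preserve timestamps and permissions
        "--partial",             # keep partial files so an interrupted run can resume
        f"--include={pattern}",  # fetch only the archives we care about
        "--exclude=*",           # skip everything else in the directory
        source,
        dest,
    ]


def build_crontab_line(command, schedule="0 3 * * 0"):
    """Return a crontab entry; the default schedule is 03:00 every Sunday."""
    return f"{schedule} {shlex.join(command)}"


if __name__ == "__main__":
    cmd = build_rsync_command(RSYNC_SOURCE, LOCAL_DIR, "wikipedia_en_all_maxi_*.zim")
    print(build_crontab_line(cmd))
```

Since each archive in the post carries a date in its filename, a newer dump would presumably show up as a brand-new file, which matches the caveat in that reply about having to pull the whole thing again.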