In light of the recent CrowdStrike crash revealing how weak points in IT infrastructure can have wide-ranging effects, I figured this might be an interesting one.

The entirety of Wikipedia is periodically uploaded here, along with many other useful wikis and how-to websites (e.g. iFixit tutorials and WikiHow): https://download.kiwix.org/zim

You select the archive you want, then the language and archive version (for example, you can get an archive with no pictures to save on space). For the totality of the English Wikipedia you’d select “wikipedia_en_all_maxi_2024-01.zim”.
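The archive file names pack the project, language, scope, flavour, and date into one string, so you can tell at a glance what a file contains. A minimal sketch of pulling those fields apart (the exact naming convention here is my own reading of the example filename, not an official Kiwix specification):

```python
def parse_zim_name(filename: str) -> dict:
    # Assumed convention: <project>_<lang>_<scope>_<flavour>_<YYYY-MM>.zim
    stem = filename.removesuffix(".zim")
    project, lang, scope, flavour, date = stem.split("_")
    return {
        "project": project,    # e.g. "wikipedia"
        "language": lang,      # e.g. "en"
        "scope": scope,        # e.g. "all"
        "flavour": flavour,    # e.g. "maxi" (full), "nopic" (no pictures)
        "date": date,          # e.g. "2024-01"
    }

info = parse_zim_name("wikipedia_en_all_maxi_2024-01.zim")
print(info["language"], info["flavour"])  # prints: en maxi
```

Handy if you ever script your own mirror refreshes against the download directory.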

The archives are packed as .zim files, which can be read with the Kiwix app completely offline.

I have several USB drives with some of these archives along with the app installer. In the event of some major catastrophe I’d at least be able to access some potentially useful information. I have no stake in Kiwix, and don’t know if there are alternative apps and schemes; I just thought it was neat.

      • Silverseren · 49 points · 2 months ago

        I presume this is images directly hosted on English Wikipedia and not the entirety of Commons where the vast majority of images are kept, right?

          • @clearedtoland · 70 points · 2 months ago

            So I have to upgrade my NAS again, ay?

            • @gmtom · 11 points · 2 months ago

              You’re not already running petabyte NAS???

          • maegul (he/they) · 9 points · 2 months ago

            Kinda interesting at a broad level … that there’s still something to the efficiency of language.

            Sure storage is cheap now, but so much of the calculation of the utility of data in modern tech is the presumption of an internet connection and retrieval of information over the network.

            With the internet going to shit in various ways, local or decentralised computing is making more sense, at least depending on your priorities and perspective. And so all of a sudden, storage tradeoffs become a bit more meaningful. Do I need all of the pictures and media … or would a simple textual description suffice for most instances with high res media available at a more centralised archive if I’m really interested? A picture is worth 1000 words, but takes a hell of a lot more digital storage space!
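The “picture is worth 1000 words” tradeoff above is easy to put rough numbers on. A back-of-the-envelope sketch (the per-word and per-photo byte figures are loose assumptions for illustration, not measurements):

```python
# Rough assumptions: ~6 bytes per English word (including the space),
# ~2 MB for a typical high-resolution JPEG photo.
BYTES_PER_WORD = 6
PHOTO_BYTES = 2 * 1024 * 1024

thousand_words = 1000 * BYTES_PER_WORD   # ~6 KB of plain text
ratio = PHOTO_BYTES / thousand_words
print(f"A 2 MB photo costs ~{ratio:.0f}x the storage of 1000 words")
```

Which is roughly why the text-only dump is tens of GB while anything with real media balloons far past that.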

            • @[email protected] · 1 point · 2 months ago

              So many home instructions are so much easier with a photograph or two, or better yet a video.

    • @retrospectologyOP · 82 points · edited · 2 months ago

The 100 GB version mentioned above does only have thumbnails/low-res pictures, yeah. Better than nothing for some types of articles, but not everything. The true text-only version is actually only ~53 GB though.

      • @Psythik · 15 points · edited · 2 months ago

        I’ve installed game patches that were larger than this.

        • @Valmond · 1 point · 2 months ago

          They should put it in a popular game patch.

  • @[email protected] · 63 points · 2 months ago

    So something akin to this joke image I saw the other day is actually feasible for Wikipedia?

    • Max · 18 points · 2 months ago

ChatGPT is also probably around 50–100 GB at most

      • @[email protected] · 5 points · 2 months ago

Probably a lot less. Keep in mind that whenever it answers a question, the whole model is traversed multiple times; going through multiple GB in the few seconds it takes to answer wouldn’t be possible.

        • Max · 7 points · 2 months ago

I’d be surprised if it was significantly less. A comparable 70-billion-parameter model from Llama requires about 120 GB to store. Supposedly the largest current ChatGPT goes up to 170 billion parameters, which would take a couple hundred GB to store. There are ways to trade off some accuracy in order to save a bunch of space, but you’re not going to get it under tens of GB.

These models really do go through that many GB of parameters once for every word in the output. GPUs and tensor processors are crazy fast. For comparison, think about how much data a GPU generates for 4K60 video display. It’s like 1 GB per second. And the recommended memory speed required to generate that image is like 400 GB per second. Crazy fast.
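The sizes and speeds in this sub-thread follow from simple arithmetic. A sketch (the parameter counts, 2-bytes-per-parameter fp16 precision, and 400 GB/s bandwidth figure are the thread’s rough numbers, not verified model specs):

```python
def model_size_gb(params_billions: float, bytes_per_param: int = 2) -> float:
    # fp16/bf16 weights take 2 bytes per parameter.
    return params_billions * 1e9 * bytes_per_param / 1e9

print(model_size_gb(70))    # 140.0 -> ~140 GB for a 70B model in fp16
print(model_size_gb(175))   # 350.0 -> a couple hundred GB for a 175B model

# If every generated token reads (roughly) the whole model from memory,
# tokens/sec is capped by memory bandwidth divided by model size:
bandwidth_gb_s = 400
print(bandwidth_gb_s / model_size_gb(70))  # ~2.9 tokens/sec upper bound
```

Quantizing to fewer bytes per parameter is the accuracy-for-space tradeoff mentioned above: the same formula with `bytes_per_param=1` or lower.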

        • @jose1324 · 16 points · 2 months ago

          No, but it’s the model after the input that you need.

    • @[email protected] · 15 points · 2 months ago

      I mean, you can self-host your own local LLMs using something like Ollama. The performance will be bound by the disk space you have (the complexity of the model you’re able to store), and the performance of the CPU or GPU you are using to run it, but it does work just fine. Probably as good results as ChatGPT for most use cases.
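Ollama exposes a small local HTTP API for this. A minimal sketch of querying it from Python (the endpoint and payload shape follow Ollama’s `/api/generate` route as I understand it; the model name `llama3` is just an example — check against the Ollama docs for your version):

```python
import json
import urllib.request

def build_request(model: str, prompt: str) -> urllib.request.Request:
    # "stream": False asks Ollama for a single JSON response
    # instead of a stream of partial chunks.
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

req = build_request("llama3", "Summarize the Kiwix project in one sentence.")

# Uncomment when an Ollama server is running locally:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["response"])
```

Everything stays on your own disk and hardware, which is the whole point here.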

      • @Nooodel · 3 points · 2 months ago

We do this at work (lots of sensitive data that we don’t want OpenAI to capitalize on) and it works pretty well. Hosted locally, set up by a data-security- and privacy-conscious admin who specifically configures it not to save any queries, even on the server. A bit slower than ChatGPT, but not by much.

  • Em Adespoton · 39 points · 2 months ago

    Aside from the text clarification, this is also only the US version of Wikipedia.

    What worries me though is that most videos linked on Wikipedia are hosted on YouTube. That’s a pretty dangerous choke point.

    • aname · 54 points · edited · 2 months ago

      You mean the English version? There is no US version, thank god.

    • @AnUnusualRelic · 14 points · 2 months ago

      I never even noticed any videos on Wikipedia. Maybe for some cinema articles.

      • PhobosAnomaly · 17 points · 2 months ago

        Ten year old me would beg to differ.

        Videos turned Encarta 95 from being an encyclopedia to the encyclopedia!

        I jest - a multimedia experience helps but I agree that the text knowledge is the big draw.

        • Greg Clarke · 3 points · 2 months ago

I remember watching the Hand of God goal in the library many times using Encarta 95

        • @[email protected] · 3 points · 2 months ago

          The real ones remember wandering around that damn maze answering questions while managing limited torches to see the map.

  • @[email protected] · 38 points · 2 months ago

    This saved my ass at my engineering chemistry exam (still a requirement, even for software engineers) where only offline tools were allowed. Love Kiwix!

    • snrkl · 11 points · 2 months ago

      LOL… Malicious compliance at its best…

  • Aatube · 36 points · 2 months ago

    DYK that Kiwix was actually created by Wikipedia? Back in the late 2000s there was this gigantic effort to select and improve a ton of articles to make an offline “Wikipedia 1.0” release. The only remains of that effort are Kiwix, periodic backups, and an incredibly useful article-rating system.

    • @felixwhynot · 17 points · 2 months ago

      Can you write more about the rating system you mentioned?

      • Aatube · 7 points · edited · 2 months ago
        1. There is a set of criteria to rate an article B, C, Start, or Stub. These are called classes. Similarly, articles can be rated as one of 4 importance levels for a particular WikiProject.
        2. There’s a banner on every article’s talk page. Any editor can boldly change an article’s rating between the above classes; if a revert happens, they discuss it according to the criteria.
        3. Some WikiProjects have their own criteria for rating articles. Some even have a process to make an article A-class.
        4. Before this system, Wikipedia already had processes to make an article a Good Article (GA) or Featured Article (FA).
        • With GAs, a nominator puts a candidate onto a backlog. Later, a reviewer scrutinizes the article according to the criteria. Often, the reviewer asks the nominator to fix quite a few issues. If these issues are fixed promptly, or the reviewer thinks there are only nitpicks, the article passes. If they aren’t fixed within a week, or the reviewer thinks there are major problems, the article fails.
          • As with other processes, the nominator and reviewer can be anyone, though reviewers are usually experienced.
        • With FAs, a nominator brings the candidate to a noticeboard. Editors there then come to a consensus about whether the article should pass.
        • Both processes display a badge directly on passed articles.
        • Both processes have an associated re-review process, where editors come to a consensus on whether the article would fail if it were nominated today.
        • There’s also an informal process called “peer review”, where someone just puts an article on a noticeboard and anyone can comment on its quality.
        5. Articles are automatically sorted into categories by their rating and importance. Nowadays, editors usually look at these to decide which articles to focus on.
    • @NewAgeOldPerson · 12 points · 2 months ago

I couldn’t afford to donate for a long time, but I used it near daily. So now I make a monthly, probably larger-than-average, contribution to make up for sibs from other cribs that can’t afford it. Pay it forward is indeed a golden rule.

        • @NewAgeOldPerson · 6 points · 2 months ago

          No cape. I’m brown so I’m on the radar bad enough as it is as soon as I leave major cities lol.

  • @ohwhatfollyisman · 18 points · 2 months ago

I remember a time when it was only 2 GB for all of Wikipedia. Usain Bolt had just burst onto the world stage at the time.

    • Ricky Rigatoni · 20 points · 2 months ago

      And by now he’s exited the solar system at incomprehensible speeds.

  • @CannedCairn · 12 points · 2 months ago

I did! I do! Also all public-domain books, as part of Project Gutenberg

  • @clearedtoland · 7 points · 2 months ago

    I know there are a few companies working on DNA storage. From the comment below about the entirety of Wikipedia and Wiki Commons, I’d say that’d be a pretty practical thing to store.

    Here’s the wiki article about it.

  • @ThatWeirdGuy1001 · 5 points · 2 months ago

    Imagine downloading it just after some troll changed critical information lmao

  • @Farmfixit · 2 points · 2 months ago

    I tried to download it but couldn’t get it to work :(

    • @retrospectologyOP · 3 points · edited · 2 months ago

Download the Kiwix app for whatever OS you’re using, then go into Kiwix, click on the folder icon in the app, and navigate to where the .zim file you downloaded is located. If you click it, it should automatically open and be viewable.

      If you did that and it’s still failing, is it giving you a specific error or anything?

    • @ripcord · 3 points · 2 months ago

      What’s already been done…?

      • @[email protected] · 3 points · 2 months ago

Sorry, I meant to reply to the commenter with the ChatGPT-on-a-DVD pic, saying that it’s actually feasible for Wikipedia.

  • Don_Dickle · -11 points · 2 months ago

I am currently reading up on terrorists while in the States, but something tells me I will get my IP banned. I have read a shitton, and I highly doubt it’s just 100 GB. Otherwise you would see it more on piracy sites.