Well, not quite, but close. I’m holding a hard disk that has ALL of Wikipedia’s text in 10 different languages.

Yes, you can download all of Wikipedia, and yes, it easily fits on a hard drive. Isn’t that amazing? Text is incredibly dense compared to images and video. It comes to around 22 GiB for English Wikipedia alone and 56 GiB for the 10 languages I downloaded.

I also have all of Wiktionary on the same hard drive. It’s around 16.4 GiB.

  • @[email protected]
    5 months ago

    It also connects you to a huge swath of humanity and the editors that brought that content to you.

    • @droning_in_my_ears (OP)
      5 months ago

      Yeah, it’s pretty incredible. Wikimedia is the kind of project that almost feels like a small glimpse into a better world, of what the internet could have been. It has some problems, of course, but it’s still a huge success.

      • @[email protected]
        4 months ago

        Uh, Wikipedia is what the internet is.

        Wikipedia’s not a glimpse of a better world; it’s a glimpse of our current, existing world, because Wikipedia exists.

        It’s not like that hard drive came through a portal from another universe.

  • @trolololol
    5 months ago

    Not the sum. The summary.

  • Masterblaster
    5 months ago

    There’s still so much valuable academic information that never sees the light of day or gets erased as the internet serpent eats its own tail.

  • @[email protected]
    5 months ago

    Last time I looked into downloading Wikipedia, it said it was 50 GB for the English text and 100 GB with images. How’d you get it into half the space?

    • @droning_in_my_ears (OP)
      5 months ago

      It’s only the raw text in JSON Lines files, with no media and no markup. I think I downloaded a compressed dump and then used WikiExtractor to extract the text.
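For anyone curious what working with that output looks like: as far as I know, WikiExtractor's `--json` option writes one JSON object per line (with fields such as `title` and `text`) into numbered files like `AA/wiki_00`. Below is a minimal Python sketch that assumes that layout; the function names `iter_articles` and `text_stats` and the `extracted` directory are hypothetical. It streams every article and totals the plain-text size.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_articles(root: str) -> Iterator[dict]:
    """Yield one article dict per non-blank line of every wiki_* file under root.

    Hypothetical helper. Assumes WikiExtractor was run with --json, so each
    line is a JSON object with (at least) "title" and "text" keys.
    """
    for path in sorted(Path(root).rglob("wiki_*")):
        if not path.is_file():
            continue  # ignore any directory that happens to match the pattern
        with path.open(encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if line:  # skip blank lines between records
                    yield json.loads(line)


def text_stats(root: str) -> tuple[int, int]:
    """Return (article count, total UTF-8 bytes of article text).

    Hypothetical helper that streams line by line, so the whole dump
    never has to fit in memory.
    """
    count = 0
    total_bytes = 0
    for article in iter_articles(root):
        count += 1
        total_bytes += len(article["text"].encode("utf-8"))
    return count, total_bytes
```

Calling `text_stats("extracted")` on the hypothetical output directory would return the article count and the number of bytes of plain text. Dividing the byte count by `2**30` gives a figure in GiB that you can compare with the sizes quoted above.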

      • @AbouBenAdhem
        5 months ago

        Does it include each article’s edit history, talk page, etc?

        • @ace_garp
          5 months ago

          OK, yes. Some supporting info: Aard2 is an offline Wikipedia app that uses small, compressed data files in the .slob format.

        • @[email protected]
          4 months ago

          Slob compression is best visualized as putting a sleeping bag into a stuff sack, except it’s all your possessions and you’re stuffing them into an old Chevy Metro.