Well, not quite, but close. I’m holding a hard disk that has ALL of Wikipedia’s text in 10 different languages.

Yes, you can download all of Wikipedia, and yes, it easily fits on a hard drive. Isn’t that amazing? Text is incredibly dense compared to images and video: it comes to around 22 GiB for English Wikipedia alone and 56 GiB for the 10 languages I downloaded.

I also have all of Wiktionary on the same hard drive. It’s around 16.4 GiB.

  • @[email protected]
    10 months ago

    Last time I looked into downloading Wikipedia, it said it was 50 GB for English text and 100 GB with images. How’d you get it into half the space?

    • @droning_in_my_ears (OP)
      10 months ago

      It’s only the raw text, in JSON Lines files: no media and no markup. I think I downloaded a compressed dump, then used wikiextractor to extract the text.
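
For anyone wondering what "JSON Lines" means in practice: each line of a file is one standalone JSON object, here one per article. Below is a minimal Python sketch, assuming the extracted files hold objects with a "text" field (as far as I know, wikiextractor's JSON output also carries fields such as "id", "url" and "title", but check your own files). It counts articles and the UTF-8 size of their text. The function names and the directory it walks are illustrative only, not part of wikiextractor.

```python
import json
from pathlib import Path
from typing import Iterable, Iterator


def parse_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Parse JSON Lines: one JSON object per non-blank line."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines rather than failing on them
            yield json.loads(line)


def summarize(articles: Iterable[dict]) -> tuple[int, int]:
    """Return (article count, total UTF-8 bytes of the "text" fields).

    Assumes each article is a dict with an optional "text" key.
    """
    count = 0
    text_bytes = 0
    for article in articles:
        count += 1
        text_bytes += len(article.get("text", "").encode("utf-8"))
    return count, text_bytes


def summarize_dump(root: Path) -> tuple[int, int]:
    """Walk every file under a (hypothetical) extraction directory and sum the totals."""
    count = 0
    text_bytes = 0
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        with path.open(encoding="utf-8") as fh:
            c, b = summarize(parse_jsonl(fh))  # streams line by line
        count += c
        text_bytes += b
    return count, text_bytes
```

For example, summarize_dump(Path("extracted")) would walk a hypothetical output directory named extracted and return (article_count, text_bytes). Reading line by line keeps memory use flat, which matters when the files add up to tens of GiB.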

      • @AbouBenAdhem
        10 months ago

        Does it include each article’s edit history, talk page, etc.?

        • @ace_garp
          10 months ago

          OK, yes. Some supporting info: Aard2 is an offline Wikipedia app that uses small, compressed data files in the .slob format.

        • @[email protected]
          9 months ago

          Slob compression is best visualized as putting a sleeping bag into a stuff sack, except it’s all your possessions and you’re stuffing them into an old Chevy Metro.