If you merge the three versions of DataSet 9 that have been found so far:
DataSet%209.zip : https://github.com/yung-megafone/Epstein-Files
Data Set 9.tar.xz : https://archive.org/details/data-set-9.tar.xz
dataset9-more-complete.tar.zst : https://github.com/yung-megafone/Epstein-Files
You will end up with 531,282 IMAGES files (PDFs). You would think that a lot is missing. However, the partially corrupted DataSet%209.zip gives us a DAT and an OPT file that let us work out which files remain to be found.
The DAT file reveals that only 531,307 IMAGES files (PDFs) are supposed to be in the archive, which means only 25 PDF files are actually missing.
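For anyone who wants to reproduce the comparison against the DAT file, here is a rough sketch, not the exact method used above. It assumes the first EFTA ID on each DAT line is the document's own ID and that every PDF is named after its ID (e.g. `EFTA00709804.pdf`). The function names are made up for illustration.

```python
import re
from pathlib import Path

# Assumption: IDs look like "EFTA" followed by 8 digits, as in the lists in this post.
EFTA_ID = re.compile(r"EFTA\d{8}")


def ids_listed_in_dat(dat_text: str) -> set[str]:
    """Collect the first EFTA ID found on each line of the DAT load file.

    Assumption: the first ID on a line is the document's own ID; later IDs
    on the same line (such as an end-of-range field) are ignored.
    """
    ids = set()
    for line in dat_text.split("\n"):
        match = EFTA_ID.search(line)
        if match:  # header lines without an ID are skipped
            ids.add(match.group())
    return ids


def ids_on_disk(root: Path) -> set[str]:
    """Collect EFTA IDs from the file names of all PDFs under root."""
    return {
        p.stem
        for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() == ".pdf" and EFTA_ID.fullmatch(p.stem)
    }


def missing_images(dat_text: str, root: Path) -> list[str]:
    """IDs the DAT file lists but that have no PDF in the merged folder."""
    return sorted(ids_listed_in_dat(dat_text) - ids_on_disk(root))
```

To use it, read the DAT with something like `Path(dat_path).read_text(encoding="utf-8", errors="replace")` (the encoding is a guess, the IDs are plain ASCII either way) and pass the folder that holds the merged PDFs as `root`.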
You’ll notice that 25 PDF files couldn’t possibly account for the 80-ish GB still missing from the original DataSet 9, but the DAT file doesn’t reveal how many NATIVES there were.
NATIVES are media files such as video and audio. You can see an example if you have a full DataSet 10. DataSet 10 also reveals that every NATIVE has a placeholder PDF, which is always exactly 4670 bytes.
So searching for all PDFs of that exact size reveals that about 135 NATIVES (media files) are missing, which would account for the rest of the missing 80 GB.
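The size-based search can be sketched like this. It is only an illustration and makes two assumptions: the PDFs and the media files live in separate folders, and a media file shares its EFTA ID with its placeholder PDF (only the extension differs). The function names are made up.

```python
from pathlib import Path

# Size of a NATIVE placeholder PDF, as observed in DataSet 10.
PLACEHOLDER_SIZE = 4670  # bytes


def placeholder_ids(images_root: Path) -> set[str]:
    """IDs of all PDFs under images_root that are exactly placeholder-sized."""
    return {
        p.stem
        for p in images_root.rglob("*")
        if p.is_file()
        and p.suffix.lower() == ".pdf"
        and p.stat().st_size == PLACEHOLDER_SIZE
    }


def missing_natives(images_root: Path, natives_root: Path) -> list[str]:
    """Placeholder IDs that have no media file with the same name stem.

    Assumption: a native is stored as <ID>.<some media extension>.
    """
    present = {p.stem for p in natives_root.rglob("*") if p.is_file()}
    return sorted(placeholder_ids(images_root) - present)
```

A real document could in principle also be exactly 4670 bytes, so any hit is a candidate worth opening rather than a guaranteed placeholder.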
I have listed below which IMAGES (PDF) and NATIVES (media) files are missing, so that it is easier to coordinate tracking down the remaining files we need for a complete DataSet 9.
(Though the 25 missing PDFs could themselves be placeholders for up to 25 more NATIVES, which would have to be checked once they are found.)
MISSING_EFTA_IMAGES:
EFTA00709804,EFTA00709805,EFTA00709806,EFTA00709807,EFTA00770595,EFTA00774768,EFTA00823190,EFTA00823191,EFTA00823192,EFTA00823221,EFTA00823319,EFTA00877475,EFTA00892252,EFTA00901740,EFTA00912980,EFTA00919433,EFTA00919434,EFTA00932520,EFTA00932521,EFTA00932522,EFTA00932523,EFTA00984666,EFTA00984668,EFTA01135215,EFTA01135708
MISSING_EFTA_NATIVES:
EFTA00068376,EFTA00072394,EFTA00072395,EFTA00072396,EFTA00072397,EFTA00072398,EFTA00072399,EFTA00072400,EFTA00072401,EFTA00083881,EFTA00089243,EFTA00090492,EFTA00093515,EFTA00093697,EFTA00096469,EFTA00104842,EFTA00135578,EFTA00143411,EFTA00143735,EFTA00151167,EFTA00151168,EFTA00151169,EFTA00152684,EFTA00152685,EFTA00152686,EFTA00152687,EFTA00152688,EFTA00152689,EFTA00152690,EFTA00152691,EFTA00152692,EFTA00155484,EFTA00155485,EFTA00155486,EFTA00155488,EFTA00155489,EFTA00155490,EFTA00155551,EFTA00157542,EFTA00159164,EFTA00165150,EFTA00179442,EFTA00179443,EFTA00179444,EFTA00179445,EFTA00179446,EFTA00182656,EFTA00182657,EFTA00184097,EFTA00184098,EFTA00221035,EFTA00221036,EFTA00221037,EFTA00221038,EFTA00221039,EFTA00221040,EFTA00221041,EFTA00221042,EFTA00221043,EFTA00221044,EFTA00221045,EFTA00221046,EFTA00221047,EFTA00221048,EFTA00221049,EFTA00221050,EFTA00221051,EFTA00221052,EFTA00221053,EFTA00221054,EFTA00221055,EFTA00221056,EFTA00221058,EFTA00221059,EFTA00239786,EFTA00239787,EFTA00241270,EFTA00276490,EFTA00277088,EFTA00277091,EFTA00277094,EFTA00277095,EFTA00277096,EFTA00277098,EFTA00279451,EFTA00279453,EFTA00759424,EFTA00776196,EFTA01140431,EFTA01140602,EFTA01141209,EFTA01141213,EFTA01144362,EFTA01144363,EFTA01144697,EFTA01145825,EFTA01147043,EFTA01149290,EFTA01149291,EFTA01173979,EFTA01177273,EFTA01177560,EFTA01177632,EFTA01181146,EFTA01182315,EFTA01184143,EFTA01190710,EFTA01192998,EFTA01193063,EFTA01194887,EFTA01195505,EFTA01196058,EFTA01196418,EFTA01196421,EFTA01196518,EFTA01196747,EFTA01196752,EFTA01196754,EFTA01196756,EFTA01196936,EFTA01197105,EFTA01197126,EFTA01197787,EFTA01197931,EFTA01198064,EFTA01198505,EFTA01204371,EFTA01205883,EFTA01206089,EFTA01250813,EFTA01250814,EFTA01250815,EFTA01250886,EFTA01250917,EFTA01250922
I know it would be ridiculously ironic, but if any CSAM is in there, could the authorities get you in trouble over its possession or even distribution?
Probably. CSAM is CSAM, I’m not sure the law would differentiate. Probably one of the reasons Dataset 9 has taken time to get restored, as I believe it was said it had some accidentally unredacted/uncensored CSAM in it?
You rock. I didn’t realize NATIVEs had a placeholder PDF. I’ll try and scrape the media files tonight to add to the existing dataset 9 more complete archive.
I fucking love this community. This is good and necessary work.
That’s good news. With the amount of people interested in these files and in data preservation it’s bound to be only a matter of time until the whole dataset 9 is restored. Someone out there’s gotta have the rest of the files.

