• JackbyDev
    English
    118 hours ago

    Oh boy, I sure am excited to see websites hosting PDFs! I love when the tool that everyone uses for hosting and viewing HTML gets to be blessed with the perfect format that is PDF!

    I LOVE PDFS! I love two column PDFs! I love reading like this!

    1 3
    2 4
    5 7
    6 8

    Instead of like this

    1
    2
    3
    4
    5
    6
    7
    8

    It’s amazing and such a good user experience!

    I love that PDFs are so difficult to transform into HTML, too. I would never want to besmirch the publisher’s perfect, one approved layout by resizing the window!

    • @werefreeatlast
      English
      158 minutes ago

      Choose your own adventure PDF! 1, 5, 7, 3, 9, 2, 0, 6, 4, 8! What an ending!

    • @[email protected]
      English
      37 hours ago

      I love that PDFs are so difficult to transform into HTML, too

      FYI, if that’s relevant to your field, every new article published on arxiv.org now has an HTML render as well.

      And for many older publications, changing “arxiv.org” to “ar5iv.org” in the URL leads to an HTML rendering, a best-effort experiment they ran for a while.
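
      A minimal sketch of that URL swap in Python, in case it helps. The function name `to_ar5iv` and the paper ID in the example are made up for illustration, and nothing here checks whether ar5iv actually has a rendering for a given paper.

```python
from urllib.parse import urlsplit, urlunsplit


def to_ar5iv(url: str) -> str:
    """Swap an arxiv.org host for ar5iv.org; leave every other URL untouched."""
    parts = urlsplit(url)
    if parts.netloc.lower() in ("arxiv.org", "www.arxiv.org"):
        # Only the host changes; path, query and fragment are kept as-is.
        parts = parts._replace(netloc="ar5iv.org")
    return urlunsplit(parts)


# Hypothetical paper ID, used only to show the rewrite.
print(to_ar5iv("https://arxiv.org/abs/1234.56789"))
```

      URLs on any other host come back unchanged, so it should be safe to run over a mixed list of links.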

      • JackbyDev
        English
        26 hours ago

        That’s really cool! What I really would like is a tool that converts PDFs to semantic HTML files. I took a peek there and it seems easier for them because they have the original LaTeX source.

        I think for arbitrary PDF files the information just isn’t there. I’ve looked into it a bit and it’s sort of all over the place. A tool called pdf2htmlEX is pretty good, but it makes the HTML look exactly like the PDF.
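
        To make the missing-information point concrete, here is a rough sketch of the two-column problem from my first comment. It assumes a PDF only hands you text fragments with page coordinates; the `Fragment` type, the coordinates, and the "split at the page midpoint" rule are all my own illustration, not how pdf2htmlEX or any real tool works.

```python
from typing import NamedTuple


class Fragment(NamedTuple):
    x: float   # left edge of the fragment (hypothetical page units)
    y: float   # distance from the top of the page
    text: str


def row_order(frags):
    """Naive reading order: top to bottom, then left to right."""
    return [f.text for f in sorted(frags, key=lambda f: (f.y, f.x))]


def column_order(frags, page_width):
    """Guess two columns by splitting at the page midpoint, then read each top to bottom."""
    mid = page_width / 2
    return [f.text for f in sorted(frags, key=lambda f: (f.x >= mid, f.y))]


# One two-column page: "1" and "2" on the left, "3" and "4" on the right.
page = [
    Fragment(50, 100, "1"), Fragment(320, 100, "3"),
    Fragment(50, 400, "2"), Fragment(320, 400, "4"),
]
print(row_order(page))                      # ['1', '3', '2', '4']
print(column_order(page, page_width=600))   # ['1', '2', '3', '4']
```

        The midpoint rule is only a guess. It falls apart on full-width headings, figures, or three-column layouts, which is the kind of structure the PDF itself never records.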

        • @[email protected]
          English
          25 hours ago

          Yes, PDFs are much more permissive and may not have any semantic information at all. Hell, some old publications are just scanned images!

          PDF -> semantic seems to be a hard problem that basically requires OCR, like these people are doing

          • JackbyDev
            English
            11 hour ago

            Oh nice, thanks for sharing that project. I haven’t heard of it before!