• @[email protected]
    27 hours ago

    Yes, PDFs are much more permissive and may not have any semantic information at all. Hell, some old publications are just scanned images!

    Going from PDF to semantic structure seems to be a hard problem that basically requires OCR, like these people are doing.

    • JackbyDev
      13 hours ago

      Oh nice, thanks for sharing that project. I hadn’t heard of it before!