It seems like with the current progress in ML models, doing OCR should be an easy task. After all, recognizing handwritten numbers was one of the prime benchmarks for image recognition (MNIST was released in 1994).

Yet, when I try to OCR any of my handwritten notes, all I ever get is a jumbled mess of nonsense. Am I missing something? Is my handwriting really that atrocious, or is it the models?

Here’s a quick example, a random passage from a scientific article:

I tried EasyOCR, Tesseract, PPOCR and a few online tools. Only PPOCR was able to correctly identify the numbers and the words “J.” and “Chem.”. The rest is just a random mess of characters.

Edit: thank you all for shitting on my handwriting. That was not asked for, and also not helpful. That sample was intentionally “not nice”, but it is how I would write a note for myself. (You should see what my notes look like when I don’t need to read them again, lol)

ChatGPT can transcribe it perfectly, and it also works on a slightly larger sample. DeepSeek works ok-ish but made some mistakes, and Gemini is apparently not available in my country atm. I guess context awareness is what makes those models better at transcription, and it is also why I can read my own notes back without problems.
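To make the "context awareness" guess a bit more concrete, here is a toy sketch in Python. It is not how ChatGPT or any of the OCR tools work internally. It only illustrates the idea that knowing which words are plausible lets you repair character-level mistakes. The vocabulary, the `correct_tokens` helper, the cutoff value, and the noisy input are all made up for illustration.

```python
import difflib

# Hypothetical domain vocabulary. A language model effectively carries a far
# larger, context-dependent version of this prior.
VOCAB = ["dimer", "stabilization", "free", "energies", "were", "also",
         "determined", "from", "thermodynamic", "integration"]

def correct_tokens(tokens, vocab=VOCAB, cutoff=0.6):
    """Replace each token with its closest vocabulary word, if one is close enough."""
    corrected = []
    for tok in tokens:
        # get_close_matches ranks candidates by SequenceMatcher similarity.
        matches = difflib.get_close_matches(tok.lower(), vocab, n=1, cutoff=cutoff)
        # Keep the raw token when nothing in the vocabulary is similar enough.
        corrected.append(matches[0] if matches else tok)
    return corrected

# Invented noisy output of the kind a character-level recognizer might return.
noisy = ["dimcr", "stabi1ization", "frce", "energles"]
print(correct_tokens(noisy))  # ['dimer', 'stabilization', 'free', 'energies']
```

A fixed word list is obviously much weaker than a model that looks at the whole sentence, but it shows why a reader (or a model) with expectations about the text can recover from strokes that are ambiguous on their own.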

  • @[email protected]
    10 hours ago

    I just asked ChatGPT to transcribe it, and it said:

    The handwritten text in the image says:

    “Dimer stabilization free energies were also determined from thermodynamic integration (TI, see methods), which provide a direct validation of the MM-GBSA results.”

    J. Phys. Chem. B 2018, 122, 7038-7048

    There was a post on HN recently about using LLMs for OCR. https://news.ycombinator.com/item?id=42952605

    • @hinterlufer (OP)
      9 hours ago

      That’s perfect. Now I’m just wondering why ChatGPT is apparently much better at OCR than a dedicated OCR model like EasyOCR or Tesseract.

      Btw, DeepSeek did a good job, but not a perfect one. I also fed ChatGPT a full page of notes, and the transcription to Markdown worked quite well, although not perfectly. However, if I supply the same note as part of a larger PDF, it refuses to transcribe it, stating that it’s unreadable.

      • Optional
        1 hour ago

        If I had to guess, I’d say it was the dot paper confusing the OCR reader. I suppose the LLM has some way to cancel out the dots and thereby gets a better scan of it.
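If the dot grid really is the culprit (that is only a guess, and nothing here shows that an LLM does anything like this), one classic preprocessing idea is to drop tiny isolated specks before running OCR. A minimal sketch in plain Python, assuming the page has already been binarized into a grid of 0 (paper) and 1 (ink); the `remove_small_blobs` name and the `min_size` threshold are hypothetical:

```python
from collections import deque

def remove_small_blobs(grid, min_size=3):
    """Clear 4-connected groups of ink pixels that contain fewer than min_size pixels."""
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    seen = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first search to collect one connected blob of ink.
            blob = []
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                blob.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and grid[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Blobs below the threshold are treated as grid dots and erased.
            if len(blob) < min_size:
                for y, x in blob:
                    out[y][x] = 0
    return out

# Tiny made-up page: four isolated corner dots plus one four-pixel pen stroke.
page = [
    [1, 0, 0, 0, 1],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 0, 0],
    [1, 0, 0, 0, 1],
]
for row in remove_small_blobs(page):
    print(row)
```

In this sketch the corner dots disappear and the stroke survives. Dots that touch a pen stroke would not be removed this way, and small marks such as periods or the dot on an "i" could be erased by mistake, so the threshold would need tuning on a real scan.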

      • @thefactremains
        6 hours ago

        Because LLMs can fill in gaps where the recognition fails.