Apr 3, 2013

Realtime OCR experiments with tesseract and openFrameworks

A few weeks ago, while trying to scan and OCR several pages of a book, I wished there were an app that could recognise text straight from a webcam's snapshots.
I started by looking at what had already been done, and at how I could experiment with OCR in openFrameworks. One of the most widely used open-source OCR libraries is tesseract, and Kyle McDonald's ofxTesseract addon makes it available in openFrameworks.
The library's image processing seems quite slow, and it only works well with the kind of image that comes from a flatbed scanner: if the text is even at a small angle, it won't be recognised.
To improve this, I preprocessed the images, isolating the word blobs and rotating them.
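
To give an idea of what rotating a word blob can involve, here is a minimal sketch in plain C++. It is not the code from my experiment: it assumes each blob is already available as a list of pixel coordinates, estimates the blob's tilt from its second-order central moments (one common way to do it, picked here for illustration), and rotates the pixels around the centroid so the long axis ends up horizontal. The names Point, blobAngle and deskew are made up for this example.

```cpp
#include <cmath>
#include <vector>

// Hypothetical pixel coordinate type for this sketch.
struct Point {
    double x;
    double y;
};

// Estimate the tilt of a blob's long axis, in radians, from its
// second-order central moments. The result lies in (-pi/2, pi/2].
// Assumption: the blob is elongated along the text direction.
double blobAngle(const std::vector<Point>& pixels) {
    if (pixels.empty()) return 0.0;

    // Centroid of the blob.
    double cx = 0.0, cy = 0.0;
    for (const Point& p : pixels) {
        cx += p.x;
        cy += p.y;
    }
    cx /= pixels.size();
    cy /= pixels.size();

    // Central moments mu20, mu02 and mu11.
    double mu20 = 0.0, mu02 = 0.0, mu11 = 0.0;
    for (const Point& p : pixels) {
        const double dx = p.x - cx;
        const double dy = p.y - cy;
        mu20 += dx * dx;
        mu02 += dy * dy;
        mu11 += dx * dy;
    }
    return 0.5 * std::atan2(2.0 * mu11, mu20 - mu02);
}

// Rotate every pixel around the centroid by minus the estimated angle,
// so that the blob's long axis becomes horizontal.
std::vector<Point> deskew(const std::vector<Point>& pixels) {
    if (pixels.empty()) return {};

    const double angle = blobAngle(pixels);
    const double c = std::cos(angle);
    const double s = std::sin(angle);

    double cx = 0.0, cy = 0.0;
    for (const Point& p : pixels) {
        cx += p.x;
        cy += p.y;
    }
    cx /= pixels.size();
    cy /= pixels.size();

    std::vector<Point> out;
    out.reserve(pixels.size());
    for (const Point& p : pixels) {
        const double dx = p.x - cx;
        const double dy = p.y - cy;
        // Rotation by -angle about the centroid.
        out.push_back({cx + dx * c + dy * s, cy - dx * s + dy * c});
    }
    return out;
}
```

Because the angle is only defined up to half a turn, this levels the text but cannot tell whether it is upside down, and short or roughly square blobs give an unreliable angle. In a real sketch the same rotation would be applied to the image region of the blob before passing it to the OCR.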

The result is still too slow for realtime use, and I still have to play with the preprocessing as suggested in this link (tesseract's own preprocessing feels quite slow).
