Text Recognition Using Google Tesseract


Text Recognition, or Optical Character Recognition (OCR), is the electronic conversion of typed, handwritten, or printed text in an image into machine-encoded text. Google Tesseract is an open-source OCR engine originally developed by HP; it was released as open-source software in 2005 and has been sponsored by Google since 2006.

Tesseract Implementation in Python

For our implementation, we are going to use a Python wrapper for Tesseract called pytesseract. Wrappers for several other languages are also available. We can install the wrapper with pip install pytesseract; note that the Tesseract engine itself must also be installed on the system.

We can start by reading the image and converting it into a NumPy array using the OpenCV function imread.


We can also check whether the image was read correctly using the matplotlib function imshow.


We have read the image, and now we have it in the form of an array of pixel values. Next we process it with Tesseract using the function image_to_string, which runs Google's pre-trained Tesseract models. We can use pytesseract with the default configuration or with our own.

For the default configuration, we just need to pass the image array to the function.


For our own custom configuration, we can set the input language, the OEM (OCR engine mode), and the page segmentation mode.

  • Language: We can configure Tesseract to detect text in one or more languages. The language is set with the -l argument, e.g. -l eng for English.
  • OEM (OCR Engine Mode): We can choose which OCR engine to use with this setting. In Tesseract 4, there are four modes available:
    • 0  Legacy engine only.
    • 1  Neural nets LSTM engine only.
    • 2  Legacy + LSTM engines.
    • 3  Default, based on what is available.


We can try different OEM configurations using the --oem 3 argument.

  • Page Segmentation: We can adjust the page segmentation mode to match the layout of our text for better results. Tesseract 4 provides the following modes:
    • 0  Orientation and script detection (OSD) only.
    • 1  Automatic page segmentation with OSD.
    • 2  Automatic page segmentation, but no OSD, or OCR.
    • 3  Fully automatic page segmentation, but no OSD. (Default)
    • 4  Assume a single column of text of variable sizes.
    • 5  Assume a single uniform block of vertically aligned text.
    • 6  Assume a single uniform block of text.
    • 7  Treat the image as a single text line.
    • 8  Treat the image as a single word.
    • 9  Treat the image as a single word in a circle.
    • 10  Treat the image as a single character.
    • 11  Sparse text. Find as much text as possible in no particular order.
    • 12  Sparse text with OSD.
    • 13  Raw line. Treat the image as a single text line, bypassing Tesseract-specific hacks.

We can try different page segmentation modes using the --psm 3 argument.

These options can be combined into a single configuration string and passed to image_to_string via its config parameter.

Results of the model:

We ran our model with the default configuration and obtained the following results.

Input image: [figure not reproduced]

Output text: [figure not reproduced]

We can achieve better results with better pre-processing, but the model also has limitations: in particular, we cannot train it on our own dataset to improve results further.

Pros and Cons of Tesseract

Pros

● Supports a large number of languages (over 100 in Tesseract 4)
● Very easy to use; the manual page documents all options
● Works well with 300 DPI scans
● Open source, with wrappers available for many programming languages

Cons

● Rudimentary built-in image processing
● Poor results on tilted (skewed) text
● Poor results on images with extreme sharpness or brightness
● Poor results on stylized handwritten text