Monday, February 02, 2009

Tesseract GUI for Mac OS X

Update 3 May 2011:

I've released a new version of TesseractGUI (0.3). It uses Tesseract version 3 and is linked against libtiff, so it can open compressed TIFF files (PackBits, LZW).




I've created a very simple user interface on top of tesseract.

It's available, along with sources under the MIT license, at: http://download.dv8.ro/files/TesseractGUI/

To use Tesseract GUI, open a TIF image.




After a couple of seconds, Tesseract GUI will switch to the text tab, where it displays whatever it was able to recognize:




Nothing is saved, so if the recognition was successful you will probably want to copy and paste the text into a document.

20 comments :

Lenni said...

Hmm, I don't seem to be able to open any Tif files? All files are greyed out.

diciu said...

TesseractGUI only looks for files with the extension ".tif"
(i.e. it will not open ".tiff").

This is a limitation of the tesseract tool that is probably easily removed, but I haven't looked into it.
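Until that limitation is lifted, one workaround is to give the files a ".tif" name. The following is a minimal Python sketch, not part of TesseractGUI; the function name copy_tiff_as_tif is made up for illustration. It copies each ".tiff" file in a folder to a sibling ".tif" file and leaves the originals untouched.

```python
import shutil
from pathlib import Path


def copy_tiff_as_tif(folder):
    """Copy every *.tiff file in `folder` to a sibling file ending in .tif.

    Originals are not modified and existing .tif files are never overwritten.
    Returns the list of newly created paths.
    """
    created = []
    for source in sorted(Path(folder).iterdir()):
        # Only regular files whose extension is .tiff (any letter case).
        if not source.is_file() or source.suffix.lower() != ".tiff":
            continue
        target = source.with_suffix(".tif")
        if target.exists():
            continue  # keep whatever is already there
        shutil.copy2(source, target)
        created.append(target)
    return created
```

Calling copy_tiff_as_tif("/path/to/scans") returns the new ".tif" paths, which should then be selectable in the open dialog.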

My Hiking Photos said...

Would you compile for 10.4 as well as 10.5? 10.5 limitation is silly regarding this simple front end to Tesseract... Limiting users is silly. A lot of us don't use 10.5

diciu said...

@My Hiking Photos - Compiling for 10.4 takes quite a bit of work because TesseractGUI relies on things that do not exist in 10.4.

Jack said...

Very helpful. Your work is much appreciated. Have you looked into the upcoming OCRopus?

For those of us not used to compiling from source and other command-line tasks, your work is especially valuable.

JACK

diciu said...

@Jack - Yes, I've looked at OCRopus but it looks to me like it will take a lot of work to compile all of it on Mac OS X in such a way that it's usable from a standalone application.

It looks like someone has created a .pkg for OCRopus: http://stuporglue.org/tako/ - you may be able to use that if installing a package is an option.

Anonymous said...

works great! thank you so much!

Brij said...

Thanks a Lot!!

Pat Leevers said...

Super-simple, but still works fine under 10.6. Thanks!

Wanapitei said...

Something's wrong. I'm on Snow Leopard 10.6.5. I've scanned two documents, both saved in .tif format. Open up TesseractGUI. The program opens fine, shows the scanned images. However, the progress bar is static, doesn't move. After 10 minutes nothing has appeared in the Text tab.

A compatibility issue with Intel Macs?

diciu said...

@Wanapitei - It works on Intel Macs, but the command line tool it uses is very fussy about the TIFF files it expects as input.
If you're willing to investigate using Terminal.app, this post describes possible solutions:
processing tif images for tesseractgui
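
One thing that can be checked without extra tools is how the TIFF is compressed. That compression is the cause here is only an assumption, not something confirmed for this case. The Python sketch below reads the Compression tag (259) from the first image directory of a classic TIFF file; the function names are made up for illustration. In the TIFF specification, 1 means uncompressed, 5 means LZW and 32773 means PackBits.

```python
import struct

COMPRESSION_TAG = 259
COMPRESSION_NAMES = {1: "none", 5: "LZW", 32773: "PackBits"}


def tiff_compression(data):
    """Return the Compression tag value from the first IFD of TIFF bytes."""
    if data[:2] == b"II":
        endian = "<"  # little-endian ("Intel") byte order
    elif data[:2] == b"MM":
        endian = ">"  # big-endian ("Motorola") byte order
    else:
        raise ValueError("not a TIFF file: bad byte-order mark")
    magic, ifd_offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF file: bad magic number")
    (entry_count,) = struct.unpack(endian + "H", data[ifd_offset:ifd_offset + 2])
    for i in range(entry_count):
        start = ifd_offset + 2 + 12 * i  # each directory entry is 12 bytes
        tag, _field_type, _count = struct.unpack(endian + "HHI", data[start:start + 8])
        if tag == COMPRESSION_TAG:
            # A single SHORT value sits in the first two bytes of the value field.
            (value,) = struct.unpack(endian + "H", data[start + 8:start + 10])
            return value
    return 1  # the specification's default when the tag is absent


def describe_compression(path):
    """Read a file and return a human-readable compression name."""
    with open(path, "rb") as handle:
        value = tiff_compression(handle.read())
    return COMPRESSION_NAMES.get(value, "other ({})".format(value))
```

If describe_compression("scan.tif") reports something other than "none", re-saving the scan without compression is a reasonable first experiment.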

Beware of Doug said...

This is awesome, a great entry level Tesseract application. Is there a way to train this Tesseract?

diciu said...

@Beware Of Doug - I'm not sure how training works, but I imagine it creates some files inside the TESSDATA_PREFIX directory.

If this is the case, the training files could be copied inside the Contents/Resources/tessdata folder.

I haven't actually tried this, but it might work.
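
As a rough illustration of that copy step, here is an untested Python sketch. It assumes the training output is a set of ordinary *.traineddata files and that the application bundle already has a Contents/Resources/tessdata folder. The function name and any bundle path passed to it (for example /Applications/TesseractGUI.app) are hypothetical.

```python
import shutil
from pathlib import Path


def install_trained_data(source_dir, app_bundle):
    """Copy *.traineddata files from `source_dir` into the bundle's tessdata folder.

    `app_bundle` is the path to the .app directory (hypothetical example:
    /Applications/TesseractGUI.app). Returns the names of the copied files.
    """
    tessdata = Path(app_bundle) / "Contents" / "Resources" / "tessdata"
    if not tessdata.is_dir():
        raise FileNotFoundError("no tessdata folder at {}".format(tessdata))
    copied = []
    for source in sorted(Path(source_dir).glob("*.traineddata")):
        # An existing file with the same name is replaced.
        shutil.copy2(source, tessdata / source.name)
        copied.append(source.name)
    return copied
```

Working on a copy of the application keeps the original bundle intact if the new data does not load.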

Anonymous said...

Very nice to have such a tool shared with an open license. Great work together with tesseract!

Ali Shams said...

Thanks. This was really helpful. Keep up the good work.

goetzibubu said...

I'm trying to use Tesseract in German. Entering "german" or "deutsch" didn't change anything. So I downloaded "deu.traineddata" from somewhere and put it into Contents/Resources/tessdata, without success.
And: Now every time I open a file there is the error message that the language is not found. Even after I deleted all information in Preferences...

diciu said...

@goetzibubu - I remember I've tried it with German a while back and I think it worked - see if this helps: http://blog.loudhush.ro/2011/05/using-tesseractgui-with-other-languages.html
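
To test a language outside the GUI, the tesseract command line tool accepts a language code through its -l option. Tesseract language files use three-letter codes, so German is "deu", matching the file name deu.traineddata. The Python sketch below only wraps that command. It assumes a tesseract binary is on the PATH and that the German data is installed where that binary looks for it. The helper names are made up, and this is not how TesseractGUI calls Tesseract internally.

```python
import subprocess


def build_tesseract_command(image_path, output_base, lang="deu"):
    """Build the argument list for: tesseract IMAGE OUTPUTBASE -l LANG."""
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def run_ocr(image_path, output_base, lang="deu"):
    """Run tesseract and return the path of the text file it writes.

    Raises subprocess.CalledProcessError if tesseract exits with an error,
    for example when the requested language data cannot be found.
    """
    subprocess.run(build_tesseract_command(image_path, output_base, lang), check=True)
    return str(output_base) + ".txt"
```

For example, run_ocr("scan.tif", "scan_out") would leave the recognized text in scan_out.txt.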

goetzibubu said...

Thank you.
This helped with the first step. I have to enter "deu", and I have to do it for every file.
But the problem remains: nothing is recognized. Maybe because I use handwritten text? Is there any chance of getting trained data for this?

diciu said...

@goetzibubu - training Tesseract for handwritten text is probably a very challenging task - this thread touches on the subject: https://groups.google.com/forum/?fromgroups#!topic/tesseract-ocr/xMYY55Dzxik%5B1-25%5D

goetzibubu said...

Sorry for the delay.
Over the last few days I have tried to train Tesseract on my handwritten font.
With success, partly...
I created my own font. No change.
I moved to Tesseract under Windows 7, where better packages are available.
I created box files, but I stopped at mftraining and the font_properties file (V3).
I wonder whether this time was wasted...