I am looking for an optical character recognition (OCR) solution. I've checked out OCRopus, but it is still in early alpha, which makes it very hard to compile. OCRopus lists
tesseract as a dependency, so I compiled tesseract and ran it on a couple of scanned pages.
The results are impressive (see below for the output from a page of a Cisco manual).

Chapter 24 • Mixed-Media Bridging ending delimiter, which follows the data field) are treated differently depende ing on the bridge manufacturer Some bridge manufacturers simply ignore the bits. Others have the bridge set the C bit (to indicate that the frame has been copied) but not the A bit (which indicates that the destination station recog- nizes die address). Ln the former case, a Token Ring source node determines whether the frame it sent has become lost. Proponents of this approach sug~ gest that reliability mechanisms, such as the tracking of lost frames [..]
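If you want to push a batch of scanned pages through tesseract without typing each command by hand, here is a minimal Python sketch. It assumes the basic command-line form `tesseract <image> <outputbase>`, which writes the recognized text to `<outputbase>.txt`, and it assumes your pages are in an image format your tesseract build accepts (older builds were restricted to TIFF). The helper names (`tesseract_command`, `output_text_path`, `ocr_pages`) are hypothetical and only here for illustration.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def tesseract_command(image: Path, output_base: Path) -> list[str]:
    """Build the basic invocation: tesseract <image> <outputbase>."""
    return ["tesseract", str(image), str(output_base)]


def output_text_path(output_base: Path) -> Path:
    """tesseract appends .txt to the output base name for its text output."""
    # Append rather than use with_suffix(), so names like "scan.v2" are kept whole.
    return output_base.with_name(output_base.name + ".txt")


def ocr_pages(images: list[Path], out_dir: Path) -> dict[Path, str]:
    """Run tesseract on each scanned page and collect the recognized text."""
    out_dir.mkdir(parents=True, exist_ok=True)
    results: dict[Path, str] = {}
    for image in images:
        output_base = out_dir / image.stem
        # check=True raises CalledProcessError if tesseract exits non-zero.
        subprocess.run(tesseract_command(image, output_base), check=True)
        # OCR output can contain odd bytes, so undecodable ones are replaced.
        results[image] = output_text_path(output_base).read_text(
            encoding="utf-8", errors="replace"
        )
    return results
```

Calling something like `ocr_pages([Path("page1.tif"), Path("page2.tif")], Path("ocr_output"))` returns a dictionary mapping each page to its text. Expect the same kinds of errors as in the sample above (split words, "die" for "the"), so proofread before relying on the output.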
3 comments:
Tried my hand at this... Uber fail. I don't have much experience with the terminal, so this is way beyond my capabilities atm. Very cool thing to be able to do, though.
I've created a very simple GUI on top of tesseract.
If you want to experiment with tesseract, this should make things easier.
Thanks diciu, I'll give installing Tesseract another go sometime. I'm just going to need to meet up with a Linux guru or something and get some tutoring =/.