“Consolidation” is key in much of today’s global economy, and nowhere is that as evident as with Optical Character Recognition (OCR) software. OCR software converts the text on scanned images of a printed page into characters that can be indexed and searched by a full-text search program or edited in a word processor or text editor. Lawyers need OCR software to recycle draft contracts, interrogatories, the results of document production, or any other document that originates in someone else’s computer.

In the last year, ScanSoft, a publicly held corporation at one time owned by Xerox, seems to have acquired most of the well-known OCR software that wasn’t owned by Caere Corporation, an established name in OCR that had been on an acquisition program of its own. ScanSoft then acquired Caere. There are now three different OCR solutions from ScanSoft: TextBridge Pro Millennium Business Edition, from the company’s Xerox tradition; the new edition of PaperPort, from its Visioneer acquisition; and Version 10 of OmniPage Pro, the entry from ScanSoft’s Caere antecedents.

The first OCR program that we remember did little more than look at a scanned page and attempt to translate the image of each character on the page into its text equivalent. The program, which required a co-processor board to handle the heavy translation effort, worked well with very clean typewriter originals, particularly those printed with an IBM Selectric ball in a font known as OCR B, but not as well with newspapers, books, faxes, carbon copies (this was a long time ago) or even second- and third-generation Xerox copies. “Well” was defined as 98 or 99 percent accuracy. That sounded good, but on a typical double-spaced page with about 400 words, or 2000 characters, it meant between 20 and 40 errors.
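
The arithmetic behind that error count is easy to check. The short Python sketch below is only an illustration; the function name `expected_errors` is our own, and it assumes that the accuracy figure is measured per character.

```python
def expected_errors(characters_per_page: int, accuracy_percent: float) -> int:
    """Return the expected number of misread characters on one page."""
    error_rate = (100 - accuracy_percent) / 100
    return round(characters_per_page * error_rate)


# A double-spaced page of about 400 words is roughly 2000 characters.
for accuracy in (98, 99):
    errors = expected_errors(2000, accuracy)
    print(f"{accuracy} percent accurate: about {errors} errors per page")
```

At 98 percent the sketch reports 40 errors on the page, and at 99 percent it reports 20.
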
Cleanup wasn’t particularly difficult with the help of a good word processor spell checker, but both software and scanners were very expensive, and some secretaries found it easier to just retype the page on a typewriter instead.

Today’s OCR products are almost perfect on even complex proportional fonts printed with laser or inkjet printers, and they deal with newspapers and books as well. Simple character image recognition has been supplemented with smart technology that can handle columns, tables and other complex page formatting. Further, these programs use word recognition that compares each word with a dictionary and lets the user know when the characters it thinks it has recognized aren’t in the dictionary and are therefore a potential error. As far as we can tell, none of these programs goes as far as the sort of “phrase recognition” that speech recognition products like ViaVoice or NaturallySpeaking use. Modern programs also pick up font type and size, and attributes such as bold or italics. The output from these programs can be sent automatically to your choice of word processing program.

THE PRODUCTS

We tested the top-of-the-line OmniPage Pro and TextBridge Pro Millennium Business Edition, each with a $500 price tag. The programs use different OCR engines, but both handle English and a variety of foreign languages, and both will “zone” each page to deal properly with columns, tables and other complex formatting. An OCR program that doesn’t “zone” well can run the lines from a two-column page together, so that each recognized line consists of half from column one and half from column two. We tested both programs on a variety of documents from around the office, including multipage contracts and pleadings. Both programs ran a TWAIN-compliant Hewlett-Packard OfficeJet 500 scanner with an automatic document feeder (ADF).
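
To see why zoning matters, consider the toy Python sketch below. It is not how either product works internally (real zoning analyzes the page image); the sample text and the function names are made up for illustration. It only shows what happens to the reading order of a two-column page when the columns are, or are not, treated as separate zones.

```python
# Each tuple is one printed row of a hypothetical two-column page:
# (text in the left column, text in the right column).
page_rows = [
    ("The parties agree", "This agreement is"),
    ("to the terms set", "governed by the law"),
    ("out below.", "of this state."),
]


def read_without_zoning(rows):
    """Read straight across each row, running the two columns together."""
    return [f"{left} {right}" for left, right in rows]


def read_with_zoning(rows):
    """Treat each column as its own zone: all of column one, then column two."""
    left_zone = [left for left, _ in rows]
    right_zone = [right for _, right in rows]
    return left_zone + right_zone


print("\n".join(read_without_zoning(page_rows)))
print()
print("\n".join(read_with_zoning(page_rows)))
```

The first printout mixes the two columns on every line; the second keeps each column's sentence intact.
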
With our combination of a scanner capable of about six letter-sized pages per minute and a 266 MHz Pentium II processor, OmniPage seemed able to read the scan and recognize it at the same time, making the entire process a bit faster than with TextBridge. The best way to deal with a long document, of course, is to toss it into the ADF and take a coffee break.

Both products let us customize the zoning on individual pages, but automatic zone recognition worked well enough in both. Both programs pointed out unrecognized words and suggested possible alternatives, but we thought that OmniPage’s way of presenting choices and highlighting possible errors was easier to see than TextBridge’s. Both programs did considerably better than 99 percent on the variety of documents we used for testing. Two or three possible errors per page was typical, and about half of these were specialized legal vocabulary. Both programs let us add such words to a user dictionary so that they wouldn’t show up as errors the next time they were encountered. OmniPage did a better job of picking up page formatting, but we had to do some formatting cleanup with the results of either product.

OmniPage Pro, but not TextBridge, has a text-to-speech option that lets the computer read the recognized pages aloud. Listening to the recognized text while following the scanned pages would seem to be an excellent way to proof the OCR output. The program does a good job of reading the words in a pleasant voice, but it does not announce punctuation, which makes the procedure less than fully useful for any document in which the placement of a comma or semicolon is important. The program also has a nasty habit of reading numbers as words, so that our local zip code is read as “sixty thousand four hundred thirty” rather than the much easier to understand “six zero four three zero.”
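
The digit-by-digit reading we would have preferred is simple to describe. The Python sketch below is our own illustration of the idea, not anything taken from OmniPage; the names `DIGIT_NAMES` and `speak_digits` are hypothetical.

```python
# Spoken name for each digit character.
DIGIT_NAMES = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
}


def speak_digits(number_text: str) -> str:
    """Spell out a number one digit at a time, skipping any other characters."""
    return " ".join(DIGIT_NAMES[ch] for ch in number_text if ch in DIGIT_NAMES)


print(speak_digits("60430"))  # six zero four three zero
```

A proofreading voice would presumably want this treatment for zip codes, phone numbers and docket numbers, and ordinary number words elsewhere.
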
TextBridge, but not OmniPage Pro, lets the user output the finished text as a PDF (Portable Document Format) file, a handy step if you want to use PDF. Users of OmniPage Pro will have to save to a word processor that can output PDF, or use Adobe Acrobat to convert to a PDF file. We didn’t test the $79 standard, non-business edition of TextBridge Millennium, but are told that the only difference between the two is that $79 doesn’t buy PDF output. Although we like OmniPage Pro a little better than TextBridge, if you don’t need PDF the $79 non-business TextBridge is a much better deal.

The $60 PaperPort Deluxe 7.0 uses the TextBridge recognition engine and does, oddly enough, output to PDF, but we found it better as a personal desktop clutter removal system, great for bills, receipts, letters and the like. You can also set the program to automatically index the documents that it scans and OCRs, so that you can find them when needed. Version 7 doesn’t differ much from earlier versions, but is a good buy if your OCR and scanning needs aren’t great.

SUMMARY

The latest $500 versions of the TextBridge and OmniPage OCR systems are both excellent products, but if you don’t need PDF output, the $79 non-business edition of TextBridge is an excellent buy. PaperPort Deluxe Version 7.0 is little changed from earlier versions, but is still a great personal scanning and OCR product to help eliminate desktop paper clutter.

DETAILS

OmniPage Pro 10, Price: $499.99.
TextBridge Pro Millennium, Business Edition, Price: $499.
TextBridge Pro Millennium, Price: $79.00.
PaperPort Deluxe 7.0, Price: $79.00.

All require a computer with a Pentium processor running Microsoft Windows 95, 98, 2000 or Windows NT 4.0. OmniPage requires 50 to 90 Mbytes of hard disk space; TextBridge requires 40 Mbytes; PaperPort requires 60 Mbytes.

ScanSoft Inc., 9 Centennial Dr., Peabody, MA 01960. Phone: (978) 977-2000. Web: www.scansoft.com.


Copyright © 2020 ALM Media Properties, LLC. All Rights Reserved.