Need to type up transcriptions from digital newspaper images or printed documents?
For example, the other day I posted about my paternal grandmother's wedding attire. If you are interested, you can read "What did Olive Wear to Her Wedding". The details came from The Evening Post, which is available as a digitised copy in the PapersPast archive from the National Library of NZ.
Last year I went looking for quicker and easier ways than typing to transcribe printed text. Now, as The Lazy Genealogist, I am sharing what I found with you. I use the free program PDF OCR X. Here is how the developers describe it on their website:
PDF OCR X is a simple drag-and-drop utility for Mac OS X and Windows, that converts your PDFs and images into text documents or searchable PDF files. It uses advanced OCR (optical character recognition) technology to extract the text of the PDF even if that text is contained in an image. This is particularly useful for dealing with PDFs that were created via a Scan-to-PDF function in a scanner or photo copier.
Why do I like this free OCR software?
- This works on my Mac. There is a Windows version too.
- The version I use is FREE, to repeat the point.
- It can handle lazy screen dumps, quality permitting.
- You can convert PDF, JPEG, GIF, PNG, PICT, BMP, and most other common image formats.
- Converted images can be output as plain text. See the examples below for the quality.
- Useful for genealogists with material scanned as images: you can create a searchable document.
- Output quality depends on the input quality. (Obviously!)
- Super simple to use: just drag and drop the image or PDF onto the window (my lazy choice), or use the Select File button.
- You can easily include both the text and the images in blog posts.
- This text is then searchable by search engines.
- The old-fashioned “carriage return” or “line break” is in the correct position.*
What does it look like?
This is what the free OCR software looks like on my Mac after two image files have been dropped onto the grey shaded area. After conversion, the files are listed below the grey section. The Open button opens the text file, and the Reveal button shows which folder the text file has been saved in. This makes it easy to find your files if you are converting lots of images.
When you drag and drop a file, a screen with the OCR conversion settings appears. The two main settings to consider are the choice between Plain Text and Searchable PDF, and the Output Folder. Clicking Convert starts the OCR process.
The new file then appears under the grey shaded area. With Plain Text selected, the resulting file opens in TextEdit on my Mac.
How well does PDF OCR X perform?
Here is the image I converted for the blog post mentioned earlier. It is made of some simple cropped screen dumps joined together into one image for the post about my grandmother's wedding dress.
Below is the text that PDF OCR X returned. As you can see, you still need to check and edit it. I think there are about as many errors as I would have to correct after “touch” typing, but it does save me time.
After correcting it, I pasted the text into the blog post so that it is now searchable by search engines.
Next, I tested another example, shown in the two-part image that follows: the screen dump image is at the top and the OCR text is below. I enlarged the output font size to 20 to show that the font in the input image was also large.
A third test: this one, with lighter text, was not successful at all using my lazy, small screen dump. It came from a different newspaper than the previous examples.
So I tried again with a better-quality downloaded image rather than the lazy screen dump method. The result was much better.
For larger sections of text, I use PDF OCR X even when a site already provides the text. For example, the PapersPast site gives a continuous stream of text, and I find adding the line breaks back in more of a chore than fixing OCR errors.
This is the app logo to look for in the Mac App Store.
Remember to always vet any software you download to your computer, and make sure you have appropriate security and malware protection in place.
Disclaimer: I was not paid for this review.
If you try the Windows version, or the option to convert to a searchable PDF, please leave feedback.