When I scan in documents I use a product called PaperPort for Optical Character Recognition (OCR, which turns images of words into text that can be edited in a word processor), but it does not know about Norwegian characters. So it has been a lot of work for me to clean up the result of a Norwegian scan before I can use Google Translate on it. Needless to say, I was delighted to read that there is an online OCR program for Norwegian! Jim Bergquist, a fellow subscriber to the RootsWeb Norway list, posted to that group a step-by-step process for translating farm book entries using this tool, and he has given me permission to rephrase his method on this blog. Here it is:
- Crop the text part of your scanned image and save it as a separate image file. Make sure to do multi-column pages one column at a time.
- Go to http://www.i2ocr.com/free-online-norwegian-ocr . This is an online Optical Character Recognition tool. You don’t have to download or install any software.
- Instructions at the bottom tell you to:
  - Click the “File” radio button. Press “Select Image”. Use the file box to navigate to where you put the image on your computer.
  - Leave the language set to Norwegian.
  - Enter the two numbers or words shown, separated by a space. This check keeps automated robots from using the site for hours.
  - Press “Extract Text.”
- Three buttons will appear at the bottom of the screen and the extracted text will be in the left-hand box (see example below).
  - Download: puts the text on your own computer (a good choice).
  - Translate: I haven’t used this one; it may send the text as-is to Google Translate. However, OCR output usually needs some corrections, so you should look at the result and correct it before trying to translate.
  - Edit in Google Docs: useful if you are familiar with working on documents in the cloud.
- Of course, you can just copy and paste the text from the left-hand box into your word processor instead of using any of the above options, which is what I did.
- When you have corrected any OCR errors in the file, select the text, copy it, and paste it into Google Translate.
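
If you are comfortable running a small script, part of the clean-up before translating can be automated. What follows is only a rough sketch in Python, not part of Jim's method: the function name `tidy_ocr_text` is my own invention, and it assumes the OCR output keeps the line breaks of the printed column, so words split with a hyphen at the end of a line need to be rejoined and each paragraph put back on one line. It does not fix misread letters; those still have to be corrected by eye.

```python
import re


def tidy_ocr_text(text: str) -> str:
    """Tidy raw OCR output before pasting it into a translator.

    Hypothetical helper: it only repairs layout, not misread characters.
    """
    # Normalize Windows and old Mac line endings to "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")

    # Rejoin words split by a hyphen at the end of a line,
    # e.g. "gaar-\nden" becomes "gaarden". This also joins genuine
    # hyphenated compounds that happen to break at a line end.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)

    # Treat blank lines as paragraph breaks.
    paragraphs = re.split(r"\n\s*\n", text)

    cleaned = []
    for paragraph in paragraphs:
        # Put each paragraph on one line and collapse runs of spaces.
        line = " ".join(paragraph.split())
        if line:
            cleaned.append(line)

    return "\n\n".join(cleaned)


if __name__ == "__main__":
    sample = "Ole Olsen kjøpte gaar-\nden i 1850.\n\nHan  døde\ni 1890."
    print(tidy_ocr_text(sample))
```

Putting each paragraph back on a single line matters because a translator tends to treat every line break as the end of a sentence, which can garble the result.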