Translating Farm Books Using a Norwegian OCR program

When I scan documents, I use a product called PaperPort for optical character recognition (OCR, which turns images of words into text that can be edited in a word processor), but it does not know about Norwegian characters. So it has been a lot of work for me to clean up the result of a Norwegian scan before using Google Translate on it. Needless to say, I was delighted to read that there is an online OCR program for Norwegian! Jim Bergquist, a fellow subscriber to the RootsWeb Norway list, posted a step-by-step process to that group for translating farm book entries using this tool, and he has given me permission to rephrase his method on this blog. Here it is:

  1. Crop the text part of your scanned image and save it as a separate image file. Make sure to do multi-column pages one column at a time.
  2. Go to . This is an online Optical Character Recognition tool. You don’t have to download or install any software.
  3. Instructions at the bottom tell you to:
    1. Click the “File” radio button. Press “Select Image”. Use the file box to navigate to where you put the image on your computer.
    2. Leave the language set to Norwegian.
    3. Enter the two numbers or words separated by a space. These are used to prevent automated robots from using the site for hours.
    4. Press “Extract Text.”
  4. Three buttons will appear at the bottom of the screen and the extracted text will be in the left hand box (see example below).
    • Download, to put it on your own computer (a good choice).
    • Translate, which I haven’t used; it may send the text as-is to Google Translate. However, OCR output usually needs some corrections, so you should look at the result and correct it before trying to translate.
    • Edit in Google Docs, if you are familiar with working on documents in the cloud.
    • Of course you can just cut and paste the text in the left hand box over to your word processor instead of any of the above options, which is what I did.
  5. When you have corrected any OCR errors in the file, copy the text and paste it into Google Translate.
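
One kind of OCR correction can be scripted before pasting into Google Translate. In the sample text further down, fractions such as 1/8 and 1/2 came out as "l/8" and "l/2", with a lowercase L where the digit 1 should be. The snippet below is only a minimal sketch of fixing that one pattern in Python. The function name fix_fraction_ones and the rule itself are my own assumptions, not part of Jim's method or of the OCR site, and the result should still be proofread by eye.

```python
import re

# Assumption: a stand-alone lowercase "l" directly before "/<digit>" is really
# the digit 1 (e.g. "l/8" or "3 l/2"). The lookbehind skips an "l" that ends a
# word, so ordinary Norwegian words are left alone.
_L_FOR_ONE = re.compile(r"(?<![A-Za-zÆØÅæøå])l(?=\s*/\s*\d)")


def fix_fraction_ones(text: str) -> str:
    """Replace a stand-alone 'l' in front of '/<digit>' with the digit '1'."""
    return _L_FOR_ONE.sub("1", text)


if __name__ == "__main__":
    sample = "Han sådde l/8 t bygg og fekk 3 t att. Utsånaden var 3 l/2 t havre."
    print(fix_fraction_ones(sample))
```

Running it on the OCR text before translating keeps the fractions from being split up as "l / 8" in the English result. Other misread characters would each need their own rule.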

Here is an excerpt from the Fatland farm in Etne, Norway that I tried (from EtneSoga III). The page shows the image I used on the right and the resulting text on the left.


Here is the Norwegian text it gave me:

er det handfaste teiknet på at skottehandelen og annan tømmerhandel var ein viktig
del av økonomien, her som i Amevik.

Eit uttrykk for verdien av skogen og saga ser vi i tingboka 1768. Der vert det opplyst
at garden i uminnelege tider hadde hatt ei flaumsag med løyve til å skjera 400 bord
årleg, helvta til utførsel, og bonden då, Hans Olsson, bad om tingsvitne på at skogen
tolde 600 bord årleg.

I 1723 sat Anders Fatland med garden. Han dreiv og med 2 hestar, 19 storlevande og
24 sauer. Han sådde l/8 t bygg og fekk 3 t att, og av havre sådde han 6 t, som gav 24
t att, altså 4 foll, som ikkje er imponerande i forhold til det vanlege då. Men han
hadde tydelegvis oppdaga at det vart dyrt i lengda å bryggja øl av skottemalt, så han
var då blant dei få som sådde sin eigen bygg.

I 1802 var garden framleis heil og udelt.

Hans Olsson heldt 2 hestar, og garden fødde 14 kyr og 20 sauer. Han sådde 4 t havre
og l/2 t poteter, som gav att 32 t og 4 t. Saga var framleis ein del av næringa.

Det var ein husmann med 1 ku og 4 sauer.

1865 var garden delt i to sjølveigande bruk, og det var 6 husmenn med jord.

Hans Petersson dreiv med hest, 9 kyr og 28 sauer. Utsånaden var 4 t av havre og

Knut Larsson hadde 8 kyr og 30 sauer, og bruket fødde hest. Utsånaden var 3 l/2 t
havre og 4 t poteter.

Samla fødde husmennene 10 kyr og 57 sauer. Det mest produktive bruket på Fatland
var såleis husmannsplassa i 1865. Dei sådde 3 l/8 t havre og 4 3/4 t poteter.

The translation Google came up with was the following:

there are tangible signs that the Scottish trade and other timber trade was an important part of the economy , here as in Amevik .

An expression for the value of the forest and the thing we see things in the book 1768 . Der committee states the garden from time immemorial had a flaumsag with permission to skjera 400 table annually , halva to export , and when the farmer , Hans Olsson , asked witness things that forest tolde 600 tables annually .

In 1723 Anders sat Fatland with garden . He ran and 2 horses , 19 large living and 24 sheep. He sowed l / 8 t building and got 3 h again, and oats sowed his 6 h , which gave 24 t att , ie 4 -fold , which is not impressive compared to the ordinary then. But he clearly had discovered that it was expensive in the long run to the wharf beer of Scottish malt, so he was then among the few who sowed their own buildings.

In 1802 the farm was still whole and undivided .

Hans Olsson kept two horses and farm born 14 cows and 20 sheep. He sowed oats 4 hand l / 2 t potatoes, which gave att 32 h and 4 h Saga was still a part of the industry .

It was a crofter with one cow and four sheep.

1865 garden divided into two self- use , and there were 6 smallholders with soil.

Hans Petersson ran with the horse , 9 cows and 28 sheep. Utsånaden was 4 h of oats and potatoes.

Knut Larsson had eight cows and 30 sheep, and farm -born horse. Utsånaden was 3 l / 2 h oats and potatoes 4 h .

Total births crofters 10 cows and 57 sheep. The most productive farm on Fatland was thus croft in 1865. they sowed 3 l / 8 t oats and 4 3/4 t potatoes.

The Hans Olsson mentioned above is the brother of my 4th grandmother, Metta Olsdatter Ve (Fatland). Now to get one of my Norwegian cousins who is also descended from Fatland farm to fix up this translation!

3 thoughts on “Translating Farm Books Using a Norwegian OCR program”

  1. Hi!

    I found a (downloadable) program called gImageReader, which lets you avoid online services and gives you much more control over image processing and the like! You can, for example, rotate and adjust the image (I use my phone camera as a scanner :).

    I am working on getting VAT registered, so I wanted to digitize all the relevant receipts and make the text searchable, etc.

    The problem is that I can't manage to install the Norwegian dictionary! The README text refers to LibreOffice's pages, but they have switched to “extensions” there, so I can't find the .diff file they ask for!
    No luck on ,,, or other search engines either.
    Not giving up!

