So far I am finding that the common ancestors Dad shares with his DNA matches at both 23andMe and FamilyTreeDNA are much further back than predicted. We have found the most recent common ancestor (MRCA) only for those distant cousins who have good paper trails, and perhaps even a tree at Geni as we do.
Most of these matches share only one or two segments, and the longer the segment, the more likely it is to be a real match with a discoverable common ancestor. Through DNA I actually found a new 5th cousin of mine, who is Dad's 4th cousin once removed. She shares a single 17.14 cM segment (2849 SNPs) with Dad, and our common ancestors lived in the 1700s on the farm Fatland in Etne, Hordaland, Norway (online resources for Etne research are listed at familysearch.org).
23andMe shows you all of your matches of 7 cM and larger, but many genetic genealogists consider anything under 10 cM suspect. My view is that if Dad's match also matches me or my brother, then it is real, even at 6 cM (n.b. in the next generation the match is frequently for fewer SNPs and cM). As you can see in the chart below, we have found many common ancestors through matches smaller than 10 cM. GEDmatch, like FamilyTreeDNA, lets you look at even smaller segment matches with specific people.
Here is a summary of the most recent common ancestors in Norway that I found for Dad with some of his DNA matches:
| cM | SNPs | MRCA | Relationship |
|---|---|---|---|
| 8.4 | 1467 | Ingeborg Djupesland (Bårdsdatter) b. 1650 | 6th cousin * |
| 5.7 | 1308 | Ola Narvesen Glaim 1621-1714 | 7th cousin |
| 9.9 | 2129 | Gunnar Olafsen Gangså 1570-1639 | 10th cousin twice removed |
| 6 | 1197 | Gunnar Olafsen Gangså 1570-1639 | 10th cousin twice removed |
| 6 | 1202 | Ingeborg Djupesland (Bårdsdatter) b. 1650 | 7th cousin once removed |
| 6.4 | 1246 | Ingeborg Djupesland (Bårdsdatter) b. 1650 | 7th cousin |
| 6.8 | 951 | Knut Pedersen Åmot 1786-1851 | 4th cousin twice removed |
| 5.4 | 954 | Amund Jonson Seim (Holter) 1414-1480 | 13th cousin twice removed |
| 10 | 1722 | Ola Narvesen Glaim 1621-1714 | 7th cousin |
| 10.29 | 2096 | Ola Narvesen Glaim 1621-1714 | 7th cousin |
| 9.1 | 1442 | Ola Narvesen Glaim 1621-1714 | 6th cousin * |
| 17.14 | 2849 | Bjorn Ve 1725-1792 | 4th cousin once removed |
| 9.1 | 1543 | Amund Jonson Seim (Holter) 1414-1480 | 14th cousin once removed |
| 8.99 | 1900 | Nils Anderson Eig Øvrebø 1619-1683 | 7th cousin |
* Both 6th cousin entries are the same person, who has two significant matching segments.
Yes, there might be a closer connection to our 13th and 14th cousins, but we have not found it yet!
The 7th cousin with a two-segment match turns out to be doubly related to us. Once his parents' tests came in, we discovered that they BOTH matched my Dad. So while we know we are 7th cousins on his Dad's line, there may well be a closer relationship on his Mom's line (the 10 cM match). We have not found it yet, but both families worked the Konnerud mines near Drammen, Norway, in the 1700s.
Noticing that ‘predicted’ relationships often turn out to be more distant than the genetic testing company implies, I asked myself why, and I realized they are engaging in a certain amount of trickery.
To take a simplified example: suppose you have a match whose strength is typical of a 4th cousin, say one segment of a certain length in cM. The scientists can determine that, among matches of that strength, say 50% come from 4th cousins, 25% from 3rd cousins and 25% from 5th cousins. So they can say with a relatively clear conscience that the predicted relationship is between 3rd and 5th cousin, with 4th cousin the prediction.
But this rests on a hidden assumption: that the pool being drawn from contains equal numbers of 2nd, 3rd, 4th, 5th, etc. cousins.
But the number of cousins we have at each level of distance is definitely not equal. To estimate it, I applied a rule of 4. Given the size of American families over the past couple of hundred years, one could predict that on average each of us has about 4 aunts and uncles, 4 times that many 1st cousins (16), 4 times that many 2nd cousins (64), and so on. Given the size of the American family until recently, this is probably conservative. So in the general population from which our matches are drawn, we have 4 times as many 4th cousins as 3rd cousins and 16 times as many 5th cousins as 3rd cousins.
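The rule of 4 is easy to tabulate. Here is a minimal Python sketch, assuming (as the paragraph does) a constant multiplier of 4 per generation starting from 4 aunts and uncles, so that nth cousins number 4 to the power (n + 1). The function name `cousin_count` is hypothetical, and the counts are rough averages, not genealogical data.

```python
def cousin_count(degree: int, multiplier: int = 4) -> int:
    """Approximate number of nth cousins under a constant per-generation multiplier.

    With a multiplier of 4: 4 aunts/uncles, 16 1st cousins, 64 2nd cousins, ...
    so the number of nth cousins is multiplier ** (degree + 1).
    """
    if degree < 1:
        raise ValueError("degree must be 1 or greater")
    return multiplier ** (degree + 1)


if __name__ == "__main__":
    for degree in range(1, 10):
        print(f"cousin degree {degree}: {cousin_count(degree):,}")
    # Ratios used in the text: 4th vs 3rd cousins and 5th vs 3rd cousins.
    print(cousin_count(4) // cousin_count(3))
    print(cousin_count(5) // cousin_count(3))
```

Degree 4 comes out at 1,024 and degree 9 at 1,048,576, which is where rough figures like "about 1,000 4th cousins" and "a million 9th cousins" come from.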
So, applying this to the simplified example above: instead of a match at the 4th cousin level of strength being 25% likely to be a 3rd cousin, 50% a 4th cousin and 25% a 5th cousin, once we take into account how many cousins we have at each level, the actual chances for us are 4% 3rd cousin, 32% 4th cousin and 64% 5th cousin.
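The reweighting in this paragraph is Bayes' rule: treat the company's percentages as proportional to how often each kind of cousin produces a match of this strength, multiply each by how many cousins of that level we have, and normalize. A minimal sketch, assuming the simplified 25/50/25 split and the rule-of-4 counts from the text; `reweight` is a hypothetical helper for illustration, not anything a testing company publishes.

```python
from fractions import Fraction


def reweight(likelihoods: dict[int, Fraction], multiplier: int = 4) -> dict[int, Fraction]:
    """Combine per-degree likelihoods with relative cousin counts (Bayes' rule).

    likelihoods maps cousin degree -> relative chance that a cousin of that
    degree produces the observed match strength. Relative counts grow by
    `multiplier` per degree (the rule of 4), measured from the smallest degree.
    """
    base = min(likelihoods)
    weights = {deg: p * multiplier ** (deg - base) for deg, p in likelihoods.items()}
    total = sum(weights.values())
    return {deg: w / total for deg, w in weights.items()}


if __name__ == "__main__":
    # Company-style "prediction": 25% 3rd, 50% 4th, 25% 5th cousin.
    naive = {3: Fraction(1, 4), 4: Fraction(1, 2), 5: Fraction(1, 4)}
    for degree, prob in reweight(naive).items():
        print(f"cousin degree {degree}: {float(prob):.0%}")
```

Exact fractions are used so the result lands exactly on 1/25, 8/25 and 16/25, i.e. the 4%, 32% and 64% above.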
In the real world, even for matches predicted to be 3rd to 5th cousins, a small percentage of cousins beyond 5th would actually produce the same pattern, and because the number of cousins at each level of distance grows exponentially, we can get considerably more distant cousins even though the company has “predicted” a 4th cousin. For instance, in the example above, if the scientific prediction were that 24% of such matches come from 3rd cousins, 48% from 4th cousins, 24% from 5th cousins and only 4% from 6th cousins, the actual probability for us of the match being a 6th cousin would be nearly a third (about 30%).
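Worked through with the same reweighting, assuming rule-of-4 relative counts of 1 : 4 : 16 : 64 for 3rd through 6th cousins, the 6th cousin share comes out as:

$$
P(\text{6th}) = \frac{0.04 \times 64}{0.24 \times 1 + 0.48 \times 4 + 0.24 \times 16 + 0.04 \times 64}
= \frac{2.56}{0.24 + 1.92 + 3.84 + 2.56}
= \frac{2.56}{8.56} \approx 0.30
$$

The 5th cousin term (3.84 / 8.56, about 45%) is the largest, so under these assumptions the single most likely relationship is actually 5th cousin, with 6th cousin second and the ‘predicted’ 4th cousin (1.92 / 8.56, about 22%) only third.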
Again, to give an example, I ran the rule of 4 out to 9th cousins and realized I might have a million or more 9th cousins. Let's say that only one out of every 1,000 9th cousins produced a pattern typical of a 4th cousin, say one long segment of so many cM. Then there would be at least 1,000 9th cousins out there who could produce this level of match, but I would also have only about 1,000 4th cousins on average. So unless essentially every 4th cousin produced a match that strong (which they almost certainly do not), a match of that strength would more likely come from a 9th cousin than from a 4th cousin.
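A quick check of this comparison, assuming the rule-of-4 counts (4 to the power (n + 1) nth cousins) and treating the share of 9th cousins who produce the pattern as an adjustable, purely illustrative hit rate rather than a measured figure. `cousin_count` and `expected_strong_matches` are hypothetical helpers.

```python
def cousin_count(degree: int, multiplier: int = 4) -> int:
    """Rough number of nth cousins under the rule of 4: multiplier ** (degree + 1)."""
    return multiplier ** (degree + 1)


def expected_strong_matches(degree: int, hit_rate: float, multiplier: int = 4) -> float:
    """Expected number of nth cousins whose shared DNA shows a given pattern.

    hit_rate is the assumed fraction of cousins at this degree that produce
    the pattern (e.g. one long segment typical of a 4th cousin).
    """
    return cousin_count(degree, multiplier) * hit_rate


if __name__ == "__main__":
    fourth_total = cousin_count(4)  # all 4th cousins under the rule of 4
    for rate in (1 / 1_000, 1 / 10_000):
        ninth = expected_strong_matches(9, rate)
        # Break-even: the share of 4th cousins that would have to show the
        # pattern before they outnumber the 9th cousins showing it.
        print(f"hit rate {rate:g}: {ninth:,.0f} 9th cousins vs {fourth_total:,} 4th cousins "
              f"(break-even share {ninth / fourth_total:.0%})")
```

At 1 in 1,000 the expected 9th cousins (about 1,049) slightly outnumber all 1,024 4th cousins, so 9th cousins win even if every 4th cousin produced the pattern. At 1 in 10,000 there would be only about 105 such 9th cousins, and they would outnumber matching 4th cousins only if fewer than about 10% of 4th cousins produced the pattern, which shows how sensitive the conclusion is to the assumed rate.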
Here I fault the genetic testing companies for some dishonesty. Genetic scientists are not mathematical dummies, but their marketing departments have let this double meaning of the word “predicted” pass, because the hidden assumption behind a predicted match is that every distance of cousin is equally represented in the pool, whereas for those of us actually getting matches from the general public, the number of cousins at each level is not equal at all. One is reminded of Mark Twain's remark about “lies, damned lies, and statistics.”
The actual multiplier per generation is probably not exactly 4, but it would be interesting to find out what it is, and also how quickly the strength of match falls off per generation, for which the scientists may very well have the data. If these two factors were determined, one could make a realistic prediction, for a given strength of match, of the likelihood of each cousin distance.
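Once both factors were known, the calculation itself would be simple. A minimal sketch, assuming the per-degree likelihoods are supplied from data (here they are just the simplified 25/50/25 placeholders from the earlier example) and treating the per-generation multiplier as a tunable parameter; `reweight` is a hypothetical helper. A multiplier of 1 reproduces the equal-pool assumption behind the companies' predictions.

```python
def reweight(likelihoods: dict[int, float], multiplier: float) -> dict[int, float]:
    """Posterior over cousin degree, given per-degree likelihoods and a
    per-generation growth multiplier for the number of cousins."""
    base = min(likelihoods)
    weights = {d: p * multiplier ** (d - base) for d, p in likelihoods.items()}
    total = sum(weights.values())
    return {d: w / total for d, w in weights.items()}


if __name__ == "__main__":
    # Placeholder likelihoods; real values would come from match-strength data.
    naive = {3: 0.25, 4: 0.50, 5: 0.25}
    for m in (1, 2, 3, 4):
        shares = ", ".join(f"degree {d}: {p:.0%}" for d, p in reweight(naive, m).items())
        print(f"multiplier {m}: {shares}")
```

Even a multiplier of 3 already pushes the 5th cousin share to about 56%; at 4 it reaches the 64% from the earlier example.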
Roberta Estes did a great blog post on this subject here:
http://dna-explained.com/2013/10/21/why-are-my-predicted-cousin-relationships-wrong/