Archives

Using Segment Sizes in DNA Relationship Predictions

Segment sizes matter for predicting the closeness of a relationship. That is because inherited DNA chunks get divided up with each generation’s recombination, becoming smaller over time. It has often puzzled me that none of the DNA relationship calculators take that into account. The total centimorgans (cM) can be similar for lots of different relationships. Click here for my post on telling half siblings apart from aunt/uncle/niece/nephew relationships by using segment sizes.

Click here for Ancestry’s explanation of how they predict relationships and here for the article at 23andMe. Neither mentions segment sizes.

Click this image for my slides demonstrating the DNA painter relationship calculator

My favorite calculator has long been the one at DNA Painter where the table is based on Blaine Bettinger’s collected data and the probabilities are by Leah Larkin based on the Ancestry white paper. It can work from a percentage or a cM value and lights up just the possible relationships in its table. Clicking on any table entry gets a histogram of frequencies. I have slides showing how nicely it works, done for a recent talk about 3rd party DNA tools for the SCGS Jamboree. Click the image above to see those. The video of that talk should be available to attendees from the web site genealogyjamboree.com.

Recently a calculator was published at DNA-SCI which includes segment sizes by asking for the number of segments. I have used it for a number of closer relationships and have been very impressed with its results.
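
To make the idea concrete, here is a minimal Python sketch of the arithmetic behind it: dividing the total shared cM by the number of segments gives an average segment size. The function name and the sample numbers are made up for illustration, and this is not a description of how the DNA-SCI calculator works internally.

```python
def average_segment_cm(total_cm: float, segment_count: int) -> float:
    """Return the mean size, in cM, of the segments shared with a match."""
    if segment_count <= 0:
        raise ValueError("segment_count must be a positive whole number")
    return total_cm / segment_count


# Two hypothetical matches with the same total cM but different segment counts.
few_large = average_segment_cm(1700, 25)
many_small = average_segment_cm(1700, 50)
print(f"{few_large:.1f} cM average versus {many_small:.1f} cM average")
```

Two matches with the same total can have quite different average segment sizes, which is the extra information that a segment count adds to the total.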
Continue reading

How to find your haplogroup and why do that?

Haplogroups fascinate me because they reveal our deepest ancestry. A haplogroup is a way of assigning a portion of your DNA to a category based on areas of very slowly changing markers. There are two types of DNA that can be assigned haplogroups, because they do not recombine and therefore change only slowly via mutations. These are the Y chromosome and the DNA of your mitochondria (mtDNA). Mitochondria are organelles in every cell that provide us with energy and are passed along via a mother’s egg. The groupings for their haplogroups look like family trees when charted, for example the one shown below from Eupedia. That is because each mutation creates a new branch. There are haplogroups assigned for both the all female line (mtDNA) and the all male line (Y). Click here for Eupedia’s wonderful descriptions of all the haplogroups found in Europe.

The female H haplogroup from Eupedia.com on haplogroups

Men have a Y chromosome, which makes them male and which has been passed from father to son, to his son, to his son, and so forth from time immemorial. We all have mitochondrial DNA (mtDNA), which is passed from a mother to all her children, almost always unchanged. Thus your mtDNA is from your mother’s mother’s mother and so on. Both of those parts of DNA inheritance can be traced back to the dawn of humanity. That is unlike the other chromosomes, which mix the inheritance from each parent such that after several generations there may be little or no trace of our deeper ancestors. Most of us have no verifiable autosomal DNA from before our 5th grandparents.

Those of you who have family legends about descent from an Indian princess might be able to prove the connection using mtDNA if there is a direct female line to that ancestor, since there are specific haplogroups for Native Americans (click here for the Wikipedia article on that).

My Ancestral Haplogroups displayed in Paul Hawthorne’s colorful genealogy chart

One thing that I like to do is figure out the haplogroups of my recent ancestors by testing cousins in the needed line of descent. I made a chart of the ones I know using Paul Hawthorne’s colorful genealogy chart (click here for more about that) with the haplogroups added. As you can see, I have many more lines to chase down. Sadly my Thannhauser Bavarian Jewish line daughtered out, so I am trying to find a male descendant of the one who moved to Albany NY in the mid 1800s.

So how do you find your haplogroup from your DNA test? Well, if you tested at 23andMe or Living DNA then you will be provided with your high level haplogroup. However, if you want to drill down the branches, then test your Y and/or your mtDNA at Family Tree DNA (summer sale until end of August). Ancestry tests enough SNPs for you to get a high level haplogroup by using other tools on your raw data. My Why Y blog post explains how to use the Morley tool but there is also a tool to find Y haplogroups from Borland Genetics. I have been trying to convince Kevin Borland to write one for mtDNA since the James Lick mthap tool will not currently take Ancestry data.

Continue reading

A Nice New Feature at DNApainter.com

One tool I use all the time at the DNApainter site is the online shared cM calculator. This shows you the possible relationships that you have to a DNA match based on either the shared centimorgans (cM) or the percentage of DNA shared. It uses both the calculated odds from The DNA Geek and the observed odds from Blaine Bettinger’s Shared cM Project. I find that these are far more useful than the predictions at the various testing companies.

Results of the online calculator for cousin “C” sharing 1158 cM, red arrow points to new feature

When you input a number in the box at the top under the word Filter, you get a display like the one above which shows the likelihood of various relationships. Additionally those possibilities have their boxes light up in the chart underneath (click the image for the larger version which shows that). I used the 1158 cM that my first cousin “C” shares with me, on the high end for that relationship, to see what would show.

Do you see my red arrow pointing to the new feature? When you click on the words View these relationships in a tree you get a diagram like the one below, showing possible places for you in the tree of your match. Quick tip: right-click those words to get a little menu from your browser letting you open it in a new tab or window. This diagram is created by the WATO (What Are The Odds) tool.

WATO for cousin “C” showing the menu for editing her in the tree (click for larger version)

One thing that takes getting used to for many of us genealogists is that WATO uses a backwards pedigree format, a sideways descendant tree. The presumed common ancestor is on the left and the descendants fan out on the right. By the way, every person in this diagram can be edited. You can add names, birth years, whether they are half relationships, and so on.

Most people like visual displays of relationships so it is great to see the possibilities laid out in a family tree. Click the Continue Reading below for my experiments with some of my known cousins. However you may prefer to read about the details of this new tool by its author, Jonny Perl, on his blog (click here) – he does a great job of explaining it.

Also, to learn more about WATO, click here for the Family History Fanatics YouTube video or click here for Leah Larkin’s many more advanced articles on WATO.

Continue reading

New DNA Tools and Blog from a Scientist at Cornell

Much of the work to build tools and write articles to help testers with their DNA results has been done by citizen scientists, bloggers, computer programmers, and scientists from other fields like Andrew Millard (behind the WATO math). In an exciting development, Amy Williams, a computational biologist at Cornell University, has built a few DNA tools, with more to come, and started a blog at https://hapi-dna.org/blog/

Her blog article titled “How often do two relatives share DNA” is particularly interesting. It includes the beautiful chart shown below which is created from simulations. Click it to go to the actual page where you can mouse over the columns to get the detailed numeric breakdowns.

Chart of How Often 2 Relatives Share DNA from https://hapi-dna.org/2020/11/how-often-do-two-relatives-share-dna-2/

The other article on her blog has a detailed explanation of what a centiMorgan is, the measurement used for DNA segment sizes (click here). I usually recommend not worrying about the exact definition since it is a measure of the frequency of recombination rather than a physical length. It is important just to know that there is not a one-to-one relationship between the cM and the sizes shown in the chromosome browsers. On those charts, the same cM amount looks smaller at the ends of chromosomes than it does in the middle because recombination is more active on the ends.

The final statement of that article is: “In an upcoming post, we’ll talk more about cM lengths of DNA and how recombination leads more distant relatives to share fewer segments that are also on average smaller than those that close relatives share.” Something to look forward to!

Now to take a look at the tools that are available there so far.

Of particular interest to adoptees is the maternal versus paternal predictor for half siblings or grandparents. I tried it out on a number of half sibling pairs who I have helped in the past.

Here is the prediction created for a brother and sister who share the same father but have different mothers, using the comparison of their segment data from 23andme:

However, I discovered a number of minor usage issues when trying to use data from the different DNA testing sites.
Continue reading

Super Large Numbers Do Not Work in my Ahnentafel to GEDCOM Tool

Alert, there is a bug in my tool to convert text files to GEDCOMs: very large Ahnentafel numbers like “46406041600” will cause it to hang.

I will add code to ignore large numbers by May (end of this week). If you are a regular user of this tool, check this post for the update when the new feature is released.

Something must have changed on the collection of trees because I had three emails in the last week complaining that my tool hung and did not complete the conversion. In all cases, the Ahnentafel went up to extremely large numbers, so eliminating those last few lines fixed the problem.

Here is an example of the last several lines of a file that did not work:

Here are the last few lines of the same file after removing the lines with very high numbers. This version worked.

Do you really need these people born in the 1200s? What is the probability that they even are actually your ancestors?

This appears to be some sort of limitation in either the storage space for the program or the number sizes. Thus I propose to modify the code to ignore Ahnentafel numbers with more than seven digits and to have it tell you that it did that.
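
As a rough illustration of that proposal, here is a short Python sketch that sets aside lines whose Ahnentafel number has more than seven digits and reports how many were skipped. It is not the code of the actual tool. It assumes each line starts with the Ahnentafel number, optionally followed by a period, and the sample lines and function names are made up. The generation helper relies on the standard Ahnentafel numbering, where the father of person n is 2n and the mother is 2n + 1.

```python
MAX_DIGITS = 7  # the cutoff proposed in this post


def generations_back(ahnentafel: int) -> int:
    """Generations between person 1 and this ancestor (parents are 1 back)."""
    return ahnentafel.bit_length() - 1


def split_by_size(lines, max_digits=MAX_DIGITS):
    """Separate lines to convert from lines whose number is too large."""
    kept, skipped = [], []
    for line in lines:
        parts = line.split(maxsplit=1)
        # Assumed format: the number comes first, e.g. "12. Some Name".
        number = parts[0].rstrip(".") if parts else ""
        if number.isdigit() and len(number) > max_digits:
            skipped.append(line)
        else:
            kept.append(line)
    return kept, skipped


# Hypothetical input lines, not taken from a real file.
sample = ["1. Example Person", "2. Example Father", "46406041600. Example Ancestor"]
kept, skipped = split_by_size(sample)
print(f"Converted {len(kept)} lines, ignored {len(skipped)} with very large numbers")
```

By this arithmetic the number 46406041600 quoted above sits 35 generations back from the starting person, which `generations_back(46406041600)` confirms.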

Any other ideas out there? Remember I make almost no money on this, just the occasional small thank you donation, so I am not looking for a solution that will take lots of my time.