Archives

Using AI for Genealogy by Steve Little

One of the most unusual talks at the recent i4GG conference (videos coming soon) was the one about the use of AI for genealogy by Steve Little, the AI program director for the National Genealogical Society (NGS). I learned that how you phrase your question can lead to more accurate answers, e.g. starting with “you are a professional genealogist … ” I also found out that AI, particularly the paid versions, can extract text from documents, even handwritten ones, and translate it in context. Here is my favorite slide from that talk. Personally, my first impression of ChatGPT had been that it was great at sounding good while making stuff up.

Slide from Steve Little’s talk, used by permission

Steve will be speaking at RootsTech at 8 am Thursday this week and will also be available at the NGS booth, according to his post on Facebook.

Amusingly, in my own talk about using bioancestry to solve unknown parentage cases, I had experimented with using AI-generated images to illustrate a few of my points. For example, when I asked the deepAI image generator for a Hungarian violinist, I got this image. The hands are imperfect, but it still adds pizzazz to the slide.

No sooner has my favorite DNA conference (i4GG) ended than it is time to get ready for RootsTech! No, I won’t be there in person this year; there is too much to do to prepare for our move to Connecticut. I hope everyone has a great time. I will attend virtually, so if you are logged in there, you can click here to see if you are related to me! As all my ancestors are fairly recent immigrants (earliest 1860s), I have only 434 relatives at RootsTech, the closest being a fifth cousin. Oh well.

Using Segment Sizes in DNA Relationship Predictions

Segment sizes matter for predicting the closeness of a relationship. That is because inherited DNA chunks get divided up with each generation’s recombination, becoming smaller over time. It has often puzzled me that none of the DNA relationship calculators take that into account. The total centimorgans (cM) can be similar for lots of different relationships. Click here for my post on telling half siblings apart from aunt/uncle/niece/nephew relationships by using segment sizes.

Click here for Ancestry’s explanation of how they predict relationships and here for the article at 23andme. Neither mentions segment sizes.

Click this image for my slides demonstrating the DNA painter relationship calculator

My favorite calculator has long been the one at DNA Painter, where the table is based on Blaine Bettinger’s collected data and the probabilities are by Leah Larkin, based on the Ancestry white paper. It can work from a percentage or a cM value and lights up just the possible relationships in its table. Clicking on any table entry gets a histogram of frequencies. I have slides showing how nicely it works, done for a recent talk about 3rd party DNA tools for the SCGS Jamboree. Click the image above to see those. That video should be available to attendees from the web site genealogyjamboree.com.

Recently a calculator was published at DNA-SCI which includes segment sizes by asking for the number of segments. I have used it for a number of closer relationships and have been very impressed with its results.
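To make the link between the number of segments and segment size concrete, here is a minimal Python sketch. It is not the DNA-SCI calculator's method, which is not described here. It only shows that for the same total cM, more segments means a smaller average segment. The function name and the sample numbers are made up for illustration.

```python
def average_segment_cm(total_cm: float, segment_count: int) -> float:
    """Return the mean segment size in cM for one DNA match.

    Illustrative helper only: real calculators may weigh segment
    data in more sophisticated ways.
    """
    if segment_count <= 0:
        raise ValueError("segment_count must be a positive integer")
    return total_cm / segment_count


# Two hypothetical matches with the same total but different segment counts.
few_large_segments = average_segment_cm(1700.0, 25)
many_small_segments = average_segment_cm(1700.0, 50)

print(f"25 segments: average {few_large_segments:.1f} cM")
print(f"50 segments: average {many_small_segments:.1f} cM")
```

Two matches with identical totals can therefore look quite different once the segment count is taken into account, which is the extra information a calculator gains by asking for it.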
Continue reading

How to find your haplogroup and why do that?

Haplogroups fascinate me because they reveal our deepest ancestry. A haplogroup is a way of assigning a portion of your DNA to a category based on areas of very slowly changing markers. There are two types of DNA that can be assigned haplogroups, because they do not recombine and therefore change only slowly via mutations. These are the Y chromosome and the DNA of your mitochondria (mtDNA). Mitochondria are the organelles in every cell that provide us with energy, and they are passed along via a mother’s egg. The groupings for their haplogroups look like family trees when charted, for example the one shown below from Eupedia. That is because each mutation creates a new branch. There are haplogroups assigned for both the all female line (mtDNA) and the all male line (Y). Click here for Eupedia’s wonderful descriptions of all the haplogroups found in Europe.

The female H haplogroup from Eupedia.com on haplogroups

Men have a Y chromosome, which makes them male and which has been passed from father to son, to his son, to his son, and so forth from time immemorial. We all have mitochondrial DNA (mtDNA), which is passed from a mother to all her children essentially unchanged. Thus your mtDNA is from your mother’s mother’s mother and so on. Both of those parts of DNA inheritance can be traced back to the dawn of humanity. That is unlike the other chromosomes, which mix the inheritance from each parent such that after several generations there may be little or no trace of our deeper ancestors. Most of us have no verifiable autosomal DNA from before our 5th grandparents.
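The fading of autosomal DNA can be sketched with simple arithmetic: on average, the share inherited from any one ancestor halves with each generation. The short Python example below assumes that “5th grandparents” means fifth great-grandparents, seven generations back. The function names are made up for illustration, and the results are averages only, since recombination is random and the real amount from a distant ancestor can be higher or can be zero.

```python
def generations_back(greats: int) -> int:
    """Generations between you and an ancestor with `greats` 'greats'.

    Grandparents (0 greats) are 2 generations back, great-grandparents
    (1 great) are 3 generations back, and so on.
    """
    if greats < 0:
        raise ValueError("greats must be zero or more")
    return greats + 2


def expected_autosomal_share(generations: int) -> float:
    """Average fraction of autosomal DNA from one ancestor that many generations back."""
    return 0.5 ** generations


for greats in (0, 1, 3, 5):
    gens = generations_back(greats)
    share = expected_autosomal_share(gens)
    print(f"{greats} greats: {gens} generations back, about {share * 100:.2f}% on average")
```

By seven generations back, the average share from a single ancestor is under one percent, which is why the Y chromosome and mtDNA are so useful for deeper ancestry.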

Those of you who have family legends about descent from an Indian princess might be able to prove the connection using mtDNA if there is a direct female line to that ancestor, since there are specific haplogroups for Native Americans (click here for the Wikipedia article on that).

My Ancestral Haplogroups displayed in Paul Hawthorne’s colorful genealogy chart

One thing that I like to do is figure out the haplogroups of my recent ancestors by testing cousins in the needed line of descent. I made a chart of the ones I know using Paul Hawthorne’s colorful genealogy chart (click here for more about that) with the haplogroups added. As you can see, I have many more lines to chase down. Sadly my Thannhauser Bavarian Jewish line daughtered out, so I am trying to find a male descendant of the one who moved to Albany NY in the mid 1800s.

So how do you find your haplogroup from your DNA test? Well, if you tested at 23andme or Living DNA, then you will be provided with your high level haplogroup. However, if you want to drill down the branches, then test your Y and/or your mtDNA at Family Tree DNA (summer sale until end of August). Ancestry tests enough SNPs for you to get a high level haplogroup by using other tools on your raw data. My Why Y blog post explains how to use the Morley tool, but there is also a tool to find Y haplogroups from Borland Genetics. I have been trying to convince Kevin Borland to write one for mtDNA, since the James Lick mthap tool will not currently take Ancestry data.

Continue reading

A Nice New Feature at DNApainter.com

One tool I use all the time at the DNApainter site is the online shared cM calculator. This shows you the possible relationships that you have to a DNA match based on either the shared centimorgans (cM) or the percentage of DNA shared. It uses both the calculated odds from the DNA Geek and the observed odds from Blaine Bettinger’s shared cM project. I find that these are far more useful than the predictions at the various testing companies.

Results of the online calculator for cousin “C” sharing 1158 cM, red arrow points to new feature

When you input a number in the box at the top under the word Filter, you get a display like the one above which shows the likelihood of various relationships. Additionally those possibilities have their boxes light up in the chart underneath (click the image for the larger version which shows that). I used the 1158 cM that my first cousin “C” shares with me, on the high end for that relationship, to see what would show.
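Since the calculator accepts either cM or a percentage, it can help to see how the two relate. Here is a rough Python sketch of the conversion. The total of 7440 cM used as the divisor is an assumption for illustration; testing companies and tools use somewhat different totals, so treat the result as approximate. The function and constant names are made up.

```python
# Assumption: an approximate total of cM used to turn shared cM into a
# percentage. Different sites use different totals, so this is illustrative.
ASSUMED_TOTAL_CM = 7440.0


def cm_to_percent(shared_cm: float, total_cm: float = ASSUMED_TOTAL_CM) -> float:
    """Convert shared cM to an approximate percentage of shared DNA."""
    return shared_cm / total_cm * 100.0


def percent_to_cm(percent: float, total_cm: float = ASSUMED_TOTAL_CM) -> float:
    """Convert a percentage of shared DNA back to approximate shared cM."""
    return percent / 100.0 * total_cm


cousin_c_percent = cm_to_percent(1158.0)
print(f"1158 cM is roughly {cousin_c_percent:.1f}% under the assumed total")
```

Passing a different `total_cm` shows how much the percentage shifts depending on which total a given site uses.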

Do you see my red arrow pointing to the new feature? When you click on the words View these relationships in a tree, you get a diagram like the one below, showing possible places for you in the tree of your match. Quick tip: right-click those words to get a little menu from your browser letting you open it in a new tab or window. This diagram is created by the WATO (What Are The Odds) tool.

WATO for cousin “C” showing the menu for editing her in the tree (click for larger version)

One thing that takes getting used to for many of us genealogists is that WATO uses a backwards pedigree format, a sideways descendant tree. The presumed common ancestor is on the left and the descendants fan out on the right. By the way, every person in this diagram can be edited. You can add names, birth years, whether they are half relationships, and so on.

Most people like visual displays of relationships so it is great to see the possibilities laid out in a family tree. Click the Continue Reading below for my experiments with some of my known cousins. However you may prefer to read about the details of this new tool by its author, Jonny Perl, on his blog (click here) – he does a great job of explaining it.

Also, to learn more about WATO, click here for the Family History Fanatics YouTube video or click here for Leah Larkin’s many more advanced articles on WATO.

Continue reading

New DNA Tools and Blog from a Scientist at Cornell

Much of the work to build tools and write articles to help testers with their DNA results has been done by citizen scientists, bloggers, computer programmers, and scientists from other fields like Andrew Millard (behind the WATO math). In an exciting development, Amy Williams, a computational biologist at Cornell University, has built a few DNA tools, with more to come, and started a blog at https://hapi-dna.org/blog/

Her blog article titled “How often do two relatives share DNA” is particularly interesting. It includes the beautiful chart shown below which is created from simulations. Click it to go to the actual page where you can mouse over the columns to get the detailed numeric breakdowns.

Chart of How Often 2 Relatives Share DNA from https://hapi-dna.org/2020/11/how-often-do-two-relatives-share-dna-2/

The other article on her blog has a detailed explanation of what a centiMorgan is, the measurement used for DNA segment sizes (click here). I usually recommend not worrying about the exact definition, since it is a measure of the frequency of recombination rather than a physical length. It is important just to know that there is not a one-to-one relationship between the cM and the sizes shown in the chromosome browsers. On those charts, the same cM amount looks smaller at the ends of chromosomes than it does in the middle because recombination is more active on the ends.

The final statement of that article is: “In an upcoming post, we’ll talk more about cM lengths of DNA and how recombination leads more distant relatives to share fewer segments that are also on average smaller than those that close relatives share.” Something to look forward to!

Now to take a look at the tools that are available there so far.

Of particular interest to adoptees is the maternal versus paternal predictor for half siblings or grandparents. I tried it out on a number of half sibling pairs who I have helped in the past.

Here is the prediction created for a brother and sister who share the same father but have different mothers, using the comparison of their segment data from 23andme:

However, I discovered a number of minor usage issues when trying to use data from the different DNA testing sites.
Continue reading