Tag Archive | clustering

Automated Tree Building with Genetic Affairs

Clustering has changed the way many of us work on genealogy mysteries and unknown parentage cases. Genetic Affairs was just one of the sites offering automated clustering (click here for my first clustering post), but then they added tree building. That’s right, they make tree diagrams for each cluster that has at least two people with trees that can be matched up. They even include a GEDCOM for those trees in the zip file they send.

One way I use these diagrams is to show cousins how we are related. Another way I use this feature is to solve unknown parentage cases. I use both DNA2tree and Genetic Affairs and then go with whichever seems to have the more relevant-looking trees. The advantage of Genetic Affairs (GA) is that it will look at the unlinked trees and at your ThruLines. Also, once you are used to the format, the output is easy to glance over to see what is worth pursuing. Click here for my recent post on automated tree-building tools.

Above is the diagram GA built for the descendants of my gg-grandparents who are in the lonely box on the far left. Click on the image for a larger image in a new tab. My great grandparents lived on farm Skjold in Etne, Hordaland, Norway and had eight children, four of whom, plus the child of another, emigrated to the USA and have many tested descendants at Ancestry.

Here is a key to what you are seeing. The green box on the bottom line is me. The mustard yellow box means that the match’s unlinked tree was used from that person on down. The people in pink in the middle were determined from my ThruLines. Living people are shown as just id numbers, except for your matches who are shown by the name they have chosen to be seen as. All DNA matches are on the far right and are also colored pink with the source and the amount shared listed. Clicking on a match gets a little box to pop up in the lower right corner (as shown) with the name of their family tree, clickable to their Ancestry tree.

UPDATE 24 Jun 2020: Clustering on Ancestry is no longer available as they issued a cease and desist order to Genetic Affairs and many other 3rd party sites. Please click here and send a suggestion to Ancestry that they implement clustering on their site.

The purple box with the word ANCESTRY indicates the source of the tree information. Another GA feature is the ability to cluster both your Family Tree DNA matches and your Ancestry matches together.

When names are listed differently in other trees, they will be shown in these diagrams as separate people. Notice that in the second column from the left, the software could not tell that the A. Skjold who married L. Stephenson is the same person as the A. Halvorsdtr skjold who married L. Stephenson Fjaere. Norwegians did not have fixed surnames, so we usually use the farm name as a surname in our trees. Upon arriving in this country they often chose to use the patronymic, so Stephenson rather than Fjaere (click here for more on Norwegian naming). However, the other Anna Halvorsdtr Skjold listed between those two really is a different person, and she married a Thompson. Reusing first names is another bane of the Norwegian genealogist.

This tree building capability from Genetic Affairs recently helped me solve an unknown father mystery.

When “Amy” discovered her brother was only a half brother by doing an Ancestry DNA test, she was very surprised. She had heard that her mother was pregnant with her when marrying her late father, but everyone knew he was her Dad, or so she thought. Her mother was not willing to discuss this, so she asked for my help to figure out her biological father from the DNA.

More Clustering Tools!

There are many new ways to make those beautiful cluster diagrams of how your DNA relatives are related to each other. Both MyHeritage and GEDmatch GENESIS (tier 1) now have clustering tools (Thank you Evert-Jan Blom). These charts give you an easy way to see your family groupings and can help you figure out a new match, since each cluster typically represents a common ancestral couple. Click here for my previous posts on clustering, which is based on the Leeds method.

My Dad’s Clusters at GEDmatch GENESIS

The GENESIS cluster diagram shown above includes the total cM each match shares with you as well as their name and kit number. Click on the “i” in a circle for a pop up box with the user information which includes an email address and whether a GEDmatch tree is linked to this kit. Any of the colored boxes on the graph can be clicked to open a window for a one to one comparison between those two people. Plus you can check the boxes in the select column for any number of matches and then submit them to the multi kit analysis using the orange “Submit to Multi Kit Analysis” button above the name column on the left. To get this clustering tool all you need is a Tier 1 membership and a kit number. It is listed at the bottom of the Tier 1 tools. Personally I like to raise the thresholds to a top 200 and a minimum of 20, but try the defaults first and see what is best for you.

One of the nice things about the cluster output from Genetic Affairs is that it lists all the cluster members in groups below the graph with the number of people in each tree (clickable) and any notes you made on the Ancestry profile. The MyHeritage version also has those cluster lists with your notes and the tree sizes, and of course they are clickable to the match (which may even have a theory of family relativity for you!) and the match’s tree. The downside is that you cannot select the parameters for the clustering yourself; they are preset. Possibly only power users care about that!

Extract from my list of matches in each cluster at MyHeritage

An exciting new feature for those looking for one unknown parent or grandparent is the ability to cluster just your starred Ancestry matches when using the clustering tool at Genetic Affairs. Click here for my previous post about that tool. There is now a checkbox on the page where you select your parameters for getting a cluster analysis.

New at Genetic Affairs is the checkbox for only starred matches when starting a cluster analysis

It is a common practice to star (mark as favorites) the matches that seem to be from the family of an unknown parent or grandparent at Ancestry. Usually these are determined by looking at who matches or doesn’t match a close relative like a half sibling or else by eliminating matches from the known side. Sometimes you can use ethnicity. I am currently helping someone where the known side is Jewish and the unknown side is Italian and those are easy to separate.
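The first of those approaches, comparing against a close relative, comes down to sorting a match list into two piles. Here is a minimal sketch in Python, assuming you have already copied out your own match names and the names you share with a half sibling on the known side. The function name and the sample names are made up for illustration and are not part of any Ancestry or Genetic Affairs tool.

```python
from typing import Iterable, List, Set, Tuple


def split_by_known_relative(
    my_matches: Iterable[str], shared_with_relative: Set[str]
) -> Tuple[List[str], List[str]]:
    """Split a match list into (known side, candidates for the unknown side).

    Assumption: the relative (for example a half sibling) is related only
    through the known parent, so anyone shared with them is on the known side.
    """
    known: List[str] = []
    candidates: List[str] = []
    for name in my_matches:
        if name in shared_with_relative:
            known.append(name)
        else:
            candidates.append(name)  # worth a look, and maybe a star
    return known, candidates


# Hypothetical data, purely for illustration.
matches = ["Match A", "Match B", "Match C", "Match D"]
shared_with_half_sibling = {"Match A", "Match C"}

known_side, to_review = split_by_known_relative(matches, shared_with_half_sibling)
print("Known side:", known_side)
print("Review for starring:", to_review)
```

Treat the second list only as a starting point: a distant cousin on the known side may simply not share enough DNA with the half sibling to show up as a shared match.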


More Automated DNA Match Clustering!

Have you been wondering why all your favorite bloggers are going crazy for automatic clustering? Well, it is a fun visual technique to see which matches belong to which family line by making a chart with your matches across both the top and side, grouping them by who matches whom, and then coloring those boxes in. This creates visual clusters which will roughly correspond to your great grandparents or their parents.
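For the programmers among you, the chart idea can be sketched in a few lines of Python. This is only a toy illustration: it assumes you already have, for each match, the set of other matches they share with you, and it uses a simple greedy grouping rather than whatever method any of the clustering sites actually use. The names and the functions `group_matches` and `render_grid` are invented for the example.

```python
from typing import Dict, List, Set


def group_matches(shared: Dict[str, Set[str]]) -> List[List[str]]:
    """Greedy grouping: each still-unassigned match starts a new cluster
    and pulls in its still-unassigned shared matches."""
    clusters: List[List[str]] = []
    assigned: Set[str] = set()
    for name in shared:  # dict order stands in for match-list order
        if name in assigned:
            continue
        members = [name] + [
            other
            for other in shared
            if other != name and other in shared[name] and other not in assigned
        ]
        assigned.update(members)
        clusters.append(members)
    return clusters


def render_grid(shared: Dict[str, Set[str]], clusters: List[List[str]]) -> List[str]:
    """Draw the chart as text: a digit marks a box inside a cluster,
    '+' marks a shared match between two clusters, '.' means no match."""
    order = [name for cluster in clusters for name in cluster]
    cluster_of = {name: i for i, cluster in enumerate(clusters) for name in cluster}
    rows: List[str] = []
    for a in order:
        cells = []
        for b in order:
            if a == b or b in shared[a]:
                same = cluster_of[a] == cluster_of[b]
                cells.append(str(cluster_of[a] + 1) if same else "+")
            else:
                cells.append(".")
        rows.append("".join(cells))
    return rows


# Hypothetical shared-match lists, purely for illustration.
shared_matches = {
    "Ann": {"Bob", "Cal"},
    "Bob": {"Ann", "Cal"},
    "Cal": {"Ann", "Bob", "Eve"},
    "Dee": {"Eve"},
    "Eve": {"Dee", "Cal"},
}

clusters = group_matches(shared_matches)
for row in render_grid(shared_matches, clusters):
    print(row)
```

The stray `+` boxes outside the colored blocks are the interesting ones: they mark matches who connect two clusters and may descend from both family lines.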

My perfect cousin has many matches on all her great grandparent lines (green is my Munson side) so I used her to showcase the new DNAgedcom clustering above. Notice how similar it is to her cluster from Genetic Affairs shown in my previous blog about that site and tool.

Here are all the new ways to cluster our DNA matches:

  • DNAgedcom now has a clustering tool in their client (DGC) which uses your Ancestry match list and ICW files (described in detail in the read more below)
  • Genetic Affairs has Ancestry clustering working again
  • DNApainter created a tool to create a CSV from the Genetic Affairs html cluster file. Some of us love to use spreadsheets.
  • Andy Lee of Family History Fanatics figured out how to take an autosomal match matrix from GEDmatch and cluster it in a spreadsheet program. Click here for that video – the explanation starts just after 42 minutes and this is really fun!
  • Rumor has it that GEDmatch may add automatic clustering sometime in the new year…

All of this is based on the method developed by Dana Leeds to organize your matches, which is simple to do. Click here for her blog about that.

Read on for how I used the new DNAgedcom clustering tool for myself and my brother, where I know all our great grandparents.
