Automatic Clustering from Genetic Affairs

My genealogy groups are buzzing with excitement about a new tool from Genetic Affairs to automate the clustering of your DNA matches. This takes the Leeds method concept to another level.

Everyone is posting pretty cluster pictures like the one below that I made for my perfect cousin, the star of many of my blog posts. This is a table where each DNA match is listed on the top and side; then if they match each other, the box is colored in with the color for that cluster. The chart is sorted by cluster. The idea is that each colored cluster shows descendants from a probable great grandparent couple of yours.

The gray boxes show where people match others outside their cluster. This often happens when families intermarry more than once, or when the matches are first cousins enough times removed to fall into the second or third cousin group by DNA but are related to more than one set of great grandparents.
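For readers who like to see the idea in code, here is a minimal sketch of how a chart like this could be built from shared-match lists. It is a simplified, Leeds-style grouping of my own, not the actual Genetic Affairs algorithm, and the names and the `shared` data are made up for illustration.

```python
def leeds_style_clusters(matches, shared):
    """Group matches into clusters, strongest match first.

    matches: list of match names, ordered from most to least shared DNA.
    shared:  dict mapping each name to the set of names it also matches.
    Each unassigned match starts a new cluster and pulls in its
    still-unassigned shared matches (a simplification of the Leeds method).
    """
    known = set(matches)
    cluster_of = {}
    next_id = 1
    for name in matches:
        if name in cluster_of:
            continue
        cluster_of[name] = next_id
        for other in shared.get(name, set()):
            if other in known and other not in cluster_of:
                cluster_of[other] = next_id
        next_id += 1
    return cluster_of


def cell(a, b, cluster_of, shared):
    """Return the cluster number for a colored box, "gray" for a match
    between people in different clusters, or None for an empty box."""
    if a != b and b not in shared.get(a, set()):
        return None
    if cluster_of[a] == cluster_of[b]:
        return cluster_of[a]
    return "gray"


# Hypothetical example data: Cal and Dee match each other across clusters.
matches = ["Ann", "Bob", "Cal", "Dee", "Eve"]
shared = {
    "Ann": {"Bob", "Cal"},
    "Bob": {"Ann", "Cal"},
    "Cal": {"Ann", "Bob", "Dee"},
    "Dee": {"Cal", "Eve"},
    "Eve": {"Dee"},
}
cluster_of = leeds_style_clusters(matches, shared)
```

In this toy data, Ann, Bob, and Cal land in one cluster and Dee and Eve in another, while the Cal and Dee box comes out gray because they match each other but sit in different clusters.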

Automated clustering is useful because it puts your DNA relatives who are related to each other into visual groups so that you can quickly see which line a new match is related on. The picture is pretty but the workhorses are the charts for each cluster shown below that image when you scroll down. Here is the privatized one for my “perfect” cousin showing our MUNSON cluster.

Each name can be clicked to go to that person's Ancestry match page. Much useful additional information is shown next to the username: how many cMs are shared, how many matches are shared in the whole group, the cluster number, how many people are in their tree, and the notes you made for that match.

The image and charts are from the HTML file which arrived via email from Genetic Affairs after I requested automated clustering for my cousin’s Ancestry profile, which is shared with me there. You have to save the HTML file to your computer and then click on it to view it. When it first comes up, it is a mishmash sorted by name, but then it re-sorts itself by cluster. Fun to watch. The Intrepid Sleuth has a step-by-step guide on how to use this tool. It can also cluster matches from other sites like 23andMe.

I decided to try it on an unknown father case I had not gotten around to working on yet, to see if it would speed up the process. It did: the search took under an hour! A new record.

To be fair, this was a case where the maternal side was well known and documented and all ancestors have deep American roots. Jack had tested to help his maternal half brother Everett find his Dad but initially had no interest in pursuing his own bio dad connection (all names are changed as always for privacy). Recently he decided that knowing his paternal family medical history might be worthwhile, so he gave me the go-ahead.

I always knew that Jack’s search would be quick given that on his unknown paternal side he has 4 second cousin level matches and 18 third cousins, many with deep trees. This is way more than the average tester! Here is his automated cluster picture:

The names down the side of the page have been blurred out but I could see them and I was immediately struck by the appearance of a specific surname in half the usernames on that big cluster. I knew I had seen that surname in his best match’s tree, a first to second cousin, let’s call her Kate, who is too close to be included in the clustering.

Next I looked at cluster 2, also paternal, in the charts and saw that many of them had good sized trees. So I clicked the three top matches from the charts with trees and took a look at them. Again, one surname stood out.
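I did this surname spotting by eye, but the same check could be sketched in code. The snippet below is only an illustration under assumptions: it supposes you have typed the surnames from each match's tree into a dictionary yourself (the `trees` data and its placeholder surnames are hypothetical), and it counts how many trees each surname appears in.

```python
from collections import Counter


def surnames_by_tree_count(trees):
    """Count in how many trees each surname appears.

    trees: dict mapping a match's username to the surnames in their tree.
    A surname is counted once per tree, ignoring case and stray spaces.
    Returns (surname, count) pairs, most widespread first.
    """
    counts = Counter()
    for surnames in trees.values():
        counts.update({s.strip().lower() for s in surnames})
    return counts.most_common()


# Hypothetical trees for three top matches in one cluster.
trees = {
    "match_a": ["Alpha", "Beta", "Gamma"],
    "match_b": ["beta", "Delta"],
    "match_c": ["Beta", "Alpha", "Beta "],
}
ranked = surnames_by_tree_count(trees)
top_surname, tree_count = ranked[0]
```

Here the placeholder surname "beta" comes out on top because it is the only one found in all three trees.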

Now can I find someone with surname1 marrying surname2? Those would be his paternal grandparents. Since Kate has a huge tree, I decided to search her tree for that second surname. There were only a few and just one was married to a woman with surname1. Plus they are the right ages to be his grandparents. How quick was that!
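The search I did in Kate's tree can be pictured as a simple filter. This is a sketch only: the `Couple` record, the `find_couples` helper, and the sample people and birth years are all hypothetical stand-ins, since Ancestry trees are searched through the website rather than through code like this.

```python
from dataclasses import dataclass


@dataclass
class Couple:
    spouse_a: str       # full name, surname last (maiden name for a wife)
    spouse_b: str
    birth_year_a: int
    birth_year_b: int


def surname(full_name):
    """Take the last word of a name as the surname, lowercased."""
    return full_name.split()[-1].lower()


def find_couples(tree, surname1, surname2, earliest=None, latest=None):
    """Return couples pairing the two surnames, in either order.

    earliest/latest optionally restrict both birth years to a window,
    a rough stand-in for "the right ages to be his grandparents".
    """
    wanted = {surname1.lower(), surname2.lower()}
    hits = []
    for couple in tree:
        if {surname(couple.spouse_a), surname(couple.spouse_b)} != wanted:
            continue
        years = (couple.birth_year_a, couple.birth_year_b)
        if earliest is not None and min(years) < earliest:
            continue
        if latest is not None and max(years) > latest:
            continue
        hits.append(couple)
    return hits


# Hypothetical tree entries with placeholder names and years.
tree = [
    Couple("John Surname2", "Mary Surname1", 1900, 1903),
    Couple("Paul Surname2", "Ruth Other", 1870, 1872),
    Couple("Amos Surname2", "Jane Surname1", 1790, 1795),
]
candidates = find_couples(tree, "Surname1", "Surname2", earliest=1880, latest=1930)
```

With the made-up birth year window, only the first couple survives the filter, which mirrors how just one couple in Kate's tree fit both the surnames and the ages.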

Next I built him a quick and dirty (Q&D) tree using Kate’s tree, green leaf hints, and those green “Potential Mother/Father” things. I connected his DNA as the son of an unknown son of the presumed grandparents.

While waiting for Ancestry to find those DNA ancestor hints, I decided to spot check if the expected relationships agree with his new tree for the relatives in the two clusters. So I clicked through from the charts to each clustered match with a tree, looked where the common ancestor was, and put in the note for that match what the expected relationship now is. I also checked each one in the DNApainter calculator to see if the theory fit. So far so good.

The next step is to try and figure out who the son(s) of this couple are. Usually that is via an obituary but I did not find one. So I reached out to a close relative on each side to ask and explained why. We will see if they respond. One logged in yesterday and has a huge tree so I am hopeful.

To summarize, the picture is pretty but the charts are an incredible time saver because I can see which matches have good sized trees and can click through to them to test my hypothesis. Not every adoptee will have such robust clusters so as to easily find the surnames of each grandparent, but hopefully some will.


15 thoughts on “Automatic Clustering from Genetic Affairs”

  1. Thanks for sharing this example, Kitty! I love how the Leeds method has been automated and taken to a whole new level. I’ve been working with it over the past day and a half, but can’t wait to learn more. I’d really like to totally analyze my own matches!

    • Wendell is that what the problem is? GA will NOT take my Ancestry login. I messaged them and hopefully will find out what they think the problem is.

    • I did not have any problem adding or running the analysis with my Ancestry account.

      I have looked through my Ancestry results and they correspond well to what I know about some matches and to the Leeds method.

  2. This Clustering looks and sounds pretty awesome. To get started, should I be reading the manual or is it self explanatory via the site, and where?

    • There is a manual? LOL. I was excited to try this and just jumped right in. It was intuitive. I had my results within minutes. I will read any documentation available to make sure I maximize my use of the tool, of course, but the ease of use was a highlight of my first experience with this tool.

  3. I’m able to run Auto Cluster for multiple AncestryDNA profiles. I get the emails, save the HTML file to my desktop, open it… and the visual colored cluster is missing in each. All of the other info is there.

    I can also run this for my 23&Me accounts but I’m not getting email reports for them. I added and set up both websites and am confident the login info is correct.

    Any idea as to what I’m doing wrong? I read the manual, FAQ, and watched a youtube video.

  4. Deanna, Jack, et al –
    Ask the author of the tool – his email is info at the genetic affairs website.

    For those of you not getting the emails, check your spam folder. That’s where some of mine were. Then whitelist the above email or make a filter (what I did with my gmail account).
