Automatic Clustering from Genetic Affairs

My genealogy groups are buzzing with excitement about a new tool from Genetic Affairs to automate the clustering of your DNA matches. This takes the Leeds method concept to another level.

Everyone is posting pretty cluster pictures like the one below that I made for my perfect cousin, the star of many of my blog posts. This is a table where each DNA match is listed on the top and side; then if they match each other, the box is colored in with the color for that cluster. The chart is sorted by cluster. The idea is that each colored cluster shows descendants from a probable great grandparent couple of yours.

The gray boxes show where people also match others outside their cluster. This often happens when families intermarry more than once, or when a match is a first cousin enough times removed to land in the second or third cousin group by DNA and is therefore related to more than one set of great grandparents.
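
For readers who like to see the idea in code, here is a minimal sketch of how such a chart could be laid out. It is not how Genetic Affairs builds its report. The names, the cluster numbers, and the `build_grid` helper are all made up for illustration, and I am assuming the diagonal box (a match against itself) is colored in.

```python
# Hypothetical match names and cluster assignments; a real report gets
# these from the clustering tool, not from hand-typed data.
cluster_of = {"Ann": 1, "Bob": 1, "Cal": 1, "Dee": 2, "Eli": 2}

# Pairs of matches who also match each other (shared matches).
shared = {
    frozenset(("Ann", "Bob")),
    frozenset(("Ann", "Cal")),
    frozenset(("Bob", "Cal")),
    frozenset(("Dee", "Eli")),
    frozenset(("Cal", "Dee")),  # a match across clusters -> gray box
}


def build_grid(cluster_of, shared):
    """Return (names sorted by cluster, grid of cell labels)."""
    names = sorted(cluster_of, key=lambda n: (cluster_of[n], n))
    grid = []
    for row in names:
        cells = []
        for col in names:
            if row == col:
                # Assumption: the diagonal is colored with the cluster.
                cells.append(str(cluster_of[row]))
            elif frozenset((row, col)) not in shared:
                cells.append(".")  # these two do not match each other
            elif cluster_of[row] == cluster_of[col]:
                cells.append(str(cluster_of[row]))  # colored box
            else:
                cells.append("g")  # gray box: match outside the cluster
        grid.append(cells)
    return names, grid


names, grid = build_grid(cluster_of, shared)
for name, cells in zip(names, grid):
    print(f"{name:4}", " ".join(cells))
```

In this toy data, Cal and Dee sit in different clusters but match each other, so their two boxes come out gray while everything else falls into the two colored blocks.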

Automated clustering is useful because it puts your DNA relatives who are related to each other into visual groups so that you can quickly see which line a new match is related on. The picture is pretty but the workhorses are the charts for each cluster shown below that image when you scroll down. Here is the privatized one for my “perfect” cousin showing our MUNSON cluster.

Each name can be clicked to go to that Ancestry match page. Much useful additional information is shown next to the username: how many cMs are shared, how many matches are shared in the whole group, the cluster number, how many people are in their tree, and the notes you made for that match.

The image and charts are from the HTML file which arrived via email from Genetic Affairs after I requested automated clustering for my cousin’s Ancestry profile, which is shared with me there. You have to save the HTML file to your computer and then click on it to view it. When it first comes up, it is a mish-mosh sorted by name, but then it re-sorts itself by cluster. Fun to watch. Click here for the step by step of how to use this tool from the Intrepid Sleuth. It can also cluster matches from other sites like 23andMe.

I decided to try it on an unknown father case I had not gotten around to working on yet, to see whether it would speed up the process. It did, to under an hour! A new record.

To be fair, this was a case where the maternal side was well known and documented and all ancestors have deep American roots. Jack had tested to help his maternal half brother Everett find his Dad but initially had no interest in pursuing his own bio dad connection (all names are changed as always for privacy). Recently he decided that knowing his paternal family medical history might be worthwhile, so he gave me the go ahead.

I always knew that Jack’s search would be quick given that on his unknown paternal side he has 4 second cousin level matches and 18 third cousins, many with deep trees. This is way more than the average tester! Here is his automated cluster picture:

The names down the side of the page have been blurred out but I could see them and I was immediately struck by the appearance of a specific surname in half the usernames on that big cluster. I knew I had seen that surname in his best match’s tree, a first to second cousin, let’s call her Kate, who is too close to be included in the clustering.

Next I looked at cluster 2, also paternal, in the charts and saw that many of them had good sized trees. So I clicked the three top matches from the charts with trees and took a look at them. Again, one surname stood out.
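
I did this by eye, but the same check is easy to sketch in code. The tree data and the `surname_tally` helper below are hypothetical. The only idea shown is counting how many trees in a cluster contain each surname, counting each tree once.

```python
from collections import Counter

# Hypothetical surnames pulled from the trees of three matches in a cluster.
trees = {
    "match_a": ["Surname2", "Olsen", "Hart"],
    "match_b": ["Surname2", "Reed"],
    "match_c": ["Surname2", "Hart", "Surname2"],
}


def surname_tally(trees):
    """Count how many trees each surname appears in (once per tree)."""
    tally = Counter()
    for surnames in trees.values():
        # set() so a surname repeated within one tree is counted once
        tally.update(set(surnames))
    return tally


tally = surname_tally(trees)
top_surname, tree_count = tally.most_common(1)[0]
print(top_surname, "appears in", tree_count, "of", len(trees), "trees")
```

A surname that shows up in most of the trees in one cluster is a good candidate for the ancestral couple behind that cluster.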

Now can I find someone with surname1 marrying surname2? Those would be his paternal grandparents. Since Kate has a huge tree, I decided to search her tree for that second surname. There were only a few and just one was married to a woman with surname1. Plus they are the right ages to be his grandparents. How quick was that!
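
Here is a rough sketch of that search, assuming the couples in a tree have been typed into a simple list. The `Couple` record, the sample couples, the `candidate_grandparents` helper, and the birth year window are all invented for illustration. Ancestry does not hand you data in this shape.

```python
from typing import NamedTuple


class Couple(NamedTuple):
    husband_surname: str
    wife_maiden_name: str
    husband_birth_year: int


# Hypothetical couples copied out of a large tree.
couples = [
    Couple("Surname2", "Surname1", 1905),
    Couple("Surname2", "Baker", 1870),
    Couple("Surname2", "Lowe", 1932),
    Couple("Hart", "Surname1", 1901),
]


def candidate_grandparents(couples, surname_a, surname_b, born_between):
    """Couples joining the two surnames, with a plausible birth year."""
    earliest, latest = born_between
    wanted = {surname_a.lower(), surname_b.lower()}
    return [
        c
        for c in couples
        # either spouse may carry either surname
        if {c.husband_surname.lower(), c.wife_maiden_name.lower()} == wanted
        and earliest <= c.husband_birth_year <= latest
    ]


# The birth year window is an assumption you would set for your own case.
hits = candidate_grandparents(couples, "Surname1", "Surname2", (1890, 1925))
for couple in hits:
    print(couple)
```

With this sample list only one couple joins the two surnames and falls inside the window, which mirrors what happened in Kate's tree.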

Next I built him a quick and dirty (Q&D) tree using Kate’s tree, green leaf hints, and those green “Potential Mother/Father” things. I connected his DNA as the son of an unknown son of the presumed grandparents.

While waiting for Ancestry to find those DNA ancestor hints, I decided to spot check whether the expected relationships agree with his new tree for the relatives in the two clusters. So I clicked through from the charts to each clustered match with a tree, looked where the common ancestor was, and put in the note for that match what the expected relationship now is. I also checked each one in the DNA Painter calculator to see if the theory fit. So far so good.

The next step is to try and figure out who the son(s) of this couple are. Usually that is via an obituary but I did not find one. So I reached out to a close relative on each side to ask and explained why. We will see if they respond. One logged in yesterday and has a huge tree so I am hopeful.

To summarize, the picture is pretty but the charts are an incredible time saver because I can see which matches have good sized trees and can click through to them to test my hypothesis. Not every adoptee will have such robust clusters so as to easily find the surnames of each grandparent, but hopefully some will.


20 thoughts on “Automatic Clustering from Genetic Affairs”

  1. Thanks for sharing this example, Kitty! I love how the Leeds method has been automated and taken to a whole new level. I’ve been working with it over the past day and a half, but can’t wait to learn more. I’d really like to totally analyze my own matches!

    • Wendell is that what the problem is? GA will NOT take my Ancestry login. I messaged them and hopefully will find out what they think the problem is.

    • I did not have any problem adding or running the analysis with my Ancestry account.

      I have looked through my Ancestry results and they correspond well to what I know about some matches and to the Leeds method.

  2. This Clustering looks and sounds pretty awesome. To get started, should I be reading the manual or is it self explanatory via the site, and where?

    • There is a manual? LOL. I was excited to try this and just jumped right in. It was intuitive. I had my results within minutes. I will read any documentation available to make sure I maximize my use of the tool, of course, but the ease of use was a highlight of my first experience with this tool.

  3. I’m able to run Auto Cluster for multiple AncestryDNA profiles. I get the emails, save the HTML file to my desktop, open it… and the visual colored cluster is missing in each. All of the other info is there.

    I can also run this for my 23&Me accounts but I’m not getting email reports for them. I added and set up both websites and am confident the login info is correct.

    Any idea as to what I’m doing wrong? I read the manual, FAQ, and watched a youtube video.

  4. Deanna, Jack, et al –
    Ask the author of the tool – his email is info at the genetic affairs website.

    For those of you not getting the emails, check your spam folder. That’s where some of mine were. Then whitelist the above email or make a filter (what I did with my gmail account).

  5. Thank you for all you do. I am requesting some assistance here. I have been working on this for a very long time. Can you tell from this comparison I ran on GEDmatch whether my mom and this man are half siblings (same father, different mothers)?
    Chr Start End cM SNPs
    1 9,909,594 47,999,722 55.6 5,563
    1 88,365,638 110,893,107 20.9 2,901
    1 154,834,876 181,424,547 34.5 4,062
    2 17,572 32,756,557 54.6 5,523
    2 71,074,301 85,220,482 15.7 1,865
    2 191,716,994 226,911,444 39.7 4,386
    2 235,678,877 242,586,061 14.6 1,210
    3 136,120,636 191,278,269 62.3 7,021
    5 135,219,360 157,459,405 23.4 3,122
    6 24,072,460 98,576,954 58.2 9,533
    6 157,835,791 170,663,725 23.9 2,708
    7 8,500,116 69,908,923 67.2 8,916
    7 107,245,797 131,557,585 20.1 2,543
    7 150,475,620 153,632,140 9.3 602
    8 13,534,834 123,890,637 101.6 13,875
    8 133,192,142 146,241,933 24.0 2,440
    9 14,238,877 17,832,652 7.6 825
    10 1,962,320 66,153,222 76.4 9,962
    10 109,337,592 126,959,951 27.4 3,172
    11 76,140,188 133,849,939 73.8 9,373
    13 98,044,216 114,082,644 37.7 3,411
    15 21,569,096 58,405,096 55.2 5,660
    16 41,263 84,352,773 124.1 12,671
    17 2,692,016 74,731,448 117.8 10,882
    18 7,442,614 64,545,969 75.0 8,447
    18 70,203,043 76,098,043 16.1 1,151
    20 54,719,882 62,349,775 25.1 1,877
    21 13,523,286 28,311,753 31.3 2,583
    22 15,437,138 21,615,252 18.8 986
    Largest segment = 124.1 cM
    Total of segments > 7 cM = 1,312.0 cM
    29 matching segments
    Estimated number of generations to MRCA = 1.7
    On ancestry DNA
    Possible range: 1st – 2nd cousins

    Confidence: Extremely High

    My mom and this man (possible half brother): Shared DNA: 1,254 cM across 46 segments
    His daughter, who may be my mom’s half niece: Shared DNA: 736 cM across 34 segments

  6. Thank you for evaluating this tool and showing us how it has worked for you. I look forward to learning more about it.
