Automatic Clustering from Genetic Affairs

My genealogy groups are buzzing with excitement about a new tool from Genetic Affairs to automate the clustering of your DNA matches. This takes the Leeds method concept to another level.

Everyone is posting pretty cluster pictures like the one below, which I made for my perfect cousin, the star of many of my blog posts. It is a grid in which every DNA match is listed both across the top and down the side. Where two matches also match each other, the box where they meet is colored with the color of their cluster. The chart is sorted by cluster. The idea is that each colored cluster shows the descendants of what is probably one of your great grandparent couples.

The gray boxes show where people match others outside their cluster. This often happens when families intermarried more than once. It also happens when a match is a first cousin removed enough generations to fall in the second or third cousin range by DNA, and so is related to you through more than one set of great grandparents.
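For the technically curious, here is a minimal Python sketch of the idea behind the chart. It assumes you already know, for each pair of matches, whether they appear in each other’s shared matches. It groups them with a simple average-linkage rule at a hypothetical 0.5 threshold. This is not Genetic Affairs’ actual algorithm, and the match names are invented. It only shows how shared-match pairs become colored clusters and gray cross-cluster boxes.

```python
from itertools import combinations


def average_link(a, b, shared):
    """Fraction of cross pairs (x from a, y from b) that share each other as matches."""
    links = sum(1 for x in a for y in b if frozenset((x, y)) in shared)
    return links / (len(a) * len(b))


def cluster_matches(matches, shared, threshold=0.5):
    """Repeatedly merge the two most-linked groups until no pair reaches the threshold.

    matches   -- list of match names (hypothetical data)
    shared    -- set of frozenset({x, y}) pairs that appear in each other's shared matches
    threshold -- assumed cut-off; raising it leaves more, tighter clusters
    """
    clusters = [[m] for m in matches]
    while True:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            score = average_link(clusters[i], clusters[j], shared)
            if score >= threshold and (best is None or score > best[0]):
                best = (score, i, j)
        if best is None:
            return clusters
        _, i, j = best
        clusters[i] += clusters[j]  # i < j, so deleting j leaves i in place
        del clusters[j]


def render_grid(clusters, shared):
    """Text stand-in for the chart: cluster number = same cluster, g = gray box, . = no match."""
    order = [m for c in clusters for m in c]  # sorted by cluster, like the picture
    cluster_of = {m: n for n, c in enumerate(clusters, start=1) for m in c}
    lines = []
    for x in order:
        cells = []
        for y in order:
            if x == y or frozenset((x, y)) in shared:
                same = cluster_of[x] == cluster_of[y]
                cells.append(str(cluster_of[x]) if same else "g")
            else:
                cells.append(".")
        lines.append(f"{x:>6} " + " ".join(cells))
    return "\n".join(lines)


# Hypothetical example: two families plus one cross-cluster match (Cal and Dee).
MATCHES = ["Ann", "Bob", "Cal", "Dee", "Eve", "Fay"]
SHARED = {frozenset(p) for p in [("Ann", "Bob"), ("Ann", "Cal"), ("Bob", "Cal"),
                                 ("Dee", "Eve"), ("Dee", "Fay"), ("Eve", "Fay"),
                                 ("Cal", "Dee")]}

if __name__ == "__main__":
    groups = cluster_matches(MATCHES, SHARED)
    print(render_grid(groups, SHARED))
```

Running it prints a text grid in which Cal’s row shows a g under Dee. Cal and Dee match each other but sit in different clusters, just like a gray box in the picture. Matches that link to nobody would simply stay as one-person groups in this sketch.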

Automated clustering is useful because it puts your DNA relatives who are related to each other into visual groups, so you can quickly see which line a new match belongs to. The picture is pretty, but the workhorses are the per-cluster charts that appear below the image when you scroll down. Here is the privatized one for my “perfect” cousin showing our MUNSON cluster.

Each name links to that match’s Ancestry page. Much useful additional information is shown next to the username: how many cMs are shared, how many matches they share in the whole group, their cluster number, how many people are in their tree, and the notes you made for that match.
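Those chart columns are what make the triage quick. Here is a sketch that assumes you copied a cluster’s rows into Python by hand. The field names and rows below are hypothetical, not an export format Genetic Affairs provides. It picks the highest-cM matches that actually have trees:

```python
from dataclasses import dataclass


@dataclass
class ChartRow:
    """One row of a per-cluster chart; field names are hypothetical."""
    username: str
    shared_cm: float
    shared_matches: int  # matches in the whole group shared with this person
    cluster: int
    tree_size: int       # people in their tree; 0 means no tree
    notes: str = ""


def top_matches_with_trees(rows, cluster, n=3, min_tree_size=1):
    """Return the n highest-cM matches in one cluster whose trees have at least min_tree_size people."""
    with_trees = [r for r in rows if r.cluster == cluster and r.tree_size >= min_tree_size]
    return sorted(with_trees, key=lambda r: r.shared_cm, reverse=True)[:n]


# Invented example rows, not real matches.
ROWS = [
    ChartRow("match_a", 150.0, 12, 2, 500),
    ChartRow("match_b", 210.0, 9, 2, 0),
    ChartRow("match_c", 95.0, 7, 2, 1200),
    ChartRow("match_d", 130.0, 5, 2, 40),
    ChartRow("match_e", 300.0, 15, 1, 800),
    ChartRow("match_f", 60.0, 4, 2, 300),
]

if __name__ == "__main__":
    for row in top_matches_with_trees(ROWS, cluster=2):
        print(row.username, row.shared_cm, row.tree_size)
```

Raising min_tree_size is one way to express “good-sized trees.” Where to set it is a judgment call.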

The image and charts come from the HTML file that arrived by email from Genetic Affairs after I requested automated clustering for my cousin’s Ancestry profile, which is shared with me there. You have to save the HTML file to your computer and then click on it to view it. When it first opens, it is a mishmash sorted by name, but then it re-sorts itself by cluster. Fun to watch. The Intrepid Sleuth has a step-by-step guide to using this tool. It can also cluster matches from other sites like 23andMe.
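You may prefer to script the unzip-and-open step; one commenter below notes the attachment can arrive zipped. Here is a small Python sketch using the standard zipfile and webbrowser modules. The file names are hypothetical, and double-clicking the file works just as well.

```python
import webbrowser
import zipfile
from pathlib import Path


def extract_html_reports(zip_path, dest_dir):
    """Unzip a saved report and return the HTML files it contained, sorted by path."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    return sorted(dest.rglob("*.html"))


def open_in_browser(html_path):
    """Open a local HTML file in the default browser, the same as double-clicking it."""
    return webbrowser.open(Path(html_path).resolve().as_uri())
```

For example, paths = extract_html_reports("autocluster.zip", "autocluster") followed by open_in_browser(paths[0]) would show the first report.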

I decided to try it on an unknown father case I had not gotten around to working on yet, to see whether it would speed up the process. It did: under an hour! A new record.

To be fair, this was a case where the maternal side was well known and documented, and all the ancestors have deep American roots. Jack had tested to help his maternal half brother Everett find his dad, but initially had no interest in pursuing his own bio dad connection (all names are changed, as always, for privacy). Recently he decided that knowing his paternal family medical history might be worthwhile, so he gave me the go-ahead.

I always knew that Jack’s search would be quick, given that on his unknown paternal side he has 4 second cousin level matches and 18 third cousin level matches, many with deep trees. That is way more than the average tester! Here is his automated cluster picture:

The names down the side of the page have been blurred out, but I could see them. I was immediately struck by one specific surname appearing in half the usernames in that big cluster. I knew I had seen that surname in the tree of his best match, a first to second cousin; let’s call her Kate. She is too close to be included in the clustering.

Next I looked at cluster 2, also paternal, in the charts and saw that many of its members had good-sized trees. So I clicked through to the three top matches with trees and took a look at them. Again, one surname stood out.
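Spotting the surname that “stands out” is a tally at heart. Here is a minimal sketch, assuming you have typed the surnames from each top match’s tree into a list yourself. The usernames and surnames are invented.

```python
from collections import Counter


def surname_counts(trees):
    """Count how many trees each surname appears in (each tree counts once per surname).

    trees -- dict mapping a match's username to the surnames found in that tree
    """
    counts = Counter()
    for surnames in trees.values():
        counts.update({s.strip().upper() for s in surnames})  # normalise case, dedupe per tree
    return counts


# Invented example: surnames transcribed from three matches' trees.
TREES = {
    "match_a": ["Porter", "Hale", "Brandt"],
    "match_d": ["hale", "Lund"],
    "match_c": ["Hale", "Porter", "Voss", "Hale"],
}

if __name__ == "__main__":
    print(surname_counts(TREES).most_common(3))
```

Each tree counts a surname only once. That way one very large tree full of the same name cannot dominate the tally.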

Now, could I find someone with surname1 marrying someone with surname2? That couple would be his paternal grandparents. Since Kate has a huge tree, I searched it for the second surname. There were only a few people with it, and just one was married to a woman with surname1. They were also the right ages to be his grandparents. How quick was that!
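The search of Kate’s tree amounts to a filter over its marriages. Here is a sketch, assuming the couples have been listed by hand. The people, the surnames, and the 1880 to 1915 birth window are all hypothetical, since what counts as “the right ages” depends on the tester’s own birth year.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    given: str
    surname: str                      # birth surname
    birth_year: Optional[int] = None  # None when the tree gives no date


def candidate_grandparents(couples, husband_surname, wife_surname, born_between):
    """Couples where the husband carries one surname and the wife the other,
    and any known birth years fall inside an assumed window."""
    lo, hi = born_between
    hits = []
    for husband, wife in couples:
        if husband.surname.upper() != husband_surname.upper():
            continue
        if wife.surname.upper() != wife_surname.upper():
            continue
        years = [p.birth_year for p in (husband, wife) if p.birth_year is not None]
        if all(lo <= y <= hi for y in years):  # unknown years are not ruled out
            hits.append((husband, wife))
    return hits


# Invented couples standing in for the marriages in a large tree.
COUPLES = [
    (Person("John", "Porter", 1890), Person("Mary", "Hale", 1893)),
    (Person("Adam", "Porter", 1820), Person("Ruth", "Hale", 1825)),
    (Person("Ed", "Porter", 1901), Person("Ann", "Lund", 1903)),
]

if __name__ == "__main__":
    for h, w in candidate_grandparents(COUPLES, "Porter", "Hale", (1880, 1915)):
        print(h.given, h.surname, "m.", w.given, w.surname)
```

Unknown birth years are deliberately not ruled out, since many trees leave them blank. Any such hits still need a manual look.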

Next I built him a quick and dirty (Q&D) tree using Kate’s tree, green leaf hints, and those green “Potential Mother/Father” suggestions. Then I connected his DNA to that tree as the son of an unknown son of the presumed grandparents.

While waiting for Ancestry to find those DNA ancestor hints, I spot-checked whether the expected relationships to the relatives in the two clusters agree with his new tree. I clicked through from the charts to each clustered match with a tree, looked at where the common ancestor was, and put the expected relationship in the note for that match. I also checked each one in the DNA Painter calculator to see whether the shared cM fit the theory. So far so good.
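The expected relationship written into each note follows a standard rule. Count the generations from the common ancestral couple down to each person. The closer person’s count, minus one, gives the cousin degree, and the difference between the two counts gives how many times removed. Siblings and aunt/uncle lines are special-cased. Here is a sketch, assuming both people descend from the same couple; it handles full relationships only, not half relationships.

```python
def ordinal(n):
    """1 -> '1st', 2 -> '2nd', 11 -> '11th', 23 -> '23rd'."""
    if 11 <= n % 100 <= 13:
        return f"{n}th"
    return f"{n}" + {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")


def expected_relationship(gen_a, gen_b):
    """Name the relationship between two descendants of the same ancestral couple.

    gen_a, gen_b -- generations below the couple (child = 1, grandchild = 2, ...).
    Assumes full relationships through one couple; half relationships are not modelled.
    """
    if gen_a < 1 or gen_b < 1:
        raise ValueError("both people must be descendants of the couple")
    closer, removed = min(gen_a, gen_b), abs(gen_a - gen_b)
    if closer == 1:
        if removed == 0:
            return "siblings"
        great = "great-" * (removed - 1)
        return f"{great}aunt/uncle or {great}niece/nephew"
    name = f"{ordinal(closer - 1)} cousin"
    if removed:
        times = {1: "once", 2: "twice"}.get(removed, f"{removed} times")
        name += f" {times} removed"
    return name
```

Take the presumed grandparents as the couple. Jack is then generation 2, so a hypothetical match who is their great-grandchild (generation 3) would be expected to be a 1st cousin once removed. The shared cM check against the DNA Painter calculator remains a separate step.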

The next step is to figure out who the son(s) of this couple are. Usually an obituary answers that, but I did not find one. So I reached out to a close relative on each side, explained why, and asked. We will see if they respond. One logged in yesterday and has a huge tree, so I am hopeful.

To summarize, the picture is pretty, but the charts are an incredible time saver because I can see which matches have good-sized trees and click through to them to test my hypothesis. Not every adoptee will have clusters robust enough to reveal the grandparents’ surnames this easily, but hopefully some will.

44 thoughts on “Automatic Clustering from Genetic Affairs”

  1. Thanks for sharing this example, Kitty! I love how the Leeds method has been automated and taken to a whole new level. I’ve been working with it over the past day and a half, but can’t wait to learn more. I’d really like to totally analyze my own matches!

    • Wendell, is that what the problem is? GA will NOT take my Ancestry login. I messaged them and hopefully will find out what they think the problem is.

    • I did not have any problem adding or running the analysis with my Ancestry account.

      I have looked through my Ancestry results and they correspond well to what I know about some matches and to the Leeds method.

  2. This clustering looks and sounds pretty awesome. To get started, should I read the manual (and where is it), or is the site self-explanatory?

    • There is a manual? LOL. I was excited to try this and just jumped right in. It was intuitive. I had my results within minutes. I will read any documentation available to make sure I maximize my use of the tool, of course, but the ease of use was a highlight of my first experience with this tool.

  3. I’m able to run Auto Cluster for multiple AncestryDNA profiles. I get the emails, save the HTML file to my desktop, open it… and the visual colored cluster is missing in each. All of the other info is there.

    I can also run this for my 23andMe accounts, but I’m not getting email reports for them. I added and set up both websites and am confident the login info is correct.

    Any idea as to what I’m doing wrong? I read the manual and the FAQ and watched a YouTube video.

    • I am also not able to see any colored blocks in my results. It is empty, but there is info on the sides with cluster numbers etc. Were you able to find out why? I am stumped.

  4. Deanna, Jack, et al –
    Ask the author of the tool – his email is info at the genetic affairs website.

    For those of you not getting the emails, check your spam folder. That’s where some of mine were. Then whitelist the above email or make a filter (what I did with my gmail account).

  5. Thank you for all you do. I am requesting some assistance here; I have been working on this for a very long time. Can you tell from this comparison I ran at GEDmatch whether my mom and this man are half siblings (same father, different mother)?

    Chr  Start        End          cM     SNPs
    1    9,909,594    47,999,722   55.6   5,563
    1    88,365,638   110,893,107  20.9   2,901
    1    154,834,876  181,424,547  34.5   4,062
    2    17,572       32,756,557   54.6   5,523
    2    71,074,301   85,220,482   15.7   1,865
    2    191,716,994  226,911,444  39.7   4,386
    2    235,678,877  242,586,061  14.6   1,210
    3    136,120,636  191,278,269  62.3   7,021
    5    135,219,360  157,459,405  23.4   3,122
    6    24,072,460   98,576,954   58.2   9,533
    6    157,835,791  170,663,725  23.9   2,708
    7    8,500,116    69,908,923   67.2   8,916
    7    107,245,797  131,557,585  20.1   2,543
    7    150,475,620  153,632,140  9.3    602
    8    13,534,834   123,890,637  101.6  13,875
    8    133,192,142  146,241,933  24.0   2,440
    9    14,238,877   17,832,652   7.6    825
    10   1,962,320    66,153,222   76.4   9,962
    10   109,337,592  126,959,951  27.4   3,172
    11   76,140,188   133,849,939  73.8   9,373
    13   98,044,216   114,082,644  37.7   3,411
    15   21,569,096   58,405,096   55.2   5,660
    16   41,263       84,352,773   124.1  12,671
    17   2,692,016    74,731,448   117.8  10,882
    18   7,442,614    64,545,969   75.0   8,447
    18   70,203,043   76,098,043   16.1   1,151
    20   54,719,882   62,349,775   25.1   1,877
    21   13,523,286   28,311,753   31.3   2,583
    22   15,437,138   21,615,252   18.8   986
    Largest segment = 124.1 cM
    Total of segments > 7 cM = 1,312.0 cM
    29 matching segments
    Estimated number of generations to MRCA = 1.7
    On AncestryDNA:
    Possible range: 1st – 2nd cousins

    Confidence: Extremely High

    My mom and this man (possible half brother) share 1,254 cM across 46 segments.
    His daughter, who may be my mom’s half niece, shares 736 cM across 34 segments.

  6. Thank you for evaluating this tool and showing us how it has worked for you. I look forward to learning more about it.

  7. The DNAGedcom version works really well.
    That uses the DNAGedcom Client (paid) to extract the ICW info without causing Ancestry to crash.
    Really excellent for people starting out.
    Good for some of my problems and so much easier than doing it by hand for the last 3 months!
    Also some rather strange assignments – not a fault of the tool; just due to the random way DNA works. As long as you don’t have too many people in your comparison, try lowering the cM threshold by 5 (or by 10 if you have very few matches) and compare again.

  8. I am not getting the option to add Ancestry; it is completely missing from the choice of websites to add. Has it been removed, or is it just me?

    • Sorry Louise, I missed this comment. As you look at it in a web browser, you would use the print option in that browser. If you don’t know how to print from a browser, ask Google!

  9. If I only want to research one branch, can I set some criteria for that? I will get thousands of matches on my father’s side. I need my mother’s grandfather.

    • Sorry Toni, I missed this comment. No way to separate. If you know one match that would be in that group, you can search for it on the page to find that cluster.
      A strategy is to get as many other descendants of your target person to test and then work through their common matches looking for clues.

  10. MAJOR CONCERN – Delete Icon Doesn’t Work.

    Hello Kitty,
    I tried Genetic Affairs back on Dec 20th, 2018. I only wanted a trial, and planned to delete everything afterwards. On Dec. 21st, 2018 I went in and clicked the delete icon beside each kit and beside my Ancestry profile. Clicking the delete icon did nothing; even after refreshing, each kit and my profile were still listed at Genetic Affairs. I’ve tried a few more times since then, most recently on Jan 16th.

    I’ve posted to their Facebook group twice now, once on Dec 21st and once on Jan. 16th. I’m seeing that while autocluster questions will be answered, they will not respond with why the delete icon isn’t working or when it will be working.

    I’m lucky in that I used a 2nd Ancestry account with only contributor level access, so I’ve been able to remove Genetic Affairs access by taking away access to my kits from this 2nd Ancestry account and changing the 2nd account’s password at Ancestry. However, this makes me worried for those who have given Genetic Affairs access to their 23andMe or FTDNA results.

    Is there a chance you could get a response from Genetic Affairs about why the delete icon isn’t working and when it will be?

    • JA,
      Facebook is the wrong place to report a problem like this as it may get lost in the multitude. An FB group is more about users helping each other.
      If you had sent an email to info at that website, you would have had a prompt response from Jan. I did forward your complaint to him, and presumably he has been in touch and taken care of it.

  11. I just ran three clusterings from Ancestry and one from FTDNA. I received the colored clustering chart and two Excel spreadsheets, but not the Auto Cluster Match Information chart shown at the top of this blog post. Any help would be greatly appreciated.

  12. I received the email much faster than expected. In it I received a zipped csv file, which opened okay. There were two other untitled .dat files. No 2nd csv and no html, as described in the email, and no indication that clustering didn’t happen. What am I doing wrong?

  13. Are there any other resources to help with understanding clustering? I did mine and the algorithm gave seven clusters based on my highest matches on Ancestry. Does this mean NPEs in the tree?

    • Jane, it just means that those matches of yours share a common ancestor with you and each other. Possibly a great grandparent or a great great. Look at their trees for common ancestors with each other and you. If there are groups that have the same common ancestors with each other but not you, then there may be something wrong on your tree, like an NPE.

      To diagnose an NPE, you need to look at your tree and see whether you have matches for each great grandparent (ThruLines can help with that). There is, of course, the possibility that you or a parent were adopted and did not know … try printing out your pedigree and starring the ancestors you have matches for, as per Michele’s tips.

  14. I tried this after using the Leeds Method myself. How does one interpret results with more than 8 clusters? One of my DNA kits resulted in 16 distinct clusters on Genetic Affairs.

    • You can change the clustering parameters to try to get fewer clusters. Those extra clusters may represent gg-grandparents or ancestors further back.

    • Those 16 could be your different great great grandparent lines or even from further back ancestors. Experiment with different parameters for the clustering to see what you get.

  15. Hi
    I have been using GEDmatch which is great.
    But I am new to clustering.
    Using Genetic Affairs I get an email with an attachment. When I open it, it is in computer language.
    I do not know how to work with Google Docs, which is where it is landing.
    I have not been able to open a chart with the graphics.
    What programme do I need to open HTML ged etc to turn into a chart?

    DNA Detective in training
    Janet Tillman nee Stephens.

    • Janet –
      All you need is a browser like Chrome to look at the HTML file once it is UNZIPPED. If you have a PC, download the file and do an Extract All. Then click on the HTML file name and it should appear in your default browser.

  16. Hi Kitty,

    I am using the MyHeritage autoclustering tool, and so far only 64 of my matches are in it. I have about 700 in total, so most are not on the list.

    The tool gives me 13 clusters, so about 5 people in each.

    The range of these matches goes from 71 cM shared dna to 15 cM shared dna. My top MyHeritage match sharing 82 cM is unfortunately not listed in any cluster.

    I do not think I found any common surname from those who provided trees for any given cluster.

    Are these 13 clusters extremely faraway ancestors, say beyond gggg’s? Also, at least 5 of these 13 show predominantly Finnish names, are these 5 suggesting pileup regions with endogamy? I never heard of these lines in my family, though I have over 100 matches from there alone – a second country after the U.S. in total number of matches. No Finnish or Scandinavian in my estimate.

    In general, fewer matches in a cluster suggest closer common ancestors? The 71 cM match has 3 people, though there are 4 clusters having only 3 matches, with one of these 4 showing the top match sharing only 19 cM in total.

    Any clues would be greatly appreciated!
