My genealogy groups are buzzing with excitement about a new tool from Genetic Affairs to automate the clustering of your DNA matches. This takes the Leeds method concept to another level.
Everyone is posting pretty cluster pictures like the one below that I made for my perfect cousin, the star of many of my blog posts. This is a table where each DNA match is listed on the top and side; then if they match each other, the box is colored in with the color for that cluster. The chart is sorted by cluster. The idea is that each colored cluster shows descendants from a probable great grandparent couple of yours.
The gray boxes show where people match others outside the cluster, which can often happen when families intermarry more than once, or when they are first cousins enough times removed to have been in the second or third cousin group by DNA but are related to more than one set of great grandparents.
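For readers who like to see the idea in code, here is a minimal sketch of how such a chart could be built from a list of shared matches. It is only an illustration of the concept described above: the greedy, Leeds-style grouping, the function names, and the made-up match names are all my assumptions, not the algorithm Genetic Affairs actually uses.

```python
def cluster_matches(shared):
    """Greedy, Leeds-style grouping (an illustrative assumption, not the tool's algorithm).

    `shared` maps each match to the set of other matches they also match.
    It is assumed to be symmetric and ordered from closest match to most distant.
    """
    cluster_of = {}   # match name -> cluster index
    clusters = []     # list of clusters, each a list of match names
    for name in shared:
        if name in cluster_of:
            continue  # already pulled into an earlier cluster
        # This match seeds a new cluster with its still-unassigned shared matches.
        others = sorted(m for m in shared[name] if m != name and m not in cluster_of)
        members = [name] + others
        for member in members:
            cluster_of[member] = len(clusters)
        clusters.append(members)
    return clusters, cluster_of


def build_grid(shared, clusters, cluster_of):
    """Return the match order and one text row per match, sorted by cluster.

    A cell holds the cluster number when both people match and share a cluster,
    "x" (the gray box) when they match across clusters, and "." otherwise.
    """
    order = [name for cluster in clusters for name in cluster]
    grid = []
    for row in order:
        cells = []
        for col in order:
            if row == col or col in shared[row]:
                same = cluster_of[row] == cluster_of[col]
                cells.append(str(cluster_of[row] + 1) if same else "x")
            else:
                cells.append(".")
        grid.append(" ".join(cells))
    return order, grid


# Hypothetical matches: Cal and Dee match each other but sit in different clusters.
shared = {
    "Ann": {"Bob", "Cal"},
    "Bob": {"Ann", "Cal"},
    "Cal": {"Ann", "Bob", "Dee"},
    "Dee": {"Cal", "Eve"},
    "Eve": {"Dee"},
}
clusters, cluster_of = cluster_matches(shared)
order, grid = build_grid(shared, clusters, cluster_of)
for name, line in zip(order, grid):
    print(f"{name:4} {line}")
```

In this toy example Ann, Bob, and Cal form cluster 1, Dee and Eve form cluster 2, and the Cal/Dee cells come out as "x", the equivalent of a gray box.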
Automated clustering is useful because it puts your DNA relatives who are related to each other into visual groups so that you can quickly see which line a new match is related on. The picture is pretty but the workhorses are the charts for each cluster shown below that image when you scroll down. Here is the privatized one for my “perfect” cousin showing our MUNSON cluster.
Each name can be clicked to go to that Ancestry match page, and much useful additional information is shown next to the username: how many cMs shared, how many matches shared in the whole group, cluster number, how many people in their tree, and the notes you made for that match.
The image and charts are from the HTML file which arrived via email from Genetic Affairs after I requested automated clustering for my cousin’s Ancestry profile, which is shared with me there. You have to save the html file to your computer and then click on it to view it. When it first comes up, it is a mish-mosh sorted by name, but then it resorts itself by cluster. Fun to watch. Click here for the step by step of how to use this tool from the Intrepid Sleuth. It can also cluster matches from other sites like 23andme.
I decided to try it on an unknown father case I had not gotten around to working on yet, to see if it succeeded in speeding up the process and it did, to under an hour! A new record.
To be fair, this was a case where the maternal side was well known and documented and all ancestors have deep American roots. Jack had tested to help his maternal half brother Everett find his Dad but initially had no interest in pursuing his own bio dad connection (all names are changed as always for privacy). Recently he decided that knowing his paternal family medical history might be worthwhile so he gave me the go ahead.
I always knew that Jack’s search would be quick given that on his unknown paternal side he has 4 second cousin level matches and 18 third cousins, many with deep trees. This is way more than the average tester! Here is his automated cluster picture:
The names down the side of the page have been blurred out but I could see them and I was immediately struck by the appearance of a specific surname in half the usernames on that big cluster. I knew I had seen that surname in his best match’s tree, a first to second cousin, let’s call her Kate, who is too close to be included in the clustering.
Next I looked at cluster 2, also paternal, in the charts and saw that many of them had good sized trees. So I clicked the three top matches from the charts with trees and took a look at them. Again, one surname stood out.
Now can I find someone with surname1 marrying surname2? Those would be his paternal grandparents. Since Kate has a huge tree, I decided to search her tree for that second surname. There were only a few and just one was married to a woman with surname1. Plus they are the right ages to be his grandparents. How quick was that!
Next I built him a quick and dirty (Q&D) tree using Kate’s tree, green leaf hints, and those green “Potential Mother/Father” things. I connected his DNA as the son of an unknown son of the presumed grandparents.
While waiting for Ancestry to find those DNA ancestor hints, I decided to spot check if the expected relationships agree with his new tree for the relatives in the two clusters. So I clicked through from the charts to each clustered match with a tree, looked where the common ancestor was, and put in the note for that match what the expected relationship now is. I also checked each one in the DNApainter calculator to see if the theory fit. So far so good.
The next step is to try and figure out who the son(s) of this couple are. Usually that is via an obituary but I did not find one. So I reached out to a close relative on each side to ask and explained why. We will see if they respond. One logged in yesterday and has a huge tree so I am hopeful.
To summarize, the picture is pretty but the charts are an incredible time saver because I can see which matches have good sized trees and can click through to them to test my hypothesis. Not every adoptee will have such robust clusters so as to easily find the surnames of each grandparent, but hopefully some will.
I hope MyHeritage is added soon.
MyHeritage now offers a “new” Cluster tool (Under DNA Tab) on their site.
Thanks for sharing this example, Kitty! I love how the Leeds method has been automated and taken to a whole new level. I’ve been working with it over the past day and a half, but can’t wait to learn more. I’d really like to totally analyze my own matches!
I love seeing AutoCluster in action! Thanks for this very nice blog post Kitty Cooper.
I’ve been using DNA Painter, but this is an amazing tool I’m just now hearing about. Cannot wait to try it out. Thanks Kitty!
I knew I could count on you to describe how it works!
Has anyone run into a firewall trying to add their Ancestry acct?
Wendell is that what the problem is? GA will NOT take my Ancestry login. I messaged them and hopefully will find out what they think the problem is.
Yes TK, I called them and they stated “will not share with 3rd party due to privacy issues”.
Blaine said in his FB group that there appears to be a problem today with 3rd party software and Ancestry logins…
I did not have any problem adding or running the analysis with my Ancestry account.
I have looked through my Ancestry results and they correspond well to what I know about some matches and to the Leeds method.
This Clustering looks and sounds pretty awesome. To get started, should I be reading the manual or is it self explanatory via the site, and where?
There is a manual? LOL. I was excited to try this and just jumped right in. It was intuitive. I had my results within minutes. I will read any documentation available to make sure I maximize my use of the tool, of course, but the ease of use was a highlight of my first experience with this tool.
I’m able to run Auto Cluster for multiple AncestryDNA profiles. I get the emails, save the HTML file to my desktop, open it… and the visual colored cluster is missing in each. All of the other info is there.
I can also run this for my 23&Me accounts but I’m not getting email reports for them. I added and set up both websites and am confident the login info is correct.
Any idea as to what I’m doing wrong? I read the manual, FAQ, and watched a YouTube video.
I am also not able to see any colored blocks in my results. It is empty, but there is info on the sides with cluster numbers etc. Were you able to find out why? I am stumped.
I received an email that referred to attachment but nothing was attached.
Found it. Nice!
Deanna, Jack, et al –
Ask the author of the tool – his email is info at the genetic affairs website.
For those of you not getting the emails, check your spam folder. That’s where some of mine were. Then whitelist the above email or make a filter (what I did with my gmail account).
23andme has new mapping and “DNA sharing” features that seem very promising.
Thank you for all you do. I am requesting some assistance here; I have been working on this for a very long time. Can you tell from this comparison I ran on GEDmatch whether my mom and this man are half siblings (same father, different mothers)?
Chr Start End cM SNPs
1 9,909,594 47,999,722 55.6 5,563
1 88,365,638 110,893,107 20.9 2,901
1 154,834,876 181,424,547 34.5 4,062
2 17,572 32,756,557 54.6 5,523
2 71,074,301 85,220,482 15.7 1,865
2 191,716,994 226,911,444 39.7 4,386
2 235,678,877 242,586,061 14.6 1,210
3 136,120,636 191,278,269 62.3 7,021
5 135,219,360 157,459,405 23.4 3,122
6 24,072,460 98,576,954 58.2 9,533
6 157,835,791 170,663,725 23.9 2,708
7 8,500,116 69,908,923 67.2 8,916
7 107,245,797 131,557,585 20.1 2,543
7 150,475,620 153,632,140 9.3 602
8 13,534,834 123,890,637 101.6 13,875
8 133,192,142 146,241,933 24.0 2,440
9 14,238,877 17,832,652 7.6 825
10 1,962,320 66,153,222 76.4 9,962
10 109,337,592 126,959,951 27.4 3,172
11 76,140,188 133,849,939 73.8 9,373
13 98,044,216 114,082,644 37.7 3,411
15 21,569,096 58,405,096 55.2 5,660
16 41,263 84,352,773 124.1 12,671
17 2,692,016 74,731,448 117.8 10,882
18 7,442,614 64,545,969 75.0 8,447
18 70,203,043 76,098,043 16.1 1,151
20 54,719,882 62,349,775 25.1 1,877
21 13,523,286 28,311,753 31.3 2,583
22 15,437,138 21,615,252 18.8 986
Largest segment = 124.1 cM
Total of segments > 7 cM = 1,312.0 cM
29 matching segments
Estimated number of generations to MRCA = 1.7
On AncestryDNA:
Possible range: 1st – 2nd cousins
Confidence: Extremely High
My mom and this man (possible half brother) share 1,254 cM across 46 segments.
His daughter, who may be my mom’s half niece, shares 736 cM across 34 segments.
With 3 segments over 100 cM and that much shared DNA, they are half siblings, same father. See https://blog.kittycooper.com/2017/09/the-25-relationship-a-first-look-at-the-data/
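If you would rather not count segments by eye, here is a small sketch that tallies a pasted segment list like the one in the question above. The column order (chromosome, start, end, cM, SNPs) is read off that listing, and the function name and the 100 cM cutoff parameter are my own choices for illustration. It only does the arithmetic; it does not decide the relationship for you.

```python
SEGMENTS = """\
1 9,909,594 47,999,722 55.6 5,563
1 88,365,638 110,893,107 20.9 2,901
1 154,834,876 181,424,547 34.5 4,062
2 17,572 32,756,557 54.6 5,523
2 71,074,301 85,220,482 15.7 1,865
2 191,716,994 226,911,444 39.7 4,386
2 235,678,877 242,586,061 14.6 1,210
3 136,120,636 191,278,269 62.3 7,021
5 135,219,360 157,459,405 23.4 3,122
6 24,072,460 98,576,954 58.2 9,533
6 157,835,791 170,663,725 23.9 2,708
7 8,500,116 69,908,923 67.2 8,916
7 107,245,797 131,557,585 20.1 2,543
7 150,475,620 153,632,140 9.3 602
8 13,534,834 123,890,637 101.6 13,875
8 133,192,142 146,241,933 24.0 2,440
9 14,238,877 17,832,652 7.6 825
10 1,962,320 66,153,222 76.4 9,962
10 109,337,592 126,959,951 27.4 3,172
11 76,140,188 133,849,939 73.8 9,373
13 98,044,216 114,082,644 37.7 3,411
15 21,569,096 58,405,096 55.2 5,660
16 41,263 84,352,773 124.1 12,671
17 2,692,016 74,731,448 117.8 10,882
18 7,442,614 64,545,969 75.0 8,447
18 70,203,043 76,098,043 16.1 1,151
20 54,719,882 62,349,775 25.1 1,877
21 13,523,286 28,311,753 31.3 2,583
22 15,437,138 21,615,252 18.8 986
"""


def summarize_segments(text, big_cm=100.0):
    """Tally a whitespace-separated segment list: chr, start, end, cM, SNPs."""
    cms = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 5:
            continue  # skip blank lines or anything that is not a segment row
        cms.append(float(fields[3]))  # fourth column is the segment length in cM
    return {
        "segments": len(cms),
        "total_cm": round(sum(cms), 1),
        "largest_cm": max(cms),
        "over_big_cm": sum(1 for cm in cms if cm > big_cm),
    }


summary = summarize_segments(SEGMENTS)
print(summary)
```

Because the listed segment lengths are already rounded to one decimal, the total computed this way can differ by a tenth or so from the total the site reports.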
Thank you for evaluating this tool and showing us how it has worked for you. I look forward to learning more about it.
The DNAGedcom version works really well.
That uses the DNAGedcom Client (paid) to extract the ICW info without causing Ancestry to crash.
Really excellent for people starting out.
Good for some of my problems and so much easier than doing it by hand for the last 3 months!
Also some rather strange assignments – not a fault of the tool; just due to the random way DNA works. As long as you don’t have too many people in your comparison, try reducing cM by 5 (or 10 if very few) and compare again.
I am not getting the option to add Ancestry; it is completely missing from the choice of websites to add. Has it been removed, or is it just me?
Ancestry is back as of today
How do I print the page “Auto Cluster Match Information”?
Sorry Louise, I missed this comment. As you look at it in a web browser, you would use the print option in that browser. If you don’t know how to print from a browser, ask Google!
If I only want to research one branch can I make some criteria for that? I will get thousands of matches for my father’s side. I need my mother’s grandfather.
Sorry Toni, I missed this comment. No way to separate. If you know one match that would be in that group, you can search for it on the page to find that cluster.
A strategy is to get as many other descendants of your target person to test and then work through their common matches looking for clues.
MAJOR CONCERN – Delete Icon Doesn’t Work.
I tried Genetic Affairs back on Dec 20th, 2018. I only wanted a trial, and planned to delete everything afterwards. I went in on Dec. 21st, 2018 and clicked the delete icon beside each kit and beside my Ancestry profile. I found that clicking the delete icon did nothing, even after refreshing, each kit and my profile were still listed at Genetic Affairs. I’ve tried a few more times since then, most recently on Jan 16th.
I’ve posted to their Facebook group twice now, once on Dec 21st and once on Jan. 16th. I’m seeing that while autocluster questions will be answered, they will not respond with why the delete icon isn’t working or when it will be working.
I’m lucky in that I used a 2nd Ancestry account with only contributor level access, so I’ve been able to remove Genetic Affairs access by taking away access to my kits from this 2nd Ancestry account, and changing the 2nd account’s password at Ancestry. However, this makes me worried for those who have given Genetic Affairs access to their 23andMe or FTDNA results.
Is there a chance you could get a response from Genetic Affairs about why the delete icon isn’t working and when it will be?
Facebook is the wrong place to report a problem like this as it may get lost in the multitude. An FB group is more about users helping each other.
If you had sent an email to info at that website, you would have had a prompt response from Jan. I did forward your complaint to him, and presumably he has been in touch and taken care of it.
I just ran three clusterings from Ancestry and one from FTDNA. I received the colored clustering chart and two Excel spreadsheets. I did not receive the Auto Cluster Match Information chart shown at the top of this blog. Any help would be greatly appreciated.
The chart is on the HTML page, you have to scroll down. DNApainter has a tool to extract it to a spreadsheet
Thanks for the explanation Kitty. I hope this will help with my two biggest brick walls.
I received the email much faster than expected. In it I received a zipped csv file, which opened okay. There were two other untitled .dat files. No 2nd csv and no html, as described in the email, and no indication that clustering didn’t happen. What am I doing wrong?
Your email client apparently thought it was a good idea to rename the zip files to dat files. No worries, just download the dat files, rename them to something that ends with .zip and you’ll be able to extract them and view the HTML. We also have a nice user group on Facebook that could be interesting: https://www.facebook.com/groups/319181318684957 Good luck!
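For anyone comfortable with a little scripting, the rename-and-extract step above can be sketched in Python using only the standard library. The function name and the folder names are hypothetical; the sketch copies the .dat file to a .zip name rather than renaming it, so the original download is left untouched.

```python
import shutil
import zipfile
from pathlib import Path


def extract_renamed_attachment(dat_path, out_dir):
    """Copy a mislabeled .dat attachment to .zip, extract it, and list the HTML files."""
    dat_path = Path(dat_path)
    out_dir = Path(out_dir)
    if not zipfile.is_zipfile(dat_path):
        # Not every .dat file is a zip in disguise, so check before extracting.
        raise ValueError(f"{dat_path} does not look like a zip archive")
    zip_path = dat_path.with_suffix(".zip")
    shutil.copyfile(dat_path, zip_path)  # keep the original download as it was
    out_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(out_dir)
    return sorted(out_dir.rglob("*.html"))
```

Calling something like `extract_renamed_attachment("Downloads/attachment.dat", "Downloads/autocluster")` (both paths are placeholders) returns the extracted HTML files, which you can then open in your browser as usual.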
Are there any other resources to help with understanding clustering? I did mine and the algorithm gave seven clusters based off my highest matches on Ancestry. Does this mean NPEs in the tree?
Jane, it just means that those matches of yours share a common ancestor with you and each other. Possibly a great grandparent or a great great. Look at their trees for common ancestors with each other and you. If there are groups that have the same common ancestors with each other but not you, then there may be something wrong on your tree, like an NPE.
To diagnose an NPE you need to look at your tree and see if you have matches for each great grandparent (ThruLines can help with that). There is of course the possibility that you or a parent are adopted and did not know … try printing out your pedigree and starring the ancestors you have matches for as per Michele’s tips.
I tried this after using the Leeds Method myself. How does one interpret results with more than 8 clusters? One of my DNA kits resulted in 16 distinct clusters on Genetic Affairs.
You can change the clustering parameters to try and get fewer clusters. Those extra clusters may represent gg-grandparents or ancestors further back.
Those 16 could be your different great great grandparent lines or even from further back ancestors. Experiment with different parameters for the clustering to see what you get.
I have been using GEDmatch which is great.
But I am new to clustering.
Using Genetic Affairs I get an email with an attachment. When I open it, it is in computer language.
I do not know how to work with Google Docs, which is where it is landing.
I have not been able to open a chart with the graphics.
What programme do I need to open HTML, GED, etc. and turn it into a chart?
DNA Detective in training
Janet Tillman nee Stephens.
All you need is a browser like Chrome to look at the HTML file once it is UNZIPPED. If you have a PC then download the file and do an extract all. Then click on the html file name and it should appear in your default browser.
I am using MyHeritage autoclustering tools and so far I only have 64 matches on it – got about 700 in total so most are not on the list.
The tool gives me 13 clusters, so about 5 people listed per each.
The range of these matches goes from 71 cM shared dna to 15 cM shared dna. My top MyHeritage match sharing 82 cM is unfortunately not listed in any cluster.
I do not think I found any common surname from those who provided trees for any given cluster.
Are these 13 clusters extremely faraway ancestors, say beyond gggg’s? Also, at least 5 of these 13 show predominantly Finnish names, are these 5 suggesting pileup regions with endogamy? I never heard of these lines in my family, though I have over 100 matches from there alone – a second country after the U.S. in total number of matches. No Finnish or Scandinavian in my estimate.
In general, do fewer matches in a cluster suggest closer common ancestors? The cluster with the 71 cM match has 3 people, though there are 4 clusters having only 3 matches, with one of these 4 showing the top match sharing only 19 cM in total.
Any clues would be greatly appreciated!