Have you been wondering why all your favorite bloggers are going crazy for automatic clustering? Well, it is a fun visual technique for seeing which matches belong to which family line: you make a chart with your matches across both the top and the side, group them by who matches whom, and then color in those boxes. This creates visual clusters which will roughly correspond to your great grandparents or their parents.
My perfect cousin has many matches on all her great grandparent lines (green is my Munson side) so I used her to showcase the new DNAgedcom clustering above. Notice how similar it is to her cluster from Genetic Affairs shown in my previous blog about that site and tool.
Here are all the new ways to cluster our DNA matches:
- DNAgedcom now has a clustering tool in their client (DGC) which uses your Ancestry match list and ICW files (described in detail in the read more below)
- Genetic Affairs has Ancestry clustering working again
- DNApainter created a tool to create a CSV from the Genetic Affairs html cluster file. Some of us love to use spreadsheets.
- Andy Lee of Family History Fanatics figured out how to take an autosomal match matrix from GEDmatch and cluster it in a spreadsheet program. Click here for that video. The explanation starts just after 42 minutes, and this is really fun!
- Rumor has it that GEDmatch may add automatic clustering sometime in the new year…
All of this is based on the method Dana Leeds developed to organize your matches, which is simple to do. Click here for her blog about that.
Read on for how I used the new DNAgedcom clustering tool for myself and my brother, where I know all our great grandparents.
At my recent GEDmatch talk for i4GG, I warned the crowd that soon Genesis would be the only place at GEDmatch where you could upload new DNA kits. Well that day has actually come! Although your kits will migrate from GEDmatch, you may want to upload to Genesis if you cannot wait to see the comparisons. By the way, your GEDmatch login will work just fine at Genesis. Note that Genesis has the GEDmatch logo with an apple core next to it.
So why do you have to move to GENESIS? The problem is that some companies are using newer chips which test a different set of markers that only partly overlaps the old one (LivingDNA and 23andMe since August 2017). Why, you may ask? Because the new chips test more SNPs and have more non-European ethnic coverage.
So how do you compare apples to oranges? Well, Genesis seems to do a good job of it, and the new one-to-many warns you, by highlighting in red, when there are not enough SNPs in common for confidence in the results. Have a look:
Notice that the last three columns are new. One shows how many SNPs overlap between the kits (in other words, how many SNPs the two sets of test results have in common and so can be compared), the next shows the date compared, and the last lists the company where the test was done. The latter is needed because kits uploaded directly to GENESIS get assigned kit ids that start with a pair of random letters, so the origin cannot be told from the id. Note that migrated kits keep the single letters A, T, M, and H. Also, many recently migrated kits will show an overlap of 0 because the overlap has not yet been computed for them.
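For the spreadsheet lovers who export their match list, here is a minimal Python sketch of how those new columns could be read. It rests on several assumptions that are not from GEDmatch: the kit id patterns (one or two capital letters followed by digits), the example ids, and above all the `LOW_OVERLAP_THRESHOLD` value, since the actual cutoff Genesis uses for its red highlighting is not stated here.

```python
import re

# Hypothetical cutoff: the real number of overlapping SNPs that Genesis
# treats as "too few" is not given in this post.
LOW_OVERLAP_THRESHOLD = 50_000


def kit_origin(kit_id):
    """Guess whether a kit was migrated or uploaded directly to Genesis.

    Assumes migrated kits look like one of A, T, M, H plus digits, and
    direct uploads look like two capital letters plus digits.
    """
    if re.fullmatch(r"[ATMH]\d+", kit_id):
        return "migrated"
    if re.fullmatch(r"[A-Z]{2}\d+", kit_id):
        return "genesis"
    return "unknown"


def overlap_status(overlap):
    """Turn the SNP overlap column into a plain-language flag."""
    if overlap == 0:
        # Recently migrated kits show 0 until the overlap is computed.
        return "not yet compared"
    if overlap < LOW_OVERLAP_THRESHOLD:
        return "low confidence"
    return "ok"


# Made-up rows: (kit id, overlapping SNPs)
for kit_id, overlap in [("A123456", 0), ("QX1234567", 30_000), ("M654321", 400_000)]:
    print(kit_id, kit_origin(kit_id), overlap_status(overlap))
```

Treat the flags as a reading aid only; the red highlighting on the site itself is the real warning.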
You may also notice that many columns are missing, like haplogroups, GEDCOMs, and X matching; nor are the columns sortable. Hopefully these features will be added back soon. The display is more compact, with the confusing clickable L replaced by clicking on a kit number to see its list of one-to-many matches. By the way, the Tier 1 version of the one-to-many looks exactly the same as the one on GEDmatch.
My genealogy groups are buzzing with excitement about a new tool from Genetic Affairs to automate the clustering of your DNA matches. This takes the Leeds method concept to another level.
Everyone is posting pretty cluster pictures like the one below that I made for my perfect cousin, the star of many of my blog posts. This is a table where each DNA match is listed on the top and side; then if they match each other, the box is colored in with the color for that cluster. The chart is sorted by cluster. The idea is that each colored cluster shows descendants from a probable great grandparent couple of yours.
The gray boxes show where people match others outside their cluster. This can often happen when families intermarry more than once, or when a match is a first cousin enough times removed to land in the second or third cousin group by DNA while being related to more than one set of great grandparents.
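To make the chart idea concrete, here is a minimal Python sketch. It is not the algorithm Genetic Affairs or DNAgedcom actually uses; it is a simple greedy grouping in the spirit of the Leeds method, and the match names and shared-match lists are made up. A cell shows the cluster number when two matches in the same cluster match each other, `g` for a gray box (a match across clusters), and `.` for no match.

```python
# Hypothetical shared-match lists: each person maps to the matches
# they also match.
shared = {
    "Ann": {"Bob", "Cal"},
    "Bob": {"Ann", "Cal"},
    "Cal": {"Ann", "Bob", "Dee"},
    "Dee": {"Cal", "Eve"},
    "Eve": {"Dee"},
}


def make_symmetric(shared):
    """Make sure that if A lists B, then B also lists A."""
    sym = {}
    for a, others in shared.items():
        sym.setdefault(a, set())
        for b in others:
            sym[a].add(b)
            sym.setdefault(b, set()).add(a)
    return sym


def leeds_style_clusters(sym):
    """Greedy grouping: the first unassigned match starts a new cluster
    and pulls in all of its shared matches that are still unassigned."""
    cluster_of = {}
    next_id = 0
    for name in sorted(sym):
        if name in cluster_of:
            continue
        cluster_of[name] = next_id
        for other in sorted(sym[name]):
            cluster_of.setdefault(other, next_id)
        next_id += 1
    return cluster_of


def cell(a, b, sym, cluster_of):
    """One box of the chart; the diagonal is left blank in this sketch."""
    if a == b or b not in sym[a]:
        return "."
    if cluster_of[a] == cluster_of[b]:
        return str(cluster_of[a])
    return "g"


sym = make_symmetric(shared)
clusters = leeds_style_clusters(sym)

# Sort the chart by cluster, then by name, as the real charts do.
order = sorted(sym, key=lambda n: (clusters[n], n))
for a in order:
    print(f"{a:>4} " + " ".join(cell(a, b, sym, clusters) for b in order))
```

In this toy data Ann, Bob, and Cal land in one cluster and Dee and Eve in another, while the Cal and Dee box comes out gray because they match each other across the two clusters.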
Automated clustering is useful because it puts your DNA relatives who are related to each other into visual groups so that you can quickly see which line a new match is related on. The picture is pretty but the workhorses are the charts for each cluster shown below that image when you scroll down. Here is the privatized one for my “perfect” cousin showing our MUNSON cluster.
Each name can be clicked to go to that Ancestry match page. Much useful additional information is also shown next to the username: how many cMs are shared, how many matches are shared in the whole group, the cluster number, how many people are in their tree, and the notes you made for that match.
The image and charts are from the HTML file which arrived via email from Genetic Affairs after I requested automated clustering for my cousin’s Ancestry profile, which is shared with me there. You have to save the HTML file to your computer and then click on it to view it. When it first comes up, it is a mish-mosh sorted by name, but then it re-sorts itself by cluster. Fun to watch. Click here for the step by step of how to use this tool from the Intrepid Sleuth. It can also cluster matches from other sites like 23andMe.
I decided to try it on an unknown father case I had not gotten around to working on yet, to see if it would speed up the process, and it did: to under an hour! A new record.