Have you been wondering why all your favorite bloggers are going crazy for automatic clustering? It is a fun visual technique for seeing which matches belong to which family line: you make a chart with your matches across both the top and the side, group them by who matches whom, and then color in the boxes. This creates visual clusters that roughly correspond to your great grandparents or their parents.
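For the technically curious, here is a minimal Python sketch of the idea. It assumes you already have a list of match names and a list of pairs of matches who appear in each other's shared-match (ICW) lists. It is not the exact algorithm DNAgedcom or Genetic Affairs uses; it simply groups anyone linked through shared matches and then lays out the top-and-side chart with each group kept together. The function names and sample names are made up for illustration.

```python
def group_matches(matches, icw_pairs):
    """Group matches that are linked, directly or through others, by shared matches.

    matches:   list of match names
    icw_pairs: (a, b) tuples meaning a and b appear in each other's ICW list
    Returns a list of groups, each a sorted list of names.
    """
    wanted = set(matches)
    neighbors = {name: set() for name in matches}
    for a, b in icw_pairs:
        if a in wanted and b in wanted:  # ignore people who are not on the chart
            neighbors[a].add(b)
            neighbors[b].add(a)

    seen, groups = set(), []
    for start in matches:
        if start in seen:
            continue
        # Collect everyone reachable from this match through shared matches.
        stack, group = [start], []
        seen.add(start)
        while stack:
            person = stack.pop()
            group.append(person)
            for other in neighbors[person]:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        groups.append(sorted(group))
    return groups


def chart_rows(groups, icw_pairs):
    """Build the chart: same names down the side and across the top, X where two match."""
    order = [name for group in groups for name in group]
    linked = {frozenset(pair) for pair in icw_pairs}
    rows = []
    for row in order:
        cells = "".join("X" if frozenset((row, col)) in linked else "." for col in order)
        rows.append(f"{row:>6} {cells}")
    return rows


matches = ["Ann", "Bob", "Cal", "Dee", "Eve"]
pairs = [("Ann", "Cal"), ("Cal", "Eve"), ("Bob", "Dee")]
groups = group_matches(matches, pairs)
for line in chart_rows(groups, pairs):
    print(line)
```

Because each group's names are listed next to each other, the X's bunch up into blocks along the diagonal, which is what the colored squares in the real charts show. With real data, a few close relatives who share DNA with several lines can chain everything into one big group, which is one reason the tools let you limit the cM range.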
My perfect cousin has many matches on all her great grandparent lines (green is my Munson side), so I used her to showcase the new DNAgedcom clustering above. Notice how similar it is to her cluster from Genetic Affairs, shown in my previous blog post about that site and tool.
Here are all the new ways to cluster our DNA matches:
- DNAgedcom now has a clustering tool in its client (DGC) that uses your Ancestry match list and ICW (in common with) files (described in detail below)
- Genetic Affairs has Ancestry clustering working again
- DNApainter created a tool that makes a CSV from the Genetic Affairs HTML cluster file. Some of us love to use spreadsheets.
- Andy Lee of Family History Fanatics figured out how to take an autosomal match matrix from GEDmatch and cluster it in a spreadsheet program. In his video the explanation starts just after the 42-minute mark, and it is really fun!
- Rumor has it that GEDmatch may add automatic clustering sometime in the new year…
All of this is based on the method Dana Leeds developed for organizing your matches, which is simple and easy to do by hand. Her blog explains how it works.
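As I understand the Leeds Method, you list your matches in roughly the second-to-third-cousin range from highest to lowest cM, give the first one a color, give all of that match's shared matches the same color, and then move on to the next match that has no color yet. Here is a rough Python sketch of that bookkeeping. The 90 to 400 cM range is my assumption about her usual settings, so check her blog for the exact steps; the function name and data layout are hypothetical.

```python
def leeds_colors(matches, shared, low_cm=90, high_cm=400):
    """Assign Leeds-style color numbers to mid-range matches.

    matches: list of (name, cM) tuples
    shared:  dict mapping a match name to the set of names in its shared-match list
    Returns a dict name -> list of color numbers (some matches get more than one).
    """
    # Work from the highest cM down, keeping only the cousin range of interest.
    ranked = sorted(matches, key=lambda m: m[1], reverse=True)
    candidates = [name for name, cm in ranked if low_cm <= cm <= high_cm]
    colors = {name: [] for name in candidates}
    next_color = 1
    for name in candidates:
        if colors[name]:
            continue  # already colored by an earlier match's column
        colors[name].append(next_color)
        for other in shared.get(name, set()):
            if other in colors and other != name:
                colors[other].append(next_color)
        next_color += 1
    return colors


matches = [("Ann", 210), ("Bob", 150), ("Cal", 120), ("Dee", 95), ("Eve", 40)]
shared = {"Ann": {"Cal", "Eve"}, "Bob": {"Dee", "Cal"}, "Cal": {"Ann", "Bob"}, "Dee": {"Bob"}}
print(leeds_colors(matches, shared))
```

In this made-up example Cal ends up with two colors, the by-hand equivalent of a match who sits in more than one cluster, and Eve is left out because 40 cM is below the assumed range.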
Read on to see how I used the new DNAgedcom clustering tool for myself and my brother, since we know all of our great grandparents.
One problem for me is that our maternal side is half German Jewish and half Bavarian German. There are almost no Bavarian matches, and Ancestry removes the population-specific Jewish DNA. Since grandad was also an only child, we have very few matches on that side; I mainly expect to see Norwegian Americans from our Munson, Halling, Skjold, and Wold lines.
Here is the how-to.
First, you need a paid membership, since clustering is a feature of the subscription-only DNAgedcom client (DGC), which you download from the subscribers' page and run on your PC or Mac. Look at all the new options it has! It can even collect from MyHeritage now.
Next, click on the company you want to collect your data from. Today’s post only demos the Ancestry files, since that is where most of my family matches are. Enter your Ancestry username and password in the small window that comes up when you click on Ancestry. Before you start collecting, I suggest checking the box to Skip Distant Cousin Matches. You can also set the Minimum cM to 20, or even 30 if you are from an endogamous group. For clustering you only need to gather matches and ICW, so I underlined those in green along with the selections I suggest.
If more than one person's DNA is shared with you or in your account, select the person whose matches you want to cluster, then click Gather Matches. When that is done, click Gather ICW.
Now you are ready to cluster. Since this works with the files on your computer, you can easily fiddle with the numbers and rerun it as many times as you like until you have a picture you like. For cousin Jeanie’s chart above, I used a maximum of 300 cM to get rid of some of her first cousins once removed and a minimum of 50 cM, though 60 would have worked for her as well. For my brother and me, a minimum of 30 cM and a maximum of 300 cM worked best.
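If you like to see how many matches each setting keeps before rerunning the client, this tiny Python sketch filters a list of (name, cM) pairs by a minimum and maximum, in the spirit of the DGC settings. I am assuming both limits are inclusive; I don't know exactly how DGC treats a match sitting right on a limit. The match names and cM values are placeholders.

```python
def filter_by_cm(matches, min_cm, max_cm):
    """Keep (name, cM) pairs whose shared cM falls within [min_cm, max_cm], inclusive."""
    return [(name, cm) for name, cm in matches if min_cm <= cm <= max_cm]


# Placeholder matches: (name, shared cM)
matches = [("Match A", 420), ("Match B", 250), ("Match C", 55), ("Match D", 45), ("Match E", 25)]

# Try a few minimum/maximum settings before rerunning the clustering.
for min_cm, max_cm in [(30, 300), (50, 300), (60, 300)]:
    kept = filter_by_cm(matches, min_cm, max_cm)
    print(f"min {min_cm}, max {max_cm}: {len(kept)} matches kept")
```

Here the 420 cM match is dropped by every setting because of the 300 cM maximum, the same way the maximum trimmed some of Jeanie's first cousins once removed.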
A nice feature is that you can set the result to open in your browser when it is done, so you could answer a few emails while you wait, though it is actually pretty quick. The result is an HTML file (web page) saved on your computer in the folder you selected in your DGC settings. Its name starts with clm3d_, followed by the person's name and so forth.
You can click any match’s name in the left column of the resulting web page to go to their match page on Ancestry. The tree icons show whether a match has a linked tree, and you can click them to go to that tree. The gray boxes indicate that a person is in more than one cluster. In my chart, these are the children and grandchildren of my first cousins, so they are expected to appear in multiple clusters. Note that I have no cluster for my Halling great grandmother (who married the Munson); I have only one very distant match on her side.
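The gray boxes boil down to a simple question: which matches show up in more than one group? Assuming you have written down each cluster's members (for example, copied from the web page into lists), this short Python sketch finds them. The cluster memberships in the example are invented, only the family labels come from my chart.

```python
from collections import Counter


def in_multiple_clusters(clusters):
    """Return the sorted names that appear in more than one cluster.

    clusters: dict mapping a cluster label to an iterable of match names
    """
    # Count each name once per cluster, then keep names counted more than once.
    counts = Counter(name for members in clusters.values() for name in set(members))
    return sorted(name for name, n in counts.items() if n > 1)


# Invented memberships, just to show the idea.
clusters = {
    "Munson": ["Match A", "Match B", "Match K"],
    "Wold": ["Match C", "Match K"],
    "Westby": ["Match D"],
}
print(in_multiple_clusters(clusters))  # ['Match K']
```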
I added the family group names to my image in Photoshop for the benefit of my cousins. The Westby group is my current mystery. They are related to my Wold line and come from the same farm in Skogar, Vestfold, Norway that my great great grandad Jorgen Wold lived on. Since he was born 10 years before his parents married, and I have no DNA matches on his dad’s line but a few on his mom’s line, perhaps his father of record was not his biological dad.
It is interesting how different my brother's matches and mine are, though not a surprise.
UPDATE 31-DEC-2018: I forgot to mention that the DGC also puts an Excel file in the same folder, with the same name as the web page file but a different extension. You can experiment with that too, perhaps using some of Andy Lee’s techniques. John Motzi has also written up a number of tips for using it in the DNAgedcom user group on Facebook.