More Automated DNA Match Clustering!

Have you been wondering why all your favorite bloggers are going crazy for automatic clustering? Well, it is a fun visual technique for seeing which matches belong to which family line: you make a chart with your matches across both the top and the side, group them by who matches whom, and then color in the boxes where two people match each other. This creates visual clusters which will roughly correspond to your great grandparents or their parents.
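For those who like to see an idea in code, here is a minimal Python sketch of that chart. The names and the list of shared-match pairs are invented for illustration, and this is not how any of the tools below work internally; it only shows how a list of "these two also match each other" pairs becomes a grid of filled boxes.

```python
# Hypothetical shared-match pairs: each pair means these two matches also match each other.
icw_pairs = [
    ("Anna", "Bjorn"),
    ("Anna", "Carl"),
    ("Bjorn", "Carl"),
    ("Dora", "Erik"),
]

def build_grid(pairs):
    """Return the sorted names and a symmetric True/False grid of who matches whom."""
    names = sorted({name for pair in pairs for name in pair})
    index = {name: i for i, name in enumerate(names)}
    grid = [[False] * len(names) for _ in names]
    for a, b in pairs:
        # Matching goes both ways, so fill the box on both sides of the diagonal.
        grid[index[a]][index[b]] = True
        grid[index[b]][index[a]] = True
    for i in range(len(names)):
        grid[i][i] = True  # each person's own box on the diagonal
    return names, grid

def render(names, grid):
    """Draw the grid as text: # is a colored box, . is an empty one."""
    width = max(len(name) for name in names)
    lines = []
    for name, row in zip(names, grid):
        boxes = "".join("#" if cell else "." for cell in row)
        lines.append(name.ljust(width) + " " + boxes)
    return "\n".join(lines)

names, grid = build_grid(icw_pairs)
print(render(names, grid))
```

With this made-up data, the filled boxes form two separate squares along the diagonal, which is exactly the kind of visual cluster the real charts show.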

My perfect cousin has many matches on all her great grandparent lines (green is my Munson side) so I used her to showcase the new DNAgedcom clustering above. Notice how similar it is to her cluster from Genetic Affairs shown in my previous blog about that site and tool.

Here are all the new ways to cluster our DNA matches:

  • DNAgedcom now has a clustering tool in their client (DGC) which uses your Ancestry match list and ICW files (described in detail in the read more below)
  • Genetic Affairs has Ancestry clustering working again
  • DNApainter created a tool to create a CSV from the Genetic Affairs html cluster file. Some of us love to use spreadsheets.
  • Andy Lee of Family History Fanatics figured out how to take an autosomal match matrix from GEDmatch and cluster it in a spreadsheet program. Click here for that video; the explanation starts just after 42 minutes and it is really fun!
  • Rumor has it that GEDmatch may add automatic clustering sometime in the new year…

All of this is based on the method Dana Leeds developed for organizing your matches, which is easy to do. Click here for her blog about that.
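As I understand the Leeds method, you work down your list of matches, give the first uncolored match a new color, and then give that same color to everyone who shares DNA with both of you. Here is a rough Python sketch of that idea. It is my reading of the method, not Dana's own wording, and the names and shared-match lists are invented.

```python
def leeds_colors(matches, shared):
    """Assign color numbers in the spirit of the Leeds method.

    matches: names ordered from strongest to weakest match.
    shared:  dict mapping a name to the set of names that person shares with you.
    Returns a dict mapping each name to a list of color numbers.
    """
    colors = {name: [] for name in matches}
    next_color = 1
    for name in matches:
        if colors[name]:
            continue  # already colored through an earlier match
        colors[name].append(next_color)
        for other in shared.get(name, set()):
            if other in colors:
                colors[other].append(next_color)
        next_color += 1
    return colors

# Hypothetical data for illustration only.
matches = ["Anna", "Bjorn", "Carl", "Dora", "Erik"]
shared = {
    "Anna": {"Bjorn", "Carl"},
    "Bjorn": {"Anna", "Carl"},
    "Carl": {"Anna", "Bjorn", "Dora"},
    "Dora": {"Carl", "Erik"},
    "Erik": {"Dora"},
}

result = leeds_colors(matches, shared)
for name in matches:
    print(name, result[name])
```

In this made-up example Carl ends up with two colors, which is what you would expect for someone related to you on more than one line.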

Read on for how I used the new DNAgedcom clustering tool for myself and my brother, where I know all our great grandparents.

One problem for me is that our maternal side is half German Jewish and half Bavarian German. There are almost no Bavarian matches and Ancestry removes the population specific Jewish DNA. Thus, since grandad was an only child, we have very few matches on that line; mainly I expect to see Norwegian Americans from our Munson, Halling, Skjold, and Wold lines.

The new DNAgedcom client window

Here is the how-to.

First, you need a paid membership, since clustering is a feature of the subscription-only DNAgedcom client (DGC), which you download from the subscribers page and run on your PC or Mac. Look at all the new options it has! It even includes collecting from MyHeritage now.

Next, click on the company that you want to collect your data from. Today’s post is just going to demo using the Ancestry files, since that is where the majority of my family matches are. Enter your Ancestry username and password in the little window that comes up when you click on Ancestry. Before you start collecting, I suggest you click the box to Skip Distant Cousin Matches. You can also set the Minimum cM to 20, or even 30 if you are from an endogamous group. For clustering, you only need to gather matches and ICW, so I underlined those in green along with the selections I suggest.

If you have more than one person’s DNA shared with you or in your account, you need to select the person whose matches you want to cluster, then click Gather Matches. When that is done, click Gather ICW.

DGC window to create a cluster file from matches and ICWs

Now you are ready to cluster. Since this works with the files on your computer, you can easily fiddle with the numbers and rerun this many times until you have a picture you like. For cousin Jeanie’s chart above, I used a maximum of 300 cM to get rid of some of her first cousins once removed and a minimum of 50 cM, but 60 would have worked for her as well. For my brother and myself, I found that a minimum of 30 and a maximum of 300 worked best.
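To see what those two numbers do, here is a small Python sketch that trims a match list to a cM range and reports how many matches survive each setting. The match list is invented, and treating both limits as inclusive is my assumption; the DGC may handle the edges differently.

```python
# Hypothetical match list: the names and shared centimorgans (cM) are invented.
matches = [
    {"name": "First cousin once removed", "cm": 410},
    {"name": "Second cousin A", "cm": 212},
    {"name": "Second cousin B", "cm": 95},
    {"name": "Third cousin", "cm": 55},
    {"name": "Distant cousin", "cm": 24},
]

def keep_in_range(matches, min_cm, max_cm):
    """Keep matches whose shared cM lies between the two limits.

    Assumption: both limits are inclusive.
    """
    return [m for m in matches if min_cm <= m["cm"] <= max_cm]

# Try a few of the settings mentioned above and count what is left.
for min_cm, max_cm in [(50, 300), (60, 300), (30, 300)]:
    kept = keep_in_range(matches, min_cm, max_cm)
    print(f"min {min_cm}, max {max_cm}: {len(kept)} matches kept")
```

The maximum drops the closest relatives, who would otherwise light up several clusters at once, and the minimum drops the distant matches that add clutter.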

There is a nice feature that lets you set the result to open in your browser when it is done, so you can perhaps answer a few emails while you wait. Actually it is pretty quick. The result is an HTML file (web page) saved on your computer in the folder you selected when you set up your DGC settings. Its name starts with clm3d_ followed by the person’s name and so forth.

My Clustered Ancestry Matches Annotated

Every match’s name in the left column of the resulting web page can be clicked to go to their match page on Ancestry. Plus the tree icons show if they have a linked tree and can also be clicked to go to that tree. All the gray boxes indicate that this person is actually in more than one cluster. In my chart, these are the children and grandchildren of my first cousins so they are expected to be in multiple clusters. Note that I have no cluster for my Halling great grandmother (married to the Munson). I have only one very distant match from her side.
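The idea behind those gray boxes can be sketched in a few lines of Python: given who is in each cluster, list the people who turn up in more than one. The cluster labels and names below are invented, and the DGC works all of this out for you; the sketch only illustrates the idea, not the tool's actual logic.

```python
from collections import defaultdict

# Hypothetical clusters: the labels and member names are invented for illustration.
clusters = {
    "Munson": ["Anna", "Bjorn", "Carl"],
    "Wold": ["Carl", "Dora"],
    "Skjold": ["Erik", "Bjorn"],
}

def in_several_clusters(clusters):
    """Return a dict of name -> cluster labels for people found in more than one cluster."""
    memberships = defaultdict(list)
    for label, members in clusters.items():
        for name in members:
            memberships[name].append(label)
    return {name: labels for name, labels in memberships.items() if len(labels) > 1}

overlaps = in_several_clusters(clusters)
for name, labels in sorted(overlaps.items()):
    print(name, "is in", " and ".join(labels))
```

People who show up this way are often close relatives, such as the descendants of first cousins, who are related to you through more than one great grandparent couple.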

I added the family group names to my image in Photoshop for the benefit of my cousins. The Westby group is my current mystery. They are related to my Wold line, from the same farm in Skogar, Vestfold, Norway that my great great grandad Jorgen Wold lived on. Since he was born 10 years before his parents married, and I have no DNA matches on his dad’s line but a few on his mom’s line, perhaps his father of record was not his biological dad.

My Brother’s Clustered Ancestry Matches Annotated

Interesting how different my brother’s matches and mine are, but not a surprise.

UPDATE 31-DEC-2018: I forgot to mention that there is also an Excel file from the DGC in the same folder, with the same name as the web page file, just a different extension. You can experiment with that too, perhaps using some of Andy Lee’s techniques. Plus there are a number of tips for using it written up by John Motzi in the DNAgedcom user group on Facebook.
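If you would rather experiment with code than in the spreadsheet itself, one option is to save a sheet as a CSV and group the rows by cluster. Here is a hedged Python sketch of that. The column names (name, cm, set) and the rows are my own placeholders, not the real headers in the DGC file, so adjust them to whatever your file actually contains.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows standing in for a CSV export of the spreadsheet.
# The column names (name, cm, set) are assumptions, not the DGC's real headers.
sample_csv = """name,cm,set
Anna,212,1
Bjorn,95,1
Dora,130,2
Erik,55,2
Carl,88,1
"""

def group_by_set(csv_text):
    """Group matches by cluster number, largest shared cM first within each group."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[int(row["set"])].append((row["name"], float(row["cm"])))
    for members in groups.values():
        members.sort(key=lambda pair: pair[1], reverse=True)
    return dict(groups)

for set_number, members in sorted(group_by_set(sample_csv).items()):
    listing = ", ".join(f"{name} ({cm:g} cM)" for name, cm in members)
    print(f"Set {set_number}: {listing}")
```

To read a real file, you would pass `open("your_file.csv", newline="")` to `csv.DictReader` instead of the `io.StringIO` wrapper used here.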


24 thoughts on “More Automated DNA Match Clustering!”

    • The Genetic Affairs one isn’t too hard – you literally just tell it if you want Ancestry or FT DNA website, provide your login and it does it all for you. And it’s really cool when it orders the clusters 🙂

    • I’m with you Martha, way, way over my head, but oh, how I would love to be able to do it. I have not even attempted to move my DNA from Ancestry to Gedmatch! And now, do I understand don’t do Gedmatch, do something called Genesis???

      • At one time, we all were there. In fact, some of us are there as different DNA tools become available. Check out the You Tube on how to transfer Anc DNA to Gedmatch. Use a split screen and follow along pausing as needed. Good luck.

  1. 2018 was quite an exciting year for genetic genealogy and clustering! I’m excited to have been a part of it. Thanks for this great step-by-step guide! I’ll be sharing a link from my website. 🙂

    Happy New Year, Kitty!

    • Thanks Dana, a pleasure meeting you at i4GG! And thank you so much for your contribution to genetic genealogy. As you can see, many have taken your idea and automated it, wonderful!

  2. I don’t see a mention of the Excel file. You get two files from the Client. One is HTML, one is an Excel spreadsheet. I prefer it because of the many options. You can lock the rows so you don’t lose the names looking for the boxes. But if you really don’t like scrolling, open the Data page on the Excel file, to get everything together. You see the names, you see the size of the match, the number of the set, and the color of the box all together. No more scrolling.
    I added a column for notes, labeled the sets etc. It is just so much easier to work with. I love the sets. What I don’t love is scrolling.

    • It’s mentioned in the update at the bottom of the article, perhaps you had the previous copy in your browser. Anyway, thanks for all the tips!

  3. Well, I just tried it. I had it set at 600 and 15 cM with Ancestry.

    I have 63 clusters and 664 matches.

    I’m thinking I need to trim my tree a bit…

    • 15 cM is way too low; start much higher! 90 is the recommended start. I often use 60, but for people with fewer matches I go down to 30 or even occasionally 20.

      • Thanks for this tip. I’ve also done the clustering tool and ended up with I don’t even know how many boxes (they seem to go on forever). I’ve not seen the 90 cMs minimum mentioned before. Also, I have the problem that I don’t recognize any of the names and of course most of the FEW trees that are available are basically empty or private. 🙁 So I really don’t know what I can do with the information now that it’s sorted.

  4. There are also clustering algorithms available through network graph software. Depending on the software used (plenty of free options) there are a range of automatic clustering algorithms that work very well with shared DNA matches.

  5. At one time, we all were there. In fact, some of us are there as different DNA tools become available. Check out the You Tube on how to transfer Anc DNA to Gedmatch. Use a split screen and follow along pausing as needed. Good luck.

    • Martha –
      Sorry you cannot. Because the purpose of clustering is to look at how everyone matches each other and then group them together. Thus you cannot compare someone at ancestry to someone at 23andme.
      However if all your best matches are uploaded to GENESIS … clustering will be implemented there soon

  6. Kitty,
    What would happen if you make a superkit of you and your brother, and then did the clustering?
    Steve

  7. Hi
    Reading your post, I noticed the mention of the Munsons.
    My father-in-law was named Richard Ironside Yarrow. The Ironside was after his grandfather, who was Richard Ironside Munson.
    Have you any Ironsides in your tree?

  8. Kitty have you used EJ’s Rule Based super kit cluster tool for an unknown parentage, or a pseudo unknown for trials purposes, with endogamous backgrounds?
    I’ve learned best from the way you present material and I’m interested in if you have done the above, or maybe connect me to someone who has?
    Thanks!
    Tasha

    • Sorry, I have not tried it yet. Soon … but generally clustering has not been useful for the endogamous unless I raise the lower limit to at least 120 cM!
