Eliminating the DNA Matches from One Side

UPDATE 26-Oct-2018: Here is the tool: Extract Desired Lines (http://kittymunson.com/dna/ExtractDesiredLines.php)

Unless you get lucky with a first cousin or closer match, searching for an unknown parent or grandparent involves building lots of trees for your DNA relatives and looking for common ancestors among them. Then you build down from those ancestors looking for someone in the right place at the right time. It is best to have two pairs of common ancestors because then you are looking for where their descendants meet in a marriage.

Sample ancestor list from GWorks (linked to a slide in my GWorks presentation, using-dna-for-adoption-searches)

It is wonderful to have automation to compare trees for you. The GWorks tool suite from DNAgedcom.com does just that and, no surprise, I have written many blog posts about how to use those tools. They can collect all your Ancestry matches and then all the ancestors in their DNA connected trees and give you a list of the most frequently seen ancestors. You can also upload GEDcoms collected elsewhere or created by your own research to use in the comparison.
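To make "the most frequently seen ancestors" concrete, here is a minimal Python sketch that counts how many distinct trees each ancestor appears in. It is not GWorks's actual code. GWorks's internal name-matching rules aren't described here, and this toy version treats two ancestors as the same only when the (hypothetical) name and birth year keys are identical.

```python
from collections import Counter

def most_frequent_ancestors(trees, top=5):
    """Count how many distinct trees each ancestor appears in.

    trees maps a tree (match) name to a list of ancestor keys. Each key is a
    hypothetical (name, birth_year) tuple compared by exact equality.
    """
    counts = Counter()
    for ancestors in trees.values():
        # set() so an ancestor listed twice in one tree is counted once
        counts.update(set(ancestors))
    return counts.most_common(top)

# Toy data, for illustration only
trees = {
    "match_A": [("John Smith", 1820), ("Mary Jones", 1825)],
    "match_B": [("John Smith", 1820), ("Ann Brown", 1830)],
    "match_C": [("John Smith", 1820), ("Mary Jones", 1825)],
}
print(most_frequent_ancestors(trees, top=2))
```

Counting per tree rather than per occurrence keeps one large tree that repeats an ancestor from outweighing several independent trees that each contain that ancestor once.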

There are many times I would like to automatically exclude half the ancestors collected from Ancestry, for example when I am helping a person who knows only one parent and has a half sibling or the known parent who has tested. When they look at the results from a GWorks run, how do they eliminate the matches from the other side?

View Trees with an example for deleting

One way is to go to “View Trees” on the GWorks menu at DNAgedcom and delete all the trees from the known side by clicking the red X at the far right of each tree. Then rerun “Match GEDcom files” in the Manage Tree Files function. In a half sibling case, where every tree has to be checked, this could take forever.

However, deleting trees is very useful when one person has tested multiple family members and they all appear in the same tree. In that case I keep the tree for the person who is furthest up the line. Conveniently, you can click on the tree name to go to that match at Ancestry (as long as you are logged in there), so you can easily figure out which one to keep. But again, this is too lengthy a process for a half sibling case.

DNAgedcom client home window

I have long used the Match-O-Matic (M-O-M) feature in the DNAgedcom client (DGC) to get a spreadsheet listing the matches for just one side (the m_ file), which I use to keep track of my research. However, M-O-M does not work for the tree files (aka the a_ files, which actually list ancestors and the trees they come from).

Sometimes it is good to be a programmer. I have put together a new tool that takes a list of matches, for example the match file from M-O-M, and creates a new tree file containing only the trees that belong to the matches in that file. Then you can upload that match file and the new tree file to DNAgedcom for use in GWorks.
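The core idea of such a tool, keeping only the tree-file rows whose match is on the match list, can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual code, and the column names `match_id` are hypothetical; the real m_ and a_ headers may differ, so check your own files and pass the right names.

```python
import csv
import io

def extract_desired_lines(match_file, tree_file, match_key="match_id", tree_key="match_id"):
    """Copy only the tree-file rows whose match appears in the match file.

    match_key and tree_key are hypothetical column names; check the headers
    of your own m_ and a_ files and substitute the real ones.
    """
    wanted = {row[match_key] for row in csv.DictReader(match_file)}
    reader = csv.DictReader(tree_file)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        if row[tree_key] in wanted:  # keep trees for desired matches only
            writer.writerow(row)
    return out.getvalue()

# Toy in-memory files; with real files use open(path, newline="")
matches = io.StringIO("match_id,name\nM1,Ann\nM3,Bob\n")
trees = io.StringIO("match_id,ancestor\nM1,John Smith\nM2,Jane Doe\nM3,Mary Jones\n")
sliced = extract_desired_lines(matches, trees)
print(sliced)
```

Loading the match IDs into a set first means each tree row is checked with a fast membership test, so even a large a_ file is filtered in a single pass.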

Match O Matic form

For example, for two half siblings sharing an unknown father, you could run M-O-M to get a new match file of just the matches they share. Then use that shared match file with the gathered trees file (the a_ file) from one of them to generate a new tree file of the trees for the common matches.
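The shared / not-shared split amounts to set operations on the two match lists. Here is a minimal Python sketch, assuming each list is just a collection of match identifiers (hypothetical here); it illustrates the idea and is not how M-O-M is implemented.

```python
def compare_match_lists(matches_a, matches_b):
    """Split two match lists into shared and unshared matches.

    Each argument is an iterable of match identifiers (hypothetical:
    usernames or kit IDs as they appear in the m_ files).
    """
    a, b = set(matches_a), set(matches_b)
    return {
        "shared": sorted(a & b),   # matches both testers have
        "only_a": sorted(a - b),   # matches only the first tester has
        "only_b": sorted(b - a),   # matches only the second tester has
    }

# Toy lists for two half siblings with different mothers
half_sib_1 = ["cousinX", "cousinY", "cousinZ"]
half_sib_2 = ["cousinY", "cousinZ", "cousinW"]
result = compare_match_lists(half_sib_1, half_sib_2)
print(result["shared"])
```

For half siblings who share only the unknown father, the "shared" list is the one to feed into the tree-file step.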

The new tool for making tree files is called Extract Desired Lines (http://kittymunson.com/dna/ExtractDesiredLines.php), and its page includes some documentation.

Are you ready for the step by step?

Start with step 1 in my post about GWorks, but register a new username for this experiment.

(To create a new email address, you can add a plus sign and a tag to your Gmail username. For example, someone+ABC@gmail.com still goes to the someone mailbox because Gmail treats it as the same address, but the DNAgedcom website sees it as a different address.)
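The plus-address trick comes down to two different views of the same string: a website compares the whole address, while Gmail ignores everything from the "+" to the "@" when delivering. A tiny Python sketch of that rule (only the plus rule is modeled here; the function name is hypothetical):

```python
def gmail_delivery_address(address):
    """Return the mailbox Gmail delivers to, dropping any +tag.

    Only the plus rule is modeled here.
    """
    local, _, domain = address.partition("@")
    base = local.split("+", 1)[0]
    return f"{base}@{domain}"

tagged = "someone+ABC@gmail.com"
print(tagged == "someone@gmail.com")   # False: a website sees two addresses
print(gmail_delivery_address(tagged))  # someone@gmail.com
```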

For step 2, collect the match files (m_ files) and at least one trees file (a_ file). If you are looking for the unknown parent and the half sibling is from the known parent, be careful, because the two of you will not share all the same fourth cousins on the known side. In that case, when collecting the match file for the person whose parent you are looking for, I recommend excluding distant cousins and setting the minimum cM (a new feature) to at least 20, perhaps 30 or 40, while collecting all the matches, even distant cousins, for the half sibling.
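The reasoning behind those settings can be sketched in Python: keep only the seeker's matches at or above a cM threshold, then drop any that the known-side half sibling also has. Larger matches are more likely to show up for the half sibling too, so the subtraction removes known-side cousins more reliably. This assumes each match is a dict with hypothetical `name` and `shared_cm` keys; it is an illustration, not what the DNAgedcom client does internally.

```python
def unknown_side_candidates(seeker, half_sib, min_cm=20):
    """Seeker's matches at or above min_cm that the known-side half sibling lacks.

    Each match is a dict with hypothetical 'name' and 'shared_cm' keys.
    The half sibling's list is used unfiltered, so even distant matches
    are removed from the seeker's list.
    """
    sib_names = {m["name"] for m in half_sib}
    return [m["name"] for m in seeker
            if float(m["shared_cm"]) >= min_cm and m["name"] not in sib_names]

# Toy data for illustration
seeker = [
    {"name": "A", "shared_cm": "45"},
    {"name": "B", "shared_cm": "12"},  # below threshold, dropped
    {"name": "C", "shared_cm": "30"},  # also matches the half sibling, dropped
]
half_sib = [{"name": "C", "shared_cm": "8"}]
candidates = unknown_side_candidates(seeker, half_sib)
print(candidates)
```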

The client to gather matches and trees from Ancestry

The next step (step 2a) is to run Match-O-Matic to get a file with either the shared matches or the matches not shared. Here is what that form looks like. Use the buttons to select each match file and then the folder for the output file, and then type in a prefix. Next, tell it which file(s) to generate. It runs very fast and does not tell you when it is done, so look in the folder you selected for files with that prefix.
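Because M-O-M gives no "done" message, checking the output folder for files with your prefix is the way to confirm it ran. A small Python helper for that check might look like this; the file names in the demonstration are made up, since the exact names M-O-M writes aren't given here.

```python
import tempfile
from pathlib import Path

def find_outputs(folder, prefix):
    """Return the names of files in folder that start with prefix, sorted."""
    return sorted(p.name for p in Path(folder).iterdir()
                  if p.is_file() and p.name.startswith(prefix))

# Demonstration in a throwaway folder with made-up file names
with tempfile.TemporaryDirectory() as folder:
    for name in ("ABC_shared.csv", "ABC_notshared.csv", "other.csv"):
        (Path(folder) / name).write_text("")
    found = find_outputs(folder, "ABC")
print(found)
```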

Step 2b is to run my new tool, giving it the M-O-M match file and the tree file for the person you are searching for. When it is done, you can right-click the link and select “Save link as...” to save the file wherever you wish. If you just click the link, your browser will probably open it in your default spreadsheet program.

Proceed to step 3 in my GWorks post but use the M-O-M file for your m_ file and the file from my new tool for the a_ file. Soon you will have a database of the ancestors from just the side that you are interested in.

At the beginning of this article, I suggested that this technique can be used to search for an unknown grandparent. To do that, I would run M-O-M repeatedly on all the descendants I had in order to get a list of matches from just that grandparent. Then I would use that M-O-M file with one of the descendants' tree files to generate only the trees of interest for GWorks using my new tool.
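Running M-O-M repeatedly, each time on the previous result, is equivalent to folding set operations over the match lists: intersect with cousins on the mystery line, subtract cousins on other lines. A minimal Python sketch, assuming each list is a collection of match identifiers (the function name is hypothetical):

```python
def mystery_line_matches(same_line, other_lines=()):
    """Narrow a set of matches to one ancestral line.

    same_line: non-empty list of match collections for descendants of the
        mystery grandparent; intersecting them is like running M-O-M for
        shared matches and feeding each result into the next run.
    other_lines: match collections for cousins on other lines; subtracting
        them is like repeated M-O-M runs for matches not shared.
    """
    result = set(same_line[0])
    for matches in same_line[1:]:
        result &= set(matches)
    for matches in other_lines:
        result -= set(matches)
    return result

# Toy match lists for illustration
seeker = ["A", "B", "C", "D"]
cousin_1 = ["A", "B", "C"]
cousin_2 = ["B", "C", "E"]
other_line_cousin = ["C"]
line_matches = mystery_line_matches([seeker, cousin_1, cousin_2], [other_line_cousin])
print(sorted(line_matches))
```

Each extra intersection keeps only matches shared by every descendant, so distant cousins who happen not to match one of them drop out. That is the same caution about distant cousins raised in step 2.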


UPDATE 25-Aug-2018: Thanks to Don Worth, we have now named the tool Tree Slicer for GWorks: http://kittymunson.com/dna/ExtractDesiredLines.php


18 thoughts on “Eliminating the DNA Matches from One Side”

  1. Hello Kitty,
    I am an adoptee. I had used a “no fee” search agency to find my biomom; she refused contact and would not tell me my biofather’s name. I had my DNA tested through 23andme in 2013. At that time, my closest match was 2%. The person shared with me several members of her family tree, and I created her tree on Ancestry. All of the people in my trees had been thoroughly searched: I read the censuses and the draft registrations and viewed directory listings, looking for additional family members or any other information I could find. (I am surprised and disappointed by the people who just copy names, as their trees have many mistakes.) As time went by, I made a tree for my maternal side, since I knew her name and where she had lived, and as I got more DNA matches, I made trees for everyone over 1.5%. I had about 15 trees going. Eventually I realized that I’d already done the research on many “duplicate people” who were in different trees, so I took my largest tree and added branches wherever my matches connected to it. I finally came down to 2 brothers. Then a cousin turned up as a DNA match, and I knew I had the correct match. My question, “Where did I come from?”, was answered at about the same time as the computer programs you have shared with us, but I did it the long way, averaging 6 hours per day, almost every day, for about 5 years…

    • I need to add that I had a different “picture” for each DNA match and their direct parentage, so I could view my final tree and see who was who. I then created another icon for people who matched persons A and B, another for B and C, etc.; before long, I had another icon for 3 matches, etc. I had written down the icons and who they represented. Lots of work and thinking…

  2. Nancy Johnson, do you mind sharing the name of the “no fee” search agency? One of my recent matches is trying to find her bio father, and I would like to help her. Thank you.

  3. Thank you so much for all these great tools you create for our genetic genealogy community! We’re fortunate to have your expertise directed to these innovations.

    • Thanks Heather. I have 4 different cases I am helping on where this was needed, so after checking whether Don W had anything, I quickly wrote it. All of these cases have way too many matches on the “colonial” side, so I needed to get rid of those. So far there are too few matches (Canadian, Mexican, and Norwegian unknown dads…), so I guess I need to upload more trees via the GEDcom upload. Perhaps I need to blog about using Pedigree Thief and my ahnen2ged tool to collect GEDcoms…

  4. Hi! I’m trying to use the “Extract Desired Lines” tool (Tree Slicer), but I don’t get a link or a file at all. Where should I see the file when the process completes?

    I’ve tried it twice on Chrome and once on Firefox.

    Thanks!

  5. “At the beginning of this article, I suggested that this technique can be used to search for an unknown grandparent. To do that, I would run M-O-M repeatedly on all the descendants I had in order to get a list of matches from just that grandparent. Then I would use that M-O-M file with one of the descendants tree files to generate only the trees of interest for GWorks using my new tool.”

    So after I run M-O-M on the descendant kits I have, I should copy and paste them into one M-O-M file, correct? Wouldn’t that produce duplicate matches? or is that irrelevant to finding other possible matches?

    • No, Jason, no cut and paste.

      If you have multiple cousins from the unknown grandparent, what you want to do is create a list of just the matches on that line.

      You might start with all the cousins from that line and use M-O-M to get the common matches with a cousin. Then run M-O-M again with the next cousin, using the previous M-O-M result.

      Another way to do that is to use M-O-M with a cousin from another line to remove his matches. Then do it again on the resulting file with another cousin.

      Once you have the list of matches for the mystery line, you can use it with my tool to extract just the trees for that line.

  6. Hi again. First, thanks for the great self-guides. However, I’m having a technical issue now 🙁 I was using your treeslicer tool to create a new CSV file, but after the files upload, nothing happens. The percent in the bottom left goes up to 100% and then nothing happens. I left the web page open for about 30 minutes and still nothing changed. I tried it with the latest version of Chrome and Microsoft Edge browser. (It was working previously when I skipped distant cousins in my match gathering, but my resulting list was too short.)

  7. Hi Kitty,
    Thank you for ALL the information… but where do I start? I did my DNA test through Ancestry. Do you offer a step-by-step guide on how to use Ancestry DNA information? I’d like to compare matches between my sister and me and hopefully get on the right path to discovering my biological father. You talked about so many interesting ways, but now I’m confused. Can you help me?
    Bewildered,
    Linda

  8. Kitty, I’m a data science intern working on a machine learning algorithm to guess the centimorgan (cM) distance between two individuals based on personal data like name, age and place of birth. To train the algorithm I need lots and lots of data. Are you aware of any GENESIS tools that would allow me to do a dump of cM data? It does not matter who the data is for. I just need to train the software to make guesses about cM distance based on non-DNA information. Does GENESIS have an API?

    • John –
      No, Genesis does not have an API.
      23andme has an API but I am not sure it has what you want. You might also talk to the people at DNA.land who are collecting DNA for scientific research to see if they would help you.
