It was only a matter of time before the methodologies and technologies that have been developed to break genealogical brick walls and find unknown birth parents were used to identify victims and criminals. The use of DNA and genealogy to solve the horrific Golden State killer case has been sensationalized in the media for several days now. I even got a few calls from reporters as a DNA and GEDmatch expert. Also, just two weeks ago an unknown murder victim from 30 years ago, found in Florida, was finally identified from a DNA cousin match on a genealogy site.
Some of my friends and cousins are worried about the possible invasion of their DNA test privacy. Most just want to understand how this can be done, so I will try to explain that in this post. At the end of this post I will include links to other genetic genealogy blog posts that have wrestled with the issues raised.
Although I have sympathy with the concerns of people who fear false identification using DNA techniques, this is not my fear. The methodology used gets to a pool of possibles whose actual DNA is then collected and compared. I have confidence in that technology. My fear is that my cousins will stop testing their DNA to help my family projects or stop uploading their tests to my favorite tools site, GEDmatch, where the DNA test results from different companies can be compared.
Click here for an article at the LA Times which went into more of the technical details of the Golden State killer case for us genetic genealogists and here for a lengthy video interview with investigator Paul Holes on how it was done.
Let me start my article by reminding all of you that every human’s DNA is about 99% the same as every other human and about 98.5% the same as a chimpanzee. The companies who test your personal genome only test a small sample of that differing 1%. To put it in numbers, our genomes have about 3 billion base pairs and the tests cover about 700,000 of those, which comes out to about 0.02% of your genome. Not enough to clone you or worry about, in my opinion.
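For anyone who wants to check that percentage, here is a minimal sketch of the arithmetic. It uses the two round figures quoted above; both are approximations, and the variable names are just for illustration.

```python
# Rough figures quoted in the post; both are approximations.
GENOME_BASE_PAIRS = 3_000_000_000  # about 3 billion base pairs
TESTED_POSITIONS = 700_000         # about 700,000 positions covered by a test

fraction_tested = TESTED_POSITIONS / GENOME_BASE_PAIRS
percent_tested = fraction_tested * 100

print(f"Share of the genome tested: {percent_tested:.3f}%")  # prints 0.023%
```

So the tested share is roughly two hundredths of one percent of the whole genome.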
Next let me remind you that uploading your DNA results from Ancestry or 23andMe or wherever you tested to GEDmatch does not expose even that little bit of your DNA to the public. What happens is that your “DNA cousins” will match long sections of your data, called segments, and they can see which locations on which chromosome(s) are the same between the two of you. Therefore they know what your actual DNA code is only on those pieces they share with you. When they match you in the GEDmatch database, they can see your email address, name or pseudonym, and your kit number. With that kit number they can see what color your eyes are, what ethnicities various calculators give, and who else you match. If you have connected a family tree to your DNA they can also see the non-living people in your tree. But they have to match your DNA significantly to see any of that! Click here for an article I wrote addressing privacy worries at GEDmatch.
So how do you get from there to a killer?
You start by putting a DNA results data file of your suspect on GEDmatch that looks like a kit from one of the main testing companies.
The methodology involves building endless trees. This is much easier to do on Ancestry or MyHeritage, which have good family trees and good DNA to tree matching tools. There is also WikiTree, which connects its one world tree to GEDmatch if its users have entered their kit number. Finally, GEDmatch itself also has a family tree (GEDCOM) upload and compare facility.
To start tree building, you need to find some people who match the DNA. Predicted second or third cousins are best, but it can be done from fourths, as was suggested may have been done in this case. It just takes longer. Next you build the trees of these cousins back to about 1800, looking for an ancestor or couple that is in more than one tree: the common ancestor(s). Sometimes the trees are already built. The next step is to build the tree of that common ancestor’s descendants down to the present day, looking for someone of the right age in the right place. There are tools that can help with comparing trees to get to that couple or couples, but there are no shortcuts to building the tree of their descendants, unless some of them have already built large public trees.
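The two steps above can be sketched in a few lines of Python. This is only a toy illustration, not the tooling anyone actually used: every name, date, and data structure here is invented, and real trees are far larger and messier (spelling variants, missing people, undisclosed parentage).

```python
from collections import defaultdict

def shared_ancestors(trees):
    """Return ancestors found in two or more matches' trees, with those matches."""
    seen = defaultdict(set)
    for match, ancestors in trees.items():
        for ancestor in ancestors:
            seen[ancestor].add(match)
    return {a: sorted(m) for a, m in seen.items() if len(m) >= 2}

def descendants(children, root):
    """Walk down from root through a parent -> list-of-children mapping."""
    found, stack = [], [root]
    while stack:
        person = stack.pop()
        for child in children.get(person, []):
            found.append(child)
            stack.append(child)
    return found

# Step 1: hypothetical trees of three DNA matches, as sets of ancestor labels.
trees = {
    "match_A": {"John Doe (b. 1880)", "Sam Poe (b. 1875)"},
    "match_B": {"John Doe (b. 1880)", "Ann Lee (b. 1882)"},
    "match_C": {"Tom Fox (b. 1879)"},
}
common = shared_ancestors(trees)

# Step 2: build the common ancestor's descendants and filter by birth year.
children = {
    "John Doe (b. 1880)": ["Child One", "Child Two"],
    "Child One": ["Grandchild One"],
    "Child Two": ["Grandchild Two", "Grandchild Three"],
}
birth_year = {
    "Child One": 1910, "Child Two": 1915,
    "Grandchild One": 1945, "Grandchild Two": 1948, "Grandchild Three": 1962,
}
candidates = sorted(
    person
    for person in descendants(children, "John Doe (b. 1880)")
    if 1940 <= birth_year[person] <= 1950
)
print(common)
print(candidates)
```

The first step is the easy, automatable part. The second depends on someone having actually researched the `children` mapping, which is where the months of work go.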
I read that in the Golden State case they got down to a pool of 100 people over a four month period using this technique. I saw an article that said that another suspect’s DNA was tested a year ago and was negative. So this methodology was not a panacea. It got them to a pool of people whom they had to investigate using standard police work and direct DNA matching. Anyone who has ever watched the TV show Bones knows that DNA can be extracted from chewing gum and drink cans…
When I do unknown parentage work, this tree building methodology can get me to grandparents or great grandparents fairly quickly, provided enough relatives are tested (easiest for Americans). I will be giving a presentation on this technique at the SCGS Jamboree in Burbank at the end of May. Here are some of my blog posts that describe the methodology:
Here are some DNA success stories I wrote up:
These articles quote me:
Here is what the Legal Genealogist, Judy Russell has to say about the issues raised:
Here is what Leah Larkin, the DNA geek has to offer:
Roberta Estes:
And finally, be sure to watch what top genetic genealogist CeCe Moore has to say about all this; she taped segments for 20/20 and Good Morning America.
UPDATE 1-May-2018: The clip of CeCe was tiny yesterday (Monday morning) on GMA – it’s about 1/3 of the way in when they discussed the Golden State killer case. Click here for a good summary of the case from the Washington Post.
Some great perspective on a very complex story!
One big problem is that some DNA is NOT tied to your match’s tree because they do not have a tree.
My experience: At Ancestry I had an 800 cM match and when I looked at the tree, I KNEW the tree had nothing to do with my DNA match because my tree is documented.
I messaged to explain the discrepancy and was advised by the person handling the account that I was correct. It was her tree, but the DNA belonged to her son-in-law.
A possible reason for this, Caith, is undisclosed parentage, where one or more of the people you think are in your tree are actually a different person. This happens more than you might think.
Excellent article that gives an accurate versus fear-based perspective that could keep our cousins from testing DNA. I hope that your article is widely circulated on Social Media sites.
Thanks for providing some much needed information on just what was done.
Thank you so much for this post! I am fairly new to genetic genealogy and I really needed to hear some common sense information about this subject. There are so many out there stirring up panic. Gedmatch is a great tool and it will be really devastating if people stop using it. I will forward your article to anyone who asks me about this.
Thanks for this Kitty! A point of clarification: the false accusation was based on ySearch results, not GEDmatch.
Thanks Leah!
The YSearch match was never *accused*. A warrant was issued to obtain a DNA sample, but according to one news report, the man cooperated willingly when the situation was explained and was not even aware that a warrant existed.
Great blog Kitty. Thank you. Personally, I think it is this simple. Law Enforcement did nothing wrong from what I know. There will always be conspiracy theorists and those uneducated in Privacy. Individual privacy is not covered by the Constitution.
Here is a great paper to read on the process. I also think that everyone that has tested and has put their raw data on Gedmatch is a hero! Familial DNA Searching in Law Enforcement
https://www.ncjrs.gov/pdffiles1/nij/grants/251081.pdf
Kitty, Thanks for such a clear description. However, Gedmatch is even more open than you describe. You say “But they have to match your DNA significantly to see any of that!”. This is not true. I can search Gedmatch just using your email address, and if you have a tree there I can find your kit knowing (or guessing) any name in your tree. Once I have your kit number I can find all those other things you list, even if I have not uploaded my own DNA.
Very good point Andrew, but you still cannot see my actual DNA codes! That seems to be what the general public is panicking about
True. I can’t see what use that would be to anyone anyway. If I had your DNA sequence I’d have to create my own matching process, and get other people’s sequences to compare. Much easier just to use Gedmatch!
Kitty,
thanks for a well written and well covered explanation of this situation.
Doug Marker
Great blog post. After talking with the 20/20 producers, I have strong doubts that my section of the interview will be long enough to explain much, so I am going to repost this article for my followers. I think it is very well done. Thank you, Kitty!
Thank you so much CeCe, and sorry your clip on Monday’s GMA was so brief (it’s about 1/3 of the way along for those of you who recorded it, in the brief coverage of the Golden State Killer).
Great perspective! Guess it was only a matter of time. But then remember well 2001 when FTDNA arrived on the scene and thought the genealogy community was going to come unglued. Thanks for all you do!
I added a link to the Washington Post article at the end of the post, which seems the best mainstream news media summary to me.
Buckskin Girl was found in Ohio not Florida
– Also, just two weeks ago an unknown murder victim from 30 years ago, found in Florida, was finally identified from a DNA cousin match on a genealogy site.
Thank you for the clarification, it was the Florida case I was referring to I guess. The point being DNA can help ID victims too which is an important usage, so thanks for all that!
Kitty, how do you format DNA results to allow upload to GEDMatch? How hard is that process? If the DNA results (15 markers) are good enough for CODIS are they good enough for a GED? Thank you in advance!
JT –
CODIS uses STR markers, which are good for identifying a single individual. Genetic genealogy autosomal tests use SNP markers, which mutate more slowly over time and are very useful for ancestry composition and finding relatives.
These tests are not compatible.
Any set of autosomal test results is about 700,000 lines of data in an Excel sheet. A number of university and private research labs can use the same Illumina chip technology as the companies selling these tests to partially sequence the DNA. Those results would then need a program to make them look the same as the commercial test results, which is not that hard to write as it is just massaging data.
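To give a feel for what "massaging data" means here, below is a minimal sketch. It assumes a layout of one SNP per line with four tab-separated columns (identifier, chromosome, position, genotype), which is a shape commonly seen in raw data downloads but is my assumption, not something specified above. The function name, the input tuples, and the rsIDs are all made up for illustration.

```python
# Sort order for chromosomes: 1-22, then X, Y and mitochondrial (assumed labels).
CHROM_ORDER = {str(n): n for n in range(1, 23)}
CHROM_ORDER.update({"X": 23, "Y": 24, "MT": 25})

def to_raw_data_text(calls):
    """Render (rsid, chromosome, position, allele1, allele2) tuples as tab-separated lines."""
    lines = ["# rsid\tchromosome\tposition\tgenotype"]
    ordered = sorted(calls, key=lambda c: (CHROM_ORDER[c[1]], c[2]))
    for rsid, chrom, position, allele1, allele2 in ordered:
        lines.append(f"{rsid}\t{chrom}\t{position}\t{allele1}{allele2}")
    return "\n".join(lines) + "\n"

# Invented genotype calls standing in for a lab's output.
lab_calls = [
    ("rs0000003", "2", 5000, "C", "T"),
    ("rs0000001", "1", 1000, "A", "A"),
    ("rs0000002", "1", 2500, "G", "A"),
]
raw_text = to_raw_data_text(lab_calls)
print(raw_text)
```

A real file would have hundreds of thousands of such lines, but the reshaping itself is no more complicated than this.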
First joined a genealogical society back in 1979. There in their journal is my name and address. And the electoral roll was available to anyone.
This is no longer so. My details are not out there publicly.
Telephone books where they exist omit many people and my younger cousins are just on mobiles with no public index.
At the same time, many other people are putting their whole lives on social media. And I also know that many people can easily access my credit data – something that used to be much harder.
Privacy is different from what it was.
But I can assure you that it is much, much harder to find some people now than it was 40 years ago. Barriers to accessing vital statistics have gone up too. I spent years trying to find some cousins. Unless there are online trees it is now much harder. The resources I relied on are unavailable or require individual retrieval slips – if they are stored on site at all.
And even with online trees, the main benefit is that there is someone who put the tree up who can be asked: people are much better at hiding current generations. (That is good – as long as they reply!)
So, while some of us worry about law enforcement access, cousin finding access has changed too, and is often harder.
Unlike a lot of people in the genetic genealogy community who seem to applaud the appropriation of consumer DNA databases for crime-solving purposes, I am extremely concerned about it. Kitty, you mention the reluctance of people to test, and that’s certainly a rational fear (most people are not going to take the time to hunt down your article to reassure themselves) but my other big concern comes with the reluctance of people who have tested to communicate with matches.
I was adopted, and for over three years now, I have been depending on the kindness and trust of others to help me construct pedigree charts for the purposes of triangulation. At this point, having read how police agencies are using direct-to-consumer DNA databases to do investigations, how do my matches know that I’m not a cop? I’ve taken pains to scrub my profiles and communications of any reference to adoption, as I’ve learned that it makes matches pretty squirrelly about telling me anything, and now, they have to worry that they may be opening the family “can of worms” about Cousin John who ran around with a bad crowd a few decades ago.
Add to that the fact that the police just showed everyone how easy it was to fool the DNA companies and GEDmatch. I worry about people using this for nefarious purposes. Imagine a man in a bar who thought that buying a woman a few drinks entitled him to her company for a few more hours in privacy. After she’s turned him down, he could easily grab her last cocktail glass, and be able to use the DNA databases to stalk her.
Or, imagine that a serious crime suspect is caught through the use of consumer DNA (as opposed to the CODIS database), and this suspect has “associates” who would use the suspect’s own DNA to find the family member whose test “ratted” their criminal companion out. And the police just made it look easy.
I’m hoping that all of the DTC companies find a way to validate the identities of people submitting DNA, or we’ve opened a Pandora’s box of potential trouble. That may be very hard to do with DNA kits sold at drugstores.
Dear D.R.,
It is not easy at all. It took six months by a professional genetic genealogy team to work through the matches to get down to a few people to follow up on with police work and police type DNA testing.
Your scenarios are still pretty far fetched. The man in the bar could far more easily stalk her in other ways (get her license plate, snap her pic and google it). The associates would have to punish dozens of people because it is unlikely that one test did the job.
However I understand how you are feeling. In this modern day and age genetic privacy may well be a pipe dream.
Most genetic genealogists are concerned. Yes we applaud this case and the cases of unidentified victims being returned to their families but how far could this go? How would you feel about litterers being prosecuted from the DNA on their litter? Or …?
What do you think of the dog poop DNA typing service? https://www.seattletimes.com/seattle-news/dog-poop-dna-tests-nail-non-scoopers/
Hi,
Do you know if Machine Learning or any other AI has been used to help speed up the process?
Don – Not to my knowledge … this is partially an art, seeing connections and making guesses, so hard to do that