By Jim Bartlett
Impressed by Jim Bartlett’s prose on various message boards and mailing lists, I asked him to do a guest blog post on using spreadsheets with autosomal DNA results, here it is – Kitty
Using autosomal DNA testing can be a challenge – but it doesn’t have to be. It can be intimidating – but by taking it a step at a time, you can break it down into bite-sized pieces that are much easier. When you decide to use autosomal DNA (atDNA), and to get the most out of it, I recommend three broad areas of focus right from the start:
- Learn all you can about DNA testing for genealogy and particularly about autosomal DNA (atDNA). The ISOGG wiki is a good place to find good articles, tools, blogs (to keep you up to date), etc. Join email lists and read and ask questions. This is definitely a “continuing education” hobby. We are on a frontier with genetic genealogy – and we are pushing the boundaries every day!
- Create as robust a Tree as you can – stretch as much as you can to 12 generations, or more. This is the net you need to catch cousins and find your Common Ancestors. This is very important – if you don’t have the ancestors in your Tree, you cannot expect to find a Common Ancestor with a Match.
- Set up a process for your autosomal DNA project. To determine Common Ancestors you have to share ancestry info with your DNA Matches – you’ll be sending (and receiving) a lot of emails and messages. You’ll want to keep track of what you do; to find info on your Matches; to remember the Common Ancestors you determine; new names, new emails, new links to Trees, etc., etc. You may want to use a spiral notebook as a Diary or Journal of your notes. Some people keep a notecard for each Match, or a folder. I now have over 3,000 matches at FTDNA and 23andMe, so I need something that can handle that many (and more) Matches. Many of us use a spreadsheet – read more to see how to set up one.
All of the above items can be set up and worked on while you are awaiting the results of your atDNA test. So like the ad: Just Do It! The rest of this article will focus on using a spreadsheet as a tool for atDNA I have found there are two fundamental uses of a spreadsheet for atDNA – it helps to understand both, particularly if you want to combine them into one spreadsheet – which I recommend.
- Management and Tracking
- Analyzing atDNA Segments – for Triangulation and Chromosome Mapping
- Management and Tracking is relatively easy – just download your list of Matches, and use this spreadsheet to keep track of them. A spreadsheet should be set up to match your management process, and new columns can always be added – just remember, you are probably the one who will have to enter the data into each column, so choose wisely. It’s also your spreadsheet, so set it up the way you want. Some suggested columns:
- Co (company): I use FF, 23, AD, Gm to indicate where the data is from
- Name (name of person who took the test – usually you; I use my initials, jvb)
- Match Name (two columns for first and last names, which permit a sort either way)
- Nickname (for Ancestry/GEDmatch/23andMe alternate names)
- POC (point of contact – sometimes someone else who manages the kit, or is the researcher)
- Send – when you send a message or email; e.g. e1/4/14
- Receive – when you receive a message or email [these help me remember and track]
- Notes – always a handy cell
- Date – I use the date the Match was made (a sort on this column lets me see the most recent)
Additional columns as you get the info:
- MRCA (Most Recent Common Ancestor) – I enter the 2 surnames of the couple [I also have column(s) for Ahnentafel numbers, helpful if you use them in your genealogy]
- Cousin (cousinship) – eg: 3C or 4C-1R, etc.- whatever works for you
- Tree – copy in the URL of any Tree or site for your Match’s Tree
- GEDmatch ID number (as you find them)
For this spreadsheet, you do not need to track Segments, cM, SNPs, etc. You are just trying to keep the information about your Matches in a handy place. I spent almost 2 years contacting Matches, sharing Trees, and determining Common Ancestors, before I moved to the next phase of using a spreadsheet for segments.
- Analyzing Segments – AFTER you have determined some MRCAs, and/or have some known close relatives tested (which give close MRCAs) you can start using a segment spreadsheet. A segment spreadsheet will let you sort your shared segments, and determine which segments are linked to which MRCAs (that’s why you need some MRCAs first). This same spreadsheet can also be used as a tool for advanced techniques such as Triangulation and Chromosome Mapping. Anyway – there are only a few columns needed to analyze segments:
- Match name (same as C above)
- Chr (chromosome number: 1-22 & X)
- Start point*
- End point*
* Because these numbers (between 1 and 250,000,000) are long, hard to read, etc., I round to tenths of a million, e.g. 143.7 (the units would be Mbp – Megabase pairs). Another reason to use millions, is because the segment end points are not precise anyway. Millions are much easier to read and analyze in a spreadsheet – for me; but, hey, it’s your spreadsheet, and you can enter the long numbers if you want. This is all you really need to analyze the segments (or to Triangulate or map Chromosomes). The concept is to sort your spreadsheet on Chr and StartPoint, and this will put all the segments in order within each Chr. From this sort, it is easy to see which segments overlap and can be analyzed for Triangulation. Many people will also want to add some other info to their spreadsheet:
- cM (centiMorgans) in the segment
- SNPs – in the segment
- Side (M for Maternal; P for Paternal) to indicate which parent this segment is from, as you determine this info [usually from determining the MRCA or having a Match InCommonWith a parent or other close relative]
- MRCA (see above)
NOTE: You need to decide which segments to include in a spreadsheet. To start, I’d recommend only segments over 7cM (many segments below this will be IBS – i.e. not from a specific ancestor). You will have many smaller segments at FTDNA, and some smaller at 23andMe, and any size you select at GEDmatch – I would only use those over 7cM. Once you have a lot of experience with segments, spreadsheets, Triangulation, etc., you may want to go back and see if some of the smaller segments “fit”. I’m not there yet. ALSO: some of your Matches (particularly the closer ones) will have more than one segment over 7cM, so you’ll need to enter them in separately – as separate rows – in your spreadsheet.
With the above info (A, B, C, D, G and H), you can create a nice graph of which ancestor gave you which DNA segment your 45 or 46 chromosomes using Kitty Cooper’s utility at:
Be sure to follow the instructions carefully, including saving a copy of your spreadsheet as a CSV file, and having a single header row with the titles I’ve listed above.
I have generally downloaded the above info from FTDNA, 23andMe, Ancestry and GEDmatch, and arranged & combined it into one spreadsheet. This then is my main atDNA tool – for tracking and analyzing. After the initial download, I generally add new Matches manually. But if a big update is necessary, I will download the info, arrange it to match the spreadsheet, and then add it to the Main spreadsheet (having first saved a copy of the Master, just in case). Sometimes (such as the change in segment data from Build 36 to 37), I’ll add the new info to the old; sort on names and then copy the changed data from the new to the old rows to preserve all my notes and send/receive dates, etc.
Comments and suggestions for improvements are welcomed. Segment analysis, Triangulation, Assigning segments to a side are topics for a different post; this one is just about spreadsheets.
Jim Bartlett firstname.lastname@example.org