Everybody following genealogy publications will have encountered the use of genetics in genealogy: hereditary diseases, genetic markers, etc.
Here we give an overview of what types of genetic data are important in genealogy.
Humans have 23 pairs of chromosomes, which contain their DNA (deoxyribonucleic acid). Each of us inherits one chromosome of each pair from the mother and the other from the father. From the mother one also inherits genetic material that lies outside the nucleus of the cell. This material is not part of any chromosome; it is called mitochondrial DNA (mtDNA) and is passed nearly unchanged from a mother to her children. Twenty-two of the chromosome pairs are called autosomes; the 23rd pair is referred to as the sex chromosomes. A female has two X chromosomes, one from each parent. A male inherits an X chromosome from his mother and a Y chromosome from his father.
At each generation, a child receives one chromosome of each pair from each parent. For the 22 autosomes, the chromosome a parent passes on is a random combination of that parent's own two copies. This process, called recombination, copies whole segments rather than individual locations, and the DNA that is not chosen is discarded. This is why two siblings share a significant, identifiable amount of DNA without having identical copies. The sex chromosomes behave differently: a father passes his single X to his daughters and his Y to his sons essentially unchanged, while the X a mother passes on is recombined from her two X chromosomes. In a male, tests can distinguish the X from the Y, and thus identify genetic material from the mother and from the father. In a female, the test reads both X chromosomes but cannot by itself tell which one was donated by each parent.
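The recombination step can be illustrated with a toy model. The sketch below assumes a chromosome of 100 positions, labels each position by the grandparent it came from, and draws a fixed number of crossover points uniformly at random. Real crossovers are neither fixed in number nor uniformly spread, so this only illustrates why siblings share segments but not whole copies; the function names are hypothetical.

```python
import random


def make_gamete(copy_a, copy_b, n_crossovers, rng):
    """Build the chromosome a parent passes on from the parent's two copies.

    Toy model (assumption): crossover points are drawn uniformly at random,
    and at each point the copy being read switches to the other one.
    """
    length = len(copy_a)
    points = sorted(rng.sample(range(1, length), n_crossovers))
    # Start reading from a randomly chosen copy.
    current, other = (copy_a, copy_b) if rng.random() < 0.5 else (copy_b, copy_a)
    gamete, start = [], 0
    for point in points + [length]:
        gamete.extend(current[start:point])  # copy a whole segment
        current, other = other, current      # switch copies at the crossover
        start = point
    return gamete


def shared_fraction(x, y):
    """Fraction of positions at which two chromosomes carry the same origin label."""
    return sum(a == b for a, b in zip(x, y)) / len(x)


rng = random.Random(42)
# Label every position of the mother's two copies by the grandparent it came from.
maternal_grandfather = ["GF"] * 100
maternal_grandmother = ["GM"] * 100

# Each sibling receives an independently recombined copy from the mother.
sibling_1 = make_gamete(maternal_grandfather, maternal_grandmother, 2, rng)
sibling_2 = make_gamete(maternal_grandfather, maternal_grandmother, 2, rng)
print(shared_fraction(sibling_1, sibling_2))
```

Running it with different seeds gives different shared fractions, which mirrors how the amount of DNA two siblings share varies from pair to pair.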
The other process that can alter DNA is mutation. Mutation simply means change and does not imply any negative consequence, although an accumulation of mutations can sometimes be harmful. Significant mutations are quite slow to appear, perhaps one or two every few generations. Without mutations it would be impossible to distinguish ancestral lines, because we would all carry identical DNA. Fast-mutating markers allow researchers to distinguish family groups over genealogical timeframes, from several hundred to a couple of thousand years. Slow mutations, such as those that define mtDNA and Y-chromosome haplogroups, allow researchers to distinguish population groups and trace ancient migration patterns, e.g. the Saxon migration into Britain.
There are many privacy issues with storing genetic information, so caution is advised. Some information is harmless (e.g. DYS information, which comes from the non-coding "junk" part of your DNA that is not used as a blueprint), while other information can be dangerous to publish (e.g. hereditary diseases).
Even genetically harmless information can infringe on privacy, because it can prove that two people are not related, which in itself can cause trouble (divorce issues, etc.).
Some tips if you want to store this information:
- set this information as private
- never include private information in reports you publish
- if you share information with other researchers, only share the public data
For harmless genetic data, it is useful to publish the data anonymized on a public forum. You can do that, e.g., here on this wiki, but note that your account details will be visible in the page history, so you might consider sending it to one of the administrators, or creating a separate login for this purpose.
For example, DYS information is useful for relating family branches, so publishing 'last name, region of birthplace, DYS codes' gives other researchers a way to see how closely they are related to you.
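A minimal sketch of preparing such a record for publication, assuming a person record stored as a Python dictionary with hypothetical field names (`surname`, `birth_region`, `dys`, and so on). Only the fields named in the text are kept, in line with the tips above on sharing only public data. The `surname;region;markers` output format is an assumption, not an established exchange format, and the names and allele values are invented.

```python
# Fields assumed safe to publish: surname, region of birthplace and DYS values.
PUBLIC_FIELDS = ("surname", "birth_region", "dys")


def anonymize(record):
    """Return a new record that keeps only the public fields."""
    return {key: record[key] for key in PUBLIC_FIELDS if key in record}


def to_shared_line(public_record):
    """Format as 'surname;region;DYS=allele,...' (a hypothetical format)."""
    markers = ",".join(
        f"{dys}={allele}" for dys, allele in sorted(public_record["dys"].items())
    )
    return f"{public_record['surname']};{public_record['birth_region']};{markers}"


record = {
    "given_name": "Jan",               # private: dropped
    "birth_date": "1950-03-01",        # private: dropped
    "notes": "private medical note",   # private: dropped
    "surname": "Peeters",
    "birth_region": "Limburg",
    "dys": {"DYS393": 13, "DYS390": 24, "DYS19": 14},
}
print(to_shared_line(anonymize(record)))  # Peeters;Limburg;DYS19=14,DYS390=24,DYS393=13
```

Whitelisting the public fields, rather than deleting known private ones, means a field added to the record later stays private by default.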
The Y chromosome plays an important role. It is inherited only from father to son, and hence is in theory identical to the Y chromosome of the father, apart from possible mutations. It is possible to have the Y chromosome of a male investigated commercially and receive marker information. A marker is a location on the Y chromosome whose structure is investigated and cataloged. Normally markers lie in the non-coding ("junk") part of the DNA, so this information is in itself genetically harmless.
Y-line STR Markers
An STR (short tandem repeat) marker has a specific place on the Y chromosome, indicated by a DYS# code (DNA Y-chromosome Segment), and a specific value, called the allele, which is essentially the number of times a short DNA sequence is repeated at that place. Such repeats are common in the non-coding part of human DNA.
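To make the notion of an allele concrete, the sketch below counts the longest run of back-to-back copies of a repeat motif in a DNA string. The 4-base motif `GATA` and the sequence are hypothetical; in practice the laboratory determines the allele values and reports them to you.

```python
def count_tandem_repeats(sequence, motif):
    """Longest run of back-to-back copies of `motif` in `sequence`.

    For an STR marker, this run length is what the allele value reports.
    """
    if not motif:
        raise ValueError("motif must not be empty")
    step = len(motif)
    best = 0
    for start in range(len(sequence)):
        run, pos = 0, start
        # Count consecutive copies of the motif beginning at `start`.
        while sequence[pos:pos + step] == motif:
            run += 1
            pos += step
        best = max(best, run)
    return best


# Hypothetical read in which the GATA motif is repeated 5 times in a row.
read = "CCTA" + "GATA" * 5 + "TTGC"
print(count_tandem_repeats(read, "GATA"))  # 5
```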
About 100 DYS markers are already known; typically, 12 to 37 of them are tested nowadays.
How to use DYS numbers
By comparing a person's DYS profile with that of another male, one can judge whether they belong to the same direct paternal family (identical DYS profiles), or estimate, from the number of mutations, how many generations ago their common ancestor lived. To do this, make a table of the people who have a DYS profile, listing the differences in allele values between them.
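A minimal sketch of such a table in Python, assuming each profile is a dictionary from DYS marker name to allele value. The marker names are real DYS codes but the allele values are invented. Only markers tested for both men are compared, and the total is the sum of absolute allele differences, which is one common way of counting; other conventions exist.

```python
def difference_table(profiles):
    """Compare every pair of DYS profiles.

    `profiles` maps a name to {DYS marker: allele}. Returns one row per pair:
    (name_a, name_b, {marker: absolute allele difference}, total difference).
    """
    names = sorted(profiles)
    rows = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            # Only markers tested for both men can be compared.
            shared = profiles[a].keys() & profiles[b].keys()
            diffs = {m: abs(profiles[a][m] - profiles[b][m]) for m in sorted(shared)}
            rows.append((a, b, diffs, sum(diffs.values())))
    return rows


# Invented allele values for three men.
profiles = {
    "A": {"DYS393": 13, "DYS390": 24, "DYS19": 14},
    "B": {"DYS393": 13, "DYS390": 25, "DYS19": 14},
    "C": {"DYS393": 12, "DYS390": 23, "DYS19": 15},
}
for a, b, diffs, total in difference_table(profiles):
    print(a, b, total)
# A B 1
# A C 3
# B C 4
```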
The fewer the differences, the more closely the two men are related.
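Turning a difference count into a number of generations needs a mutation rate. Under simple assumptions (each of n markers mutates independently at an average rate μ per generation, every mutation changes an allele by one step, and no mutation is later reversed), two men whose common paternal ancestor lived g generations ago are expected to differ by about d ≈ 2·g·n·μ, because both lines have accumulated mutations for g generations. Solving for g gives a rough point estimate. The rate of 0.004 used below is an illustrative assumption, not a measured value; mutation rates differ considerably between markers, and a real analysis would give a range rather than a single number.

```python
def estimate_generations(distance, n_markers, mutation_rate):
    """Rough point estimate of generations back to a common paternal ancestor.

    Assumes each of `n_markers` mutates independently at `mutation_rate` per
    generation, one step at a time, with no back-mutations. Both men descend
    g generations from the ancestor, so distance ~ 2 * g * n_markers * rate.
    """
    if n_markers <= 0 or mutation_rate <= 0:
        raise ValueError("n_markers and mutation_rate must be positive")
    return distance / (2 * n_markers * mutation_rate)


# Two men differing by 2 steps over 37 markers, with an assumed rate of 0.004.
g = estimate_generations(distance=2, n_markers=37, mutation_rate=0.004)
print(round(g, 1))  # 6.8
```

A distance of 0 gives an estimate of 0 generations, which only means that no mutation has been observed; identical profiles are compatible with a common ancestor several generations back.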
Y-DNA Haplogroups
Less specific than STR markers, a haplogroup is defined by certain SNP (single-nucleotide polymorphism) mutations that occur infrequently. It is less useful for estimating degrees of relationship between individuals, but useful in a broader way, since it deals with whole populations. A Y-DNA haplogroup will be the same for every male in the direct paternal line (father, father's father, father's father's father, etc.), except when a "non-paternal event" occurs (i.e. the legal and recognised father did not make the genetic contribution).
One way to use this information in a genealogy program is to let the user record the haplogroup (I2, R1b, J1a1, etc.) for an individual and cascade that value upwards through the male line, with a way to break this chain, possibly marked with a flag or visual cue.
An example: a researcher tests for Y-DNA haplogroup and gets the result R1b1. The researcher fills in the "Y-DNA Hg" attribute with "R1b1", and the value is cascaded upwards (and downwards) through all the relevant lines. Downwards doesn't apply only to direct progeny: all the cousins who descend through a paternal line from a common ancestor should also be marked R1b1.
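A minimal sketch of the cascade, using a hypothetical `Person` class rather than the data model of any real genealogy program. It walks up the male line to the most remote known paternal ancestor, then back down through sons only, so cousins descending through a paternal line are included while daughters are not. The `y_line_break` flag is an assumed way of marking the chain break (a non-paternal event) mentioned above; existing values that disagree are reported rather than overwritten.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Person:
    """Minimal stand-in for a person record (hypothetical, not a real program's API)."""
    name: str
    male: bool
    father: Optional["Person"] = field(default=None, repr=False)
    children: List["Person"] = field(default_factory=list, repr=False)
    y_haplogroup: Optional[str] = None
    # Hypothetical flag: True means the Y-line link to the recorded father is
    # broken (a known or suspected non-paternal event).
    y_line_break: bool = False


def add_child(father, child):
    child.father = father
    father.children.append(child)


def paternal_line_members(person):
    """All males who should share `person`'s Y chromosome, as far as the tree knows."""
    top = person
    # Walk up the male line to the most remote known ancestor or to a break.
    while top.father is not None and not top.y_line_break:
        top = top.father
    members, stack = [], [top]
    # Walk back down through sons only, without crossing flagged breaks.
    while stack:
        man = stack.pop()
        members.append(man)
        stack.extend(c for c in man.children if c.male and not c.y_line_break)
    return members


def cascade_y_haplogroup(tested, haplogroup):
    """Give the tested man's paternal-line group the haplogroup.

    Disagreeing values are kept and those people are returned for review.
    """
    conflicts = []
    for man in paternal_line_members(tested):
        if man.y_haplogroup in (None, haplogroup):
            man.y_haplogroup = haplogroup
        else:
            conflicts.append(man)
    return conflicts


grandfather = Person("Grandfather", male=True)
father = Person("Father", male=True)
uncle = Person("Uncle", male=True)
aunt = Person("Aunt", male=False)
tester = Person("Tester", male=True)
cousin = Person("Cousin", male=True)
add_child(grandfather, father)
add_child(grandfather, uncle)
add_child(grandfather, aunt)
add_child(father, tester)
add_child(uncle, cousin)

conflicts = cascade_y_haplogroup(tester, "R1b1")
print(cousin.y_haplogroup, aunt.y_haplogroup, conflicts)  # R1b1 None []
```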
Six months later a cousin (who should also be R1b1) takes the same test and gets haplogroup J2. The initial assertion is clearly wrong, and by going upwards the genealogy software could pinpoint the most recent common paternal-line ancestor of the two men and flag the discrepancy. That ancestor need not be recent: since the haplogroup has been cascaded to individuals who died long ago, the ancestor at which the lines conflict could have lived in the 16th century.
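The pinpointing step amounts to finding the first man who appears on both testers' paternal lines. The sketch below does that with a minimal, hypothetical `Person` class and adds a note to that ancestor when the two results differ; the non-paternal event (or the faulty assertion) then lies somewhere on one of the two lines below him. It treats any two different labels as a conflict; because haplogroup names are hierarchical (R1b1 is a subgroup of R1b), a more careful check would first compare results at the same level of detail.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Person:
    """Minimal stand-in for a person record (hypothetical, not a real program's API)."""
    name: str
    father: Optional["Person"] = field(default=None, repr=False)
    y_haplogroup: Optional[str] = None
    flags: List[str] = field(default_factory=list)


def paternal_line(person):
    """The person, his father, his father's father, ... as far as recorded."""
    line = []
    while person is not None:
        line.append(person)
        person = person.father
    return line


def most_recent_common_paternal_ancestor(a, b):
    """First man on a's paternal line who also appears on b's, or None."""
    b_line = set(paternal_line(b))  # eq=False: people are compared by identity
    for ancestor in paternal_line(a):
        if ancestor in b_line:
            return ancestor
    return None


def flag_haplogroup_conflict(a, b):
    """Flag the common paternal-line ancestor of two tested men who disagree."""
    if a.y_haplogroup == b.y_haplogroup:
        return None
    ancestor = most_recent_common_paternal_ancestor(a, b)
    if ancestor is not None:
        ancestor.flags.append(
            f"Y-DNA conflict: {a.name} is {a.y_haplogroup}, {b.name} is {b.y_haplogroup}"
        )
    return ancestor


# Two lines descending from one (hypothetical) 16th-century ancestor.
ancestor = Person("Ancestor, born 1540")
son_1 = Person("Son 1", father=ancestor)
son_2 = Person("Son 2", father=ancestor)
tester = Person("Tester", father=son_1, y_haplogroup="R1b1")
cousin = Person("Cousin", father=son_2, y_haplogroup="J2")

flagged = flag_haplogroup_conflict(tester, cousin)
print(flagged.name, flagged.flags)
# Ancestor, born 1540 ['Y-DNA conflict: Tester is R1b1, Cousin is J2']
```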
The relevance of this method is not limited to finding mismatches: the much more common situation is that different tests from different lines yield the same result. It even gives a good idea of a person's haplogroup without that person taking a test.
Mitochondrial DNA Haplogroups
The same rationale and possibilities as described for Y-DNA haplogroups apply, with the difference that mitochondrial DNA (mtDNA) is transmitted through the female line. Contrary to Y-DNA, both males and females can test for it: a mother passes her mtDNA to all her children, so a male's mtDNA is inherited entirely from his mother, even though only daughters pass it on.
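The cascade described for Y-DNA carries over to mtDNA with one change: every child of a woman in the group receives her mtDNA, but only daughters pass it on. A minimal sketch, again with a hypothetical `Person` class and an invented haplogroup value:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Person:
    """Minimal stand-in for a person record (hypothetical, not a real program's API)."""
    name: str
    female: bool
    mother: Optional["Person"] = field(default=None, repr=False)
    children: List["Person"] = field(default_factory=list, repr=False)
    mt_haplogroup: Optional[str] = None


def add_child(mother, child):
    child.mother = mother
    mother.children.append(child)


def maternal_line_members(person):
    """Everyone expected to carry `person`'s mtDNA, as far as the tree knows."""
    top = person
    # Walk up through mothers to the most remote known direct maternal ancestor.
    while top.mother is not None:
        top = top.mother
    members, stack = [], [top]
    while stack:
        carrier = stack.pop()
        members.append(carrier)
        for child in carrier.children:
            if child.female:
                stack.append(child)    # daughters carry it and pass it on
            else:
                members.append(child)  # sons carry it but do not pass it on
    return members


def cascade_mt_haplogroup(tested, haplogroup):
    """Set the haplogroup on the maternal-line group; return disagreeing people."""
    conflicts = []
    for person in maternal_line_members(tested):
        if person.mt_haplogroup in (None, haplogroup):
            person.mt_haplogroup = haplogroup
        else:
            conflicts.append(person)
    return conflicts


grandmother = Person("Grandmother", female=True)
mother = Person("Mother", female=True)
aunt = Person("Aunt", female=True)
uncle = Person("Uncle", female=False)
tester = Person("Tester", female=False)  # a man can test his mtDNA too
cousin = Person("Cousin", female=False)
add_child(grandmother, mother)
add_child(grandmother, aunt)
add_child(grandmother, uncle)
add_child(mother, tester)
add_child(aunt, cousin)

conflicts = cascade_mt_haplogroup(tester, "H1")
print(cousin.mt_haplogroup, uncle.mt_haplogroup, conflicts)  # H1 H1 []
```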
In recent times, it has become possible to determine the cause of death with a high degree of certainty. Also, many hereditary conditions are now known, and people talk about them openly. Many people will store this information in a genealogical application, e.g. as an attribute (set to private, of course).
Today's information can shed light on some strange facts in your family tree's history, e.g. many early deaths among the males.
To be able to extrapolate known facts of today to the past, you need some knowledge:
- is it inherited from the father or mother?
- what is the possibility of inheriting the trait?
- under what conditions does the trait show itself?
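For a trait controlled by a single gene with simple Mendelian inheritance, the second question can be answered by enumerating the allele combinations a child can receive. The sketch below assumes that simple model and full penetrance (the trait always shows when the genotype is present), so it ignores the third question; the genotype notation is hypothetical. The X-linked case also touches the first question: a son's only X chromosome comes from his mother, so a father cannot pass an X-linked trait to his sons.

```python
from fractions import Fraction
from itertools import product


def child_affected_probability(parent_1, parent_2, dominant):
    """Probability that a child shows a single-gene autosomal trait.

    Genotypes are two-letter strings: "A" is the trait allele, "a" the normal
    one (hypothetical notation). Each parent passes on one of its two alleles
    with equal probability, so the four combinations are equally likely.
    """
    combinations = list(product(parent_1, parent_2))
    if dominant:
        # One copy of the trait allele is enough.
        affected = sum(1 for pair in combinations if "A" in pair)
    else:
        # Both copies must carry the trait allele.
        affected = sum(1 for pair in combinations if pair == ("A", "A"))
    return Fraction(affected, len(combinations))


def son_affected_probability_x_linked(mother):
    """X-linked recessive trait: a son's only X comes from his mother.

    `mother` is her two X alleles, e.g. "Aa" for a carrier.
    """
    return Fraction(mother.count("A"), 2)


print(child_affected_probability("Aa", "aa", dominant=True))   # 1/2
print(child_affected_probability("Aa", "Aa", dominant=False))  # 1/4
print(son_affected_probability_x_linked("Aa"))                 # 1/2
```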
Privacy: keep this information private. Although you might not mind letting the world know some of these details, your cousin twice-removed might think otherwise.
Family trees for genetic research
Many research institutions need extensive, reliable family trees and employ genealogists for this purpose.
Such trees allow them to investigate traits and deviations across a broad test population, while knowing how closely the subjects are related.