In April 2003, scientists completed the historic Human Genome Project when they established the sequence of human genes (together known as the genome) for the very first time. Since then, researchers have been working with this blueprint for building a human being. However, reading the newly found schematic, comprehending it and understanding all that it implies, is another matter entirely and requires practice, experience, and comparative study. In other words, interpreting the genome is an art and not (entirely) a science. 23andMe, the personal genetics company, has advanced this new art by introducing an original algorithm called HaploScore, which improves detection of identity-by-descent (IBD) segments, the stretches of DNA that two people inherited from a common ancestor.

“Improved IBD detection and DNA phasing will allow all researchers to more accurately identify genetic relationships between distantly-related individuals and allow for improved ancestry reports within 23andMe,” said Dr. Cory McLean, 23andMe computational biologist and study author. His work, accomplished with the help of colleagues, appears in Molecular Biology and Evolution.

The Art of Interpreting DNA

Over time, scientists have become increasingly adept at interpreting the human genome. At this point, they handle it as if it were software code; they break the encoded information down into segments and apply special techniques and algorithms to better understand the meaning behind these stretches of genetic information. In particular, detecting IBD segments and measuring their lengths is fundamental to many of the new applications geneticists commonly explore; IBD segments have been used to identify genes linked to different diseases and traits, and also to estimate the heritability of these diseases and traits.

Yet IBD detection has been faulty at best, so 23andMe decided to explore existing algorithms in order to improve upon them and make it easier for people to understand their family bonds.

To begin their study of IBD detection, 23andMe researchers first measured the accuracy of existing IBD algorithms. They extracted 25,432 genotyped individuals of European ancestry, including 2,952 father-mother-child trios, from the company dataset. Then, the team used GERMLINE, a widely used IBD detection method, to detect IBD segments within this cohort. Next, they determined how often segments were detected and how often those detections were inaccurate.

What did they discover? False positives appeared in more than 67 percent of all cases.
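To make the idea of a trio-based accuracy check concrete, here is a minimal sketch in Python. It is not the study's actual procedure. It assumes one plausible criterion: a segment that a child shares with an unrelated person should also appear, at least in part, between that person and one of the child's parents, because the child's DNA comes from the parents. The record layout (`Segment`) and the function names (`overlaps`, `false_positive_rate`) are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    """A reported IBD segment between two people (hypothetical layout)."""
    person_a: str
    person_b: str
    chrom: int
    start: float  # position along the chromosome, e.g. in centimorgans
    end: float


def overlaps(x: Segment, y: Segment) -> bool:
    """True if two segments lie on the same chromosome and share any span."""
    return x.chrom == y.chrom and x.start < y.end and y.start < x.end


def false_positive_rate(segments, trios):
    """Estimate the fraction of child segments with no parental support.

    `trios` maps a child's id to a (father id, mother id) tuple.
    Assumption: a child's segment with another person counts as a likely
    false positive when neither parent shares an overlapping segment
    with that same person.
    """
    # Index all reported segments by the unordered pair of people involved.
    by_pair = {}
    for s in segments:
        by_pair.setdefault(frozenset((s.person_a, s.person_b)), []).append(s)

    checked = unsupported = 0
    for s in segments:
        # Either end of the segment may be a child in a known trio.
        for child, other in ((s.person_a, s.person_b), (s.person_b, s.person_a)):
            if child not in trios or other in trios[child]:
                continue  # not a trio child, or a child-parent segment
            checked += 1
            supported = any(
                overlaps(s, p)
                for parent in trios[child]
                for p in by_pair.get(frozenset((parent, other)), [])
            )
            if not supported:
                unsupported += 1
    return unsupported / checked if checked else 0.0
```

Given a list of reported segments and a mapping of children to their parents, the function returns the share of checked child segments that neither parent backs up. A real analysis would also have to account for segment length, phasing errors, and genotyping gaps, all of which this sketch ignores.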

To address this issue, the team wrote an entirely new algorithm, HaploScore, which more accurately ranks the likelihood that a stretch of DNA shared by two individuals was truly inherited from a common ancestor.

As the number of people who request genetic testing increases, 23andMe believes the use of IBD segments in genetic analyses will become more common. It follows, then, that the company's new HaploScore may be used more and more frequently. Today, for example, you ask for and discuss certain medical tests by name (most people understand their blood pressure readings and cholesterol scores). Similarly, geneticists imagine that people will one day ask for and understand specific DNA tests by name. “What did my HaploScore say?” may someday be as common a question as “How did my EKG turn out?”

Source: Durand EY, Eriksson N, McLean CY. Reducing pervasive false positive identical-by-descent segments detected by large-scale pedigree analysis. Molecular Biology and Evolution. 2014.