A large survey of human genetic variation, published today in the online version of the journal Science, shows that rare genetic variants are not so rare after all and offers insights into human diseases.
"I knew there would be rare variation but had no idea there would be so much of it," said the senior author of the research, John Novembre, an assistant professor of ecology and evolutionary biology and of bioinformatics at UCLA.
A team of life scientists studied 202 genes in 14,002 people. The human genome contains some 3 billion base pairs; the scientists studied 864,000 of these pairs. While this is only a small part of the genome, the sample of 14,002 people is one of the largest ever used in a human sequencing study.
"Our results suggest there are many, many places in the genome where one individual, or a few individuals, have something different," Novembre said. "Overall, it is surprisingly common that there is a rare variant in the population.
"This study doesn't tell us how to cure a particular disease but suggests that disease in general may be caused by rare variants, and if you're trying to find the genetic basis of disease, it's important to focus on those variants. Understanding the genetic basis of disease provides clues to how the diseases work and clues about how to treat them."
The scientists discovered one genetic variant every 17 bases, which was a dramatically higher rate than they expected, said Novembre, a population geneticist who is a member of UCLA's interdepartmental program in bioinformatics.
Most of the time, only one person has the genetic variant and the other 14,001 do not.
"We saw lots of that," he said. "We discovered there are many places in these 202 genes where there is variation and only a few individuals differ from the whole group, or only one differs. We also see evidence that a substantial fraction of these rare genetic variants appear to be deleterious in a long-term evolutionary sense and might impact disease."
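As a rough back-of-the-envelope check on the figures quoted above, the following Python sketch works out approximately how many variant sites "one variant every 17 bases" implies across 864,000 bases, and how rare a variant carried by a single person is. It assumes each person carries two copies of each gene (an assumption made here for illustration; it is not stated in the release), and it is not the researchers' own analysis.

```python
# Back-of-the-envelope arithmetic using figures quoted in the article.
BASES_SEQUENCED = 864_000   # base pairs studied per person (from the article)
BASES_PER_VARIANT = 17      # "one genetic variant every 17 bases" (from the article)
PEOPLE = 14_002             # sample size (from the article)

# Assumption for illustration: two copies of each gene per person.
CHROMOSOMES = 2 * PEOPLE

# Approximate number of variable sites implied by the quoted rate.
approx_variants = round(BASES_SEQUENCED / BASES_PER_VARIANT)

# Frequency of a variant observed on exactly one sampled chromosome.
singleton_frequency = 1 / CHROMOSOMES

print(f"approximate number of variant sites: {approx_variants:,}")
print(f"frequency of a variant seen on one chromosome: {singleton_frequency:.4%}")
```

Under these assumptions, a variant found in only one of the 14,002 participants has a sample frequency far below the 0.5 percent level at which earlier studies could reliably find variants.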
The research team included Daniel Wegmann, a former UCLA postdoctoral scholar in Novembre's laboratory and a co-first author of the study; Darren Kessner, a UCLA graduate student in the bioinformatics interdepartmental Ph.D. program; colleagues from the University of Michigan, Ann Arbor (in fields including human genetics and biostatistics); and geneticists from international health care company GlaxoSmithKline, including project leader Matthew Nelson. The UCLA life scientists were involved in the population genetic analysis of the data.
In the study, 10,621 people had one of 12 diseases, including coronary artery disease, multiple sclerosis, bipolar disorder, schizophrenia, osteoarthritis and Alzheimer's disease; 3,381 did not have any of the diseases.
"The large sample size allows us to see patterns with more clarity than ever before," Novembre said. "If rare variants are like distant stars, this kind of large sample size is like having the Hubble Telescope; it's allowing us to see more than before. We see a ton of rare variation, and these rare variants more often make changes to proteins than not. In that way, this study has important implications for the genetic basis of disease in humans. It's consistent with the idea that many diseases may be partly caused by rare variants."
Human population growth helps to explain the large number of genetic variants, the scientists said.
"The fact that we see so many rare variants is in part due to the fact that human populations have been growing very rapidly," Novembre said. "Because the human population has grown so much, the opportunity for mutations to occur has also grown. Some of the variants we are seeing are very young, dating to population growth since the invention of agriculture and even the Industrial Revolution; this growth has created many opportunities for mutation in the genome because there are so many transmissions of chromosomes from parent to child in large populations."
The scientists isolated and sequenced the pieces of DNA from the 202 genes.
They estimated mutation rates from population genetic data, which has only rarely been done before.
"We have been able to estimate mutation rates for each of the genes, which has been difficult to do with smaller sample sizes," Novembre said. "In future research, we can study mutation rates not just in these 202 genes, but genome-wide."
Sequencing technologies are advancing rapidly, he said. "What seemed like science fiction in the past is science today."
Rare genetic variants would not have been detectable in most previous studies, whose samples usually had fewer than 1,000 people.
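To illustrate why sample size matters so much here, the sketch below computes the chance that a variant shows up at least once in a sample, under a simple idealized model: every sampled chromosome carries the variant independently with probability equal to its population frequency, and each person contributes two chromosomes. The function name and the example frequencies are hypothetical choices for illustration, not values or methods from the study.

```python
def detection_probability(frequency: float, people: int) -> float:
    """Chance that a variant at the given frequency appears at least once.

    Idealized model (an assumption): two independently sampled
    chromosomes per person, each carrying the variant with
    probability `frequency`.
    """
    chromosomes = 2 * people
    # Probability that no sampled chromosome carries the variant,
    # subtracted from 1 to get "seen at least once".
    return 1.0 - (1.0 - frequency) ** chromosomes


# Hypothetical frequencies: 0.5%, 0.05% and 0.005%.
for freq in (0.005, 0.0005, 0.00005):
    small = detection_probability(freq, 1_000)
    large = detection_probability(freq, 14_002)
    print(f"frequency {freq:.3%}: "
          f"1,000 people -> {small:.1%}, 14,002 people -> {large:.1%}")
```

In this model, a variant at 0.5 percent is almost certain to appear in either sample, while a variant a hundred times rarer will usually be missed among 1,000 people but is more likely than not to appear among 14,002.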
In population genetics, it is typically difficult to estimate mutation rates separately from population sizes, but with very large sample sizes the two can be estimated separately, Novembre said.
"We estimate 202 mutation rates, one for each gene," he said. "We show that the mutation rate varies from gene to gene. Follow-up studies may be able to reveal more about what factors affect mutation rates."
Rare genetic variants are frequently geographically localized to small pockets around the globe rather than being widespread, Novembre said.
In the image accompanying this release, each vertical line represents one of the 202 genes. For each gene, the scientists plotted, at the top of the image, the number of genetic variants that have a frequency greater than 0.5 percent. Previous studies have been able to find most variants with frequencies above 0.5 percent.
"With our large sample size, we can detect variants at a frequency less than 0.5 percent, and we see all of these, which have never been seen before," Novembre said. "Previous studies have examined the tip of the iceberg of genetic variation, but there is all this rare variation that has been below the surface, below our threshold of detection. Now, with large sample sizes, we can see a more complete picture of human genetic diversity."
Changes in the genetic code can be "nonsynonymous" (they alter the amino acid sequence of the protein a gene encodes) or "synonymous" (they leave the protein sequence unchanged).
"We see many nonsynonymous changes amongst the rare variants, and these are plausibly affecting disease in humans, though in ways that are not yet well understood," Novembre said.
Novembre's research was funded by the Searle Foundation. The central area of interest of his laboratory is the development of theory and statistical methods for analyzing genomic-scale population genetic data.