The ENCODE project is the largest genetic breakthrough since the sequencing of the human genome. But what is it, and what does it mean? We at Medical Daily sifted through the information to share the most important parts of the Encyclopedia of DNA Elements (ENCODE) project.

1. ENCODE cost $196 million alone.


The National Human Genome Research Institute has tracked the costs associated with DNA sequencing performed by the sequencing centers it funds. Both the cost per raw megabase of DNA sequence and the cost per genome have decreased in the past 11 years. In the "Cost per Megabase of DNA Sequence" graph (above), the data reflect the cost of generating raw, unassembled sequence data.

The "Cost per Genome" graph (below) was produced using the same underlying data as the "Cost per Megabase of DNA Sequence" graph; it reflects an estimate of the cost of sequencing a human-sized genome rather than the actual costs of specific genome-sequencing projects. With the cost of production dramatically decreasing and the productivity of genome sequencing increasing, scientists can now focus on understanding the data produced in genome-wide studies.


2. DNA keeps you alive. DNA sequences may be responsible for gene regulation, disease onset and an individual's height. Regulatory sequences can serve as switches, turning genes on and off. This can determine whether someone is vulnerable to high blood pressure or even cardiovascular disease, the number one killer in America.

3. Not all genes are the same. The ENCODE project revealed the purpose of 80 percent of the human genome, most of which was thought to be junk. In fact, the idea of "junk DNA" could not be further from the truth. Researchers uncovered various types of genes and their purposes, including pseudogenes, fossil genes and dead genes. While around 1.5 percent of our DNA makes proteins, as much as 18 percent regulates the protein-makers, acting as switches. In fact, a gene can receive instructions from as many as a dozen of these so-called "switches." The switches act like a remote control, turning genes on and off, turning their activity up or down, and controlling whether a cell becomes part of a lock of hair or a kidney cell.

4. DNA is in 3-D. It had been thought that DNA's double helixes were linear, with genes all operating on the same line. The ENCODE project found that DNA is actually more 3-D than researchers had thought. DNA wraps around proteins named histones, like beads on a string. The strands are then twisted, folded, and looped around intricately in multiple dimensions. Interestingly, that means that, when a person writes out a sequence, genes that look like they are far away from one another may actually be right next to each other. That also means that some of the switches regulate the activity of genes that are far away from them.

5. Platelets. Dr. John Stamatoyannopoulos, lead author of a study examining the connection between gene regulation and disease, and his coauthors compared gene regulation data from ENCODE and other studies. Their aim was to observe common variants located in regulatory regions of the genome. The researchers discovered that one variant, which genome-wide studies had linked to platelet count, was part of the regulatory DNA that helps control distant genes involved in platelet production.

6. We know that 80 percent is functional, but we've only seen 10 percent of it. Researchers have found 4 million switches. That sounds like a big number, and it is. But, though we now know the purpose of 80 percent of the human genome, we have only examined 10 percent of it. In fact, 1.5 percent of the genome manufactures proteins; those are the portions of the genome that we have already seen. ENCODE looked at portions of the genome where proteins were stuck to DNA, indicating regulatory activity. The 4 million switches examined so far account for only 8.5 percent of the genome. Since the researchers believe that they have seen half of the sequences that create or regulate proteins, the ENCODE team estimates that the total percentage of manufacturers or regulators is 20 percent, double the 10 percent already observed. Aside from the switches and manufacturers, ENCODE also looked at how DNA is packaged and transcribed, bumping the share of known, functional activity all the way from 20 percent to 80 percent.
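For readers who want to check the math, the percentages in item 6 can be tallied in a few lines of Python. This is only a back-of-the-envelope sketch: the variable names are ours, the figures are the ones quoted above, and the "we have seen about half" doubling is the ENCODE team's estimate as reported here, not a measurement.

```python
# Percentages of the human genome, as quoted in this article.
protein_coding = 1.5       # DNA that manufactures proteins
switches_observed = 8.5    # the 4 million regulatory switches seen so far

# What has been directly examined: manufacturers plus observed switches.
examined = protein_coding + switches_observed

# Assumption reported in the article: the examined share is about half
# of all manufacturers and regulators, so the estimate is doubled.
estimated_makers_and_regulators = examined * 2

# Total functional share once packaging and transcription are included.
functional_total = 80.0

# The part of the functional share that comes from packaging and
# transcription rather than from manufacturers or switches.
packaging_and_transcription = functional_total - estimated_makers_and_regulators

# The share with no function assigned yet.
unassigned = 100.0 - functional_total

print(f"Examined so far: {examined} percent")
print(f"Estimated manufacturers and regulators: {estimated_makers_and_regulators} percent")
print(f"Added by packaging and transcription: {packaging_and_transcription} percent")
print(f"Still unassigned: {unassigned} percent")
```

The tally makes the gap plain: of the 80 percent called functional, only 10 percent rests on switches and protein-makers that have actually been examined.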

7. What's in the remaining 20 percent? According to researchers, it is not junk either. Ewan Birney, the project's Lead Analysis Coordinator and self-described "cat-herder-in-chief," explains that ENCODE only observed 147 types of cells, and the human body has a few thousand. A particular part of the genome may control a gene in one type of cell, but not in others. If every cell type is included, functions may emerge for the phantom proportion. "It's likely that 80 percent will go to 100 percent," Birney said. "We don't really have any large chunks of redundant DNA. This metaphor of junk isn't that useful." The next phase for researchers is to understand how these genes may interact with one another. They hope it will provide treatments for cancers and answers to questions that have so far gone unanswered.

8. We know what this could mean. The ENCODE project will be very important for uncovering future cures for troubling diseases, everything from autism to Crohn's disease to cancer. Only a small portion of diseases are caused by genetic errors or issues with transcription, but researchers have found that a great many diseases stem from regulatory activity. They found, for example, that rheumatoid arthritis, type 1 diabetes, lupus, and other autoimmune diseases all share the same regulatory activity. Though cures for these diseases, and others, may be a long way away, understanding the genetic underpinnings may lead to more satisfying answers.