The human genome: where are we?

Roa Zubia, Guillermo

Elhuyar Zientzia

The main scientific achievement of 2000 was the sequencing of the human genome. However, the completion of this work has already been announced three times: in April and June 2000 and in February of this year. For those who know how to read the fine print, it is clear that genome sequencing is not finished. So why does it appear in the media so often? What is it for? Where does this research stand?

On 13 February the world's most prestigious scientific journals, Nature and Science, each published a special issue presenting a draft of the human genome: the draft of the public International Human Genome Project and that of the private company Celera Genomics, respectively. On the eve of publication, presentations were held in five cities around the world to announce the sequencing to the press.

The history of this gigantic undertaking has been told many times. The public project was launched in 1990 with a strategy suited to the technology of the time, and the work was expected to take about fifteen years. As new technologies were developed, faster methods became available, and the deadline for reading the entire sequence was brought forward to 2003 at the latest.

In 1998 a researcher from the public project devised an even faster method and founded the private company Celera Genomics with the same goal. After testing the new method on the genomes of other organisms (sequencing the genome of the fruit fly Drosophila melanogaster), the company began sequencing the human genome. In June 2000 the two organizations announced their collaboration.

General map

Each organization has presented its own draft, so there are two sketches of the same genome. Two different figures have been published for the number of genes and for the size of the genome, a consequence of the two different approaches used. Broadly speaking, the results are qualitatively similar, but it must be kept in mind that the two techniques are not directly comparable.

The strategy chosen by the public project relies on a map drawn up beforehand: the map is completed first, and only then is the sequence determined. Although slow, this technique has given good results. It is a clone-based method.

Scientific American Magazine July 2000.

Many copies of the genome are cut up with restriction enzymes, which cut DNA at specific sites. To keep the fragments from becoming too small, the reaction is stopped early. The result of this first step is a set of fragments of some 150,000 base pairs each. These fragments are inserted into bacterial artificial chromosomes (BACs). When the bacteria reproduce, they generate numerous copies of each DNA fragment: the clones.
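As a rough illustration of this first step, the following Python sketch cuts a DNA string at a recognition site. The function name `digest` and all of its parameters are hypothetical. The default site GAATTC, cut after the first G, is that of the well-known enzyme EcoRI, used here only as an example because the article does not say which enzymes were employed. The `cut_probability` parameter is a crude stand-in for stopping the reaction early, so that some sites stay uncut and the fragments remain large.

```python
import random


def digest(dna, site="GAATTC", cut_offset=1, cut_probability=1.0, rng=None):
    """Cut `dna` at occurrences of `site`.

    The cut falls `cut_offset` bases into the site. Each site is cut with
    probability `cut_probability` (1.0 = complete digestion, lower values
    imitate a reaction that is stopped early).
    """
    rng = rng or random.Random(0)  # fixed seed so the toy example is repeatable
    fragments = []
    start = 0
    pos = dna.find(site)
    while pos != -1:
        if rng.random() < cut_probability:
            cut = pos + cut_offset
            fragments.append(dna[start:cut])
            start = cut
        pos = dna.find(site, pos + 1)
    fragments.append(dna[start:])  # whatever is left after the last cut
    return fragments


toy_dna = "AAGAATTCCCGAATTCTT"
print(digest(toy_dna))                       # ['AAG', 'AATTCCCG', 'AATTCTT']
print(digest(toy_dna, cut_probability=0.0))  # ['AAGAATTCCCGAATTCTT']
```

Whatever the cutting probability, joining the fragments in order gives back the original string, which is the property the later mapping and assembly steps depend on.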

These clones are then treated with restriction endonucleases to obtain small fragments. By working out which fragments recur from one clone to another, the "physical map" of the original genome is built. From there, every BAC is broken into pieces and each piece is sequenced. The map makes it possible to put the pieces in order and so reconstruct the genome sequence.

The method used by Celera Genomics does not rely on a prior map. Small clones are prepared directly from the original DNA and sequenced. This route is much faster, but once most of the work is done it becomes much harder to fill the remaining gaps, because there is no guarantee that every part of the original DNA was among the pieces sequenced.
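The idea of rebuilding a sequence from small pieces without a map, and the problem of gaps, can be shown with a toy example. The sketch below is a simple greedy overlap merge written for illustration; it is not Celera's actual assembly software, and the names `overlap`, `greedy_assemble` and `min_len` are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that is also a prefix of `b`.

    Overlaps shorter than `min_len` are ignored and reported as 0.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the two reads with the largest overlap.

    Returns the list of pieces (contigs) left when no overlaps remain.
    """
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return reads
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)


# Three overlapping reads taken from the toy sequence ATGGCGTACGTTAGC
print(greedy_assemble(["ATGGCGTA", "CGTACGTT", "ACGTTAGC"]))
# ['ATGGCGTACGTTAGC']

# If the middle of the sequence was never sampled, a gap remains
print(greedy_assemble(["ATGGCGTA", "TTAGC"]))
# ['ATGGCGTA', 'TTAGC']
```

The second call shows the weakness described above: when no read covers a region, the pieces on either side cannot be joined, and without a map there is nothing to say how they should be ordered.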

Genome structure

The drafts show large stretches of DNA that do not encode proteins. In fact, the genetic legacy left by parasites is enormous. Everything that is not a gene has been called “junk DNA,” but it must be acknowledged that these long stretches of DNA may have some function that is not yet known.

The press, for its part, has made much of the fact that the number of genes is lower than expected. According to the public project there are about 31,000 genes; according to Celera, about 39,000. But before accepting either figure, one has to look at how the genes were counted.

Both organizations have used computer programs that search for genes. These programs take the sequences of genes that have already been identified as their basis. However, the method is known to miss some genes, so a correction factor has been added to the result of the computer count. Thus, for example, the public project has "detected" about 24,500 genes and has acknowledged that some 6,800 more have not been found. In total there would be approximately 31,000 genes. Following similar calculations, Celera Genomics has published a figure of about 39,000.
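The arithmetic behind the public project's figure can be written out in a few lines of Python. The function name `corrected_gene_count` is hypothetical, and the only inputs are the two numbers quoted above; how the estimate of missed genes was itself obtained is not described in the article and is not modeled here.

```python
def corrected_gene_count(detected, estimated_missed):
    """Add the estimated number of missed genes to the detected ones.

    Returns the corrected total and the share of that total that was
    never actually detected by the gene-finding programs.
    """
    total = detected + estimated_missed
    missed_share = estimated_missed / total
    return total, missed_share


total, missed_share = corrected_gene_count(24_500, 6_800)
print(total)                  # 31300
print(round(total, -3))       # 31000, the figure usually quoted
print(f"{missed_share:.1%}")  # 21.7%
```

In other words, roughly one gene in five of the quoted total is an estimate rather than a detection, which is why the figure should be read with caution.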

DNA organization within the cell.
PNS

These numbers are only provisional. According to the German scientists Peer Bork and Richard Copley, writing in the journal Nature, the figures could still change considerably. Moreover, the number of genes is not the only characteristic that defines a species. Vertebrates have not had to develop specific genes in order to become vertebrates. The function of each gene and the complexity of replication also play a part in nature's ability to generate biodiversity. Comparing the number of genes in the mouse genome with the number in ours need not be very informative.

Old ideas and new doubts

In general, a gene encodes a protein. This has been the accepted view to date, but biochemists are gradually investigating other possibilities. Human genes are not continuous. In the DNA molecule, the parts that encode the protein are interrupted and resume further along. The intervening sequences are called introns. The function of introns is not yet understood. However, they too are transcribed, so the messenger RNA must "mature" before moving out to the cytoplasm.
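The removal of introns from the transcribed RNA can be pictured with a small Python sketch. It is only an illustration: the sequence is invented, the function names `build_transcript` and `splice` are hypothetical, and exons are written in upper case and introns in lower case purely to make them easy to tell apart.

```python
def build_transcript(segments):
    """Join (kind, sequence) segments into one transcript.

    Returns the full transcript and the (start, end) positions of its exons.
    """
    transcript = ""
    exon_positions = []
    for kind, seq in segments:
        if kind == "exon":
            exon_positions.append((len(transcript), len(transcript) + len(seq)))
        transcript += seq
    return transcript, exon_positions


def splice(transcript, exon_positions):
    """Keep only the exon stretches, in order, dropping the introns."""
    return "".join(transcript[start:end] for start, end in exon_positions)


segments = [
    ("exon", "AUGGCC"),
    ("intron", "guaagcag"),
    ("exon", "UUUGGA"),
    ("intron", "guccacag"),
    ("exon", "UAA"),
]
pre_mrna, exons = build_transcript(segments)
print(pre_mrna)                 # AUGGCCguaagcagUUUGGAguccacagUAA
print(splice(pre_mrna, exons))  # AUGGCCUUUGGAUAA
```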

The more introns a gene has, the more likely it is to give rise to different messenger RNAs. Little is known about this, but it has been shown to be related to the complexity and diversity of proteins. The human genome has a high frequency of introns, higher than in any other genome we know. This means that the diversity of messenger RNAs is also very large.
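Why more introns can mean more messenger RNAs can be seen with a simple count. The sketch below assumes a deliberately simplified model that is not taken from the article: the first and last exons are always kept, and each exon in between is independently either kept or skipped. Under that assumption a gene with n introns (n + 1 exons) can yield up to 2^(n-1) different messengers. The function name `splice_variants` is hypothetical.

```python
from itertools import product


def splice_variants(exons):
    """List every messenger obtainable by keeping or skipping internal exons.

    Simplified model: the first and last exons are always present, and the
    order of the exons never changes. Expects at least two exons.
    """
    first, *internal, last = exons
    variants = []
    for keep in product([True, False], repeat=len(internal)):
        chosen = [exon for exon, kept in zip(internal, keep) if kept]
        variants.append([first, *chosen, last])
    return variants


for variant in splice_variants(["E1", "E2", "E3", "E4"]):
    print("-".join(variant))
# E1-E2-E3-E4
# E1-E2-E4
# E1-E3-E4
# E1-E4
```

Real genes do not use every combination, so this count is an upper bound within the model rather than a prediction.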

Introns probably also play a part in the regulation and activation of genes. Confirming this requires studying the interaction between genes that lie far apart on the DNA chain, and therefore the three-dimensional position and organization of nucleic acids within the nucleus. An intriguing study was recently published on the structure of the molecular motor that viruses use to pack DNA into their protein shell. Studying how this molecule works could shed light on the topology of DNA packing. Many lines of genome research remain open.

P53, a protein that binds to DNA.

Clearly, before going any further the sequence itself must first be completed. Those involved also acknowledge that the technology used has its limits. Among other things, the heterochromatic component of the genome has been left out of the analysis from the beginning, since this component is not stable in the system used. Geneticists have assumed that it is a region with few genes, but that too remains to be seen. In any case, scientists can already start working with the draft and, if only to satisfy their own curiosity, anyone can look at the draft published by the public project on the web at http://genome.cse.ucsc.edu.

Sponsors
Eusko Jaurlaritzako Industria, Merkataritza eta Turismo Saila (Basque Government Department of Industry, Trade and Tourism)