Genetics has undoubtedly been the leading topic in science in recent years. Progress in deciphering the genome has extended the influence of genetics to every field. The research presented here, which combines computer science and genetics, is an example of this.
Pedro Larrañaga and Iñaki Inza, from the Computer Science Faculty of San Sebastian, have presented their research in bioinformatics to us. Today, gene chips make it possible to obtain a numerical value for each gene. When such a chip is applied to a group of patients, the resulting database can be displayed as a color map, in which the intensity and color of each position encode the information for one gene of one patient. By applying statistical techniques to these databases, the researchers aim to detect which genes may cause a specific disease.
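To make the idea of a database viewed as a color map concrete, here is a minimal sketch in Python. It assumes one row per patient and one column per gene, and it uses a green-to-red scale; the function name `to_color_map`, the sample values, and the color scheme are illustrative assumptions, not details taken from the research.

```python
def to_color_map(matrix):
    """Turn a patients x genes table of numbers into (r, g, b) colors.

    Assumption: each gene (column) is rescaled to the range 0..1 on its own,
    and low values are drawn green, high values red.
    """
    n_genes = len(matrix[0])
    lows = [min(row[g] for row in matrix) for g in range(n_genes)]
    highs = [max(row[g] for row in matrix) for g in range(n_genes)]

    colors = []
    for row in matrix:
        color_row = []
        for g, value in enumerate(row):
            span = highs[g] - lows[g]
            # A gene with the same value in every patient gets a neutral level.
            level = (value - lows[g]) / span if span else 0.5
            color_row.append((round(255 * level), round(255 * (1 - level)), 0))
        colors.append(color_row)
    return colors


# Hypothetical values: 3 patients (rows) x 3 genes (columns).
expression = [
    [0.2, 5.1, 3.3],
    [1.4, 4.8, 3.3],
    [0.8, 2.0, 3.3],
]
colors = to_color_map(expression)
```

Each entry of `colors` is one cell of the map: the row identifies the patient and the column identifies the gene.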
In contrast to previous practice, all the genes in the databases have been analyzed together. Until now, every gene in a database was studied, but without a global perspective, that is, without taking into account the interactions between genes. Now, thanks to advances in computer science, a large number of genes can be analyzed as a whole.
Pedro and Iñaki have worked with databases containing between 2,000 and 7,000 genes for about 65 people. Studying every combination of genes would be an enormous task, so simplifying methods known as heuristics are used. Heuristics are not merely unavoidable: it has been shown in recent years that discarding genes actually produces better results. In particular, probabilistic models for predicting a disease achieve a higher accuracy rate. The key lies in designing these heuristic models correctly, since they must anticipate which of the thousands of genes should be discarded.
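To see why heuristics are needed, note that 2,000 genes give 2^2000 possible gene subsets, a number with more than 600 digits. The sketch below shows one common heuristic, greedy forward selection, which adds one gene at a time and keeps it only if prediction improves. It is an illustration under stated assumptions, not the method used by the researchers: the nearest-centroid classifier, the function names, and the synthetic data (in which genes 4 and 17 are made informative on purpose) are all hypothetical.

```python
import random


def nearest_centroid_predict(train_X, train_y, x, genes):
    """Predict the class of patient x using only the selected gene indices."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, y in zip(train_X, train_y) if y == label]
        centroid = [sum(r[g] for r in rows) / len(rows) for g in genes]
        dist = sum((x[g] - c) ** 2 for g, c in zip(genes, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_accuracy(X, y, genes):
    """Leave-one-out accuracy of the classifier restricted to `genes`."""
    hits = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if nearest_centroid_predict(train_X, train_y, X[i], genes) == y[i]:
            hits += 1
    return hits / len(X)


def greedy_forward_selection(X, y, max_genes=3):
    """Add the gene that most improves accuracy; stop when no gene helps."""
    selected, best = [], 0.0
    n_genes = len(X[0])
    while len(selected) < max_genes:
        candidates = [g for g in range(n_genes) if g not in selected]
        scored = [(loo_accuracy(X, y, selected + [g]), g) for g in candidates]
        # On ties, prefer the gene with the lowest index.
        score, gene = max(scored, key=lambda t: (t[0], -t[1]))
        if score <= best:
            break
        selected.append(gene)
        best = score
    return selected, best


def make_toy_data(n_patients=40, n_genes=30, informative=(4, 17), seed=0):
    """Synthetic data: only the `informative` genes differ between classes."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_patients):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        for g in informative:
            row[g] += 3.0 * label
        X.append(row)
        y.append(label)
    return X, y


X, y = make_toy_data()
selected, score = greedy_forward_selection(X, y)
print("selected genes:", selected, "accuracy:", score)
```

The search evaluates at most a few dozen candidate subsets per step instead of all combinations, at the cost of possibly missing the best subset. Because the same patients are used both to choose the genes and to score them, the accuracy printed here is optimistic.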
The work has focused on three cases: distinguishing between two types of leukemia, detecting the presence or absence of colon cancer, and classifying nine types of cancer. Four heuristic models were used to analyze the three databases, and the results agree with those of other studies.
The main conclusion is that very few genes are responsible for cancer, leukemia, or any disease in general: a small fraction of the 2,000 to 7,000 genes analyzed, for example. In addition, the accuracy rate of the heuristic models is around 90%. This accuracy rate has been validated; that is, applying these heuristic models to any database should yield approximately the same rate.
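A validated accuracy rate is usually one measured on patients that the model never saw while it was being built. The sketch below illustrates this with k-fold cross-validation, in which the gene is chosen again inside each training fold. The validation procedure, the single-gene threshold model, the function names, and the synthetic data are all assumptions made for illustration; the article does not say how the researchers validated their figure.

```python
import random


def pick_gene_and_threshold(X, y):
    """Choose the gene whose class means differ most; cut at the midpoint."""
    best = None
    for g in range(len(X[0])):
        zeros = [row[g] for row, label in zip(X, y) if label == 0]
        ones = [row[g] for row, label in zip(X, y) if label == 1]
        m0, m1 = sum(zeros) / len(zeros), sum(ones) / len(ones)
        gap = abs(m1 - m0)
        if best is None or gap > best[0]:
            best = (gap, g, (m0 + m1) / 2, m1 > m0)
    _, gene, threshold, high_is_one = best
    return gene, threshold, high_is_one


def predict(model, row):
    """Return 1 or 0 depending on which side of the threshold the gene falls."""
    gene, threshold, high_is_one = model
    above = row[gene] > threshold
    return int(above == high_is_one)


def cross_validated_accuracy(X, y, k=5, seed=0):
    """Fraction of patients predicted correctly while held out of training."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    hits = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        # Gene selection happens on the training patients only.
        model = pick_gene_and_threshold([X[i] for i in train],
                                        [y[i] for i in train])
        hits += sum(predict(model, X[i]) == y[i] for i in fold)
    return hits / len(X)


# Synthetic data: 60 patients, 50 genes, only gene 7 differs between classes.
rng = random.Random(1)
X, y = [], []
for i in range(60):
    label = i % 2
    row = [rng.gauss(0.0, 1.0) for _ in range(50)]
    row[7] += 3.0 * label
    X.append(row)
    y.append(label)

accuracy = cross_validated_accuracy(X, y)
print("cross-validated accuracy:", accuracy)
```

Selecting the gene inside each fold matters: if the genes were chosen using all patients first, the held-out patients would already have influenced the model and the estimate would be too high.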
It must always be kept in mind that the databases do not yet contain all the information, since numerical values are not available for every gene, nor are there samples from every patient. This computational method therefore still has a long road of development ahead.