Today, about 11 million Gypsies (or, more properly, Roma) live spread across several European countries. If they all lived in a single country, it would have a population similar to that of Portugal or Belgium. This dispersion has given today's Roma enormous religious, linguistic and cultural diversity. Nevertheless, the Roma share some defining features: their original language, Romani, and a society organized into small, closed groups (often nomadic) devoted to particular trades such as music, blacksmithing, horse dealing or fortune-telling.
Where do the Roma come from? For centuries the origin of this people was a mystery. In the Middle Ages the Roma were believed to come from Egypt, hence the names derived from the word "Egyptian" used in various languages, such as ijito in Basque or Gypsy in English. The Roma have no written or oral records of where their ancestors came from. The oldest reliable historical records of this people date from the 14th century. According to these writings, although the Roma were at first found only in the Balkans, within a hundred years they had reached the far edges of the continent (including the Iberian Peninsula, Scandinavia and the British Isles).
Can the past of a people without written records be reconstructed? What about human migrations that took place thousands of years before recorded history? Genetics is an ideal tool for answering such questions. Thanks to the technological revolution, a genome can now be read partially (by genotyping) for about 100 euros and in full (by sequencing) for about 5,000 euros. The study of human genomes sampled worldwide has revealed human migrations since the origin of our species 200,000 years ago: among other things, we have learned how Homo sapiens arose in East Africa and colonized the other continents. However, although the genomes of Europeans are well studied, the Roma have remained an exception. The aim of this research, the largest genetic study of the Roma carried out to date, is to take an important step toward completing the European "genetic landscape".
In 2001, the first results of the Human Genome Project were released, describing for the first time the DNA sequence that characterizes our species. DNA is the genetic information we receive from our parents; it holds the instructions needed to build an organism from a single cell. A person's complete DNA is called their genome. Although genes are the parts that carry out biological functions, they account for only 5% of our genome. The genome as a whole, however, lets us learn about the past of our species. Each of us carries between 60 and 100 new mutations that our parents did not have. Because new mutations also arose in every previous generation, our genomes are deposits of our ancestors' mutations. Intuitively, therefore, within a family, siblings share more mutations with each other than with their cousins. The same logic applies at the population level: members of the same human group will, on average, share more mutations with each other than with members of another population (that is, the genetic distances between them are smaller). Moreover, because we know the rate at which mutations occur, we can estimate when two human populations separated from (or mixed with) each other and how their sizes have changed over time. The branch of biology that laid these theoretical foundations is called population genetics.
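The dating idea in the last sentences can be illustrated with the simplest "molecular clock" reasoning. If two populations split t generations ago and each lineage gains new mutations at a rate μ per site per generation, the two lineages differ at about d = 2μt sites per site, so t ≈ d / (2μ). The sketch below is only an illustration under that assumption: the mutation rate, generation time and divergence value are hypothetical inputs, and the model ignores ancestral diversity, gene flow and population-size changes, which realistic population-genetic models must account for.

```python
def divergence_time_years(divergence_per_site, mutation_rate, generation_years):
    """Crude molecular-clock estimate of when two lineages separated.

    After the split, both lineages accumulate mutations independently, so the
    expected per-site divergence is d = 2 * mu * t (t measured in generations).
    Ignores ancestral polymorphism, migration and population-size changes.
    """
    if mutation_rate <= 0 or generation_years <= 0:
        raise ValueError("mutation_rate and generation_years must be positive")
    generations = divergence_per_site / (2 * mutation_rate)
    return generations * generation_years


# Hypothetical inputs: 1.25e-8 mutations per site per generation,
# 25 years per generation and an observed divergence of 3e-6 per site.
years = divergence_time_years(3e-6, 1.25e-8, 25)
print(f"{years:.0f} years")  # prints "3000 years"
```

Doubling the assumed mutation rate halves the estimated date, which is why such estimates are only as reliable as the mutation rate and generation time fed into them.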
This study collected samples (blood or saliva) from 152 Roma volunteers in the Balkan Peninsula (Greece, Bulgaria, Serbia and Croatia), Central Europe (Romania, Hungary and Slovakia), Eastern Europe (Ukraine), the Baltic countries (Estonia and Lithuania), the British Isles and Wales. After DNA was extracted from the cells in these samples, the participants' genomes were analyzed in the laboratory by genotyping one million polymorphisms (sites where the DNA of two people may differ). To place Roma genetic diversity in the context of other human populations, the genomes of about 4,500 publicly available individuals from five continents were analyzed alongside those of the Roma.
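Genotyping turns each person into a long list of numbers: at every polymorphic site, how many copies (0, 1 or 2) of a given allele they carry. A common way to turn two such lists into a genetic distance is the allele-sharing distance sketched below. The article does not say which distance measure the study used, so this choice, the function names and the toy genotypes are assumptions for illustration.

```python
def allele_sharing_distance(g1, g2):
    """Genetic distance between two individuals from SNP genotypes.

    Each genotype is the number of copies (0, 1 or 2) of a chosen allele at a
    polymorphic site; missing calls are None and are skipped.
    Returns 0 for identical genotypes and 1 for opposite homozygotes everywhere.
    """
    diffs = [abs(a - b) for a, b in zip(g1, g2) if a is not None and b is not None]
    if not diffs:
        raise ValueError("no sites genotyped in both individuals")
    return sum(diffs) / (2 * len(diffs))


def distance_matrix(genotypes):
    """Pairwise distances for a list of individuals (one genotype list each)."""
    n = len(genotypes)
    return [[allele_sharing_distance(genotypes[i], genotypes[j]) for j in range(n)]
            for i in range(n)]


# Three hypothetical individuals genotyped at five sites (None = missing call).
people = [
    [0, 1, 2, 2, 0],
    [0, 1, 2, 1, 0],
    [2, 1, 0, 0, None],
]
for row in distance_matrix(people):
    print([round(d, 3) for d in row])
```

With a million sites instead of five, the same matrix is the kind of input a genetic map is built from.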
These data were used to build a genetic map. Just as a traditional map shows geographical distances, in a genetic map (a multidimensional scaling plot) each point represents a person and the distances between points represent genetic distances. Most Roma from the different European regions cluster together on the map, suggesting a common origin (see figure). In addition, they lie to the right of the European, Caucasian and Middle Eastern populations, indicating that the origin of the Roma lies further east (in Pakistan or India). Some individuals (about 25%), however, appear closer to other Europeans than to other Roma. This indicates that the ancestors of some Roma recently mixed with other Europeans, showing that the genetic isolation between the two communities has not been complete.
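A multidimensional scaling plot places points on a plane so that their distances on the plot approximate a matrix of genetic distances. The sketch below implements classical (Torgerson) MDS in plain Python, using power iteration to find the leading eigenvectors; it is one standard variant, and whether the study used exactly this one is an assumption. It also assumes the distances are close to Euclidean, so the centred matrix has no large negative eigenvalues.

```python
import math
import random


def _matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def classical_mds(dist, k=2, iters=500, seed=0):
    """Classical MDS: k-dimensional coordinates whose Euclidean distances
    approximate the symmetric distance matrix `dist`."""
    n = len(dist)
    d2 = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(r) / n for r in d2]
    grand_mean = sum(row_mean) / n
    # Double-centre the squared distances: B = -1/2 * J D^2 J.
    b = [[-0.5 * (d2[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * k for _ in range(n)]
    for axis in range(k):
        # Power iteration converges to the leading eigenvector of B.
        v = [rng.random() - 0.5 for _ in range(n)]
        for _ in range(iters):
            w = _matvec(b, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        lam = sum(x * y for x, y in zip(v, _matvec(b, v)))
        if lam <= 1e-12:
            break  # no variance left for further axes
        for i in range(n):
            coords[i][axis] = v[i] * math.sqrt(lam)
        # Deflate B so the next pass finds the next eigenvector.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


# Hypothetical distances between three individuals (a 3-4-5 triangle).
d = [[0, 3, 4], [3, 0, 5], [4, 5, 0]]
for point in classical_mds(d):
    print([round(c, 3) for c in point])
```

The axes of such a plot have no fixed orientation, so "to the right of" only has meaning relative to the other points on the same map.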
Next, to pinpoint the Roma's place of origin within the Indian subcontinent, their genomes were compared with those of individuals from 19 populations in India and Pakistan. Specifically, a coalescent genetic model and the statistical method Approximate Bayesian Computation (ABC) were used to calculate the probability of each candidate parental population. According to the data, the Roma descend from Indo-European-speaking populations living in the present-day border region between India and Pakistan (Punjab and Rajasthan), with strong statistical support (94%). The ABC analysis also showed that the Roma underwent a strong founder effect: today's Roma descend from a small number of ancestors of Indian origin (losing 47% of the genetic diversity of the parental population). The same technique also makes it possible to date the separation between the Roma and the populations of the Punjab region. The genetic distances found between the Roma and their Indian ancestors are consistent with a separation about 1,500 years ago.
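The core of ABC for choosing among candidate source populations can be shown with rejection sampling: draw a candidate source and a founder size from their priors, simulate data, keep the simulations closest to what was observed, and read each source's posterior probability as its share of the kept simulations. The sketch below is a toy illustration, not the study's method: it replaces the coalescent model with one generation of binomial sampling of founders, and the source names, allele frequencies, priors and tolerance are all hypothetical.

```python
import random


def sample_founders(freqs, n_founders, rng):
    """Allele frequencies in a founding group drawn from a source population.

    Each locus is resampled with 2 * n_founders gene copies (one generation of
    binomial sampling), a crude stand-in for a founder event.
    """
    copies = 2 * n_founders
    return [sum(rng.random() < p for _ in range(copies)) / copies for p in freqs]


def mean_sq_diff(a, b):
    """Distance between two allele-frequency vectors (the summary statistic)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def abc_source_choice(observed, sources, n_sims=2000, accept_frac=0.05, seed=1):
    """Rejection ABC over candidate sources with a uniform prior on each.

    Keeps the accept_frac closest simulations; returns the posterior share of
    each source and the founder sizes of the accepted simulations.
    """
    rng = random.Random(seed)
    names = sorted(sources)
    sims = []
    for _ in range(n_sims):
        name = rng.choice(names)
        n_founders = rng.randint(5, 50)  # hypothetical prior on founder size
        simulated = sample_founders(sources[name], n_founders, rng)
        sims.append((mean_sq_diff(simulated, observed), name, n_founders))
    sims.sort(key=lambda s: s[0])
    accepted = sims[: max(1, int(n_sims * accept_frac))]
    posterior = {name: sum(1 for _, m, _ in accepted if m == name) / len(accepted)
                 for name in names}
    return posterior, [n for _, _, n in accepted]


# Two hypothetical candidate sources at 20 loci; "observed" resembles source_A.
sources = {"source_A": [0.1, 0.8] * 10, "source_B": [0.7, 0.2] * 10}
observed = [0.15, 0.75] * 10
posterior, founder_sizes = abc_source_choice(observed, sources)
print(posterior)
```

The accepted founder sizes approximate the posterior of the founder-group size, which is the same logic by which ABC can quantify a founder effect.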
This finding agrees with what linguists described in the 19th century: a close relationship between Romani and the languages of northwestern India. However, the histories of languages and genes do not always coincide. A language can spread in two ways: through speakers of other languages adopting it, or through the movement of its own speakers. The first case implies linguistic replacement but genetic continuity, whereas the second implies both linguistic and genetic replacement. For the Roma, genetic and linguistic data both point to northwestern India. By contrast, the strong influence of Middle Eastern languages (Persian and Armenian, among others) on Romani is not reflected in Roma genomes. Everything therefore suggests that contacts with the populations encountered on the way to Europe were only cultural.
Using the same statistical technique and genetic model, the genetic differences among 13 Roma groups were then analyzed. The Balkan groups show the greatest genetic diversity, suggesting that the Roma entered Europe through the Balkans. Outside the Balkans, by contrast, genetic diversity is much lower and follows a distinctive pattern: it is lower in Central Europe and decreases further toward the west and north. This pattern gives us an idea of the Roma's route across the continent. Although most groups settled in the Balkans, a few headed north, from where they dispersed through Central Europe. Finally, the genetic diversity data suggest that a few Central European groups moved west (reaching the Iberian Peninsula) and later north (the Baltic countries). As a result of this migration, Portuguese and Lithuanian Roma, although geographically far apart today, are genetically more similar to each other (since they descend from the same Central European route) than to Bulgarian Roma. According to the data, the dispersion began about 900 years ago, and between its origin (probably present-day Bulgaria) and the final destinations, a loss of 30% of the initial genetic diversity is detected. However, the low diversity of some Roma groups cannot be attributed to the dispersion alone. The Nazis and their allies killed a million Roma in the genocide known as the Porrajmos. Croatia was one of the hardest-hit cases: almost all of its Roma population (~95%) was wiped out in the 1940s. This is consistent with the low genetic diversity of the Croatian Roma sampled, suggesting that they descend from a smaller parental population than the Roma of other countries.
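Diversity shrinking step by step along a migration route is what a serial founder effect predicts: each time a small group splits off, it carries only part of the diversity of the group it left. A minimal sketch, assuming biallelic loci, expected heterozygosity 2p(1 − p) as the diversity measure (the article does not say which measure was used) and an idealized Wright-Fisher population, in which one generation of sampling 2N gene copies multiplies expected heterozygosity by (1 − 1/(2N)). The starting value and founder sizes are hypothetical, not estimates from the study.

```python
def expected_heterozygosity(freqs):
    """Mean expected heterozygosity 2p(1-p) over biallelic loci."""
    return sum(2 * p * (1 - p) for p in freqs) / len(freqs)


def percent_loss(h_source, h_derived):
    """Percentage of the source population's diversity lost in the derived one."""
    return 100 * (1 - h_derived / h_source)


def serial_founder_h(h_start, founder_sizes):
    """Expected heterozygosity after a chain of founder events.

    Each step samples 2N gene copies for one generation, which multiplies the
    expected heterozygosity by (1 - 1/(2N)) under a Wright-Fisher model.
    Returns the value before the first step and after every step.
    """
    h = h_start
    trajectory = [h]
    for n in founder_sizes:
        h *= 1 - 1 / (2 * n)
        trajectory.append(h)
    return trajectory


# Hypothetical route with founder groups of 20, 10 and 8 individuals
# (e.g. Balkans -> Central Europe -> west -> north).
route = serial_founder_h(0.30, [20, 10, 8])
for h in route:
    print(f"H = {h:.4f}, loss = {percent_loss(route[0], h):.1f}%")
```

Smaller founder groups at each step, or a population crash such as the one described for Croatia, lower the curve further.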
Although genetics has revealed the main events of the Roma's past, many intriguing questions remain open. It is not yet known from which branches of the Indo-European language family, and from which social groups and castes, the Roma descend. Moreover, although the genetic legacy European Roma inherited from their Indian ancestors and from non-Roma Europeans (payos) has been determined, the Roma's genetic contribution to other Europeans remains to be investigated. As the cost of DNA sequencing keeps falling, the genomes of more and more individuals will become accessible. Genetic answers to these and other questions are probably just a matter of time.
My thanks to María José Ezeizabarrena and Urko M. Marigorta for their advice in preparing this work, as well as to the Roma and the collaborators who took part in the original research: David Comas (Universitat Pompeu Fabra, Barcelona), Óscar Lao and Manfred Kayser (Erasmus Medical Center Rotterdam, Netherlands). This thesis was funded through the researcher training program of the Department of Education, Universities and Research of the Basque Government.