Ber2Tek: Another step in technologies for the Basque language

Leturia Azkarate, Igor

Computer scientist and researcher

Elhuyar Hizkuntza eta Teknologia

Little by little, computers are mastering human languages. Almost without noticing, we communicate with them more and more in natural language, and machines help us more and more with linguistic tasks (translating, correcting…). But does this also happen in Basque? Fortunately, thanks to research projects like Ber2Tek, digital devices are learning to do so in Basque as well.
A demo 3D avatar will teach us Basque. Ed. Elhuyar R&D

Language and speech technologies are what enable machines to understand, translate, or generate natural language. Language technologies refer to the ability to work with texts (correct, understand, translate, manage…) and speech technologies to the ability to handle speech (understand, create…). Logically, the most advanced technologies exist for the most widely spoken languages, which therefore have more resources (English, Spanish, Chinese…). For Basque and many other languages the situation is not as good. Even so, Basque speakers cannot complain: given our minoritized situation and the small number of speakers, Basque is not doing badly, at least proportionally.

Ber2Tek Project

In fact, many organizations in the Basque Country have been researching language and speech technologies for Basque. For example, the Elhuyar Foundation, the IXA and Aholab research groups of the University of the Basque Country, and the technology centers Vicomtech-IK4 and Tecnalia have been collaborating for many years on a strategic line of work. This collaboration previously took shape in three projects funded by the Basque Government through the Etortek program: Hizking XXI (2002-2004), AnHitz (2006-2008) and BerbaTek (2009-2011). The most recent result of this collaboration is the Ber2Tek project, carried out between 2012 and 2014 and coordinated by Elhuyar R&D.

Throughout these years we have worked hard on research into these technologies, continuing to improve some that had already been developed and creating many new ones. Many general resources (corpora, ontologies, dictionaries…) have been created or improved; techniques for building these resources automatically have been developed; the analysis tools for Basque have been improved (morphological, syntactic and semantic taggers, spelling checkers, entity recognizers…); progress has been made in machine translation; content management technologies have been developed; teaching technologies have been developed; progress has been made in synthesis and recognition…

But Ber2Tek and its predecessors are not only about research: we also want to make these technologies known, turn research results into applications and make them available to the public. As a finishing touch to the project, we built a series of demos, or demonstrators, that show what these technologies can contribute to a given field. In this case, we wanted to show what they can contribute to the Language Industry, that is, the sector formed by the areas of translation, content and teaching. The demos are available at http://www.ber2tek.eus/es/demoak.

Demos of practical applications

The working group of the Ber2Tek research project. Ed. Danel Solabarrieta/Elhuyar

As a sample of what can be done in the content sector, we have set up a demo that shows what opinion extraction, or sentiment analysis, technology is. Opinion extraction consists of automatically determining whether a text expresses a subjective opinion and, if it does, what its polarity is (positive or negative). This technology can have many applications, for example allowing companies to find out easily what is being said online about them or their products (in many places and in different languages). For the demo we took the review archive of the Armiarma.eus website, which collects more than 5,000 literary reviews in Basque from various media and publications, and automatically assigned a score to each of them using the Basque opinion extraction technology developed in Ber2Tek. By selecting authors, works, years or other parameters on the demo website, you can view the scores graphically, read the review itself and examine its positive and negative words. In fact, it is on those words that the technology relies to assign the scores.
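The idea of scoring a review from its positive and negative words can be illustrated with a minimal sketch. It is only a toy under stated assumptions, not the Ber2Tek system: the lexicon, its weights and the function name `score_review` are hypothetical, and words are matched exactly, whereas a real tool for an agglutinative language like Basque would need to lemmatize them first.

```python
import re

# Hypothetical toy lexicon: words and weights are illustrative only,
# not the lexicon used in Ber2Tek.
POLARITY_LEXICON = {
    "bikaina": 2.0,       # "excellent"
    "ederra": 1.0,        # "beautiful"
    "ona": 1.0,           # "good"
    "aspergarria": -1.0,  # "boring"
    "txarra": -1.0,       # "bad"
    "kaskarra": -2.0,     # "poor"
}


def score_review(text, lexicon=POLARITY_LEXICON):
    """Return (score, positive_words, negative_words) for a review.

    The score is the sum of the weights of the lexicon words found in
    the text; words that are not in the lexicon count as neutral.
    """
    tokens = re.findall(r"\w+", text.lower())
    positive = [t for t in tokens if lexicon.get(t, 0) > 0]
    negative = [t for t in tokens if lexicon.get(t, 0) < 0]
    score = sum(lexicon[t] for t in positive + negative)
    return score, positive, negative


# "It is an excellent book, but the ending is boring."
result = score_review("Liburu bikaina da, baina amaiera aspergarria.")
print(result)  # (1.0, ['bikaina'], ['aspergarria'])
```

Returning the matched words together with the score mirrors what the demo lets users do: look at the score and then inspect the positive and negative words behind it.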

Another demo, a multimedia search engine, shows what can be done in the field of translation. Several videos in Spanish and Basque were taken and automatically transcribed using speech recognition. Once the text of the videos is available, you can search within it and, if you wish, jump directly to the moment at which a word is spoken. The transcripts are automatically translated into Spanish, Basque or English, so that subtitles can be shown in those languages. Once translated, we also generate audio in the other languages using speech synthesis; in the case of presentations by certain speakers, the audio in the other language imitates the speaker's own voice by means of voice conversion technology.
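Searching a transcribed video and jumping to the point where a word is spoken can be sketched as follows. This is only an illustration under assumptions: it supposes that the speech recognizer delivers the transcript as segments with a start time, and the `Segment` class, the `find_word` function and the sample sentences are hypothetical, not part of the Ber2Tek demo.

```python
import re
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcribed stretch of a video (hypothetical structure)."""
    start_seconds: float
    text: str


def find_word(segments, word):
    """Return the start times of the segments that contain the word."""
    needle = word.lower()
    return [
        seg.start_seconds
        for seg in segments
        if needle in re.findall(r"\w+", seg.text.lower())
    ]


# Invented sample transcript, for illustration only.
transcript = [
    Segment(0.0, "Kaixo, ongi etorri."),
    Segment(4.5, "Gaur hizkuntza teknologiei buruz ariko gara."),
    Segment(9.2, "Hizkuntza bat ikastea ez da erraza."),
]

hits = find_word(transcript, "hizkuntza")
print(hits)  # [4.5, 9.2]
```

A video player could then seek to any of the returned times. As with any exact-match search, inflected forms of a Basque word would be missed unless the text is lemmatized first.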

Finally, for the teaching sector, we have built a demo of a personal language tutor. Three years ago, at the end of the BerbaTek project, we did something similar, but this one is more ambitious and offers more possibilities; moreover, that one was a desktop application, whereas this one is online and anyone can try it. The demo tutor is a 3D avatar with which we communicate in Basque, orally. The tutor guides us through automatically created verb, declension or comprehension exercises; it evaluates our pronunciation; we can ask it how a certain verb is conjugated or how a certain number is written; we can tell it to look up a word in the dictionary, and it will show us results from several dictionaries…

As the name indicates, the demos we have set up are only demos, but they give an approximate idea of the current state of these technologies and of what they can do. We hope to see them applied soon in real tools, as has already happened with real applications built on other technologies.

As these demos show, language and speech technologies for Basque are quite advanced. However, there is still a long way to go if we want to reach the level of other languages and if we really want to interact in Basque with electronic devices in all areas of everyday life. For our part, the organizations that carried out Ber2Tek did not stop working when the project ended, and we continue working to make that happen someday.

Sponsors
Department of Industry, Trade and Tourism of the Basque Government