…and 20 years working on language technologies!

Leturia Azkarate, Igor

Computer scientist and researcher

Elhuyar Hizkuntza eta Teknologia

The Elhuyar Foundation is celebrating its 50th anniversary this year, but it is also 20 years since we began researching, developing and marketing language technologies at Elhuyar. Twenty years with the sole aim of developing language and speech technologies, as necessary for Basque as for any other language, and making them available to society. The results of this activity include our corpora, dictionaries, spell checkers, machine translator (Lia.eus), automatic transcription service (Jakin.eus) and speech synthesizers, which have become well known and, for many people, essential.

Ed. Elhuyar

The year 2002 was an important milestone for Elhuyar. That year, as it turned 30, it made the transition from cultural association to foundation. In addition, aware of how important the field would become, it started working on language technologies. And, as if one front were not enough, it did so on two.

Eleka and Elhuyar R&D

On the one hand, there was the creation of Eleka. The IXA Group of the UPV/EHU had been conducting basic research on language technologies for Basque for several years, and had already developed a tagger (first Euslem, later Eustagger), a spell checker (Xuxen) and a machine translator (Matxin). However, the work required to make them available or bring them to market (adaptations to different platforms, new versions, etc.) fell outside the usual tasks of a university research team, so it was not being done as well as it should have been. Elhuyar was willing to help fill this gap, and Eleka was set up jointly. Since then, Eleka has marketed not only those early tools but also many others based on language and speech technologies, up to the present day. And over time, it has increasingly focused on bringing Elhuyar's own research to society, in close and fruitful collaboration.

On the other hand, a new department was created within Elhuyar itself, Elhuyar R&D, whose objective was to research and develop the language technologies that other Elhuyar departments needed. Initially, its activity focused on building tools for dictionary making, an important area for Elhuyar: the 1996 Basque-Spanish/Spanish-Basque Elhuyar Hiztegia was already a reference, and the aim was to extend dictionary production to more languages, to produce more terminological dictionaries… Thus, Elhuyar R&D began compiling text corpora (for example, the ZT corpus) and developed techniques and tools for automatically collecting corpora of different types. These well-known products from other Elhuyar departments were an excellent showcase and reflection of what Elhuyar R&D was doing. But its activity has not been confined to that, and it has diversified a great deal over the last 20 years.

As in any research group, the work at Elhuyar R&D has followed international trends, building on the latest academic research and contributing to the field. Thus, it has published more than 100 scientific articles in conferences and specialized journals, and 7 doctoral theses have been completed in the department. Elhuyar R&D has maintained close and continuous cooperation with the IXA Group. And for over 15 years it has worked with the IXA Group, the Aholab research group of the UPV/EHU and the research centres Tecnalia and Vicomtech on several collaborative strategic research projects, in many cases as leader.

The current era of deep neural networks

Over these 20 years, the techniques used in language technologies have evolved greatly. When we started, rule-based techniques dominated. In these, linguistic knowledge (words, declension rules…) was encoded in languages and structures that computers could understand. With these methods, some things were done very well (tagging, spelling correction, search, etc.), but others, such as machine translation or speech recognition, not so well. Later came machine learning or statistical methods, which learned from examples, but they did not achieve sufficient results in some tasks either, at least for Basque.

Orai is developing a Basque-language smart speaker, Mycroft. Ed. Elhuyar

Five or six years ago, methods known as deep neural networks or deep learning appeared on the scene. In fact, they are a particular case of machine learning, and neural networks had been around for a long time, but because they did not perform well they had been set aside. Advances in technology allowed the use of bigger and more complex neural networks (hence the name “deep”); GPUs, or graphics cards, greatly shortened their training times; and the great advance of digitization and the Internet provided much more data to train them on. As a result, in many complex tasks, such as machine translation or automatic transcription, the results became far better than before. And when we ran our own tests, we saw that the same held for Basque. Almost overnight, the results in many tasks became acceptable. Finally!

Since then we have successfully developed various technologies for the Basque language, which we are making available to society: the machine translator Lia.eus, the automatic transcription service Jakin.eus, personalized speech synthesis, BERT neural language models, chatbots, the Basque smart speaker Mycroft.eus… All of them have had a notable influence on Basque society and on the digital situation of the Basque language, and that influence will probably be even stronger in the future.

The future is Orai

With the explosion brought about by deep neural networks, Elhuyar's activity in language and speech technologies has grown significantly, and so has the team, which brings together people engaged in research and in developing tools and services.

And this year, 2022, so significant for Elhuyar, we have made another major leap. Elhuyar's R&D department now has a new name and brand, Orai NLP technologies. But it is more than a change of name; it is also a change in nature. Without abandoning the work of creating the linguistic resources and tools that the Basque language needs, the aim is to put more effort into applying the potential of artificial intelligence and language technologies in Basque companies, conducting research tailored to them and helping them become more competitive and overcome language barriers. In addition, the aim is to go further along the path already begun of serving as an example and a companion for other minority languages: just as we have developed machine translators and speech synthesizers for Occitan and Aragonese, we want to keep developing more tools for them and for other minority languages. In the spirit of Elhuyar's anniversary slogan, 50 years on, at Orai we want Basque society, the Basque language and other small languages to have a bright future.

Elhuyar has always been attentive to the evolution of society, and able to win new spaces for the Basque language and respond to its new needs. Twenty years ago it showed a keen instinct for the future by taking a firm and determined step towards language technologies and, whether out of faith or generosity, it has kept that commitment throughout these years (even though some of them were a true journey through the desert: crises, promised results in machine translation and other tasks that failed to materialize…). Now we are reaping the fruits, and congratulations are in order: to ourselves, because at Elhuyar we are doing many interesting and surprising things; to the Basque language, which is acquiring the tools it needs so as not to be left behind in the digital world, today and in the future; and, if I may be so bold, to the writer of these lines, because almost from the beginning I have had the privilege and the honour of working in the language technologies R&D group. So congratulations and thank you, Elhuyar! And congratulations also to Orai and the language technologies group, and here's to many more years!

Sponsors
Eusko Jaurlaritzako Industria, Merkataritza eta Turismo Saila