Elhuyar Language Technologies: Multilingual Searches

Leturia Azkarate, Igor

Computer scientist and researcher

Elhuyar Hizkuntza eta Teknologia

One of the fields we work in at the Elhuyar Foundation's language technology R&D department is IR (Information Retrieval), the area of computing concerned with making digital content easier to manage and search. In recent years we have been developing two very useful technologies in this area: Elezkari, a multilingual search engine, and Dokusare, a tool that relates multilingual documents. We have recently deployed both on the Zientzia.net portal and presented them publicly in July.

Thanks to Dokusare technology, other content related to the article the user is reading is recommended in the right-hand column.

Two years ago, in this same section of this magazine, we described the evolution of Internet search engines and the new capabilities they might have in the future. We mentioned then that the Elhuyar language technology R&D department was researching techniques for better browsing and searching of multilingual content. Those technologies are now a reality, and here we explain in more detail what they are and what they are useful for.

Elezkari, multilingual search engine

Those of us who habitually use Basque online face two main problems when searching for content. The first arises when we want content in Basque: if what we are looking for is a proper noun, a technical term, or a short word, it is quite likely to be written the same way in other languages, and results in those languages will appear instead of Basque ones. The second arises when we want content about a topic, preferably in Basque but, failing that, in another language: we first search in Basque; if we find no suitable results (which unfortunately happens often, because content in Basque is not as abundant as we would like), we search again in another language we know well, such as Spanish or French, translating the search terms ourselves (which is often not easy); and if that fails too, we try again in English.

To avoid this, we have developed a technology called Elezkari. With it, we perform a single search in Basque, and the tool takes care of translating the search terms into other languages and searching wherever it has been set up to search, then returning the most relevant results in whatever language they are written in.

The strong point of the tool is the translation of search terms. It combines dictionaries and language technologies to produce a proper translation, and this is no trivial matter: ambiguities are resolved to find the appropriate equivalent, synonyms are used to obtain more results while rejecting unwanted ones... The tool is very useful in many situations: websites with content in several languages, specialized portals that want to offer search across several websites, company intranets, and so on. And although in the example above the starting language was Basque, it can be any other. In addition, one possible extension of the tool is to translate results in other languages back into the starting language through machine translation, a technology we also work on. Elezkari is on a par with similar existing tools, but it is the only one that takes Basque into account.
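To make the idea of dictionary-based search term translation more concrete, here is a minimal Python sketch. It is not Elezkari's implementation, which the article does not describe in detail: the toy dictionary, the function names and the boolean query format are all assumptions made for illustration. Each Basque term is replaced by a group of candidate translations, so that synonyms widen the search, and terms missing from the dictionary (proper nouns, for example) are passed through unchanged. A real system would also need lemmatization, since Basque words are heavily inflected, and a disambiguation step to discard unsuitable candidates.

```python
# Toy Basque dictionary: term -> {target language: candidate translations}.
# Hypothetical data for illustration only, not Elhuyar's dictionaries.
TOY_DICTIONARY = {
    "eguzki": {"es": ["sol"], "en": ["sun"]},
    "energia": {"es": ["energía"], "en": ["energy", "power"]},
    "lur": {"es": ["tierra", "suelo"], "en": ["earth", "soil", "land"]},
}


def translate_query(query, target_lang, dictionary=TOY_DICTIONARY):
    """Return one group of candidate translations per query term.

    Terms not found in the dictionary are kept as they are, which is a
    reasonable default for proper nouns.
    """
    groups = []
    for term in query.lower().split():
        candidates = dictionary.get(term, {}).get(target_lang, [term])
        groups.append(list(candidates))
    return groups


def to_boolean_query(groups):
    """Render the groups as a boolean query: synonyms OR-ed, terms AND-ed."""
    return " AND ".join("(" + " OR ".join(group) + ")" for group in groups)


if __name__ == "__main__":
    for lang in ("es", "en"):
        print(lang, to_boolean_query(translate_query("eguzki energia", lang)))
```

Keeping every candidate in an OR group favours recall; resolving ambiguity, as the article says the real tool does, would mean pruning those groups before the search is run.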

Dokusare, multilingual related-document finder

In online media, blogs and websites with a lot of content, it is very common for a news item or article to end with links to similar content that let the reader go deeper into the topic. These links are generated automatically, but they usually point only to the site's own content in the same language, and the methods behind them are very simple, based on mere word matching.

Dokusare technology does the same, but it can relate content across multiple languages and find the closest matches. Media outlets and websites that have content in more than one language, or that want to show related content from external sites, can use it to do so.
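The article does not explain how Dokusare measures closeness between documents in different languages. As a rough illustration of the general idea only, the Python sketch below maps Basque words into English through a toy dictionary and then compares documents with cosine similarity over word counts, which is the "word matching" approach extended across languages. The dictionary, the function names and the sample texts are all hypothetical.

```python
import math
from collections import Counter

# Toy Basque -> English dictionary (hypothetical, for illustration only).
TOY_EU_TO_EN = {"eguzki": "sun", "energia": "energy", "ilargi": "moon", "ur": "water"}


def to_pivot_terms(text, lang):
    """Count the words of a text after mapping Basque words into English."""
    words = text.lower().split()
    if lang == "eu":
        words = [TOY_EU_TO_EN.get(word, word) for word in words]
    return Counter(words)


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def most_related(text, lang, candidates):
    """Rank candidates, given as (title, text, lang), from most to least similar."""
    source = to_pivot_terms(text, lang)
    scored = [
        (cosine(source, to_pivot_terms(cand_text, cand_lang)), title)
        for title, cand_text, cand_lang in candidates
    ]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    candidates = [
        ("A", "sun energy report", "en"),
        ("B", "moon water", "en"),
    ]
    for score, title in most_related("eguzki energia", "eu", candidates):
        print(title, round(score, 3))
```

Mapping everything into one pivot language keeps the comparison simple; the quality of the result then depends almost entirely on the dictionary and on how ambiguous words are handled.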

Both on Zientzia.net

Dokusare and Elezkari were born as research projects, and for years we have been working on them and presenting our progress at international conferences, but today they are working technologies. They were first deployed on Zientzia.net, the Elhuyar Foundation's website. Zientzia.net aims to be the science portal in Basque, so neither its related-content links nor its search engine are limited to the site's own content. Both also cover the content of several leading international science websites: Nature, Science, Physics World, Futurity... Thus, in addition to the content of Zientzia.net, we can reach and search the content of these websites, always starting from Basque.

Dokusare and Elezkari are excellent examples of what language technologies can offer. They represent a step forward for users and for the Basque language. We therefore hope to see them in more places in the future, and that technologies of this kind become part of everyday life.

Sponsors
Basque Government Department of Industry, Trade and Tourism