The magic and limits of machine translation

Cortés Etxabe, Itziar

Head of translation technologies at Elhuyar

Ed. Stockmonkeys.com/CC-BY

By definition, a machine translation system is a computer system that translates from one language to another without human intervention during the translation process. This automatic translation can be used to understand texts or to translate them, which may explain why machine translation systems are so often confused with dictionaries. The two resources are different, however. The results returned by a dictionary search have been produced manually by professionals. In a machine translation system, by contrast, the underlying data are prepared by professionals, but the answer itself is always generated automatically by the machine. From the point of view of ordinary users, though, the main difference is that machine translation systems can translate whole sentences as well as individual words.

Among the best-known machine translation systems is Google Translate. It can automatically translate up to 70 languages, including Basque since 2010. But why do such systems produce odd translations?

When we browse the web, it is quite common to find the information we are looking for in another language and to use a machine translation system to understand it. For this reason, users need to know the advantages and drawbacks of machine translation. A system of this type does not always provide an accurate translation, and it is up to the user to decide what to do with the automatically obtained result. Unreviewed automatic translations have turned up in advertisements and on signs.

Some well-known examples are: "Buses, morning and afternoon", rendered in Basque as "Buses morning and afternoon"; "Combination of trains", rendered as "Subcommission of trains"; and "First floor", rendered as "First floor".

As recently as January, when translating from Catalan to English, Google's machine translation system confused Guardiola and Iniesta, both well known in the world of football, with their countries of origin. But why do automatic systems produce translations like these?

Machine translation systems can be classified into two groups according to the techniques used to build them: statistical systems and rule-based systems.

Google Translate, mentioned above, is one of the best-known statistical machine translation systems. Systems of this type rely on statistical models built from collections of texts. For example, to create a system between Basque and Spanish, we would need a parallel collection of texts in which each Basque sentence is paired with its Spanish translation. The statistical models created from these collections form the core of the machine translation system.
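As a rough illustration of how a statistical model can be built from paired sentences, here is a minimal Python sketch. The article does not say which model any particular system uses; this example assumes IBM Model 1, a classic word-level model, simplified here by leaving out the usual NULL word. The four sentence pairs are simplified toy data, and the names `PARALLEL_CORPUS`, `train_word_model` and `best_translation` are hypothetical.

```python
from collections import defaultdict

# Toy parallel corpus (simplified, made-up pairs): Basque sentence, Spanish translation.
PARALLEL_CORPUS = [
    ("etxe bat", "una casa"),
    ("liburu bat", "un libro"),
    ("etxe handia", "casa grande"),
    ("liburu handia", "libro grande"),
]


def train_word_model(corpus, iterations=10):
    """Estimate P(target word | source word) with EM (IBM Model 1, no NULL word)."""
    pairs = [(src.split(), tgt.split()) for src, tgt in corpus]
    target_vocab = {word for _, tgt in pairs for word in tgt}
    uniform = 1.0 / len(target_vocab)
    # Start from a uniform distribution: every pairing is equally likely.
    prob = defaultdict(lambda: uniform)

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (target, source) pairings
        total = defaultdict(float)  # expected count of each source word
        for src, tgt in pairs:
            for t_word in tgt:
                # Share out each target word among the source words of its sentence.
                norm = sum(prob[(t_word, s_word)] for s_word in src)
                for s_word in src:
                    share = prob[(t_word, s_word)] / norm
                    count[(t_word, s_word)] += share
                    total[s_word] += share
        # Re-estimate the probabilities from the expected counts.
        prob = defaultdict(float)
        for (t_word, s_word), c in count.items():
            prob[(t_word, s_word)] = c / total[s_word]
    return prob


def best_translation(prob, s_word):
    """Return the target word with the highest probability for a source word."""
    candidates = {t: p for (t, s), p in prob.items() if s == s_word}
    return max(candidates, key=candidates.get)


model = train_word_model(PARALLEL_CORPUS)
print(best_translation(model, "etxe"))
```

On this toy corpus the model learns that "etxe" most likely corresponds to "casa", purely from the fact that the two words keep appearing in the same sentence pairs. Real systems are trained on far larger collections and use much richer models, which is also why gaps or noise in the texts show up as odd translations.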

At Elhuyar's Language and Technology unit we also have systems of this type, and this year we are deploying one at MINHAP (the Spanish Ministry of Finance and Public Administrations). Its web pages will be translated from Spanish into Basque and English with this system, and the automatic output will then be reviewed manually.

Rule-based systems, by contrast, have a linguistic basis: they rely on dictionaries and on resources such as grammatical rules. Although they usually give more reasonable results than statistical systems, they too run into difficulties and produce odd results. Polysemous words, for example, are hard work to translate. Take the Spanish word for "time", which has twelve senses in Basque (hiztegiak.elhuyar.org/es/time): the machine translation system must be taught to choose which of them is most appropriate.
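The idea of teaching a rule-based system to choose among senses can be sketched in a few lines of Python. This is only an illustrative toy, not how Matxin or any real system works. It assumes a hypothetical lexicon entry for the Spanish word "tiempo" with just two of its Basque renderings ("denbora", time, and "eguraldi", weather), and the cue words and names `SENSES`, `DEFAULT_SENSE` and `choose_sense` are invented for the example.

```python
# Hypothetical lexicon: each sense lists a Basque rendering and context cue words.
SENSES = {
    "tiempo": [
        {"basque": "eguraldi", "cues": {"hará", "buen", "mal", "lluvia", "sol"}},
        {"basque": "denbora", "cues": {"tengo", "falta", "mucho", "poco", "perder"}},
    ],
}

# Fallback rendering when no rule fires (an assumption made for this sketch).
DEFAULT_SENSE = {"tiempo": "denbora"}


def choose_sense(word, sentence_tokens):
    """Pick the sense whose cue words overlap most with the rest of the sentence."""
    context = set(sentence_tokens) - {word}
    best, best_score = DEFAULT_SENSE[word], 0
    for sense in SENSES[word]:
        score = len(sense["cues"] & context)
        if score > best_score:
            best, best_score = sense["basque"], score
    return best


print(choose_sense("tiempo", "han dicho que mañana hará buen tiempo".split()))
print(choose_sense("tiempo", "no tengo mucho tiempo".split()))
```

In the first sentence the cues "hará" and "buen" select the weather sense; in the second, "tengo" and "mucho" select the time sense. Every such rule has to be written and maintained by hand, which is where much of the effort in a rule-based system goes.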

Matxin is an example of a rule-based system that automatically translates from Spanish to Basque. The Matxin system (http://matxin.elhuyar.org) has been developed jointly by Elhuyar Hizkuntza eta Teknologia and the Ixa group of the UPV/EHU and, in addition to plain text, it can translate documents and web pages. However, as mentioned above, when we use machine translation we must not forget that automatically generated results need to be reviewed.

To see the difference between statistical and rule-based systems, here are a couple of examples. Google Translate renders the sentence "The man who came was my uncle" as "The man came my uncle", whereas Matxin 2.0 gives "The man who came was my uncle." In another case, Google Translate returns "They have said that tomorrow will do good time" and Matxin 2.0 returns "They have said that tomorrow will do good time".

When we talk about machine translation, some people look at us with suspicion. Understanding the subject, however, can change how one approaches this resource. When we say that a translation is done automatically, we do not mean that the result is a correct translation, that is, a professional one. What we mean is that the system offers an automatically generated result that converts a text from one language we choose into another. If we asked for opinions on machine translation systems, the answers would naturally depend on the user and on how they use the system. That is why informed use of this kind of resource is essential.

Sponsors
Department of Industry, Trade and Tourism of the Basque Government