Linguistic engineering: Hizking21, on the threshold of the 21st century

Saiz Elizondo, Rafa

President of the Itsas Enara Ornitologia Elkartea

Linguistic engineering goes beyond text processing: it covers everything computers can do in the field of language. The general goal is to make communication with machines ever more natural. The user will speak or write as usual, and the machine will understand and, if it has been programmed to, obey. Instead of playing back recorded messages, machines will generate their own messages, both written and spoken. Getting there will take a long road and a great deal of groundwork.
(Photo: G. Andonegi).

The project launched under the name Hizking21 has a clear aim: by 2005, to have in Basque the tools currently available for English. Many of them will be created specifically for Basque; others will be adapted from tools built for other languages. The distinctive morphology and syntax of Basque will raise problems that have not been tackled before, and the technology developed to overcome them could make Euskal Herria a world reference in this field.

What is there today

Today the IXA and Aholkularitza groups of the University of the Basque Country are the essential reference for language technologies in Basque. They have developed various computer tools for processing the language (spell checker, lemmatizer, disambiguator, etc.), which will largely be the project's starting point. For these tools to work, however, they need reference material, namely lexicons, and to complete and update those lexicons the corpus has become an indispensable tool: a repository of classified, labeled and ordered texts that reflects the language as it is actually used.

The more natural language processing develops, the easier it will be to use working computer tools. (Photo: G. Andonegi).

The work Elhuyar has carried out over the years in compiling general-language and technical dictionaries will serve to complete and feed these lexicons. The material it has produced and collected in the field of science and technology will also be valuable for building highly specialized corpora.

Its main activity is speech analysis and processing. It has tools for converting a file from speech to written text and vice versa. Here too, reference material is essential: the machine has to be taught how to recognize what it ‘hears’ and how to ‘write’ it.

The tools and resources on offer are closely tied to Basque, so most programs are built with technology developed in-house. As for interfaces, some of the work already done for other languages is useful, and avatar design is well advanced, but the avatars still have to be made to speak Basque. Progress has been made along that road too, and it will continue.

The Robotiker Foundation, a leader in equipment connectivity in the Basque Country, will be responsible for the basic technology in Hizking21. There are, however, other players working in this field in Euskal Herria, such as ASP, Diana Technology...

What to do

The aim is to communicate with machines as naturally as possible. (Photo: G. Andonegi).

Today the need for a general reference corpus of Basque is undeniable, all the more so within linguistic engineering. However, one of the objectives of Hizking21 is to offer an agreed and proven methodology that can serve as the basis for reaching that overall goal in the future, to develop the corpus tools it requires, and to provide the partial resources (specialized corpora) built along the way.

Intermediate tools
The key tools in the project are the lemmatizer, disambiguator, syntactic analyzer, etc., which must be continuously supplemented, adapted and improved. In addition, tools will be created for properly exploiting the linguistic resources generated (text analyzers, word extractors, etc.).
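To give a rough idea of what a lemmatizer and a word extractor do with a corpus, here is a minimal sketch in Python. It is not the project's technology: the tiny lexicon, the function names and the two sample sentences are all assumptions made for illustration, and a real Basque lemmatizer relies on full morphological analysis rather than a lookup table.

```python
import re
from collections import Counter

# Hypothetical toy lexicon: inflected form -> lemma (illustrative only).
TOY_LEXICON = {
    "etxea": "etxe",    # "the house"
    "etxean": "etxe",   # "in the house"
    "etxetik": "etxe",  # "from the house"
    "mendia": "mendi",  # "the mountain"
    "mendian": "mendi", # "on the mountain"
}


def tokenize(text):
    """Split a text into lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def lemmatize(token, lexicon=TOY_LEXICON):
    """Return the lemma for a token, or the token itself if it is unknown."""
    return lexicon.get(token, token)


def lemma_frequencies(texts, lexicon=TOY_LEXICON):
    """Count how often each lemma occurs across a small corpus of texts."""
    counts = Counter()
    for text in texts:
        counts.update(lemmatize(tok, lexicon) for tok in tokenize(text))
    return counts


# A two-sentence stand-in for a corpus (assumed sample data).
corpus = ["Etxea mendian dago.", "Mendia etxetik urrun dago."]
freqs = lemma_frequencies(corpus)
print(freqs.most_common())
```

Grouping the inflected forms under one lemma is what lets the counts reflect words rather than surface spellings, which is why a lexicon and a corpus have to be developed together.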

Language should not be an obstacle to access to progress.

Communication with machines will be partly visual and partly verbal. As technology advances the results will improve, especially in the realism of 3D images. Good results are already obtained today with recorded information, but immediacy is essential if speech is to feel natural: the system has to ‘understand’ the message and then create and deliver a response, and that response cannot be just a sentence. It must be conveyed with gestures, intonation and particular expressions. All this demands great computing power, both for language processing and for sound and image synthesis.

And then what?

As mentioned above, the outcome of the Hizking21 project will not be specific computer applications; rather, it will make available to application developers the tools and technologies that make such applications possible. They are intended for software companies building Basque-language applications with linguistic capabilities. What applications? There is no shortage of ideas: systems that take orders by telephone (such as home-automation systems), information systems that must answer users' questions, machine translation aids, automatic dictation, readers for the blind, guide systems for visits to public places, announcement-management systems in airports and stations, etc. The options are endless. They only need to be put into practice.

The Hizking21 project has a budget of 7,600,000 €. The Department of Industry, Commerce and Tourism of the Basque Government named Linguistic Infoengineering as a line of research of strategic interest, supported by the Etortek program.

Hizking21 brings together five partners: the Elhuyar Foundation, the IXA and Aholkularitza groups of the University of the Basque Country, the Vicomtech association and the Robotiker Foundation. The company Eleka S.L., created jointly by IXA and Elhuyar, also takes part in the project. Between them, they have the knowledge and capacity to design systems with linguistic capabilities. The joint work of the whole consortium will make available computer tools that can be built into everyday applications.

Department of Industry, Commerce and Tourism of the Basque Government