A human translator will undoubtedly produce better and richer work, but today documents in a specific technical field such as meteorology can be created using automatic techniques. In this article we present MultiMeteo, an interactive system for multilingual text generation in the field of meteorology, together with the adaptation we have made for generating Basque. The system offers daily weather forecasts at the following web address: http://www.inm.es/wwi/Multimeteo/Multimeteo.html
Although it does not use automatic text generation, a system that automatically translates weather forecasts must be mentioned here. The METEO system created by the TAUM group in Montreal has been the most successful translation system of all time. It was difficult to find translators for tedious texts that were almost the same every day, so Canada's official weather service began to investigate automatic approaches. The resulting METEO system has been translating meteorological bulletins from English into French since 1977, and about 80% of its output is used directly, without revision. However, this success in meteorology has not carried over to other fields: although the system has been adapted to other subjects, no results of equal quality have been obtained. It seems that weather forecasting is especially well suited to this type of automatic processing.
The Forecast Generator (FoG) work environment was also launched in Canada in 1993. In this system, the meteorologist uses a graphical editor to adapt the map showing the weather data and subsequently the system automatically generates the weather forecast in English and French for the region.
In 1995 the French meteorological service (Météo-France) promoted the MultiMeteo project for the publication of weather forecasts in several languages. It brought together the National Meteorological Institute (INM) of Spain, the Royal Meteorological Institute (RMI) of Belgium, the Zentralanstalt für Meteorologie und Geodynamik (ZAMG) of Austria and two companies specialized in text generation: Lexiquest, based in Paris, and CL Language Services in Madrid. The German meteorological service (DWD) also joined initially, but subsequently withdrew.
These partners presented the project called “Multilingual Production of Weather Forecasts” and obtained European Community funding. The system was developed in four languages: French, English, Spanish and German. The results of the evaluation carried out in February 1999 were very positive.
In 2000 INM and Lexiquest reached an agreement to extend the system to four more languages: Dutch, Catalan, Galician and Basque. The Ixa Group of the Faculty of Computer Science of San Sebastian and the UZEI Terminology Center have been in charge of the extension to Basque, and at this moment we are about to finish the development phase of the project.
Meteorological data are collected from two sources: the surface and space. Surface data are taken at meteorological observatories, where the physical variables describing the state of the atmosphere are measured and recorded continuously. Space data come from meteorological satellites: the geostationary METEOSAT satellites and the polar-orbiting satellites of the TIROS-NOAA series, which send information without interruption.
All the numerical data obtained are processed by complex mathematical models. Automatic processes simulate the evolution of the physical variables over the coming days and generate data matrices for the forecasts. The meteorologist then has the opportunity to retouch these data matrices, that is, to complete and refine the forecast on the basis of experience. As a result, as seen in Table 1, the matrices give temperature (Te), wind direction (DD) and wind force (FF), cloud cover, rain, etc. for different hours (periods of 3 hours in the case of the INM system). One matrix of this type is obtained for each point of the map.
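The data matrix for one map point can be pictured with a small sketch. This is only an illustration under stated assumptions, not the INM format: the class name `Period`, the functions `build_matrix` and `retouch`, the units and all the sample values are hypothetical; only the variables Te, DD and FF and the 3-hour step come from the text.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Period:
    """One row of the matrix: forecast values for a 3-hour period."""
    hour: int   # start of the period (0, 3, 6, ...)
    te: float   # temperature (Te); degrees Celsius assumed
    dd: str     # wind direction (DD); compass label assumed
    ff: int     # wind force (FF); scale assumed


def build_matrix(periods):
    """Sort the rows by hour and check that they are 3 hours apart."""
    matrix = sorted(periods, key=lambda p: p.hour)
    for prev, cur in zip(matrix, matrix[1:]):
        if cur.hour - prev.hour != 3:
            raise ValueError(f"expected 3-hour steps, got {prev.hour} -> {cur.hour}")
    return matrix


def retouch(matrix, hour, **changes):
    """Return a copy of the matrix with one period adjusted by the meteorologist."""
    return [replace(p, **changes) if p.hour == hour else p for p in matrix]


# Hypothetical values for a single map point.
matrix = build_matrix([
    Period(3, 11.0, "NW", 2),
    Period(0, 9.5, "N", 3),
    Period(6, 13.0, "NW", 2),
])
adjusted = retouch(matrix, 6, te=14.0)
```

Keeping the rows immutable and returning a new matrix from `retouch` is a design choice of the sketch: the model output and the meteorologist's version can then be compared side by side.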
With these data, meteorologists write weather forecasts manually. This work is long and expensive, especially when several versions of a single forecast have to be produced in different languages or styles (general forecasts, beaches, sea, mountain, by community, by province...).
This is where MultiMeteo is useful. Its aim is not to replace the work of meteorologists but to assist them interactively, so that forecasts can be disseminated in different languages and styles. In addition, it makes it possible to produce forecasts for different places on the map.
The technique first generates a draft automatically, from input data that may be incomplete. Although the system can create text in several languages, the draft is offered to the meteorologist, who acts as a corrector, only in his or her native language. To correct a fragment of the text, the meteorologist clicks on the part to be modified. A pop-up menu then offers a number of options and alternative modifiers, and choosing one of them performs the correction. Taking the changes into account, the system then generates the forecast texts in all languages.
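The idea of correcting once and regenerating in every language can be sketched as follows. This is a minimal illustration, not MultiMeteo's implementation: it assumes that the draft is stored as a list of language-independent concept names, and the `TERMS` table, the concept `Clear` and all function names are hypothetical.

```python
# Hypothetical term table: one term per concept and language (illustrative only).
TERMS = {
    "Overcast": {"en": "overcast", "es": "cubierto", "fr": "couvert"},
    "Clear": {"en": "clear", "es": "despejado", "fr": "dégagé"},
}
LANGUAGES = ("en", "es", "fr")


def generate(draft, lang):
    """Realize a draft (a list of concept names) in one language."""
    return ", ".join(TERMS[concept][lang] for concept in draft)


def alternatives(concept):
    """Options the pop-up menu could offer for the clicked fragment."""
    return [other for other in TERMS if other != concept]


def correct(draft, index, choice):
    """Apply the meteorologist's choice at the concept level."""
    if choice not in alternatives(draft[index]):
        raise ValueError(f"{choice!r} is not an offered alternative")
    corrected = list(draft)
    corrected[index] = choice
    return corrected


def generate_all(draft):
    """Regenerate the text in every language from the corrected draft."""
    return {lang: generate(draft, lang) for lang in LANGUAGES}


draft = ["Overcast"]
shown_to_meteorologist = generate(draft, "es")   # only the native language is shown
corrected = correct(draft, 0, "Clear")
texts = generate_all(corrected)
```

Because the correction is stored on the concept and not on the Spanish string, the other languages pick it up without any translation step.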
The advantages of this technique are its speed (producing each text in each language takes about 2 seconds, whereas a human translator needs about 10 minutes); the ability to generate text even when some data has not yet been collected; the high quality of the texts created (sometimes with human touches); the ease of maintenance and adaptation; and finally, acceptance by human users (meteorologists do not have to write in foreign languages).
MultiMeteo generates texts in two ways:
Weather forecast *IS *CO. *MO *FD.
Local time: *FP.
Ad value: *TT.
where:
The generation engine used by the system was developed in 1994 for French, for the automatic generation of commercial letters. In 1995 it was extended to English when it was integrated into a prototype for translating technical manuals. In the same year it was also integrated into the “Multilingual Production of Weather Forecasts” project, in order to incorporate new languages and new functionality for creating meteorological bulletins (interactive generation and management of stylistic knowledge).
The system architecture can be seen in Figure 2. The first phase consists of obtaining the meteorological data and reformatting them into a database that the generation modules can use. The task of the generation module is then divided into two parts: planning and execution.
Planning uses knowledge bases (KBs) of concepts and styles and is divided into two phases:
An event is a conceptual object associated with a meteorological situation or with the evolution of a situation. Events are of two types: atomic and molecular.
An atomic event represents a meteorological parameter without evolution, with a single associated value (the Value attribute). For example, the atomic event representing an overcast sky is:
Event_CloudCovering4: Event{ Value=Class CloudCovering_code4 }
The class CloudCovering_code4 is a set of simple concepts: Overcast, NoSun and VeryCloudy-Overcast. Each of these concepts is associated with a term in each language.
A molecular event involves more than one parameter. For example, when we talk about wind we can have data on force, direction and evolution. Molecular events can carry several values (the Value0, Value1, etc. attributes), as well as an operator (the Operator attribute) that specifies how these values are combined. For example, the molecular event describing a clear sky that becomes overcast is:
Cloudier_Min0: Event_mol{ Value0= Event_CloudCovering0;
This molecular event is expressed by two atomic events and an operator. A time representation serves to situate events in time (present, past or future) and indicates the period (day, morning, afternoon, night...).
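The distinction between atomic and molecular events can be pictured as a small data structure. This is a sketch under assumptions, not the system's knowledge-base format: the second value of Cloudier_Min0, the class name `CloudCovering_code0` and the operator name `Evolution` do not appear in the text and are chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AtomicEvent:
    """A parameter without evolution: a single Value attribute."""
    name: str
    value: str          # name of a concept class


@dataclass(frozen=True)
class MolecularEvent:
    """Several values (Value0, Value1, ...) combined by an Operator."""
    name: str
    values: tuple       # atomic or molecular events
    operator: str


def atomic_parts(event):
    """Flatten an event into the atomic events it is built from."""
    if isinstance(event, AtomicEvent):
        return [event]
    parts = []
    for value in event.values:
        parts.extend(atomic_parts(value))
    return parts


# Event_CloudCovering4 and Cloudier_Min0 are named in the text;
# everything else below is an assumption made for the example.
clear = AtomicEvent("Event_CloudCovering0", "CloudCovering_code0")
overcast = AtomicEvent("Event_CloudCovering4", "CloudCovering_code4")
cloudier = MolecularEvent("Cloudier_Min0", (clear, overcast), "Evolution")
```

Allowing molecular events as values of other molecular events is a choice of the sketch; the text only states that a molecular event carries several values and an operator.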
At the output of the planning module, a concept has been selected for each atomic event and for each class of the Operator attribute of the molecular events. In addition, other attributes can be added (automatically or in interaction with the meteorologist): probability index, phase, period...
The module that realizes the selected concepts linguistically in each language is based on Meaning-Text Theory (Mel’čuk 1988, Polguère 1988). This phase uses a linguistic knowledge base and is divided into five stages: predenotation, semantics, deep syntax, surface syntax and morphology.
The computational work of extending the MultiMeteo system to Basque has been carried out by the IXA group, and the terminological work by UZEI. The adaptations to Galician and Catalan were made from the Spanish version and mainly involved the lexicon, since no major changes in syntax or morphology were required. For Basque, although we started from Spanish (and sometimes French), most of the sentence structures have been modified and we have had to pay special attention to the declension marks.
We organized our work in three phases:
The adaptation is carried out in three subphases: first we deal with the atomic events (for example, “sky, overcast”), then with the molecular events that are easy (for example, “wind, light, from the north”), and finally with the molecular events that present special difficulties (for example, “sky, initially overcast, with rain, later overcast at times”).
Each of the adaptation subphases comprises a preliminary linguistic analysis; the analysis and design of the information to be included in the knowledge base; the entry and testing of the information for one representative example of each event; and finally, the entry and testing of all the possibilities for each type of event.
The main characteristics of this adaptation are:
If the system is later extended to other styles, more declension cases will be needed, and these cases will have to be entered in the dictionary. As an example, consider the dictionary entry for the word rain:
BA_Euri1: Lexeme
Table 3 shows how several atomic concepts are realized in Basque (with the Spanish and French for reference).
Table 4 shows the realization of several molecular concepts. Where they appear, the variables stand for the values of the event: N is the state of the sky (clear, partly cloudy, overcast...); DD is the wind direction (north, southwest, etc.); FF is the wind force (moderate, strong...); TS is the precipitation (rain, drizzle...); PER is the period (morning...).
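How such variables are filled in can be illustrated with a simple template. This is only a sketch: the pattern below is an English gloss invented for the example, not an entry from Table 4, and the `$NAME` placeholder syntax comes from Python's `string.Template`, not from MultiMeteo.

```python
import re
from string import Template

# Hypothetical English gloss of a wind pattern using the variables FF and DD.
WIND_PATTERN = "wind, $FF, from the $DD"


def variables(pattern):
    """List the variable names that a pattern expects, in order."""
    return re.findall(r"\$([A-Z]+)", pattern)


def realize(pattern, values):
    """Fill in every variable; a missing value raises KeyError."""
    return Template(pattern).substitute(values)


text = realize(WIND_PATTERN, {"FF": "moderate", "DD": "north"})
```

A real realization for Basque would also have to choose the declension of each inserted term, which plain string substitution does not handle.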
The project is currently in the last stages of development. The next step is a large-scale test to detect possible errors in the system, followed by the necessary changes and the final evaluation. However, the adaptation is already integrated into the INM system, and weather forecasts for the autonomous communities of Spain are offered every day on the web at http://www.inm.es/wwi/MultiMeteo/Multimeteo.html.
Beyond general-purpose telegraphic texts, special-purpose forecasts (for beaches, mountaineers, skiers...) and richer texts (for example, complete sentences with verbs) would be feasible steps in the medium term. Complete versions of this kind have been made for French and are currently in use. For the moment it is enough to assess the usefulness of the system developed for Basque; if a need is detected later, these improvements should then be undertaken.