Computer scientist and researcher
Elhuyar Hizkuntza eta Teknologia
Speech technologies are those that enable machines and computers to produce and understand human speech. They fall into two main branches: speech synthesis (machines that can speak), known as TTS or Text-To-Speech; and speech recognition (machines that can understand what they are told), known as ASR or Automatic Speech Recognition. At the Elhuyar Foundation we have been working with both for some time, especially for the Basque language, and we have developed various tools to help everyone achieve full inclusion in many areas.
As for speech synthesis in Basque, the Aholab research group of the University of the Basque Country (UPV/EHU) is the main reference. They have been working on Basque speech synthesis for years, and their AhoTTS system is the one that achieves the best results in Basque. At Elhuyar we have developed various accessibility solutions and tools based on this TTS system.
One of them lets you listen to web pages instead of reading them. Five years ago we told you about this product in this same section. At that time we reported that it was installed on the websites of Elhuyar magazine and Zientzia.eus, but since then we have added it to other sites: eitb.eus, several sections of the UPV/EHU website, the website Sara.com... The tool reads the content of these websites aloud by converting their text into speech, which makes it very useful for blind or visually impaired people, and for anyone who finds it hard to read content on the small screen of a mobile phone. It also has uses that go beyond accessibility: listening to content through headphones while walking down the street or driving, without having to keep your eyes on the screen, or taking in content on a train or bus without the discomfort that the jolting causes when reading. The tool appears as a player bar and lets you choose between two voices (male or female) and adjust the reading speed.
The Wikimedia Foundation wants to do the same in one of its best-known projects, Wikipedia: implement the technology so that everyone can access its content without obstacles. To that end, a couple of years ago it launched the Wikispeech project, which aims to create a player bar that reads Wikipedia articles aloud through TTS. The project is quite advanced and is expected to be deployed soon. It was initially developed for a few languages (English, Arabic, Swedish and Norwegian), but at Elhuyar, commissioned by EWKE, the cultural association of Basque Wikimedians, we have already adapted Basque synthesis (the TTS system mentioned above) for integration into Wikispeech. Therefore, when Wikipedia deploys and presents Wikispeech, Basque will be among the few initial languages.
The Digital Reader is another tool we have built for inclusion through speech synthesis, this time for the Berritzegune Nagusia of the Department of Education of the Basque Government. The Local Inclusive School or PCPI needed a solution to help children with dyslexia in their learning and education. Dyslexia is a disorder of neurological origin that affects the language skills related to reading and writing. TTS-based tools are very useful for people with dyslexia, and many such tools existed, but none in Basque. As a result, children who used these tools had to listen to Basque texts read by a Spanish TTS, which caused problems: wrong stress and intonation, incorrect pronunciation of the consonants g, z, x, tz, ts and tx...
The resulting tool, Digital Reader, is an add-on for the Firefox and Chrome web browsers that reads aloud the web pages, PDF documents or text documents opened in the browser (including Google Docs documents, which are widely used in education). Here too you can choose between two voices and adjust the speed, and the tool can also highlight the word being read and read it literally, which helps in the most severe cases of dyslexia.
Recently we have designed and launched a product called Bidaide. With Bidaide, anyone can freely use and enjoy tourist and cultural resources (museums, tourist and cultural routes...) as well as public buildings. It has three main components: language and speech technologies for content creation and management, accessibility consulting, and a mobile phone app.
In content management, as already noted, language and speech technologies are used to promote accessibility. Accessibility depends, among other things, on the languages on offer: a tour, a website or anything else that exists in only one or a few languages is not accessible to those who do not know them. Therefore, to make a tour or a building as accessible as possible, the explanations and directions for its points of interest must be available in as many languages as possible.
Bidaide gives the manager a web platform for managing the texts of the explanations; it supports content in several languages and can use machine translation. However, if the content and explanations exist only as text, they are not accessible to blind or visually impaired people. For that reason, the platform also manages audio in the different languages, and the audio can, if desired, be created automatically using Elhuyar's speech synthesis.
For accessibility consulting, we work with a company that specializes in it. On the one hand, they propose or carry out the adaptations needed to make the route or building accessible. On the other hand, they add optional accessibility information to the explanations of the points of interest and route segments, such as architectural barriers, steep slopes, descriptions of exhibits, guidance for touching sculptures, etc. In addition, if the client wishes, they write the explanatory texts following easy-reading guidelines for people with cognitive disabilities or language comprehension difficulties. Finally, when everything is ready, they carry out accessibility tests with users who have different characteristics and functional diversity.
As for the mobile phone application, once it is installed it takes care of presenting information to each type of user according to their characteristics: displaying the explanatory texts or playing the audio, adding accessibility information, etc. The application itself is accessible: it works with the accessibility settings of the user's phone and uses color contrast and pictograms. Finally, it guides people with visual impairment or blindness along the route, giving directions at the most important places: turn left, continue straight ahead for another 30 meters... For this purpose, GPS is used on outdoor routes, while on indoor routes beacons that emit a Bluetooth signal are placed at key points, so that the app can detect when the phone is nearby.
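The outdoor guidance described above can be illustrated with a small sketch. It is not the Bidaide implementation: the `Waypoint` type, the `instruction_for` function, the 10-meter trigger radius and the sample coordinates are all assumptions made for this example. It simply computes the great-circle (haversine) distance between a GPS fix and each point of the route, and returns the instruction of the closest point within range.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


@dataclass
class Waypoint:
    # Hypothetical structure; not taken from the Bidaide app.
    name: str
    lat: float          # degrees
    lon: float          # degrees
    instruction: str    # text that would be shown or spoken to the user


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def instruction_for(lat, lon, waypoints, trigger_radius_m=10.0):
    """Return the instruction of the closest waypoint within the radius, else None."""
    best = None
    best_dist = trigger_radius_m
    for wp in waypoints:
        d = haversine_m(lat, lon, wp.lat, wp.lon)
        if d <= best_dist:
            best, best_dist = wp, d
    return best.instruction if best else None


# Made-up coordinates, for illustration only.
route = [
    Waypoint("corner", 43.2700, -2.0500, "Turn left"),
    Waypoint("square", 43.2710, -2.0500, "Continue straight ahead for 30 meters"),
]
```

Calling `instruction_for(43.27001, -2.0500, route)` would return the instruction of the first waypoint, since that fix is about one meter away from it. Indoors, where GPS is unreliable, the same decision could be driven by the signal strength of the nearest Bluetooth beacon instead of a computed distance.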
The Bidaide project is therefore a pioneer, since it aims to guarantee everyone access to culture, tourism and public services, respecting and recognizing human diversity. Our intention is to extend it as widely as possible, so that as many spaces as possible become accessible and inclusive. Recently we have implemented it on the Harria Hitz route in Usurbil. The Harria Hitz tour aims to make known the role of Usurbil in the recovery of contemporary Basque culture through a series of elements that can be seen in the town center. In this case, the seven points of the tour are presented in six languages; the Catalan and Galician versions were created directly by machine translation, all the audio was created by TTS, and the explanations were written following easy-reading guidelines.
Beyond speech synthesis, it is clear that speech recognition, or ASR, can contribute a great deal to the inclusion of all people. For example, it can help people with physical or motor disabilities work with computers by understanding and executing spoken commands: “Open the browser”, “save the file”... Nor should we forget that, when long texts have to be written, speech recognition makes it possible to use a dictation system instead of the keyboard. Moreover, mobile phones and smart speakers can now be operated through speech. Many people use this more and more for convenience, but for many people with functional diversity it is the only way to use these devices, which makes it essential. ASR can also help deaf or hearing-impaired people access audiovisual content, since subtitles can be created automatically. It can make subtitling easier for content creators and, where the creator does not offer subtitles, let users generate them automatically themselves. The result is not perfect, but it can be enough (and better than nothing) to understand the content.
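To make the subtitling idea concrete, here is a minimal sketch of how the output of a recognizer could be turned into a subtitle file. It assumes, purely for illustration, that the ASR system returns a list of `(start_seconds, end_seconds, text)` segments; the function names are ours, and the output follows the widely used SubRip (.srt) layout of a counter, a time range and the text.

```python
def format_timestamp(seconds):
    """Format a time in seconds as HH:MM:SS,mmm, the form used in .srt files."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments):
    """Build SubRip text from (start, end, text) segments.

    The segment format is an assumption about the recognizer's output,
    not the interface of any particular ASR system.
    """
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        time_range = f"{format_timestamp(start)} --> {format_timestamp(end)}"
        blocks.append(f"{index}\n{time_range}\n{text.strip()}\n")
    # Subtitle blocks are separated by a blank line.
    return "\n".join(blocks)


# Made-up recognizer output, for illustration only.
example_segments = [
    (0.0, 2.5, "Kaixo"),
    (2.5, 5.0, "Ongi etorri"),
]
srt_text = segments_to_srt(example_segments)
```

The resulting text can be saved with an .srt extension and loaded by most video players. Recognition errors carry over into the subtitles unchanged, which is why a creator would normally review them before publishing.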
Tools and services of this kind already exist, and we all know them: the Windows operating system has long supported voice control, and on YouTube you can turn on automatic subtitles if you do not understand the language of a video... Unfortunately, they do not work in Basque. At Elhuyar we are also working on Basque speech recognition, in order to offer these tools in Basque as soon as possible. We hope to tell you about such solutions in a later article before long.