Elhuyar's new neural speech synthesis

Leturia Azkarate, Igor

Computer scientist and researcher

Elhuyar Hizkuntza eta Teknologia

In 2014, Elhuyar launched a new technological service: speech synthesis. Our customers have since used this text-to-audio technology to offer a variety of services. Language and speech technologies have advanced a great deal thanks to neural networks, and at Elhuyar we have developed a neural speech synthesis technology that offers higher quality and new possibilities. Let us get to know the new neural speech synthesis web service.

Image: Jackie Niam/Shutterstock.com

Two technologies predominate among speech technologies: ASR (Automatic Speech Recognition), which consists of transcribing speech audio into text, and TTS (Text-To-Speech) or speech synthesis, which consists of reading a text aloud, that is, converting it into audio. For some years we have been working on both for Basque at Orai, the artificial intelligence centre created by Elhuyar, and Elhuyar offers services in Basque to the public based on both. As for ASR, in 2020 we launched Aditu.eus, a transcription, subtitling and dictation service. As for TTS, since 2014 we have had an online text-to-speech service and website.

This service was based on the AhoTTS technology developed by the Aholab research group of the University of the Basque Country (UPV/EHU), which was built with the best technology of the time and was the only one that worked in Basque. Over these years the technology has been used in several places and cases: to let users listen to the content of some websites (Elhuyar magazine, Zientzia.eus, EITB.eus, Sarean.eus, the UPV/EHU's service for people with disabilities, the Department of Education's Amarauna website, where teachers can share materials and resources...), and to help students with the pronunciation of Basque words.

New neural TTS technology

Since then, all language and speech technologies, including TTS, have moved to what is known as deep neural networks or deep learning, which delivers much better results. And while the speech we synthesized back then was said to be quite natural (and it was, by the standards of the time), what neural networks achieve today is far more natural, almost indistinguishable from real speech.

Over the past few years, we at Orai have been developing neural speech synthesis for Basque, and we now have our own system. Its quality is very good: it sounds like real speech in pronunciation, intonation, prosody... In addition, today's technologies make things that used to be costly easier and bring new features. For example, in the previous system a separate model had to be trained for every voice we wanted to create, and that required a large amount of recordings. Today, however, a single model can hold many more voices with much less recording time, so we can create new synthetic voices more easily.

In addition, multilingual models can be created, and we have built them for six languages: Basque, Spanish, French, English, Catalan and Galician. Thanks to them, recordings can be made with a person in one language (say, Basque), and the model trained with those recordings is then able to synthesize speech in another language (say, English, French or Catalan) with that person's voice! In other words, a person can be made to “speak” another language without knowing anything of it!

A web service with multiple options

This year Elhuyar launched the web service based on the new neural technology at https://ttsneuronala.elhuyar.eus/. We can choose among the six languages mentioned and, in each of them, between two or four different voices, then enter a text and turn it into speech. The quality of the voices can be tried out in the text box on the website.

Moreover, if we wish, we can also create our own personalized voice, which only we can use. To do this, we just need to record about ten minutes of ourselves reading a few sentences, and we can then synthesize speech with our voice in the recorded language or in any of the others. Examples of custom voices created this way can be heard on the websites of Elhuyar magazine or Goiena (in the case of Elhuyar, other voices can also be selected).

There are several ways of using the technology. The simplest is the text box, where we paste the desired text and generate the audio. We also offer an API, so that the service can be called from one's own application or service. And for those who want to give visitors the option of listening to a website instead of reading it, we also offer the code of a playback bar that is easily embedded in the site.
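To give an idea of what calling a TTS service through an API can look like from an application, here is a minimal Python sketch. The article does not document the actual API, so everything specific here is an assumption made for illustration: the placeholder URL, the JSON field names (`text`, `language`, `voice`) and the bearer-token header are hypothetical and would have to be replaced with whatever the real service specifies.

```python
import json
import urllib.request

# Placeholder endpoint: NOT the real service URL (assumption for illustration).
API_URL = "https://tts.example.org/api/synthesize"


def build_tts_request(text, language, voice, api_key, url=API_URL):
    """Build (but do not send) a POST request asking for synthesized audio.

    The payload fields and the Authorization scheme are hypothetical.
    """
    if not text.strip():
        raise ValueError("text must not be empty")
    payload = {"text": text, "language": language, "voice": voice}
    # ensure_ascii=False keeps non-ASCII characters readable in the body.
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json; charset=utf-8",
            "Authorization": f"Bearer {api_key}",
        },
    )


def synthesize_to_file(request, out_path, timeout=30):
    """Send the request and save the returned audio bytes to out_path."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        audio = response.read()
    with open(out_path, "wb") as out:
        out.write(audio)
```

Building the request separately from sending it makes it possible to inspect or log what would be sent before any network call is made; a real integration would call `synthesize_to_file(build_tts_request(...), "output.mp3")` with the service's actual endpoint and credentials.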

The customers of the previous TTS service have already migrated to the new service and voices, and new companies are also making use of it (Tokikom, Skura, Batasuna, Ulma, Ibil, Naiz...), many of them with custom voices.

And what is TTS technology actually good for, what are these customers using it for? It has many possible uses. One of the most common is to make websites more accessible and/or listenable by means of the playback bar (e.g. on a mobile phone while walking or on public transport). Through the API, and in combination with ASR, it also enables spoken interaction with machines or apps. Using the text box we can create a podcast directly from the text, without having to record it, or create voice-overs for our audiovisual productions. In the near future it will also be possible to dub (semi-automatically), by adding TTS to Aditu, our automatic subtitling and translation service.
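The combination of ASR and TTS mentioned above can be pictured as a simple chain: audio in, text, a reply in text, audio out. The following Python sketch shows only that wiring. The three stand-in functions are hypothetical placeholders (they just pass bytes and strings around) and do not represent how Aditu or the TTS service are actually called.

```python
from typing import Callable


def make_voice_assistant(
    recognize: Callable[[bytes], str],
    respond: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> Callable[[bytes], bytes]:
    """Chain ASR -> application logic -> TTS into a single handler."""

    def handle(audio_in: bytes) -> bytes:
        text_in = recognize(audio_in)   # speech audio -> text (ASR)
        text_out = respond(text_in)     # the application decides what to say
        return synthesize(text_out)     # text -> speech audio (TTS)

    return handle


# Stand-ins so the sketch runs without any real speech engine.
def fake_recognize(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_respond(text: str) -> str:
    return "Kaixo!" if "kaixo" in text.lower() else "Ez dut ulertu."


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


assistant = make_voice_assistant(fake_recognize, fake_respond, fake_synthesize)
```

In a real application, `recognize` and `synthesize` would wrap calls to the respective speech services, while `respond` would hold the app's own logic.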

At the moment, our TTS technology produces neutral speech, which is enough for reading the content of a media outlet or web page, for giving a machine a voice or for voice-overs. But at Orai we continue to research in many directions: to have emotional voices as well, to be able to parametrize and shape in detail the speech produced (speed of each segment, intonation, volume...), to be able to synthesize speech imitating a voice from a small sample, without the need to train dedicated models through recordings... All this so that these kinds of tools are also present in an increasingly technological world.

Sponsors
Basque Government Department of Industry, Trade and Tourism