Text to Speech – let your computer speak

Fri, 10 September 2021, Ralf Hersel

Text-to-Speech (TTS), Speech-to-Text (STT) – whichever way you look at it, the topic is interesting. Don't expect miracles from this post; what you get is a comparison between a robotic voice and an (almost) natural one. I pit David against Goliath, namely eSpeak-NG against gTTS.

eSpeak-NG is a compact text-to-speech synthesizer for Linux, Windows, Android and other operating systems. It supports more than 100 languages and accents. It is based on the eSpeak engine developed by Jonathan Duddington and runs locally on your computer.

Its opponent is the Google TTS engine, which is backed by massive server capacity and AI language models. Although the tool itself runs locally, it connects to Google's servers to perform the text-to-speech conversion.

Installation

eSpeak-NG is installed as follows:

sudo apt install espeak-ng

For non-Debian-based distributions, use the respective package manager.

Google-TTS (gTTS) is installed like this:

pip3 install gTTS

If pip or pip3 is not yet available, install it first with: sudo apt install python3-pip

The following command checks whether both tools are installed correctly (or were already present):

espeak-ng --version; gtts-cli --version
Output:
eSpeak NG text-to-speech: 1.50  Data at: /usr/lib/x86_64-linux-gnu/espeak-ng-data
gtts-cli, version 2.2.3
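
The same check can be scripted from Python, which is useful before calling either tool from an application. This is only a minimal sketch using the standard library; the helper name find_tts_tools is made up for this illustration.

```python
import shutil


def find_tts_tools(names=("espeak-ng", "gtts-cli")):
    """Map each tool name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # Print one line per tool, e.g. the path or "not installed".
    for name, path in find_tts_tools().items():
        print(f"{name}: {path or 'not installed'}")
```

A value of None for a tool means it still has to be installed as described above.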

Usage

Both tools can be used directly in the terminal or from within an application (Python). Corresponding examples can be found in the sources listed below. Here are two terminal calls for trying out the output of the TTS tools.

espeak-ng -v de -s 140 'Mein Name ist Ralf. Ich lebe in der Schweiz.'
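
For the application route, the same call can be made from Python with the standard subprocess module. The following is a rough sketch, not an official eSpeak-NG API: espeak_command and speak are hypothetical helper names, and only the -v (voice) and -s (speed in words per minute) options from the terminal example are used.

```python
import subprocess


def espeak_command(text, voice="de", speed=140):
    """Build the argument list for the espeak-ng call from the terminal example."""
    return ["espeak-ng", "-v", voice, "-s", str(speed), text]


def speak(text, voice="de", speed=140):
    """Speak the text aloud; raises CalledProcessError if espeak-ng fails."""
    subprocess.run(espeak_command(text, voice, speed), check=True)
```

Calling speak('Mein Name ist Ralf. Ich lebe in der Schweiz.') should then produce the same output as the terminal command, provided espeak-ng is installed. Passing the text as a list element rather than through a shell avoids quoting problems with apostrophes in the input.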

Don’t forget to turn on your speakers or put on your headphones so that you can hear the voice output. For the Google tool, the sample call looks like this:

gtts-cli 'Mein Name ist Martina. Ich lebe in der Schweiz.' --lang de --output hello.mp3; mpv hello.mp3

Both commands can be extended with various options. With espeak-ng you can choose between different male and female voices; unfortunately, gtts-cli does not offer this, as there is only the female voice. You can look up the parameters in the linked sources.

The example command for gtts-cli actually consists of two commands separated by a semicolon. The tool does not play the result itself; it only produces MP3 data, which the example saves to a file. You therefore have to check which audio player is installed on your system, for example mpv, mpg123, cvlc, or others. The example uses mpv; adjust this to match your player.
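
The two-step approach (generate an MP3 file, then play it) can also be sketched in Python with the gTTS library that gtts-cli is built on. The helper names pick_player and say_with_gtts are hypothetical, and the list of candidate players is only an assumption that you should adapt to your system.

```python
import shutil
import subprocess


def pick_player(candidates=("mpv", "cvlc", "mpg123"), which=shutil.which):
    """Return the first candidate player found on PATH, or None."""
    for name in candidates:
        if which(name):
            return name
    return None


def say_with_gtts(text, lang="de", outfile="hello.mp3"):
    """Save the speech as an MP3 file via gTTS, then play it if a player exists."""
    # Imported here so that pick_player also works without gTTS installed.
    from gtts import gTTS

    # This step sends the text to Google's servers and needs a network connection.
    gTTS(text, lang=lang).save(outfile)

    player = pick_player()
    if player is None:
        raise RuntimeError(f"No audio player found; play {outfile} manually.")
    subprocess.run([player, outfile], check=True)
```

Calling say_with_gtts('Mein Name ist Martina. Ich lebe in der Schweiz.') corresponds to the terminal example. Some players may need extra options to exit after playback (cvlc, for instance, is usually started with --play-and-exit), so treat the plain call here as a starting point.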

Conclusion

The aim of the exercise was to demonstrate the very different quality of the two audio outputs. espeak-ng sounds like a robot, while gtts-cli comes pretty close to natural speech. The price for this is that your input is sent to the Google cloud. An alternative would be to use the applications and examples from the Mozilla DeepSpeech project. We also have an article about this project.

Sources:

https://github.com/espeak-ng/espeak-ng

https://gtts.readthedocs.io/en/latest/index.html
