Fri, 10 September 2021, Ralf Hersel
Text-to-Speech (TTS) or Speech-to-Text (STT): whichever way you look at it, the topic is interesting. Don’t expect miracles from this post, just a comparison between a robot voice and an (almost) natural one. I compare David with Goliath, namely eSpeak-NG with gTTS.
eSpeak-NG is a compact text-to-speech synthesizer for Linux, Windows, Android and other operating systems. It supports more than 100 languages and accents. It is based on the eSpeak engine developed by Jonathan Duddington and runs locally on your computer.
Its counterpart is the Google TTS engine, which is backed by massive server capacity and AI language models. Although the tool runs locally, it connects to Google’s servers to perform the text-to-speech conversion.
eSpeak-NG is installed as follows:
sudo apt install espeak-ng
For non-Debian-based distributions, use the respective package manager.
Google-TTS (gTTS) is installed like this:
pip3 install gTTS
If pip or pip3 is not yet available, install it first:
sudo apt install python3-pip
The following command checks whether both tools are installed correctly (or were already installed):
espeak-ng --version; gtts-cli --version
Output:
eSpeak NG text-to-speech: 1.50 Data at: /usr/lib/x86_64-linux-gnu/espeak-ng-data
gtts-cli, version 2.2.3
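The same check can also be done from a Python script. The following is only a minimal sketch using the standard library; the helper name `find_tools` is made up for illustration. Instead of running the tools, it simply looks up each executable on the PATH.

```python
import shutil


def find_tools(names=("espeak-ng", "gtts-cli")):
    """Map each tool name to its full path, or to None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # Print one line per tool, e.g. the path or "not installed".
    for name, path in find_tools().items():
        print(f"{name}: {path or 'not installed'}")
```

A value of `None` means the tool still has to be installed.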
Both tools can be used directly in the terminal or in an application (Python). Corresponding examples can be found in the sources listed below. Here are two calls in the terminal to try out the output of the TTS tools.
espeak-ng -v de -s 140 'Mein Name ist Ralf. Ich lebe in der Schweiz.'
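The same call can also be made from a Python application. This is a minimal sketch, assuming only the standard library and that `espeak-ng` is on the PATH; the function names `espeak_command` and `speak` are invented for illustration, while `-v` (voice) and `-s` (speed in words per minute) are the options from the command above.

```python
import shutil
import subprocess


def espeak_command(text, voice="de", speed=140):
    """Build the argument list for the espeak-ng call shown above."""
    return ["espeak-ng", "-v", voice, "-s", str(speed), text]


def speak(text, voice="de", speed=140):
    """Speak the text with espeak-ng; return False if the tool is missing."""
    if shutil.which("espeak-ng") is None:
        return False
    # Passing a list (not a shell string) avoids quoting problems in the text.
    subprocess.run(espeak_command(text, voice, speed), check=True)
    return True
```

Calling `speak('Mein Name ist Ralf. Ich lebe in der Schweiz.')` should then produce the same voice output as the terminal command.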
Don’t forget to turn on your speakers or put on your headphones so that you can hear the voice output. For the Google tool, the sample call is as follows:
gtts-cli 'Mein Name ist Martina. Ich lebe in der Schweiz.' --lang de --output hello.mp3; mpv hello.mp3
Both commands can be extended with various options. With espeak-ng you can choose between different male and female voices; unfortunately this does not work with gtts-cli, which offers only the female voice. You can look up the parameters in the linked sources. The example command for gtts-cli actually consists of two commands separated by a semicolon. The tool cannot play the result directly as sound; it can only save it to an MP3 file. You therefore need to check which audio player is installed on your system, for example mpv, aplay, cvlc, or others. In the example I am using mpv; adjust this to match your player.
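The two-step pattern (save an MP3, then play it) can be sketched in Python as well. This is an illustration under assumptions, not a definitive implementation: it assumes the third-party gTTS package is installed and that the machine has network access. The names `PLAYERS`, `pick_player` and `say_with_gtts` are hypothetical; `gTTS(...)` and its `save()` method come from the gTTS package.

```python
import shutil
import subprocess

# Candidate players from the article; cvlc needs --play-and-exit to quit afterwards.
PLAYERS = (["mpv"], ["cvlc", "--play-and-exit"])


def pick_player(players=PLAYERS):
    """Return the command of the first installed player, or None."""
    for command in players:
        if shutil.which(command[0]):
            return list(command)
    return None


def say_with_gtts(text, lang="de", outfile="hello.mp3"):
    """Save the spoken text as MP3 via gTTS, then play it if a player exists."""
    from gtts import gTTS  # third-party; sends the text to Google's servers

    gTTS(text, lang=lang).save(outfile)
    player = pick_player()
    if player is not None:
        subprocess.run(player + [outfile], check=True)
    return player
```

Importing gTTS inside the function keeps the player lookup usable even where the package is not installed.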
The aim of the exercise was to demonstrate the very different quality of the two audio outputs. espeak-ng sounds like a robot, while gtts-cli comes pretty close to natural speech. In return, however, you accept that your input is sent to the Google cloud. An alternative would be to use the applications and examples from the Mozilla DeepSpeech project. We also have an article about this project.