Python wrapper for Espeak and Mbrola, for simple local TTS

hadware. Last update: May 31, 2022

Voxpopuli


A wrapper around Espeak and Mbrola.

This is a lightweight Python wrapper for Espeak and Mbrola, two co-dependent TTS tools. It lets you render sound by simply feeding it text and voice parameters. Phonemes (the data transmitted by Espeak to Mbrola) can also be manipulated using a minimalistic API.

This is a short introduction; for more detail, see the documentation on Read the Docs.

Install

These instructions should work on any Debian/Ubuntu derivative.

Install with pip as:

pip install voxpopuli

You have to have espeak and mbrola installed beforehand:

sudo apt install mbrola espeak

You'll also need some mbrola voices installed. You can either get them from the Mbrola project page and unpack them in /usr/share/mbrola/<lang><voiceid>/ or, more simply, install them from the Ubuntu repos. All the voice packages are named mbrola-<lang><voiceid>. Simpler still, you can install all the available voices by running:

sudo apt install mbrola-*

In case the voices you need aren't all in the Ubuntu repos, you can use this convenient little script that installs voices directly from Mbrola's voice repo:

# this installs all British English and French voices, for instance
sudo python3 -m voxpopuli.voice_install en fr

Usage

Picking a voice and making it say things

The simplest use of this lib is just bare TTS, using a voice and a text. The rendered audio is returned as a bytes object containing .wav data:

from voxpopuli import Voice

voice = Voice(lang="fr")
wav = voice.to_audio("salut c'est cool")

Evaluating type(wav) would return bytes. You can then save the wav by opening a file in "wb" mode:

with open("salut.wav", "wb") as wavfile:
    wavfile.write(wav)
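Since the returned bytes are an ordinary WAV file, they can also be inspected without writing anything to disk. Here is a minimal sketch using only the standard library. The helpers make_test_wav and describe_wav are hypothetical names, not part of voxpopuli, and make_test_wav merely generates a sine tone as a stand-in for the output of voice.to_audio(), so that the example runs without espeak or mbrola installed:

```python
import io
import math
import struct
import wave


def make_test_wav(seconds=0.5, rate=16000, freq=440.0):
    """Build an in-memory mono 16-bit WAV; a stand-in for voice.to_audio()."""
    n_frames = int(seconds * rate)
    samples = (
        int(32767 * 0.3 * math.sin(2 * math.pi * freq * i / rate))
        for i in range(n_frames)
    )
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_out:
        wav_out.setnchannels(1)
        wav_out.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav_out.setframerate(rate)
        wav_out.writeframes(b"".join(struct.pack("<h", s) for s in samples))
    return buffer.getvalue()


def describe_wav(wav_bytes):
    """Return (channels, sample_rate, duration_in_seconds) for WAV bytes."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav_in:
        rate = wav_in.getframerate()
        return wav_in.getnchannels(), rate, wav_in.getnframes() / rate


wav = make_test_wav()  # with voxpopuli: wav = voice.to_audio("...")
channels, rate, duration = describe_wav(wav)
print(f"{channels} channel(s), {rate} Hz, {duration:.2f} s")
```

The same describe_wav call should work on the bytes returned by voice.to_audio(), assuming mbrola emits standard PCM WAV data.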

If you wish to hear how it sounds right away, you'll have to make sure you installed pyaudio via pip, and then do:

voice.say("Salut c'est cool")

You can also, say, use scipy to get the PCM audio as an ndarray:

from io import BytesIO

from scipy.io.wavfile import read, write

rate, wave_array = read(BytesIO(wav))
reversed_array = wave_array[::-1]  # reversing the sound file
write("tulas.wav", rate, reversed_array)

Getting different voices

You can set some parameters on the voice, such as language or pitch:

from voxpopuli import Voice

# really slow voice with high pitch
voice = Voice(lang="us", pitch=99, speed=40, voice_id=2)
voice.say("I'm high on helium")

The exhaustive list of parameters is:

  • lang, a language code among those available (us, fr, en, es, ...). You can list them using the listvoices method from a Voice instance.
  • voice_id, an integer, used to select the voice id for a language. If not specified, the first voice id found for the given language is used.
  • pitch, an integer between 0 and 99 (inclusive).
  • speed, an integer, in words per minute. The default and regular speed is 160 wpm.
  • volume, a float ratio applied to the output sample. Some languages have presets that our best specialists tested. Otherwise, defaults to 1.
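To make the ranges above concrete, here is a small sketch that validates a set of parameters before they are handed to a Voice, and turns the words-per-minute speed into a rough duration estimate. VoiceSettings and estimated_seconds are hypothetical helpers, not part of voxpopuli. The defaults for lang and pitch are assumptions; only the 160 wpm speed and the volume of 1 come from the list above:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VoiceSettings:
    """Hypothetical container mirroring the documented Voice parameters."""
    lang: str = "en"                # assumed default, for illustration only
    voice_id: Optional[int] = None  # None: let the library pick the first one
    pitch: int = 50                 # assumed default; documented range is 0..99
    speed: int = 160                # words per minute, the documented default
    volume: float = 1.0             # ratio applied to the output sample

    def __post_init__(self):
        if not 0 <= self.pitch <= 99:
            raise ValueError(f"pitch must be in 0..99, got {self.pitch}")
        if self.speed <= 0:
            raise ValueError(f"speed must be positive, got {self.speed}")
        if self.volume < 0:
            raise ValueError(f"volume must not be negative, got {self.volume}")


def estimated_seconds(text, speed=160):
    """Rough speaking time: word count divided by words per minute."""
    return 60 * len(text.split()) / speed


settings = VoiceSettings(lang="us", pitch=99, speed=40, voice_id=2)
print(estimated_seconds("I'm high on helium", settings.speed))
```

The estimate ignores pauses and word length, so it only gives an order of magnitude: the four-word sentence at 40 wpm takes about four times as long as at the default 160 wpm.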

Handling the phonemic form

To render a string of text to audio, the Voice object actually chains espeak's output to mbrola, which then renders it to audio. Espeak only renders the text to a list of phonemes (such as those of the IPA), which are then processed by mbrola. For those who like pictures, here is a diagram of what happens when you run voice.to_audio("Hello world"):

[Diagram: phonemes]
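The chain described above can be sketched as two command lines, with the output of the first piped into the second. This is an assumption-laden illustration, not voxpopuli's actual implementation: build_pipeline is a hypothetical helper, the flags are the espeak and mbrola command-line options as best recalled (check espeak --help and mbrola -h on your system), and the database path follows the /usr/share/mbrola/<lang><voiceid>/ layout mentioned in the install section. The sketch only builds the argument lists and does not run anything:

```python
import shlex


def build_pipeline(text, lang="en", voice_id=1, speed=160, pitch=50, volume=1.0):
    """Return the (espeak, mbrola) argument lists for an assumed pipeline."""
    voice = f"{lang}{voice_id}"
    espeak_cmd = [
        "espeak",
        "-s", str(speed),     # speed in words per minute
        "-p", str(pitch),     # pitch, 0 to 99
        "-v", f"mb-{voice}",  # espeak voice that targets an mbrola voice
        "-q",                 # quiet: espeak itself produces no audio
        "--pho",              # write mbrola phoneme data to stdout
        text,
    ]
    mbrola_cmd = [
        "mbrola",
        "-v", str(volume),    # volume ratio
        "-e",                 # ignore unknown diphones instead of aborting
        f"/usr/share/mbrola/{voice}/{voice}",  # assumed voice database path
        "-",                  # read the phonemes from stdin
        "-.wav",              # write WAV audio to stdout
    ]
    return espeak_cmd, mbrola_cmd


espeak_cmd, mbrola_cmd = build_pipeline("Hello world")
print(shlex.join(espeak_cmd), "|", shlex.join(mbrola_cmd))
```

Pasting the printed line into a shell (with a redirect to a .wav file appended) is a quick way to check whether espeak and the chosen mbrola voice are installed correctly.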

Phonemes are represented sequentially by a code, a duration in milliseconds, and a list of pitch modifiers. The pitch modifiers are a list of pairs, each pair giving the percentage of the sample at which to apply the pitch modification and the pitch itself.
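As an illustration of that representation, here is a minimal sketch of a phoneme as a plain data class, together with a conversion to and from the one-line-per-phoneme text form that mbrola reads ("code duration position pitch position pitch ..."). PhonemeSketch is a hypothetical stand-in, not voxpopuli's own phoneme class, and the assumption that pitch values are in Hz comes from mbrola's format rather than from this document:

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class PhonemeSketch:
    """Hypothetical stand-in for a phoneme: code, duration, pitch modifiers."""
    name: str
    duration: int  # milliseconds
    # (percentage of the phoneme's duration, pitch) pairs; pitch assumed in Hz
    pitch_mods: List[Tuple[int, int]] = field(default_factory=list)

    def to_pho_line(self) -> str:
        """Serialize as 'name duration [percent pitch]...'."""
        parts = [self.name, str(self.duration)]
        for percent, pitch in self.pitch_mods:
            parts += [str(percent), str(pitch)]
        return " ".join(parts)

    @classmethod
    def from_pho_line(cls, line: str) -> "PhonemeSketch":
        """Parse a line produced by to_pho_line back into an object."""
        tokens = line.split()
        values = [int(token) for token in tokens[2:]]
        # values alternate percent, pitch, percent, pitch, ...
        pairs = list(zip(values[0::2], values[1::2]))
        return cls(tokens[0], int(tokens[1]), pairs)


vowel = PhonemeSketch("o", 127, [(10, 124), (80, 110)])
line = vowel.to_pho_line()
print(line)
print(PhonemeSketch.from_pho_line(line) == vowel)
```

A phoneme without pitch modifiers, such as a short consonant, serializes to just its code and duration.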

Funny thing is, with voxpopuli, you can "intercept" that phoneme list as a simple object, modify it, and then pass it back to the voice to render it to audio. For instance, let's make a simple alteration that'll triple the duration of each vowel in an English text.

from voxpopuli import Voice, BritishEnglishPhonemes

voice = Voice(lang="en")
# here's how you get the phonemes list
phoneme_list = voice.to_phonemes("Now go away or I will taunt you a second time.")
for phoneme in phoneme_list:  # the phoneme list object inherits from list
    if phoneme.name in BritishEnglishPhonemes.VOWELS:
        phoneme.duration *= 3

# rendering and saving the sound, then saying it out loud:
voice.to_audio(phoneme_list, "modified.wav")
voice.say(phoneme_list)
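The same alteration can be factored into a small function, which makes it easy to try out without rendering any audio. This is only a sketch: stretch_vowels and EXAMPLE_VOWELS are hypothetical names, the vowel set is a small illustrative subset of SAMPA codes rather than the library's own list, and SimpleNamespace objects stand in for real phonemes (the function only assumes each item has a name and a duration attribute):

```python
from types import SimpleNamespace

# A small, illustrative subset of SAMPA vowel codes, not the library's list
EXAMPLE_VOWELS = {"i:", "A:", "u:", "@", "e", "I", "aI", "@U"}


def stretch_vowels(phonemes, vowels, factor):
    """Multiply, in place, the duration of every phoneme that is a vowel."""
    for phoneme in phonemes:
        if phoneme.name in vowels:
            # durations are milliseconds, so keep them as integers
            phoneme.duration = round(phoneme.duration * factor)
    return phonemes


# Stand-ins for the items of a phoneme list: "go" is roughly g + @U in SAMPA
phonemes = [
    SimpleNamespace(name="g", duration=60),
    SimpleNamespace(name="@U", duration=120),
]
stretch_vowels(phonemes, EXAMPLE_VOWELS, 3)
print([(p.name, p.duration) for p in phonemes])
```

With voxpopuli installed, the list returned by voice.to_phonemes() and BritishEnglishPhonemes.VOWELS could presumably be passed in place of the stand-ins.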

Notes:

  • For French, Spanish, German and Italian, the phoneme codes used by espeak and mbrola are available as class attributes, in classes similar to the BritishEnglishPhonemes class used above.
  • More information on the phonemes can be found on the SAMPA page.

What's left to do

  • Moar unit tests
  • Maybe some examples
