Python Text To Speech Local - Coqui TTS Python Guide

In this tutorial, we will show you how to generate speech from text using Python. We’ll be using the Coqui TTS package and Gradio to create a user-friendly interface. This allows you to input any text and generate audio on your system without needing to pay for a service. The generated audio can be downloaded and used as you wish.

If you prefer a video tutorial on TTS in Python, here is a link to that.

Dependencies

To begin, let’s take a look at the dependencies for this project:

Torch: The PyTorch deep learning framework that Coqui TTS runs on. We also use it to check whether a GPU is available.
Coqui TTS: This package generates speech from text using pre-trained models. To get the library, click here.
Gradio: This package creates a web-based GUI for interacting with our code. To get the library, click here.

You can install these packages using pip:

pip install torch TTS gradio
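
Before moving on, it can help to confirm that the three packages are importable from the interpreter you plan to use. The following is a minimal sketch using only the standard library; the helper name missing_modules is hypothetical and is not part of any of these packages.

```python
import importlib.util

# Import names of the three packages installed above.
REQUIRED_MODULES = ("torch", "TTS", "gradio")


def missing_modules(names=REQUIRED_MODULES):
    """Return the names from `names` that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All dependencies are importable.")
```

If anything is reported as missing, re-run the pip command inside the same virtual environment that you use to run the script.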

Setting Up Coqui TTS

First, we need to set up the TTS model. We’re using a pre-trained model from the Coqui TTS library.

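import os

import torch
from TTS.api import TTS

# Use the GPU when one is available; otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the pre-trained model once at startup instead of on every request.
tts = TTS(model_name="tts_models/en/ljspeech/fast_pitch").to(device)

def generate_audio(text="A journey of a thousand miles begins with a single step."):
    # Make sure the output folder exists before writing the file.
    os.makedirs("outputs", exist_ok=True)
    tts.tts_to_file(text=text, file_path="outputs/output.wav")
    return "outputs/output.wav"

In this code snippet:

We import the necessary libraries.
We check if a CUDA-enabled GPU is available and set the device accordingly.
We load the pre-trained model once, when the script starts, and move it to that device. Loading it inside the function would repeat this slow step on every request.
We define the generate_audio function, which creates the outputs folder if it is missing, generates speech from the input text, saves it to outputs/output.wav, and returns that path.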
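
Because every call writes to the same outputs/output.wav, a new request overwrites the previous result. One possible way around this is to derive the file name from the input text. The sketch below is an assumption, not part of the original script: the helper output_path_for and the tts_ file prefix are hypothetical names chosen for illustration.

```python
import hashlib
from pathlib import Path

OUTPUT_DIR = Path("outputs")  # same folder the tutorial writes to


def output_path_for(text, output_dir=OUTPUT_DIR):
    """Return a .wav path derived from the text, creating the folder if needed."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    # The same text always maps to the same file; different texts get different files.
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]
    return output_dir / f"tts_{digest}.wav"
```

Inside generate_audio, the fixed path could then be replaced with str(output_path_for(text)), both in the tts_to_file call and in the return value.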

Creating the Interface with Gradio

Now, let’s create a user-friendly interface using Gradio. This will allow us to input text and receive the generated audio.

import gradio as gr

demo = gr.Interface(
    fn=generate_audio,
    inputs=[gr.Text(label="Text")],
    outputs=[gr.Audio(label="Audio")]
)

demo.launch()

In this part:

We import the Gradio library.
We define an interface using gr.Interface.
- The fn parameter is set to the generate_audio function.
- The inputs parameter specifies that the input will be text.
- The outputs parameter specifies that the output will be an audio file.
Finally, we launch the interface using demo.launch().
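
The text box accepts anything, including an empty string, so it may be worth validating the input before handing it to the model. Here is a minimal sketch in plain Python; the function name clean_text and the 1000-character limit are assumptions chosen for illustration, not values required by Coqui TTS or Gradio.

```python
MAX_CHARS = 1000  # hypothetical limit; tune it for your hardware


def clean_text(text, max_chars=MAX_CHARS):
    """Collapse whitespace and reject input that is empty or too long."""
    cleaned = " ".join((text or "").split())
    if not cleaned:
        raise ValueError("Please enter some text.")
    if len(cleaned) > max_chars:
        raise ValueError(f"Text is longer than {max_chars} characters.")
    return cleaned
```

The function passed as fn could call clean_text first and only then generate audio. Inside a Gradio callback, the ValueError can be re-raised as gr.Error so that the message is shown in the interface.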

Running the Application

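To run the application, execute the Python script. This starts a local web server and prints its URL in the terminal (http://127.0.0.1:7860 by default). Open that URL in your browser to reach the Gradio interface, or pass inbrowser=True to demo.launch() to have the page open automatically. There, you can enter any text and click Submit to generate the audio.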

python script_name.py

Replace script_name.py with the name of your Python script.
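
If you want to choose the port or create a temporary public link without editing the script, the launch options can be read from the command line. This is a sketch under assumptions: the flag names --host, --port, and --share and the helpers parse_args and launch_kwargs are hypothetical, while server_name, server_port, and share are parameters of Gradio's launch() method.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line options for the demo (flag names are illustrative)."""
    parser = argparse.ArgumentParser(description="Local text-to-speech demo")
    parser.add_argument("--host", default="127.0.0.1", help="address to bind to")
    parser.add_argument("--port", type=int, default=7860, help="port to listen on")
    parser.add_argument("--share", action="store_true", help="create a public link")
    return parser.parse_args(argv)


def launch_kwargs(args):
    """Map the parsed options onto keyword arguments for demo.launch()."""
    return {
        "server_name": args.host,
        "server_port": args.port,
        "share": args.share,
    }
```

At the end of the script, demo.launch() would become demo.launch(**launch_kwargs(parse_args())), and the app could be started with, for example, python script_name.py --port 7861.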

Conclusion

By following this tutorial, you have successfully created a text-to-speech application using Python. This application leverages Coqui TTS and Gradio to provide a simple and effective solution for generating speech from text. You can further enhance this application by integrating it into other projects or by customizing the interface to suit your needs.

Happy Coding…!