In this tutorial, we will show you how to generate speech from text using Python. We’ll be using the Coqui TTS package and Gradio to create a user-friendly interface. This allows you to input any text and generate audio on your system without needing to pay for a service. The generated audio can be downloaded and used as you wish.
Dependencies
To begin, let’s take a look at the dependencies for this project:
- Torch: A deep learning framework that we’ll use for handling the model.
- Coqui TTS: This package allows us to generate speech from text.
- Gradio: This package creates a web-based GUI to interact with our code.
You can install these packages using pip:
pip install torch TTS gradio
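Before moving on, it can help to confirm that the three packages are importable from the interpreter you plan to use. The following is a minimal sketch using only the standard library; the helper name missing_packages is hypothetical and not part of any of these libraries.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    # Import names for the packages installed above (Coqui TTS imports as "TTS").
    missing = missing_packages(["torch", "TTS", "gradio"])
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All dependencies are importable.")
```

If anything is reported as missing, rerun the pip command in the same environment that runs your script.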
Setting Up Coqui TTS
First, we need to set up the TTS model. We’re using a pre-trained model from the Coqui TTS library.
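import os

import torch
from TTS.api import TTS

# Use the GPU when one is available, otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

def generate_audio(text="A journey of a thousand miles begins with a single step."):
    # Load the pre-trained model and move it to the selected device.
    tts = TTS(model_name="tts_models/en/ljspeech/fast_pitch").to(device)
    # Make sure the output folder exists before writing the file.
    os.makedirs("outputs", exist_ok=True)
    tts.tts_to_file(text=text, file_path="outputs/output.wav")
    return "outputs/output.wav"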
In this code snippet:
- We import the necessary libraries.
- We check if a CUDA-enabled GPU is available and set the device accordingly.
- We define the generate_audio function, which initializes the TTS model and generates speech from the input text. The generated audio is saved to a file.
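As written, generate_audio initializes the model on every call and always writes to the same file, so a second request replaces the first one's audio. One possible variation is sketched below. It assumes the same model name and outputs folder as the snippet above; get_model and unique_output_path are hypothetical helper names introduced only for illustration.

```python
from functools import lru_cache
from pathlib import Path
from uuid import uuid4

OUTPUT_DIR = Path("outputs")  # same folder name as the tutorial's snippet


@lru_cache(maxsize=1)
def get_model(model_name="tts_models/en/ljspeech/fast_pitch"):
    """Load the model on first use and reuse it on later calls."""
    # Imported lazily so this module can be imported without loading the model.
    import torch
    from TTS.api import TTS

    device = "cuda" if torch.cuda.is_available() else "cpu"
    return TTS(model_name=model_name).to(device)


def unique_output_path(suffix=".wav"):
    """Return a fresh file path so one request does not overwrite another."""
    OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
    return str(OUTPUT_DIR / f"{uuid4().hex}{suffix}")


def generate_audio(text="A journey of a thousand miles begins with a single step."):
    path = unique_output_path()
    get_model().tts_to_file(text=text, file_path=path)
    return path
```

The trade-off is that the outputs folder grows with every request, so you may want to clean it up from time to time.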
Creating the Interface with Gradio
Now, let’s create a user-friendly interface using Gradio. This will allow us to input text and receive the generated audio.
import gradio as gr

demo = gr.Interface(
    fn=generate_audio,
    inputs=[gr.Text(label="Text")],
    outputs=[gr.Audio(label="Audio")],
)
demo.launch()
In this part:
- We import the Gradio library.
- We define an interface using gr.Interface.
  - The fn parameter is set to the generate_audio function.
  - The inputs parameter specifies that the input will be text.
  - The outputs parameter specifies that the output will be an audio file.
- Finally, we launch the interface using demo.launch().
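Gradio passes the textbox content to fn as a plain string, so it may be worth checking that string before handing it to the model. The sketch below is one way to do that, not the only one. The names clean_text and build_demo and the 500-character limit are assumptions made for illustration; gr.Error is Gradio's exception for showing a message in the web page.

```python
def clean_text(text, max_chars=500):
    """Collapse whitespace and reject empty or overly long input."""
    cleaned = " ".join((text or "").split())
    if not cleaned:
        raise ValueError("Please enter some text to synthesize.")
    if len(cleaned) > max_chars:
        raise ValueError(f"Text is too long ({len(cleaned)} > {max_chars} characters).")
    return cleaned


def build_demo(synthesize):
    """Wrap a synthesis function in a Gradio interface with input checks."""
    import gradio as gr  # imported lazily so clean_text works without Gradio

    def fn(text):
        try:
            return synthesize(clean_text(text))
        except ValueError as err:
            # gr.Error displays the message in the UI instead of a stack trace.
            raise gr.Error(str(err))

    return gr.Interface(
        fn=fn,
        inputs=[gr.Text(label="Text")],
        outputs=[gr.Audio(label="Audio")],
        title="Text to Speech",
    )
```

With this in place, the interface would be created with build_demo(generate_audio) and launched the same way as before.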
Running the Application
To run the application, execute the Python script. This starts a local web server and prints its address in the terminal (by default http://127.0.0.1:7860). Open that address in your browser to reach the Gradio interface, where you can enter any text and click the Submit button to generate the audio.
python script_name.py
Replace script_name.py with the name of your Python script.
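If you want to change the port, open the page automatically, or create a temporary public link, demo.launch() accepts the keyword arguments server_name, server_port, share, and inbrowser. The sketch below maps command-line flags to those arguments. The function name parse_launch_args and the flag names are assumptions chosen for this example.

```python
import argparse


def parse_launch_args(argv=None):
    """Turn command-line flags into keyword arguments for demo.launch()."""
    parser = argparse.ArgumentParser(description="Run the text-to-speech demo.")
    parser.add_argument("--host", default="127.0.0.1", help="address to bind to")
    parser.add_argument("--port", type=int, default=7860, help="port to listen on")
    parser.add_argument("--share", action="store_true", help="create a public Gradio link")
    parser.add_argument("--open", action="store_true", help="open the page in a browser")
    args = parser.parse_args(argv)
    return {
        "server_name": args.host,
        "server_port": args.port,
        "share": args.share,
        "inbrowser": args.open,
    }


# In the script, demo.launch() would then become:
# demo.launch(**parse_launch_args())
```

For example, running python script_name.py --port 8000 --open would serve the interface on port 8000 and open it in your default browser.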
Conclusion
By following this tutorial, you have successfully created a text-to-speech application using Python. This application leverages Coqui TTS and Gradio to provide a simple and effective solution for generating speech from text. You can further enhance this application by integrating it into other projects or by customizing the interface to suit your needs.
Happy Coding…!