Audiobook Maker Demo | RCV

Introduction

In this article, we walk through the Audiobook Maker tab and its options for creating audiobooks with high-quality synthesized voices. The guide covers each step needed to produce an audiobook: selecting models, adjusting settings, entering text, and blending the voice with background music.

Getting Started with Audiobook Maker

When you open the Audiobook Maker tab, you see an interface with several dropdowns and sliders. Here is an outline of the key elements and how to use them:

Selecting the Voice Model

Model Tab: The first step is to select a trained voice model. For this demonstration, we choose "Azuro," a female voice model we previously trained in the Training tab.
TTS Speaker: Next, open the TTS speaker dropdown. It lists the available text-to-speech voices, offering options such as AETing End and Library options. Here, we select the IND speaker, a Hindi voice.
Pitch Extraction Method: We use RMPE, a robust and efficient pitch extraction method that accurately tracks the pitch of the input speech.
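RMPE itself is not reproduced here. As a rough illustration of what a pitch extractor produces, the following sketch estimates the pitch of a single audio frame with plain autocorrelation, a much simpler technique swapped in for illustration only. The function name and the 60 to 500 Hz search range are assumptions, not part of the Audiobook Maker.

```python
import math


def estimate_pitch_autocorr(frame, sample_rate, fmin=60.0, fmax=500.0):
    """Estimate the fundamental frequency (Hz) of one frame of audio samples.

    Plain autocorrelation, used here only as a simple stand-in for RMPE:
    find the lag (in samples) at which the frame best matches a delayed copy
    of itself, searching only lags that correspond to fmin..fmax Hz.
    Returns 0.0 when no positive correlation is found (e.g. silence).
    """
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_score = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized correlation between the frame and itself shifted by `lag`.
        score = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else 0.0


# Usage: a synthetic 220 Hz tone sampled at 16 kHz.
sr = 16000
tone = [math.sin(2 * math.pi * 220 * n / sr) for n in range(1024)]
print(round(estimate_pitch_autocorr(tone, sr)))  # close to 220
```

Autocorrelation is easily fooled by noise, music, and octave errors, which is why dedicated extractors such as the one selected in the demo are preferred for voice conversion.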

Adjusting Settings with Sliders

Index Rate: This slider controls how strongly the voice model's stored features are applied to the output. Setting it to 1 applies all of Azuro's features, producing the closest possible clone of her voice.
Pitch Slider: Since both the model and the TTS speaker are female voices, we leave the pitch slider at its default value of 0.
Speed Slider: This slider sets the speaking rate as a percentage. A setting of -10 produces slower, more deliberate speech.
Protect Slider: This slider affects the stability and clarity of the synthesized speech. We set it to a moderate, balanced level so that the generated speech sounds clear and stable.
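The demo does not document how these sliders are implemented. Under two assumptions (the index rate is a linear blend between features extracted from the input speech and features retrieved from the voice model's index, as in common retrieval-based voice conversion pipelines, and the speed slider is a percentage change in speaking rate), their effect can be sketched as follows. All function names are hypothetical.

```python
def blend_features(source_feats, index_feats, index_rate):
    """Blend per-frame feature vectors with a linear index-rate mix.

    Assumption: index_rate = 0 keeps the features extracted from the input
    speech, and index_rate = 1 uses only the features retrieved from the
    voice model's index (e.g. Azuro's stored voice features).
    """
    if not 0.0 <= index_rate <= 1.0:
        raise ValueError("index_rate must be between 0 and 1")
    return [
        [index_rate * r + (1.0 - index_rate) * s for s, r in zip(src, idx)]
        for src, idx in zip(source_feats, index_feats)
    ]


def speed_to_rate(speed_percent):
    """Convert a speed slider value in percent (e.g. -10) to a rate factor."""
    return 1.0 + speed_percent / 100.0


def scaled_duration(duration_s, speed_percent):
    """Approximate narration length after applying the speed setting."""
    return duration_s / speed_to_rate(speed_percent)


# Usage: full index rate, and a -10 % speed setting on a 60 s passage.
print(blend_features([[0.0, 2.0]], [[1.0, 4.0]], 1.0))  # [[1.0, 4.0]]
print(round(scaled_duration(60.0, -10), 1))  # 66.7
```

Under this assumed mapping, an index rate of 1 discards the input speech's own features entirely, and a -10 speed setting stretches a one-minute passage to roughly 67 seconds.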

Text Input

Enter the main text of the audiobook in the text box. In our example, the selected model was designed for Japanese, and we pair it with a Hindi TTS speaker. Once the text is entered, press the Convert button to synthesize the speech.

Listening to Synthesized Speech

After the conversion, you can listen to the synthesized speech and judge how well each voice performs. We first listen to the plain TTS voice and then compare it with the version converted to Azuro's voice.

Creating a Different Voice Output

To try a different combination, we select a female voice as the TTS speaker and a male voice model, such as "Obama." Because the TTS voice is female and the model is male, we set the pitch slider to -12 to lower the pitch, then press the Convert button again to synthesize the new output.
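Assuming the pitch slider counts semitones, as is common in voice conversion tools (the demo does not state the unit), a setting of -12 lowers the pitch by one octave, halving every frequency in the pitch contour. A minimal sketch with a hypothetical helper:

```python
def shift_pitch(f0_contour, semitones):
    """Transpose a pitch contour (Hz per frame) by a number of semitones.

    Assumes the pitch slider is measured in semitones: each semitone scales
    frequency by 2 ** (1 / 12), so -12 halves it (one octave down).
    Unvoiced frames, marked with f0 == 0, are left at 0.
    """
    factor = 2.0 ** (semitones / 12.0)
    return [f0 * factor if f0 > 0 else 0.0 for f0 in f0_contour]


# Usage: a female-range contour around 220 Hz lowered for a male model.
print(shift_pitch([220.0, 0.0, 240.0], -12))  # [110.0, 0.0, 120.0]
```

Leaving unvoiced frames at 0 keeps silences and consonants from being assigned a spurious pitch after the shift.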

Background Music Selection

The next step is to select a background music track. A volume slider controls the music level: higher values lower the music volume, and lower values raise it. We set the slider to around 9 so that the music complements the synthesized voice without overpowering it.
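The demo only says that higher slider values make the music quieter. One plausible mapping, assumed here and not confirmed by the tool, treats the slider value as attenuation in decibels and converts it to a linear amplitude gain. The function name is hypothetical.

```python
def music_gain(slider_value):
    """Map the music volume slider to a linear amplitude gain.

    Hypothetical mapping: the slider value is treated as attenuation in
    decibels, so higher values make the music quieter, matching the
    behaviour described in the demo.
    """
    if slider_value < 0:
        raise ValueError("slider_value must be non-negative")
    return 10.0 ** (-slider_value / 20.0)


# Usage: the demo's setting of about 9.
print(round(music_gain(9), 3))  # 0.355
```

With this mapping, a setting of 9 plays the music at roughly a third of its original amplitude.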

Combining Voices with Music

Once all settings are adjusted and the voice has been synthesized, press the Combine button to merge the cloned voice with the selected background music. The final output then appears in the interface for review.
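How the Combine button merges the two tracks is not documented. A simple additive mix is one reasonable approach; the following sketch is an assumption, uses a hypothetical function name, and takes the music gain as a plain number.

```python
def combine_voice_and_music(voice, music, music_gain):
    """Mix a narration track with background music.

    Both tracks are lists of float samples in [-1.0, 1.0] at the same sample
    rate. The music is looped (or truncated) to the voice's length, scaled
    by music_gain, added sample by sample, and hard-clipped to [-1.0, 1.0].
    """
    if not music:
        return list(voice)
    mixed = []
    for i, v in enumerate(voice):
        m = music[i % len(music)] * music_gain  # loop the music if it is shorter
        mixed.append(max(-1.0, min(1.0, v + m)))  # clip to the valid range
    return mixed


# Usage: short toy tracks; the second sample clips at 1.0.
print(combine_voice_and_music([0.5, 0.9, -0.2], [0.2, 0.4], 0.5))
```

Hard clipping is the simplest way to keep samples in range; a production mixer would more likely normalize the result or apply a limiter to avoid audible distortion.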

Result Evaluation

For demonstration purposes, the merge results in "Obama" narrating a short story in Hindi overlaid with background music, creating an engaging audiobook experience.

Keywords

Audiobook Maker
TTS (Text-to-Speech)
Voice Model
Synthesis
Background Music
AETing End
Azuro
Pitch Extraction
Index Rate
Clarity

FAQ

What is the Audiobook Maker?
- The Audiobook Maker is a tool that allows users to create audiobooks by synthesizing speech from text using various voice models and settings.
How do I select a voice model?
- To select a voice model, navigate to the Model tab and choose from the list of trained models.
What does the index rate slider do?
- The index rate slider adjusts the level of detail applied to the synthesized voice features, with a setting of 1 utilizing all features of the selected model.
Can I use background music in my audiobook?
- Yes, the Audiobook Maker allows users to select background music and adjust its volume to create a more engaging listening experience.
What is RMPE?
- RMPE is the pitch extraction method used in this demo. It estimates the pitch of the input speech accurately, which improves the quality of the synthesized voice.
