My AI Audiobook Maker - Demo and Installation
Introduction
Today, I am officially releasing my open-source audiobook maker as a package. In this article, I'll provide a demonstration of how it works, walk you through the installation process, and highlight some key features.
Demonstration
To start, I’ll clear the default background image of the application. Next, I will select a text file for the audiobook. Let's name the book "Myself," as I will be using my own voice for this project. After selecting the voice model, I click to start the generation process.
What happens next is quite fascinating. The sentences from the text file are sent over to Tortoise, the text-to-speech engine, which processes them and returns audio. For instance, one of the generated sentences reads:
"In my younger and more vulnerable years, my father gave me some advice that I’ve been turning over in my mind ever since..."
I can pause the audio, and if I decide to close the application, I can choose to load an existing audiobook I created. This will allow me to continue generating the audio from where I left off without losing any progress.
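One way to picture the resume behavior is a small progress file that records which sentence comes next. The sketch below is only an illustration under that assumption; the file layout, the next_index key, and the synthesize callback are hypothetical names, not the application's actual internals.

```python
import json
from pathlib import Path
from typing import Callable, Sequence


def load_progress(progress_file: Path) -> int:
    """Return the index of the next sentence to generate (0 when starting fresh)."""
    if not progress_file.exists():
        return 0
    with progress_file.open(encoding="utf-8") as f:
        return int(json.load(f).get("next_index", 0))


def save_progress(progress_file: Path, next_index: int) -> None:
    """Record how far generation has got so a later run can pick up from here."""
    progress_file.write_text(json.dumps({"next_index": next_index}), encoding="utf-8")


def generate_remaining(
    sentences: Sequence[str],
    progress_file: Path,
    synthesize: Callable[[int, str], None],
) -> int:
    """Generate audio only for sentences not yet done; return the starting index."""
    start = load_progress(progress_file)
    for index in range(start, len(sentences)):
        synthesize(index, sentences[index])  # hypothetical call into the TTS step
        save_progress(progress_file, index + 1)  # persist after every sentence
    return start
```

Saving after every sentence means that closing the application costs at most the sentence that was in progress at the time.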
The audio quality is remarkable because the pipeline combines Tortoise with RVC (Retrieval-based Voice Conversion) for voice modeling.
Installation Process
Now, let’s talk about how to install this audiobook maker:
Prerequisites
Before installation, ensure you have the following:
- A Windows PC with an NVIDIA graphics card.
- Tortoise installed on your computer. If you don't have it, check out my previous video for a comprehensive installation guide.
Next, visit my GitHub page for the audiobook maker to find the prerequisites, such as the 7-Zip extractor, which you will need to download and install.
- Download the 64-bit version of 7-Zip.
- For CUDA support, ensure you have CUDA version 11.8 installed from the official CUDA website.
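To confirm which CUDA toolkit is installed, you can read the release number that nvcc --version prints. The following is a minimal sketch, assuming nvcc is on your PATH after installing the toolkit; the function names are my own and are not part of the audiobook maker.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_cuda_release(nvcc_output: str) -> Optional[str]:
    """Extract the 'release X.Y' number from the text printed by `nvcc --version`."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def installed_cuda_release() -> Optional[str]:
    """Return the installed CUDA toolkit release, or None if nvcc is not found."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(
        ["nvcc", "--version"], capture_output=True, text=True, check=False
    )
    return parse_cuda_release(result.stdout)


if __name__ == "__main__":
    release = installed_cuda_release()
    if release is None:
        print("nvcc not found; the CUDA toolkit may not be installed or not on PATH.")
    elif release != "11.8":
        print(f"Found CUDA {release}; this guide expects 11.8.")
    else:
        print("CUDA 11.8 detected.")
```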
Installing the Package
On my GitHub releases page, download the version 1.0 package. This file will be relatively large, so make sure you have adequate disk space.
Once downloaded, right-click the file, navigate to the 7-Zip option, and extract it to a designated folder. It may take a few minutes depending on your CPU speed.
After extraction, look for start package.bat in the audiobook maker folder and double-click it. The audiobook maker application should launch alongside a command-line window that shows any errors for debugging.
Configuring Tortoise
Before generating any audiobooks, launch Tortoise and navigate to http://localhost:7860. Here, find the voice you plan to use and note its name so you can enter it in the tor.yaml file located in the audiobook maker folder.
Open tor.yaml with a text editor, enter the trained voice name, adjust the sample and iteration settings, and configure other options as desired. Save the changes.
Additionally, make sure you set the results folder as an absolute path where the audio files will be stored. Finally, configure the autoregressive model that you used when training Tortoise.
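A quick sanity check on these values can catch the most common mistakes before you start a long generation run. The sketch below works on a plain dictionary, as you would get from loading the YAML file with a parser. The key names voice, results_folder, and autoregressive_model are assumptions made for illustration; use whatever keys your tor.yaml actually contains.

```python
from pathlib import PureWindowsPath

# Hypothetical key names; check tor.yaml for the real ones.
REQUIRED_KEYS = ("voice", "results_folder", "autoregressive_model")


def find_config_problems(settings: dict) -> list[str]:
    """Return human-readable problems found in the loaded settings (empty if none)."""
    problems = []
    for key in REQUIRED_KEYS:
        if not str(settings.get(key, "")).strip():
            problems.append(f"'{key}' is missing or empty")

    results = str(settings.get("results_folder", "")).strip()
    # PureWindowsPath applies Windows rules (drive letter plus root) on any OS.
    if results and not PureWindowsPath(results).is_absolute():
        problems.append("'results_folder' must be an absolute path")
    return problems
```

For example, a results folder written as results would be flagged, while one written as C:\tts\results would pass.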
Creating Your Audiobook
To create an audiobook, you need a text file formatted appropriately. You can use a basic text editor like Notepad to compile your text, saving it in .txt format.
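Because the text is sent to Tortoise sentence by sentence, it helps to see how a plain text file might be broken up. This is a rough sketch of one simple approach using a regular expression; it is not necessarily how the audiobook maker splits text, and it will split incorrectly on abbreviations such as "Mr." or on sentences that end inside quotation marks.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Split text into sentences at '.', '!' or '?' followed by whitespace."""
    # Collapse line breaks and repeated spaces so hard-wrapped lines read as one stream.
    normalized = re.sub(r"\s+", " ", text).strip()
    if not normalized:
        return []
    # The lookbehind keeps the punctuation attached to the sentence it ends.
    parts = re.split(r"(?<=[.!?])\s+", normalized)
    return [part for part in parts if part]


if __name__ == "__main__":
    sample = "Hello there. How are you?\nFine!"
    for number, sentence in enumerate(split_sentences(sample), start=1):
        print(number, sentence)
```

Writing one complete sentence per punctuation mark, and avoiding stray line breaks in the middle of words, keeps this kind of splitting predictable.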
Once prepared, relaunch the audiobook maker, select your text file, and enter your audiobook metadata such as voice models and pitch settings.
Click the "Start Audiobook Generation" button to begin. You can play the audio in real time as it is generated, or wait until the whole book is finished.
The interface provides options for regenerating audio for specific sentences, editing existing sentences, as well as exporting the complete audiobook to a single file, which includes customizable pauses between sentences.
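The export step can be pictured as joining the per-sentence clips with a stretch of silence between them. Below is a minimal sketch using Python's standard wave module, assuming every clip is 16-bit PCM WAV with the same sample rate and channel count. The function name and the pause_seconds parameter are illustrative, not the application's own.

```python
import wave
from typing import Sequence


def export_with_pauses(
    clip_paths: Sequence[str], output_path: str, pause_seconds: float = 0.5
) -> None:
    """Concatenate WAV clips into one file with silence between consecutive clips."""
    if not clip_paths:
        raise ValueError("no clips to export")

    # Use the first clip's format as the format of the whole audiobook.
    with wave.open(str(clip_paths[0]), "rb") as first:
        channels = first.getnchannels()
        sample_width = first.getsampwidth()
        frame_rate = first.getframerate()

    # Zero bytes are silence for signed 16-bit PCM (an assumption about the clips).
    pause_frames = int(frame_rate * pause_seconds)
    silence = b"\x00" * (pause_frames * sample_width * channels)

    with wave.open(str(output_path), "wb") as out:
        out.setnchannels(channels)
        out.setsampwidth(sample_width)
        out.setframerate(frame_rate)
        for index, path in enumerate(clip_paths):
            with wave.open(str(path), "rb") as clip:
                clip_format = (
                    clip.getnchannels(), clip.getsampwidth(), clip.getframerate()
                )
                if clip_format != (channels, sample_width, frame_rate):
                    raise ValueError(f"{path} does not match the first clip's format")
                out.writeframes(clip.readframes(clip.getnframes()))
            if index < len(clip_paths) - 1:
                out.writeframes(silence)  # pause only between clips, not at the end
```

Changing pause_seconds is all it takes to make the narration feel tighter or more relaxed.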
Conclusion
In summary, the audiobook maker is a fantastic tool for creating audiobooks using AI. It's user-friendly, efficient, and leverages advanced voice synthesis technologies for high-quality output.
Keywords
- Audiobook Maker
- Open-Source
- Tortoise
- RVC
- Text-to-Speech
- Installation
- CUDA
- AI Voice Cloning
FAQ
1. What are the system requirements for the audiobook maker?
You need a Windows PC with an Nvidia graphics card and Tortoise installed.
2. Can I use voices other than my own?
Yes, you can select from various voice models available through RVC.
3. What if I encounter errors during installation?
Check the GitHub issues tab for troubleshooting or report a new issue for assistance.
4. Does the audiobook maker support file formats other than text files?
Currently, it only supports plain text files for input.
5. How are audio generation settings stored?
The settings are saved in JSON files that track voice model, pitch, and other parameters.