Interesting Research: "OpenVoice: Versatile Instant Voice Cloning"
An open-source project for cloning any voice and changing its style
Hello all! Welcome back to weekly Interesting Research. For today's research, I want to discuss the open-source voice cloning research by Qin et al. (2023).
As we know, AI has become a popular term that everyone is throwing around. I am sure most of my readers are already using ChatGPT and benefiting from it.
However, AI tools are not limited to text generation. There is much more, e.g., images, videos, and audio. Today’s topic is AI research on audio generation.
Audio generation AI tools already exist, but many are not publicly available or don’t have the quality we need. That’s why OpenVoice is an interesting open-source project.
So, what is OpenVoice, and how can we benefit from it? Let’s get into it.
OpenVoice
OpenVoice is open-source research by Qin et al. (2023) on voice cloning from a short audio clip. OpenVoice replicates the speaker's voice from that input and generates speech in multiple languages.
Additionally, OpenVoice offers flexible control over voice style, including emotion, accent, rhythm, pauses, and intonation. The flexibility also extends to its zero-shot capability to clone the voice into another language.
Computationally, it’s also efficient. I tried OpenVoice in Google Colab with a GPU, and it didn't use much memory compared to many other generative AI workloads.
In general, the OpenVoice framework is shown in the image below.
Let’s look at an OpenVoice example. Here is a sample voice from the OpenVoice example resources.
With OpenVoice, we clone the voice and use it to say the following text:
"Hi everyone. This is an audio example generated with OpenVoice and in whispering tone."
Sounds amazing, right? I also tried to produce my own example, cloning a Japanese-language voice to speak English. Here is the source voice.
Then I tried to make it sound cheerful while saying the text more slowly. The text I wanted it to say is "This audio is generated by open voice. I am excited to meet you all".
The voice cloning is not bad, but I think it could still be better. If I had a clearer voice source, the result would probably improve.
To experiment with the research, visit their repository and follow the procedure. It’s very easy and doesn't take much time.
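Before opening the repository, it can help to write down what a single cloning run needs. Here is a rough sketch of the settings from my own experiment above. Please note that `CloneRequest` and all of its field names are hypothetical placeholders I made up for illustration; they are not part of the OpenVoice API, and the file name and the speed value of 0.9 are assumptions as well.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CloneRequest:
    """Hypothetical container for one voice-cloning run (not an OpenVoice class)."""

    reference_clip: str         # path to the short audio clip of the voice to clone
    text: str                   # what the cloned voice should say
    language: str = "English"   # output language
    style: str = "default"      # e.g. "whispering" or "cheerful"
    speed: float = 1.0          # assumed convention: 1.0 is normal, below 1.0 is slower

    def __post_init__(self):
        # Reject settings that could not describe a meaningful run.
        if not self.text.strip():
            raise ValueError("text must not be empty")
        if self.speed <= 0:
            raise ValueError("speed must be positive")

    def describe(self) -> str:
        """Return a one-line, human-readable summary of the request."""
        if self.speed < 1.0:
            pace = "slower than"
        elif self.speed > 1.0:
            pace = "faster than"
        else:
            pace = "at"
        return (
            f"Say {self.text!r} in {self.language}, "
            f"{self.style} style, {pace} normal pace"
        )


# My Japanese-to-English experiment, written as a request.
request = CloneRequest(
    reference_clip="japanese_sample.mp3",  # placeholder file name
    text="This audio is generated by open voice. I am excited to meet you all",
    style="cheerful",
    speed=0.9,  # assumed value for "more slowly"
)
print(request.describe())
```

Writing the inputs down like this makes it clear that the source clip, the target text, the style, and the speed are separate knobs, which matches how I changed them one at a time in the examples above.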
Thank you, everyone, for subscribing to my newsletter. If there is something you want me to write about or discuss, please comment or message me directly through my social media!