Voice Converter

Avatar Styled Voice plugin with A.I. voice Style-Transfer

Spock.mp4

Custom avatars can include a reference.wav (voice sample) to create an instant voice impression without any need to adjust voice changer parameters, and it adds no latency to the recording!

Download the plugin from the link below.

The Styled Voice plugin is an open-source, state-of-the-art A.I. voice style-transfer converter that automatically gives avatars in APS a unique voice resembling any short sample .wav file. The plugin uses deep learning and Transformers models that can copy the style of one voice (the target) onto the player's recorded voice (the source). This makes multi-avatar conversations more expressive and entertaining, and because the tools are integrated with APS, the workflow is seamless. Voice styles stay with the avatar, so creators can voice many characters with zero hassle!

Generate accurate impressions of target voices using A.I. speech synthesis. Linking target voices directly to custom avatars saves time and confusion: you never have to mess with voice profile parameters or deal with external voice changer apps.

Inference Plugin Download

DOWNLOAD

Windows 10 and Windows 11
CUDA-enabled GPU recommended

The Inference Plugin.zip voice style-transfer plugin adds the ability to create different voices in APS, from familiar actors to stylized character voices, for voice auto-stylization and exports. By simply placing a target .wav in the avatar folder, you can automatically generate unique stylized voice recordings for each avatar.

Knight Voice
Example of a noble knight character using voice stylization.

Ogre Voice
Example of a grumpy ogre character using voice stylization.

Professor Voice
An archeologist in an ancient tomb opens a large stone door, using voice synthesis.

Creators can leverage a powerful, integrated voice style-transfer tool that uses a trained ML model from the HierSpeechpp implementation and runs inference locally on the GPU, instantly and at absolutely no cost!

Python and PyTorch are already bundled in the add-on, so installing the plugin is simple and requires no knowledge of Python: just extract the contents of the .zip to the "Mocap Fusion Plugins/" folder and that's it! Read the tutorial below.

In this tutorial I will show how to set up the voice stylization plugin, which makes it possible to create storyboard conversations using the voice styles of familiar actors, and even how to create a stylized version of any .wav for any purpose!

Automatic / Manual Voice Conversion - The voice Inference Plugin is typically automated by APS (synced with the mocap) to apply the voice style-transfer automatically just after recording, if Auto Stylize Voice is enabled. This uses the target voice from the custom avatar's folder. Targets change automatically when different avatars are loaded, so the user never needs to worry about constantly changing voice profiles!
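The automatic flow boils down to a small decision made after each recording. The sketch below is a hypothetical Python illustration, not the plugin's actual code: the helper names `resolve_target` and `target_after_recording` are made up, and it assumes the target lives at `voice_sample/reference.wav` inside the avatar folder, as described in the Voice Samples section of this tutorial.

```python
from pathlib import Path
from typing import Optional


def resolve_target(avatar_dir: Path) -> Optional[Path]:
    """Return the avatar's reference.wav if it exists, else None.

    Hypothetical helper; assumes the voice_sample/reference.wav layout.
    """
    candidate = avatar_dir / "voice_sample" / "reference.wav"
    return candidate if candidate.is_file() else None


def target_after_recording(avatar_dir: Path, auto_stylize: bool) -> Optional[Path]:
    """Decide which target voice (if any) to apply once a recording ends.

    Nothing happens unless Auto Stylize Voice is enabled, and the target
    always comes from the currently loaded avatar's folder.
    """
    if not auto_stylize:
        return None
    return resolve_target(avatar_dir)
```

Because the lookup is keyed on the loaded avatar's folder, loading a different avatar changes the target without any profile switching, and an avatar without a reference.wav simply keeps its unstyled recording.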

The UI also provides some additional flexibility. For example, you can change the target and then re-run voice conversion to modify the audio of the previous mocap recording, or change the source to any voice recording of your choice to quickly create a stylized version for any purpose. Simple!

Main panel - This is where you can configure the voice server address, enable the auto-stylizer, delete the stylized audio, and/or re-stylize the audio manually.

Auto Stylize Voice - When enabled, stylization runs automatically after recording.
Stylize Button - Manually converts the audio to the stylized voice.
Delete Button - Restores the original microphone recording.
Play Reference - Plays the reference.wav audio clip.
Log - Displays activity messages.

Look for the Play Reference button; this is important!

It's recommended to play the reference just before recording and try to adjust your own voice to sound close to the target voice, then let the A.I. do the rest!

Avatars With Unique Voices - You can give an avatar a unique stylized voice by creating a `voice_sample` folder in the custom avatar folder (or by clicking the Open Folder button) and placing a short sample .wav (5-10 seconds or so) in it. That voice style then always follows the avatar, so you can record many avatars and conversations while the auto-stylizer ensures the proper style is always applied to the corresponding avatar, making it fast, simple and lots of fun!

Open Folder - Click this button to open the avatar's voice_sample folder. This button will also create the folder if it doesn't exist.

Voice Samples - To give an avatar a custom voice, you must create a folder named "voice_sample" in the avatar folder:

The user must create this folder manually as the APS_SDK does not currently do this!
This folder is optional and deleting it will not affect the avatar.

%USERPROFILE%\appdata\LocalLow\Animation Prep Studios\LUXOR\VR_MocapAssets\
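If you prefer to script the folder setup, here is a minimal Python sketch that does roughly what the Open Folder button is described as doing (create the folder if it doesn't exist), minus opening Explorer. The function name `ensure_voice_sample_folder` is hypothetical, and since the exact layout of avatar folders beneath `VR_MocapAssets\` isn't spelled out here, the sketch takes the avatar folder as an argument instead of guessing it.

```python
import os
from pathlib import Path

# Root from the path above. %USERPROFILE% is the Windows home folder;
# falling back to Path.home() just keeps the sketch importable elsewhere.
VR_MOCAP_ASSETS = (
    Path(os.environ.get("USERPROFILE", str(Path.home())))
    / "AppData" / "LocalLow" / "Animation Prep Studios" / "LUXOR" / "VR_MocapAssets"
)


def ensure_voice_sample_folder(avatar_dir: Path) -> Path:
    """Create <avatar_dir>/voice_sample if it is missing and return its path."""
    if not avatar_dir.is_dir():
        # Don't invent avatar folders: the avatar itself must already exist.
        raise FileNotFoundError(f"Avatar folder not found: {avatar_dir}")
    folder = avatar_dir / "voice_sample"
    folder.mkdir(exist_ok=True)  # no-op if the folder already exists
    return folder
```

For example, `ensure_voice_sample_folder(VR_MOCAP_ASSETS / "MyAvatar")`, where `MyAvatar` is a placeholder for your avatar's actual folder. Running it twice is harmless, and because the folder is optional, creating it does not affect the avatar.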

reference.wav - This file should be only a few seconds in duration and contain spoken samples of the target voice.

Try to avoid including background music.
Noise is fine (e.g. lo-fi crackling) and can add to the style!
High-quality recordings are best for voice impressions.
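To sanity-check a clip before dropping it in, you can measure its length with Python's standard `wave` module (duration = frames ÷ sample rate). This is a helper of my own, not part of the plugin: the 5-10 second window comes from the guidance above and is only a soft recommendation, and `wave` only reads uncompressed PCM .wav files, so other encodings will raise an error.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Length of a PCM .wav file in seconds (frame count / sample rate)."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def check_reference(path: str, min_s: float = 5.0, max_s: float = 10.0) -> str:
    """Return a short verdict on whether a clip suits a reference.wav."""
    d = wav_duration_seconds(path)
    if d < min_s:
        return f"too short ({d:.1f}s)"
    if d > max_s:
        return f"longer than needed ({d:.1f}s)"
    return f"ok ({d:.1f}s)"
```

Call `check_reference(r"...\voice_sample\reference.wav")` with your own path. Length is only half the story, though: a clean spoken sample without background music matters more than hitting an exact duration.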

Voice style-transfer in multi-avatar scenes

Lower Console - A redundant console on the lower menu bar also shows log messages while the audio is processing. You can click the log area to close the view, but it's recommended to wait: the view closes automatically after the stylized audio file has been received and applied.

Exported Files - When exporting motion capture, you should see both the stylized audio and the original audio file in the export folder.

Multi-avatar scene exports include stylized audio for all avatars.
The original un-stylized audio files are included as "_original.wav".
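If you post-process exports with a script, it helps to pair each stylized file with its original. This Python sketch assumes the original for `Knight.wav` is named `Knight_original.wav`; the page only says originals are included as "_original.wav", so verify the exact naming against your own export folder. The function names are hypothetical.

```python
from pathlib import Path
from typing import Dict, Iterable, Optional

ORIGINAL_SUFFIX = "_original.wav"  # assumed naming convention


def pair_exports(filenames: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each stylized .wav to its "_original.wav" counterpart, or None."""
    names = set(filenames)
    pairs: Dict[str, Optional[str]] = {}
    for name in sorted(names):
        lower = name.lower()
        # Skip non-audio files and the originals themselves.
        if not lower.endswith(".wav") or lower.endswith(ORIGINAL_SUFFIX):
            continue
        original = name[: -len(".wav")] + ORIGINAL_SUFFIX
        pairs[name] = original if original in names else None
    return pairs


def pair_export_folder(folder: Path) -> Dict[str, Optional[str]]:
    """Convenience wrapper that pairs the files found in an export folder."""
    return pair_exports(p.name for p in folder.iterdir() if p.is_file())
```

A `None` value flags a stylized clip whose original wasn't found, which is a quick way to spot an incomplete export.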

Check out some examples

Full-body tracking was not used when creating these benchmarks; they were created only to demonstrate and compare the voice transfer. These avatars didn't even include lipsync, so the animation quality is very poor, but the speech is what you should focus on!

Un-stylized Audio

This animation used the "_original.wav" exported files.

These are just the un-stylized voice recordings.

Stylized Audio

I rendered this animation using the stylized exported audio.

Results of voice style transfer.

Installation Instructions

Now look for the Voice and style-transfer options panel in APS; the Voice panel is located under the Settings tab. If a "Plug-In Not Found" message is displayed, you need to add the Inference Plugin to the Plugins folder, which simply involves downloading a .zip and extracting its contents into the Plugins folder:

Installing The Plugin

APS includes a Plugins folder where plug-ins can be placed to add new tools and functionality to APS. If you see the "Plug-In Not Found" message, you can easily add the plugin by downloading the APS plug-in and adding it to the Plugins folder.

⚠️The "Plug-In Not Found" message is displayed when the plugin is not found in the Plugins folder.

Inference Plugin Download

DOWNLOAD

Windows 10 and Windows 11
CUDA-enabled GPU recommended

Plugins Folder - This folder is located in Steam's game data folder. Make sure to select the correct folder and not the actual game folder!

By default the plugin folder is located in Steam's games folder:
C:\Program Files (x86)\Steam\steamapps\common\Mocap Fusion Plugins\

Extract Zip - Unzip the contents of the "Inference Plugin.zip" to the Plugins folder.

The plugin is built using the PyTorch library, and includes a bundled version of Python with a fully functional web view UI.

After downloading, extract the contents to the Mocap Fusion Plugins folder.

After extracting the Inference folder from Inference Plugin.zip to the Plugins folder, nothing more needs to be done. Inside the Inference folder you should see several files, including infrence_aps.exe, which can be used as a standalone style-transfer voice converter when APS is not running. More importantly, it runs as the background app that synchronizes the audio processing after mocap recordings!

This folder also contains the Transformers model cache, a bundled Python environment and minimal PyTorch libraries.
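Extracting the archive by hand is all that's needed, but if you want to script it (or confirm that it unpacked where APS expects), here is a Python sketch using the standard `zipfile` module. It assumes the default Steam location shown above and that the archive's top-level entry is the `Inference` folder, as described; `install_plugin` is a hypothetical helper, not something shipped with the plugin.

```python
import zipfile
from pathlib import Path

# Default location from the instructions above; adjust if Steam lives elsewhere.
PLUGINS_DIR = Path(r"C:\Program Files (x86)\Steam\steamapps\common\Mocap Fusion Plugins")


def install_plugin(zip_path: Path, plugins_dir: Path = PLUGINS_DIR) -> Path:
    """Extract the plugin archive into plugins_dir and return the Inference folder."""
    if not plugins_dir.is_dir():
        # Guard against extracting into the wrong place (e.g. the game folder).
        raise FileNotFoundError(f"Plugins folder not found: {plugins_dir}")
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(plugins_dir)
    inference = plugins_dir / "Inference"
    if not inference.is_dir():
        raise RuntimeError("No 'Inference' folder after extraction; check the archive layout")
    return inference
```

Usage would be `install_plugin(Path(r"...\Inference Plugin.zip"))` with your own download path. The explicit folder check mirrors the warning above about picking the Plugins folder rather than the game folder.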

Plugin Installation Complete! Now you can launch APS from Steam (but don't put on the VR headset yet). Once the game finishes loading, you should see the new Inference Plugin UI appear on the desktop. Note that the very first time the plugin starts, it downloads a model to the cache folder, so it may take up to 30 seconds after the game launches before the UI appears.
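If you script your launches and want to wait out that first-run model download, one simple approach is to poll until something is listening. This is a hedged Python sketch: it assumes the voice server configured in the Voice panel accepts plain TCP connections, and the host and port you pass in should be whatever that panel shows (none are assumed here). It only checks that the port is open, not that the model has finished loading; `wait_for_server` is a hypothetical name.

```python
import socket
import time


def wait_for_server(host: str, port: int, timeout_s: float = 30.0, interval_s: float = 0.5) -> bool:
    """Poll until a TCP connection to host:port succeeds or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            # Refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)
```

The 30-second default matches the worst case mentioned above; call it as `wait_for_server(host, port)` with the address from the Voice panel.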

⚠️The Desktop Window opens after launching APS

Desktop Window - A very simple UI that shows status and audio file path information.

The front-end is just for convenience, since the game automates the UI during mocap without the need for user input.

However, the front-end does offer some functionality:

Selecting audio clips from the UI lets you run the loaded style-transfer on any custom clip on the PC.
Selecting a different target voice allows on-the-fly editing of the previously recorded mocap audio.

Click on the speaker buttons to play the samples for review.

You should now be able to record motion capture (and audio). Once capture is complete, try clicking the Stylize button in the Voice panel; you should see the progress begin to increase and status messages printed after each processing step.

Example of "Auto Stylize Voice"

📝Noteworthy: The (optional) standalone capability of the Desktop Window can be useful for generating clips from almost any .wav file independently of APS. Since the interface can be launched and controlled manually, it provides a quick way to create voice style transfers without needing to set up Python or anything else, and makes a handy utility for generating high-quality stylized audio clips from your own target and source files.

If everything is working, you can enable Auto Stylize Voice, which stylizes the voice automatically after each recording.

Note that the Auto Stylize Voice setting is saved and restored when the game is launched.

That's it, now all that's needed is to add cool voice samples to your custom avatars and have fun!

Thanks for reading ❤️
