Voice Converter

Avatar Styled Voice plugin with A.I. voice Style-Transfer


Custom avatars can include a reference.wav (voice sample) to create an instant voice impression, with no need to adjust voice changer parameters and no added latency during recording!

Download the plugin from the link below.

The Styled Voice plugin is an open-source, state-of-the-art A.I. voice style-transfer converter that gives avatars in APS a unique voice resembling any short sample .wav file. The plugin uses deep learning and Transformer models that are capable of deep-copying a style from one voice (the target) onto the player's recorded voice (the source), which makes multi-avatar conversations more expressive and entertaining. The tools are integrated with APS for a seamless workflow, and voice styles stay with the avatar! This gives creators the ability to voice many characters with zero hassle.

Generate accurate impressions of target voices using AI speech synthesis. Because target voices are linked directly to custom avatars, you save time and avoid confusion by never having to adjust voice profile parameters or deal with external voice changer apps.

Creators can leverage a powerful, integrated voice style-transfer tool that uses a trained ML model from the HierSpeechpp implementation and runs inference locally on the GPU, at absolutely no cost!

Python and PyTorch are bundled! This makes installing the plugin simple and requires no knowledge of Python. Simply extract the contents of the .zip to the "Mocap Fusion Plugins/" folder and that's it!

In this tutorial I will show how you can set up the voice stylization plugin, which makes it possible to create storyboard conversations using the voice styles of familiar actors, and even how to create a stylized version of any .wav for any purpose!

Automatic / Manual Voice Conversion - The voice Inference Plugin is typically automated by APS (synced with the mocap) so that the voice style-transfer is applied just after recording if Auto Stylize Voice is enabled. This uses the target voice from the custom avatar's folder. Targets change automatically when different avatars are loaded, so you never need to worry about constantly changing voice profiles!

The UI also provides some additional flexibility. For example, you can change the target and then re-run voice conversion to modify the audio of the previous mocap recording. You can also change the source to any voice recording of your choice to quickly create a stylized version for any purpose. Simple!

Main panel - This is where you can configure the voice server address and enable the auto-stylizer. You can also delete the stylized audio and/or re-stylize the audio manually.

Look for the Play Reference button. This is very important!!

It's recommended to play the reference just before recording and try to adjust your own voice to sound as close to the target voice as possible, then let the AI do the rest!

Avatars With Unique Voices - You can give avatars a unique stylized voice by creating a `voice_sample` folder in the custom avatar folder and placing a short sample .wav (5-10 seconds or so) into it. That voice style then always follows that avatar, so you can record many avatars and conversations while the auto-stylizer ensures the proper style is applied to the corresponding avatar. This makes it fast, simple and lots of fun!

Voice Samples - To give an avatar a custom voice you must create a folder named "voice_sample" in the avatar folder:

%USERPROFILE%\appdata\LocalLow\Animation Prep Studios\LUXOR\VR_MocapAssets\ 

reference.wav - This file should be only a few seconds in duration and contain spoken samples of the target voice.
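As a quick sanity check before recording, the layout described above can be verified with a few lines of Python. This is only a sketch, not part of the plugin: the function names are made up for illustration, the 5-10 second window is just the loose guideline from this guide, and the avatar folder is passed in by you.

```python
import wave
from pathlib import Path


def wav_duration_seconds(wav_path):
    """Return the length of a PCM .wav file in seconds."""
    # The standard-library wave module reads uncompressed PCM only;
    # other encodings raise wave.Error.
    with wave.open(str(wav_path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_voice_sample(avatar_dir, min_seconds=5.0, max_seconds=10.0):
    """Describe the state of <avatar_dir>/voice_sample/reference.wav.

    The default limits are an assumption based on the "5-10 seconds or so"
    guideline, not a hard requirement of the plugin.
    """
    reference = Path(avatar_dir) / "voice_sample" / "reference.wav"
    if not reference.is_file():
        return "missing"
    duration = wav_duration_seconds(reference)
    if duration < min_seconds:
        return "too short"
    if duration > max_seconds:
        return "too long"
    return "ok"
```

Calling `check_voice_sample(...)` with the path to one of your custom avatar folders returns "missing", "too short", "too long" or "ok".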


Voice style-transfer in multi-avatar scenes

Lower Console - There is a redundant console on the lower menu bar that shows log messages while the audio is processing. You can click the log area to close the view, but it's recommended to wait: the view closes automatically after the stylized audio file has been received and applied.

Exported Files - When exporting motion capture you should see both the stylized audio and the original audio file in the export folder.
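If you want to confirm that both versions were exported, a small script can sort the .wav files in an export folder. This is a rough sketch: it relies only on the "_original.wav" suffix mentioned in the examples further down, treats every other .wav as a candidate stylized file (an assumption, since the exact stylized file name is not stated here), and `split_exported_audio` is a made-up helper name.

```python
from pathlib import Path


def split_exported_audio(export_dir):
    """Return (originals, others): names of .wav files in export_dir.

    Files ending in "_original.wav" are the un-stylized recordings.
    Everything else is assumed (not confirmed) to be stylized audio.
    """
    originals, others = [], []
    for wav in sorted(Path(export_dir).glob("*.wav")):
        if wav.name.endswith("_original.wav"):
            originals.append(wav.name)
        else:
            others.append(wav.name)
    return originals, others
```

An empty second list after an export would suggest the stylization step did not run for that recording.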

Check out some examples

Full body tracking was not used when creating these benchmarks; they were created only to demonstrate and compare the audio voice transfer. These avatars didn't even include lipsync, so the animation quality is very poor, but the speech is what you should focus on!!


Un-stylized Audio

This animation used the "_original.wav" exported files.

This is just the un-stylized voice recordings.


Stylized Audio

I rendered this animation using the stylized exported audio.

Results of voice style transfer.

Installation Instructions

Now look for the Voice and style-transfer options panel in APS. The Voice panel is located under the settings tab. If you see a "Plug-In Not Found" message displayed, then you need to add the Inference Plugin to the Plugins folder, which involves simply downloading a .zip and extracting its contents to the Plugins folder:

Installing The Plugin

APS includes a Plugins folder where plug-ins can be placed to add new tools and functionality to APS. If you are seeing the "Plug-In Not Found" message, you can easily add plugins by downloading the APS plug-in and adding it to the Plugins folder.

⚠️The "Plug-In Not Found" message is displayed when the plugin is not found in the Plugins folder.

Inference Plugin Download

The Inference Plugin.zip voice style-transfer and AI voice changer adds the ability to create different voices from familiar actors or styled character voices. It is fully supported in APS for voice auto-stylization and exports: simply placing a target .wav in the avatar folder allows unique stylized voice recordings to be generated automatically, for any number of avatars.

Plugins Folder - This folder is located in Steam's game data folder. Make sure to select the correct folder and not the actual game folder! 

By default the plugin folder is located in Steam's games folder:
C:\Program Files (x86)\Steam\steamapps\common\Mocap Fusion Plugins\

Extract Zip - Unzip the contents of the "Inference Plugin.zip" to the Plugins folder.

The plugin is built using the PyTorch library, and includes a bundled version of Python with a fully functional web view UI.

After downloading, extract the contents to the Mocap Fusion Plugins folder:

After extracting the Inference folder from Inference Plugin.zip to the Plugins folder, nothing more needs to be done. If you navigate into the Inference folder you should see files including infrence_aps.exe, which can be used as a standalone style-transfer voice converter when APS is not running. More importantly, it is used as the background app that synchronizes the audio processing after mocap recordings!
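Extracting by hand is all that is needed, but for anyone who prefers to script the step, it could look roughly like this. It is a sketch under a few assumptions: the default Plugins path and the executable name are copied as written in this guide, the helper names are invented for illustration, and writing under Program Files may require elevated permissions on your system.

```python
import zipfile
from pathlib import Path

# Default location quoted in this guide; adjust if Steam lives elsewhere.
DEFAULT_PLUGINS_DIR = Path(
    r"C:\Program Files (x86)\Steam\steamapps\common\Mocap Fusion Plugins"
)


def is_plugin_installed(plugins_dir=DEFAULT_PLUGINS_DIR):
    """True if the Inference folder with its executable is present."""
    # File name copied exactly as it appears in this guide.
    return (Path(plugins_dir) / "Inference" / "infrence_aps.exe").is_file()


def install_plugin(zip_path, plugins_dir=DEFAULT_PLUGINS_DIR):
    """Extract the plugin archive into an existing Plugins folder."""
    plugins_dir = Path(plugins_dir)
    # Refuse to create the folder: a missing folder usually means the
    # wrong path was selected, and extracting there would hide that.
    if not plugins_dir.is_dir():
        raise FileNotFoundError(f"Plugins folder not found: {plugins_dir}")
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(plugins_dir)
    return is_plugin_installed(plugins_dir)
```

If `install_plugin(...)` returns False, the archive was extracted but the Inference folder did not end up directly inside the Plugins folder, which is the most likely cause of a lingering "Plug-In Not Found" message.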

Plugin Installation Complete! Now you can launch APS from Steam (but don't put on the VR headset yet). Once the game finishes loading you should see the new Inference Plugin UI appear on the desktop. Note that the very first time the plugin starts it downloads a model to the cache folder, so it may take up to 30 seconds after the game launches before the UI appears.

⚠️The Desktop Window opens after launching APS

Desktop Window - A very simple UI has been developed to include status and audio files path information.

The front-end is just for convenience, since the game automates the UI during mocap without the need for user input.

However the front-end offers some functionality:

You should now be able to record motion capture (and audio). Once capture is completed, try clicking the Stylize button in the Voice panel; you should see the progress begin increasing and status messages printed after each processing step.

📝Noteworthy: The (optional) standalone capability of the Desktop Window can be useful for generating clips from almost any .wav file independently of APS. Since the interface can be launched and controlled manually by the user, it provides a quick way to create voice style transfers without needing to set up Python or anything! It also makes a handy utility for generating high-quality stylized audio clips using your own target and source files.

If everything is working, you can enable Auto Stylize Voice, which will stylize the voice automatically after recordings.

Note that the Auto Stylize Voice setting is saved and restored when the game is launched.

That's it, now all that's needed is to add cool voice samples to your custom avatars and have fun!

Thanks for reading ❤️