VoxCPM2: voice cloning and TTS in 30 languages, free and open source
A chat for vibe coders: news, guides, real-world case studies, a marketplace, and a place to find contractors.
If you need a voice for a project (voiceover, a voice bot, or a clone of your own voice for automation), VoxCPM2 from China's OpenBMB lab is currently one of the strongest open options. It has 22.9k stars on GitHub, is licensed under Apache-2.0, and installs with a single command.
Repository: github.com/OpenBMB/VoxCPM
What it is
VoxCPM2 is a TTS model with 2 billion parameters, trained on more than 2 million hours of speech. Its architecture is tokenizer-free: instead of converting speech into discrete audio tokens, the model generates continuous speech representations directly, using a diffusion autoregressive approach. In practice, this gives more natural intonation and preserves voice details better during cloning.
It is built on the MiniCPM-4 language model and outputs studio-quality 48 kHz audio.
Four modes of use
* Voice Design: create a voice from a text description, without any reference audio. Describe the voice's character in parentheses at the start of the text, and the model generates a matching voice:
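wav = model.generate(
    # Voice description in parentheses, then the text to speak:
    # "(young man, calm and confident voice) Hello, I'm your assistant."
    text="(молодой мужчина, спокойный и уверенный голос)Привет, я ваш ассистент.",
    cfg_value=2.0,
    inference_timesteps=10,
)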
* Controllable Cloning: clone a voice from a short audio clip while controlling the style (tempo, emotion, expressiveness). The timbre is preserved, while the delivery stays flexible:
wav = model.generate(
    # "(a little faster, upbeat tone) Good afternoon!"
    text="(чуть быстрее, бодрый тон)Добрый день!",
    reference_wav_path="speaker.wav",  # short clip of the voice to clone
)
* Ultimate Cloning: maximum cloning fidelity. Pass the reference audio together with its transcript, and the model continues speaking as if extending the original recording, preserving every detail: rhythm, timbre, and emotion.
* Basic TTS: plain text-to-speech synthesis with no reference, in any of the 30 supported languages.
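Both Voice Design and Controllable Cloning put the style instruction in parentheses at the start of the `text` argument, while Basic TTS passes the text alone. A small helper can keep that convention in one place. This is a minimal sketch: `styled_text` is a hypothetical helper, not part of the `voxcpm` package; it only builds the `(description)text` string shown in the examples and does not call the model. The descriptions used below are English translations of the ones in the examples.

```python
from typing import Optional


def styled_text(text: str, style: Optional[str] = None) -> str:
    """Prepend a parenthesized voice/style description to the text to speak.

    Hypothetical helper (not part of voxcpm): it only builds the
    "(description)text" string used in the examples above.
    """
    text = text.strip()
    if not style or not style.strip():
        # No description (Basic TTS): pass the text through unchanged.
        return text
    if "(" in style or ")" in style:
        # Nested parentheses would make the end of the description ambiguous.
        raise ValueError("style description must not contain parentheses")
    return f"({style.strip()}){text}"


if __name__ == "__main__":
    print(styled_text("Hello, I'm your assistant.", "young man, calm and confident voice"))
    print(styled_text("Good afternoon!", "a little faster, upbeat tone"))
```

The returned string would then be passed as `text=` to `model.generate`, for example `model.generate(text=styled_text("Good afternoon!", "a little faster, upbeat tone"), reference_wav_path="speaker.wav")`.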
Supported languages
30 languages: Arabic, Burmese, Vietnamese, Greek, Danish, Hebrew, Indonesian, Spanish, Italian, Chinese (including 9 dialects, such as Sichuanese, Cantonese, and Shanghainese), Korean, Malay, Dutch, German, Norwegian, Polish, Portuguese, Russian, Swahili, Tagalog, Thai, Turkish, Finnish, French, Hindi, Swedish, Japanese, English, Khmer, and Lao.
There is no need to specify a language tag: the model detects the language automatically.
Installation and quick start
pip install voxcpm
Requirements: Python 3.10–3.12, PyTorch ≥ 2.5.0, CUDA ≥ 12.0. VRAM: ~8 GB for VoxCPM2.
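Before installing, you can check an environment against these bounds. The sketch below only compares version strings copied from the requirements line; `check_requirements`, `at_least` and `parse_version` are hypothetical helpers, and VRAM is not checked. In a real environment you would feed it `platform.python_version()`, `torch.__version__` and `torch.version.cuda` (the last one is `None` on CPU-only PyTorch builds).

```python
import re
from typing import List, Optional, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Extract the leading numeric part: '2.5.1+cu121' -> (2, 5, 1), '12.1' -> (12, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def at_least(version: str, minimum: Tuple[int, ...]) -> bool:
    """Compare versions after zero-padding, so '2.5' counts as 2.5.0."""
    parts = parse_version(version)
    width = max(len(parts), len(minimum))
    padded = parts + (0,) * (width - len(parts))
    floor = minimum + (0,) * (width - len(minimum))
    return padded >= floor


def check_requirements(python_version: str,
                       torch_version: Optional[str],
                       cuda_version: Optional[str]) -> List[str]:
    """Return human-readable problems; an empty list means the stated bounds are met."""
    problems = []
    major_minor = parse_version(python_version)[:2]
    if not ((3, 10) <= major_minor <= (3, 12)):
        problems.append(f"Python {python_version} is outside 3.10-3.12")
    if torch_version is None or not at_least(torch_version, (2, 5, 0)):
        problems.append(f"PyTorch {torch_version} is missing or older than 2.5.0")
    if cuda_version is None or not at_least(cuda_version, (12, 0)):
        problems.append(f"CUDA {cuda_version} is missing or older than 12.0")
    return problems
```

Zero-padding matters because plain tuple comparison treats `(2, 5)` as smaller than `(2, 5, 0)`, which would wrongly reject PyTorch "2.5".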
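from voxcpm import VoxCPM
import soundfile as sf
# Load the model from Hugging Face (downloaded on first run).
model = VoxCPM.from_pretrained(
    "openbmb/VoxCPM2",
    load_denoiser=False,  # skip loading the optional denoiser
)
wav = model.generate(
    # "Hi! This is VoxCPM2, speech synthesis in Russian."
    text="Привет! Это VoxCPM2 — синтез речи на русском языке.",
    cfg_value=2.0,           # guidance: higher follows the prompt more closely
    inference_timesteps=10,  # diffusion steps: more for quality, fewer for speed
)
sf.write("output.wav", wav, model.tts_model.sample_rate)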
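If you would rather not depend on `soundfile`, a 16-bit PCM WAV can also be written with the standard library. This is a sketch under stated assumptions: `write_pcm16_wav` is a hypothetical helper, and it assumes `wav` is a one-dimensional (mono) sequence of floats in the range [-1.0, 1.0]; values outside that range are clipped.

```python
import array
import sys
import wave
from typing import Iterable


def write_pcm16_wav(path: str, samples: Iterable[float], sample_rate: int) -> int:
    """Write mono float samples in [-1.0, 1.0] as a 16-bit PCM WAV file.

    Out-of-range values are clipped. Returns the number of frames written.
    """
    # Clip each sample, then scale to the signed 16-bit range.
    pcm = array.array("h", (round(max(-1.0, min(1.0, float(s))) * 32767) for s in samples))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is little-endian
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)  # mono
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wf.setframerate(sample_rate)
        wf.writeframes(pcm.tobytes())
    return len(pcm)
```

Usage would look like `write_pcm16_wav("output.wav", wav, model.tts_model.sample_rate)`.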
To run the web interface locally:
python app.py --port 8808
# then open http://localhost:8808 in a browser
Performance and production
On an NVIDIA RTX 4090, the RTF (real-time factor) is about 0.3, meaning one second of speech takes about 0.3 seconds to generate. With Nano-vLLM, this drops to about 0.13, which makes real-time streaming practical.
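RTF is generation time divided by the duration of the audio produced, so any value below 1.0 means speech is generated faster than it plays back, which is the basic condition for streaming. The arithmetic below reuses the figures quoted above (they are the article's numbers, not new measurements); the function names are illustrative.

```python
def real_time_factor(generation_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent generating / duration of the generated audio."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return generation_seconds / audio_seconds


def generation_time(audio_seconds: float, rtf: float) -> float:
    """Expected wall-clock time to synthesize `audio_seconds` of speech at a given RTF."""
    return audio_seconds * rtf


if __name__ == "__main__":
    # A 60-second clip at the RTFs quoted in the text.
    for label, rtf in [("RTX 4090", 0.3), ("RTX 4090 + Nano-vLLM", 0.13)]:
        print(f"{label}: ~{generation_time(60, rtf):.1f} s for 60 s of speech")
```

For a one-minute clip that works out to about 18 s at RTF 0.3 and about 7.8 s at RTF 0.13.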
For production deployment, vLLM-Omni is supported and exposes an OpenAI-compatible /v1/audio/speech API, so any service that already talks to an OpenAI-style speech endpoint can use VoxCPM2, for example in place of ElevenLabs:
vllm serve openbmb/VoxCPM2 --omni --port 8000
curl http://localhost:8000/v1/audio/speech \
-H "Content-Type: application/json" \
-d '{"model":"openbmb/VoxCPM2","input":"Привет из VoxCPM2!","voice":"default"}' \
--output out.wav
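The same request can be sent from Python using only the standard library. This sketch assumes the server started by the `vllm serve` command above is listening on localhost:8000 and returns the audio bytes in the response body (the curl example writes the response straight to out.wav); the JSON fields mirror the curl call, and `build_speech_request` and `synthesize` are hypothetical helper names.

```python
import json
import urllib.request


def build_speech_request(text: str,
                         base_url: str = "http://localhost:8000",
                         model: str = "openbmb/VoxCPM2",
                         voice: str = "default") -> urllib.request.Request:
    """Build a POST to the OpenAI-compatible /v1/audio/speech endpoint."""
    payload = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text: str, out_path: str, **kwargs) -> int:
    """Send the request, save the returned audio bytes, and return how many were written."""
    request = build_speech_request(text, **kwargs)
    # Generous timeout, since long texts can take a while to synthesize.
    with urllib.request.urlopen(request, timeout=120) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return len(audio)
```

With the server running, `synthesize("Hello from VoxCPM2!", "out.wav")` should produce the same file as the curl command. Splitting request construction from sending keeps the payload easy to inspect without a live server.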
Fine-tuning
The model supports both LoRA and full fine-tuning; 5–10 minutes of audio is enough to adapt it to a specific voice or domain. A ready-made WebUI is provided for this:
python lora_ft_webui.py # http://localhost:7860
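To check whether a folder of recordings reaches the 5–10 minutes mentioned above, you can sum the clip durations. This sketch uses the standard-library `wave` module, so it assumes uncompressed PCM .wav files sitting directly in one folder; the folder layout and the `total_audio_minutes` name are hypothetical, and the data format the fine-tuning WebUI actually expects should be taken from the project documentation.

```python
import wave
from pathlib import Path


def wav_duration_seconds(path: Path) -> float:
    """Duration of a PCM WAV file: number of frames / frames per second."""
    with wave.open(str(path), "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def total_audio_minutes(folder: str) -> float:
    """Sum the durations of all .wav files directly inside `folder`, in minutes."""
    seconds = sum(wav_duration_seconds(p) for p in sorted(Path(folder).glob("*.wav")))
    return seconds / 60
```

For example, `total_audio_minutes("my_voice_clips") >= 5` tells you whether the folder meets the lower end of that guideline.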
Which projects it suits
VoxCPM2 is a good fit if you need to create a voice for a Telegram bot or voice assistant, add voiceover to an application without paying for the ElevenLabs API, clone your own voice to automate content production, or embed TTS in production behind an OpenAI-compatible API.
The Apache-2.0 license allows commercial use, with no restrictions on monetization.
Repository: github.com/OpenBMB/VoxCPM (22.9k stars)
Demo: huggingface.co/spaces/OpenBMB/VoxCPM-Demo
Documentation: voxcpm.readthedocs.io