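AI speech synthesis is advancing quickly and is gradually changing everyday life. This post introduces one capable speech synthesis tool: CosyVoice.
It is strongly recommended to download the pretrained CosyVoice-300M, CosyVoice-300M-SFT, and CosyVoice-300M-Instruct models, along with the CosyVoice-ttsfrd resource. You can fetch them either with the ModelScope Python SDK or with git.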
- # Option 1: download the models with the ModelScope Python SDK
- from modelscope import snapshot_download
- snapshot_download('iic/CosyVoice-300M', local_dir='pretrained_models/CosyVoice-300M')
- snapshot_download('iic/CosyVoice-300M-SFT', local_dir='pretrained_models/CosyVoice-300M-SFT')
- snapshot_download('iic/CosyVoice-300M-Instruct', local_dir='pretrained_models/CosyVoice-300M-Instruct')
- snapshot_download('iic/CosyVoice-ttsfrd', local_dir='pretrained_models/CosyVoice-ttsfrd')
-
- # Option 2: download the models with git (make sure git lfs is installed)
- mkdir -p pretrained_models
- git clone https://www.modelscope.cn/iic/CosyVoice-300M.git pretrained_models/CosyVoice-300M
- git clone https://www.modelscope.cn/iic/CosyVoice-300M-SFT.git pretrained_models/CosyVoice-300M-SFT
- git clone https://www.modelscope.cn/iic/CosyVoice-300M-Instruct.git pretrained_models/CosyVoice-300M-Instruct
- git clone https://www.modelscope.cn/iic/CosyVoice-ttsfrd.git pretrained_models/CosyVoice-ttsfrd
-
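If you rerun the setup often, it can help to download only what is not on disk yet. The following is a minimal sketch, not part of CosyVoice itself: the helper names `local_dir_for`, `missing_models`, and `download_missing` are made up for illustration, and "missing" is assumed to mean "directory absent or empty", so a partially downloaded model would not be detected.

```python
from pathlib import Path

# Model IDs taken from the download commands above.
MODEL_IDS = [
    "iic/CosyVoice-300M",
    "iic/CosyVoice-300M-SFT",
    "iic/CosyVoice-300M-Instruct",
    "iic/CosyVoice-ttsfrd",
]


def local_dir_for(model_id, root="pretrained_models"):
    """Map 'iic/CosyVoice-300M' to '<root>/CosyVoice-300M'."""
    return Path(root) / model_id.split("/", 1)[1]


def missing_models(root="pretrained_models"):
    """Return the model IDs whose local directory is absent or empty."""
    missing = []
    for model_id in MODEL_IDS:
        target = local_dir_for(model_id, root)
        if not target.is_dir() or not any(target.iterdir()):
            missing.append(model_id)
    return missing


def download_missing(root="pretrained_models"):
    """Download only the models that are not on disk yet (hypothetical helper)."""
    # Imported lazily so the helpers above also work without modelscope installed.
    from modelscope import snapshot_download

    for model_id in missing_models(root):
        snapshot_download(model_id, local_dir=str(local_dir_for(model_id, root)))
```

Calling `download_missing()` from the repository root would then fetch only the models reported by `missing_models()`.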
- # Optional: unpack the ttsfrd resource and install the ttsfrd wheel for better text normalization
- cd pretrained_models/CosyVoice-ttsfrd/
- unzip resource.zip -d .
- pip install ttsfrd-0.3.6-cp38-cp38-linux_x86_64.whl
-
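The wheel filename in the pip command encodes its target: `cp38` stands for CPython 3.8 and `linux_x86_64` for 64-bit Linux. Before installing, you could check whether your interpreter matches. The sketch below is only a rough comparison with made-up helper names (`parse_wheel_tags`, `wheel_fits`); pip's real compatibility logic is more involved.

```python
import platform
import sys


def parse_wheel_tags(filename):
    """Split a wheel filename into (python_tag, abi_tag, platform_tag)."""
    stem = filename[: -len(".whl")] if filename.endswith(".whl") else filename
    # The last three dash-separated fields of a wheel name are the tags.
    python_tag, abi_tag, platform_tag = stem.split("-")[-3:]
    return python_tag, abi_tag, platform_tag


def wheel_fits(filename, py_version, system, machine):
    """Rough check: does the wheel target this CPython version and platform?"""
    python_tag, _abi_tag, platform_tag = parse_wheel_tags(filename)
    wanted_python = "cp{}{}".format(py_version[0], py_version[1])
    wanted_platform = "{}_{}".format(system.lower(), machine.lower())
    return python_tag == wanted_python and platform_tag == wanted_platform


if __name__ == "__main__":
    wheel = "ttsfrd-0.3.6-cp38-cp38-linux_x86_64.whl"
    ok = wheel_fits(wheel, sys.version_info[:2], platform.system(), platform.machine())
    print("ttsfrd wheel matches this interpreter:", ok)
```

If the check prints `False`, the pip install above is likely to be rejected on that machine.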
- # Make the bundled Matcha-TTS code importable (the path is relative to the repository root)
- export PYTHONPATH=third_party/Matcha-TTS
-
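If you would rather not depend on the shell environment, a similar effect can be had from inside Python by extending `sys.path` before importing `cosyvoice`. This is a small sketch under the assumption that the script runs from the repository root; `add_matcha_to_path` and its `repo_root` parameter are illustrative names, not part of CosyVoice.

```python
import os
import sys


def add_matcha_to_path(repo_root="."):
    """Put third_party/Matcha-TTS on sys.path, similar to the export above."""
    matcha_dir = os.path.abspath(os.path.join(repo_root, "third_party", "Matcha-TTS"))
    # Avoid adding the same directory twice when the helper is called again.
    if matcha_dir not in sys.path:
        sys.path.append(matcha_dir)
    return matcha_dir


# Call this before any "from cosyvoice... import ..." line.
add_matcha_to_path()
```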
- from cosyvoice.cli.cosyvoice import CosyVoice
- from cosyvoice.utils.file_utils import load_wav
- import torchaudio
-
- cosyvoice = CosyVoice('pretrained_models/CosyVoice-300M-SFT')
- # SFT usage: list the built-in speakers, then synthesize with one of them
- print(cosyvoice.list_avaliable_spks())
- # set stream=True for chunked streaming inference
- for i, j in enumerate(cosyvoice.inference_sft('你好,我是通义生成式语音大模型,请问有什么可以帮您的吗?', '中文女', stream=False)):
-     torchaudio.save('sft_{}.wav'.format(i), j['tts_speech'], 22050)
-
- cosyvoice = CosyVoice('pretrained_models/CosyVoice-300M')
- # zero-shot usage; language tags: <|zh|><|en|><|jp|><|yue|><|ko|> for Chinese/English/Japanese/Cantonese/Korean
- prompt_speech_16k = load_wav('zero_shot_prompt.wav', 16000)
- for i, j in enumerate(cosyvoice.inference_zero_shot('收到好友从远方寄来的生日礼物,那份意外的惊喜与深深的祝福让我心中充满了甜蜜的快乐,笑容如花儿般绽放。', '希望你以后能够做的比我还好呦。', prompt_speech_16k, stream=False)):
-     torchaudio.save('zero_shot_{}.wav'.format(i), j['tts_speech'], 22050)
- # cross-lingual usage: the text starts with the tag of the target language
- prompt_speech_16k = load_wav('cross_lingual_prompt.wav', 16000)
- for i, j in enumerate(cosyvoice.inference_cross_lingual('<|en|>And then later on, fully acquiring that company. So keeping management in line, interest in line with the asset that\'s coming into the family is a reason why sometimes we don\'t buy the whole thing.', prompt_speech_16k, stream=False)):
-     torchaudio.save('cross_lingual_{}.wav'.format(i), j['tts_speech'], 22050)
-
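The language tags from the comment above are easy to mistype, so one option is to keep them in a single table and prefix the text programmatically. This is only a sketch: `LANGUAGE_TAGS` and `tag_text` are illustrative names, and the mapping simply mirrors the five tags listed in the example.

```python
# Tags copied from the comment in the zero-shot example above.
LANGUAGE_TAGS = {
    "chinese": "<|zh|>",
    "english": "<|en|>",
    "japanese": "<|jp|>",
    "cantonese": "<|yue|>",
    "korean": "<|ko|>",
}


def tag_text(text, language):
    """Prefix text with the language tag, unless a known tag is already there."""
    try:
        tag = LANGUAGE_TAGS[language.lower()]
    except KeyError:
        raise ValueError("unsupported language: {!r}".format(language))
    # Leave the text alone if it already starts with one of the known tags.
    if text.startswith(tuple(LANGUAGE_TAGS.values())):
        return text
    return tag + text
```

The result of `tag_text('And then later on, ...', 'english')` could then be passed as the first argument of `inference_cross_lingual` in place of a hand-written `<|en|>` prefix.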
- cosyvoice = CosyVoice('pretrained_models/CosyVoice-300M-Instruct')
- # instruct usage; supported markup: <laughter></laughter><strong></strong>[laughter][breath]
- for i, j in enumerate(cosyvoice.inference_instruct('在面对挑战时,他展现了非凡的<strong>勇气</strong>与<strong>智慧</strong>。', '中文男', 'Theo \'Crimson\', is a fiery, passionate rebel leader. Fights with fervor for justice, but struggles with impulsiveness.', stream=False)):
-     torchaudio.save('instruct_{}.wav'.format(i), j['tts_speech'], 22050)
-
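Writing the `<strong></strong>` markup by hand gets tedious for longer texts. Here is a small sketch that wraps chosen words and checks that the tags are balanced before the text goes to `inference_instruct`. The helpers `emphasize` and `tags_balanced` are made up for this post, and whether the model tolerates unbalanced tags is not something the example above tells us, so treat the check as a precaution.

```python
import re


def emphasize(text, words):
    """Wrap every occurrence of each word in <strong></strong>."""
    for word in words:
        if word:
            text = text.replace(word, "<strong>{}</strong>".format(word))
    return text


def tags_balanced(text, tag="strong"):
    """Check that each <tag> is closed and no </tag> comes before its opener."""
    depth = 0
    for token in re.findall(r"</?{}>".format(re.escape(tag)), text):
        depth += -1 if token.startswith("</") else 1
        if depth < 0:
            return False
    return depth == 0
```

Note that `emphasize` is naive: passing overlapping words, or text that already contains the markup, would wrap things twice.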
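You can use the web demo page to get familiar with CosyVoice quickly. It supports SFT, zero-shot, cross-lingual, and instruct inference. See the demo website for details.
Example command: python3 webui.py --port 50000 --model_dir pretrained_models/CosyVoice-300M (change the model as needed).
For advanced users, training and inference scripts are provided in examples/libritts/cosyvoice/run.sh. You can follow this example to get familiar with CosyVoice.
To deploy CosyVoice as a service, follow the steps below, which cover a gRPC server and a FastAPI server. Otherwise, skip this step.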
- cd runtime/python
- docker build -t cosyvoice:v1.0 .
-
- # gRPC: start the server in a container, then run the client
- docker run -d --runtime=nvidia -p 50000:50000 cosyvoice:v1.0 /bin/bash -c "cd /opt/CosyVoice/CosyVoice/runtime/python/grpc && python3 server.py --port 50000 --max_conc 4 --model_dir iic/CosyVoice-300M && sleep infinity"
- cd grpc && python3 client.py --port 50000 --mode <sft|zero_shot|cross_lingual|instruct>
-
- # FastAPI: start the server in a container, then run the client
- docker run -d --runtime=nvidia -p 50000:50000 cosyvoice:v1.0 /bin/bash -c "cd /opt/CosyVoice/CosyVoice/runtime/python/fastapi && python3 server.py --port 50000 --model_dir iic/CosyVoice-300M && sleep infinity"
- cd fastapi && python3 client.py --port 50000 --mode <sft|zero_shot|cross_lingual|instruct>
-
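Because the container starts in the background (`-d`), the client can fail if it runs before the server is listening. One way to handle this is to poll the published port first. The sketch below uses only the standard library; `wait_for_port` is a made-up helper, the host `127.0.0.1` is an assumption, and an open port only shows that something is listening, not that the model has finished loading.

```python
import socket
import time


def wait_for_port(host="127.0.0.1", port=50000, timeout=60.0, interval=1.0):
    """Return True once a TCP connection to host:port succeeds, else False."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means a server is accepting connections.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

You might call `wait_for_port(port=50000)` right after `docker run` and only launch `client.py` once it returns `True`.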
With its powerful features and flexible usage, CosyVoice offers a fresh speech synthesis experience. Give it a try!