<p style="text-align:center"><img alt="Facebook open-sources VoiceLoop, synthesizing new voices from in-the-wild speech and text" src="https://simg.open-open.com/show/181c60b9fd639296266ac248df40e051.jpg" /></p> <p><a href="/misc/goto?guid=4959010596809650528">A PyTorch implementation of the method described in Voice Synthesis for in-the-Wild Speakers via a Phonological Loop.</a></p> <p style="text-align:center"><img alt="Facebook open-sources VoiceLoop, synthesizing new voices from in-the-wild speech and text" src="https://simg.open-open.com/show/054f8903fdc37c0b4913bb545cbb3649.jpg" /></p> <p>VoiceLoop is a neural text-to-speech (TTS) system that is able to transform text to speech in voices sampled in the wild. Some demo samples can be found <a href="/misc/goto?guid=4959010596904752839">here</a>.</p> <h2>Quick Links</h2> <ul> <li><a href="/misc/goto?guid=4959010596904752839">Demo Samples</a></li> <li><a href="/misc/goto?guid=4959010597005332521">Quick Start</a></li> <li><a href="/misc/goto?guid=4959010597097717936">Setup</a></li> <li><a href="/misc/goto?guid=4959010597186326568">Training</a></li> </ul> <h2>Quick Start</h2> <p>Follow the instructions in Setup, then simply run:</p> <pre> python generate.py --npz data/vctk/numpy_features_valid/p318_212.npz --spkr 13 --checkpoint models/vctk/bestmodel.pth</pre> <p>Results will be placed in models/vctk/results. Two samples will be generated:</p> <ul> <li>The <a href="/misc/goto?guid=4959010597284644995">generated sample</a> will be saved with the gen_10.wav extension.</li> <li>Its <a href="/misc/goto?guid=4959010597370135642">ground-truth (test) sample</a> is also generated and is saved with the orig.wav extension.</li> </ul> <p>You can also generate the same text with a different speaker, for example:</p> <pre> python generate.py --npz data/vctk/numpy_features_valid/p318_212.npz --spkr 18 --checkpoint models/vctk/bestmodel.pth</pre> <p>This will generate the following <a href="/misc/goto?guid=4959010597467728044">sample</a>.</p> <h2>Setup</h2> <p>Requirements: Linux/OSX, Python 2.7, and <a href="/misc/goto?guid=4959010597544472141">PyTorch 0.1.12</a>. The current version of the code requires CUDA support for training. Generation can be done on the CPU.</p> <pre> git clone https://github.com/facebookresearch/loop.git
cd loop
pip install -r scripts/requirements.txt</pre> <h3>Data</h3> <p>The data used to train the models in the paper can be downloaded via:</p> <pre> bash scripts/download_data.sh</pre> <p>The script downloads and preprocesses a subset of <a href="/misc/goto?guid=4959010597633912820">VCTK</a>. This subset contains speakers with an American accent.</p> <p>The dataset was preprocessed using <a href="/misc/goto?guid=4959010597722901893">Merlin</a>: from each audio clip, vocoder features were extracted with the <a href="/misc/goto?guid=4959010597807645675">WORLD</a> vocoder. After downloading, the dataset will be located under the <code>data</code> subfolder as follows:</p> <pre> <code>loop
├── data
    └── vctk
        ├── norm_info
        │   └── norm.dat
        ├── numpy_features
        │   ├── p294_001.npz
        │   ├── p294_002.npz
        │   └── ...
        └── numpy_features_valid
</code></pre> <p>The preprocessing pipeline can be run using the following script by Kyle Kastner: <a href="/misc/goto?guid=4959010597899575985">https://gist.github.com/kastnerkyle/cc0ac48d34860c5bb3f9112f4d9a0300</a>.</p>
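<p>To get a feel for these feature files, you can list the arrays stored in one of them with a few lines of Python. This is a minimal sketch, assuming only that the .npz files are standard numpy archives; the exact key names depend on what the Merlin/WORLD preprocessing wrote out:</p> <pre> <code>from __future__ import print_function
import numpy as np

# Inspect one preprocessed feature file from the VCTK subset.
# Note: the key names are whatever the Merlin/WORLD pipeline stored;
# this only assumes a standard numpy .npz archive.
feats = np.load('data/vctk/numpy_features/p294_001.npz')
for key in feats.files:
    print(key, feats[key].shape)
</code></pre>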
<h3>Pretrained Models</h3> <p>Pretrained models can be downloaded via:</p> <pre> bash scripts/download_models.sh</pre> <p>After downloading, the models will be located under the <code>models</code> subfolder as follows:</p> <pre> <code>loop
├── data
├── models
    ├── vctk
    │   ├── args.pth
    │   └── bestmodel.pth
    └── vctk_alt
</code></pre> <h3>SPTK and WORLD</h3> <p>Finally, speech generation requires <a href="/misc/goto?guid=4959010597985636440">SPTK3.9</a> and the <a href="/misc/goto?guid=4959010597807645675">WORLD</a> vocoder, as used in Merlin. To download the executables:</p> <pre> bash scripts/download_tools.sh</pre> <p>This results in the following subdirectories:</p> <pre> <code>loop
├── data
├── models
└── tools
    ├── SPTK-3.9
    └── WORLD
</code></pre> <h2>Training</h2> <p>To train a new model on vctk, first train the model using a noise level of 4 and an input sequence length of 100:</p> <pre> python train.py --expName vctk --data data/vctk --noise 4 --seq-len 100 --epochs 90</pre> <p>Then, continue training the model using a noise level of 2 on full sequences:</p> <pre> python train.py --expName vctk_noise_2 --data data/vctk --checkpoint checkpoints/vctk/bestmodel.pth --noise 2 --seq-len 1000 --epochs 90</pre> <h2>Citation</h2> <p>If you find this code useful in your research, please cite:</p> <pre> <code>@article{taigman2017voice,
  title   = {Voice Synthesis for in-the-Wild Speakers via a Phonological Loop},
  author  = {Taigman, Yaniv and Wolf, Lior and Polyak, Adam and Nachmani, Eliya},
  journal = {ArXiv e-prints},
  archivePrefix = "arXiv",
  eprinttype = {arxiv},
  eprint  = {1707.06588},
  primaryClass = "cs.CL",
  year    = {2017},
  month   = jul,
}
</code></pre> <h2>License</h2> <p>Loop has a CC-BY-NC license.</p> <p>Code: <a href="/misc/goto?guid=4959010598092446744">https://github.com/facebookresearch/loop</a></p> <p>Paper: https://arxiv.org/abs/1707.06588</p>
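<p>Once generation has run, the output under models/vctk/results can be sanity-checked directly. A minimal sketch, assuming the generated file for the Quick Start example follows the gen_10.wav naming pattern described above (the actual file name may differ):</p> <pre> <code>from __future__ import print_function
from scipy.io import wavfile

# Load a generated sample and report its basic properties.
# Assumption: the output file name follows the gen_10.wav pattern
# from Quick Start; adjust the path to the file actually produced.
rate, data = wavfile.read('models/vctk/results/p318_212.gen_10.wav')
print('sample rate: %d Hz, duration: %.2f s' % (rate, len(data) / float(rate)))
</code></pre>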