Secure and Assured Intelligent Learning (SAIL) Lab

Prerequisites:
- Python >= 3.9
- espeak-ng: sudo apt install -y espeak-ng
- TTS (from the repo):

```bash
pip install -U pip setuptools wheel
git clone https://github.com/coqui-ai/TTS
pip install -e TTS/
```
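A quick way to confirm the prerequisites are in place, as a convenience sketch (the version floor and the espeak-ng binary name come from the list above; the check itself is not part of the original setup):

```python
import shutil
import sys

# Fail fast if the interpreter is too old or espeak-ng is missing.
assert sys.version_info >= (3, 9), "Python >= 3.9 is required"
assert shutil.which("espeak-ng") is not None, "espeak-ng not found on PATH"
print("Prerequisites look OK")
```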
Setup Environment
init.sh --> add:

```bash
sudo apt update && sudo apt upgrade -y
sudo apt install -y python3.10 python3.10-dev python3.10-venv
/usr/bin/python3.10 -m venv /opt/python/envs/py310
/opt/python/envs/py310/bin/python -m pip install -U pip setuptools wheel
/opt/python/envs/py310/bin/python -m pip install -U ipykernel ipython ipython_genutils jedi lets-plot aiohttp pandas
sudo apt install -y espeak-ng
```

in the attached data --> in the file environment.yml, change datalore-base-env from "minimal" to "py310"
background computation --> set to "Never turn off"
```bash
git clone https://github.com/coqui-ai/TTS.git
cd TTS
pip install -e .
```
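A minimal check that the editable install worked (this assumes the package exposes `__version__`, which recent Coqui TTS releases do; torch comes in as a TTS dependency):

```python
import torch
import TTS

print("TTS version:", TTS.__version__)
print("CUDA available:", torch.cuda.is_available())
```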
Run Multi-GPU
```bash
CUDA_VISIBLE_DEVICES="0,1" accelerate launch --multi_gpu --num_processes 2 multi-speaker.py
```
To avoid interruptions of your training, use:

```python
from trainer import Trainer, TrainerArgs

trainer = Trainer(
    TrainerArgs(use_accelerate=True),
    config,
    output_path,
    model=model,
    train_samples=train_samples,
    eval_samples=eval_samples,
)
trainer.fit()
```
For faster training, set num_loader_workers to more than 1 (e.g. num_loader_workers=4). Before doing so, enlarge the shared-memory mount:

```bash
sudo mount -o remount,size=8G /dev/shm
```
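Where these knobs live, as a minimal config sketch (the batch sizes and output path are illustrative values, not the ones used for this model):

```python
from TTS.tts.configs.vits_config import VitsConfig

config = VitsConfig(
    batch_size=32,              # illustrative; reduce if you hit OOM (see below)
    eval_batch_size=16,         # illustrative
    num_loader_workers=4,       # >1 needs the larger /dev/shm above
    num_eval_loader_workers=4,
    output_path="output",       # hypothetical path
)
```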
Run with one GPU
!nvidia-smi shows the status of the GPUs (the leading ! runs it from a notebook cell).
os.environ["CUDA_VISIBLE_DEVICES"] = "7" selects which GPU your code runs on.
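CUDA only reads this variable when it is initialized, so the safest pattern is to set it before importing torch; a minimal sketch (the index "7" is just an example):

```python
import os

# Must be set before CUDA is initialized; safest before `import torch`.
os.environ["CUDA_VISIBLE_DEVICES"] = "7"

import torch

print(torch.cuda.device_count())  # should report exactly 1 visible device
```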
How to fix runtime errors
Error :"torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 64.00 MiB. GPU 0 has a total capacty of 9.77 GiB of which 52.31 MiB is free. Including non-PyTorch memory, this process has 8.68 GiB memory in use. Of the allocated memory 8.25 GiB is allocated by PyTorch, and 155.23 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF" Solution : reduce batch_size
Error: some wavs cannot be found even though they exist.
Solution: your wavs might be nested in subfolders, and nested files are not found.
If you use common_voice as your formatter, your wavs must be stored in the clips folder.
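For reference, a sketch of pointing the common_voice formatter at a dataset (the root path and metadata file are hypothetical); the formatter resolves each metadata row against <path>/clips/, which is why the wavs must sit directly in that folder:

```python
from TTS.config.shared_configs import BaseDatasetConfig
from TTS.tts.datasets import load_tts_samples

dataset_config = BaseDatasetConfig(
    formatter="common_voice",
    meta_file_train="validated.tsv",   # hypothetical metadata file
    path="/data/cv-corpus-15.0/fa",    # hypothetical root; wavs in <path>/clips/
)
train_samples, eval_samples = load_tts_samples(dataset_config, eval_split=True)
```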
Error: dimension mismatch. Solution: set mixed_precision=False.
TensorBoard:
Download the latest output and unzip it. In a Windows shell, cd to the directory where you stored the files and run:

```bash
tensorboard --logdir=. --bind_all --port=6007
```

Open the printed URL in your browser.
To use wandb, start a run with sync_tensorboard=True:

```python
import wandb

# Start a wandb run that mirrors the TensorBoard logs.
if wandb.run is None:
    wandb.init(
        project="persian-tts-vits-grapheme-cv15-fa-male-native-multispeaker-RERUN",
        group="GPUx8 accel mixed bf16 128x32",
        sync_tensorboard=True,
    )
```
For Multi-Speaker
Set use_speaker_embedding=True, then:

```python
speaker_manager = SpeakerManager()
speaker_manager.set_ids_from_data(train_samples + eval_samples, parse_key="speaker_name")
config.num_speakers = speaker_manager.num_speakers

model = Vits(config, ap, tokenizer, speaker_manager=speaker_manager)
```
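The snippet above assumes config, ap, and tokenizer already exist. A self-contained sketch of the full wiring in the style of the Coqui VITS recipes (train_samples and eval_samples as loaded earlier; enabling the embedding through VitsArgs is the recipe pattern, not necessarily the exact code used for this model):

```python
from TTS.tts.configs.vits_config import VitsConfig
from TTS.tts.models.vits import Vits, VitsArgs
from TTS.tts.utils.speakers import SpeakerManager
from TTS.tts.utils.text.tokenizer import TTSTokenizer
from TTS.utils.audio import AudioProcessor

# Enable the speaker embedding layer via the model args.
config = VitsConfig(model_args=VitsArgs(use_speaker_embedding=True))

ap = AudioProcessor.init_from_config(config)
tokenizer, config = TTSTokenizer.init_from_config(config)

speaker_manager = SpeakerManager()
speaker_manager.set_ids_from_data(train_samples + eval_samples, parse_key="speaker_name")
config.num_speakers = speaker_manager.num_speakers

model = Vits(config, ap, tokenizer, speaker_manager=speaker_manager)
```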
