### Dataset: ~2000 hours of speech and vocals

### Supported languages

- English: ~800 hrs (with a vast variety of speakers and every emotion)
- Spanish: ~200 hrs
- French: ~42 hrs
- Russian: ~188 hrs
- Arabic: ~70 hrs
- Japanese: ~140 hrs
- Chinese (Mandarin): ~70 hrs
- Korean: ~80 hrs
- Hindi: ~30 hrs
- Indonesian: ~53 hrs
- Tagalog: ~30 hrs
- Portuguese: ~40 hrs
- German: ~35 hrs
- Singing (all languages): ~190 hrs
- Common language: (I don't remember how much data was there)

### Type: big-base for finetuning

Batch: 2-40-80

### Sampling frequency: 32k, 40k

Total steps count: 371406

### Hardware used: 1× H100, 4× L40s

Expected release date: 22 July

![image/png](https://cdn-uploads.huggingface.co/production/uploads/65041c19e88eb2d0d521d46c/NfsOJxAzRbllBDCDjFC5e.png)
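
As a quick sanity check, the per-language figures listed above can be tallied against the stated ~2000 hour total. This is a minimal sketch: the `hours` dictionary and variable names are illustrative, the values are the approximate figures from the list, and the "common language" entry is left out because no amount is given for it.

```python
# Approximate hours per category, copied from the list above.
# "common language" is omitted: the source gives no figure for it.
hours = {
    "English": 800,
    "Spanish": 200,
    "French": 42,
    "Russian": 188,
    "Arabic": 70,
    "Japanese": 140,
    "Chinese (Mandarin)": 70,
    "Korean": 80,
    "Hindi": 30,
    "Indonesian": 53,
    "Tagalog": 30,
    "Portuguese": 40,
    "German": 35,
    "Singing (all languages)": 190,
}

# Sum of every category that has a stated amount.
total_listed = sum(hours.values())
print(f"Listed total: ~{total_listed} hrs")

# Share of each category, largest first.
for name, h in sorted(hours.items(), key=lambda kv: kv[1], reverse=True):
    share = 100 * h / total_listed
    print(f"{name:<24} ~{h:>4} hrs  {share:5.1f}%")
```

The listed categories add up to roughly 1968 hours, which is consistent with the "~2000 hours" headline once the unquantified entry is taken into account.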