
The dataset contains roughly 2000 hours of speech and vocals.

Supported languages (English or Spanish? Whoever moves first):

~800 hrs of English (with a vast variety of speakers and every emotion)

~200 hrs of Spanish

~42 hrs of French

~188 hrs of Russian

~70 hrs of Arabic

~140 hrs of Japanese

~70 hrs of Chinese (Mandarin)

~80 hrs of Korean

~30 hrs of Hindi

~53 hrs of Indonesian

~30 hrs of Tagalog

~40 hrs of Portuguese

~35 hrs of German

~190 hrs of singing (all languages)

common language (I don't remember how much data there was)
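
As a quick sanity check on the ~2000-hour figure, the listed amounts can be added up. The snippet below is only an illustrative sketch: the `hours` dictionary and the `summarize` helper are hypothetical names, the values are the approximate figures from the list above, and the "common language" entry is left out because its size is not stated.

```python
# Approximate hours per category, copied from the list above.
# "common language" is omitted: its amount is not given.
hours = {
    "English": 800,
    "Spanish": 200,
    "French": 42,
    "Russian": 188,
    "Arabic": 70,
    "Japanese": 140,
    "Chinese (Mandarin)": 70,
    "Korean": 80,
    "Hindi": 30,
    "Indonesian": 53,
    "Tagalog": 30,
    "Portuguese": 40,
    "German": 35,
    "singing (all languages)": 190,
}


def summarize(hours_by_category):
    """Return the total hours and each category's share of that total."""
    total = sum(hours_by_category.values())
    shares = {name: h / total for name, h in hours_by_category.items()}
    return total, shares


total, shares = summarize(hours)
print(f"listed total: ~{total} hrs")

# Print categories from largest to smallest share.
for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<24} {hours[name]:>4} hrs  {share:6.1%}")
```

The listed categories add up to about 1968 hours, which is consistent with the stated total of roughly 2000 hours once the unquantified "common language" data is included.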

Type: big-base for finetuning

Batch: 2-40-80

Sampling frequency: 32k, 40k

Total step count: 371406

Hardware used:

1x H100, 4x L40s

Expected release date: 22 July
