### Dataset: about 2,000 hours of speech and vocals
### Supported languages (approximate hours per language)

- ~800 hours of English (a wide variety of speakers and emotions)
- ~200 hours of Spanish
- ~42 hours of French
- ~188 hours of Russian
- ~70 hours of Arabic
- ~140 hours of Japanese
- ~70 hours of Chinese (Mandarin)
- ~80 hours of Korean
- ~30 hours of Hindi
- ~53 hours of Indonesian
- ~30 hours of Tagalog
- ~40 hours of Portuguese
- ~35 hours of German
- ~190 hours of singing (all languages)
- Common language (I don't remember how much data it contained)
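As a rough sanity check on the ~2,000-hour headline figure, the per-subset amounts listed above can be tallied. This is a small illustrative sketch: every value is one of the "~" estimates from the list, and the "common language" portion is left out because its size is not stated.

```python
# Tally of the approximate per-subset hours listed above.
# All values are "~" estimates; the "common language" portion is not
# quantified in this card, so it is excluded from the sum.
hours = {
    "English": 800,
    "Spanish": 200,
    "French": 42,
    "Russian": 188,
    "Arabic": 70,
    "Japanese": 140,
    "Chinese (Mandarin)": 70,
    "Korean": 80,
    "Hindi": 30,
    "Indonesian": 53,
    "Tagalog": 30,
    "Portuguese": 40,
    "German": 35,
    "Singing (all languages)": 190,
}

total = sum(hours.values())
print(f"Listed total: ~{total} hours")

# Share of each subset in the listed total, largest first.
for name, h in sorted(hours.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<25} {h:>4} h  {100 * h / total:5.1f}%")
```

The listed subsets add up to about 1,968 hours. Adding the unquantified "common language" data makes this consistent with the stated ~2,000 hours.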

### Type: big-base model for fine-tuning
- Batch: 2-40-80
- Sampling rates: 32 kHz and 40 kHz
- Total steps: 371,406
- Hardware: 1× H100, 4× L40S

Expected release date: 22 July

![image/png](https://cdn-uploads.huggingface.co/production/uploads/65041c19e88eb2d0d521d46c/NfsOJxAzRbllBDCDjFC5e.png)