abscheik committed
Commit ec58b47 (1 parent: 71d4d4e)

Update README.md

Files changed (1): README.md (+90 −37)

README.md CHANGED
@@ -4,13 +4,33 @@ license: apache-2.0
  base_model: openai/whisper-small
  tags:
  - generated_from_trainer
  metrics:
  - wer
  model-index:
- - name: whisper-small-hassaniya
-   results: []
  ---
-
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
  should probably proofread and complete it, then remove this comment. -->
@@ -18,24 +38,58 @@ should probably proofread and complete it, then remove this comment. -->

  This model is a fine-tuned version of [openai/whisper-small](https://huggingface.co/openai/whisper-small) on the None dataset.
  It achieves the following results on the evaluation set:
- - Loss: 1.2149
- - Model Preparation Time: 0.0065
  - Wer: 52.3635
  - Cer: 19.0151

  ## Model description

- More information needed

- ## Intended uses & limitations

- More information needed

- ## Training and evaluation data

- More information needed

- ## Training procedure

  ### Training hyperparameters

@@ -51,35 +105,34 @@ The following hyperparameters were used during training:
  - training_steps: 2000
  - mixed_precision_training: Native AMP

- ### Training results
-
- | Training Loss | Epoch | Step | Validation Loss | Model Preparation Time | Wer | Cer |
- |:-------------:|:-------:|:----:|:---------------:|:----------------------:|:-------:|:-------:|
- | 0.9126 | 2.2472 | 100 | 0.8441 | 0.0065 | 65.8109 | 25.6766 |
- | 0.202 | 4.4944 | 200 | 0.9230 | 0.0065 | 57.5795 | 21.3726 |
- | 0.0854 | 6.7416 | 300 | 1.0011 | 0.0065 | 58.8020 | 21.1788 |
- | 0.0497 | 8.9888 | 400 | 1.0513 | 0.0065 | 57.2535 | 20.7988 |
- | 0.0358 | 11.2360 | 500 | 1.0700 | 0.0065 | 57.9055 | 21.7216 |
- | 0.026 | 13.4831 | 600 | 1.0964 | 0.0065 | 56.2755 | 20.8918 |
- | 0.0199 | 15.7303 | 700 | 1.1063 | 0.0065 | 55.9495 | 20.2482 |
- | 0.0114 | 17.9775 | 800 | 1.1709 | 0.0065 | 56.0717 | 20.8530 |
- | 0.0084 | 20.2247 | 900 | 1.1633 | 0.0065 | 56.6830 | 20.4653 |
- | 0.005 | 22.4719 | 1000 | 1.1659 | 0.0065 | 54.8900 | 20.4110 |
- | 0.0026 | 24.7191 | 1100 | 1.1591 | 0.0065 | 54.6455 | 20.2714 |
- | 0.001 | 26.9663 | 1200 | 1.1771 | 0.0065 | 54.3602 | 19.6278 |
- | 0.0005 | 29.2135 | 1300 | 1.1900 | 0.0065 | 53.9527 | 19.4959 |
- | 0.0004 | 31.4607 | 1400 | 1.1971 | 0.0065 | 53.6675 | 19.3641 |
- | 0.0003 | 33.7079 | 1500 | 1.2049 | 0.0065 | 52.5672 | 19.0694 |
- | 0.0003 | 35.9551 | 1600 | 1.2069 | 0.0065 | 52.6895 | 19.1082 |
- | 0.0002 | 38.2022 | 1700 | 1.2107 | 0.0065 | 52.6487 | 19.0151 |
- | 0.0002 | 40.4494 | 1800 | 1.2125 | 0.0065 | 52.4450 | 19.0151 |
- | 0.0002 | 42.6966 | 1900 | 1.2145 | 0.0065 | 52.4450 | 19.0539 |
- | 0.0002 | 44.9438 | 2000 | 1.2149 | 0.0065 | 52.3635 | 19.0151 |
-

  ### Framework versions

  - Transformers 4.44.1
  - Pytorch 2.3.1+cu121
  - Datasets 2.21.0
- - Tokenizers 0.19.1
4
  base_model: openai/whisper-small
5
  tags:
6
  - generated_from_trainer
7
+ - ASR
8
+ - Hassaniya
9
+ - Mauritanian Arabic
10
+ - Arabic Dialects
11
  metrics:
12
  - wer
13
+ - cer
14
+
15
+ pipeline_tag: automatic-speech-recognition
16
+
17
  model-index:
18
+ - name: whisper-samll-hassaniya
19
+ results:
20
+ - task:
21
+ name: Automatic Speech Recognition
22
+ type: automatic-speech-recognition
23
+ dataset:
24
+ name: Hassaniya Audio Dataset
25
+ type: private
26
+ metrics:
27
+ - name: Word Error Rate
28
+ value: 52.3635
29
+ type: wer
30
+ - name: Character Error Rate
31
+ value: 19.0151
32
+ type: cer
33
  ---
 
34
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
35
  should probably proofread and complete it, then remove this comment. -->
36
 
 
38
 
39
  This model is a fine-tuned version of [openai/whisper-small](https://huggingface.co/openai/whisper-small) on the None dataset.
40
  It achieves the following results on the evaluation set:
 
 
41
  - Wer: 52.3635
42
  - Cer: 19.0151
43
 
44
+
45
  ## Model description

+ This model provides automatic speech recognition for Hassaniya, the variety of Arabic spoken in Mauritania. Beyond the practical need for Hassaniya transcription tools, the project aims to help preserve a linguistically distinctive form of Arabic.
+
+ ## Intended Uses & Limitations
+
+ This model is intended for transcription services and linguistic research: it produces textual representations of Hassaniya speech, supporting digital heritage preservation and linguistic studies. Performance may vary with audio quality and the speaker's accent.
+
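For illustration, inference can be sketched with the Transformers `pipeline` API. This is an editorial sketch, not part of the original card: the Hub repo id below is assumed from the model-index name and should be verified before use.

```python
# Hypothetical usage sketch. MODEL_ID is assumed from the model-index
# name ("whisper-small-hassaniya" under user "abscheik"); verify the
# actual Hub id before use. Requires `transformers` and `torch`.
MODEL_ID = "abscheik/whisper-small-hassaniya"

def build_transcriber(model_id: str = MODEL_ID):
    """Construct an automatic-speech-recognition pipeline (lazy import,
    so nothing is downloaded until this is actually called)."""
    from transformers import pipeline
    return pipeline("automatic-speech-recognition", model=model_id)

def transcribe(audio_path: str) -> str:
    """Transcribe one local audio file (wav/mp3/flac) to text."""
    asr = build_transcriber()
    return asr(audio_path)["text"]
```

As with any Whisper fine-tune, long recordings should be chunked (the pipeline accepts `chunk_length_s=30`) and outputs spot-checked by a native speaker.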
+ ## Training and Evaluation Data
+
+ The model was trained on a curated dataset of Hassaniya audio recordings collected with AudioScribe, an application dedicated to high-quality data collection. The dataset is divided into three subsets with the following total audio lengths:
+
+ - **Training set**: 5 hours 30 minutes
+ - **Testing set**: 4 minutes
+ - **Evaluation set**: 18 minutes
+
+ The dataset includes speech from native speakers across different age groups and genders to support robust performance across voices.
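The durations above reduce to split proportions with simple arithmetic (only the values from the list are used):

```python
# Split durations in minutes, taken from the list above.
splits = {"train": 5 * 60 + 30, "test": 4, "eval": 18}

total_minutes = sum(splits.values())
shares = {name: minutes / total_minutes for name, minutes in splits.items()}

print(total_minutes)              # 352 minutes (~5.9 hours) of audio overall
print(round(shares["train"], 3))  # 0.938 -> ~94% of the audio is training data
```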

+ ## Model Performance
+
+ The model was evaluated before and after fine-tuning. Key metrics:
+
+ - **Pre-training evaluation on the evaluation set**
+   - WER: 108.7612
+   - CER: 63.9705
+
+ - **Post-training evaluation on the evaluation set**
+   - WER: 52.3634
+   - CER: 19.0151
+
+ - **Post-training evaluation on the test set**
+   - WER: 53.4791
+   - CER: 19.6108
+
+ Fine-tuning cuts the Word Error Rate by more than half (108.76 → 52.36) and the Character Error Rate by roughly 70% (63.97 → 19.02) on the evaluation set, with similar figures on the held-out test set.
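Both metrics are edit-distance ratios: WER counts word-level insertions, deletions, and substitutions relative to the number of reference words, and CER does the same at the character level. A minimal self-contained sketch (in practice a library such as `jiwer` or `evaluate` would be used):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (r != h)))   # substitution
        prev = cur
    return prev[-1]

def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate in percent, as reported in this card."""
    ref, hyp = reference.split(), hypothesis.split()
    return 100 * edit_distance(ref, hyp) / len(ref)

def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate in percent."""
    return 100 * edit_distance(reference, hypothesis) / len(reference)
```

For example, `wer("قال الرجل", "قال رجل")` yields 50.0: one substituted word over a two-word reference.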

+ ## Training Procedure
+
+ ### Resource Capacity During Training
+
+ Training was run on Google Colab Pro, which provided the following resources:
+
+ - **System RAM**: 51 GB, ample headroom for data processing and model operations during training.
+ - **GPU RAM**: 15 GB, sufficient for fine-tuning Whisper-small, especially with mixed-precision training.
+ - **Disk space**: 201.2 GB, enough for the dataset, model checkpoints, and logs.
+
+ ### Training Duration
+
+ Training completed in 160.32 minutes (about 2 hours 40 minutes).
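Plain arithmetic on the reported duration and step count gives the average per-step cost (only the two figures from this card are used):

```python
total_minutes = 160.32  # reported wall-clock training time
steps = 2000            # training_steps from the hyperparameters

seconds_per_step = total_minutes * 60 / steps
print(round(seconds_per_step, 2))  # ~4.81 s per optimizer step
```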

  ### Training hyperparameters

  The following hyperparameters were used during training:
  - training_steps: 2000
  - mixed_precision_training: Native AMP

+ ### Training Results
+
+ | Step | Epoch | Training Loss | Validation Loss | Wer | Cer |
+ |:-----:|:-------:|:-------------:|:---------------:|:-------:|:-------:|
+ | 100 | 2.2472 | 0.9126 | 0.8441 | 65.8109 | 25.6766 |
+ | 200 | 4.4944 | 0.202 | 0.9230 | 57.5795 | 21.3726 |
+ | 300 | 6.7416 | 0.0854 | 1.0011 | 58.8020 | 21.1788 |
+ | 400 | 8.9888 | 0.0497 | 1.0513 | 57.2535 | 20.7988 |
+ | 500 | 11.2360 | 0.0358 | 1.0700 | 57.9055 | 21.7216 |
+ | 600 | 13.4831 | 0.026 | 1.0964 | 56.2755 | 20.8918 |
+ | 700 | 15.7303 | 0.0199 | 1.1063 | 55.9495 | 20.2482 |
+ | 800 | 17.9775 | 0.0114 | 1.1709 | 56.0717 | 20.8530 |
+ | 900 | 20.2247 | 0.0084 | 1.1633 | 56.6830 | 20.4653 |
+ | 1000 | 22.4719 | 0.005 | 1.1659 | 54.8900 | 20.4110 |
+ | 1100 | 24.7191 | 0.0026 | 1.1591 | 54.6455 | 20.2714 |
+ | 1200 | 26.9663 | 0.001 | 1.1771 | 54.3602 | 19.6278 |
+ | 1300 | 29.2135 | 0.0005 | 1.1900 | 53.9527 | 19.4959 |
+ | 1400 | 31.4607 | 0.0004 | 1.1971 | 53.6675 | 19.3641 |
+ | 1500 | 33.7079 | 0.0003 | 1.2049 | 52.5672 | 19.0694 |
+ | 1600 | 35.9551 | 0.0003 | 1.2069 | 52.6895 | 19.1082 |
+ | 1700 | 38.2022 | 0.0002 | 1.2107 | 52.6487 | 19.0151 |
+ | 1800 | 40.4494 | 0.0002 | 1.2125 | 52.4450 | 19.0151 |
+ | 1900 | 42.6966 | 0.0002 | 1.2145 | 52.4450 | 19.0539 |
+ | 2000 | 44.9438 | 0.0002 | 1.2149 | 52.3635 | 19.0151 |
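The Step and Epoch columns imply a fixed ~44.5 optimizer steps per epoch (i.e. the training set yields about 44.5 batches per pass). A quick consistency check over a few rows copied from the table:

```python
# (step, epoch) pairs copied from rows of the table above.
checkpoints = [(100, 2.2472), (1000, 22.4719), (2000, 44.9438)]

for step, epoch in checkpoints:
    print(step, round(step / epoch, 1))  # ~44.5 steps per epoch in every row
```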
 

  ### Framework versions

  - Transformers 4.44.1
  - Pytorch 2.3.1+cu121
  - Datasets 2.21.0
+ - Tokenizers 0.19.1