# whisper-small-hassaniya

This model is a fine-tuned version of openai/whisper-small on a curated dataset of Hassaniya speech recordings. It achieves the following results on the evaluation set:
- WER: 52.3635
- CER: 19.0151
## Model description

This model adapts openai/whisper-small to provide automatic speech recognition for the Hassaniya dialect of Arabic. The project addresses both a technological need and a cultural imperative: preserving a linguistically distinctive variety of Arabic in written, searchable form.
## Intended Uses & Limitations

This model is intended for professional transcription services and linguistic research. It can produce textual representations of Hassaniya speech, contributing to digital heritage preservation and linguistic studies. Note that performance varies with audio quality and speaker accent, and given the evaluation WER of roughly 52%, transcripts should be reviewed by a human before downstream use.
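For illustration, a checkpoint like this one can be loaded through the Hugging Face `transformers` ASR pipeline. This is a sketch, not part of the card's own tooling: the repository id and audio path are placeholders, and `chunk_length_s=30` simply matches Whisper's 30-second input window.

```python
def transcribe(audio_path: str, model_id: str = "abscheik/whisper-small-hassaniya") -> str:
    """Transcribe an audio file with a fine-tuned Whisper checkpoint.

    `model_id` is a placeholder repository id; substitute the actual
    checkpoint. Requires `transformers` plus an audio backend such as
    ffmpeg to be installed.
    """
    from transformers import pipeline

    asr = pipeline(
        "automatic-speech-recognition",
        model=model_id,
        chunk_length_s=30,  # Whisper processes audio in 30-second windows
    )
    return asr(audio_path)["text"]
```

Calling `transcribe("recording.wav")` would download the checkpoint on first use and return the decoded text.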
## Training and Evaluation Data

The model was trained on a curated dataset of Hassaniya audio recordings collected through AudioScribe, an application dedicated to high-quality data collection. The dataset is divided into three subsets with the following total audio durations:
- Training set: 5 hours 30 minutes
- Testing set: 4 minutes
- Evaluation set: 18 minutes
This diverse dataset includes various speech samples from native speakers across different age groups and genders to ensure robust model performance.
## Model Performance

The model was evaluated before and after fine-tuning to quantify the improvement. Key metrics:

### Pre-training Evaluation on Eval Set
- WER: 108.7612
- CER: 63.9705
### Post-training Evaluation on Eval Set
- WER: 52.3635
- CER: 19.0151
### Post-training Evaluation on Test Set
- WER: 53.4791
- CER: 19.6108
These results show a marked improvement over the course of training: WER was roughly halved (108.8 → 52.4) and CER fell by more than two thirds (64.0 → 19.0), demonstrating substantially more accurate recognition of Hassaniya speech after fine-tuning.
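Both metrics reported above are normalized Levenshtein edit distances: WER over word sequences, CER over character sequences. A minimal, dependency-free sketch of the computation:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            prev, dp[j] = dp[j], min(
                dp[j] + 1,        # deletion
                dp[j - 1] + 1,    # insertion
                prev + (r != h),  # substitution (free if symbols match)
            )
    return dp[len(hyp)]

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance / reference length."""
    return edit_distance(reference, hypothesis) / len(reference)

print(wer("the quick brown fox", "the quick brown box"))  # 0.25 (1 of 4 words wrong)
```

In practice a library such as `jiwer` (with its own normalization options) is typically used; the scores in this card are reported as percentages, i.e. these ratios multiplied by 100.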
## Training Procedure

### Resource Capacity During Training

Training was conducted on Google Colab Pro, which provided the following resources:
- System RAM: 51 GB, ample for data processing and model operations during training.
- GPU RAM: 15 GB, sufficient for fine-tuning whisper-small at these batch sizes, particularly with mixed-precision training.
- Disk space: 201.2 GB, sufficient for the dataset, model checkpoints, and logs.
### Training Duration
The training of the model was completed in 160.32 minutes (approximately 2 hours 40 minutes).
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- training_steps: 2000
- mixed_precision_training: Native AMP
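The hyperparameters above map naturally onto a `transformers` `Seq2SeqTrainingArguments` configuration. The following is a hypothetical reconstruction, not the card's actual training script; the output path is a placeholder, and `fp16=True` (which requires a GPU) stands in for "Native AMP":

```python
def make_training_args():
    """Hypothetical reconstruction of this card's training configuration."""
    from transformers import Seq2SeqTrainingArguments  # assumes transformers is installed

    return Seq2SeqTrainingArguments(
        output_dir="./whisper-small-hassaniya",  # placeholder path
        learning_rate=1e-4,
        per_device_train_batch_size=4,   # train_batch_size
        per_device_eval_batch_size=8,    # eval_batch_size
        gradient_accumulation_steps=4,   # effective train batch: 4 * 4 = 16
        max_steps=2000,                  # training_steps
        lr_scheduler_type="linear",
        seed=42,
        fp16=True,                       # Native AMP mixed precision (GPU only)
    )
```

Note that `total_train_batch_size = 16` is not set directly: it is the product of the per-device batch size (4) and the gradient accumulation steps (4).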
### Training Results

| Step | Epoch | Training Loss | Validation Loss | WER | CER |
|---:|---:|---:|---:|---:|---:|
| 100 | 2.2472 | 0.9126 | 0.8441 | 65.8109 | 25.6766 |
| 200 | 4.4944 | 0.202 | 0.9230 | 57.5795 | 21.3726 |
| 300 | 6.7416 | 0.0854 | 1.0011 | 58.8020 | 21.1788 |
| 400 | 8.9888 | 0.0497 | 1.0513 | 57.2535 | 20.7988 |
| 500 | 11.2360 | 0.0358 | 1.0700 | 57.9055 | 21.7216 |
| 600 | 13.4831 | 0.026 | 1.0964 | 56.2755 | 20.8918 |
| 700 | 15.7303 | 0.0199 | 1.1063 | 55.9495 | 20.2482 |
| 800 | 17.9775 | 0.0114 | 1.1709 | 56.0717 | 20.8530 |
| 900 | 20.2247 | 0.0084 | 1.1633 | 56.6830 | 20.4653 |
| 1000 | 22.4719 | 0.005 | 1.1659 | 54.8900 | 20.4110 |
| 1100 | 24.7191 | 0.0026 | 1.1591 | 54.6455 | 20.2714 |
| 1200 | 26.9663 | 0.001 | 1.1771 | 54.3602 | 19.6278 |
| 1300 | 29.2135 | 0.0005 | 1.1900 | 53.9527 | 19.4959 |
| 1400 | 31.4607 | 0.0004 | 1.1971 | 53.6675 | 19.3641 |
| 1500 | 33.7079 | 0.0003 | 1.2049 | 52.5672 | 19.0694 |
| 1600 | 35.9551 | 0.0003 | 1.2069 | 52.6895 | 19.1082 |
| 1700 | 38.2022 | 0.0002 | 1.2107 | 52.6487 | 19.0151 |
| 1800 | 40.4494 | 0.0002 | 1.2125 | 52.4450 | 19.0151 |
| 1900 | 42.6966 | 0.0002 | 1.2145 | 52.4450 | 19.0539 |
| 2000 | 44.9438 | 0.0002 | 1.2149 | 52.3635 | 19.0151 |
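A rough sanity check on the log above: 100 optimizer steps correspond to 2.2472 epochs, so one epoch is about 44.5 steps; at the effective batch size of 16 that implies roughly 712 training utterances, consistent with ~5.5 hours of training audio in clips averaging under 30 seconds.

```python
# Back-of-envelope check derived from the training log: steps-per-epoch and an
# approximate training-set size at the effective batch size of 16.
steps_per_epoch = 100 / 2.2472          # 100 steps covered 2.2472 epochs
approx_train_examples = steps_per_epoch * 16
print(round(steps_per_epoch, 1), round(approx_train_examples))
```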
## Framework versions
- Transformers 4.44.1
- Pytorch 2.3.1+cu121
- Datasets 2.21.0
- Tokenizers 0.19.1