
whisper-small-hassaniya

This model is a fine-tuned version of openai/whisper-small on a curated Hassaniya speech dataset (described below). It achieves the following results on the evaluation set:

  • WER: 52.3635
  • CER: 19.0151

Model description

This model provides automatic speech recognition for the Hassaniya dialect of Arabic. Beyond the technological need for Hassaniya ASR, the project addresses a cultural imperative: preserving a linguistically distinctive form of Arabic in written, digital form.

Intended Uses & Limitations

This model is intended for professional transcription services and linguistic research. It can produce textual representations of Hassaniya speech, contributing to digital heritage preservation and linguistic studies. Performance may vary with audio quality and the speaker's accent.
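For example, transcription can be run with the Hugging Face transformers ASR pipeline. This is a minimal sketch, not a confirmed recipe from this card: the repository id and audio filename below are placeholders.

```python
# Sketch: load a fine-tuned Whisper checkpoint and transcribe a local audio file.
# Requires: pip install transformers torch
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="<this-model's-repository-id>",  # placeholder — use the id shown on the Hub
)

# Whisper expects 16 kHz audio; the pipeline resamples common formats automatically.
result = asr("sample.wav")  # placeholder filename
print(result["text"])
```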

Training and Evaluation Data

The model was trained on a curated dataset of Hassaniya audio recordings collected through AudioScribe, an application dedicated to high-quality data collection. The dataset is divided into three subsets with the following total audio lengths:

  • Training set: 5 hours 30 minutes
  • Testing set: 4 minutes
  • Evaluation set: 18 minutes

This diverse dataset includes various speech samples from native speakers across different age groups and genders to ensure robust model performance.
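As a quick sanity check, the split sizes above can be totaled in a few lines of Python (the minute figures are taken directly from the list above):

```python
# Split durations in minutes, from the list above
train_min = 5 * 60 + 30  # 5 h 30 min = 330 min
test_min = 4
eval_min = 18

total_min = train_min + test_min + eval_min  # 352 min (~5 h 52 min)
train_share = train_min / total_min          # fraction of audio used for training
print(total_min, round(train_share, 4))
```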

Model Performance

The model has been evaluated at several stages to assess its accuracy and efficiency both before and after training. Below are the key performance metrics:

  • Pre-training Evaluation on Eval Set

    • WER: 108.7612
    • CER: 63.9705
  • Post-training Evaluation on Eval Set

    • WER: 52.3635
    • CER: 19.0151
  • Post-training Evaluation on Test Set

    • WER: 53.4791
    • CER: 19.6108

These results show a substantial improvement after fine-tuning: on the evaluation set, the Word Error Rate drops from 108.76 to 52.36 and the Character Error Rate from 63.97 to 19.02, and the test-set scores are close to the evaluation-set scores, suggesting the gains generalize across held-out Hassaniya speech.
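For reference, WER and CER are both edit-distance ratios: the number of word-level (or character-level) insertions, deletions, and substitutions needed to turn the hypothesis into the reference, divided by the reference length. A minimal pure-Python sketch of the metric itself (not the exact scorer used for this model):

```python
def edit_distance(ref, hyp):
    # Classic dynamic-programming Levenshtein distance over two sequences.
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev = d[0]
        d[0] = i
        for j, h in enumerate(hyp, 1):
            cur = d[j]
            d[j] = min(d[j] + 1,          # deletion
                       d[j - 1] + 1,      # insertion
                       prev + (r != h))   # substitution (or match)
            prev = cur
    return d[-1]

def wer(reference, hypothesis):
    # Word Error Rate: edits over word sequences / reference word count.
    ref = reference.split()
    return edit_distance(ref, hypothesis.split()) / len(ref)

def cer(reference, hypothesis):
    # Character Error Rate: edits over characters (spaces removed) / reference length.
    ref = reference.replace(" ", "")
    return edit_distance(ref, hypothesis.replace(" ", "")) / len(ref)
```

For example, `wer("the cat sat", "the cat sit on")` is 2/3: one substitution ("sat" → "sit") plus one insertion ("on"), over a three-word reference.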

Training Procedure

Resource Capacity During Training

The training session was conducted on Google Colab Pro, which provided the following resource capacities:

  • System RAM: The environment was equipped with 51 GB of system memory, offering ample capacity for handling data processing and model operations during training.
  • GPU RAM: The GPU provided 15 GB of memory, sufficient for fine-tuning at the batch sizes used here, especially with mixed-precision training.
  • Disk Space: A total of 201.2 GB of disk space was available, ensuring sufficient storage for datasets, model checkpoints, and logs throughout the training process.

Training Duration

The training of the model was completed in 160.32 minutes (approximately 2 hours 40 minutes).

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • training_steps: 2000
  • mixed_precision_training: Native AMP
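A note on how these values relate: the effective batch size is train_batch_size × gradient_accumulation_steps, and combining the 2000 training steps with the ~44.94 epochs reached in the results table gives a rough estimate of the training-set size. This is an inference from the numbers above, not a logged figure:

```python
# Effective batch size, from the hyperparameters above
train_batch_size = 4
gradient_accumulation_steps = 4
effective_batch = train_batch_size * gradient_accumulation_steps  # matches total_train_batch_size (16)

# The results table reaches epoch ~44.9438 at step 2000, so
# steps per epoch ≈ 2000 / 44.9438 ≈ 44.5, giving roughly
# 44.5 * 16 ≈ 712 training utterances (an estimate only).
steps_per_epoch = 2000 / 44.9438
approx_train_samples = steps_per_epoch * effective_batch
print(effective_batch, round(approx_train_samples))
```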

Training Results

| Step | Epoch   | Training Loss | Validation Loss | WER     | CER     |
|-----:|--------:|--------------:|----------------:|--------:|--------:|
|  100 |  2.2472 |        0.9126 |          0.8441 | 65.8109 | 25.6766 |
|  200 |  4.4944 |        0.2020 |          0.9230 | 57.5795 | 21.3726 |
|  300 |  6.7416 |        0.0854 |          1.0011 | 58.8020 | 21.1788 |
|  400 |  8.9888 |        0.0497 |          1.0513 | 57.2535 | 20.7988 |
|  500 | 11.2360 |        0.0358 |          1.0700 | 57.9055 | 21.7216 |
|  600 | 13.4831 |        0.0260 |          1.0964 | 56.2755 | 20.8918 |
|  700 | 15.7303 |        0.0199 |          1.1063 | 55.9495 | 20.2482 |
|  800 | 17.9775 |        0.0114 |          1.1709 | 56.0717 | 20.8530 |
|  900 | 20.2247 |        0.0084 |          1.1633 | 56.6830 | 20.4653 |
| 1000 | 22.4719 |        0.0050 |          1.1659 | 54.8900 | 20.4110 |
| 1100 | 24.7191 |        0.0026 |          1.1591 | 54.6455 | 20.2714 |
| 1200 | 26.9663 |        0.0010 |          1.1771 | 54.3602 | 19.6278 |
| 1300 | 29.2135 |        0.0005 |          1.1900 | 53.9527 | 19.4959 |
| 1400 | 31.4607 |        0.0004 |          1.1971 | 53.6675 | 19.3641 |
| 1500 | 33.7079 |        0.0003 |          1.2049 | 52.5672 | 19.0694 |
| 1600 | 35.9551 |        0.0003 |          1.2069 | 52.6895 | 19.1082 |
| 1700 | 38.2022 |        0.0002 |          1.2107 | 52.6487 | 19.0151 |
| 1800 | 40.4494 |        0.0002 |          1.2125 | 52.4450 | 19.0151 |
| 1900 | 42.6966 |        0.0002 |          1.2145 | 52.4450 | 19.0539 |
| 2000 | 44.9438 |        0.0002 |          1.2149 | 52.3635 | 19.0151 |

Framework versions

  • Transformers 4.44.1
  • Pytorch 2.3.1+cu121
  • Datasets 2.21.0
  • Tokenizers 0.19.1