JohnnyBoy00 committed on
Commit 9756ca1
1 Parent(s): 6932723

Update README.md

Files changed (1): README.md +5 -5
README.md CHANGED
@@ -78,14 +78,14 @@ The following hyperparameters were used during training:
 
 ## Evaluation results
 
-The generated feedback was evaluated through means of the [SacreBLEU](https://huggingface.co/spaces/evaluate-metric/sacrebleu), [ROUGE](https://huggingface.co/spaces/evaluate-metric/rouge), [METEOR](https://huggingface.co/spaces/evaluate-metric/meteor), [BERTScore](https://huggingface.co/spaces/evaluate-metric/bertscore) metrics from HuggingFace, while the [accuracy](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html) and [F1](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html) scores from scikit-learn where used for evaluation of the labels.
+The generated feedback was evaluated through means of the [SacreBLEU](https://huggingface.co/spaces/evaluate-metric/sacrebleu), [ROUGE-2](https://huggingface.co/spaces/evaluate-metric/rouge), [METEOR](https://huggingface.co/spaces/evaluate-metric/meteor), [BERTScore](https://huggingface.co/spaces/evaluate-metric/bertscore) metrics from HuggingFace, while the [accuracy](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html) and [F1](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html) scores from scikit-learn where used for evaluation of the labels.
 
 The following results were achieved.
 
-| Split                 | SacreBLEU | ROUGE | METEOR | BERTscore | Accuracy | Weighted F1 | Macro F1 |
-| --------------------- | :-------: | :---: | :----: | :-------: | :------: | :---------: | :------: |
-| test_unseen_answers   | 43.6      | 45.3  | 57.4   | 55.0      | 81.0     | 79.4        | 71.3     |
-| test_unseen_questions | 3.0       | 4.2   | 19.9   | 16.1      | 60.0     | 54.4        | 53.2     |
+| Split                 | SacreBLEU | ROUGE-2 | METEOR | BERTscore | Accuracy | Weighted F1 | Macro F1 |
+| --------------------- | :-------: | :-----: | :----: | :-------: | :------: | :---------: | :------: |
+| test_unseen_answers   | 43.6      | 45.3    | 57.4   | 55.0      | 81.0     | 79.4        | 71.3     |
+| test_unseen_questions | 3.0       | 4.2     | 19.9   | 16.1      | 60.0     | 54.4        | 53.2     |
 
 
 The script used to compute these metrics and perform evaluation can be found in the `evaluation.py` file in this repository.
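For readers without the repository's `evaluation.py` at hand, the sketch below shows one way the listed metrics could be computed with the HuggingFace `evaluate` library and scikit-learn. It is only an illustration under assumed inputs: the example predictions, references, and labels are placeholders, and the actual script in this repository may load data and aggregate scores differently.

```python
# Minimal sketch of computing the reported metrics with HuggingFace `evaluate`
# and scikit-learn. The example data and variable names are placeholders;
# refer to `evaluation.py` in this repository for the actual evaluation script.
import evaluate
from sklearn.metrics import accuracy_score, f1_score

# Placeholder model outputs vs. references (not real data from the model).
predicted_feedback = ["The answer is partially correct but misses the key point."]
reference_feedback = ["The response is partially correct; it omits the main argument."]
predicted_labels = ["Partially correct"]
reference_labels = ["Partially correct"]

# Text-generation metrics from the HuggingFace `evaluate` library.
sacrebleu = evaluate.load("sacrebleu")
rouge = evaluate.load("rouge")
meteor = evaluate.load("meteor")
bertscore = evaluate.load("bertscore")

results = {
    "SacreBLEU": sacrebleu.compute(
        predictions=predicted_feedback,
        references=[[r] for r in reference_feedback],
    )["score"],
    "ROUGE-2": rouge.compute(
        predictions=predicted_feedback, references=reference_feedback
    )["rouge2"],
    "METEOR": meteor.compute(
        predictions=predicted_feedback, references=reference_feedback
    )["meteor"],
    "BERTScore (F1)": bertscore.compute(
        predictions=predicted_feedback, references=reference_feedback, lang="en"
    )["f1"][0],
    # Label metrics from scikit-learn.
    "Accuracy": accuracy_score(reference_labels, predicted_labels),
    "Weighted F1": f1_score(reference_labels, predicted_labels, average="weighted"),
    "Macro F1": f1_score(reference_labels, predicted_labels, average="macro"),
}

for name, value in results.items():
    print(f"{name}: {value:.3f}")
```

The two F1 columns in the results table correspond to scikit-learn's `average="weighted"` and `average="macro"` settings of `f1_score`.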