leaderboard-pr-bot committed on
Commit
c4695db
1 Parent(s): 4e5eb39

Adding Evaluation Results


This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions

Files changed (1)
  1. README.md +119 -3
README.md CHANGED
@@ -5,16 +5,119 @@ language:
 - 'no'
 - da
 license: mit
-models:
-- timpal0l/Mistral-7B-v0.1-flashback-v2-instruct
 tags:
 - pretrained
 - flashback
 - web
 - conversational
+models:
+- timpal0l/Mistral-7B-v0.1-flashback-v2-instruct
 pipeline_tag: text-generation
 widget:
 - text: Jag tycker att det är roligt med
+model-index:
+- name: Mistral-7B-v0.1-flashback-v2
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: AI2 Reasoning Challenge (25-Shot)
+      type: ai2_arc
+      config: ARC-Challenge
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: acc_norm
+      value: 57.17
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=timpal0l/Mistral-7B-v0.1-flashback-v2
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: HellaSwag (10-Shot)
+      type: hellaswag
+      split: validation
+      args:
+        num_few_shot: 10
+    metrics:
+    - type: acc_norm
+      value: 80.74
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=timpal0l/Mistral-7B-v0.1-flashback-v2
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU (5-Shot)
+      type: cais/mmlu
+      config: all
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 59.98
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=timpal0l/Mistral-7B-v0.1-flashback-v2
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: TruthfulQA (0-shot)
+      type: truthful_qa
+      config: multiple_choice
+      split: validation
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: mc2
+      value: 40.66
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=timpal0l/Mistral-7B-v0.1-flashback-v2
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Winogrande (5-shot)
+      type: winogrande
+      config: winogrande_xl
+      split: validation
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 77.19
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=timpal0l/Mistral-7B-v0.1-flashback-v2
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GSM8k (5-shot)
+      type: gsm8k
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 29.42
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=timpal0l/Mistral-7B-v0.1-flashback-v2
+      name: Open LLM Leaderboard
 ---
 
 # 🐈‍⬛ Mistral-7B-v0.1-flashback-v2
@@ -180,4 +283,17 @@ Tack och lov för apparna som jag kunde leda oss efter. Att åka kollektivt hade
 
 Tack ska ni ha för tipsen, igen. Tack till Stockholm för att ni tog emot oss med respekt han var så nöjd med resan.
 Hej så länge, vi kommer åter i framtiden! 😁
-```
+```
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_timpal0l__Mistral-7B-v0.1-flashback-v2)
+
+| Metric |Value|
+|---------------------------------|----:|
+|Avg. |57.53|
+|AI2 Reasoning Challenge (25-Shot)|57.17|
+|HellaSwag (10-Shot) |80.74|
+|MMLU (5-Shot) |59.98|
+|TruthfulQA (0-shot) |40.66|
+|Winogrande (5-shot) |77.19|
+|GSM8k (5-shot) |29.42|
+
+