Yuma42 leaderboard-pr-bot committed on
Commit
a6ea7a8
1 Parent(s): 54f56d4

Adding Evaluation Results (#2)

- Adding Evaluation Results (7c5beb74ce47eada2e620857fc335f63059c2421)


Co-authored-by: Open LLM Leaderboard PR Bot <[email protected]>

Files changed (1)
  1. README.md +106 -0
README.md CHANGED
@@ -114,6 +114,98 @@ model-index:
     source:
       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Yuma42/KangalKhan-RawRuby-7B
       name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: IFEval (0-Shot)
+      type: HuggingFaceH4/ifeval
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: inst_level_strict_acc and prompt_level_strict_acc
+      value: 54.77
+      name: strict accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Yuma42/KangalKhan-RawRuby-7B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: BBH (3-Shot)
+      type: BBH
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc_norm
+      value: 26.39
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Yuma42/KangalKhan-RawRuby-7B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MATH Lvl 5 (4-Shot)
+      type: hendrycks/competition_math
+      args:
+        num_few_shot: 4
+    metrics:
+    - type: exact_match
+      value: 5.97
+      name: exact match
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Yuma42/KangalKhan-RawRuby-7B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GPQA (0-shot)
+      type: Idavidrein/gpqa
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 5.03
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Yuma42/KangalKhan-RawRuby-7B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MuSR (0-shot)
+      type: TAUR-Lab/MuSR
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 7.64
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Yuma42/KangalKhan-RawRuby-7B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU-PRO (5-shot)
+      type: TIGER-Lab/MMLU-Pro
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 22.48
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Yuma42/KangalKhan-RawRuby-7B
+      name: Open LLM Leaderboard
 ---
 
 # KangalKhan-RawRuby-7B
@@ -202,3 +294,17 @@ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-le
 |Winogrande (5-shot) |78.69|
 |GSM8k (5-shot) |62.02|
 
+
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Yuma42__KangalKhan-RawRuby-7B)
+
+|      Metric       |Value|
+|-------------------|----:|
+|Avg.               |20.38|
+|IFEval (0-Shot)    |54.77|
+|BBH (3-Shot)       |26.39|
+|MATH Lvl 5 (4-Shot)| 5.97|
+|GPQA (0-shot)      | 5.03|
+|MuSR (0-shot)      | 7.64|
+|MMLU-PRO (5-shot)  |22.48|
+
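
The `Avg.` row in the added table appears to be the unweighted mean of the six benchmark scores listed beneath it. A minimal Python sketch of that check follows. The names `scores` and `unweighted_average` are illustrative and do not come from the commit, and rounding to two decimals is an assumption based on how the table is printed.

```python
# Scores copied from the evaluation table added in this commit.
scores = {
    "IFEval (0-Shot)": 54.77,
    "BBH (3-Shot)": 26.39,
    "MATH Lvl 5 (4-Shot)": 5.97,
    "GPQA (0-shot)": 5.03,
    "MuSR (0-shot)": 7.64,
    "MMLU-PRO (5-shot)": 22.48,
}


def unweighted_average(values):
    """Return the arithmetic mean, rounded to two decimals (assumed rounding)."""
    values = list(values)
    return round(sum(values) / len(values), 2)


average = unweighted_average(scores.values())
print(f"Avg. {average:.2f}")  # prints "Avg. 20.38", matching the table
```

The six scores sum to 122.28, and 122.28 / 6 = 20.38, which agrees with the reported average.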