pszemraj leaderboard-pr-bot committed on
Commit
7a21a72
1 Parent(s): 650dbcf

Adding Evaluation Results (#1)

- Adding Evaluation Results (b4eb72d30124e838e8808edb79c2b0ff85162a60)


Co-authored-by: Open LLM Leaderboard PR Bot <[email protected]>

Files changed (1)
  1. README.md +148 -37
README.md CHANGED
@@ -1,12 +1,12 @@
  ---
  license: apache-2.0
- base_model: BEE-spoke-data/NanoLlama-GQA-L10-A32_KV8-v12-minipile
  tags:
  - generated_from_trainer
  - smol_llama
  - llama2
  metrics:
  - accuracy
  inference:
    parameters:
      max_new_tokens: 64
@@ -17,43 +17,140 @@ inference:
      eta_cutoff: 0.001
      renormalize_logits: true
  widget:
- - text: My name is El Microondas the Wise and
-   example_title: El Microondas
- - text: Kennesaw State University is a public
-   example_title: Kennesaw State University
- - text: >-
-     Bungie Studios is an American video game developer. They are most famous
-     for developing the award winning Halo series of video games. They also
-     made Destiny. The studio was founded
-   example_title: Bungie
- - text: The Mona Lisa is a world-renowned painting created by
-   example_title: Mona Lisa
- - text: >-
-     The Harry Potter series, written by J.K. Rowling, begins with the book
-     titled
-   example_title: Harry Potter Series
- - text: >-
-     Question: I have cities, but no houses. I have mountains, but no trees. I
-     have water, but no fish. What am I?
-
-     Answer:
-   example_title: Riddle
- - text: The process of photosynthesis involves the conversion of
-   example_title: Photosynthesis
- - text: >-
-     Jane went to the store to buy some groceries. She picked up apples,
-     oranges, and a loaf of bread. When she got home, she realized she forgot
-   example_title: Story Continuation
- - text: >-
-     Problem 2: If a train leaves Station A at 9:00 AM and travels at 60 mph,
-     and another train leaves Station B at 10:00 AM and travels at 80 mph, when
-     will they meet if the distance between the stations is 300 miles?
-
-     To determine
-   example_title: Math Problem
- - text: In the context of computer programming, an algorithm is
-   example_title: Algorithm Definition
  pipeline_tag: text-generation
  ---

  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -129,3 +226,17 @@ The following hyperparameters were used during training:
  - Pytorch 2.1.0
  - Datasets 2.15.0
  - Tokenizers 0.15.0
 
  ---
  license: apache-2.0
  tags:
  - generated_from_trainer
  - smol_llama
  - llama2
  metrics:
  - accuracy
+ base_model: BEE-spoke-data/NanoLlama-GQA-L10-A32_KV8-v12-minipile
  inference:
    parameters:
      max_new_tokens: 64
 
      eta_cutoff: 0.001
      renormalize_logits: true
  widget:
+ - text: My name is El Microondas the Wise and
+   example_title: El Microondas
+ - text: Kennesaw State University is a public
+   example_title: Kennesaw State University
+ - text: Bungie Studios is an American video game developer. They are most famous for
+     developing the award winning Halo series of video games. They also made Destiny.
+     The studio was founded
+   example_title: Bungie
+ - text: The Mona Lisa is a world-renowned painting created by
+   example_title: Mona Lisa
+ - text: The Harry Potter series, written by J.K. Rowling, begins with the book titled
+   example_title: Harry Potter Series
+ - text: 'Question: I have cities, but no houses. I have mountains, but no trees. I
+     have water, but no fish. What am I?
+
+     Answer:'
+   example_title: Riddle
+ - text: The process of photosynthesis involves the conversion of
+   example_title: Photosynthesis
+ - text: Jane went to the store to buy some groceries. She picked up apples, oranges,
+     and a loaf of bread. When she got home, she realized she forgot
+   example_title: Story Continuation
+ - text: 'Problem 2: If a train leaves Station A at 9:00 AM and travels at 60 mph,
+     and another train leaves Station B at 10:00 AM and travels at 80 mph, when will
+     they meet if the distance between the stations is 300 miles?
+
+     To determine'
+   example_title: Math Problem
+ - text: In the context of computer programming, an algorithm is
+   example_title: Algorithm Definition
  pipeline_tag: text-generation
+ model-index:
+ - name: NanoLlama-GQA-L10-A32_KV8-v13-KI
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: AI2 Reasoning Challenge (25-Shot)
+       type: ai2_arc
+       config: ARC-Challenge
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: acc_norm
+       value: 23.81
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=BEE-spoke-data/NanoLlama-GQA-L10-A32_KV8-v13-KI
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: HellaSwag (10-Shot)
+       type: hellaswag
+       split: validation
+       args:
+         num_few_shot: 10
+     metrics:
+     - type: acc_norm
+       value: 29.39
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=BEE-spoke-data/NanoLlama-GQA-L10-A32_KV8-v13-KI
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MMLU (5-Shot)
+       type: cais/mmlu
+       config: all
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 25.37
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=BEE-spoke-data/NanoLlama-GQA-L10-A32_KV8-v13-KI
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: TruthfulQA (0-shot)
+       type: truthful_qa
+       config: multiple_choice
+       split: validation
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: mc2
+       value: 44.77
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=BEE-spoke-data/NanoLlama-GQA-L10-A32_KV8-v13-KI
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Winogrande (5-shot)
+       type: winogrande
+       config: winogrande_xl
+       split: validation
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 51.14
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=BEE-spoke-data/NanoLlama-GQA-L10-A32_KV8-v13-KI
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: GSM8k (5-shot)
+       type: gsm8k
+       config: main
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 0.91
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=BEE-spoke-data/NanoLlama-GQA-L10-A32_KV8-v13-KI
+       name: Open LLM Leaderboard
  ---

  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 
  - Pytorch 2.1.0
  - Datasets 2.15.0
  - Tokenizers 0.15.0
+
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_BEE-spoke-data__NanoLlama-GQA-L10-A32_KV8-v13-KI)
+
+ |             Metric              |Value|
+ |---------------------------------|----:|
+ |Avg.                             |29.23|
+ |AI2 Reasoning Challenge (25-Shot)|23.81|
+ |HellaSwag (10-Shot)              |29.39|
+ |MMLU (5-Shot)                    |25.37|
+ |TruthfulQA (0-shot)              |44.77|
+ |Winogrande (5-shot)              |51.14|
+ |GSM8k (5-shot)                   | 0.91|
+
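
The `Avg.` row in the added table is consistent with an unweighted mean of the six benchmark scores. The diff does not state how the average is computed, so the sketch below is a sanity check under that assumption, not the leaderboard's own code; the names `scores` and `leaderboard_average` are illustrative.

```python
# Scores copied from the evaluation table added in this commit.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 23.81,
    "HellaSwag (10-Shot)": 29.39,
    "MMLU (5-Shot)": 25.37,
    "TruthfulQA (0-shot)": 44.77,
    "Winogrande (5-shot)": 51.14,
    "GSM8k (5-shot)": 0.91,
}


def leaderboard_average(values):
    """Unweighted mean rounded to two decimals (assumed convention)."""
    values = list(values)
    return round(sum(values) / len(values), 2)


average = leaderboard_average(scores.values())
print(f"Avg. {average:.2f}")  # prints "Avg. 29.23"
```

The result matches the reported 29.23, so the table is internally consistent under the unweighted-mean assumption.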