feeltheAGI committed 061dedb
1 parent: d1cfa8a

Update README.md

Files changed (1)
  1. README.md +24 -0
README.md CHANGED
@@ -65,3 +65,27 @@ This model is a merge of the following models:
 |agieval_sat_math | 0|acc |0.3227|± |0.0316|
 | | |acc_norm|0.3045|± |0.0311|
 
+
+### Bigbench
+
+| Task |Version| Metric |Value | |Stderr|
+|------------------------------------------------|------:|---------------------|-----:|---|-----:|
+|bigbench_causal_judgement | 0|multiple_choice_grade|0.5684|± |0.0360|
+|bigbench_date_understanding | 0|multiple_choice_grade|0.6612|± |0.0247|
+|bigbench_disambiguation_qa | 0|multiple_choice_grade|0.4380|± |0.0309|
+|bigbench_geometric_shapes | 0|multiple_choice_grade|0.2173|± |0.0218|
+| | |exact_str_match |0.0000|± |0.0000|
+|bigbench_logical_deduction_five_objects | 0|multiple_choice_grade|0.3320|± |0.0211|
+|bigbench_logical_deduction_seven_objects | 0|multiple_choice_grade|0.2243|± |0.0158|
+|bigbench_logical_deduction_three_objects | 0|multiple_choice_grade|0.5667|± |0.0287|
+|bigbench_movie_recommendation | 0|multiple_choice_grade|0.4260|± |0.0221|
+|bigbench_navigate | 0|multiple_choice_grade|0.5310|± |0.0158|
+|bigbench_reasoning_about_colored_objects | 0|multiple_choice_grade|0.7230|± |0.0100|
+|bigbench_ruin_names | 0|multiple_choice_grade|0.5379|± |0.0236|
+|bigbench_salient_translation_error_detection | 0|multiple_choice_grade|0.2956|± |0.0145|
+|bigbench_snarks | 0|multiple_choice_grade|0.6961|± |0.0343|
+|bigbench_sports_understanding | 0|multiple_choice_grade|0.7424|± |0.0139|
+|bigbench_temporal_sequences | 0|multiple_choice_grade|0.4690|± |0.0158|
+|bigbench_tracking_shuffled_objects_five_objects | 0|multiple_choice_grade|0.2304|± |0.0119|
+|bigbench_tracking_shuffled_objects_seven_objects| 0|multiple_choice_grade|0.1880|± |0.0093|
+|bigbench_tracking_shuffled_objects_three_objects| 0|multiple_choice_grade|0.5667|± |0.0287|