---
license: cc-by-nc-4.0
tags:
- merge
model-index:
- name: Loyal-Toppy-Bruins-Maid-7B-DARE
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 68.86
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=SanjiWatsuki/Loyal-Toppy-Bruins-Maid-7B-DARE
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 86.03
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=SanjiWatsuki/Loyal-Toppy-Bruins-Maid-7B-DARE
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 64.84
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=SanjiWatsuki/Loyal-Toppy-Bruins-Maid-7B-DARE
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 61.19
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=SanjiWatsuki/Loyal-Toppy-Bruins-Maid-7B-DARE
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 79.72
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=SanjiWatsuki/Loyal-Toppy-Bruins-Maid-7B-DARE
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 65.81
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=SanjiWatsuki/Loyal-Toppy-Bruins-Maid-7B-DARE
      name: Open LLM Leaderboard
---

	![image/png](https://huggingface.co/SanjiWatsuki/Loyal-Toppy-Bruins-Maid-7B-DARE/resolve/main/bruins-maid.png)

	<!-- description start -->
	## Description

This repository hosts FP16 files for Loyal-Toppy-Bruins-Maid-7B, a 7B model aimed at engaging RP with solid character card adherence while still being a smart cookie.

	Its foundation is [Starling-LM-7B-alpha](https://huggingface.co/berkeley-nest/Starling-LM-7B-alpha), notable for its performance in the LMSYS Chatbot Arena, even surpassing GPT-3.5-Turbo-1106. The model incorporates [rwitz/go-bruins-v2](https://huggingface.co/rwitz/go-bruins-v2), a [Q-bert/MetaMath-Cybertron-Starling](https://huggingface.co/Q-bert/MetaMath-Cybertron-Starling) derivative with Alpaca RP data tuning.

The other foundational model is [chargoddard/loyal-piano-m7](https://huggingface.co/chargoddard/loyal-piano-m7), chosen for its strong RP performance and Alpaca format training, with a diverse dataset including PIPPA, rpguild, and LimaRP.

	[Undi95/Toppy-M-7B](https://huggingface.co/Undi95/Toppy-M-7B), known for its creativity, brings in useful RP data from various sources. It ranks first among 7B models on [OpenRouter](https://openrouter.ai/rankings) for a good reason.

[NeverSleep/Noromaid-7b-v0.1.1](https://huggingface.co/NeverSleep/Noromaid-7b-v0.1.1), a well-regarded Mistral finetune, was also added because it brings in RP data not present in the other models.

The models were merged using the DARE-TIES method (`dare_ties` in MergeKit), targeting a total absolute weight of 1.2 (0.5 + 0.5 + 0.1 + 0.1) and a high density (0.5-0.6), as discussed in the [MergeKit GitHub Repo](https://github.com/cg123/mergekit/issues/26).
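
To make the "density" and "weight" knobs more concrete, here is a minimal toy sketch of the drop-and-rescale step that DARE is built around, applied to plain Python lists. This is not MergeKit's implementation: the function names `dare_sparsify` and `merge_one_parameter` are hypothetical, the TIES sign-election step is left out entirely, and the unnormalized weighted sum is only my reading of what `normalize: false` implies.

```python
import random

def dare_sparsify(delta, density, rng):
    """Keep each delta entry with probability `density`; rescale survivors by 1/density."""
    if not 0.0 < density <= 1.0:
        raise ValueError("density must be in (0, 1]")
    return [d / density if rng.random() < density else 0.0 for d in delta]

def merge_one_parameter(base, finetuned, weights, densities, seed=0):
    """Toy merge of one flattened parameter: base + sum_i weight_i * sparsified delta_i.

    Assumption: no normalization by the sum of weights and no TIES sign election.
    """
    rng = random.Random(seed)
    merged = list(base)
    for model, weight, density in zip(finetuned, weights, densities):
        # Delta = what the finetune changed relative to the shared base model.
        delta = [m - b for m, b in zip(model, base)]
        sparse = dare_sparsify(delta, density, rng)
        merged = [x + weight * s for x, s in zip(merged, sparse)]
    return merged
```

With a density of 0.6, roughly 40% of each model's delta entries are zeroed and the rest are scaled by 1/0.6, so the expected value of each delta is preserved before the per-model weight is applied.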

Currently, this model ranks at the top of my personal RP unit test benchmark and scored a very solid 20 on [lilblam's LLM Logic Test](https://docs.google.com/spreadsheets/d/1NgHDxbVWJFolq8bLvLkuPWKC7i_R6I6W/edit#gid=1278290632). My first impressions of it for RPing are very good but, admittedly, this model came out of the oven today so I haven't played with it too much 😊

	### The sauce
```
models: # Top-Loyal-Bruins-Maid-DARE-7B_v2
  - model: mistralai/Mistral-7B-v0.1
    # no parameters necessary for base model
  - model: rwitz/go-bruins-v2 # MetamathCybertronStarling base
    parameters:
      weight: 0.5
      density: 0.6
  - model: chargoddard/loyal-piano-m7 # Pull in some PIPPA/LimaRP/Orca/rpguild
    parameters:
      weight: 0.5
      density: 0.6
  - model: Undi95/Toppy-M-7B
    parameters:
      weight: 0.1
      density: 0.5
  - model: NeverSleep/Noromaid-7b-v0.1.1
    parameters:
      weight: 0.1
      density: 0.5
merge_method: dare_ties
base_model: mistralai/Mistral-7B-v0.1
parameters:
  normalize: false
  int8_mask: true
dtype: bfloat16
```
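
As a quick sanity check on the config above, the per-model weights can be summed to see the total the merge ends up with. The dictionary below simply copies the `weight` and `density` values from the config; the variable names are illustrative only.

```python
# Weights and densities copied by hand from the merge config above.
merge_inputs = {
    "rwitz/go-bruins-v2": {"weight": 0.5, "density": 0.6},
    "chargoddard/loyal-piano-m7": {"weight": 0.5, "density": 0.6},
    "Undi95/Toppy-M-7B": {"weight": 0.1, "density": 0.5},
    "NeverSleep/Noromaid-7b-v0.1.1": {"weight": 0.1, "density": 0.5},
}

# Rounded to hide floating-point noise from summing 0.1-style values.
total_weight = round(sum(p["weight"] for p in merge_inputs.values()), 2)
densities = sorted({p["density"] for p in merge_inputs.values()})

print(total_weight)  # 1.2
print(densities)     # [0.5, 0.6]
```

Because `normalize` is `false` in the config, this total is presumably left at 1.2 rather than being rescaled to 1.0.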

	<!-- description end -->
	<!-- prompt-template start -->
	## Prompt template: Custom format, or Alpaca

	### Custom format:
I got the best SillyTavern results using the Noromaid template.

	SillyTavern config files: [Context](https://files.catbox.moe/ifmhai.json), [Instruct](https://files.catbox.moe/ttw1l9.json).

Otherwise, I tried to ensure that all of the underlying merged models favor the Alpaca format.

	### Alpaca:
	```
	Below is an instruction that describes a task. Write a response that appropriately completes the request.

	### Instruction:
	{prompt}

	### Response:

	```
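
If you are building prompts by hand rather than through SillyTavern, the Alpaca template above can be filled in with a small helper. This is only a sketch: `build_alpaca_prompt` is a hypothetical name, and the exact blank-line and trailing-newline handling is an assumption based on the template as printed.

```python
# Alpaca template as shown above; the {prompt} placeholder is filled by str.format.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n"
    "{prompt}\n\n"
    "### Response:\n"
)

def build_alpaca_prompt(prompt: str) -> str:
    """Return the full Alpaca-style prompt for a single instruction."""
    return ALPACA_TEMPLATE.format(prompt=prompt.strip())

example = build_alpaca_prompt("Describe the tavern the party just walked into.")
```

The model's reply is then whatever it generates after the final `### Response:` line.
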
	# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_SanjiWatsuki__Loyal-Toppy-Bruins-Maid-7B-DARE).

|             Metric              |Value|
|---------------------------------|----:|
|Avg.                             |71.07|
|AI2 Reasoning Challenge (25-Shot)|68.86|
|HellaSwag (10-Shot)              |86.03|
|MMLU (5-Shot)                    |64.84|
|TruthfulQA (0-shot)              |61.19|
|Winogrande (5-shot)              |79.72|
|GSM8k (5-shot)                   |65.81|
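
The "Avg." row appears to be the plain mean of the six benchmark scores. A short check, with the values copied from the table (the variable names are illustrative):

```python
# Benchmark scores copied from the table above.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 68.86,
    "HellaSwag (10-Shot)": 86.03,
    "MMLU (5-Shot)": 64.84,
    "TruthfulQA (0-shot)": 61.19,
    "Winogrande (5-shot)": 79.72,
    "GSM8k (5-shot)": 65.81,
}

# Unweighted mean across the six benchmarks.
average = sum(scores.values()) / len(scores)
print(f"{average:.3f}")
```

The mean comes out at roughly 71.075, which is consistent with the 71.07 reported by the leaderboard.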