divinetaco committed on
Commit e19f984
1 Parent(s): a807c59

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +56 -57
README.md CHANGED
@@ -1,63 +1,62 @@
  ---
- base_model: []
  library_name: transformers
  tags:
  - mergekit
  - merge
-
  ---
- # ung-merge
-
- This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).
-
- ## Merge Details
- ### Merge Method
-
- This model was merged using the [linear](https://arxiv.org/abs/2203.05482) merge method.
-
- ### Models Merged
-
- The following models were included in the merge:
- * Miqu-PlayMaid-70B-v0.1
- * Senku-70B-Full
-
- ### Configuration
-
- The following YAML configuration was used to produce this model:
-
- ```yaml
- merge_method: linear
- parameters:
-   weight: 1.0
- slices:
- - sources:
-   - model: Miqu-PlayMaid-70B-v0.1
-     layer_range: [0, 17]
- - sources:
-   - model: Senku-70B-Full
-     layer_range: [10, 24]
- - sources:
-   - model: Miqu-PlayMaid-70B-v0.1
-     layer_range: [17, 32]
- - sources:
-   - model: Senku-70B-Full
-     layer_range: [24, 40]
- - sources:
-   - model: Miqu-PlayMaid-70B-v0.1
-     layer_range: [32, 48]
- - sources:
-   - model: Senku-70B-Full
-     layer_range: [40, 56]
- - sources:
-   - model: Miqu-PlayMaid-70B-v0.1
-     layer_range: [49, 63]
- - sources:
-   - model: Senku-70B-Full
-     layer_range: [56, 70]
- - sources:
-   - model: Miqu-PlayMaid-70B-v0.1
-     layer_range: [64, 80]
- dtype: float16
- tokenizer_source: model:Miqu-PlayMaid-70B-v0.1
-
- ```

  ---
+ license: cc-by-nc-4.0
+ base_model:
+ - Netrve/Miqu-PlayMaid-70B-v0.1
+ - ShinojiResearch/Senku-70B
  library_name: transformers
  tags:
+ - not-for-all-audiences
+ - nsfw
  - mergekit
  - merge
  ---
+ # aranea-tenebris-120b-v1.0-exl2
+ **aka Netrve/Miqu-PlayMaid-70B-v0.1 + ShinojiResearch/Senku-70B**
+ Model merge for uncensored creative writing and RP.
+
+ ![image/png](https://huggingface.co/divinetaco/aranea-tenebris-120b-v1.0-exl2/resolve/main/aranea-tenebris.png)
+
+ A [mergekit](https://github.com/arcee-ai/mergekit) frankenmerge based on [Netrve/Miqu-PlayMaid-70B-v0.1](https://huggingface.co/Netrve/Miqu-PlayMaid-70B-v0.1) with interleaved layers of [ShinojiResearch/Senku-70B](https://huggingface.co/ShinojiResearch/Senku-70B).
+ This was the top-performing model from a second series of merge experiments aimed at producing a highly coherent creative writing and RP model.
+ Tests consisted of a series of private DnD scenario benchmarks, with manual comparison of the most promising merges.
+
+ A number of different base models, interleave models and layer offsets were compared.
+ This model outperformed a number of other popular 70B+ models and merges in both creativity and coherence tests. It was (briefly) compared to Mixtral 8x22B running 2/3/4 experts.
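+
+ As a rough illustration, an interleave of this kind is expressed in mergekit as alternating slices; the layer ranges below are illustrative only, and the auto-generated configuration removed above shows a complete layout of this form:
+
+ ```yaml
+ # Two-model interleave sketch - illustrative layer ranges only.
+ # Each slice copies a contiguous block of layers from one donor model;
+ # alternating donors with overlapping ranges gives the stride/width
+ # pattern discussed under "Interesting observations" below.
+ merge_method: linear
+ parameters:
+   weight: 1.0              # one donor per slice, so this acts as a passthrough
+ slices:
+ - sources:
+   - model: Netrve/Miqu-PlayMaid-70B-v0.1
+     layer_range: [0, 16]   # 16-layer interleave width
+ - sources:
+   - model: ShinojiResearch/Senku-70B
+     layer_range: [8, 24]   # offset by an 8-layer stride
+ - sources:
+   - model: Netrve/Miqu-PlayMaid-70B-v0.1
+     layer_range: [16, 32]
+ # ...continue alternating donors up to the final layer of the base model
+ dtype: float16
+ tokenizer_source: model:Netrve/Miqu-PlayMaid-70B-v0.1
+ ```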
+
+ - Usable context: ~32768
+ - Recommended prompt format: Alpaca (see the template below)
+ - Layers: 137
+
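+ For reference, the standard Alpaca (no-input) prompt template is assumed:
+
+ ```
+ Below is an instruction that describes a task. Write a response that appropriately completes the request.
+
+ ### Instruction:
+ {instruction}
+
+ ### Response:
+ ```
+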
+ ### Quantization
+
+ An importance matrix for llama.cpp quantization is provided: [imatrix.dat](./imatrix.dat)
+
+ Will upload a few quants when bandwidth permits.
+
+ ### Testing
+
+ Two different writing styles were considered for each testing scenario:
+ - Completions for 3rd person narration. No character role was assumed.
+ - Completions for 1st and 2nd person turn-based (out-of-order) RP. A character role was assumed by the model, but narration of minor characters and events was encouraged.
+
+ Tests assumed a mature audience, but a range of scenarios was constructed.
+ Thematic inconsistency or bias in character behaviour was penalized heavily.
+
+ Models showing the following were penalized during manual comparison:
+ - Consistently short responses.
+ - Laziness, or readily giving up on solving a character problem.
+ - Overly malleable characters that could not hold opinions or beliefs.
+ - Passiveness or an inability to drive the narrative.
+ - Persistent repetition. Bad merges tend to latch onto and reuse specific keywords.
+ - Ignoring or missing obvious scenario solutions.
+ - Impersonating other major characters out of turn during RP tests.
+ - Failure to follow a character's description. This criterion is fairly broad, and could include things like character skills, refusals, etc.
+ - Major inconsistencies in scenes or recall. Note: invention of thematically consistent detail was encouraged.
+
+ ### Interesting observations from benchmarking
+
+ - A 10-layer interleave stride with a 20-layer interleave width consistently outperformed alternative combinations for coherence.
+ - An 8-layer interleave stride with a 16-layer interleave width consistently outperformed alternative combinations for creativity whilst remaining reasonably coherent.
+ - Regular stride intervals are not optimal. In particular, offsetting the first or last set of base model layers often improved metrics.
+ - Goliath-120B is still a good standard for coherence below 4096 context. A few miqu-1 merges are comparable, but testing found that a small amount of coherence could be sacrificed for notable creativity improvements.