lipcut committed
Commit a44cc75
1 Parent(s): a23d49b

Update README.md

Files changed (1): README.md +34 -23
README.md CHANGED
@@ -13,57 +13,68 @@ base_model:
  - MediaTek-Research/Breeze-7B-Instruct-v0_1
  ---

- # shizhi-twilight-7B
-
- shizhi-twilight-7B is a Mixture of Experts (MoE) made with the following models using [LazyMergekit](https://colab.research.google.com/drive/1obulZ1ROXHjYLn6PPZJwRR6GzgQogxxb?usp=sharing):
  * [argilla/CapybaraHermes-2.5-Mistral-7B](https://huggingface.co/argilla/CapybaraHermes-2.5-Mistral-7B)
  * [MediaTek-Research/Breeze-7B-Instruct-v0_1](https://huggingface.co/MediaTek-Research/Breeze-7B-Instruct-v0_1)

  ## 🧩 Configuration

  ```yaml
- models:
-   - model: MediaTek-Research/Breeze-7B-Instruct-v0_1
-     # No parameters necessary for base model
-   - model: argilla/CapybaraHermes-2.5-Mistral-7B
-     parameters:
-       density: 0.53
-       weight: 0.95
- merge_method: dare_ties
  base_model: MediaTek-Research/Breeze-7B-Instruct-v0_1
  parameters:
-   int8_mask: true
-   normalize: true
- experts:
-   - source_model: argilla/CapybaraHermes-2.5-Mistral-7B
-     positive_prompts:
-       - "Perform the following tasks with your best ability"
-   - source_model: MediaTek-Research/Breeze-7B-Instruct-v0_1
-     positive_prompts:
-       - "You are a helpful AI assistant built by MediaTek Research. The user you are helping speaks Traditional Chinese and comes from Taiwan."
  dtype: bfloat16
  ```
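For readers unfamiliar with `dare_ties`: the removed `density`/`weight` parameters control DARE's drop-and-rescale step. A minimal, illustrative sketch of that step (not mergekit's actual implementation):

```python
import torch

def dare_delta(finetuned: torch.Tensor, base: torch.Tensor,
               density: float = 0.53, weight: float = 0.95) -> torch.Tensor:
    """Keep a random `density` fraction of the fine-tune's delta and rescale it."""
    delta = finetuned - base                               # task vector vs. the base model
    mask = (torch.rand_like(delta) < density).to(delta.dtype)
    return weight * (delta * mask) / density               # 1/density keeps the expected magnitude

# merged parameter = base + dare_delta(...) summed over the contributing models
```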
 
  ## 💻 Usage

  ```python
- !pip install -qU transformers bitsandbytes accelerate

  from transformers import AutoTokenizer
  import transformers
  import torch

  model = "lipcut/shizhi-twilight-7B"

  tokenizer = AutoTokenizer.from_pretrained(model)
  pipeline = transformers.pipeline(
      "text-generation",
      model=model,
-     model_kwargs={"torch_dtype": torch.float16, "load_in_4bit": True},
  )

- messages = [{"role": "user", "content": "Explain what a Mixture of Experts is in less than 100 words."}]
- prompt = pipeline.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
  outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
  print(outputs[0]["generated_text"])
  ```
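As an aside, the 4-bit path removed above (`load_in_4bit` inside `model_kwargs`, which is why `bitsandbytes` was installed) is written with an explicit quantization config in current transformers. A minimal sketch, assuming a CUDA machine with bitsandbytes available:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Explicit 4-bit quantization config replacing the bare load_in_4bit=True kwarg
bnb_config = BitsAndBytesConfig(load_in_4bit=True,
                                bnb_4bit_compute_dtype=torch.float16)
model_4bit = AutoModelForCausalLM.from_pretrained(
    "lipcut/shizhi-twilight-7B", quantization_config=bnb_config
)
```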
 
  - MediaTek-Research/Breeze-7B-Instruct-v0_1
  ---

+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6409720c9e9f790c905ba4bf/v6B0CkdpR74oCetV3w0y-.png)
+
+ # 試製-暮光-7B
+
+ 試製-暮光-7B was created by merging the following models with [LazyMergekit](https://colab.research.google.com/drive/1obulZ1ROXHjYLn6PPZJwRR6GzgQogxxb?usp=sharing):
+ * [MediaTek-Research/Breeze-7B-Instruct-v0_1](https://huggingface.co/MediaTek-Research/Breeze-7B-Instruct-v0_1)
  * [argilla/CapybaraHermes-2.5-Mistral-7B](https://huggingface.co/argilla/CapybaraHermes-2.5-Mistral-7B)
+
+ This is an experimental model meant to test whether high-quality model tuning applied to one language can transfer to another (in this model, English to Chinese).
+
+ # shizhi-twilight-7B
+
+ shizhi-twilight-7B is a merge of the following models using [LazyMergekit](https://colab.research.google.com/drive/1obulZ1ROXHjYLn6PPZJwRR6GzgQogxxb?usp=sharing):
  * [MediaTek-Research/Breeze-7B-Instruct-v0_1](https://huggingface.co/MediaTek-Research/Breeze-7B-Instruct-v0_1)
+ * [argilla/CapybaraHermes-2.5-Mistral-7B](https://huggingface.co/argilla/CapybaraHermes-2.5-Mistral-7B)
+
+ This is an experiment to check whether high-quality fine-tuning in one language (English) can be transferred to another language (Mandarin) using the SLERP merge method.
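
To make the merge method concrete: SLERP interpolates along the arc between the two models' weight vectors rather than averaging them linearly, which preserves the weights' norm better. A minimal sketch for a single tensor (illustrative only, not mergekit's code):

```python
import torch

def slerp(w0: torch.Tensor, w1: torch.Tensor, t: float, eps: float = 1e-8) -> torch.Tensor:
    """Spherical linear interpolation: t=0 returns w0, t=1 returns w1."""
    v0, v1 = w0.flatten().float(), w1.flatten().float()
    u0, u1 = v0 / (v0.norm() + eps), v1 / (v1.norm() + eps)
    omega = torch.arccos(torch.clamp(u0 @ u1, -1.0, 1.0))  # angle between the two tensors
    if omega < eps:                                        # near-parallel: fall back to lerp
        return (1 - t) * w0 + t * w1
    s = torch.sin(omega)
    out = (torch.sin((1 - t) * omega) / s) * v0 + (torch.sin(t * omega) / s) * v1
    return out.reshape(w0.shape).to(w0.dtype)
```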

  ## 🧩 Configuration

  ```yaml
+ slices:
+   - sources:
+       - model: MediaTek-Research/Breeze-7B-Instruct-v0_1
+         layer_range: [0, 32]
+       - model: argilla/CapybaraHermes-2.5-Mistral-7B
+         layer_range: [0, 32]
+ merge_method: slerp
  base_model: MediaTek-Research/Breeze-7B-Instruct-v0_1
  parameters:
+   t:
+     - filter: self_attn
+       value: [0, 0.5, 0.3, 0.7, 1]
+     - filter: mlp
+       value: [1, 0.5, 0.7, 0.3, 0]
+     - value: 0.5
  dtype: bfloat16
  ```
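The `t` block defines a per-layer interpolation schedule, not a single constant. Assuming mergekit's usual gradient semantics (each value list is spread evenly across the 32 layers and linearly interpolated between anchors — an assumption worth verifying against the mergekit docs), the per-layer factors resolve roughly like this:

```python
import numpy as np

def layer_ts(anchors, num_layers=32):
    """Resolve a gradient list of anchor values into one t per layer (assumed semantics)."""
    anchor_pos = np.linspace(0.0, 1.0, len(anchors))  # anchors spread evenly over depth
    layer_pos = np.linspace(0.0, 1.0, num_layers)     # relative depth of each layer
    return np.interp(layer_pos, anchor_pos, anchors)

self_attn_t = layer_ts([0, 0.5, 0.3, 0.7, 1])  # self-attention: mostly base model early, CapybaraHermes late
mlp_t = layer_ts([1, 0.5, 0.7, 0.3, 0])        # MLP: the mirror image
# t = 0 keeps the base model (Breeze), t = 1 takes CapybaraHermes;
# tensors matching neither filter use the flat t = 0.5.
```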

  ## 💻 Usage

  ```python
+ !pip install -qU transformers accelerate

  from transformers import AutoTokenizer
  import transformers
  import torch

  model = "lipcut/shizhi-twilight-7B"
+ messages = [{"role": "user", "content": "什麼是大型語言模型?"}]  # "What is a large language model?"

  tokenizer = AutoTokenizer.from_pretrained(model)
+ prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
  pipeline = transformers.pipeline(
      "text-generation",
      model=model,
+     torch_dtype=torch.float16,
+     device_map="auto",
  )

  outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
  print(outputs[0]["generated_text"])
  ```
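`device_map="auto"` is what the `accelerate` install above is for: it places the model's weights on the available GPU(s) automatically. Since `do_sample=True`, the generated answer varies between runs.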