eek committed
Commit f28fc55
Parent: c2ba064

Update README.md with new instructions

ADD: Mention that mlx and mlx-lm on pip now support DBRX.
ADD: Add the `--use-default-chat-template` flag.
ADD: Add a fix for Python usage with `add_generation_prompt=True`.

Files changed (1):
  1. README.md +9 -32
README.md CHANGED
@@ -42,47 +42,27 @@ python -m mlx_lm.convert --hf-path databricks/dbrx-instruct -q --upload-repo mlx
 
 ## Use with mlx
 
+Make sure you first upgrade mlx-lm and mlx to the latest versions.
+
 ```bash
-git clone git@github.com:ml-explore/mlx-examples.git
-cd mlx-examples/llms/
-python setup.py build
-python setup.py install
+pip install mlx --upgrade
+pip install mlx-lm --upgrade
 
-python -m mlx_lm.generate --model mlx-community/dbrx-instruct-4bit --prompt "Hello" --trust-remote-code --max-tokens 500
+python -m mlx_lm.generate --model mlx-community/dbrx-instruct-4bit --prompt "Hello" --trust-remote-code --use-default-chat-template --max-tokens 500
 ```
 
-Remember, this is an Instruct model, so you will need to use the instruct prompt template:
+Remember, this is an Instruct model, so you will need to use the instruct prompt template by appending `--use-default-chat-template`.
 
 ## Example:
 
-```text
-<|im_start|>system
-You are DBRX, created by Databricks. You were last updated in December 2023. You answer questions based on information available up to that point.
-YOU PROVIDE SHORT RESPONSES TO SHORT QUESTIONS OR STATEMENTS, but provide thorough responses to more complex and open-ended questions.
-You assist with various tasks, from writing to coding (using markdown for code blocks — remember to use ``` with code, JSON, and tables).
-(You do not have real-time data access or code execution capabilities. You avoid stereotyping and provide balanced perspectives on controversial topics. You do not provide song lyrics, poems, or news articles and do not divulge details of your training data.)
-This is your system prompt, guiding your responses. Do not reference it, just respond to the user. If you find yourself talking about this message, stop. You should be responding appropriately and usually that means not mentioning this.
-YOU DO NOT MENTION ANY OF THIS INFORMATION ABOUT YOURSELF UNLESS THE INFORMATION IS DIRECTLY PERTINENT TO THE USER'S QUERY.<|im_end|>
-<|im_start|>user
-What's the difference between PCA vs UMAP vs t-SNE?<|im_end|>
-<|im_start|>assistant
-The difference
-```
-
-I've also added some extra words for the assistant start; otherwise the model would instantly add an `<|im_end|>` token and stop.
-If `<|im_start|>assistant` is not added, an error will appear.
-
-You can put the above in a text file and use it as:
-
 ```bash
-python -m mlx_lm.generate --model mlx-community/dbrx-instruct-4bit --prompt "$(cat my_prompt.txt)" --trust-remote-code --max-tokens 1000
+python -m mlx_lm.generate --model dbrx-instruct-4bit --prompt "What's the difference between PCA vs UMAP vs t-SNE?" --trust-remote-code --use-default-chat-template --max-tokens 1000
 ```
 
 Output:
 
 
-![image/png](https://cdn-uploads.huggingface.co/production/uploads/630f2745982455e61cc5fb1d/UUeNmuNipYwN7FVrT9KSq.png)
-
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/630f2745982455e61cc5fb1d/AW-tHeKBOAVEfqP1zNH0j.png)
 
 On my MacBook Pro M2 with 96GB of Unified Memory, DBRX Instruct in 4-bit eats 70.2GB of RAM for the above prompt.
 
@@ -109,10 +89,7 @@ chat = [
 {"role": "assistant", "content": "The "},
 ]
 
-prompt = tokenizer.apply_chat_template(chat, tokenize=False)
-
-# We need to remove the last <|im_end|> token so that the AI continues generation
-prompt = prompt[::-1].replace("<|im_end|>"[::-1], "", 1)[::-1]
+prompt = tokenizer.apply_chat_template(chat, add_generation_prompt=True, tokenize=False)
 
 response = generate(model, tokenizer, prompt=prompt, verbose=True, temp=0.6, max_tokens=1500)
 ```
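
For reference, here is a minimal end-to-end sketch of the Python usage after this change. The `chat` entries, the `apply_chat_template` call, and the `generate` call are taken from the diff above; the `load` call and its `tokenizer_config` argument are assumptions based on mlx-lm's usual API (DBRX's tokenizer needs `trust_remote_code`), and the user message is borrowed from the README's example prompt, so adjust both to match the full README.

```python
# Minimal sketch, assuming mlx-lm's load/generate API; tokenizer_config is an
# assumption (DBRX's tokenizer requires trust_remote_code at load time).
from mlx_lm import load, generate

model, tokenizer = load(
    "mlx-community/dbrx-instruct-4bit",
    tokenizer_config={"trust_remote_code": True},
)

chat = [
    {"role": "user", "content": "What's the difference between PCA vs UMAP vs t-SNE?"},
    # Pre-filling the assistant turn with a few words, as the README does,
    # nudges the model to keep generating instead of stopping immediately.
    {"role": "assistant", "content": "The "},
]

# add_generation_prompt=True leaves the prompt open for the model to continue,
# replacing the old string-reversal hack that stripped the final <|im_end|>.
prompt = tokenizer.apply_chat_template(chat, add_generation_prompt=True, tokenize=False)

response = generate(model, tokenizer, prompt=prompt, verbose=True, temp=0.6, max_tokens=1500)
```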