---
base_model: meta-llama/Meta-Llama-3.1-70B-Instruct
language:
- en
- de
- fr
- it
- pt
- hi
- es
- th
library_name: transformers
license: llama3.1
tags:
- facebook
- meta
- pytorch
- llama
- llama-3
model-index:
- name: Meta-Llama-3.1-70B-Instruct-NF4
  results: []
---

# Model Card for Meta-Llama-3.1-70B-Instruct-NF4

This is a quantized version of `Llama 3.1 70B Instruct`, quantized to **4-bit (NF4)** using `bitsandbytes` and `accelerate`.

- **Developed by:** Farid Saud @ DSRS
- **License:** llama3.1
- **Base Model:** meta-llama/Meta-Llama-3.1-70B-Instruct
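
Below is a minimal sketch of how a checkpoint like this can be produced with `bitsandbytes` NF4 quantization; the exact configuration used for this upload may differ.

```python
# Hypothetical quantization sketch: load the base model in 4-bit NF4
# using bitsandbytes + accelerate, then push the quantized weights to the Hub.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # 4-bit NormalFloat (NF4)
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute dtype for the dequantized matmuls
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-70B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",                      # accelerate shards layers across available GPUs
)

model.push_to_hub("fsaudm/Meta-Llama-3.1-70B-Instruct-NF4")
```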

## Use this model


Use a pipeline as a high-level helper:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe = pipeline(
    "text-generation",
    model="fsaudm/Meta-Llama-3.1-70B-Instruct-NF4",
    device_map="auto",  # let accelerate place the quantized weights on the available GPUs
)
pipe(messages)
```



Or load the model and tokenizer directly:
```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("fsaudm/Meta-Llama-3.1-70B-Instruct-NF4")
model = AutoModelForCausalLM.from_pretrained(
    "fsaudm/Meta-Llama-3.1-70B-Instruct-NF4",
    device_map="auto",  # requires accelerate; shards the 4-bit weights across available GPUs
)
```
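
Once loaded, the usual chat-template generation flow applies. A minimal sketch (the prompt and `max_new_tokens` value are illustrative):

```python
# Build the prompt with the model's chat template and generate a reply.
messages = [{"role": "user", "content": "Who are you?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```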

Information about the base model can be found in the original [meta-llama/Meta-Llama-3.1-70B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-70B-Instruct) model card.