How to quantize this model using QLoRA?
I'm trying to convert this model to 4-bit, but it fails when I try to generate a response:
ValueError: The following model_kwargs are not used by the model: ['token_type_ids'] (note: typos in the generate arguments will also show up in this list)
The code is below:
from transformers import AutoModelForCausalLM, AutoTokenizer
model_id = "tiiuae/falcon-7b"
model = AutoModelForCausalLM.from_pretrained(model_id, load_in_4bit=True, device_map="auto", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)
text = "Hello my name is"
device = "cuda:0"
inputs = tokenizer(text, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_length=60)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
See this discussion for a solution.
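For readers who cannot follow the link, here is a minimal sketch of one common workaround. It may or may not match what the linked discussion recommends. The error says that the tokenizer returned a `token_type_ids` entry that `generate` does not accept for this model, so the sketch drops that entry before calling `generate`. The helper names `drop_unused_inputs` and `generate_text` are hypothetical and not part of transformers.

```python
def drop_unused_inputs(inputs, unused=("token_type_ids",)):
    """Return a plain dict of tokenizer outputs without the keys the model rejects."""
    return {key: value for key, value in inputs.items() if key not in unused}


def generate_text(model, tokenizer, text, max_length=60):
    """Tokenize `text`, strip the unused inputs, generate, and decode the result."""
    # Move the encoded inputs to the device that holds the model.
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    # Pass only the entries that generate() accepts for this model.
    outputs = model.generate(**drop_unused_inputs(inputs), max_length=max_length)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```

With the `model` and `tokenizer` loaded as in the question, `print(generate_text(model, tokenizer, "Hello my name is"))` would replace the last four lines of the original script. Another option that should have the same effect is to pass `return_token_type_ids=False` in the tokenizer call so the entry is never created.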