[Bug] Does not work

#3
by catid - opened

Used the example script with the latest PyTorch, einops, and transformers, but it does not work:

Traceback (most recent call last):
  File "/home/catid/sources/supercharger/test_falcon_basic.py", line 8, in <module>
    pipeline = transformers.pipeline(
  File "/home/catid/mambaforge/envs/supercharger/lib/python3.10/site-packages/transformers/pipelines/__init__.py", line 788, in pipeline
    framework, model = infer_framework_load_model(
  File "/home/catid/mambaforge/envs/supercharger/lib/python3.10/site-packages/transformers/pipelines/base.py", line 278, in infer_framework_load_model
    raise ValueError(f"Could not load model {model} with any of the following classes: {class_tuple}.")
ValueError: Could not load model tiiuae/falcon-40b with any of the following classes: (<class 'transformers.models.auto.modeling_auto.AutoModelForCausalLM'>,).

Possibly related, I get "The model 'RWForCausalLM' is not supported for text-generation".

I do see that this warning also pops up on 7B, which then goes on to work fine, so it might be a misleading warning here; just thought I'd share it.

The model doesn't work. I get the same error on 40B:

ValueError: Could not load model tiiuae/falcon-40b-instruct with any of the following classes: (<class
'transformers.models.auto.modeling_auto.AutoModelForCausalLM'>, <class
'transformers.models.auto.modeling_tf_auto.TFAutoModelForCausalLM'>).

Oh good, I thought I was just doing something dumb.

I am able to run the model on my end, but the answer just keeps going and does not end. Streaming the response is also pretty slow. Running on 4 A10Gs (96 GB total).

model = AutoModelForCausalLM.from_pretrained(mname, trust_remote_code=True, torch_dtype=torch.bfloat16, device_map='auto')

Loading like this, I'm getting an error after one answer:
RuntimeError: The size of tensor a (9) must match the size of tensor b (488) at non-singleton dimension 1
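A possible mitigation (not verified here; a sketch assuming `mname` is the repo id used above and `model` is the object loaded from it): capping new tokens and setting an explicit eos/pad token usually stops runaway answers, and `use_cache=False` is sometimes suggested as a workaround when the custom RW modelling code hits shape mismatches on a later generation call. Neither is confirmed as the fix for this exact error.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(mname)  # mname as above, e.g. "tiiuae/falcon-40b"

inputs = tokenizer("Hello, Girafatron!", return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=200,                   # hard cap so the answer cannot run on forever
    do_sample=True,
    top_k=10,
    eos_token_id=tokenizer.eos_token_id,
    pad_token_id=tokenizer.eos_token_id,  # Falcon has no pad token; reuse eos
    use_cache=False,                      # possible workaround for the tensor-size mismatch; slower
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))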

Have we solved the problem?

Facing the same issue; how do I solve it?

ValueError: Could not load model tiiuae/falcon-7b with any of the following classes: (<class
'transformers.models.auto.modeling_auto.AutoModelForCausalLM'>,).

When using this code:

https://huggingface.co/tiiuae/falcon-40b

from transformers import AutoTokenizer, AutoModelForCausalLM
import transformers
import torch

model = "tiiuae/falcon-40b"  # Optionally use the smaller sibling: tiiuae/falcon-7b

tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)
sequences = pipeline(
    "Girafatron is obsessed with giraffes, the most glorious animal on the face of this Earth. Giraftron believes all other animals are irrelevant when compared to the glorious majesty of the giraffe.\nDaniel: Hello, Girafatron!\nGirafatron:",
    max_length=200,
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")

I get this output:

Downloading (…)okenizer_config.json: 100%  175/175 [00:00<00:00, 6.61kB/s]
Downloading (…)/main/tokenizer.json: 100%  2.73M/2.73M [00:00<00:00, 5.61MB/s]
Downloading (…)cial_tokens_map.json: 100%  281/281 [00:00<00:00, 1.34kB/s]
Downloading (…)lve/main/config.json: 100%  656/656 [00:00<00:00, 947B/s]
Downloading (…)/configuration_RW.py: 100%  2.51k/2.51k [00:00<00:00, 3.46kB/s]
A new version of the following files was downloaded from https://huggingface.co/tiiuae/falcon-40b:
• configuration_RW.py
Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
Downloading (…)main/modelling_RW.py: 100%  47.1k/47.1k [00:00<00:00, 108kB/s]
A new version of the following files was downloaded from https://huggingface.co/tiiuae/falcon-40b:
• modelling_RW.py
Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
Downloading (…)model.bin.index.json: 100%  39.3k/39.3k [00:00<00:00, 697kB/s]
Downloading shards: 67%  6/9 [05:54<02:52, 57.54s/it]
Downloading (…)l-00001-of-00009.bin: 100%  9.50G/9.50G [00:46<00:00, 258MB/s]
Downloading (…)l-00002-of-00009.bin: 100%  9.51G/9.51G [01:14<00:00, 257MB/s]
Downloading (…)l-00003-of-00009.bin: 100%  9.51G/9.51G [00:50<00:00, 262MB/s]
Downloading (…)l-00004-of-00009.bin: 100%  9.51G/9.51G [00:55<00:00, 246MB/s]
Downloading (…)l-00005-of-00009.bin: 100%  9.51G/9.51G [00:57<00:00, 224MB/s]
Downloading (…)l-00006-of-00009.bin: 100%  9.51G/9.51G [00:58<00:00, 170MB/s]
Downloading (…)l-00007-of-00009.bin: 18%   1.74G/9.51G [00:12<00:44, 174MB/s]
Traceback (most recent call last):
  File "<cell line: 13>", line 13, in <module>
  File "/usr/local/lib/python3.10/dist-packages/transformers/pipelines/__init__.py", line 788, in pipeline
    framework, model = infer_framework_load_model(
  File "/usr/local/lib/python3.10/dist-packages/transformers/pipelines/base.py", line 279, in infer_framework_load_model
    raise ValueError(f"Could not load model {model} with any of the following classes: {class_tuple}.")
ValueError: Could not load model tiiuae/falcon-40b with any of the following classes: (<class
'transformers.models.auto.modeling_auto.AutoModelForCausalLM'>, <class
'transformers.models.auto.modeling_tf_auto.TFAutoModelForCausalLM'>).
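As the warning in the log above notes, you can pin a revision so the remote code files (configuration_RW.py / modelling_RW.py) are not re-fetched on every run. A minimal sketch; the revision value below is a placeholder, not a real commit:

from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "tiiuae/falcon-40b"
revision = "<commit-sha-from-the-model-repo>"  # placeholder; use a real commit SHA from the repo's history

tokenizer = AutoTokenizer.from_pretrained(model_name, revision=revision)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    revision=revision,          # pin both the weights and the custom modelling code
    trust_remote_code=True,
)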

pip install transformers
pip install einops
pip install accelerate
pip install xformers

If you pip install these packages, the problem "ValueError: Could not load model tiiuae/falcon-7b-instruct with any of the following classes: (<class 'transformers.models.auto.modeling_auto.AutoModelForCausalLM'>, <class 'transformers.models.auto.modeling_tf_auto.TFAutoModelForCausalLM'>)" may be solved.
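If the error persists after installing, it may be worth confirming which versions the running kernel actually imports, since a stale environment is a common cause of this loading failure. A minimal check (xformers left out because importing it can fail on its own):

import torch
import transformers
import accelerate
import einops

# Print the versions the current kernel actually sees.
print("torch:       ", torch.__version__)
print("transformers:", transformers.__version__)
print("accelerate:  ", accelerate.__version__)
print("einops:      ", einops.__version__)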

I am loading the model on an A6000 GPU with 48 GB of memory, in int8, and I am getting the same error:
ValueError: Could not load model tiiuae/falcon-40b-instruct with any of the following classes: (<class 'transformers.models.auto.modeling_auto.AutoModelForCausalLM'>,).
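For reference, this is roughly how 8-bit loading is usually done via bitsandbytes (a sketch, assuming bitsandbytes and accelerate are installed; falcon-40b is on the order of 40 GB of weights even in int8, so a single 48 GB card may still be tight once activations are included):

from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "tiiuae/falcon-40b-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,  # Falcon ships custom modelling code on the Hub
    load_in_8bit=True,       # int8 quantization via bitsandbytes
    device_map="auto",       # let accelerate place layers across GPU/CPU
)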

Kernel Restarting
The kernel for Desktop/LLM/Falcon/Fl.ipynb appears to have died. It will restart automatically.
It does not work on an M2 Apple MacBook.

from transformers import AutoTokenizer, AutoModelForCausalLM
import transformers
import torch

model_name = "tiiuae/falcon-40b"  # keep the repo id separate so the tokenizer can still be loaded by name

model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)

tokenizer = AutoTokenizer.from_pretrained(model_name)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    # torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    # device_map="auto",
)
sequences = pipeline(
    "Girafatron is obsessed with giraffes, the most glorious animal on the face of this Earth. Giraftron believes all other animals are irrelevant when compared to the glorious majesty of the giraffe.\nDaniel: Hello, Girafatron!\nGirafatron:",
    max_length=200,
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")

I was able to get past the AutoModelForCausalLM error in falcon-7b-instruct by using the line @alexwall77 provided below:

model = AutoModelForCausalLM.from_pretrained("tiiuae/falcon-40b", trust_remote_code=True)

Thank you, Alex!

from transformers import AutoTokenizer, AutoModelForCausalLM
import transformers
import torch

model = AutoModelForCausalLM.from_pretrained("tiiuae/falcon-40b", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-40b")  # the pipeline below also needs a tokenizer


pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)
sequences = pipeline(
   "Girafatron is obsessed with giraffes, the most glorious animal on the face of this Earth. Giraftron believes all other animals are irrelevant when compared to the glorious majesty of the giraffe.\nDaniel: Hello, Girafatron!\nGirafatron:",
    max_length=200,
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")


I got the same bug in google colab. Switched to using GPU and then it worked fine.

I am running this on Google Colab (free version).
When I switch to GPU (the T4 GPU runtime in Colab), I still get this error.
I also tried switching to TPU on Colab (which is possible thanks to the accelerate lib), and I still get the same error.

How did you solve the problem? I ran it on an EC2 instance with 8 A100s and also got the same problem.

I am still getting the same error; unable to load Falcon 40B Instruct or Falcon 40B. This is the error:
ValueError: Could not load model tiiuae/falcon-40b with any of the following classes: (<class
'transformers.models.auto.modeling_auto.AutoModelForCausalLM'>,).

Also, I have enough space in RAM. It could be an issue with text generation. Any help on this?

I found the solution to this.

I had to create a folder to offload the existing weights to in order to get it to work, which I named "device_map_weights".

from transformers import AutoTokenizer, AutoModelForCausalLM
import transformers
import torch

model = "tiiuae/falcon-40b-instruct"

tokenizer = AutoTokenizer.from_pretrained(
    model,
    trust_remote_code=True,
)
model = AutoModelForCausalLM.from_pretrained(
    model,
    device_map="auto",
    trust_remote_code=True,
    offload_folder="device_map_weights",  # layers that don't fit in GPU memory are offloaded here
)

pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)
sequences = pipeline(
   "Girafatron is obsessed with giraffes, the most glorious animal on the face of this Earth. Giraftron believes all other animals are irrelevant when compared to the glorious majesty of the giraffe.\nDaniel: Hello, Girafatron!\nGirafatron:",
    max_length=200,
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")

Have we solved the original issue?
