RuntimeError: Error(s) in loading state_dict for LlamaForCausalLM @fschat 0.2.29, torch 2.0.1+cu118, transformers 4.33.3

#42 opened by Zeal666

I'm following https://github.com/lm-sys/FastChat/blob/main/docs/gptq.md and I'm hitting the error below. Does anyone know how to resolve this? Any help is appreciated!
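For reference, this is the worker command I ran (reconstructed from the args dump in the log below; host/port are the defaults):

```
python3 -m fastchat.serve.model_worker \
    --model-path models/Llama-2-70B-chat-GPTQ \
    --gptq-ckpt models/Llama-2-70B-chat-GPTQ/model.safetensors \
    --gptq-wbits 4 \
    --gptq-groupsize 128 \
    --gptq-act-order
```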
+++++++++++++++++++++++++++++++++++
2023-10-03 03:53:39 | INFO | model_worker | args: Namespace(host='localhost', port=21002, worker_address='http://localhost:21002', controller_address='http://localhost:21001', model_path='models/Llama-2-70B-chat-GPTQ', revision='main', device='cuda', gpus=None, num_gpus=1, max_gpu_memory=None, dtype=None, load_8bit=False, cpu_offloading=False, gptq_ckpt='models/Llama-2-70B-chat-GPTQ/model.safetensors', gptq_wbits=4, gptq_groupsize=128, gptq_act_order=True, awq_ckpt=None, awq_wbits=16, awq_groupsize=-1, model_names=None, conv_template=None, embed_in_truncate=False, limit_worker_concurrency=5, stream_interval=2, no_register=False, seed=None)
2023-10-03 03:53:39 | INFO | model_worker | Loading the model ['Llama-2-70B-chat-GPTQ'] on worker 9bf7fc38 ...
2023-10-03 03:53:39 | INFO | stdout | Loading GPTQ quantized model...
2023-10-03 03:53:43 | INFO | stdout | Loading model ...
2023-10-03 03:53:47 | ERROR | stderr | Traceback (most recent call last):
2023-10-03 03:53:47 | ERROR | stderr | File "/usr/lib/python3.9/runpy.py", line 197, in _run_module_as_main
2023-10-03 03:53:47 | ERROR | stderr | return _run_code(code, main_globals, None,
2023-10-03 03:53:47 | ERROR | stderr | File "/usr/lib/python3.9/runpy.py", line 87, in _run_code
2023-10-03 03:53:47 | ERROR | stderr | exec(code, run_globals)
2023-10-03 03:53:47 | ERROR | stderr | File "/home/zeal/FastChat/fastchat/serve/model_worker.py", line 543, in <module>
2023-10-03 03:53:47 | ERROR | stderr | args, worker = create_model_worker()
2023-10-03 03:53:47 | ERROR | stderr | File "/home/zeal/FastChat/fastchat/serve/model_worker.py", line 518, in create_model_worker
2023-10-03 03:53:47 | ERROR | stderr | worker = ModelWorker(
2023-10-03 03:53:47 | ERROR | stderr | File "/home/zeal/FastChat/fastchat/serve/model_worker.py", line 221, in __init__
2023-10-03 03:53:47 | ERROR | stderr | self.model, self.tokenizer = load_model(
2023-10-03 03:53:47 | ERROR | stderr | File "/home/zeal/FastChat/fastchat/model/model_adapter.py", line 269, in load_model
2023-10-03 03:53:47 | ERROR | stderr | model, tokenizer = load_gptq_quantized(model_path, gptq_config)
2023-10-03 03:53:47 | ERROR | stderr | File "/home/zeal/FastChat/fastchat/modules/gptq.py", line 46, in load_gptq_quantized
2023-10-03 03:53:47 | ERROR | stderr | model = load_quant(
2023-10-03 03:53:47 | ERROR | stderr | File "/home/zeal/FastChat/fastchat/../repositories/GPTQ-for-LLaMa/llama.py", line 308, in load_quant
2023-10-03 03:53:47 | ERROR | stderr | model.load_state_dict(safe_load(checkpoint))
2023-10-03 03:53:47 | ERROR | stderr | File "/home/zeal/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
2023-10-03 03:53:47 | ERROR | stderr | raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
2023-10-03 03:53:47 | ERROR | stderr | RuntimeError: Error(s) in loading state_dict for LlamaForCausalLM:
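
The paste cuts off before the list of missing/unexpected keys, so to see what the checkpoint actually contains I dumped its tensor names with the sketch below. It's a minimal diagnostic, assuming the single model.safetensors file from the log; `safetensors` here is just the upstream library, nothing FastChat-specific:

```python
from safetensors import safe_open

# Same file I pass via --gptq-ckpt above.
ckpt_path = "models/Llama-2-70B-chat-GPTQ/model.safetensors"

with safe_open(ckpt_path, framework="pt", device="cpu") as f:
    keys = list(f.keys())

print(f"{len(keys)} tensors in checkpoint")
for k in keys[:10]:  # eyeball the naming scheme
    print(" ", k)

# GPTQ checkpoints store each quantized Linear as qweight / qzeros / scales,
# plus g_idx when quantized with act-order; a mismatch between these names
# and the layers that GPTQ-for-LLaMa's load_quant() builds is what
# load_state_dict() raises on.
print("g_idx tensors:", sum(k.endswith(".g_idx") for k in keys))
```

If the key names don't line up with what repositories/GPTQ-for-LLaMa/llama.py expects, that would explain the error, but I'm not sure how to fix the mismatch from there.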
