Error in ORTModelForCausalLM

#1
by sachin7p - opened

We are loading the model using the following code:

from optimum.onnxruntime import ORTModelForCausalLM

model = ORTModelForCausalLM.from_pretrained(
    "/home/common/models/phi3-medium-4k-instruct-onnx-cpu",
    decoder_file_name="phi3-medium-4k-instruct-cpu-int4-rtn-block-32-acc-level-4.onnx",
    decoder_with_past_file_name="phi3-medium-4k-instruct-cpu-int4-rtn-block-32-acc-level-4.onnx",
    use_merged=True,
    provider="CPUExecutionProvider",
    trust_remote_code=True,
    local_files_only=True,
)

Here, we get the following warning "ORTModelForCausalLM loaded a legacy ONNX model with no position_ids input, although this input is required for batched generation for the architecture phi3. We strongly encourage to re-export the model with optimum>=1.14 for position_ids and batched inference support."

While using this model in a "text-generation" pipeline in Hugging Face Transformers, we get the following error:
RuntimeError: Error in execution: Got invalid dimensions for input: past_key_values.0.key for the following indices
index: 1 Got: 32 Expected: 10
index: 3 Got: 96 Expected: 128
Please fix either the inputs/outputs or the model.

The versions we are using are as follows. Can someone please guide us on this?

onnx==1.16.2
onnxruntime==1.19.2
onnxruntime-genai==0.4.0
optimum==1.21.4
torch==2.4.1
transformers==4.43.4

Python 3.11
Ubuntu 20.04 (CPU)

Here, we get the following warning "ORTModelForCausalLM loaded a legacy ONNX model with no position_ids input, although this input is required for batched generation for the architecture phi3. We strongly encourage to re-export the model with optimum>=1.14 for position_ids and batched inference support."

This looks to be an issue with Hugging Face's Optimum. The uploaded ONNX model does not have a position_ids input because the input is no longer needed after the optimizations that were applied. You can open an issue about this in the Optimum GitHub repo.
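For reference, the warning's suggested workaround is to re-export the model yourself with a newer Optimum release so that a position_ids input is produced. A minimal sketch of such an export (the model ID and output folder here are only examples, and it assumes your installed Optimum version supports the phi3 architecture):

optimum-cli export onnx \
  --model microsoft/Phi-3-medium-4k-instruct \
  --task text-generation-with-past \
  --trust-remote-code \
  phi3-medium-4k-instruct-onnx/

Note that a plain re-export like this will not reproduce the int4 RTN quantization and the other optimizations applied to the uploaded ONNX files.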

While using this model in a "text-generation" pipeline in Hugging Face Transformers, we get the following error:
RuntimeError: Error in execution: Got invalid dimensions for input: past_key_values.0.key for the following indices
index: 1 Got: 32 Expected: 10
index: 3 Got: 96 Expected: 128
Please fix either the inputs/outputs or the model.

Similar to how Hugging Face's Transformers has support for PyTorch pipelines, Hugging Face's Optimum has support for ONNX pipelines. You can also use ONNX Runtime GenAI to run the uploaded Phi-3 ONNX models. Here is a tutorial you can follow.
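For concreteness, here is a rough sketch of both options. The paths, subfolder name, and prompt are placeholders you would adapt to your local layout, and the generation loop follows the onnxruntime-genai 0.4.x API.

Optimum pipeline, reusing the ORTModelForCausalLM instance loaded above (this may still be limited by the missing position_ids input for batched inputs):

from optimum.pipelines import pipeline
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("/home/common/models/phi3-medium-4k-instruct-onnx-cpu")
onnx_pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, accelerator="ort")
print(onnx_pipe("Why is the sky blue?", max_new_tokens=64))

ONNX Runtime GenAI, pointing model_dir at the folder that contains the .onnx file and genai_config.json:

import onnxruntime_genai as og

# Adjust this path to wherever genai_config.json lives in your local copy.
model_dir = "/home/common/models/phi3-medium-4k-instruct-onnx-cpu/cpu-int4-rtn-block-32-acc-level-4"
og_model = og.Model(model_dir)
og_tokenizer = og.Tokenizer(og_model)

# Phi-3 chat template for a single user turn.
prompt = "<|user|>\nWhy is the sky blue?<|end|>\n<|assistant|>\n"

params = og.GeneratorParams(og_model)
params.set_search_options(max_length=256)
params.input_ids = og_tokenizer.encode(prompt)

generator = og.Generator(og_model, params)
while not generator.is_done():
    generator.compute_logits()
    generator.generate_next_token()

print(og_tokenizer.decode(generator.get_sequence(0)))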

kvaishnavi changed discussion status to closed

Thanks for your prompt response! The ONNX Runtime GenAI model is now working on our end.
