A lot of <unk> generations in the cuda int 4 model.

#12
by Satandon1999 - opened

I am using a code derived from https://github.com/microsoft/onnxruntime-genai/blob/main/examples/python/model-generate.py .

I have tried the cpu-int4 and the cuda-int4 models with the same data and code. While the cpu model seems to work fine, the cuda model generates almost all of its tokens as id 0 (which get decoded to `<unk>`).

Versions:
onnxruntime-genai-cuda 0.3.0
torch 2.3.1+cu118

Is this caused by some package version issue? Does anyone have any idea what is going on here?

Microsoft org

Can you try using the Phi-3 example scripts such as this one? If you want to use the model-generate.py example, the chat template is also necessary for Phi-3. An example of the chat template for Phi-3 mini can be found here.
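To illustrate the point about the chat template: a minimal sketch of wrapping a raw prompt before tokenization, assuming the `<|user|>` / `<|end|>` / `<|assistant|>` format published on the Phi-3 mini model card (verify against the card for the exact model you are using):

```python
def apply_phi3_chat_template(prompt: str) -> str:
    """Wrap a raw user prompt in the Phi-3 mini chat template.

    Feeding the bare prompt to the model, as model-generate.py does by
    default, can produce degenerate output (e.g. runs of token id 0,
    decoded as <unk>). The special-token format below is taken from the
    Phi-3 mini model card and may differ for other Phi-3 variants.
    """
    return f"<|user|>\n{prompt}<|end|>\n<|assistant|>"


# Usage: format the prompt first, then tokenize and generate as in
# model-generate.py, e.g. params.input_ids = tokenizer.encode(formatted).
formatted = apply_phi3_chat_template("What is the capital of France?")
```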

kvaishnavi changed discussion status to closed
