Phi-3-small-8k-instruct-onnx

#19
by internetUser - opened

There is an ONNX CPU version for Phi-3-mini and Phi-3-medium, but not for Phi-3-small. This model is a perfect fit for CPU usage. If possible, could you provide Phi-3-small-8k-instruct-onnx-cpu? If not, could you indicate how to convert this model to ONNX?

Microsoft org

As mentioned here, the SparseAttention operator currently has a kernel implementation only for CUDA. A CPU kernel implementation for SparseAttention is in progress. Once it is complete, we will publish optimized and quantized ONNX models for Phi-3 small that run on CPU.
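For reference, the ONNX Runtime GenAI model builder can export and quantize Phi-3 checkpoints from Hugging Face. Because of the SparseAttention limitation above, an export of Phi-3 small is only expected to run with the CUDA execution provider for now. A sketch of the invocation (the output directory is a placeholder, and the exact flags may vary by onnxruntime-genai version):

```shell
# Export and quantize Phi-3-small-8k-instruct with the ONNX Runtime GenAI
# model builder. -m is the Hugging Face model id, -o a placeholder output
# directory, -p the quantization precision, and -e the execution provider.
# Until the SparseAttention CPU kernel lands, only -e cuda is expected to
# produce a runnable model.
pip install onnxruntime-genai-cuda
python -m onnxruntime_genai.models.builder \
    -m microsoft/Phi-3-small-8k-instruct \
    -o ./phi3-small-8k-onnx \
    -p int4 \
    -e cuda
```

Once a CPU kernel for SparseAttention ships, the same command with `-e cpu` would presumably become viable.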

kvaishnavi changed discussion status to closed
