Consumer GPUs?

#3
by billythefisherman - opened

Will this model work on any consumer Nvidia hardware that supports INT4, i.e. Turing or later, provided it has enough VRAM? Would 10GB of VRAM be enough, given that the data file is under 9GB, or am I being naïve and the working memory is much larger? Your notes only mention the V100 and A100 as far as I can see. I'd like to try this locally on a 3080 Ti, but I'd also like to deploy to machines with consumer hardware for local inference.

Apologies, I see there's a DirectML version, which is perfect for my application.

Microsoft org
edited May 23

Both DirectML and CUDA have INT4 support. We tested on a 4080 GPU with 16GB of VRAM. 10GB of VRAM might not be enough to run Phi-3 medium; if not, try Phi-3 small instead.
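For reference, here is a minimal Python sketch of running one of the INT4 ONNX variants through the onnxruntime-genai bindings. The model directory name, prompt, and search options are illustrative, and the exact API surface may differ between onnxruntime-genai releases:

```python
import onnxruntime_genai as og

# Illustrative path to a downloaded INT4 ONNX variant; adjust to your local layout.
model_dir = "Phi-3-medium-4k-instruct-onnx/cuda/cuda-int4-rtn-block-32"

model = og.Model(model_dir)      # runs on the backend of the installed package (CUDA or DirectML)
tokenizer = og.Tokenizer(model)

# Phi-3 chat prompt template.
prompt = "<|user|>\nWhat is the capital of France?<|end|>\n<|assistant|>\n"

params = og.GeneratorParams(model)
params.set_search_options(max_length=256)
params.input_ids = tokenizer.encode(prompt)

# Generate and decode in one call.
output_tokens = model.generate(params)
print(tokenizer.decode(output_tokens[0]))
```

If loading fails or generation aborts with an out-of-memory error, that is a sign the VRAM budget is too tight for the medium model.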

Choosing between DirectML and CUDA: the choice depends on your specific use case, hardware availability, and preferences. If you're looking for broader compatibility and ease of setup, DirectML is a good choice. However, if you have NVIDIA GPUs and need highly optimized performance, CUDA remains a strong contender. In other words, if you are familiar with CUDA and cuDNN installation, choose CUDA; otherwise, choose DirectML for ease of setup.
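One quick way to confirm which backend your environment can actually use is to list the execution providers reported by ONNX Runtime. A small sketch, assuming the matching onnxruntime package (onnxruntime-gpu for CUDA, onnxruntime-directml for DirectML) is installed:

```python
import onnxruntime as ort

# 'CUDAExecutionProvider' appears with the CUDA build, 'DmlExecutionProvider'
# with the DirectML build; 'CPUExecutionProvider' is always present as a fallback.
print(ort.get_available_providers())
```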

Hi @tlwu, I've downloaded and built the onnxruntime-genai sample so that I can run the models. I'm using the C++ example, but when I try to run any Phi-3 model I get an "incorrect parameter" error from it. I've posted an issue on GitHub here, but no reply as of yet:

https://github.com/microsoft/onnxruntime-genai/issues/521

Any idea what might be going wrong?

kvaishnavi changed discussion status to closed
