Not able to run model using vLLM

#3
by Pchaudhary - opened

When I run the model using vLLM in a GPU environment, I get the error below:

[rank0]: RuntimeError: torch._scaled_mm is only supported on CUDA devices with compute capability >= 9.0 or 8.9, or ROCm MI300+

My GPU configuration is:

GPU Model: Tesla T4
CUDA Compute Capability: 7.5
Total Memory: 15948 MB (approximately 16 GB)
Number of Multiprocessors: 40
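
These values can be confirmed with PyTorch (a minimal sketch, assuming a single GPU at device index 0):

```python
import torch

# Report the CUDA device properties listed above.
props = torch.cuda.get_device_properties(0)
major, minor = torch.cuda.get_device_capability(0)

print(f"GPU Model: {props.name}")
print(f"CUDA Compute Capability: {major}.{minor}")
print(f"Total Memory: {props.total_memory // (1024 ** 2)} MB")
print(f"Number of Multiprocessors: {props.multi_processor_count}")
```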

Can you suggest a workaround for running the model?

Neural Magic org

Hi @Pchaudhary, vLLM only supports FP8 models on Ampere (CUDA compute capability 8.0) and newer for weight-only quantization, and on Ada Lovelace (CUDA compute capability 8.9) and newer for weights+activations. We don't have any planned support for older GPUs.
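
For anyone hitting the same error, a quick way to see which tier a GPU falls into (a minimal sketch mirroring the thresholds above, not vLLM's internal check):

```python
import torch

def fp8_support(device: int = 0) -> str:
    """Classify FP8 support tier by CUDA compute capability (per the reply above)."""
    cap = torch.cuda.get_device_capability(device)
    if cap >= (8, 9):   # Ada Lovelace / Hopper and newer
        return "FP8 weights + activations"
    if cap >= (8, 0):   # Ampere and newer
        return "FP8 weight-only"
    return "FP8 unsupported"

# On a Tesla T4 (compute capability 7.5) this prints "FP8 unsupported".
print(fp8_support())
```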

mgoin changed discussion status to closed
