Not able to run model using vLLM

#3
by Pchaudhary - opened

When I run the model using vLLM in a GPU environment, I get the error below:

[rank0]: RuntimeError: torch._scaled_mm is only supported on CUDA devices with compute capability >= 9.0 or 8.9, or ROCm MI300+

My GPU configuration is:

GPU Model: Tesla T4
CUDA Compute Capability: 7.5
Total Memory: 15948 MB (approximately 16 GB)
Number of Multiprocessors: 40
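
These values can be confirmed with PyTorch (a minimal sketch, assuming a single GPU at device index 0):

```python
import torch

# Report the CUDA device properties listed above.
props = torch.cuda.get_device_properties(0)
major, minor = torch.cuda.get_device_capability(0)

print(f"GPU Model: {props.name}")
print(f"CUDA Compute Capability: {major}.{minor}")
print(f"Total Memory: {props.total_memory // (1024 ** 2)} MB")
print(f"Number of Multiprocessors: {props.multi_processor_count}")
```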

Can you suggest a workaround for running the model?

Neural Magic org

Hi @Pchaudhary, vLLM only supports FP8 models on Ampere (CUDA compute capability 8.0) and newer for weight-only quantization, and on Ada Lovelace (CUDA compute capability 8.9) and newer for weights+activations. We don't have any planned support for older GPUs.
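
For anyone hitting the same error, a quick way to see which tier a GPU falls into (a minimal sketch mirroring the thresholds above, not vLLM's internal check):

```python
import torch

def fp8_support(device: int = 0) -> str:
    """Classify FP8 support tier by CUDA compute capability (per the reply above)."""
    cap = torch.cuda.get_device_capability(device)
    if cap >= (8, 9):   # Ada Lovelace / Hopper and newer
        return "FP8 weights + activations"
    if cap >= (8, 0):   # Ampere and newer
        return "FP8 weight-only"
    return "FP8 unsupported"

# On a Tesla T4 (compute capability 7.5) this prints "FP8 unsupported".
print(fp8_support())
```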

mgoin changed discussion status to closed
