Why does the model still fail to run on a T4 card even with quantization?

#9
by loong - opened

OutOfMemoryError: CUDA out of memory. Tried to allocate 56.50 GiB. GPU 0 has a total capacity of 14.75 GiB of which 7.37 GiB is free. Process 4566 has 7.38 GiB memory in use. Of the allocated memory 6.97 GiB is allocated by PyTorch, and 284.33 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
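As an aside, the allocator flag suggested in the error message only mitigates fragmentation of already-reserved memory; it cannot help when a single allocation request (56.50 GiB here) exceeds the card's total 14.75 GiB. If you still want to try it, a minimal sketch of setting it is shown below; the placement before the torch import is just the safest spot, not a requirement from the original post.

# Minimal sketch: set the allocator flag suggested by the OOM message.
# It must be in place before the first CUDA allocation, so putting it at
# the very top of the script (before importing torch) is safest.
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

import torch  # imported only after the allocator config is set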

import torch
from diffusers import AutoencoderKLCogVideoX, CogVideoXPipeline, CogVideoXTransformer3DModel
from torchao.quantization import int8_weight_only, quantize_
from transformers import T5EncoderModel

quantization = int8_weight_only

# Load each component in bfloat16 and apply torchao int8 weight-only quantization.
text_encoder = T5EncoderModel.from_pretrained("THUDM/CogVideoX-5b", subfolder="text_encoder", torch_dtype=torch.bfloat16)
quantize_(text_encoder, quantization())

transformer = CogVideoXTransformer3DModel.from_pretrained("THUDM/CogVideoX-5b", subfolder="transformer", torch_dtype=torch.bfloat16)
quantize_(transformer, quantization())

vae = AutoencoderKLCogVideoX.from_pretrained("THUDM/CogVideoX-5b", subfolder="vae", torch_dtype=torch.bfloat16)
quantize_(vae, quantization())

# Assemble the pipeline from the quantized components.
pipe = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-5b",
    text_encoder=text_encoder,
    transformer=transformer,
    vae=vae,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()  # keep components on CPU until each is needed
pipe.vae.enable_tiling()         # decode latents in tiles to reduce peak VRAM
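
The setup above stops before any generation call. A minimal inference sketch, following the usual diffusers CogVideoX usage, could look like the following; the prompt, seed, and sampling settings are illustrative values, not taken from the original post.

from diffusers.utils import export_to_video

# Illustrative generation call; prompt and sampling settings are assumptions.
prompt = "A panda playing a guitar in a bamboo forest"
video = pipe(
    prompt=prompt,
    num_inference_steps=50,
    num_frames=49,
    guidance_scale=6.0,
    generator=torch.Generator(device="cuda").manual_seed(42),
).frames[0]

export_to_video(video, "output.mp4", fps=8)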
Knowledge Engineering Group (KEG) & Data Mining at Tsinghua University org

Please check the pinned information in the GitHub issues.

A 3090 can run it within ComfyUI.

zRzRzRzRzRzRzR changed discussion status to closed
