Why does the model still fail to run on a T4 card even with quantization?

#9
by loong - opened

OutOfMemoryError: CUDA out of memory. Tried to allocate 56.50 GiB. GPU 0 has a total capacity of 14.75 GiB of which 7.37 GiB is free. Process 4566 has 7.38 GiB memory in use. Of the allocated memory 6.97 GiB is allocated by PyTorch, and 284.33 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
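As an aside, the allocator flag suggested in the error message only mitigates fragmentation of already-reserved memory; it cannot help when a single allocation request (56.50 GiB here) exceeds the card's total 14.75 GiB. If you still want to try it, a minimal sketch of setting it is shown below; the placement before the torch import is just the safest spot, not a requirement from the original post.

# Minimal sketch: set the allocator flag suggested by the OOM message.
# It must be in place before the first CUDA allocation, so putting it at
# the very top of the script (before importing torch) is safest.
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

import torch  # imported only after the allocator config is set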

import torch
from diffusers import AutoencoderKLCogVideoX, CogVideoXPipeline, CogVideoXTransformer3DModel
from torchao.quantization import int8_weight_only, quantize_
from transformers import T5EncoderModel

quantization = int8_weight_only

# Load each component in bfloat16 and apply torchao int8 weight-only quantization.
text_encoder = T5EncoderModel.from_pretrained("THUDM/CogVideoX-5b", subfolder="text_encoder", torch_dtype=torch.bfloat16)
quantize_(text_encoder, quantization())

transformer = CogVideoXTransformer3DModel.from_pretrained("THUDM/CogVideoX-5b", subfolder="transformer", torch_dtype=torch.bfloat16)
quantize_(transformer, quantization())

vae = AutoencoderKLCogVideoX.from_pretrained("THUDM/CogVideoX-5b", subfolder="vae", torch_dtype=torch.bfloat16)
quantize_(vae, quantization())

# Assemble the pipeline from the quantized components.
pipe = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-5b",
    text_encoder=text_encoder,
    transformer=transformer,
    vae=vae,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()  # keep components on CPU until each is needed
pipe.vae.enable_tiling()         # decode latents in tiles to reduce peak VRAM
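
The setup above stops before any generation call. A minimal inference sketch, following the usual diffusers CogVideoX usage, could look like the following; the prompt, seed, and sampling settings are illustrative values, not taken from the original post.

from diffusers.utils import export_to_video

# Illustrative generation call; prompt and sampling settings are assumptions.
prompt = "A panda playing a guitar in a bamboo forest"
video = pipe(
    prompt=prompt,
    num_inference_steps=50,
    num_frames=49,
    guidance_scale=6.0,
    generator=torch.Generator(device="cuda").manual_seed(42),
).frames[0]

export_to_video(video, "output.mp4", fps=8)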
Knowledge Engineering Group (KEG) & Data Mining at Tsinghua University org

Please check the pinned information in the GitHub issues.

A 3090 can run it within ComfyUI.

zRzRzRzRzRzRzR changed discussion status to closed
