Is there any way to improve inference time?

#68
by winvin - opened

I have enough VRAM to load the whole model onto the GPU, but image generation is still a bit slow, about 1 second per sampling step. Is there any acceleration framework for text-to-image, like vLLM/TensorRT-LLM for LLM models?

You can quantize the model; bnb NF4 looks like the best approach. This has already been done several times. If you just want to use flux1-dev, try "HighCWu/FLUX.1-dev-4bit".
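For reference, here is a minimal sketch of loading FLUX.1-dev with bitsandbytes NF4 quantization, assuming a recent diffusers build with the BitsAndBytesConfig quantization backend and bitsandbytes installed; the prompt, step count, and offload choice are just placeholders:

```python
import torch
from diffusers import BitsAndBytesConfig, FluxPipeline, FluxTransformer2DModel

# NF4 quantization config for the transformer (the heaviest part of the pipeline).
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=nf4_config,
    torch_dtype=torch.bfloat16,
)

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()  # keeps VRAM low; drop this if everything fits on the GPU

image = pipe(
    "a photo of an astronaut riding a horse",
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("flux_nf4.png")
```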

@winvin @megachad You can speed it up by a pretty massive amount with two things.
1st. Use Hyper-FLUX instead of FLUX.1-dev; it is already 3-4x faster since it needs 8 steps instead of 25+, and it usually produces similar or better quality images.
2nd. Use torch.compile; you can get massive speedups, but the first compilation takes close to an hour I believe. I have not tested this with FLUX as I am not patient enough, but with SD3 2B on an A100 GPU you get a massive 4x speedup. A combined sketch is below.
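A rough sketch combining both suggestions, assuming the Hyper-FLUX acceleration LoRA from the ByteDance/Hyper-SD release (the weight filename and lora_scale are assumptions taken from that repo, so check it) and a recent diffusers/PyTorch with torch.compile support:

```python
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

# 1st: load and fuse the Hyper-FLUX 8-step LoRA (repo, filename, and scale are
# assumptions based on the ByteDance Hyper-SD release).
pipe.load_lora_weights(
    "ByteDance/Hyper-SD", weight_name="Hyper-FLUX.1-dev-8steps-lora.safetensors"
)
pipe.fuse_lora(lora_scale=0.125)

# 2nd: compile the transformer. The first call is very slow (compilation),
# subsequent calls reuse the compiled graph and run much faster.
pipe.transformer = torch.compile(pipe.transformer, mode="max-autotune", fullgraph=True)

image = pipe(
    "a photo of an astronaut riding a horse",
    num_inference_steps=8,  # Hyper-FLUX targets ~8 steps
    guidance_scale=3.5,
).images[0]
image.save("flux_hyper_compiled.png")
```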

@YaTharThShaRma999 "HighCWu/FLUX.1-dev-4bit" seems to be much faster, but I wasn't able to load any FLUX LoRAs with it. I did get img2img to work, though. It takes about 20 seconds to generate a 768x768 image on an RTX 3080, and about 5 seconds to load initially.
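For anyone who wants to try the img2img route, a minimal sketch with diffusers' FluxImg2ImgPipeline is below. It loads the standard FLUX.1-dev weights (using the "HighCWu/FLUX.1-dev-4bit" checkpoint instead would follow that repo's own loading instructions); the input filename, strength, and step count are placeholders:

```python
import torch
from PIL import Image
from diffusers import FluxImg2ImgPipeline

pipe = FluxImg2ImgPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

# Any input image works; "input.png" is a placeholder path.
init_image = Image.open("input.png").convert("RGB").resize((768, 768))

image = pipe(
    prompt="a snowy mountain landscape, oil painting",
    image=init_image,
    strength=0.7,            # how much of the input image gets repainted
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("flux_img2img.png")
```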
