Load model into TGI

#27
by schauppi

Hello - thanks for the great work!

I want to load this model in text-generation-inference v0.9.3 (the latest release) on 2x RTX 3090s (24 GB VRAM each). In this GitHub thread, https://github.com/huggingface/text-generation-inference, it was said that this is not possible / the model will not fit.

You mentioned here https://huggingface.co/TheBloke/Llama-2-70B-chat-GPTQ/discussions/2#64ba51be41078fd9a059c1a6 that it would be possible.

Could you please point me in the right direction for running this model in TGI with my setup?
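For context, this is roughly the launch command I've pieced together from the TGI README - the volume path and host port are just my local choices, so treat it as a sketch rather than a known-working config:

```bash
# Shard the GPTQ model across both 3090s; --shm-size is needed for NCCL.
docker run --gpus all --shm-size 1g -p 8080:80 \
  -v $PWD/data:/data \
  ghcr.io/huggingface/text-generation-inference:0.9.3 \
  --model-id TheBloke/Llama-2-70B-chat-GPTQ \
  --quantize gptq \
  --num-shard 2
```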
