Works well quantised to q8 on 2 x AMD 7900XTX cards

#11
by bkieser - opened

I did the following:

  1. Cloned the model repo from HF.
  2. Used llama.cpp to convert it to a native (F32) GGUF.
  3. Used llama.cpp to quantise that to Q8_0 (also GGUF output).
  4. Ran the Q8_0 version (llama.cpp for now; Ollama still to do).
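For anyone wanting to reproduce the steps above, this is roughly the sequence with a recent llama.cpp checkout (the repo path and file names are illustrative placeholders; older llama.cpp builds name the tools `convert-hf-to-gguf.py`, `quantize`, and `main`):

```shell
# 1. Clone the model repo from Hugging Face (substitute the real model ID)
git lfs install
git clone https://huggingface.co/<org>/<model> model-hf

# 2. Convert the HF checkpoint to a full-precision (F32) GGUF
python convert_hf_to_gguf.py model-hf --outtype f32 --outfile model-f32.gguf

# 3. Quantise the F32 GGUF down to Q8_0
./llama-quantize model-f32.gguf model-q8_0.gguf Q8_0

# 4. Run the quantised model interactively
./llama-cli -m model-q8_0.gguf -p "Write a snake game in Ruby."
```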

Works as well as GPT-4 and Claude 3 in my experience. It generated a snake game in Ruby, including threads so the snake keeps moving while the keyboard stays responsive, in about 20 shots. That real-world performance is similar to GPT-4 or Claude 3 Opus, and much better than my real-world experience with Gemini.

Well done to the people who produced this model. It's the first truly usable model for coding, good enough for use with agents such as CrewAI or AutoGen.

Need to add: the 2 x Radeon 7900XTX cards were chosen because they are at least 50% cheaper than the Nvidia RTX 4090. The important thing is the VRAM. The 4090 GPU itself is far faster than the 7900XTX, but it has only 24GB of VRAM, which is too small for anything much beyond 7B models unless you quantise heavily, and then you lose the model's power. Each 7900XTX also has 24GB, so you get twice the VRAM (48GB total) for roughly the price of one 4090.
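As a rough sanity check on that VRAM argument: Q8_0 stores 32 int8 weights plus one fp16 scale per block, about 1.06 bytes per parameter, so you can estimate whether a model's weights fit on a card (a back-of-the-envelope sketch only; it ignores KV cache and activation overhead, which is why leaving headroom matters):

```python
def q8_weight_gib(params_billion: float) -> float:
    """Approximate Q8_0 weight size: 32 int8 values + one fp16 scale per block."""
    bytes_per_weight = 34 / 32  # 1.0625 bytes per parameter
    return params_billion * 1e9 * bytes_per_weight / 2**30

# A ~7B model fits comfortably in a single 24GB card,
# while a ~33B model needs the combined 48GB of two cards.
print(f"7B at Q8_0:  {q8_weight_gib(7):.1f} GiB")
print(f"33B at Q8_0: {q8_weight_gib(33):.1f} GiB")
```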

Another advantage: on a normal PC motherboard you can fit only one 4090 because of the card's size and cooling, but you can squeeze two Radeon 7900XTX cards onto a standard board. To run more of either you have to jump to far more specialised motherboards, which significantly adds to your costs.

So right now the best option is AMD Radeon 7900XTX cards on a standard motherboard (you do need a larger power supply), and with q8 quantisation there is almost no quality loss versus the F32 or F16 versions of this model. It will happily run at full context length split across both GPUs, using about 75% of the VRAM on each, giving you one heck of a powerful code-generation capability at commodity hardware prices.
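For the ROCm setup, a sketch of how I'd expect this to look with a recent llama.cpp built with HIP support (the build flag and CLI options below are from current llama.cpp; the model file name and context length are illustrative, and the GGUF metadata supplies the model's actual maximum context):

```shell
# Build llama.cpp with AMD/HIP support (assumes ROCm is already installed;
# older builds used -DLLAMA_HIPBLAS=ON instead)
cmake -B build -DGGML_HIP=ON
cmake --build build -j

# Offload all layers (-ngl 99), split them by layer across both cards,
# and request a long context (-c; 16384 here is just an example value)
./build/bin/llama-cli -m model-q8_0.gguf -ngl 99 --split-mode layer -c 16384 \
    -p "Write a snake game in Ruby."
```

With `--split-mode layer`, llama.cpp distributes whole transformer layers across the two GPUs, which is how the model ends up occupying a similar fraction of VRAM on each card.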

Very impressive.

Multimodal Art Projection org

Thank you for sharing your experience and insights on using AMD Radeon 7900XTX cards for coding with the model. Your comparison with GPT-4 and Claude 3, along with the technical details on cost-effectiveness and performance, is highly appreciated. Thanks for contributing!

aaabiao changed discussion status to closed
