Inference Speed

#61
by khaled-hesham

What inference speed can you reach with this model?

Cohere For AI org

Hi @khaled-hesham ! That's a good question. The answer depends on many factors, such as the batch size you use during inference, the hardware you have available, the precision you load the model in, and the exact metric you care about (is it time to first token? time to generate the complete answer? tokens per second? seconds per token?)

Feel free to experiment and report your results here! I'm pretty sure more people will be interested in this!
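In case it helps anyone who wants to try: below is a minimal sketch of how you could measure end-to-end generation throughput (tokens per second) with transformers. It is just one possible setup, not an official benchmark: the repo id is passed on the command line rather than assumed, fp16 with device_map="auto" is only one of many precision/placement choices, and the prompt and generation length are arbitrary. Batch size, precision, hardware, and sequence length will all change the numbers, as noted above.

```python
import sys
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Pass the model's Hugging Face repo id as the first argument,
# e.g. python benchmark.py <repo-id>
model_id = sys.argv[1]

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # precision strongly affects speed and memory
    device_map="auto",
)

prompt = "Write a short poem about the sea."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Warm-up run so kernel compilation / cache setup doesn't skew the timing
model.generate(**inputs, max_new_tokens=16)

start = time.perf_counter()
outputs = model.generate(**inputs, max_new_tokens=256)
elapsed = time.perf_counter() - start

new_tokens = outputs.shape[-1] - inputs["input_ids"].shape[-1]
print(
    f"{new_tokens} tokens in {elapsed:.2f}s -> "
    f"{new_tokens / elapsed:.1f} tokens/s, "
    f"{elapsed / new_tokens * 1000:.1f} ms/token"
)
```

Note this measures batch size 1 and folds time-to-first-token into the total; if you care about those separately (or about larger batches), you'd need to time the prefill and decode phases individually.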
