Add to Hugging Face Open LLM Leaderboard 2

#4 by ThiloteE

I am sure that if you added this model to the leaderboard, it would receive more publicity, as there are actually not many models below 7B active parameters.
See https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard

Anyone is welcome to add it; it seems like the LB does not support filtering by active params, though.

No, it does not support it yet.

I know of the GGUF naming convention, which follows Mixtral's naming scheme, but I honestly also like what Qwen1.5-MoE-A2.7B did with the "A" prefix, because it makes very clear how many parameters are actually activated. If one searches for "MoE", most MoE models show up, and users can remember a model's score and compare it with other models. That is a little messy and kind of works, but ideally, developers of language models should follow a common standard. It is fine if the metadata is written into the model files, the accompanying configuration files, or the README instead of the model name, as long as a search feature can access that data.
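For instance, a search feature (or a quick script) could derive the active/total expert counts straight from `config.json` instead of the name. A minimal sketch, assuming Mixtral/Qwen2-MoE style config fields, which are not standardized across architectures:

```python
from transformers import AutoConfig

def moe_routing_info(model_id: str):
    """Read MoE routing metadata from a model's config.json.

    NOTE: the field names here are an assumption and vary by
    architecture; Mixtral exposes `num_local_experts`, Qwen2-MoE
    exposes `num_experts`, and both expose `num_experts_per_tok`.
    Returns (experts_active_per_token, experts_total) or None.
    """
    cfg = AutoConfig.from_pretrained(model_id)
    total = getattr(cfg, "num_local_experts", None) or getattr(cfg, "num_experts", None)
    active = getattr(cfg, "num_experts_per_tok", None)
    if total is None or active is None:
        return None  # not a MoE config, or an unrecognized naming scheme
    return active, total

# e.g. moe_routing_info("mistralai/Mixtral-8x7B-v0.1") -> (2, 8)
```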

It should also be clear for this model, i.e. 1B activated, 7B in total; the Qwen naming scheme does not make the total parameter count as clear, which is inconvenient, I think.

The current name could also be interpreted to mean that 1B in total is activated, but that it is made up of multiple smaller, sub-1B experts.

E.g., 2 × 0.5B = 1B parameters activated, but 14 × 0.5B = 7B parameters in total.

Obviously, your model card makes clear that this is not the case, but from the name alone it is not clear.
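Just to spell out the two readings with the hypothetical numbers above (purely illustrative, not the actual architecture):

```python
# Intended reading: 1B parameters active out of 7B total.
# Possible misreading: "1B" is the activated sum of several sub-1B
# experts, e.g. 2 experts of 0.5B each active, 14 experts in total.
params_per_expert = 0.5  # billions, hypothetical

active = 2 * params_per_expert   # = 1.0B activated
total = 14 * params_per_expert   # = 7.0B in total
print(f"{active:.1f}B active / {total:.1f}B total")  # 1.0B active / 7.0B total
```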

Edit: But we are digressing. Sorry :D
