What do you recommend using this model for?

#2 by BoscoTheDog - opened

What is its intended use?

[Screenshot of model output, 2024-07-16]

Yeah, it got everything I tested it with wrong. I guess it's a base to fine-tune into something better.

Hugging Face TB Research org

Hi, we just updated the Instruct models and the outputs should be better. You can also try the larger 360M for better performance in these demos:
https://huggingface.co/spaces/HuggingFaceTB/instant-smollm
https://huggingface.co/spaces/HuggingFaceTB/SmolLM-360M-Instruct-WebGPU

Thanks, but the idea was to check what such small models can do. I guess the whole approach shows that such small models have severe limitations. Maybe they can only be fine-tuned to be useful in some specific area. It certainly doesn't get simple facts about the world right; the 360M does way better. I wouldn't bother making it learn dates and such; having it give vague answers would be enough.

Danube 3 500M is actually surprisingly good for its size. Quantized to Q4 it's just 320MB. It's like talking to a basic version of Wikipedia. It outputs markdown too, which is a big plus. It's a very interesting development.
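For reference, here's a rough sketch of how a Q4 GGUF build of a model that size could be run locally with llama-cpp-python; the file name below is just a placeholder, not an official artifact.

```python
# Sketch: running a Q4-quantized ~500M GGUF model locally with llama-cpp-python.
# The model_path is a placeholder for whichever Q4 GGUF build you download.
from llama_cpp import Llama

llm = Llama(model_path="danube3-500m-chat.Q4_K_M.gguf", n_ctx=2048)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize photosynthesis in two sentences."}],
    max_tokens=150,
)
print(out["choices"][0]["message"]["content"])
```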

Thanks, I'll try that out. Did you ever try to fine-tune one? They can't need that much GPU. I think, for example, that such small models should return Python code when asked a math question; another part of the system could then execute it and return the values. Also, they don't need all the knowledge of Wikipedia, but then the answers should be vague and maybe also indicate some uncertainty. A sketch of the code-execution idea is below.
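As a rough sketch of that idea (the checkpoint name and prompt format are assumptions, not a tested recipe), the host could ask the model for a code block and execute whatever it returns in a controlled namespace:

```python
# Sketch: the small model writes Python for a math question,
# then the host extracts the code block and executes it.
import re
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="HuggingFaceTB/SmolLM-360M-Instruct",  # assumed checkpoint name
)

question = "What is 17% of 2340?"
messages = [{
    "role": "user",
    "content": "Answer with a single Python code block that prints the result.\n" + question,
}]

# Chat-style input returns the full conversation; the last message is the reply.
reply = generator(messages, max_new_tokens=128)[0]["generated_text"][-1]["content"]

# Pull out the first fenced code block and run it in an isolated namespace.
match = re.search(r"```(?:python)?\s*(.*?)```", reply, re.DOTALL)
if match:
    scope = {}
    exec(match.group(1), scope)  # a real system would sandbox this step
else:
    print("No code block returned:", reply)
```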
