What do you recommend using this model for?

#2 by BoscoTheDog - opened

What is its intended use?

[Screenshot of model output, 2024-07-16]

Yeah, it got everything I tested it with wrong. I guess it's a base to fine-tune into something better.

Hugging Face TB Research org

Hi, we just updated the Instruct models and the outputs should be better. You can also try the larger 360M for better performance in these demos:
https://huggingface.co/spaces/HuggingFaceTB/instant-smollm
https://huggingface.co/spaces/HuggingFaceTB/SmolLM-360M-Instruct-WebGPU

Thanks, but the idea was to check what such small models can do. I guess the whole approach shows that such small models have severe limitations. Maybe they can only be fine-tuned to be useful in some specific area. It certainly doesn't get simple facts about the world right; the 360M does way better. I wouldn't bother making it learn dates and such; having it give vague answers would be enough.

Danube 3 500M is actually surprisingly good for its size. Quantized to Q4 it's just 320MB. It's like talking to a basic version of Wikipedia. It outputs markdown too, which is a big plus. It's a very interesting development.
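For reference, here's a rough sketch of how a Q4 GGUF build of a model that size could be run locally with llama-cpp-python; the file name below is just a placeholder, not an official artifact.

```python
# Sketch: running a Q4-quantized ~500M GGUF model locally with llama-cpp-python.
# The model_path is a placeholder for whichever Q4 GGUF build you download.
from llama_cpp import Llama

llm = Llama(model_path="danube3-500m-chat.Q4_K_M.gguf", n_ctx=2048)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize photosynthesis in two sentences."}],
    max_tokens=150,
)
print(out["choices"][0]["message"]["content"])
```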

Thanks, I'll try that out. Did you ever try to fine-tune one? They can't need that much GPU. I think, for example, that such small models should return Python code when asked a math question; another part of the system could then execute it and return the values. Also, they don't need all the knowledge of Wikipedia, but then the answers should be vague and maybe also indicate some uncertainty. A sketch of the code-execution idea is below.
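As a rough sketch of that idea (the checkpoint name and prompt format are assumptions, not a tested recipe), the host could ask the model for a code block and execute whatever it returns in a controlled namespace:

```python
# Sketch: the small model writes Python for a math question,
# then the host extracts the code block and executes it.
import re
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="HuggingFaceTB/SmolLM-360M-Instruct",  # assumed checkpoint name
)

question = "What is 17% of 2340?"
messages = [{
    "role": "user",
    "content": "Answer with a single Python code block that prints the result.\n" + question,
}]

# Chat-style input returns the full conversation; the last message is the reply.
reply = generator(messages, max_new_tokens=128)[0]["generated_text"][-1]["content"]

# Pull out the first fenced code block and run it in an isolated namespace.
match = re.search(r"```(?:python)?\s*(.*?)```", reply, re.DOTALL)
if match:
    scope = {}
    exec(match.group(1), scope)  # a real system would sandbox this step
else:
    print("No code block returned:", reply)
```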
