So, is this based on OG Llama 3 or Llama 3.1?


During their last update they changed the config.json to indicate Llama 3 instead of Llama 3.1, and they also changed the context size back to 8K (down from 128K).
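For reference, the config.json fields that usually distinguish the two bases are max_position_embeddings and rope_scaling: stock Llama 3.1 ships a "llama3" rope_scaling block with a 131072 context, while original Llama 3 has rope_scaling set to null and 8192. A quick way to check what the repo currently declares (the model id below is a placeholder, not this repo's actual name):

```python
from transformers import AutoConfig

# Placeholder id; substitute the actual repo this discussion is about.
model_id = "org/model-in-question"

config = AutoConfig.from_pretrained(model_id)

# Original Llama 3: max_position_embeddings == 8192, rope_scaling is None.
# Llama 3.1:        max_position_embeddings == 131072 plus a rope_scaling
#                   dict with rope_type == "llama3".
print("max_position_embeddings:", config.max_position_embeddings)
print("rope_scaling:", getattr(config, "rope_scaling", None))
```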

Do you have any idea whether this was a fluke, or whether this model was indeed finetuned from the original Llama 3?

Yeah, that's a confusing one I'm not sure about. If it's indeed based on Llama 3.1, it should be fine to apply RoPE scaling to it and get back the normal context length (a rough sketch of that is below), but it may struggle a bit to use its CoT past 8K context, since I assume that's how long their training samples were.

That said, I wouldn't be shocked if it was still able to generalize the CoT portion, especially since by the time you're past 8K context (unless it's all ingestion) it'll basically have multi-prompt examples of how to do it.
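To make the RoPE point concrete, here's a minimal sketch of re-applying the stock Llama 3.1 long-context settings through transformers. This is only an assumption about how you'd do it, not this model's official config: the model id is a placeholder, the scaling values are copied from the standard Llama 3.1 config, and it only makes sense if the weights really are 3.1-based (needs a transformers version recent enough to support the "llama3" rope type).

```python
from transformers import AutoConfig, AutoModelForCausalLM

# Placeholder id; substitute the actual repo.
model_id = "org/model-in-question"

config = AutoConfig.from_pretrained(model_id)

# Stock Llama 3.1 long-context settings; an assumption, only valid if the
# weights were actually finetuned from Llama 3.1 rather than Llama 3.
config.max_position_embeddings = 131072
config.rope_scaling = {
    "rope_type": "llama3",
    "factor": 8.0,
    "low_freq_factor": 1.0,
    "high_freq_factor": 4.0,
    "original_max_position_embeddings": 8192,
}

model = AutoModelForCausalLM.from_pretrained(model_id, config=config)
```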

Another user pointed me to the part of the video that explains it: they did use Llama 3.1, but they finetuned it mostly on 8K-token samples. That still doesn't explain why they changed the name as well instead of only the max context, but here we are.

Source: YouTube video https://www.youtube.com/live/5_m-kN64Exc?si=1sj_CsnOE-lOjdQU around the 33:00 mark.
