Documents in fineweb dataset may exceed max context length of this classifier

#6
by divonysus - opened

How are these pieces dealt with in fineweb-edu curation?

HuggingFaceFW org

Samples are truncated to the model's context length. You can find the inference code here: https://github.com/huggingface/cosmopedia/blob/main/classification/run_edu_bert.py
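To make the effect of truncation concrete, here is a minimal sketch, not the code from the linked script. It assumes a 512-token limit (typical for BERT-style encoders, but not stated in this thread) and uses a hypothetical whitespace tokenizer, `toy_tokenize`, as a stand-in for the real one. The helper name `truncate_tokens` is also made up for illustration.

```python
from typing import List

# Assumption: 512 is a common limit for BERT-style encoders. The real value
# comes from the classifier's own tokenizer/model config, not from this thread.
MAX_LENGTH = 512


def toy_tokenize(text: str) -> List[str]:
    """Hypothetical stand-in tokenizer: splits on whitespace."""
    return text.split()


def truncate_tokens(tokens: List[str], max_length: int = MAX_LENGTH) -> List[str]:
    """Keep only the first `max_length` tokens; everything after is dropped."""
    if max_length < 0:
        raise ValueError("max_length must be non-negative")
    return tokens[:max_length]


# A document that is longer than the assumed context length.
long_doc = " ".join(f"word{i}" for i in range(1000))
kept = truncate_tokens(toy_tokenize(long_doc))

# Only the leading part of the document would reach the classifier.
print(len(kept), kept[0], kept[-1])
```

With a Hugging Face tokenizer, the usual way to get this behaviour is to pass `truncation=True` (and optionally `max_length`) when calling the tokenizer; check the linked script for the exact arguments it uses. One consequence, under this sketch's assumptions, is that the score for a long document reflects only its beginning.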
