Tokens

#1
by Recognizeme

Can you tell me if you are still developing the model?

Are you looking to increase the number of tokens?

I'm not actively developing the model anymore, but it would be pretty straightforward to train it on more tokens! See: https://github.com/JohnGiorgi/DeCLUTR. Based on the results in the paper, I would expect increasing the size of the training set to have a large positive effect on performance.
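For anyone landing on this thread: below is a minimal sketch of getting embeddings from the released checkpoint with the `transformers` library. The model ID `johngiorgi/declutr-sci-base` and the example sentences are assumptions (the thread doesn't name the checkpoint); the masked mean pooling is the standard pattern for encoders of this kind, not necessarily the exact recipe from the repo.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Assumed checkpoint; swap in whichever DeCLUTR model this thread refers to.
tokenizer = AutoTokenizer.from_pretrained("johngiorgi/declutr-sci-base")
model = AutoModel.from_pretrained("johngiorgi/declutr-sci-base")

texts = [
    "Oncogenic KRAS mutations are common drivers in human cancers.",
    "Contrastive pretraining improves sentence-level representations.",
]
inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    output = model(**inputs)

# Mean-pool the token embeddings, using the attention mask to ignore padding.
mask = inputs["attention_mask"].unsqueeze(-1).float()
embeddings = (output.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)
print(embeddings.shape)  # (num_texts, hidden_size)
```

The resulting vectors can be compared with cosine similarity for retrieval or clustering over scientific text.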

That's a real shame. Your model is one of the best for getting embeddings from scientific text!
