Sentence Transformers integration

#2
by tomaarsen HF staff - opened

Hello!

Pull Request overview

  • Add Sentence Transformers integration.

Details

This PR adds proper support for Sentence Transformers, i.e. the package often used in third-party embedding applications. It abstracts away much of the transformers code from the user and moves it into the configuration instead. As a result, the user can just use:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Snowflake/snowflake-arctic-embed-m")

queries = ['what is snowflake?', 'Where can I get the best tacos?']
documents = ['The Data Cloud!', 'Mexico City of Course!']

query_embeddings = model.encode(queries, prompt_name="query")
document_embeddings = model.encode(documents)

instead of manually loading both the model and the tokenizer, adding the query prompt themselves, computing the token embeddings, taking the CLS embedding, and then normalizing it.
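For illustration, the last two manual steps (taking the CLS embedding and normalizing) can be sketched in plain Python. This is only a rough sketch: the token embeddings are made-up numbers rather than real model output, and the helper names `cls_pool` and `l2_normalize` are hypothetical, not functions from transformers or Sentence Transformers.

```python
import math


def cls_pool(token_embeddings):
    """Return the first ([CLS]) token vector of each sequence in the batch."""
    return [sequence[0] for sequence in token_embeddings]


def l2_normalize(vectors):
    """Scale each vector to unit Euclidean length (zero vectors are left as-is)."""
    normalized = []
    for vector in vectors:
        norm = math.sqrt(sum(x * x for x in vector))
        normalized.append([x / norm for x in vector] if norm > 0 else list(vector))
    return normalized


# Made-up batch: 2 sequences, 3 tokens each, hidden size 2.
# In a real pipeline these values would come from the transformer's output.
token_embeddings = [
    [[3.0, 4.0], [1.0, 1.0], [0.5, 0.5]],
    [[0.0, 2.0], [2.0, 2.0], [1.0, 0.0]],
]

sentence_embeddings = l2_normalize(cls_pool(token_embeddings))
print(sentence_embeddings)  # [[0.6, 0.8], [0.0, 1.0]]
```

With the integration in this PR, these steps are driven by the configuration, so `model.encode(...)` is all the user has to call.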

P.S. Sentence Transformers is maintained by Hugging Face.

  • Tom Aarsen
tomaarsen changed pull request status to open
spacemanidol changed pull request status to merged

Hi @tomaarsen. Would you please update the sentence-transformers version of this model to the latest version?

Could you elaborate on that? You should be able to use this model with the latest version of Sentence Transformers.

For reference, the version in config_sentence_transformers.json is the lowest recommended version of Sentence Transformers that should work. All newer versions should continue to work.
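To make the "lowest recommended version" idea concrete, here is a rough sketch of such a check. The JSON layout, the version numbers, and the helper names `version_tuple` and `is_supported` are all assumptions for illustration; the actual file contents and the library's own handling may differ.

```python
import json
import re


def version_tuple(version):
    """Turn '2.7.0.dev0' into (2, 7, 0), keeping only the leading numeric parts."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def is_supported(installed, minimum):
    """True if the installed version is at least the recommended minimum."""
    return version_tuple(installed) >= version_tuple(minimum)


# Hypothetical config contents and versions, made up for this example.
config_text = '{"__version__": {"sentence_transformers": "2.7.0"}}'
minimum = json.loads(config_text)["__version__"]["sentence_transformers"]
installed = "3.0.1"  # stand-in for sentence_transformers.__version__

print(is_supported(installed, minimum))  # True
```

In other words, the stored version acts as a floor, so there is nothing to update when a newer release comes out.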

  • Tom Aarsen

Oh nice. As the embedder is expected to work well with the newer versions of sentence-transformers, that's very convenient.

Thanks Tom Aarsen.
