Snowflake/snowflake-arctic-embed-m · Sentence Transformers integration

tomaarsen

Apr 16

edited Apr 16

Hello!

Pull Request overview

Add Sentence Transformers integration.

Details

This PR adds proper support for Sentence Transformers, the package often used in third-party embedding applications. It abstracts much of the transformers code away from the user and moves it into the configuration. As a result, the user can just use:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Snowflake/snowflake-arctic-embed-m")

queries = ['what is snowflake?', 'Where can I get the best tacos?']
documents = ['The Data Cloud!', 'Mexico City of Course!']

query_embeddings = model.encode(queries, prompt_name="query")
document_embeddings = model.encode(documents)

instead of manually loading both the model and the tokenizer, adding the query prompt themselves, computing the token embeddings, taking the CLS embedding, and then normalizing it.
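For illustration, the manual post-processing steps listed above can be sketched in plain Python. This is only a rough sketch, not the model's actual code: the nested lists stand in for the transformer's token embeddings, the helper names (`add_query_prompt`, `cls_pool`, `l2_normalize`) are hypothetical, and the `QUERY_PROMPT` string is an assumption (the real prompt lives in the model's Sentence Transformers configuration).

```python
import math

# Assumption: illustrative query prefix; the real one is stored under the
# "query" prompt name in the model's Sentence Transformers configuration.
QUERY_PROMPT = "Represent this sentence for searching relevant passages: "


def add_query_prompt(queries, prompt=QUERY_PROMPT):
    """Prepend the query prompt to every query string (hypothetical helper)."""
    return [prompt + query for query in queries]


def cls_pool(token_embeddings):
    """Keep the first ([CLS]) token vector of each sequence.

    token_embeddings is nested as [batch][token][hidden_dim].
    """
    return [sequence[0] for sequence in token_embeddings]


def l2_normalize(vectors):
    """Scale each vector to unit Euclidean length; zero vectors are left as-is."""
    normalized = []
    for vector in vectors:
        norm = math.sqrt(sum(x * x for x in vector))
        normalized.append([x / norm for x in vector] if norm > 0 else list(vector))
    return normalized


# Toy stand-in for the transformer output: 2 sequences, 3 tokens, 2 dimensions.
toy_token_embeddings = [
    [[3.0, 4.0], [1.0, 0.0], [0.0, 1.0]],
    [[0.0, 2.0], [5.0, 5.0], [1.0, 1.0]],
]

prompted = add_query_prompt(["what is snowflake?"])
embeddings = l2_normalize(cls_pool(toy_token_embeddings))
```

With the integration, `model.encode(queries, prompt_name="query")` takes care of these steps based on the configuration, so none of this needs to be written by hand.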

P.S. Sentence Transformers is maintained by Hugging Face.

Tom Aarsen

Add Sentence Transformers integration + README (3dad7c2b)

tomaarsen changed pull request status to open Apr 16

spacemanidol changed pull request status to merged Apr 16

KurtGD1915

Jul 22

Hi @tomaarsen. Would you please update the sentence-transformers version of this model to the latest version?

tomaarsen

Jul 22

Could you elaborate on that? You should be able to use this model with the latest version of Sentence Transformers.

For reference, the version in config_sentence_transformers.json is the lowest recommended Sentence Transformers version that should work. All newer versions should continue to work.
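As a rough sketch of what "lowest recommended version" means in practice, an installed version can be compared against the one recorded in the config. The `parse_version` and `meets_minimum` helpers are hypothetical, the `example_config` dict and its version string are illustrative assumptions rather than the file's confirmed contents, and the comparison deliberately ignores pre-release suffixes such as `.dev0`.

```python
import re


def parse_version(version):
    """Return the first three numeric components, e.g. "2.7.0.dev0" -> (2, 7, 0)."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at non-numeric suffixes such as "dev0"
        parts.append(int(match.group()))
    # Pad so that "3.0" and "3.0.0" compare as equal.
    return tuple((parts + [0, 0, 0])[:3])


def meets_minimum(installed, minimum):
    """True if the installed version is at least the recorded minimum."""
    return parse_version(installed) >= parse_version(minimum)


# Assumption: illustrative shape and value, not copied from the actual file.
example_config = {"__version__": {"sentence_transformers": "2.7.0.dev0"}}
minimum = example_config["__version__"]["sentence_transformers"]

newer_ok = meets_minimum("3.0.1", minimum)
older_ok = meets_minimum("2.6.1", minimum)
```

Here a newer installed version passes the check, while one older than the recorded version does not.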

Tom Aarsen

KurtGD1915

Jul 22

Oh nice. As the embedder is expected to work well with the newer versions of sentence-transformers, that's very convenient.

Thanks Tom Aarsen.