Is this the same model as the Jina API?

#14
by mjrdbds - opened

Thank you for this great model! This is as much a question about this model as about the Jina AI API.

I'm trying to reconcile the results of this model with the API: I get different results for the same image.

I'm using the default model, initialized as shown here, alongside the Jina AI API:

curl https://api.jina.ai/v1/embeddings \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <MY_TOKEN>" \
  -d '{
    "input": [
     {"url": "https://fastly.picsum.photos/id/84/1280/848.jpg?hmac=YFRYDI4UsfbeTzI8ZakNOR98wVU7a-9a2tGF542539s"}],
    "model": "jina-clip-v1",
    "encoding_type": "float"
  }'

from transformers import AutoModel
model = AutoModel.from_pretrained('jinaai/jina-clip-v1', trust_remote_code=True)
model.encode_image(["https://fastly.picsum.photos/id/84/1280/848.jpg?hmac=YFRYDI4UsfbeTzI8ZakNOR98wVU7a-9a2tGF542539s"])

These two vectors seem completely different. Is there any guarantee about which model is behind the API?

Jina AI org

Hey @mjrdbds ,

This is the same model; let us check what may be going on.

Jina AI org

Hey @mjrdbds ,

The discrepancy comes from the API not returning a normalized vector.

If you normalize the vector from the API, you will get the same result as the local one.
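
Something like the following should reproduce the check (a minimal sketch, assuming the API response follows the usual {"data": [{"embedding": [...]}]} layout; the <MY_TOKEN> placeholder and the comparison tolerance are just for illustration):

import numpy as np
import requests
from transformers import AutoModel

url = "https://fastly.picsum.photos/id/84/1280/848.jpg?hmac=YFRYDI4UsfbeTzI8ZakNOR98wVU7a-9a2tGF542539s"

# Embedding from the API (not L2-normalized at the time of this discussion)
resp = requests.post(
    "https://api.jina.ai/v1/embeddings",
    headers={"Authorization": "Bearer <MY_TOKEN>"},
    json={"model": "jina-clip-v1", "encoding_type": "float", "input": [{"url": url}]},
)
api_vec = np.asarray(resp.json()["data"][0]["embedding"])

# Embedding from the local model (L2-normalized)
model = AutoModel.from_pretrained("jinaai/jina-clip-v1", trust_remote_code=True)
local_vec = np.asarray(model.encode_image([url])[0])

# After normalizing the API vector, the two should (nearly) match
api_vec = api_vec / np.linalg.norm(api_vec)
print(np.allclose(api_vec, local_vec, atol=1e-3))  # expected: True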

Ah, this fixed it. Thank you, Joan, for such a quick response.

For the record: I was hoping to use the API here (and compensate you for this great model), but I couldn't get the rate limits raised quickly enough or find an easy way to contact support.

I'll work around it, but I do wish I could compensate you and save myself a few hours of infra work ;)

mjrdbds changed discussion status to closed
Jina AI org

@mjrdbds Thank you for your support! We're now enabling normalization on the API, so you can go ahead and use it and save some effort on your end.

Out of curiosity, what kind of rate limit are you hitting on the API, and what limit would you be satisfied with?

Jina AI org

Hello @mjrdbds ,

Just out of curiosity, what exactly are you running into? Is it the rate limit, and if so, how many requests per minute are you sending? Or is it the latency or throughput?

Sure. I would have wanted to embed about 10k docs of 1k tokens each in 10 seconds, so roughly 1M tokens per second. I'm not very picky about latency; I care about the 10-second turnaround for quick iteration on our search stack.

For now I'm going through Modal to do this instead, but I'd be happy to switch back to your API.
