Inference Endpoint

#5
by iamrobotbear - opened

Is it possible to deploy this on a Huggingface inference endpoint?

@ZhengPeng7 Thanks, really great work!

Definitely! Actually, I'm still new to some Hugging Face interfaces, so if there's any extra functionality you'd like me to add, don't hesitate to let me know :)
I'm about to finish it today.

Hey, robotbear, it seems the inference endpoint requires payment for deployment? If so, I might not be able to afford that.
But you can still easily load the model and use it via the instructions here: https://huggingface.co/ZhengPeng7/BiRefNet#how-to-use or via the Space.
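
For reference, here is a minimal sketch of that usage, assuming the transformers-based loading shown in the model card (the 1024x1024 resize and ImageNet normalization are taken from those instructions, so double-check them against the current version):

import torch
from PIL import Image
from torchvision import transforms
from transformers import AutoModelForImageSegmentation

# Load BiRefNet together with its custom modeling code from the Hub
birefnet = AutoModelForImageSegmentation.from_pretrained(
    "ZhengPeng7/BiRefNet", trust_remote_code=True
)
birefnet.eval()

# Preprocessing per the model card: 1024x1024 resize + ImageNet normalization
transform = transforms.Compose([
    transforms.Resize((1024, 1024)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

image = Image.open("example.jpg").convert("RGB")  # any local image
inputs = transform(image).unsqueeze(0)

with torch.no_grad():
    # The model returns multi-scale outputs; the last one is the final mask
    preds = birefnet(inputs)[-1].sigmoid().cpu()

transforms.ToPILImage()(preds[0].squeeze()).save("mask.png")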

Hi, @iamrobotbear , you can try the inference API from FAL: https://fal.ai/models/fal-ai/birefnet, which is great and cheap. I've been in close contact with them, so if you run into any problems using it, you can leave them to me. I'll find time to talk with them to keep it as good as possible!
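
In case it helps, calling a FAL-hosted model from Python usually goes through their fal-client package. A minimal sketch; the argument name (image_url) and the result shape are my assumptions about the schema, so check the model page above for the authoritative one:

import fal_client  # pip install fal-client; requires the FAL_KEY env var

# Submit a request to the hosted BiRefNet app and wait for the result
result = fal_client.subscribe(
    "fal-ai/birefnet",
    arguments={"image_url": "https://example.com/input.jpg"},  # assumed input name
)
print(result)  # typically a dict with the URL of the output image/mask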

@ZhengPeng7 Thanks, I was hoping it would get deployed on Fal as well.

I appreciate the instructions here: https://huggingface.co/ZhengPeng7/BiRefNet#how-to-use. I had followed those, but I still want to try Hugging Face Inference Endpoints, since they allow auto-sleeping when the GPU is not in use and consequently zero billing.

Thanks!

Hi, Brian. Do you mean that you want to deploy BiRefNet and prefer HF Inference Endpoints since they cost you nothing when the endpoint is not active? I think that feature is also on FAL (cost is counted per inference call). When I'm free, I'll also try to spend some money on the HF Inference Endpoints and tell you whether they're worth keeping long-term.
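
For context, once such a dedicated endpoint is deployed, calling it is just an authenticated POST; an endpoint that has scaled to zero typically answers 503 while it wakes up, so the client retries. A minimal sketch, with a hypothetical endpoint URL:

import time
import requests

ENDPOINT_URL = "https://your-endpoint.us-east-1.aws.endpoints.huggingface.cloud"  # hypothetical; use your own
HF_TOKEN = "hf_..."  # your Hugging Face access token

def query(image_path, max_retries=10):
    with open(image_path, "rb") as f:
        data = f.read()
    headers = {"Authorization": f"Bearer {HF_TOKEN}"}
    for _ in range(max_retries):
        resp = requests.post(ENDPOINT_URL, headers=headers, data=data)
        if resp.status_code == 503:
            time.sleep(10)  # endpoint is waking up from scale-to-zero; retry
            continue
        resp.raise_for_status()
        return resp.content
    raise TimeoutError("Endpoint did not wake up in time")

mask_bytes = query("example.jpg")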

Hi @brianjking ,
this model does not have an inference endpoint since it was not created within the transformers library.
Another good alternative would be to use one of the live Spaces running BiRefNet:

Scroll all the way down and you will find a button saying "Use via API". Click on it to open the API docs and implement the logic in your code.


here's an example of how to use one of the APIs in Python:

from gradio_client import Client, handle_file
from PIL import Image

# Connect to the Space's Gradio API
client = Client("not-lain/background-removal")

# Call the /image endpoint; handle_file accepts a URL or a local path
result = client.predict(
        image=handle_file('https://huggingface.co/spaces/not-lain/background-removal/resolve/main/chameleon.jpg'), # or any local image
        api_name="/image"
)

# The endpoint returns file paths; open the first output with PIL
Image.open(result[0])

Using Gradio APIs has an advantage over inference endpoints: you are not bound by a limited number of API calls (normal inference APIs have a limited number of calls per hour), and you do not need tokens or any sort of authentication.

Let me know if you have any further feedback.

Regards
Hafedh Hichri (Lain)

Wow, Hafedh (Lain), thank you so much for the patient explanation! That's truly amazing! I never knew Spaces had this function, which is really valuable 🎖️. I also tried your Space demo; the URL interface is very cool. Many thanks, I learnt a lot from you :)

Best Regards
Peng Zheng

Anytime @ZhengPeng7 🤗

@not-lain @ZhengPeng7

First off, thank you so much for the great model and for your assistance so far!

Secondly, please allow me to explain a couple of things:

  1. The reason I want to use Hugging Face Inference Endpoints is that I'm looking to use this for an enterprise use case where: a: privacy is essential; b: I need to constrain costs, and Hugging Face Inference Endpoints (https://huggingface.co/docs/inference-endpoints/index) cost nothing when they go to sleep and auto-wake when pinged, making them an easy way to get up to speed quickly; c: they can be hosted on AWS or Azure infrastructure, names that enterprises outside of the generative AI space are familiar with and trust.

  2. I love Fal and would love to use their endpoint, but I go back to points a and c from item #1.

  3. Gradio apps offer endpoints, but they require the app to be running the entire time.

Thanks!
