Morgan's blog/fast-whisper-endpoints.md #2841


Draft · wants to merge 6 commits into base: main

Conversation

Michellehbn (Member)

Congratulations! You've made it this far! Once merged, the article will appear at https://huggingface.co/blog. Official articles
require additional reviews. Alternatively, you can write a community article following the process here.

Preparing the Article

You're not quite done yet, though. Please make sure to follow this process (as documented here):

  • Add an entry to _blog.yml.
  • Add a thumbnail. There are no requirements here, but there is a template if it's helpful.
  • Check you use a short title and blog path.
  • Upload any additional assets (such as images) to the Documentation Images repo. This is to reduce bloat in the GitHub base repo when cloning and pulling. Try to have small images to avoid a slow or expensive user experience.
  • Add metadata (such as authors) to your md file. You can also specify guest or org for the authors.
  • Ensure the publication date is correct.
  • Preview the content. A quick way is to paste the markdown content in https://huggingface.co/new-blog. Do not click publish, this is just a way to do an early check.

Here is an example of a complete PR: #2382

Getting a Review

Please make sure to get a review from someone on your team or a co-author.
Once this is done and once all the steps above are completed, you should be able to merge.
There is no need for additional reviews if you and your co-authors are happy and meet all of the above.

Feel free to add @pcuenca as a reviewer if you want a final check. Keep in mind he'll be biased toward light reviews
(e.g., check for proper metadata) rather than content reviews unless explicitly asked.

@freddyaboulton (Contributor) left a comment

Awesome!

# How to deploy

You can deploy your own ASR inference pipeline via [Hugging Face Endpoints](https://endpoints.huggingface.co/catalog?task=automatic-speech-recognition). Endpoints lets anyone deploy AI models into production-ready environments by filling in a few parameters, and it features the most complete fleet of AI hardware available on the market to suit your cost and performance needs, all directly from where the AI community is being built. To get started, simply choose the model you want to deploy:
XXX image <img src="https://github.com/bigcode-bench/bigcode-bench.github.io/blob/main/asset/construct_pipeline.svg?raw=true" alt="png" style="display: block; margin-left: auto; margin-right: auto;">
Contributor

Not sure how XXX Image will render

Suggested change
XXX image <img src="https://github.com/bigcode-bench/bigcode-bench.github.io/blob/main/asset/construct_pipeline.svg?raw=true" alt="png" style="display: block; margin-left: auto; margin-right: auto;">
<img src="https://github.com/bigcode-bench/bigcode-bench.github.io/blob/main/asset/construct_pipeline.svg?raw=true" alt="png" style="display: block; margin-left: auto; margin-right: auto;">

Member Author

oh no sorry that was a placeholder so I wouldn't forget that's where we wanted to add a screenshot haha sorry, will add legit images

@pcuenca (Member) left a comment

Nice!

Member

The file needs to have the .md extension. Also, a reminder about adding an entry to _blog.yml and a thumbnail.

Member Author

thank you! will do!

- user: michellehbn
---

Today we are happy to introduce a new blazing fast OpenAI Whisper deployment option targeting [Hugging Face Inference Endpoints](https://endpoints.huggingface.co). This new addition to the Inference Endpoints catalog provides up to 8x performance improvements compared to the previous entry and makes everyone one click away from deploying dedicated powerful transcription models in a cost-effective way, leveraging the amazingon work done by the AI community.
Member

Suggested change
Today we are happy to introduce a new blazing fast OpenAI Whisper deployment option targeting [Hugging Face Inference Endpoints](https://endpoints.huggingface.co). This new addition to the Inference Endpoints catalog provides up to 8x performance improvements compared to the previous entry and makes everyone one click away from deploying dedicated powerful transcription models in a cost-effective way, leveraging the amazingon work done by the AI community.
Today we are happy to introduce a new blazing fast OpenAI Whisper deployment option on [Hugging Face Inference Endpoints](https://endpoints.huggingface.co). It provides up to 8x performance improvements compared to the previous version, and makes everyone one click away from deploying dedicated, powerful transcription models in a cost-effective way, leveraging the amazing work done by the AI community.


The unique position of Hugging Face, at the heart of the Open-Source AI Community, working hand-in-hand with individuals, institutions and industrial partners, makes it the most heterogeneous platform when it comes to deploying AI models for inference on a wide variety of hardware and software.

# Inference Stack
Member

Suggested change
# Inference Stack
## Inference Stack

The post title will be rendered using a first-level heading, imo the sections look better using ##.


# Inference Stack

This endpoint is made available leveraging amazing open-source community projects. The inference is powered by the vLLM project, which provides efficient ways of running AI models on various hardwares, especially, but not limited to, NVIDIA GPUs. To get technical, we are using the vLLM implementation of OpenAI Whisper model, allowing us to enable further, lower-level, optimizations down the software stack.
Member

Suggested change
This endpoint is made available leveraging amazing open-source community projects. The inference is powered by the vLLM project, which provides efficient ways of running AI models on various hardwares, especially, but not limited to, NVIDIA GPUs. To get technical, we are using the vLLM implementation of OpenAI Whisper model, allowing us to enable further, lower-level, optimizations down the software stack.
The new Whisper endpoint leverages amazing open-source community projects. Inference is powered by [the vLLM project](https://github.com/vllm-project/vllm), which provides efficient ways of running AI models on various hardware families – especially, but not limited to, NVIDIA GPUs. We use the vLLM implementation of OpenAI's Whisper model, allowing us to enable further, lower-level optimizations down the software stack.


This endpoint is made available leveraging amazing open-source community projects. The inference is powered by the vLLM project, which provides efficient ways of running AI models on various hardwares, especially, but not limited to, NVIDIA GPUs. To get technical, we are using the vLLM implementation of OpenAI Whisper model, allowing us to enable further, lower-level, optimizations down the software stack.

In this initial release, we are targeting NVIDIA GPUs with compute capabilities greater than 8.9 (Ada Lovelace) like L4 & L40s, giving us a wide range of software optimizations:
Member

Suggested change
In this initial release, we are targeting NVIDIA GPUs with compute capabilities greater than 8.9 (Ada Lovelace) like L4 & L40s, giving us a wide range of software optimizations:
In this initial release, we are targeting NVIDIA GPUs with compute capabilities 8.9 or better (Ada Lovelace), like L4 & L40s, which unlocks a wide range of software optimizations:
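The capability gate described above can be sketched as a small helper. This is an illustrative aside, not code from the post: the function name is ours, and in practice the capability tuple would come from `torch.cuda.get_device_capability()` rather than being hard-coded.

```python
def supports_ada_optimizations(capability: tuple) -> bool:
    """Return True when a GPU's compute capability unlocks the optimizations
    targeted in this release (8.9 = Ada Lovelace, e.g. L4 / L40s)."""
    # Tuple comparison also admits newer architectures: (9, 0) >= (8, 9).
    return capability >= (8, 9)

# Hard-coded capability tuples for illustration only.
print(supports_ada_optimizations((8, 9)))  # L4 / L40s (Ada Lovelace) -> True
print(supports_ada_optimizations((8, 6)))  # A10G (Ampere) -> False
```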


Through this release, we would like to make Inference Endpoints more community-centric and allow anyone to contribute and create incredible inference deployments on the Hugging Face Platform. Together with the community, we would like to propose optimized deployments for a wide range of tasks, built on awesome, freely available open-source technologies.

The unique position of Hugging Face, at the heart of the Open-Source AI Community, working hand-in-hand with individuals, institutions and industrial partners, makes it the most heterogeneous platform when it comes to deploying AI models for inference on a wide variety of hardware and software.
Member

Not fully sold on this one (sounds a bit boilerplate to me), but can't think of a better option right now.

You can deploy your own ASR inference pipeline via [Hugging Face Endpoints](https://endpoints.huggingface.co/catalog?task=automatic-speech-recognition). Endpoints lets anyone deploy AI models into production-ready environments by filling in a few parameters, and it features the most complete fleet of AI hardware available on the market to suit your cost and performance needs, all directly from where the AI community is being built. To get started, simply choose the model you want to deploy:
XXX image <img src="https://github.com/bigcode-bench/bigcode-bench.github.io/blob/main/asset/construct_pipeline.svg?raw=true" alt="png" style="display: block; margin-left: auto; margin-right: auto;">

You can see the correct and new container URL under Container Configuration: mfuntowicz/endpoints-whisper-vllm:v1.0.2-py312.
Member

Suggested change
You can see the correct and new container URL under Container Configuration: mfuntowicz/endpoints-whisper-vllm:v1.0.2-py312.
When you click on any of the Whisper ASR models, you'll see the new container URL when you click on Container Configuration: `mfuntowicz/endpoints-whisper-vllm:v1.0.2-py312`.
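As an alternative to the UI flow, the same deployment can be sketched programmatically. A hedged example: `create_inference_endpoint` exists in `huggingface_hub`, but the endpoint name, vendor, region, and instance values below are illustrative placeholders; only the container URL comes from the text above.

```python
# Container image from the post; everything else is an illustrative placeholder.
CONTAINER_URL = "mfuntowicz/endpoints-whisper-vllm:v1.0.2-py312"

endpoint_config = {
    "name": "fast-whisper-demo",        # placeholder endpoint name
    "repository": "openai/whisper-large-v3",
    "task": "automatic-speech-recognition",
    "vendor": "aws",                    # placeholder cloud provider
    "region": "us-east-1",              # placeholder region
    "accelerator": "gpu",
    "instance_type": "nvidia-l4",       # an Ada Lovelace card, per the post
    "instance_size": "x1",
    "custom_image": {"url": CONTAINER_URL},
}

if __name__ == "__main__":
    # Requires `pip install huggingface_hub` and a logged-in HF token.
    from huggingface_hub import create_inference_endpoint

    endpoint = create_inference_endpoint(type="protected", **endpoint_config)
    endpoint.wait()  # block until the endpoint is up
    print(endpoint.url)
```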

Once the container is launched, you can access the model via the Endpoint URL:
XXX image <img src="https://github.com/bigcode-bench/bigcode-bench.github.io/blob/main/asset/construct_pipeline.svg?raw=true" alt="png" style="display: block; margin-left: auto; margin-right: auto;">

# Inference
Member

Suggested change
# Inference
## Inference
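A minimal, self-contained sketch of querying the deployed endpoint. The `/v1/audio/transcriptions` route and the `file` field name are assumptions based on vLLM's OpenAI-compatible server, not details taken from the post; the URL and token are placeholders to substitute with your own.

```python
# Placeholders: substitute the Endpoint URL shown in the UI and a valid HF token.
ENDPOINT_URL = "https://YOUR-ENDPOINT.endpoints.huggingface.cloud"
HF_TOKEN = "hf_..."

def build_transcription_request(endpoint_url: str, token: str) -> dict:
    """Assemble the URL and auth header for a transcription call
    (route name assumed from vLLM's OpenAI-compatible API)."""
    return {
        "url": endpoint_url.rstrip("/") + "/v1/audio/transcriptions",
        "headers": {"Authorization": f"Bearer {token}"},
    }

def transcribe(endpoint_url: str, token: str, audio_path: str) -> str:
    import requests  # imported here so the helper above stays dependency-free

    req = build_transcription_request(endpoint_url, token)
    with open(audio_path, "rb") as f:
        response = requests.post(req["url"], headers=req["headers"],
                                 files={"file": f})
    response.raise_for_status()
    return response.json()["text"]

if __name__ == "__main__":
    print("Transcript:", transcribe(ENDPOINT_URL, HF_TOKEN, "sample.wav"))
```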


```python
print("Transcript:", response.json()["text"])
```

# FastRTC Demo

Member

Suggested change
# FastRTC Demo
## FastRTC Demo

# FastRTC Demo

With this blazing fast endpoint, it’s possible to build real-time transcription apps. Try out this [example](https://huggingface.co/spaces/freddyaboulton/really-fast-whisper) built with FastRTC. Simply speak into your microphone and see your speech transcribed in real time!
Member

Suggested change
With this blazing fast endpoint, it’s possible to build real-time transcription apps. Try out this [example](https://huggingface.co/spaces/freddyaboulton/really-fast-whisper) built with FastRTC. Simply speak into your microphone and see your speech transcribed in real time!
With this blazing fast endpoint, it’s possible to build real-time transcription apps. Try out this [example](https://huggingface.co/spaces/freddyaboulton/really-fast-whisper) built with [FastRTC](https://fastrtc.org). Simply speak into your microphone and see your speech transcribed in real time!

- user: michellehbn
---

Today we are happy to introduce a new blazing fast OpenAI Whisper deployment option targeting [Hugging Face Inference Endpoints](https://endpoints.huggingface.co). This new addition to the Inference Endpoints catalog provides up to 8x performance improvements compared to the previous entry and makes everyone one click away from deploying dedicated powerful transcription models in a cost-effective way, leveraging the amazingon work done by the AI community.
Member

Suggested change
Today we are happy to introduce a new blazing fast OpenAI Whisper deployment option targeting [Hugging Face Inference Endpoints](https://endpoints.huggingface.co). This new addition to the Inference Endpoints catalog provides up to 8x performance improvements compared to the previous entry and makes everyone one click away from deploying dedicated powerful transcription models in a cost-effective way, leveraging the amazingon work done by the AI community.
Today we are happy to introduce a new blazing fast OpenAI Whisper deployment option targeting [Inference Endpoints](https://endpoints.huggingface.co). This new addition to the Inference Endpoints catalog provides up to 8x performance improvements compared to the previous entry and makes everyone one click away from deploying dedicated powerful transcription models in a cost-effective way, leveraging the amazingon work done by the AI community.

Co-authored-by: Julien Chaumond <[email protected]>
@Michellehbn Michellehbn changed the title Morgan's blog/fast-whisper-endpoints Morgan's blog/fast-whisper-endpoints.md May 9, 2025