Morgan's blog/fast-whisper-endpoints.md #2841
base: main
Conversation
Awesome!
fast-whisper-endpoints
Outdated
# How to deploy

You can deploy your own ASR inference pipeline via [Hugging Face Endpoints](https://endpoints.huggingface.co/catalog?task=automatic-speech-recognition). Endpoints allows anyone who wants to deploy AI models into production-ready environments to do so by filling in a few parameters. It also features the most complete fleet of AI hardware available on the market to suit your needs for cost and performance. All of this directly from where the AI community is being built. To get started, simply choose the model you want to deploy:
XXX image <img src="https://github.com/bigcode-bench/bigcode-bench.github.io/blob/main/asset/construct_pipeline.svg?raw=true" alt="png" style="display: block; margin-left: auto; margin-right: auto;">
Not sure how XXX Image will render
XXX image <img src="https://github.com/bigcode-bench/bigcode-bench.github.io/blob/main/asset/construct_pipeline.svg?raw=true" alt="png" style="display: block; margin-left: auto; margin-right: auto;">
<img src="https://github.com/bigcode-bench/bigcode-bench.github.io/blob/main/asset/construct_pipeline.svg?raw=true" alt="png" style="display: block; margin-left: auto; margin-right: auto;">
oh no sorry that was a placeholder so I wouldn't forget that's where we wanted to add a screenshot haha sorry, will add legit images
Nice!
fast-whisper-endpoints
Outdated
The file needs to have the `.md` extension. Also a reminder about adding an entry to `_blog.yml` and a thumbnail.
thank you! will do!
fast-whisper-endpoints
Outdated
- user: michellehbn
---

Today we are happy to introduce a new blazing fast OpenAI Whisper deployment option targeting [Hugging Face Inference Endpoints](https://endpoints.huggingface.co). This new addition to the Inference Endpoints catalog provides up to 8x performance improvements compared to the previous entry and makes everyone one click away from deploying dedicated powerful transcription models in a cost-effective way, leveraging the amazingon work done by the AI community.
Today we are happy to introduce a new blazing fast OpenAI Whisper deployment option targeting [Hugging Face Inference Endpoints](https://endpoints.huggingface.co). This new addition to the Inference Endpoints catalog provides up to 8x performance improvements compared to the previous entry and makes everyone one click away from deploying dedicated powerful transcription models in a cost-effective way, leveraging the amazingon work done by the AI community.
Today we are happy to introduce a new blazing fast OpenAI Whisper deployment option on [Hugging Face Inference Endpoints](https://endpoints.huggingface.co). It provides up to 8x performance improvements compared to the previous version, and makes everyone one click away from deploying dedicated, powerful transcription models in a cost-effective way, leveraging the amazing work done by the AI community.
fast-whisper-endpoints
Outdated
The unique position of Hugging Face, at the heart of the Open-Source AI Community, working hand-in-hand with individuals, institutions and industrial partners, makes it the most heterogeneous platform when it comes to deploying AI models for inference on a wide variety of hardware and software.

# Inference Stack
# Inference Stack
## Inference Stack

The post title will be rendered using a first-level heading, imo the sections look better using `##`.
fast-whisper-endpoints
Outdated
This endpoint is made available leveraging amazing open-source community projects. The inference is powered by the vLLM project, which provides efficient ways of running AI models on various hardwares, especially, but not limited to, NVIDIA GPUs. To get technical, we are using the vLLM implementation of OpenAI Whisper model, allowing us to enable further, lower-level, optimizations down the software stack.
This endpoint is made available leveraging amazing open-source community projects. The inference is powered by the vLLM project, which provides efficient ways of running AI models on various hardwares, especially, but not limited to, NVIDIA GPUs. To get technical, we are using the vLLM implementation of OpenAI Whisper model, allowing us to enable further, lower-level, optimizations down the software stack.
The new Whisper endpoint leverages amazing open-source community projects. Inference is powered by [the vLLM project](https://github.com/vllm-project/vllm), which provides efficient ways of running AI models on various hardware families – especially, but not limited to, NVIDIA GPUs. We use the vLLM implementation of OpenAI's Whisper model, allowing us to enable further, lower-level optimizations down the software stack.
fast-whisper-endpoints
Outdated
This endpoint is made available leveraging amazing open-source community projects. The inference is powered by the vLLM project, which provides efficient ways of running AI models on various hardwares, especially, but not limited to, NVIDIA GPUs. To get technical, we are using the vLLM implementation of OpenAI Whisper model, allowing us to enable further, lower-level, optimizations down the software stack.

In this initial release, we are targeting NVIDIA GPUs with compute capabilities greater than 8.9 (Ada Lovelace) like L4 & L40s, giving us a wide range of software optimizations:
In this initial release, we are targeting NVIDIA GPUs with compute capabilities greater than 8.9 (Ada Lovelace) like L4 & L40s, giving us a wide range of software optimizations:
In this initial release, we are targeting NVIDIA GPUs with compute capabilities 8.9 or better (Ada Lovelace), like L4 & L40s, which unlocks a wide range of software optimizations:
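One nit worth noting for the post: compute capability is an ordered (major, minor) pair, so "8.9 or better" is a simple tuple comparison. A minimal sketch of that check (the GPU-to-capability mapping below is illustrative only, not an exhaustive or authoritative list):

```python
# Baseline compute capability required by this release (Ada Lovelace).
ADA_LOVELACE = (8, 9)

# Illustrative (major, minor) compute capabilities for a few NVIDIA GPUs.
COMPUTE_CAPABILITY = {
    "L4": (8, 9),
    "L40S": (8, 9),
    "A100": (8, 0),   # Ampere: below the 8.9 baseline
    "H100": (9, 0),   # Hopper: above it, since 9 > 8 on the major version
}

def is_supported(gpu_name: str) -> bool:
    """Tuple comparison handles major/minor ordering correctly."""
    return COMPUTE_CAPABILITY.get(gpu_name, (0, 0)) >= ADA_LOVELACE
```

This is why H100 (9.0) qualifies even though "9.0" is numerically larger than "8.9" in a way a float comparison would get wrong for, say, capability 8.10.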
fast-whisper-endpoints
Outdated
Through this release, we would like to make Inference Endpoints more community-centric and allow anyone to come and contribute to create incredible inference deployments on the Hugging Face Platform. Along with the community, we would like to propose optimized deployments for a wide range of tasks through the use of awesome and available open-source technologies.

The unique position of Hugging Face, at the heart of the Open-Source AI Community, working hand-in-hand with individuals, institutions and industrial partners, makes it the most heterogeneous platform when it comes to deploying AI models for inference on a wide variety of hardware and software.
Not fully sold on this one (sounds a bit boilerplate to me), but can't think of a better option right now.
fast-whisper-endpoints
Outdated
You can deploy your own ASR inference pipeline via [Hugging Face Endpoints](https://endpoints.huggingface.co/catalog?task=automatic-speech-recognition). Endpoints allows anyone who wants to deploy AI models into production-ready environments to do so by filling in a few parameters. It also features the most complete fleet of AI hardware available on the market to suit your needs for cost and performance. All of this directly from where the AI community is being built. To get started, simply choose the model you want to deploy:
XXX image <img src="https://github.com/bigcode-bench/bigcode-bench.github.io/blob/main/asset/construct_pipeline.svg?raw=true" alt="png" style="display: block; margin-left: auto; margin-right: auto;">
You can see the correct and new container URL under Container Configuration: mfuntowicz/endpoints-whisper-vllm:v1.0.2-py312. |
You can see the correct and new container URL under Container Configuration: mfuntowicz/endpoints-whisper-vllm:v1.0.2-py312.
When you click on any of the Whisper ASR models, you'll see the new container URL when you click on Container Configuration: `mfuntowicz/endpoints-whisper-vllm:v1.0.2-py312`.
fast-whisper-endpoints
Outdated
Once the container is launched, you can access the model via the Endpoint URL:
XXX image <img src="https://github.com/bigcode-bench/bigcode-bench.github.io/blob/main/asset/construct_pipeline.svg?raw=true" alt="png" style="display: block; margin-left: auto; margin-right: auto;">

# Inference
# Inference
## Inference
fast-whisper-endpoints
Outdated
print("Transcript:", response.json()["text"])
```
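For context, the quoted snippet ends a minimal inference call. A self-contained sketch of what such a call might look like follows; the endpoint URL, token, route, and `audio/wav` content type are all placeholder assumptions for illustration, not confirmed API details:

```python
import json
import urllib.request

# Placeholders -- substitute your real endpoint URL and Hugging Face token.
# The route and content type are assumptions, not the documented API.
ENDPOINT_URL = "https://YOUR-ENDPOINT.endpoints.huggingface.cloud/api/v1/audio/transcriptions"
HF_TOKEN = "hf_xxx"

def transcribe(audio_path: str) -> str:
    """POST raw audio bytes to the deployed Whisper endpoint, return the transcript."""
    with open(audio_path, "rb") as f:
        audio = f.read()
    request = urllib.request.Request(
        ENDPOINT_URL,
        data=audio,
        headers={
            "Authorization": f"Bearer {HF_TOKEN}",
            "Content-Type": "audio/wav",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["text"]

# Usage, matching the snippet above:
# print("Transcript:", transcribe("sample.wav"))
```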
# FastRTC Demo |
# FastRTC Demo
## FastRTC Demo
fast-whisper-endpoints
Outdated
```
# FastRTC Demo

With this blazing fast endpoint, it’s possible to build real-time transcription apps. Try out this [example](https://huggingface.co/spaces/freddyaboulton/really-fast-whisper) built with FastRTC. Simply speak into your microphone and see your speech transcribed in real time!
With this blazing fast endpoint, it’s possible to build real-time transcription apps. Try out this [example](https://huggingface.co/spaces/freddyaboulton/really-fast-whisper) built with FastRTC. Simply speak into your microphone and see your speech transcribed in real time!
With this blazing fast endpoint, it’s possible to build real-time transcription apps. Try out this [example](https://huggingface.co/spaces/freddyaboulton/really-fast-whisper) built with [FastRTC](https://fastrtc.org). Simply speak into your microphone and see your speech transcribed in real time!
fast-whisper-endpoints
Outdated
- user: michellehbn
---

Today we are happy to introduce a new blazing fast OpenAI Whisper deployment option targeting [Hugging Face Inference Endpoints](https://endpoints.huggingface.co). This new addition to the Inference Endpoints catalog provides up to 8x performance improvements compared to the previous entry and makes everyone one click away from deploying dedicated powerful transcription models in a cost-effective way, leveraging the amazingon work done by the AI community.
Today we are happy to introduce a new blazing fast OpenAI Whisper deployment option targeting [Hugging Face Inference Endpoints](https://endpoints.huggingface.co). This new addition to the Inference Endpoints catalog provides up to 8x performance improvements compared to the previous entry and makes everyone one click away from deploying dedicated powerful transcription models in a cost-effective way, leveraging the amazingon work done by the AI community.
Today we are happy to introduce a new blazing fast OpenAI Whisper deployment option targeting [Inference Endpoints](https://endpoints.huggingface.co). This new addition to the Inference Endpoints catalog provides up to 8x performance improvements compared to the previous entry and makes everyone one click away from deploying dedicated powerful transcription models in a cost-effective way, leveraging the amazingon work done by the AI community.
Co-authored-by: Julien Chaumond <[email protected]>
with all suggested changes from reviewers!
Congratulations! You've made it this far! Once merged, the article will appear at https://huggingface.co/blog. Official articles
require additional reviews. Alternatively, you can write a community article following the process here.
Preparing the Article
You're not quite done yet, though. Please make sure to follow this process (as documented here):
`md` file. You can also specify `guest` or `org` for the authors.
Here is an example of a complete PR: #2382
Getting a Review
Please make sure to get a review from someone on your team or a co-author.
Once this is done and once all the steps above are completed, you should be able to merge.
There is no need for additional reviews if you and your co-authors are happy and meet all of the above.
Feel free to add @pcuenca as a reviewer if you want a final check. Keep in mind he'll be biased toward light reviews
(e.g., check for proper metadata) rather than content reviews unless explicitly asked.