Morgan's blog/fast-whisper-endpoints.md #2841


Draft · wants to merge 6 commits into base: main

Conversation

Michellehbn (Member)

Congratulations! You've made it this far! Once merged, the article will appear at https://huggingface.co/blog. Official articles
require additional reviews. Alternatively, you can write a community article following the process here.

Preparing the Article

You're not quite done yet, though. Please make sure to follow this process (as documented here):

  • Add an entry to _blog.yml.
  • Add a thumbnail. There are no requirements here, but there is a template if it's helpful.
  • Check you use a short title and blog path.
  • Upload any additional assets (such as images) to the Documentation Images repo. This is to reduce bloat in the GitHub base repo when cloning and pulling. Try to have small images to avoid a slow or expensive user experience.
  • Add metadata (such as authors) to your md file. You can also specify guest or org for the authors.
  • Ensure the publication date is correct.
  • Preview the content. A quick way is to paste the markdown content in https://huggingface.co/new-blog. Do not click publish, this is just a way to do an early check.

Here is an example of a complete PR: #2382

Getting a Review

Please make sure to get a review from someone on your team or a co-author.
Once this is done and once all the steps above are completed, you should be able to merge.
There is no need for additional reviews if you and your co-authors are happy and meet all of the above.

Feel free to add @pcuenca as a reviewer if you want a final check. Keep in mind he'll be biased toward light reviews
(e.g., check for proper metadata) rather than content reviews unless explicitly asked.

@freddyaboulton (Contributor) left a comment

Awesome!

# How to deploy

You can deploy your own ASR inference pipeline via [Hugging Face Endpoints](https://endpoints.huggingface.co/catalog?task=automatic-speech-recognition). Endpoints lets anyone deploy AI models into production-ready environments by filling in a few parameters, and it features the most complete fleet of AI hardware available on the market to suit your cost and performance needs, all directly from where the AI community is being built. To get started, simply choose the model you want to deploy:
XXX image <img src="https://github.com/bigcode-bench/bigcode-bench.github.io/blob/main/asset/construct_pipeline.svg?raw=true" alt="png" style="display: block; margin-left: auto; margin-right: auto;">
Contributor

Not sure how XXX Image will render

Suggested change
XXX image <img src="https://github.com/bigcode-bench/bigcode-bench.github.io/blob/main/asset/construct_pipeline.svg?raw=true" alt="png" style="display: block; margin-left: auto; margin-right: auto;">
<img src="https://github.com/bigcode-bench/bigcode-bench.github.io/blob/main/asset/construct_pipeline.svg?raw=true" alt="png" style="display: block; margin-left: auto; margin-right: auto;">

Member Author

oh no sorry that was a placeholder so I wouldn't forget that's where we wanted to add a screenshot haha sorry, will add legit images

@pcuenca (Member) left a comment

Nice!

Member

The file needs to have the .md extension. Also, a reminder about adding an entry to _blog.yml and a thumbnail.

Member Author

thank you! will do!

- user: michellehbn
---

Today we are happy to introduce a new blazing fast OpenAI Whisper deployment option targeting [Hugging Face Inference Endpoints](https://endpoints.huggingface.co). This new addition to the Inference Endpoints catalog provides up to 8x performance improvements compared to the previous entry and makes everyone one click away from deploying dedicated powerful transcription models in a cost-effective way, leveraging the amazingon work done by the AI community.
Member

Suggested change
Today we are happy to introduce a new blazing fast OpenAI Whisper deployment option targeting [Hugging Face Inference Endpoints](https://endpoints.huggingface.co). This new addition to the Inference Endpoints catalog provides up to 8x performance improvements compared to the previous entry and makes everyone one click away from deploying dedicated powerful transcription models in a cost-effective way, leveraging the amazingon work done by the AI community.
Today we are happy to introduce a new blazing fast OpenAI Whisper deployment option on [Hugging Face Inference Endpoints](https://endpoints.huggingface.co). It provides up to 8x performance improvements compared to the previous version, and makes everyone one click away from deploying dedicated, powerful transcription models in a cost-effective way, leveraging the amazing work done by the AI community.


The unique position of Hugging Face, at the heart of the Open-Source AI Community, working hand-in-hand with individuals, institutions and industrial partners, makes it the most heterogeneous platform when it comes to deploying AI models for inference on a wide variety of hardware and software.

# Inference Stack
Member

Suggested change
# Inference Stack
## Inference Stack

The post title will be rendered using a first-level heading, imo the sections look better using ##.


# Inference Stack

This endpoint is made available leveraging amazing open-source community projects. The inference is powered by the vLLM project, which provides efficient ways of running AI models on various hardwares, especially, but not limited to, NVIDIA GPUs. To get technical, we are using the vLLM implementation of OpenAI Whisper model, allowing us to enable further, lower-level, optimizations down the software stack.
Member

Suggested change
This endpoint is made available leveraging amazing open-source community projects. The inference is powered by the vLLM project, which provides efficient ways of running AI models on various hardwares, especially, but not limited to, NVIDIA GPUs. To get technical, we are using the vLLM implementation of OpenAI Whisper model, allowing us to enable further, lower-level, optimizations down the software stack.
The new Whisper endpoint leverages amazing open-source community projects. Inference is powered by [the vLLM project](https://github.com/vllm-project/vllm), which provides efficient ways of running AI models on various hardware families – especially, but not limited to, NVIDIA GPUs. We use the vLLM implementation of OpenAI's Whisper model, allowing us to enable further, lower-level optimizations down the software stack.


This endpoint is made available leveraging amazing open-source community projects. The inference is powered by the vLLM project, which provides efficient ways of running AI models on various hardwares, especially, but not limited to, NVIDIA GPUs. To get technical, we are using the vLLM implementation of OpenAI Whisper model, allowing us to enable further, lower-level, optimizations down the software stack.

In this initial release, we are targeting NVIDIA GPUs with compute capabilities greater than 8.9 (Ada Lovelace) like L4 & L40s, giving us a wide range of software optimizations:
Member

Suggested change
In this initial release, we are targeting NVIDIA GPUs with compute capabilities greater than 8.9 (Ada Lovelace) like L4 & L40s, giving us a wide range of software optimizations:
In this initial release, we are targeting NVIDIA GPUs with compute capabilities 8.9 or better (Ada Lovelace), like L4 & L40s, which unlocks a wide range of software optimizations:
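The capability gate described above can be sketched as a small helper. This is an illustrative aside, not code from the post: the function name is ours, and in practice the capability tuple would come from `torch.cuda.get_device_capability()` rather than being hard-coded.

```python
def supports_ada_optimizations(capability: tuple) -> bool:
    """Return True when a GPU's compute capability unlocks the optimizations
    targeted in this release (8.9 = Ada Lovelace, e.g. L4 / L40s)."""
    # Tuple comparison also admits newer architectures: (9, 0) >= (8, 9).
    return capability >= (8, 9)

# Hard-coded capability tuples for illustration only.
print(supports_ada_optimizations((8, 9)))  # L4 / L40s (Ada Lovelace) -> True
print(supports_ada_optimizations((8, 6)))  # A10G (Ampere) -> False
```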


Through this release, we would like to make Inference Endpoints more community-centric and allow anyone to contribute and create incredible inference deployments on the Hugging Face Platform. Together with the community, we would like to propose optimized deployments for a wide range of tasks, built on awesome, freely available open-source technologies.

The unique position of Hugging Face, at the heart of the Open-Source AI Community, working hand-in-hand with individuals, institutions and industrial partners, makes it the most heterogeneous platform when it comes to deploying AI models for inference on a wide variety of hardware and software.
Member

Not fully sold on this one (sounds a bit boilerplate to me), but can't think of a better option right now.

You can deploy your own ASR inference pipeline via [Hugging Face Endpoints](https://endpoints.huggingface.co/catalog?task=automatic-speech-recognition). Endpoints lets anyone deploy AI models into production-ready environments by filling in a few parameters, and it features the most complete fleet of AI hardware available on the market to suit your cost and performance needs, all directly from where the AI community is being built. To get started, simply choose the model you want to deploy:
XXX image <img src="https://github.com/bigcode-bench/bigcode-bench.github.io/blob/main/asset/construct_pipeline.svg?raw=true" alt="png" style="display: block; margin-left: auto; margin-right: auto;">

You can see the correct and new container URL under Container Configuration: mfuntowicz/endpoints-whisper-vllm:v1.0.2-py312.
Member

Suggested change
You can see the correct and new container URL under Container Configuration: mfuntowicz/endpoints-whisper-vllm:v1.0.2-py312.
When you click on any of the Whisper ASR models, you'll see the new container URL when you click on Container Configuration: `mfuntowicz/endpoints-whisper-vllm:v1.0.2-py312`.
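As an alternative to the UI flow, the same deployment can be sketched programmatically. A hedged example: `create_inference_endpoint` exists in `huggingface_hub`, but the endpoint name, vendor, region, and instance values below are illustrative placeholders; only the container URL comes from the text above.

```python
# Container image from the post; everything else is an illustrative placeholder.
CONTAINER_URL = "mfuntowicz/endpoints-whisper-vllm:v1.0.2-py312"

endpoint_config = {
    "name": "fast-whisper-demo",        # placeholder endpoint name
    "repository": "openai/whisper-large-v3",
    "task": "automatic-speech-recognition",
    "vendor": "aws",                    # placeholder cloud provider
    "region": "us-east-1",              # placeholder region
    "accelerator": "gpu",
    "instance_type": "nvidia-l4",       # an Ada Lovelace card, per the post
    "instance_size": "x1",
    "custom_image": {"url": CONTAINER_URL},
}

if __name__ == "__main__":
    # Requires `pip install huggingface_hub` and a logged-in HF token.
    from huggingface_hub import create_inference_endpoint

    endpoint = create_inference_endpoint(type="protected", **endpoint_config)
    endpoint.wait()  # block until the endpoint is up
    print(endpoint.url)
```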

Once the container is launched, you can access the model via the Endpoint URL:
XXX image <img src="https://github.com/bigcode-bench/bigcode-bench.github.io/blob/main/asset/construct_pipeline.svg?raw=true" alt="png" style="display: block; margin-left: auto; margin-right: auto;">

# Inference
Member

Suggested change
# Inference
## Inference
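A minimal, self-contained sketch of querying the deployed endpoint. The `/v1/audio/transcriptions` route and the `file` field name are assumptions based on vLLM's OpenAI-compatible server, not details taken from the post; the URL and token are placeholders to substitute with your own.

```python
# Placeholders: substitute the Endpoint URL shown in the UI and a valid HF token.
ENDPOINT_URL = "https://YOUR-ENDPOINT.endpoints.huggingface.cloud"
HF_TOKEN = "hf_..."

def build_transcription_request(endpoint_url: str, token: str) -> dict:
    """Assemble the URL and auth header for a transcription call
    (route name assumed from vLLM's OpenAI-compatible API)."""
    return {
        "url": endpoint_url.rstrip("/") + "/v1/audio/transcriptions",
        "headers": {"Authorization": f"Bearer {token}"},
    }

def transcribe(endpoint_url: str, token: str, audio_path: str) -> str:
    import requests  # imported here so the helper above stays dependency-free

    req = build_transcription_request(endpoint_url, token)
    with open(audio_path, "rb") as f:
        response = requests.post(req["url"], headers=req["headers"],
                                 files={"file": f})
    response.raise_for_status()
    return response.json()["text"]

if __name__ == "__main__":
    print("Transcript:", transcribe(ENDPOINT_URL, HF_TOKEN, "sample.wav"))
```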


```python
print("Transcript:", response.json()["text"])
```

# FastRTC Demo

Member

Suggested change
# FastRTC Demo
## FastRTC Demo

# FastRTC Demo

With this blazing fast endpoint, it’s possible to build real-time transcription apps. Try out this [example](https://huggingface.co/spaces/freddyaboulton/really-fast-whisper) built with FastRTC. Simply speak into your microphone and see your speech transcribed in real time!
Member

Suggested change
With this blazing fast endpoint, it’s possible to build real-time transcription apps. Try out this [example](https://huggingface.co/spaces/freddyaboulton/really-fast-whisper) built with FastRTC. Simply speak into your microphone and see your speech transcribed in real time!
With this blazing fast endpoint, it’s possible to build real-time transcription apps. Try out this [example](https://huggingface.co/spaces/freddyaboulton/really-fast-whisper) built with [FastRTC](https://fastrtc.org). Simply speak into your microphone and see your speech transcribed in real time!

- user: michellehbn
---

Today we are happy to introduce a new blazing fast OpenAI Whisper deployment option targeting [Hugging Face Inference Endpoints](https://endpoints.huggingface.co). This new addition to the Inference Endpoints catalog provides up to 8x performance improvements compared to the previous entry and makes everyone one click away from deploying dedicated powerful transcription models in a cost-effective way, leveraging the amazingon work done by the AI community.
Member

Suggested change
Today we are happy to introduce a new blazing fast OpenAI Whisper deployment option targeting [Hugging Face Inference Endpoints](https://endpoints.huggingface.co). This new addition to the Inference Endpoints catalog provides up to 8x performance improvements compared to the previous entry and makes everyone one click away from deploying dedicated powerful transcription models in a cost-effective way, leveraging the amazingon work done by the AI community.
Today we are happy to introduce a new blazing fast OpenAI Whisper deployment option targeting [Inference Endpoints](https://endpoints.huggingface.co). This new addition to the Inference Endpoints catalog provides up to 8x performance improvements compared to the previous entry and makes everyone one click away from deploying dedicated powerful transcription models in a cost-effective way, leveraging the amazingon work done by the AI community.

Co-authored-by: Julien Chaumond <[email protected]>
@Michellehbn Michellehbn changed the title Morgan's blog/fast-whisper-endpoints Morgan's blog/fast-whisper-endpoints.md May 9, 2025