ONNX Runtime improvements (experimental native webgpu; fix iOS) #1231

Conversation

fs-eire
Contributor

@fs-eire fs-eire commented Mar 13, 2025

This change allows using WebGPU in transformers.js with the ORT Node.js binding.

Still testing (the tests themselves depend on this change).

Closes #1242

@AdamStrojek

Wouldn't it be better to do the same thing that is done for ONNX Runtime Web?

    if (apis.IS_WEBGPU_AVAILABLE) {
        supportedDevices.push('webgpu');
    }

Electron applications can have WebGPU enabled even when plain (terminal) Node.js does not. Also, onnxruntime-node provides only backends for native modules, whereas onnxruntime-web has bindings for WebGPU, so just adding to the supported devices will not work without switching the runtime.

@fs-eire
Contributor Author

fs-eire commented Mar 16, 2025

If I remember correctly, IS_WEBGPU_AVAILABLE is checked against navigator.gpu, which is only available in the browser.

For Electron, the renderer process is actually a "web" environment rather than a "node" one.
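
The check described above can be sketched roughly as follows. This is only an illustration, not the actual Transformers.js source: `hasNavigatorGpu` is a hypothetical helper, and it takes the global object as a parameter purely so it can be tried outside a browser.

```javascript
// Hypothetical helper (not part of Transformers.js): reports whether a
// global-like object exposes the browser WebGPU entry point, navigator.gpu.
function hasNavigatorGpu(globalLike) {
  const nav = globalLike && globalLike.navigator;
  return typeof nav === "object" && nav !== null && "gpu" in nav;
}

// In a browser (or an Electron renderer) with WebGPU enabled, navigator.gpu
// should be present. In plain Node.js at the time of this discussion it is not,
// which is why this kind of check cannot enable a native WebGPU backend there.
console.log(hasNavigatorGpu(globalThis));
```

A check like this only says that a browser-style WebGPU API is reachable; it says nothing about whether the native onnxruntime-node binding was built with WebGPU support.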

@AdamStrojek

Yes, you are correct, IS_WEBGPU_AVAILABLE is just a simple check against navigator.gpu. In theory, it is possible to install a 3rd-party package for WebGPU support in Node, but it is a complicated topic. Still, my comment is valid; I copied my example from a few lines higher in the same source file.

I recently ran some tests. Unfortunately, transformers.js does not detect Electron applications correctly and marks them as Node applications, so it provides only CPU. I had a lot of trouble getting it running in an Electron app. Mostly, it was picky about the path and fs packages. If I changed the target platform to Node, it caused other problems. I'm preparing a new issue report for the developers with my findings.

I already did tests with your branch, and this simple change didn’t enable WebGPU in Electron apps.
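
For reference, one way to tell an Electron renderer apart from plain Node.js is sketched below. It assumes two Electron conventions: `process.versions.electron` is set inside Electron processes, and `process.type` is `"renderer"` in renderer processes. `classifyRuntime` and its return labels are hypothetical names used only for illustration; this is not how Transformers.js currently detects its environment.

```javascript
// Hypothetical helper: classifies the runtime from a process-like object.
// Assumes Electron sets process.versions.electron, and that process.type is
// "renderer" inside renderer processes.
function classifyRuntime(proc) {
  // No usable process object: treat as a browser-like environment.
  if (!proc || !proc.versions) return "browser";
  if (proc.versions.electron) {
    return proc.type === "renderer" ? "electron-renderer" : "electron-other";
  }
  return proc.versions.node ? "node" : "browser";
}

console.log(classifyRuntime(typeof process !== "undefined" ? process : undefined));
```

Note that a renderer with Node integration disabled may not expose `process` at all, in which case this sketch falls back to "browser".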

@fs-eire fs-eire force-pushed the fs-eire/nodejs-support-native-webgpu-ep branch from a536b8d to 2dbde16 April 18, 2025 23:26
@fs-eire fs-eire force-pushed the fs-eire/nodejs-support-native-webgpu-ep branch from 2dbde16 to 6cfeec3 April 18, 2025 23:26
@fs-eire
Contributor Author

fs-eire commented Apr 18, 2025

Updated the version of onnxruntime-node to 1.22.0-dev.20250418-c19a49615b. This version supports WebGPU on Windows and macOS.

@xenova
Collaborator

xenova commented Apr 19, 2025

Wow thanks @fs-eire! Very exciting!!! Does the browser package https://www.npmjs.com/package/onnxruntime-web/v/1.22.0-dev.20250418-c19a49615b release also add anything of significance?

@fs-eire
Contributor Author

fs-eire commented Apr 19, 2025

> Wow thanks @fs-eire! Very exciting!!! Does the browser package https://www.npmjs.com/package/onnxruntime-web/v/1.22.0-dev.20250418-c19a49615b release also add anything of significance?

No.

BTW, regarding WebGPU EP support in onnxruntime-web: there are still some performance issues when using the WebGPU EP in a WebAssembly build. If you only want to run conformance tests for the WebGPU EP (e.g. checking correctness but not latency), I can offer you a private build of onnxruntime-web with the WebGPU EP.

@xenova
Collaborator

xenova commented Apr 19, 2025

That would be great! Feel free to send it via Slack, perhaps? Eventually, we can hook this into the Transformers.js CI to ensure correctness across all supported architectures.

@xenova
Collaborator

xenova commented Apr 19, 2025

I've been testing the WebGPU EP with some llama/qwen models, and I'm running into a few correctness issues.

Here's some code to help test/debug:

import { pipeline, TextStreamer } from "@huggingface/transformers";

// Create a text generation pipeline
const generator = await pipeline(
  "text-generation",
  "onnx-community/ZR1-1.5B-ONNX",
  { dtype: "q4f16", device: "webgpu" }, // device="cpu" works fine
);

// Define the list of messages
const messages = [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "Write me a poem about Machine Learning." },
];

// Generate a response
const output = await generator(messages, {
  max_new_tokens: 512,
  do_sample: false,
  streamer: new TextStreamer(generator.tokenizer, { skip_prompt: true, skip_special_tokens: true }),
});
console.log(output[0].generated_text.at(-1).content);

@xenova
Collaborator

xenova commented Apr 19, 2025

I can confirm that q4 (instead of q4f16) works correctly, so it looks to be an issue with the f16 implementation.
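
When comparing a configuration that works (for example q4, or device "cpu") against a suspect one (q4f16 on webgpu), it can help to locate where the two outputs first differ. With `do_sample: false`, the outputs of two correct configurations should largely agree, so an early divergence points to a numerical problem. The sketch below is a minimal illustration; `firstDivergence` is a hypothetical helper, not part of Transformers.js, and it works on strings or arrays of token ids.

```javascript
// Hypothetical helper: returns the index of the first position at which two
// sequences (strings or arrays of token ids) differ, or -1 if they are equal.
function firstDivergence(expected, actual) {
  const n = Math.min(expected.length, actual.length);
  for (let i = 0; i < n; i++) {
    if (expected[i] !== actual[i]) return i;
  }
  // Same common prefix: they differ only if one sequence is longer.
  return expected.length === actual.length ? -1 : n;
}

// Illustrative token ids only, not real model output.
const reference = [101, 7, 42, 9];
const suspect = [101, 7, 13, 9];
console.log(firstDivergence(reference, suspect)); // prints 2
```

In practice the two inputs would be the generated text (or token ids) from two runs of the same prompt with different `dtype` or `device` settings.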

@guschmue
Contributor

> I can confirm that q4 (instead of q4f16) works correctly, so it looks to be an issue with the f16 implementation.

For webgpu-ep / DeepSeek-R1-Distill-Qwen-1.5B, we know about an open issue when GQA takes the FA2 path.
It doesn't happen on all GPUs, but I can reproduce it on NVIDIA.

If ZR1-1.5B-ONNX is similar to DeepSeek-R1-Distill-Qwen-1.5B, it might be the same issue. I haven't tried DeepSeek-R1-Distill-Qwen-1.5B with fp32. Let me check on this.

@guschmue
Contributor

Looks like the same issue as DeepSeek when GQA uses FA2 with fp16. fp32 seems OK.
I'll put this high on my list to look at.

@xenova
Collaborator

xenova commented Apr 22, 2025

Great, thanks @guschmue!

@xenova xenova mentioned this pull request Apr 25, 2025
@xenova xenova changed the title from "[WIP] allow using 'webgpu' in nodejs binding" to "ONNX Runtime improvements (experimental native webgpu; fix iOS)" Apr 25, 2025
@xenova xenova changed the base branch from main to ort-improvements April 25, 2025 22:43
@xenova xenova marked this pull request as ready for review April 25, 2025 22:43
@xenova
Collaborator

xenova commented Apr 25, 2025

I'm accumulating all these changes into https://github.com/huggingface/transformers.js/tree/ort-improvements to make development and testing a bit easier (there are many version bumps and ORT-specific changes).

@xenova xenova merged commit 747a04d into huggingface:ort-improvements Apr 25, 2025
Development

Successfully merging this pull request may close these issues.

🐛 v3 crashes on iOS and macOS devices due to increasing memory usage
4 participants