
Releases: huggingface/transformers.js

3.5.2

30 May 19:32
4362237

What's new?

  • Update paper links to HF papers by @qgallouedec in #1318
  • Allow older (legacy) BPE models to be detected even when the type is not specified in #1314
  • Fix WhisperTextStreamer when return_timestamps is true (timestamp tokens are now correctly skipped when printing) in #1327 (see the sketch below this list)
  • Improve typescript exports and expose common types in #1325
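
For the WhisperTextStreamer fix, here is a minimal sketch of streaming transcription with return_timestamps enabled. The checkpoint name and audio URL are illustrative assumptions, not taken from the PR:

import { pipeline, WhisperTextStreamer } from "@huggingface/transformers";

// Checkpoint name is illustrative; any Whisper ONNX checkpoint should work
const transcriber = await pipeline(
  "automatic-speech-recognition",
  "onnx-community/whisper-tiny.en",
);

// Print text as it is generated; with #1327, timestamp tokens are no longer
// printed when return_timestamps is true
const streamer = new WhisperTextStreamer(transcriber.tokenizer, {
  callback_function: (text) => process.stdout.write(text),
});

const url = "https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/jfk.wav";
const output = await transcriber(url, { return_timestamps: true, streamer });
console.log(output);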

Full Changelog: 3.5.1...3.5.2

3.5.1

03 May 04:00
746c8c2

What's new?

  • Add support for Qwen3 in #1300.

    Example usage:

    import { pipeline, TextStreamer } from "@huggingface/transformers";
    
    // Create a text generation pipeline
    const generator = await pipeline(
      "text-generation",
      "onnx-community/Qwen3-0.6B-ONNX",
      { dtype: "q4f16", device: "webgpu" },
    );
    
    // Define the list of messages
    const messages = [
      { role: "user", content: "If 5 brog 5 is 1, and 4 brog 2 is 2, what is 3 brog 1?" },
    ];
    
    // Generate a response
    const output = await generator(messages, {
      max_new_tokens: 1024,
      do_sample: true,
      top_k: 20,
      temperature: 0.7,
      streamer: new TextStreamer(generator.tokenizer, { skip_prompt: true, skip_special_tokens: true }),
    });
    console.log(output[0].generated_text.at(-1).content);

    Try out the online demo (video: qwen3-webgpu.mp4).
  • Add support for D-FINE in #1303

    Example usage:

    import { pipeline } from "@huggingface/transformers";
    
    const detector = await pipeline("object-detection", "onnx-community/dfine_s_coco-ONNX");
    
    const image = "https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/cats.jpg";
    const output = await detector(image, { threshold: 0.5 });
    console.log(output);

    See list of supported models: https://huggingface.co/models?library=transformers.js&other=d_fine&sort=trending

  • Introduce global inference chain (+ other WebGPU fixes) in #1293

  • fix: RawImage.fromURL error when the input is a file URL by @himself65 in #1288

  • [bugfix] tokenizers respect padding: true with non-null max_length by @dwisdom0 in #1284
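
To illustrate the tokenizer padding fix, a minimal sketch (the model name is only an example):

import { AutoTokenizer } from "@huggingface/transformers";

const tokenizer = await AutoTokenizer.from_pretrained("Xenova/bert-base-uncased");

// With #1284, `padding: true` is respected even when `max_length` is set:
// the batch is padded, and sequences longer than 16 tokens are truncated.
const { input_ids, attention_mask } = await tokenizer(
  ["a short sentence", "a slightly longer example sentence"],
  { padding: true, truncation: true, max_length: 16 },
);
console.log(input_ids.dims, attention_mask.dims);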

Full Changelog: 3.5.0...3.5.1

3.5.0

16 Apr 16:37
c701ccf

🔥 Transformers.js v3.5

🛠️ Improvements

  • Fix error when dtype in config is unset by @hans00 in #1271
  • [audio utils] fix fft_bin_width computation in #1274
  • Fix bad words logits processor in #1278
  • Implement LRU cache for BPE tokenizer in #1283
  • Return buffer instead of file_path if cache unavailable for model loading by @PrafulB in #1280
  • Use custom cache over FSCache if specified by @PrafulB in #1285 (see the sketch below this list)
  • Support device-level configuration across all devices by @ibelem in #1276
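
For the custom cache support, here is a hedged sketch of a minimal in-memory cache. It assumes env.useCustomCache and env.customCache with a Cache-API-style match/put interface; see #1280 and #1285 for the exact contract:

import { env, pipeline } from "@huggingface/transformers";

// Minimal in-memory cache exposing the `match`/`put` interface that
// Transformers.js expects from a custom cache (assumed interface)
const store = new Map();
env.useCustomCache = true;
env.customCache = {
  async match(key) {
    return store.get(key); // Response | undefined
  },
  async put(key, response) {
    store.set(key, response);
  },
};

// Model files fetched below are stored via the custom cache, which takes
// precedence over the file-system cache when specified.
const extractor = await pipeline("feature-extraction", "Xenova/all-MiniLM-L6-v2");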

Full Changelog: 3.4.2...3.5.0

3.4.2

02 Apr 00:40
7ac2d8b

What's new?

  • Add support for RF-DETR and RT-DETRv2 in #1260
  • Optimize added token split in #1261 and #1265
  • Support loading local models using relative paths, absolute paths, and model directories in #1268
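
A minimal sketch of loading a model from a local directory (the path below is illustrative; the directory is assumed to contain the usual config.json, tokenizer files, and onnx/ subfolder):

import { env, pipeline } from "@huggingface/transformers";

// Optional: disallow falling back to the Hugging Face Hub
env.allowRemoteModels = false;

// Relative paths, absolute paths, and plain model directories now resolve
const classifier = await pipeline(
  "text-classification",
  "./models/distilbert-base-uncased-finetuned-sst-2-english/",
);
console.log(await classifier("Transformers.js is awesome!"));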

Full Changelog: 3.4.1...3.4.2

3.4.1

25 Mar 22:30
39a75ce

What's new?

  • Add support for SNAC (Multi-Scale Neural Audio Codec) in #1251
  • Add support for Metric3D (v1 & v2) in #1254
  • Add support for Gemma 3 text in #1229 (see the sketch below this list). Note: Only Node.js execution is supported for now.
  • Safeguard against background removal pipeline precision issues in #1255. Thanks to @LuSrodri for reporting the issue!
  • Allow RawImage to read from all types of supported sources by @BritishWerewolf in #1244
  • Update pipelines.md api docs in #1256
  • Update extension example to use latest version by @fs-eire in #1213
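
For Gemma 3 text, a sketch of chat-style generation under Node.js. The checkpoint name and dtype are assumptions; check the PR for the officially converted models:

import { pipeline, TextStreamer } from "@huggingface/transformers";

// Checkpoint name is illustrative; Node.js only for now
const generator = await pipeline(
  "text-generation",
  "onnx-community/gemma-3-1b-it-ONNX",
  { dtype: "q4" },
);

const messages = [
  { role: "user", content: "Write a haiku about the sea." },
];
const output = await generator(messages, {
  max_new_tokens: 128,
  streamer: new TextStreamer(generator.tokenizer, { skip_prompt: true, skip_special_tokens: true }),
});
console.log(output[0].generated_text.at(-1).content);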

Full Changelog: 3.4.0...3.4.1

3.4.0

07 Mar 12:04
5b5e5ed

🚀 Transformers.js v3.4 — Background Removal Pipeline, Ultravox, DAC, Mimi, SmolVLM2, LiteWhisper.

🖼️ New Background Removal Pipeline

Removing backgrounds from images is now as easy as:

import { pipeline } from "@huggingface/transformers";
const segmenter = await pipeline("background-removal", "onnx-community/BEN2-ONNX");
const output = await segmenter("input.png");
output[0].save("output.png"); // (Optional) Save the image

You can find the full list of compatible models here, which will continue to grow in the future! 🔥 For more information, check out #1216.

🤖 New models

  • Ultravox for audio-text-to-text generation (#1207). See here for the list of supported models.

    See example usage
    import { UltravoxProcessor, UltravoxModel, read_audio } from "@huggingface/transformers";
    
    const processor = await UltravoxProcessor.from_pretrained(
      "onnx-community/ultravox-v0_5-llama-3_2-1b-ONNX",
    );
    const model = await UltravoxModel.from_pretrained(
      "onnx-community/ultravox-v0_5-llama-3_2-1b-ONNX",
      {
        dtype: {
          embed_tokens: "q8", // "fp32", "fp16", "q8"
          audio_encoder: "q4", // "fp32", "fp16", "q8", "q4", "q4f16"
          decoder_model_merged: "q4", // "q8", "q4", "q4f16"
        },
      },
    );
    
    const audio = await read_audio("http://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/mlk.wav", 16000);
    const messages = [
      {
        role: "system",
        content: "You are a helpful assistant.",
      },
      { role: "user", content: "Transcribe this audio:<|audio|>" },
    ];
    const text = processor.tokenizer.apply_chat_template(messages, {
      add_generation_prompt: true,
      tokenize: false,
    });
    
    const inputs = await processor(text, audio);
    const generated_ids = await model.generate({
      ...inputs,
      max_new_tokens: 128,
    });
    
    const generated_texts = processor.batch_decode(
      generated_ids.slice(null, [inputs.input_ids.dims.at(-1), null]),
      { skip_special_tokens: true },
    );
    console.log(generated_texts[0]);
    // "I can transcribe the audio for you. Here's the transcription:\n\n\"I have a dream that one day this nation will rise up and live out the true meaning of its creed.\"\n\n- Martin Luther King Jr.\n\nWould you like me to provide the transcription in a specific format (e.g., word-for-word, character-for-character, or a specific font)?"
  • DAC and Mimi for audio tokenization/neural audio codecs (#1215). See here for the list of supported DAC models and here for the list of supported Mimi models.

    See example usage

    DAC:

    import { DacModel, AutoFeatureExtractor } from '@huggingface/transformers';
    
    const model_id = "onnx-community/dac_16khz-ONNX";
    const model = await DacModel.from_pretrained(model_id);
    const feature_extractor = await AutoFeatureExtractor.from_pretrained(model_id);
    
    const audio_sample = new Float32Array(12000);
    
    // pre-process the inputs
    const inputs = await feature_extractor(audio_sample);
    {
        // explicitly encode then decode the audio inputs
        const encoder_outputs = await model.encode(inputs);
        const { audio_values } = await model.decode(encoder_outputs);
        console.log(audio_values);
    }
    
    {
        // or the equivalent with a forward pass
        const { audio_values } = await model(inputs);
        console.log(audio_values);
    }

    Mimi:

    import { MimiModel, AutoFeatureExtractor } from '@huggingface/transformers';
    
    const model_id = "onnx-community/kyutai-mimi-ONNX";
    const model = await MimiModel.from_pretrained(model_id);
    const feature_extractor = await AutoFeatureExtractor.from_pretrained(model_id);
    
    const audio_sample = new Float32Array(12000);
    
    // pre-process the inputs
    const inputs = await feature_extractor(audio_sample);
    {
        // explicitly encode then decode the audio inputs
        const encoder_outputs = await model.encode(inputs);
        const { audio_values } = await model.decode(encoder_outputs);
        console.log(audio_values);
    }
    
    {
        // or the equivalent with a forward pass
        const { audio_values } = await model(inputs);
        console.log(audio_values);
    }
  • SmolVLM2, a lightweight multimodal model designed to analyze image and video content (#1196). See here for the list of supported models. Usage is identical to SmolVLM.

  • LiteWhisper for automatic speech recognition (#1219). See here for the list of supported models. Usage is identical to Whisper.
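
    Example usage (a minimal sketch; the checkpoint name is an assumption, so pick one from the supported models list):

    import { pipeline } from "@huggingface/transformers";
    
    // Checkpoint name is illustrative only
    const transcriber = await pipeline(
      "automatic-speech-recognition",
      "onnx-community/lite-whisper-large-v3-turbo-ONNX",
    );
    
    const url = "https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/jfk.wav";
    const { text } = await transcriber(url);
    console.log(text);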

🛠️ Other improvements

  • Add support for multi-chunk external data files in #1212
  • Fix package export by @fs-eire in #1161
  • Add NFD normalizer in #1211. Thanks to @adewdev for reporting!
  • Documentation improvements by @viksit in #1184
  • Optimize conversion script in #1204 and #1218
  • Use Float16Array instead of Uint16Array for kvcache when available in #1208

Full Changelog: 3.3.3...3.4.0

3.3.3

06 Feb 23:33
829ace0

What's new?

  • Bump onnxruntime-web and @huggingface/jinja in #1183.

Full Changelog: 3.3.2...3.3.3

3.3.2

22 Jan 15:13
6f43f24

What's new?

  • Add support for Helium and Glm in #1156 (see the sketch below this list)
  • Improve build process and fix usage with certain bundlers in #1158
  • Auto-detect wordpiece tokenizer when model.type is missing in #1151
  • Update Moonshine config values for transformers v4.48.0 in #1155
  • Support simultaneous tensor op execution in WASM in #1162
  • Update react tutorial sample code in #1152
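
Helium and Glm plug into the standard text-generation pipeline. A hedged sketch follows; the checkpoint name is hypothetical, so substitute a converted ONNX checkpoint of your choice:

import { pipeline } from "@huggingface/transformers";

// Hypothetical checkpoint name, for illustration only
const generator = await pipeline(
  "text-generation",
  "onnx-community/glm-edge-1.5b-chat-ONNX",
  { dtype: "q4" },
);

const messages = [{ role: "user", content: "Tell me a joke." }];
const output = await generator(messages, { max_new_tokens: 128 });
console.log(output[0].generated_text.at(-1).content);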

Full Changelog: 3.3.1...3.3.2

3.3.1

15 Jan 15:36
e1753ac

What's new?

  • hotfix: Copy missing ort-wasm-simd-threaded.jsep.mjs to dist folder (#1150)

Full Changelog: 3.3.0...3.3.1

3.3.0

15 Jan 13:28
e00ff3b

🔥 Transformers.js v3.3 — StyleTTS 2 (Kokoro) for state-of-the-art text-to-speech, Grounding DINO for zero-shot object detection

🤖 New models: StyleTTS 2, Grounding DINO

StyleTTS 2 for high-quality speech synthesis

See #1148 for more information and here for the list of supported models.

First, install the kokoro-js library, which uses Transformers.js, from NPM using:

npm i kokoro-js

You can then generate speech as follows:

import { KokoroTTS } from "kokoro-js";

const model_id = "onnx-community/Kokoro-82M-ONNX";
const tts = await KokoroTTS.from_pretrained(model_id, {
  dtype: "q8", // Options: "fp32", "fp16", "q8", "q4", "q4f16"
});

const text = "Life is like a box of chocolates. You never know what you're gonna get.";
const audio = await tts.generate(text, {
  // Use `tts.list_voices()` to list all available voices
  voice: "af_bella",
});
audio.save("audio.wav");

Grounding DINO for zero-shot object detection

See #1137 for more information and here for the list of supported models.

Example: Zero-shot object detection with onnx-community/grounding-dino-tiny-ONNX using the pipeline API.

import { pipeline } from "@huggingface/transformers";

const detector = await pipeline("zero-shot-object-detection", "onnx-community/grounding-dino-tiny-ONNX");

const url = "http://images.cocodataset.org/val2017/000000039769.jpg";
const candidate_labels = ["a cat."];
const output = await detector(url, candidate_labels, {
  threshold: 0.3,
});
See example output
[
  { score: 0.45316222310066223, label: "a cat", box: { xmin: 343, ymin: 23, xmax: 637, ymax: 372 } },
  { score: 0.36190420389175415, label: "a cat", box: { xmin: 12, ymin: 52, xmax: 317, ymax: 472 } },
]

Full Changelog: 3.2.4...3.3.0