
Releases: huggingface/transformers.js

3.5.2

30 May 19:32
4362237

What's new?

  • Update paper links to HF papers by @qgallouedec in #1318
  • Allow older (legacy) BPE models to be detected even when the type is not specified in #1314
  • Fix WhisperTextStreamer when return_timestamps is true (timestamp tokens are now correctly skipped when printing) in #1327 (see the sketch below this list)
  • Improve typescript exports and expose common types in #1325
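
For the WhisperTextStreamer fix, here is a minimal sketch of streaming transcription with return_timestamps enabled. The checkpoint name and audio URL are illustrative assumptions, not taken from the PR:

import { pipeline, WhisperTextStreamer } from "@huggingface/transformers";

// Checkpoint name is illustrative; any Whisper ONNX checkpoint should work
const transcriber = await pipeline(
  "automatic-speech-recognition",
  "onnx-community/whisper-tiny.en",
);

// Print text as it is generated; with #1327, timestamp tokens are no longer
// printed when return_timestamps is true
const streamer = new WhisperTextStreamer(transcriber.tokenizer, {
  callback_function: (text) => process.stdout.write(text),
});

const url = "https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/jfk.wav";
const output = await transcriber(url, { return_timestamps: true, streamer });
console.log(output);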

Full Changelog: 3.5.1...3.5.2

3.5.1

03 May 04:00
746c8c2

What's new?

  • Add support for Qwen3 in #1300.

    Example usage:

    import { pipeline, TextStreamer } from "@huggingface/transformers";
    
    // Create a text generation pipeline
    const generator = await pipeline(
      "text-generation",
      "onnx-community/Qwen3-0.6B-ONNX",
      { dtype: "q4f16", device: "webgpu" },
    );
    
    // Define the list of messages
    const messages = [
      { role: "user", content: "If 5 brog 5 is 1, and 4 brog 2 is 2, what is 3 brog 1?" },
    ];
    
    // Generate a response
    const output = await generator(messages, {
      max_new_tokens: 1024,
      do_sample: true,
      top_k: 20,
      temperature: 0.7,
      streamer: new TextStreamer(generator.tokenizer, { skip_prompt: true, skip_special_tokens: true }),
    });
    console.log(output[0].generated_text.at(-1).content);

    Try out the online demo (video: qwen3-webgpu.mp4).
  • Add support for D-FINE in #1303

    Example usage:

    import { pipeline } from "@huggingface/transformers";
    
    const detector = await pipeline("object-detection", "onnx-community/dfine_s_coco-ONNX");
    
    const image = "https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/cats.jpg";
    const output = await detector(image, { threshold: 0.5 });
    console.log(output);

    See list of supported models: https://huggingface.co/models?library=transformers.js&other=d_fine&sort=trending

  • Introduce global inference chain (+ other WebGPU fixes) in #1293

  • fix: RawImage.fromURL error when the input is a file URL by @himself65 in #1288

  • [bugfix] tokenizers respect padding: true with non-null max_length by @dwisdom0 in #1284
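
To illustrate the tokenizer padding fix, a minimal sketch (the model name is only an example):

import { AutoTokenizer } from "@huggingface/transformers";

const tokenizer = await AutoTokenizer.from_pretrained("Xenova/bert-base-uncased");

// With #1284, `padding: true` is respected even when `max_length` is set:
// the batch is padded, and sequences longer than 16 tokens are truncated.
const { input_ids, attention_mask } = await tokenizer(
  ["a short sentence", "a slightly longer example sentence"],
  { padding: true, truncation: true, max_length: 16 },
);
console.log(input_ids.dims, attention_mask.dims);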

Full Changelog: 3.5.0...3.5.1

3.5.0

16 Apr 16:37
c701ccf

🔥 Transformers.js v3.5

🛠️ Improvements

  • Fix error when dtype in config is unset by @hans00 in #1271
  • [audio utils] fix fft_bin_width computation in #1274
  • Fix bad words logits processor in #1278
  • Implement LRU cache for BPE tokenizer in #1283
  • Return buffer instead of file_path if cache unavailable for model loading by @PrafulB in #1280
  • Use custom cache over FSCache if specified by @PrafulB in #1285 (see the sketch below this list)
  • Support device-level configuration across all devices by @ibelem in #1276
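
For the custom cache support, here is a hedged sketch of a minimal in-memory cache. It assumes env.useCustomCache and env.customCache with a Cache-API-style match/put interface; see #1280 and #1285 for the exact contract:

import { env, pipeline } from "@huggingface/transformers";

// Minimal in-memory cache exposing the `match`/`put` interface that
// Transformers.js expects from a custom cache (assumed interface)
const store = new Map();
env.useCustomCache = true;
env.customCache = {
  async match(key) {
    return store.get(key); // Response | undefined
  },
  async put(key, response) {
    store.set(key, response);
  },
};

// Model files fetched below are stored via the custom cache, which takes
// precedence over the file-system cache when specified.
const extractor = await pipeline("feature-extraction", "Xenova/all-MiniLM-L6-v2");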

Full Changelog: 3.4.2...3.5.0

3.4.2

02 Apr 00:40
7ac2d8b

What's new?

  • Add support for RF-DETR and RT-DETRv2 in #1260
  • Optimize added token split in #1261 and #1265
  • Support loading local models using relative paths, absolute paths, and model directories in #1268
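
A minimal sketch of loading a model from a local directory (the path below is illustrative; the directory is assumed to contain the usual config.json, tokenizer files, and onnx/ subfolder):

import { env, pipeline } from "@huggingface/transformers";

// Optional: disallow falling back to the Hugging Face Hub
env.allowRemoteModels = false;

// Relative paths, absolute paths, and plain model directories now resolve
const classifier = await pipeline(
  "text-classification",
  "./models/distilbert-base-uncased-finetuned-sst-2-english/",
);
console.log(await classifier("Transformers.js is awesome!"));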

Full Changelog: 3.4.1...3.4.2

3.4.1

25 Mar 22:30
39a75ce

What's new?

  • Add support for SNAC (Multi-Scale Neural Audio Codec) in #1251
  • Add support for Metric3D (v1 & v2) in #1254
  • Add support for Gemma 3 text in #1229 (see the sketch below this list). Note: Only Node.js execution is supported for now.
  • Safeguard against background removal pipeline precision issues in #1255. Thanks to @LuSrodri for reporting the issue!
  • Allow RawImage to read from all types of supported sources by @BritishWerewolf in #1244
  • Update pipelines.md api docs in #1256
  • Update extension example to use latest version by @fs-eire in #1213
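
For Gemma 3 text, a sketch of chat-style generation under Node.js. The checkpoint name and dtype are assumptions; check the PR for the officially converted models:

import { pipeline, TextStreamer } from "@huggingface/transformers";

// Checkpoint name is illustrative; Node.js only for now
const generator = await pipeline(
  "text-generation",
  "onnx-community/gemma-3-1b-it-ONNX",
  { dtype: "q4" },
);

const messages = [
  { role: "user", content: "Write a haiku about the sea." },
];
const output = await generator(messages, {
  max_new_tokens: 128,
  streamer: new TextStreamer(generator.tokenizer, { skip_prompt: true, skip_special_tokens: true }),
});
console.log(output[0].generated_text.at(-1).content);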

Full Changelog: 3.4.0...3.4.1

3.4.0

07 Mar 12:04
5b5e5ed

🚀 Transformers.js v3.4 — Background Removal Pipeline, Ultravox, DAC, Mimi, SmolVLM2, LiteWhisper.

🖼️ New Background Removal Pipeline

Removing backgrounds from images is now as easy as:

import { pipeline } from "@huggingface/transformers";
const segmenter = await pipeline("background-removal", "onnx-community/BEN2-ONNX");
const output = await segmenter("input.png");
output[0].save("output.png"); // (Optional) Save the image

You can find the full list of compatible models here, which will continue to grow in the future! 🔥 For more information, check out #1216.

🤖 New models

  • Ultravox for audio-text-to-text generation (#1207). See here for the list of supported models.

    See example usage
    import { UltravoxProcessor, UltravoxModel, read_audio } from "@huggingface/transformers";
    
    const processor = await UltravoxProcessor.from_pretrained(
      "onnx-community/ultravox-v0_5-llama-3_2-1b-ONNX",
    );
    const model = await UltravoxModel.from_pretrained(
      "onnx-community/ultravox-v0_5-llama-3_2-1b-ONNX",
      {
        dtype: {
          embed_tokens: "q8", // "fp32", "fp16", "q8"
          audio_encoder: "q4", // "fp32", "fp16", "q8", "q4", "q4f16"
          decoder_model_merged: "q4", // "q8", "q4", "q4f16"
        },
      },
    );
    
    const audio = await read_audio("http://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/mlk.wav", 16000);
    const messages = [
      {
        role: "system",
        content: "You are a helpful assistant.",
      },
      { role: "user", content: "Transcribe this audio:<|audio|>" },
    ];
    const text = processor.tokenizer.apply_chat_template(messages, {
      add_generation_prompt: true,
      tokenize: false,
    });
    
    const inputs = await processor(text, audio);
    const generated_ids = await model.generate({
      ...inputs,
      max_new_tokens: 128,
    });
    
    const generated_texts = processor.batch_decode(
      generated_ids.slice(null, [inputs.input_ids.dims.at(-1), null]),
      { skip_special_tokens: true },
    );
    console.log(generated_texts[0]);
    // "I can transcribe the audio for you. Here's the transcription:\n\n\"I have a dream that one day this nation will rise up and live out the true meaning of its creed.\"\n\n- Martin Luther King Jr.\n\nWould you like me to provide the transcription in a specific format (e.g., word-for-word, character-for-character, or a specific font)?"
  • DAC and Mimi for audio tokenization/neural audio codecs (#1215). See here for the list of supported DAC models and here for the list of supported Mimi models.

    See example usage

    DAC:

    import { DacModel, AutoFeatureExtractor } from '@huggingface/transformers';
    
    const model_id = "onnx-community/dac_16khz-ONNX";
    const model = await DacModel.from_pretrained(model_id);
    const feature_extractor = await AutoFeatureExtractor.from_pretrained(model_id);
    
    const audio_sample = new Float32Array(12000);
    
    // pre-process the inputs
    const inputs = await feature_extractor(audio_sample);
    {
        // explicitly encode then decode the audio inputs
        const encoder_outputs = await model.encode(inputs);
        const { audio_values } = await model.decode(encoder_outputs);
        console.log(audio_values);
    }
    
    {
        // or the equivalent with a forward pass
        const { audio_values } = await model(inputs);
        console.log(audio_values);
    }

    Mimi:

    import { MimiModel, AutoFeatureExtractor } from '@huggingface/transformers';
    
    const model_id = "onnx-community/kyutai-mimi-ONNX";
    const model = await MimiModel.from_pretrained(model_id);
    const feature_extractor = await AutoFeatureExtractor.from_pretrained(model_id);
    
    const audio_sample = new Float32Array(12000);
    
    // pre-process the inputs
    const inputs = await feature_extractor(audio_sample);
    {
        // explicitly encode then decode the audio inputs
        const encoder_outputs = await model.encode(inputs);
        const { audio_values } = await model.decode(encoder_outputs);
        console.log(audio_values);
    }
    
    {
        // or the equivalent with a forward pass
        const { audio_values } = await model(inputs);
        console.log(audio_values);
    }
  • SmolVLM2, a lightweight multimodal model designed to analyze image and video content (#1196). See here for the list of supported models. Usage is identical to SmolVLM.

  • LiteWhisper for automatic speech recognition (#1219). See here for the list of supported models. Usage is identical to Whisper.
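
    Example usage (a minimal sketch; the checkpoint name is an assumption, so pick one from the supported models list):

    import { pipeline } from "@huggingface/transformers";
    
    // Checkpoint name is illustrative only
    const transcriber = await pipeline(
      "automatic-speech-recognition",
      "onnx-community/lite-whisper-large-v3-turbo-ONNX",
    );
    
    const url = "https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/jfk.wav";
    const { text } = await transcriber(url);
    console.log(text);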

🛠️ Other improvements

  • Add support for multi-chunk external data files in #1212
  • Fix package export by @fs-eire in #1161
  • Add NFD normalizer in #1211. Thanks to @adewdev for reporting!
  • Documentation improvements by @viksit in #1184
  • Optimize conversion script in #1204 and #1218
  • Use Float16Array instead of Uint16Array for kvcache when available in #1208

Full Changelog: 3.3.3...3.4.0

3.3.3

06 Feb 23:33
829ace0

What's new?

  • Bump onnxruntime-web and @huggingface/jinja in #1183.

Full Changelog: 3.3.2...3.3.3

3.3.2

22 Jan 15:13
6f43f24

What's new?

  • Add support for Helium and Glm in #1156 (see the sketch below this list)
  • Improve build process and fix usage with certain bundlers in #1158
  • Auto-detect wordpiece tokenizer when model.type is missing in #1151
  • Update Moonshine config values for transformers v4.48.0 in #1155
  • Support simultaneous tensor op execution in WASM in #1162
  • Update react tutorial sample code in #1152
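
Helium and Glm plug into the standard text-generation pipeline. A hedged sketch follows; the checkpoint name is hypothetical, so substitute a converted ONNX checkpoint of your choice:

import { pipeline } from "@huggingface/transformers";

// Hypothetical checkpoint name, for illustration only
const generator = await pipeline(
  "text-generation",
  "onnx-community/glm-edge-1.5b-chat-ONNX",
  { dtype: "q4" },
);

const messages = [{ role: "user", content: "Tell me a joke." }];
const output = await generator(messages, { max_new_tokens: 128 });
console.log(output[0].generated_text.at(-1).content);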

Full Changelog: 3.3.1...3.3.2

3.3.1

15 Jan 15:36
e1753ac

What's new?

  • hotfix: Copy missing ort-wasm-simd-threaded.jsep.mjs to dist folder (#1150)

Full Changelog: 3.3.0...3.3.1

3.3.0

15 Jan 13:28
e00ff3b

🔥 Transformers.js v3.3 — StyleTTS 2 (Kokoro) for state-of-the-art text-to-speech, Grounding DINO for zero-shot object detection

🤖 New models: StyleTTS 2, Grounding DINO

StyleTTS 2 for high-quality speech synthesis

See #1148 for more information and here for the list of supported models.

First, install the kokoro-js library, which uses Transformers.js, from NPM using:

npm i kokoro-js

You can then generate speech as follows:

import { KokoroTTS } from "kokoro-js";

const model_id = "onnx-community/Kokoro-82M-ONNX";
const tts = await KokoroTTS.from_pretrained(model_id, {
  dtype: "q8", // Options: "fp32", "fp16", "q8", "q4", "q4f16"
});

const text = "Life is like a box of chocolates. You never know what you're gonna get.";
const audio = await tts.generate(text, {
  // Use `tts.list_voices()` to list all available voices
  voice: "af_bella",
});
audio.save("audio.wav");

Grounding DINO for zero-shot object detection

See #1137 for more information and here for the list of supported models.

Example: Zero-shot object detection with onnx-community/grounding-dino-tiny-ONNX using the pipeline API.

import { pipeline } from "@huggingface/transformers";

const detector = await pipeline("zero-shot-object-detection", "onnx-community/grounding-dino-tiny-ONNX");

const url = "http://images.cocodataset.org/val2017/000000039769.jpg";
const candidate_labels = ["a cat."];
const output = await detector(url, candidate_labels, {
  threshold: 0.3,
});
See example output
[
  { score: 0.45316222310066223, label: "a cat", box: { xmin: 343, ymin: 23, xmax: 637, ymax: 372 } },
  { score: 0.36190420389175415, label: "a cat", box: { xmin: 12, ymin: 52, xmax: 317, ymax: 472 } },
]

Full Changelog: 3.2.4...3.3.0