Qwen3 models only support CUDA devices with flash attention #630

Closed
@randomm

Description

Describe the bug

Qwen3 embedding models (introduced in PR #627) currently only support CUDA devices with flash attention enabled. Attempting to load these models on CPU or Metal devices fails with the following error:

"Qwen3 is only supported on Cuda devices in fp16 with flash attention enabled"

Expected behavior

Qwen3 embedding models should be usable on CPU and Metal devices for inference, similar to other embedding models in TEI.

Environment

  • Device: CPU or Metal (macOS)
  • Model: Any Qwen3 embedding model (e.g., Qwen/Qwen3-Embedding-0.6B)

Proposed solution

Implement a CPU-compatible Qwen3Model alongside the existing FlashQwen3Model to enable Qwen3 embedding inference on non-CUDA devices.
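Since TEI is written in Rust, here is a minimal, self-contained sketch of the dispatch shape this proposal implies. Everything in it is a hypothetical simplification, not TEI's actual internals: the `Device` enum, `EmbeddingModel` trait, and `load_qwen3` function are stand-ins; only the `FlashQwen3Model` and proposed `Qwen3Model` names come from this issue.

```rust
// Hypothetical sketch: select the flash-attention variant only when it is
// actually usable, instead of erroring out on CPU/Metal. All names besides
// FlashQwen3Model/Qwen3Model are illustrative stand-ins, not TEI's API.

#[derive(Debug, Clone, Copy)]
enum Device {
    Cuda,
    Metal,
    Cpu,
}

trait EmbeddingModel {
    fn embed(&self, input: &str) -> Vec<f32>;
}

/// Existing CUDA + flash-attention path (per this issue / PR #627).
struct FlashQwen3Model;

/// Proposed device-agnostic path using standard attention.
struct Qwen3Model;

impl EmbeddingModel for FlashQwen3Model {
    fn embed(&self, _input: &str) -> Vec<f32> {
        vec![0.0; 1024] // placeholder: real flash-attention forward pass elided
    }
}

impl EmbeddingModel for Qwen3Model {
    fn embed(&self, _input: &str) -> Vec<f32> {
        vec![0.0; 1024] // placeholder: standard attention, runs on any device
    }
}

/// Fall back to the device-agnostic model whenever flash attention
/// is unavailable, rather than returning an error.
fn load_qwen3(device: Device, flash_attention: bool) -> Box<dyn EmbeddingModel> {
    match (device, flash_attention) {
        (Device::Cuda, true) => Box::new(FlashQwen3Model),
        _ => Box::new(Qwen3Model),
    }
}

fn main() {
    // CUDA + flash attention keeps the existing fast path...
    let _flash = load_qwen3(Device::Cuda, true);
    // ...while CPU and Metal now fall back instead of erroring.
    for device in [Device::Cpu, Device::Metal] {
        let model = load_qwen3(device, false);
        println!("{:?}: embedding dim = {}", device, model.embed("hello").len());
    }
}
```

The point is simply that flash attention becomes a load-time optimization rather than a hard requirement, matching the fallback behavior other embedding models in TEI already have.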

Additional context

This limitation significantly reduces the accessibility of Qwen3 models for users without CUDA GPUs. Adding CPU support would make these models available to a much broader user base.
