
[Feature]: Composite model loading using AutoWeightsLoader for all models #15697


Open
1 task done
DarkLight1337 opened this issue Mar 28, 2025 · 12 comments · Fixed by #15770, #15939 or #16203
Assignees
Labels
feature request (New feature or request), good first issue (Good for newcomers)

Comments

@DarkLight1337
Member

DarkLight1337 commented Mar 28, 2025

🚀 The feature, motivation and pitch

#9160 first introduced AutoWeightsLoader to recursively call load_weights on sub-modules. This lets composite models (most notably multi-modal models) reuse language backbones (*Model classes such as LlamaModel) without repeating their weight-loading logic.

Currently, load_weights is only implemented in a few language backbones. It would be great to standardize this approach and apply it to all language backbones in vLLM. The steps to do this are pretty straightforward:

  1. Move the existing load_weights function from *ForCausalLM to *Model.
  2. Create a new load_weights function in *ForCausalLM that loads the weights using AutoWeightsLoader.
  3. Move any logic in *Model.load_weights that only applies to *ForCausalLM back into *ForCausalLM.load_weights. Usually, this involves lm_head.
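The three steps above can be sketched with simplified stand-ins. Note the class below is only a toy illustration of the pattern, not vLLM's actual AutoWeightsLoader, and ToyModel/ToyForCausalLM are hypothetical names:

```python
class AutoWeightsLoader:
    """Toy loader: routes each (name, tensor) pair to the sub-module named
    by the first dotted component, skipping any configured prefixes."""

    def __init__(self, module, skip_prefixes=None):
        self.module = module
        self.skip_prefixes = skip_prefixes or []

    def load_weights(self, weights):
        for name, tensor in weights:
            if any(name.startswith(p) for p in self.skip_prefixes):
                continue
            child_name, _, rest = name.partition(".")
            # Delegate to the sub-module's own load_weights (steps 1 and 2).
            getattr(self.module, child_name).load_weights([(rest, tensor)])


class ToyModel:
    """Plays the role of a *Model backbone: after step 1 it owns the real
    per-weight loading logic (stubbed out here as a dict of params)."""

    def __init__(self):
        self.params = {}

    def load_weights(self, weights):
        for name, tensor in weights:
            self.params[name] = tensor


class ToyLMHead(ToyModel):
    """Stands in for lm_head, whose handling stays in *ForCausalLM (step 3)."""


class ToyForCausalLM:
    """Plays the role of *ForCausalLM: per step 2, its load_weights just
    delegates to AutoWeightsLoader instead of duplicating backbone logic."""

    def __init__(self):
        self.model = ToyModel()
        self.lm_head = ToyLMHead()

    def load_weights(self, weights):
        AutoWeightsLoader(self).load_weights(weights)
```

With this split, calling lm.load_weights([("model.embed.weight", w1), ("lm_head.weight", w2)]) routes the backbone weight into lm.model and the head weight into lm.lm_head, and a multi-modal model can wrap ToyModel alone and pass skip_prefixes=["lm_head."] to ignore head weights it doesn't have.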

For reference, you can look at the implementation for models such as Llama, Gemma2/3, Qwen2 and ChatGLM.

To avoid scope creep, I suggest opening a PR that updates only a few models at a time.

Alternatives

No response

Additional context

No response

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
@DarkLight1337 DarkLight1337 added the feature request and good first issue labels Mar 28, 2025
@lengrongfu
Contributor

@DarkLight1337 I can try to handle several models.

@DarkLight1337
Member Author

Can you indicate which models you are working on to avoid others duplicating your work?

@lengrongfu
Contributor

I haven't started yet. Do you have any recommendations for simpler models?

@DarkLight1337
Member Author

DarkLight1337 commented Mar 28, 2025

Most language models should work in pretty much the same way (except SSMs like Mamba I guess). You can go in alphabetical order.

@lengrongfu
Contributor

Thanks ~
/assign

@lengrongfu
Contributor

lengrongfu commented Mar 29, 2025

Tip: these are the models that haven't implemented this yet. I used a shell script to count, so this is an approximate result:

  • arctic.py
  • baichuan.py
  • bamba.py
  • bart.py
  • bert.py
  • blip.py
  • bloom.py
  • clip.py
  • chameleon.py
  • commandr.py
  • dbrx.py
  • decilm.py
  • deepseek.py
  • deepseek_v2.py
  • eagle.py
  • exaone.py
  • falcon.py
  • gemma.py
  • glm.py
  • glm4v.py
  • gpt2.py
  • gpt_bigcode.py
  • gpt_j.py
  • gpt_neox.py
  • granite.py
  • granitemoe.py
  • granitemoeshared.py
  • gritlm.py
  • grok1.py
  • h2ovl.py
  • idefics2_vision_model.py
  • interfaces.py
  • interfaces_base.py
  • internlm2.py
  • jais.py
  • jamba.py
  • mamba.py
  • mamba2.py
  • mamba_cache.py
  • medusa.py
  • minicpm3.py
  • mixtral.py
  • mixtral_quant.py
  • mllama.py
  • mlp_speculator.py
  • module_mapping.py
  • mpt.py
  • nemotron.py
  • nvlm_d.py
  • olmo.py
  • olmo2.py
  • olmoe.py
  • opt.py
  • orion.py
  • persimmon.py
  • phi.py
  • phi3.py
  • phi3_small.py
  • phi4mm_audio.py
  • phi4mm_utils.py
  • phimoe.py
  • pixtral.py
  • qwen.py
  • qwen2_moe.py
  • qwen3_moe.py
  • qwen_vl.py
  • registry.py
  • roberta.py
  • siglip.py
  • solar.py
  • stablelm.py
  • starcoder2.py
  • teleflm.py
  • vision.py
  • zamba2.py

@DarkLight1337
Member Author

DarkLight1337 commented Mar 30, 2025

The multi-modal models don't need this change; can you remove them from the list?

@jonghyunchoe
Contributor

Hi, I could take on a few models, e.g. baichuan, gpt_neox, and mpt.

@jonghyunchoe
Contributor

jonghyunchoe commented Apr 4, 2025

I’ll take on a few more models next week.

@lengrongfu
Contributor

lengrongfu commented Apr 11, 2025

@DarkLight1337 I can add two new skip fields, e.g. skip_substr: Optional[List[str]] = None and skip_suffix: Optional[List[str]] = None, because a prefix is not applicable in all situations. The current parameter only covers prefixes:

skip_prefixes: Optional[List[str]] = None,
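A sketch of what the proposed extension could look like. The class name and skip logic here are illustrative only, following the field names suggested above, not vLLM's actual signature:

```python
from typing import List, Optional


class AutoWeightsLoaderSketch:
    """Illustrative only: the proposed skip_substr/skip_suffix fields
    alongside the existing skip_prefixes."""

    def __init__(self,
                 skip_prefixes: Optional[List[str]] = None,
                 skip_substr: Optional[List[str]] = None,
                 skip_suffix: Optional[List[str]] = None):
        self.skip_prefixes = skip_prefixes or []
        self.skip_substr = skip_substr or []
        self.skip_suffix = skip_suffix or []

    def can_skip(self, name: str) -> bool:
        # A weight is skipped if any configured prefix, substring,
        # or suffix matches its dotted name.
        return (any(name.startswith(p) for p in self.skip_prefixes)
                or any(s in name for s in self.skip_substr)
                or any(name.endswith(s) for s in self.skip_suffix))
```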

@DarkLight1337
Member Author

Unless you also need ignore_unexpected_*, I suggest instead mapping the weight to None inside the WeightsMapper.
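The map-to-None idea can be illustrated with a simplified stand-in for WeightsMapper (a toy, not vLLM's actual class; only prefix and substring maps are sketched here): an entry mapped to None drops the weight, which covers the same cases as extra skip arguments on the loader.

```python
class WeightsMapperSketch:
    """Toy name mapper: rewrites weight names by substring and prefix.
    A rule whose target is None drops the weight entirely."""

    def __init__(self, orig_to_new_prefix=None, orig_to_new_substr=None):
        self.orig_to_new_prefix = orig_to_new_prefix or {}
        self.orig_to_new_substr = orig_to_new_substr or {}

    def _map_name(self, name):
        for substr, new in self.orig_to_new_substr.items():
            if substr in name:
                if new is None:
                    return None  # drop this weight
                name = name.replace(substr, new)
        for prefix, new in self.orig_to_new_prefix.items():
            if name.startswith(prefix):
                if new is None:
                    return None  # drop this weight
                name = new + name[len(prefix):]
        return name

    def apply(self, weights):
        for name, tensor in weights:
            mapped = self._map_name(name)
            if mapped is not None:
                yield mapped, tensor
```

For example, mapping the substring ".rotary_emb." to None filters out rotary-embedding buffers while a prefix rule renames the rest, with no extra loader arguments needed.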

@lengrongfu
Contributor

Issue record: #16548 (comment)
