
Support LLMDet in transformers #37334

Open

fushh opened this issue Apr 7, 2025 · 6 comments

Comments


fushh commented Apr 7, 2025

Model description

Could you please consider adding LLMDet (CVPR 2025 Highlight) to transformers? It is a next-generation open-vocabulary object detector. The architecture of LLMDet is similar to Grounding DINO, so most of the code can be reused. We have provided the code and the Hugging Face-compatible checkpoints here.

Open source status

  • The model implementation is available
  • The model weights are available

Provide useful links for the implementation

paper: https://arxiv.org/abs/2501.18954
code: https://github.com/iSEE-Laboratory/LLMDet/tree/main/hf_model
model: https://huggingface.co/fushh7/llmdet_swin_tiny_hf
model: https://huggingface.co/fushh7/llmdet_swin_base_hf
model: https://huggingface.co/fushh7/llmdet_swin_large_hf

fushh added the New model label Apr 7, 2025
NielsRogge (Contributor) commented Apr 7, 2025

Very cool!

It seems that only 3 minor modifications are needed (as explained here), which makes this an ideal use case for modular. You can just add a modular_llmdet.py that inherits everything from Grounding DINO and adds the necessary changes; the modular converter will then automatically generate a standalone modeling_llmdet.py file from it.

Is this something you'd be eager to open a PR for? To get started, see the guide here: https://huggingface.co/docs/transformers/main/en/modular_transformers

cc also @EduardoPach who added Grounding DINO to Transformers

qubvel (Member) commented Apr 7, 2025

Hey @fushh! Thanks for the proposal, and congratulations on your paper's acceptance as a CVPR 2025 Highlight! It would be wonderful to have it in Transformers. Please let us know if we can help make the PR happen 🤗

fushh (Author) commented Apr 7, 2025

Since I am not familiar with contributing to transformers, I would be grateful if you could help us integrate LLMDet into transformers. Many thanks!

qubvel (Member) commented Apr 7, 2025

@fushh, sure, we can help with the code review once the PR is open and guide the implementation details. Since the code is similar to GroundingDINO, it should not be that hard to add it by following the structure of the GroundingDINO model.

As @NielsRogge mentioned, here is a good starting point:

You can also refer to other model PRs, e.g. RT-DETRv2, which is based on RT-DETR and likewise uses the modular approach.

sushmanthreddy (Contributor) commented

Hey @fushh! Congrats on the CVPR 2025 Highlight! I saw the discussion about adding LLMDet to Transformers and was wondering if there's any chance I could help with that. I'd love to contribute and assist in any way I can. Let me know!

EduardoPach (Contributor) commented

Amazing work, @fushh! Glad to see that the HF Grounding DINO implementation is useful to researchers. Feel free to tag me as a reviewer if you open a PR.
