Support LLMDet in transformers #37334
Comments
Very cool! It seems that only 3 minor modifications are needed (as explained here). Hence this is an ideal use case for modular. It means that you can just add Is this something you'd be eager to open a PR for? To get started, see the guide here: https://huggingface.co/docs/transformers/main/en/modular_transformers cc also @EduardoPach who added Grounding DINO to Transformers
Hey @fushh! Thanks for the proposal and congratulations on the acceptance of your paper for the CVPR 2025 Highlights! It would be wonderful to have it in Transformers. Please let us know if we can make the PR 🤗
Since I am not familiar with how to contribute to transformers, we would be grateful if you could help us integrate LLMDet into transformers. Many thanks!
@fushh, sure, we can help with the code review once the PR is open and guide the implementation details. Since the code is similar to GroundingDINO, it should not be that hard to add it by following the structure of the GroundingDINO model. As @NielsRogge mentioned, here is a good starting point: You can refer to other model PRs, e.g. RT-DETRv2, which is based on RT-DETR and also uses a modular approach.
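For readers new to the modular approach mentioned above, here is a toy sketch of the underlying idea: the new model subclasses an existing one and overrides only the parts that differ. All class and field names below (`BaseDetector`, `NewDetector`, `extra_option`, and so on) are hypothetical stand-ins, not transformers classes; an actual modular file would presumably inherit from the real Grounding DINO classes instead, as described in the guide linked earlier in this thread.

```python
from dataclasses import dataclass


@dataclass
class BaseDetectorConfig:
    """Stand-in for an existing model config (hypothetical, not a transformers class)."""
    hidden_size: int = 256
    num_queries: int = 900


class BaseDetector:
    """Stand-in for an existing detector whose code the new model wants to reuse."""

    def __init__(self, config):
        self.config = config

    def encode(self, image, text):
        # Placeholder for the shared feature extraction logic.
        return f"features({image}, {text})"

    def detect(self, image, text):
        # Shared detection pipeline; it calls encode(), so subclasses that
        # override encode() automatically change what detect() returns.
        return {"boxes": self.encode(image, text),
                "num_queries": self.config.num_queries}


@dataclass
class NewDetectorConfig(BaseDetectorConfig):
    """Inherits every base field and adds one hypothetical option."""
    extra_option: bool = True


class NewDetector(BaseDetector):
    """Overrides only the step that differs; everything else is inherited."""

    def encode(self, image, text):
        base = super().encode(image, text)
        return f"refined({base})" if self.config.extra_option else base


model = NewDetector(NewDetectorConfig())
result = model.detect("img", "a cat")
print(result)
```

The point of the pattern is that a model differing from its parent in only a few places needs only a few overridden methods, which keeps the diff small and easy to review.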
Hey @fushh! Congrats on the CVPR 2025 Highlights! I saw the discussion about adding LLMDet to Transformers and was wondering if there's any chance I can help with that? I'd love to contribute and assist in any way I can. Let me know!
Amazing work, @fushh ! Glad to see that the HF |
Model description
Could you please consider adding LLMDet (CVPR 2025 Highlight), a next-generation open-vocabulary object detector, to transformers? The architecture of LLMDet is similar to Grounding DINO, and most of the code can be reused. We have provided the code and the Hugging Face-compatible checkpoints here.
Open source status
Provide useful links for the implementation
paper: https://arxiv.org/abs/2501.18954
code: https://github.com/iSEE-Laboratory/LLMDet/tree/main/hf_model
model: https://huggingface.co/fushh7/llmdet_swin_tiny_hf
model: https://huggingface.co/fushh7/llmdet_swin_base_hf
model: https://huggingface.co/fushh7/llmdet_swin_large_hf