A shallow copy in groundingdino #37333

Open · 1 of 4 tasks
fushh opened this issue Apr 7, 2025 · 6 comments
fushh commented Apr 7, 2025

System Info

A bug in the source code.

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Please see the following code in the GroundingDino implementation:

else:
    for _ in range(config.decoder_layers):
        _bbox_embed = GroundingDinoMLPPredictionHead(
            input_dim=config.d_model, hidden_dim=config.d_model, output_dim=4, num_layers=3
        )
    self.bbox_embed = nn.ModuleList([_bbox_embed for _ in range(config.decoder_layers)])
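
To see why this ends up as a shallow copy: the for loop rebinds _bbox_embed on every iteration and discards the earlier instances, and the list comprehension then repeats the one surviving object. Here is a minimal standalone sketch of the effect, using nn.Linear as a stand-in for GroundingDinoMLPPredictionHead (not the transformers code itself):

import torch.nn as nn

num_layers = 3
for _ in range(num_layers):
    head = nn.Linear(4, 4)  # rebinds `head` each iteration; earlier instances are discarded

# The comprehension repeats the single surviving object, not fresh copies.
bbox_embed = nn.ModuleList([head for _ in range(num_layers)])

print(all(m is bbox_embed[0] for m in bbox_embed))  # True: all layers share one set of weights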

Expected behavior

A deep copy: each decoder layer should get its own independent bbox head.

fushh added the bug label Apr 7, 2025

Rocketknight1 (Member) commented

Hi @fushh, I don't understand the bug from this description. Can you explain?

fushh (Author) commented Apr 7, 2025

In the original implementation, every entry of self.bbox_embed references the same _bbox_embed instance no matter whether config.decoder_bbox_embed_share is True or False, i.e. it is always a shallow copy.

However, when config.decoder_bbox_embed_share=False, a deep copy is needed: each layer should get its own head. Example code is as follows.

if config.decoder_bbox_embed_share:
    _bbox_embed = GroundingDinoMLPPredictionHead(
        input_dim=config.d_model, hidden_dim=config.d_model, output_dim=4, num_layers=3
    )
    self.bbox_embed = nn.ModuleList([_bbox_embed for _ in range(config.decoder_layers)])
else:
    model_list = []
    for _ in range(config.decoder_layers):
        _bbox_embed = GroundingDinoMLPPredictionHead(
            input_dim=config.d_model, hidden_dim=config.d_model, output_dim=4, num_layers=3
        )
        model_list.append(_bbox_embed)
    self.bbox_embed = nn.ModuleList(model_list)
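
As a quick sanity check of the two branches, a standalone sketch (again with nn.Linear standing in for the real prediction head; build_heads is a hypothetical helper, not transformers API):

import torch.nn as nn

def build_heads(share: bool, num_layers: int = 3) -> nn.ModuleList:
    # Mirrors the logic above: one shared head vs. one fresh head per layer.
    if share:
        head = nn.Linear(4, 4)
        return nn.ModuleList([head for _ in range(num_layers)])
    return nn.ModuleList([nn.Linear(4, 4) for _ in range(num_layers)])

shared = build_heads(share=True)
independent = build_heads(share=False)
print(shared[0] is shared[1])            # True: parameters are tied across layers
print(independent[0] is independent[1])  # False: each layer owns its parameters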

Only by switching to a deep copy can the LLMDet model be supported.

Further, we would be grateful if you could help us integrate LLMDet into transformers.
paper: https://arxiv.org/abs/2501.18954
code: https://github.com/iSEE-Laboratory/LLMDet/tree/main/hf_model
model: https://huggingface.co/fushh7/llmdet_swin_tiny_hf
model: https://huggingface.co/fushh7/llmdet_swin_base_hf
model: https://huggingface.co/fushh7/llmdet_swin_large_hf

Rocketknight1 (Member) commented

cc @qubvel @NielsRogge for GroundingDINO!

qubvel (Member) commented Apr 7, 2025

Hey @fushh, according to the configs, the following path is never actually activated:

else:
    model_list = []
    for _ in range(config.decoder_layers):
        _bbox_embed = GroundingDinoMLPPredictionHead(
            input_dim=config.d_model, hidden_dim=config.d_model, output_dim=4, num_layers=3
        )
        model_list.append(_bbox_embed)
    self.bbox_embed = nn.ModuleList(model_list)

so it's probably better to remove it from GroundingDINO entirely and redefine it in LLMDet from scratch. What do you think?

fushh (Author) commented Apr 7, 2025

I think fixing this bug would be the better choice, because other users might prefer to make changes to, or fine-tune, GroundingDino.

qubvel (Member) commented Apr 7, 2025

Sure, we can fix it here as well. However, we prefer to simplify the modeling code by removing parts that are unused by already-pretrained checkpoints 🤗
