difference between paper and implementation in gradcam calculation #789

dengmengjie · 2025-02-25T09:44:17Z

Hi, thank you for your wonderful work.

I've noticed that in the paper, the relevance score between image patches and tokens is calculated as:

where the positive values of the gradients are set to 0 by the min function, leaving only the negative values. The paper gives the following reason:

"Inspired by GradCAM, we filter out uninformative attention scores by multiplication with the gradient which could cause an increase in the image-text similarity."

But in your code implementation, clamp(0) is applied to the gradients, which sets negative values to 0 and keeps the positive ones. Isn't that actually a max function rather than min?
grads = ( grads[:, :, :, 1:].clamp(0).reshape(visual_input.size(0), 12, -1, 24, 24) * mask )
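To make the discrepancy concrete, here is a minimal sketch in plain Python. The helper names and the gradient values are made up for illustration and are not part of the repository. It assumes the usual PyTorch behaviour that the first positional argument of clamp is the lower bound, so clamp(0) acts as an elementwise max(g, 0); in PyTorch the same comparison would be g.clamp(min=0) versus g.clamp(max=0).

```python
# Hypothetical stand-ins for the two operations being compared.

def clamp_min_zero(values):
    """Mimics tensor.clamp(0): elementwise max(v, 0), negatives become 0."""
    return [max(v, 0.0) for v in values]


def min_with_zero(values):
    """Elementwise min(v, 0), as the paper's formula is described: positives become 0."""
    return [min(v, 0.0) for v in values]


# Made-up gradient values, chosen only to cover both signs.
grads = [-2.0, -0.5, 0.0, 1.5, 3.0]

code_result = clamp_min_zero(grads)   # what the implementation keeps
paper_result = min_with_zero(grads)   # what the paper's min would keep

print(code_result)   # [0.0, 0.0, 0.0, 1.5, 3.0]
print(paper_result)  # [-2.0, -0.5, 0.0, 0.0, 0.0]
```

The two results keep opposite halves of the gradient values, which is the mismatch I am asking about.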

Could anyone provide an explanation? Thanks a lot!
