Implementation questions #19

Open
checkpoint214159 opened this issue Feb 18, 2025 · 0 comments

checkpoint214159 commented Feb 18, 2025

Hi, thanks for the work you have put into this research. I have a few pressing questions that I cannot find answers to, and was hoping somebody could give some insight.

  1. Which positional encoding is added to the encoder outputs? My impression from the paper is that it is a GSD positional encoding (GSD PE) computed from the target resolution, not the input resolution (e.g., in Fig. 2 the input resolution is 0.7 m and the target resolution is 0.3 m; correct me if I am mistaken). In the code, however, it seems to be computed from the input resolution of 0.7 m. I would expect that, for the decoder to reconstruct the target accurately, the GSD PE should be computed from the target resolution rather than the input resolution. Is the current implementation correct?

  2. Furthermore, in the forward method, at line 362 of scale-mae/mae/models_mae.py on the main branch, I see:
    pos_embed = get_2d_sincos_pos_embed_with_resolution(x.shape[-1], target_dim, target_res, cls_token=True, device=x.device)

This pos_embed is then passed through an embedding projection module but, as far as I can tell, never used anywhere else. Is this intentional, or am I missing where it is used? This is one of the reasons I asked question 1: the intention seems to have been a GSD PE computed from the target resolution, but the implementation does not appear to use it.
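To make the distinction in both questions concrete, here is a minimal sketch of how I understand a GSD-scaled sine-cosine encoding, assuming positions are multiplied by g/G, where g is the image GSD and G is a reference GSD. The helper name `gsd_sincos_1d`, the 1D simplification, and the reference GSD of 1.0 m are my own assumptions for illustration; this is not the repository's `get_2d_sincos_pos_embed_with_resolution`.

```python
import math


def gsd_sincos_1d(num_positions, dim, gsd, reference_gsd=1.0):
    """1D sine-cosine positional encoding with positions scaled by gsd / reference_gsd.

    Hypothetical helper for illustration only; not the repository's function.
    """
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    scale = gsd / reference_gsd  # assumed g / G scaling of the position index
    table = []
    for pos in range(num_positions):
        row = []
        for i in range(dim // 2):
            angle = scale * pos / (10000 ** (2 * i / dim))
            # Interleave: sine at index 2i, cosine at index 2i + 1.
            row.extend([math.sin(angle), math.cos(angle)])
        table.append(row)
    return table


# Same grid, two different resolutions (values taken from the Fig. 2 example).
pe_input = gsd_sincos_1d(4, 8, gsd=0.7)   # input resolution, 0.7 m
pe_target = gsd_sincos_1d(4, 8, gsd=0.3)  # target resolution, 0.3 m
```

Under these assumptions the two tables agree only at position 0 and differ everywhere else, so it matters which resolution the decoder actually receives.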

Overall, I would also like to note that better documentation of the backend code would make it less confusing to navigate; even having worked through the code from the SatMAE paper, which this builds upon, I felt lost quite a few times. Regardless, thank you for your hard work and dedication in open-sourcing and publishing your research :)
