
Performance and Behavioral Differences Between Versions of model.safetensors.index.json #98


Open
mingkai-zheng opened this issue Jan 22, 2025 · 3 comments


@mingkai-zheng
Copy link

Hello,

I recently noticed that the model.safetensors.index.json file was updated to include two additional keys: vision_tower.post_layernorm.bias and vision_tower.post_layernorm.weight. However, the model built from the current GitHub codebase does not have a vision_tower.post_layernorm layer.

After testing the image cat.png with the demo example (with do_sample=False and no temperature applied), I observed slight differences in the output between the two versions.

Could you clarify:

  1. Are these differences expected?
  2. Do the changes introduce any performance or behavioral implications?
  3. Is there anything I might have overlooked regarding this update?

Thank you for your assistance!

@xffxff
Collaborator

xffxff commented Jan 22, 2025

@mingkai-zheng Hi

Great observation! You're absolutely right -- the vision tower architecture used in Aria doesn't actually include a post_layernorm layer. The changes to the checkpoint weights, including the added "vision_tower.post_layernorm" keys, were introduced by the transformers team when they integrated Aria into their repo.

I haven't checked whether the two implementations (ours and transformers') are 100% identical, but I don't think the differences you're seeing are caused by the added post_layernorm weights. From what I understand, the transformers team added the post_layernorm weights only so they could reuse the existing Idefics3VisionTransformer without creating a new variant that removes the layer. The transformers implementation doesn't actually use post_layernorm: it sets output_hidden_states=True and fetches the hidden state at vision_feature_layer, which is the output right before post_layernorm. So even though the weights exist, the layer itself doesn't affect the final inference outputs.

https://github.com/huggingface/transformers/blob/f4f33a20a23aa90f3510280e34592b2784d48ebe/src/transformers/models/aria/modeling_aria.py#L1417-L1425
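As a toy illustration (pure NumPy, not the actual Aria or transformers code; the class and variable names are made up), here's why weights that live only in an unused post-layernorm cannot change a feature that is taken from the hidden states recorded *before* that norm:

```python
import numpy as np

class ToyVisionTower:
    """Hypothetical two-layer 'vision tower' with a trailing post-layernorm.
    Hidden states are recorded before the norm, mimicking output_hidden_states=True."""
    def __init__(self, post_ln_scale):
        self.w1 = np.eye(4) * 2.0
        self.w2 = np.eye(4) * 0.5
        self.post_ln_scale = post_ln_scale  # plays the role of post_layernorm.weight

    def forward(self, x):
        hidden_states = [x]
        x = x @ self.w1
        hidden_states.append(x)
        x = x @ self.w2
        hidden_states.append(x)  # <- the feature at a pre-norm vision_feature_layer
        normed = self.post_ln_scale * (x - x.mean()) / (x.std() + 1e-5)
        return normed, hidden_states

x = np.arange(4.0)
_, hs_a = ToyVisionTower(post_ln_scale=1.0).forward(x)
_, hs_b = ToyVisionTower(post_ln_scale=123.0).forward(x)
# The pre-norm hidden state is identical even though the norm weights differ:
assert np.allclose(hs_a[-1], hs_b[-1])
```

Since the extracted feature is the hidden state recorded before the norm is applied, any values stored in the norm's weights are simply dead parameters for inference.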

@mingkai-zheng
Author

mingkai-zheng commented Jan 22, 2025

Hi @xffxff

Thank you so much for providing the information! Both versions of the model actually give quite reasonable responses for the provided cat.png; I'm just a bit confused about where the discrepancy comes from and will investigate further on my end.
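In case it helps the investigation, a trivial helper like the following (purely illustrative, with made-up token ids) can locate the first position at which two greedy generations diverge:

```python
def first_divergence(tokens_a, tokens_b):
    """Return the index of the first differing token between two id
    sequences, or None if the compared prefix is identical."""
    for i, (a, b) in enumerate(zip(tokens_a, tokens_b)):
        if a != b:
            return i
    return None

# Example with made-up token ids:
print(first_divergence([1, 5, 9, 4], [1, 5, 7, 4]))  # -> 2
```

Running both checkpoints with do_sample=False and diffing the token ids this way pinpoints exactly where the outputs split, which is usually more informative than eyeballing the decoded text.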

Currently, I’m trying to reproduce Aria's performance on the MMMU (val) benchmark using both versions, based on the lmms-eval codebase. In my experiments, both versions achieve quite similar results, around 45 (input resolution fixed at 980x980). However, this is significantly lower than the 54.9 reported in Table 1 of your paper.

Could you clarify how your team evaluates performance on MMMU? Specifically:

  1. Are you using a codebase other than lmms-eval?
  2. Are there variations in the prompts or evaluation setup that might explain the discrepancy?

BTW, I believe this question might also be related to the other issue I created yesterday (#90).

Thank you so much for your help!

@xffxff
Collaborator

xffxff commented Jan 22, 2025

@mingkai-zheng

Thanks for your question!

I’m not directly involved in the training and evaluation of the Aria model, so I don’t have all the details about the MMMU evaluation. What I can share is that we use an internal evaluation framework, which may differ from lmms-eval.

@LiJunnan1992 @dxli94 @teowu may know more details
