Performance and Behavioral Differences Between Versions of model.safetensors.index.json #98
Hi @xffxff, thank you so much for providing the information! Both versions of the model give quite reasonable responses for the provided cat.png; I'm just a bit confused about where the discrepancies come from and will investigate further on my end. Currently, I'm trying to reproduce Aria's performance on the MMMU (val) benchmark with both versions, using the lmms-eval codebase. In my experiments, both versions achieve very similar results, scoring around 45 (with the input resolution fixed to 980x980). However, this is significantly lower than the 54.9 reported in Table 1 of your paper. Could you clarify how your team evaluates performance on MMMU? Specifically:
BTW, I believe this question might also be related to the other issue I created yesterday (#90). Thank you so much for your help!
Thanks for your question! I'm not directly involved in the training and evaluation of the Aria model, so I don't have all the details about the MMMU evaluation. What I can share is that we use an internal evaluation framework, which might differ from lmms-eval. @LiJunnan1992 @dxli94 @teowu may know more details.
Hello,
I recently noticed that the model.safetensors.index.json file was updated to include two additional keys: vision_tower.post_layernorm.bias and vision_tower.post_layernorm.weight. I also noticed that the model built with the current GitHub codebase does not have a vision_tower.post_layernorm layer. After testing the image cat.png with the demo example (do_sample=False, no temperature applied), I observed slight differences in the output between the two versions.
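For anyone trying to confirm which tensors changed between the two versions, a minimal sketch like the one below may help. It assumes both copies of model.safetensors.index.json have been downloaded locally and follow the usual Hugging Face sharded-checkpoint layout, where tensor names map to shard files under a "weight_map" key. The helper names (load_weight_map, diff_weight_maps, keys_with_prefix) are hypothetical, and this only compares key names, not the tensor values stored in the shards.

```python
import json


def load_weight_map(index_path):
    """Read a sharded-checkpoint index file and return its tensor-name -> shard-file mapping."""
    with open(index_path, encoding="utf-8") as f:
        index = json.load(f)
    # Assumption: the index follows the Hugging Face layout with a "weight_map" key.
    return index["weight_map"]


def diff_weight_maps(old_map, new_map):
    """Return (added, removed) tensor names between two weight maps, each sorted."""
    old_keys, new_keys = set(old_map), set(new_map)
    added = sorted(new_keys - old_keys)    # present only in the newer index
    removed = sorted(old_keys - new_keys)  # present only in the older index
    return added, removed


def keys_with_prefix(weight_map, prefix):
    """List tensor names starting with a module prefix, e.g. 'vision_tower.post_layernorm'."""
    return sorted(name for name in weight_map if name.startswith(prefix))
```

Usage, with hypothetical local directories old/ and new/: `added, removed = diff_weight_maps(load_weight_map("old/model.safetensors.index.json"), load_weight_map("new/model.safetensors.index.json"))`. If the update only added the post-layernorm parameters, `added` should list just those two keys and `removed` should be empty.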
Could you clarify:
Thank you for your assistance!