Using distributed or parallel set-up in script?: No
Using GPU in script?: Yes
GPU type: NVIDIA RTX A5000
Who can help?
No response
Information
The official example scripts
My own modified scripts
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)
Reproduction
from transformers import AutoImageProcessor, FlaxDinov2Model, Dinov2Model
from PIL import Image
import requests
import numpy as np
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
image_processor = AutoImageProcessor.from_pretrained("facebook/dinov2-base")
jax_inputs = image_processor(images=image, return_tensors="np")
# flax model
model = FlaxDinov2Model.from_pretrained("facebook/dinov2-base")
outputs = model(**jax_inputs)
jax_results = outputs.last_hidden_state
# torch model
import torch
model = Dinov2Model.from_pretrained("facebook/dinov2-base")
torch_inputs = image_processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**torch_inputs)
torch_results = outputs.last_hidden_state
# maximum absolute element-wise difference between the Flax and PyTorch outputs
print(np.abs(np.asarray(jax_results) - torch_results.numpy()).max())
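As an aside, the maximum absolute difference alone can be misleading when activations have large magnitudes; a relative comparison often gives a clearer picture. A minimal sketch using synthetic stand-in tensors (the arrays and the 1e-4 drift here are illustrative assumptions, not outputs of the actual models):

```python
import numpy as np

# Hypothetical stand-ins for the Flax and PyTorch outputs.
rng = np.random.default_rng(0)
a = rng.normal(scale=50.0, size=(1, 257, 768))  # large-magnitude activations
b = a * (1.0 + 1e-4)                            # simulate 0.01% relative drift

abs_max = np.abs(a - b).max()
rel_max = (np.abs(a - b) / (np.abs(a) + 1e-8)).max()
print("max abs diff:", abs_max)  # large despite tiny relative drift
print("max rel diff:", rel_max)
```

A sizeable absolute difference with a tiny relative difference points to accumulated floating-point discrepancies rather than a genuinely different computation.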
Expected behavior
Hi,
I'm using the Flax version of DINOv2 and want to make sure that it returns results consistent with the torch version, so I ran the simple test script attached above. However, I noticed that the token embeddings can differ by more than 6 in absolute value. I was wondering whether this is expected due to numerical differences, or whether there is something wrong in my code and the difference should not be so large? Thanks for your help!
The difference most likely arises from numerical differences between the Flax and PyTorch implementations. When checking the models for consistency, a better approach is to compare the alignment of the classification-token embeddings of the two models.
The result (in my case) comes out to be 0.999900, which indicates that even though there are small numerical differences across the full tensors, the semantic understanding of the image is essentially the same for both models.
Note: jax_results[:, 0, :] works in this case because the first token in DINOv2 is the classification token.
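The comparison described above can be sketched as follows. This is a minimal illustration using synthetic stand-in arrays (in practice jax_results and torch_results come from the reproduction script; using cosine similarity for the alignment is an assumption inferred from the 0.999900 figure):

```python
import numpy as np

# Hypothetical stand-ins for the model outputs: (batch, num_tokens, hidden_dim).
rng = np.random.default_rng(0)
jax_results = rng.normal(size=(1, 257, 768)).astype(np.float32)
# Simulate small numerical drift between the two frameworks.
torch_results = jax_results + rng.normal(
    scale=1e-3, size=jax_results.shape
).astype(np.float32)

# First token in DINOv2 is the classification ([CLS]) token.
cls_jax = np.asarray(jax_results)[:, 0, :]
cls_torch = np.asarray(torch_results)[:, 0, :]

# Cosine similarity between the two classification embeddings.
cos_sim = (cls_jax * cls_torch).sum() / (
    np.linalg.norm(cls_jax) * np.linalg.norm(cls_torch)
)
print(f"{cos_sim:.6f}")
```

A value very close to 1.0 indicates the two embeddings point in essentially the same direction, even when element-wise differences look large in absolute terms.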
System Info
transformers version: 4.50.0