Why is the sequence length not equal to the number of patches?
#1 by sugarExcess - opened
The `last_hidden_state` of the model output is an array of shape (batch_size, sequence_length, hidden_dim). The `sequence_length` is 50, but from my understanding of the ViT paper it should be equal to the number of patches: (224*224) / (16*16) = 196.

I checked the original model and there the `sequence_length` is indeed 196. What am I missing?
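A minimal sketch to reproduce the observation, assuming the checkpoint in question is a ViTMAE model such as `facebook/vit-mae-base` (the model card is not quoted above, so the name is an assumption; the random tensor stands in for a real preprocessed image):

```python
import torch
from transformers import ViTMAEModel

# Assumption: the model under discussion is a ViTMAE checkpoint.
model = ViTMAEModel.from_pretrained("facebook/vit-mae-base")

pixel_values = torch.rand(1, 3, 224, 224)  # dummy 224x224 RGB input
outputs = model(pixel_values=pixel_values)

# sequence_length is 50, not the 196 = (224/16)^2 patches one might expect
print(outputs.last_hidden_state.shape)  # torch.Size([1, 50, 768])
```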
Hello, using the example provided in the model card, I get:

```python
print(outputs['logits'].shape)
# torch.Size([1, 196, 768])
```

You may have modified the configuration, or the input shape of your image may be smaller than the expected one, giving fewer patches than with 224*224.
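As a quick sanity check on that last point, a small sketch of how the patch count scales with the input resolution (the 112*112 size below is purely illustrative):

```python
def num_patches(image_size: int, patch_size: int = 16) -> int:
    # Non-overlapping square patches along each spatial dimension.
    return (image_size // patch_size) ** 2

print(num_patches(224))  # 196
print(num_patches(112))  # 49 -- a smaller image yields fewer patches
```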
Hi,

This is explained here: https://discuss.huggingface.co/t/size-of-last-hidden-state-and-mask-in-vitmae/70297/3?u=nielsr. In short, ViTMAE randomly masks 75% of the patches by default, so the encoder only sees the 49 visible patches plus the [CLS] token, giving a sequence length of 50.
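For reference, the arithmetic behind the linked answer, assuming the default ViTMAE `mask_ratio` of 0.75:

```python
num_patches = (224 // 16) ** 2                 # 196 patches in total
mask_ratio = 0.75                              # ViTMAE default
visible = int(num_patches * (1 - mask_ratio))  # 49 unmasked patches
seq_len = visible + 1                          # +1 for the [CLS] token
print(seq_len)  # 50
```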
Thank you for the response!
sugarExcess changed discussion status to closed