Why is the sequence length not equal to the number of patches?
#1 by sugarExcess - opened
The `last_hidden_state` of the model output is an array of shape (batch_size, sequence_length, hidden_dim). The `sequence_length` is 50, but from my understanding of the ViT paper it should be equal to the number of patches: (224*224) / (16*16) = 196.

I checked the original model and there the `sequence_length` is indeed 196. What am I missing?
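A minimal sketch to reproduce the observation, assuming the checkpoint in question is a ViTMAE model such as `facebook/vit-mae-base` (the model card is not quoted above, so the name is an assumption; the random tensor stands in for a real preprocessed image):

```python
import torch
from transformers import ViTMAEModel

# Assumption: the model under discussion is a ViTMAE checkpoint.
model = ViTMAEModel.from_pretrained("facebook/vit-mae-base")

pixel_values = torch.rand(1, 3, 224, 224)  # dummy 224x224 RGB input
outputs = model(pixel_values=pixel_values)

# sequence_length is 50, not the 196 = (224/16)^2 patches one might expect
print(outputs.last_hidden_state.shape)  # torch.Size([1, 50, 768])
```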
Hello, using the example provided in the model card, I get:

```python
print(outputs['logits'].shape)
# torch.Size([1, 196, 768])
```

You may have modified the configuration, or the input shape of your image may be smaller than the expected one, giving fewer patches than with 224*224.
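As a quick sanity check on that last point, a small sketch of how the patch count scales with the input resolution (the 112*112 size below is purely illustrative):

```python
def num_patches(image_size: int, patch_size: int = 16) -> int:
    # Non-overlapping square patches along each spatial dimension.
    return (image_size // patch_size) ** 2

print(num_patches(224))  # 196
print(num_patches(112))  # 49 -- a smaller image yields fewer patches
```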
Hi,

This is explained here: https://discuss.huggingface.co/t/size-of-last-hidden-state-and-mask-in-vitmae/70297/3?u=nielsr. In short, ViTMAE randomly masks 75% of the patches by default, so the encoder only sees the 49 visible patches plus the [CLS] token, giving a sequence length of 50.
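For reference, the arithmetic behind the linked answer, assuming the default ViTMAE `mask_ratio` of 0.75:

```python
num_patches = (224 // 16) ** 2                 # 196 patches in total
mask_ratio = 0.75                              # ViTMAE default
visible = int(num_patches * (1 - mask_ratio))  # 49 unmasked patches
seq_len = visible + 1                          # +1 for the [CLS] token
print(seq_len)  # 50
```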
Thank you for the response!
sugarExcess changed discussion status to closed