add_special_tokens=True doesn't add eos token at the end of the sequence

by Andriy - opened Jul 3

full_text = "This is a test sequence"
full_text_encoded = self.tokenizer.tokenize(full_text, add_special_tokens=True)
print(full_text_encoded)

['This', 'Ġis', 'Ġa', 'Ġtest', 'Ġsequence']

Is the EOS token supposed to be added manually?

jklj077

Qwen org Jul 15

For Qwen2Tokenizer, add_special_tokens indeed does nothing, as there is no consistent way to implement it.

Please use the correct format for your use case.
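If the use case is plain text that should end with an EOS token, one option is to append it by hand. The following is a minimal sketch, not an official recipe: `encode_with_eos` is a hypothetical helper name, and it assumes the tokenizer object is callable, returns `input_ids` as a list for a single string, and exposes `eos_token_id`, as Hugging Face tokenizers do. For chat-tuned checkpoints, `tokenizer.apply_chat_template` is probably the more appropriate route.

```python
from typing import Any, List


def encode_with_eos(tokenizer: Any, text: str) -> List[int]:
    """Encode `text` and append the tokenizer's EOS id if it is not already last.

    `encode_with_eos` is a hypothetical helper, not part of transformers.
    """
    # add_special_tokens=False states the intent explicitly; per the reply
    # above, the flag has no effect for Qwen2Tokenizer either way.
    ids = list(tokenizer(text, add_special_tokens=False)["input_ids"])

    eos_id = tokenizer.eos_token_id
    if eos_id is None:
        raise ValueError("tokenizer has no eos_token_id set")

    # Append EOS manually, avoiding a duplicate if it is already present.
    if not ids or ids[-1] != eos_id:
        ids.append(eos_id)
    return ids
```

With a tokenizer loaded through `AutoTokenizer.from_pretrained(...)`, this would be called as `encode_with_eos(tokenizer, full_text)`. Which token counts as EOS depends on the checkpoint, so it is worth printing `tokenizer.eos_token` first to confirm it is the one you expect.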
