2Vasabi committed
Commit bfb07e5
Parent(s): a655f52

Update README.md

Files changed (1):
  1. README.md +66 -1

README.md CHANGED
@@ -26,4 +26,69 @@ tvl was trained in bf16
 
  Train dataset contains:
  - GrandMaster-PRO-MAX dataset (60k samples)
- - Translated, humanized and merged by image subset of GQA (TODO)
+ - A subset of GQA, translated, humanized, and merged by image (TODO)
+
+ ## Benchmarks
+
+ ### TODO
+
+ ## Quickstart
+
+ You can simply run [this notebook](https://www.kaggle.com/code/artemdzhalilov/tvl-hand-test) or use the code below.
+
+ First, install qwen-vl-utils and the development version of transformers pinned to the commit below:
+
+ ```bash
+ pip install qwen-vl-utils
+ pip install --no-cache-dir git+https://github.com/huggingface/transformers@19e6e80e10118f855137b90740936c0b11ac397f
+ ```
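+
+ To confirm the pinned build is installed, you can print the version (a quick optional check; the exact version string of the dev build is not guaranteed):
+
+ ```python
+ # A dev build of transformers is expected after installing from the pinned commit
+ import transformers
+ print(transformers.__version__)
+ ```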
+
+ Then run:
+
+ ```python
+ from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
+ from qwen_vl_utils import process_vision_info
+ import torch
+
+ # Load the model in bf16, the precision it was trained in
+ model = Qwen2VLForConditionalGeneration.from_pretrained(
+     "2Vasabi/tvl-mini-instruct-0.1", torch_dtype=torch.bfloat16, device_map="auto"
+ )
+
+ processor = AutoProcessor.from_pretrained("2Vasabi/tvl-mini-instruct-0.1")
+
+ # One user turn with an image and a Russian prompt
+ # ("Кратко опиши что ты видишь на изображении" = "Briefly describe what you see in the image")
+ messages = [
+     {
+         "role": "user",
+         "content": [
+             {
+                 "type": "image",
+                 "image": "https://i.ibb.co/d0QL8s6/images.jpg",
+             },
+             {"type": "text", "text": "Кратко опиши что ты видишь на изображении"},
+         ],
+     }
+ ]
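+
+ # Note: qwen-vl-utils also accepts local images in the "image" field, e.g. a
+ # "file:///path/to/image.jpg" URI (illustrative path, not from this README)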
+
+ # Build the chat prompt and gather the vision inputs
+ text = processor.apply_chat_template(
+     messages, tokenize=False, add_generation_prompt=True
+ )
+ image_inputs, video_inputs = process_vision_info(messages)
+ inputs = processor(
+     text=[text],
+     images=image_inputs,
+     videos=video_inputs,
+     padding=True,
+     return_tensors="pt",
+ )
+ inputs = inputs.to("cuda")
+
+ # Generate, then strip the prompt tokens so only the reply is decoded
+ generated_ids = model.generate(**inputs, max_new_tokens=1000)
+ generated_ids_trimmed = [
+     out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
+ ]
+ output_text = processor.batch_decode(
+     generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
+ )
+ print(output_text)
+ ```
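+
+ Note: the model was trained in bf16. On GPUs without bf16 support you can try fp16 instead (a minimal sketch; fp16 inference is untested here and may affect output quality):
+
+ ```python
+ import torch
+ from transformers import Qwen2VLForConditionalGeneration
+
+ # Prefer bf16 where supported (Ampere and newer), otherwise fall back to fp16
+ dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16
+ model = Qwen2VLForConditionalGeneration.from_pretrained(
+     "2Vasabi/tvl-mini-instruct-0.1", torch_dtype=dtype, device_map="auto"
+ )
+ ```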