---
license: mit
---

# Faster Segment Anything (MobileSAM)

- **Repository:** [GitHub - MobileSAM](https://github.com/ChaoningZhang/MobileSAM)
- **Paper:** [Faster Segment Anything: Towards Lightweight SAM for Mobile Applications](https://arxiv.org/pdf/2306.14289.pdf)
- **Demo:** [HuggingFace Demo](https://huggingface.co/dhkim2810/MobileSAM)

**MobileSAM** performs on par with the original SAM (at least visually) and keeps exactly the same pipeline as the original SAM, except for the image encoder: the heavyweight ViT-H encoder (632M) is replaced with a much smaller TinyViT (5M). On a single GPU, MobileSAM runs in around 12ms per image: 8ms for the image encoder and 4ms for the mask decoder.

The comparison of the ViT-based image encoders is summarized as follows:

Image Encoder | Original SAM | MobileSAM
:------------:|:------------:|:--------:
Parameters    | 611M         | 5M
Speed         | 452ms        | 8ms

Original SAM and MobileSAM have exactly the same prompt-guided mask decoder:

Mask Decoder | Original SAM | MobileSAM
:-----------:|:------------:|:--------:
Parameters   | 3.876M       | 3.876M
Speed        | 4ms          | 4ms

The comparison of the whole pipeline is summarized as follows:

Whole Pipeline (Enc+Dec) | Original SAM | MobileSAM
:-----------------------:|:------------:|:--------:
Parameters               | 615M         | 9.66M
Speed                    | 456ms        | 12ms
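
As a quick back-of-the-envelope check of what these tables imply, the following sketch computes the overall size and latency ratios directly from the whole-pipeline numbers above (plain Python, no external dependencies):

```python
# Figures taken from the whole-pipeline comparison table above.
sam = {"params_m": 615, "latency_ms": 456}         # Original SAM (Enc+Dec)
mobile_sam = {"params_m": 9.66, "latency_ms": 12}  # MobileSAM (Enc+Dec)

size_ratio = sam["params_m"] / mobile_sam["params_m"]      # ~64x fewer parameters
speedup = sam["latency_ms"] / mobile_sam["latency_ms"]     # ~38x faster per image

print(f"~{size_ratio:.0f}x fewer parameters, ~{speedup:.0f}x faster")
```

Note that almost all of the savings come from the encoder swap, since the mask decoder is identical in both models.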

## Acknowledgement

<details>
<summary>
<a href="https://github.com/facebookresearch/segment-anything">SAM</a> (Segment Anything) [<b>bib</b>]
</summary>

```bibtex
@article{kirillov2023segany,
  title={Segment Anything},
  author={Kirillov, Alexander and Mintun, Eric and Ravi, Nikhila and Mao, Hanzi and Rolland, Chloe and Gustafson, Laura and Xiao, Tete and Whitehead, Spencer and Berg, Alexander C. and Lo, Wan-Yen and Doll{\'a}r, Piotr and Girshick, Ross},
  journal={arXiv:2304.02643},
  year={2023}
}
```
</details>
54
+
55
+
56
+
57
+ <details>
58
+ <summary>
59
+ <a href="https://github.com/microsoft/Cream/tree/main/TinyViT">TinyViT</a> (TinyViT: Fast Pretraining Distillation for Small Vision Transformers) [<b>bib</b>]
60
+ </summary>
61
+
62
+ ```bibtex
63
+ @InProceedings{tiny_vit,
64
+ title={TinyViT: Fast Pretraining Distillation for Small Vision Transformers},
65
+ author={Wu, Kan and Zhang, Jinnian and Peng, Houwen and Liu, Mengchen and Xiao, Bin and Fu, Jianlong and Yuan, Lu},
66
+ booktitle={European conference on computer vision (ECCV)},
67
+ year={2022}
68
+ ```
69
+ </details>

**BibTeX:**
```bibtex
@article{mobile_sam,
  title={Faster Segment Anything: Towards Lightweight SAM for Mobile Applications},
  author={Zhang, Chaoning and Han, Dongshen and Qiao, Yu and Kim, Jung Uk and Bae, Sung Ho and Lee, Seungkyu and Hong, Choong Seon},
  journal={arXiv preprint arXiv:2306.14289},
  year={2023}
}
```