---
language: ko
tags:
- korean
- klue
- summarization
- translation
datasets:
- c4
license: apache-2.0
---
# KoMiniLM
🐣 Korean mini language model

## Overview
Current language models usually consist of hundreds of millions of parameters, which poses challenges for fine-tuning and online serving in real-life applications due to latency and capacity constraints. In this project, we release a lightweight Korean language model to address these shortcomings of existing language models.

## Quick tour
```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("BM-K/KoMiniLM")  # 23M model
model = AutoModel.from_pretrained("BM-K/KoMiniLM")

inputs = tokenizer("안녕 세상아!", return_tensors="pt")  # "Hello, world!"
outputs = model(**inputs)
```

## Update history
**Updates on 2022.06.20**
- Release KoMiniLM-bert-68M

**Updates on 2022.05.24**
- Release KoMiniLM-bert-23M

## Pre-training
`Teacher Model`: [KLUE-BERT(base)](https://github.com/KLUE-benchmark/KLUE)

### Objective
Self-attention distributions and self-attention value-relations [Wang et al., 2020] were distilled from each discrete layer of the teacher model to the student model. Wang et al. distilled only the last transformer layer of the teacher, whereas this project distills from each layer.

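The attention-distribution part of the objective above can be sketched in plain Python: the loss is the KL divergence between the teacher's and student's attention probabilities, averaged over query positions. This is a minimal illustration with hypothetical function names, not the project's training code, which operates on attention tensors produced by the two transformers.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists of floats."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def attention_distill_loss(teacher_attn, student_attn):
    """Average KL between teacher and student attention distributions.

    Each argument is a list of rows; every row is the attention
    distribution of one query position (entries sum to 1).
    """
    assert len(teacher_attn) == len(student_attn)
    total = sum(kl_divergence(t, s) for t, s in zip(teacher_attn, student_attn))
    return total / len(teacher_attn)
```

The loss is zero when the student reproduces the teacher's attention exactly and grows as the distributions diverge; the value-relation term in Wang et al. has the same KL form, applied to similarities between value vectors.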
### Data sets
|Data|News comments|News article|
|:----:|:----:|:----:|
|size|10 GB|10 GB|

### Config
- **KoMiniLM-23M**
```json
{
  "architectures": [
    "BartForPreTraining"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 384,
  "initializer_range": 0.02,
  "intermediate_size": 1536,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bart",
  "num_attention_heads": 12,
  "num_hidden_layers": 6,
  "output_attentions": true,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "return_dict": false,
  "torch_dtype": "float32",
  "transformers_version": "4.13.0",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 32000
}
```

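As a sanity check on the config above, the ~23M parameter figure can be reproduced from the hyperparameters (`hidden_size` 384, 6 layers, `intermediate_size` 1536, `vocab_size` 32000). The sketch below assumes a standard BERT-style encoder layout (embeddings plus per-layer attention, feed-forward, and LayerNorm weights, no pooler or task head); it is a rough estimate, not the project's exact counting code.

```python
def encoder_param_count(vocab_size, hidden, layers, intermediate,
                        max_pos=512, type_vocab=2):
    """Approximate parameter count of a BERT-style encoder (no pooler/head)."""
    # token + position + token-type embeddings, plus the embedding LayerNorm
    emb = (vocab_size + max_pos + type_vocab) * hidden + 2 * hidden
    attn = 4 * (hidden * hidden + hidden)       # Q, K, V and output projections
    ffn = (hidden * intermediate + intermediate
           + intermediate * hidden + hidden)    # two feed-forward linear layers
    norms = 2 * 2 * hidden                      # two LayerNorms per layer
    return emb + layers * (attn + ffn + norms)

total = encoder_param_count(32000, 384, 6, 1536)
print(f"~{total / 1e6:.1f}M parameters")  # close to the advertised 23M
```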
### Performance on subtasks
- The results of our fine-tuning experiments are an average of 3 runs for each task.
```bash
cd KoMiniLM-Finetune
bash scripts/run_all_kominilm.sh
```

|Model| #Param | Average | NSMC<br>(Acc) | Naver NER<br>(F1) | PAWS<br>(Acc) | KorNLI<br>(Acc) | KorSTS<br>(Spearman) | Question Pair<br>(Acc) | KorQuAD<br>(Dev)<br>(EM/F1) |
|:----:|:----:|:----:|:----:|:----:|:----:|:----:|:----:|:----:|:----:|
|KoBERT(KLUE)| 110M | 86.84 | 90.20±0.07 | 87.11±0.05 | 81.36±0.21 | 81.06±0.33 | 82.47±0.14 | 95.03±0.44 | 84.43±0.18 / <br>93.05±0.04 |
|KcBERT| 108M | 78.94 | 89.60±0.10 | 84.34±0.13 | 67.02±0.42 | 74.17±0.52 | 76.57±0.51 | 93.97±0.27 | 60.87±0.27 / <br>85.01±0.14 |
|KoBERT(SKT)| 92M | 79.73 | 89.28±0.42 | 87.54±0.04 | 80.93±0.91 | 78.18±0.45 | 75.98±2.81 | 94.37±0.31 | 51.94±0.60 / <br>79.69±0.66 |
|DistilKoBERT| 28M | 74.73 | 88.39±0.08 | 84.22±0.01 | 61.74±0.45 | 70.22±0.14 | 72.11±0.27 | 92.65±0.16 | 52.52±0.48 / <br>76.00±0.71 |
| | | | | | | | | | |
|**KoMiniLM<sup>†</sup>**| **68M** | 85.90 | 89.84±0.02 | 85.98±0.09 | 80.78±0.30 | 79.28±0.17 | 81.00±0.07 | 94.89±0.37 | 83.27±0.08 / <br>92.08±0.06 |
|**KoMiniLM<sup>†</sup>**| **23M** | 84.79 | 89.67±0.03 | 84.79±0.09 | 78.67±0.45 | 78.10±0.07 | 78.90±0.11 | 94.81±0.12 | 82.11±0.42 / <br>91.21±0.29 |

- [NSMC](https://github.com/e9t/nsmc) (Naver Sentiment Movie Corpus)
- [Naver NER](https://github.com/naver/nlp-challenge) (NER task on Naver NLP Challenge 2018)
- [PAWS](https://github.com/google-research-datasets/paws) (Korean Paraphrase Adversaries from Word Scrambling)
- [KorNLI/KorSTS](https://github.com/kakaobrain/KorNLUDatasets) (Korean Natural Language Understanding)
- [Question Pair](https://github.com/songys/Question_pair) (Paired Question)
- [KorQuAD](https://korquad.github.io/) (The Korean Question Answering Dataset)

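For reference, the Average column appears to be the mean of all eight reported scores, with KorQuAD EM and F1 counted separately. This is an assumption about how the table was computed, not something the project documents; a quick check against the KoMiniLM-23M row:

```python
# KoMiniLM-23M row: NSMC, Naver NER, PAWS, KorNLI, KorSTS,
# Question Pair, KorQuAD EM, KorQuAD F1 (means only, std dropped)
scores = [89.67, 84.79, 78.67, 78.10, 78.90, 94.81, 82.11, 91.21]
average = sum(scores) / len(scores)
print(round(average, 2))  # matches the reported 84.79 up to rounding
```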
<img src="https://user-images.githubusercontent.com/55969260/174229747-279122dc-9d27-4da9-a6e7-f9f1fe1651f7.png"> <br>

### User Contributed Examples
-

## Reference
- [KLUE BERT](https://github.com/KLUE-benchmark/KLUE)
- [KcBERT](https://github.com/Beomi/KcBERT)
- [SKT KoBERT](https://github.com/SKTBrain/KoBERT)
- [DistilKoBERT](https://github.com/monologg/DistilKoBERT)
- [lassl](https://github.com/lassl/lassl)