KoichiYasuoka committed
Commit
c580c74
1 Parent(s): 8f7be0a

initial release

README.md ADDED
@@ -0,0 +1,43 @@
+ ---
+ language:
+ - "th"
+ tags:
+ - "thai"
+ - "token-classification"
+ - "pos"
+ - "dependency-parsing"
+ base_model: KoichiYasuoka/camembert-thai-base
+ datasets:
+ - "universal_dependencies"
+ license: "apache-2.0"
+ pipeline_tag: "token-classification"
+ widget:
+ - text: "หลายหัวดีกว่าหัวเดียว"
+ ---
+
+ # camembert-thai-base-upos
+
+ ## Model Description
+
+ This is a CamemBERT model pre-trained on Thai texts for POS-tagging and dependency-parsing, derived from [camembert-thai-base](https://huggingface.co/KoichiYasuoka/camembert-thai-base). Every word is tagged by [UPOS](https://universaldependencies.org/u/pos/) (Universal Part-Of-Speech).
+
+ ## How to Use
+
+ ```py
+ from transformers import pipeline
+ nlp=pipeline("token-classification","KoichiYasuoka/camembert-thai-base-upos",aggregation_strategy="simple")
+ print(nlp("หลายหัวดีกว่าหัวเดียว"))
+ ```
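+
+ The pipeline returns a list of dicts, one per aggregated token. A minimal sketch of reading the tags out of it (the key names below are the standard `transformers` token-classification output, where `entity_group` carries the predicted label, here a UPOS tag):
+
+ ```py
+ for token in nlp("หลายหัวดีกว่าหัวเดียว"):
+     # each entry holds the grouped word, its UPOS label, and the
+     # character span of the word in the input string
+     print(token["word"], token["entity_group"], token["start"], token["end"])
+ ```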
+
+ or
+
+ ```py
+ import esupar
+ nlp=esupar.load("KoichiYasuoka/camembert-thai-base-upos")
+ print(nlp("หลายหัวดีกว่าหัวเดียว"))
+ ```
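+
+ Here `nlp(...)` returns a parsed document rather than a list of dicts; a hedged sketch, assuming the CoNLL-U-like rendering used in the esupar examples (one token per line with ID, FORM, UPOS, HEAD and DEPREL columns):
+
+ ```py
+ doc=nlp("หลายหัวดีกว่าหัวเดียว")
+ # printing the document shows the UPOS tags together with the
+ # dependency tree in a CoNLL-U-like table
+ print(doc)
+ ```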
+
+ ## See Also
+
+ [esupar](https://github.com/KoichiYasuoka/esupar): Tokenizer, POS-tagger, and Dependency-parser with BERT/RoBERTa/DeBERTa models
+
config.json ADDED
The diff for this file is too large to render. See raw diff
 
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:176fecfd3434ae3519103a2cc1bfc557694ee69a111fe6e997ec4ab888d1a7e9
+ size 1109330278
special_tokens_map.json ADDED
@@ -0,0 +1,56 @@
+ {
+   "additional_special_tokens": [
+     "<s>NOTUSED",
+     "</s>NOTUSED",
+     "<_>"
+   ],
+   "bos_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "cls_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "<mask>",
+     "lstrip": true,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "<pad>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "<unk>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
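
As a quick sanity check, a minimal sketch of loading the tokenizer committed in this repository and confirming that its special-token attributes mirror the mapping above (standard `transformers` API; nothing here is specific to this model):

```py
from transformers import AutoTokenizer

# load the tokenizer files committed in this repository
tokenizer = AutoTokenizer.from_pretrained("KoichiYasuoka/camembert-thai-base-upos")

# these attributes correspond one-to-one to the keys in special_tokens_map.json
print(tokenizer.bos_token, tokenizer.cls_token, tokenizer.eos_token,
      tokenizer.sep_token, tokenizer.mask_token, tokenizer.pad_token,
      tokenizer.unk_token)
```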
supar.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:00ce85aba1a732475a667f0602a827c0d416b1728ec65eaf9704cd1e61259e73
+ size 1163935350
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:67d187215f962d5cce64e220651641808a1afeb332d7c6e22447bfe9b2aa9138
+ size 16916048
tokenizer_config.json ADDED
The diff for this file is too large to render. See raw diff