jack813liu committed on
Commit
5a4b298
1 Parent(s): b34398a

Update README.md

Files changed (1)
  1. README.md +140 -2
README.md CHANGED
@@ -14,6 +14,144 @@ tags:
  - filename
  - dectection
  - content
- - bert
  - mbert
- ---
+ - Multilingual
+ ---
+ # Model Card for Model ID
+
+ Detects sexual content in text or file names.
+
+ ## Model Details
+
+ ### Model Description
+
+ - **Developed by:** liu wei
+ - **License:** MIT
+ - **Finetuned from model:** bert-base-multilingual-cased
+ - **Task:** Binary text classification
+ - **Language:** Multilingual
+ - **Max Length:** 128
+ - **Updated Time:** 2024-08-22
+
+ ### Model Training Information
+ - **Training Dataset Size:** 100,000 manually annotated samples (with noise)
+ - **Data Distribution:** 50:50 between the two classes
+ - **Batch Size:** 8
+ - **Epochs:** 5
+ - **Accuracy:** 92%
+ - **F1:** 92%
+
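The accuracy and F1 figures above are reported without a definition. As a minimal sketch of how such metrics are commonly computed for a binary classifier, assuming label 1 ("Sexual") is the positive class: the helper name `binary_metrics` and the toy labels below are illustrative assumptions, not part of the original evaluation.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, f1) for two equal-length lists of 0/1 labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs) if pairs else 0.0
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1

# Toy labels, purely illustrative (not the model's evaluation data).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
accuracy, f1 = binary_metrics(y_true, y_pred)
print(f"accuracy={accuracy:.2f}, f1={f1:.2f}")
```

With a balanced 50:50 dataset, accuracy and F1 tend to land close to each other, which is consistent with both being reported as 92%.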
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ - Supports multiple languages, such as English, Chinese, and Japanese.
+ - Use it to screen content submitted by users in forums, magnet-link search engines, cloud drives, etc.
+ - Detects sexual meaning as well as obfuscated variants, such as porn movie numbers (ID codes) or altered file names.
+ - Compared with GPT-4o mini, detection accuracy is reported to be considerably higher.
+
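The screening use case in the list above could be wired up roughly as follows. This is a sketch only: `flag_submissions` is a hypothetical helper, and `fake_predict` is a stand-in that mimics the `{"predictions": ..., "label": ...}` result shape of the model card's `predict` function so the example runs without the model files.

```python
from typing import Callable, Dict, Iterable, List

def flag_submissions(texts: Iterable[str], classify: Callable[[str], Dict]) -> List[str]:
    """Return the submissions whose classifier result has predictions == 1."""
    flagged = []
    for text in texts:
        result = classify(text)
        if result["predictions"] == 1:
            flagged.append(text)
    return flagged

# Stand-in classifier (assumption): replace with the real predict() in practice.
def fake_predict(text: str) -> Dict:
    is_sexual = "DVAJ-548" in text
    return {"predictions": int(is_sexual), "label": "Sexual" if is_sexual else "None"}

uploads = ["DVAJ-548_CH_SD", "holiday_photos_2023.zip"]
print(flag_submissions(uploads, fake_predict))
```

Passing the classifier in as an argument keeps the filtering logic independent of how the model is loaded, so the same helper works with a local model or a remote service.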
+ ### Examples
+
+ - Example **English**
+ ```python
+ predict("Tiffany Doll - Wine Makes Me Anal (31.03.2018)_1080p.mp4")
+ ```
+ ```json
+ {
+     "predictions": 1,
+     "label": "Sexual"
+ }
+ ```
+
+ - Example **Chinese**
+ ```python
+ predict("橙子 · 保安和女业主的一夜春宵。路见不平拔刀相助,救下苏姐,以身相许!")
+ ```
+ ```json
+ {
+     "predictions": 1,
+     "label": "Sexual"
+ }
+ ```
+
+ - Example **Japanese**
+ ```python
+ predict("MILK-217-UNCENSORED-LEAKピタコス Gカップ痴女 完全着衣で濃密5PLAY 椿りか 580 2.TS")
+ ```
+ ```json
+ {
+     "predictions": 1,
+     "label": "Sexual"
+ }
+ ```
+
+ - Example **Porn Movie Numbers**
+ ```python
+ predict("DVAJ-548_CH_SD")
+ ```
+ ```json
+ {
+     "predictions": 1,
+     "label": "Sexual"
+ }
+ ```
+
+
+ ## How to Get Started with the Model
+
+ ### Step 1:
+ Save all the model files into a directory, such as 'sexual_detection'.
+ ### Step 2:
+ Create a Python file next to that directory, such as 'use_model.py':
+ ```python
+ import torch
+ from transformers import BertForSequenceClassification, BertTokenizer
+
+ # Load the fine-tuned model and tokenizer from the directory created in step 1.
+ model = BertForSequenceClassification.from_pretrained("./sexual_detection")
+ tokenizer = BertTokenizer.from_pretrained("./sexual_detection")
+ model.eval()  # inference mode: disables dropout
+
+ def predict(text):
+     # Truncate to the maximum length of 128 stated in the model description.
+     encoding = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
+     encoding = {k: v.to(model.device) for k, v in encoding.items()}
+
+     with torch.no_grad():  # gradients are not needed for inference
+         outputs = model(**encoding)
+     probs = torch.softmax(outputs.logits, dim=-1)  # class probabilities
+
+     predictions = torch.argmax(probs, dim=-1)
+     label_map = {0: "None", 1: "Sexual"}
+     predicted_label = label_map[predictions.item()]
+     print(f"Predictions:{predictions.item()}, Label:{predicted_label}")
+     return {"predictions": predictions.item(), "label": predicted_label}
+
+ predict("Tiffany Doll - Wine Makes Me Anal (31.03.2018)_1080p.mp4")
+ ```
+ ### Step 3:
+ Run the script:
+ ```shell
+ python3 use_model.py
+ ```
+
+ The call to `predict` returns the following result, shown here as JSON:
+ ```json
+ {
+     "predictions": 1,
+     "label": "Sexual"
+ }
+ ```
+
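Note that the script in step 2 prints a plain-text line and returns a Python dict rather than emitting JSON itself. A minimal sketch of turning that return value into the JSON shown above, using only the standard library; the hard-coded `result` dict stands in for the value returned by `predict` and is an assumption made so the snippet runs on its own:

```python
import json

# Same shape as the dict returned by predict() in step 2 (stand-in value).
result = {"predictions": 1, "label": "Sexual"}

# ensure_ascii=False keeps any non-ASCII text readable in the output.
response_json = json.dumps(result, indent=4, ensure_ascii=False)
print(response_json)
```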
+ ### Explanation
+ The result is always one of two cases:
+ - `predictions` = 0, label **None**: no sexual content was detected;
+ - `predictions` = 1, label **Sexual**: sexual content was detected.
+
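To make the two cases concrete, here is a pure-Python sketch of how a pair of raw logits maps to one of them. It mirrors the argmax step and label map of the getting-started script; the `confidence` field, the `interpret_logits` name, and the sample logits are illustrative assumptions, not output of the actual model.

```python
import math

LABEL_MAP = {0: "None", 1: "Sexual"}  # same mapping as in the getting-started script

def interpret_logits(logits):
    """Map raw two-class logits to a class index, label and softmax confidence."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    index = max(range(len(probs)), key=probs.__getitem__)  # argmax
    return {"predictions": index, "label": LABEL_MAP[index], "confidence": probs[index]}

# Hypothetical logits, not real model output.
print(interpret_logits([-1.2, 2.3]))
```

Exposing the winning probability would let a caller apply a stricter threshold than a plain argmax, for example sending low-confidence results to manual review.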
+
+ ## Model Card Contact