---
license: mit
language:
- en
- ja
- zh
- ko
metrics:
- accuracy
base_model: google-bert/bert-base-multilingual-cased
pipeline_tag: text-classification
tags:
- sex
- filename
- detection
- content
- mbert
- Multilingual
---
# Model Card for uget/sexual_content_dection

Detect sexual content in text or file names.

## Model Details

### Model Description

- **Developed by:** liu wei
- **License:** MIT
- **Finetuned from model:** bert-base-multilingual-cased
- **Task:** Binary text classification
- **Language:** Multilingual
- **Max Length:** 128
- **Last Updated:** 2024-08-22

### Model Training Information
- **Training Dataset Size:** 100,000 manually annotated samples, including some noise
- **Data Distribution:** 50:50 (sexual to non-sexual)
- **Batch Size:** 8
- **Epochs:** 5
- **Accuracy:** 92%
- **F1:** 92%
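
For readers who want to check accuracy and F1 on their own labeled data, the following is a minimal sketch of how the two metrics are computed for a binary task. The helper name `binary_metrics` and the toy labels are illustrative assumptions; they are not the author's evaluation code or data.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for binary labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels for illustration only (1 = Sexual, 0 = None).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

On a balanced 50:50 dataset, accuracy and F1 tend to be close to each other, which is consistent with the two figures reported above.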


## Uses

- Supports multiple languages, such as English, Chinese, and Japanese.
- Screens user-submitted content in forums, magnet-link search engines, cloud drives, and similar services.
- Detects sexual content by meaning as well as in obfuscated form, including pornographic movie ID numbers and altered file names.
- Detects this content considerably more accurately than GPT-4o mini.
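
As a rough illustration of the screening use case, the sketch below splits a batch of submitted file names into flagged and allowed lists. The names `screen_names` and `stub_classify` are hypothetical. The stub is a keyword check that stands in for the real model so the example runs offline; in practice you would pass in a `predict` function that returns a dict with a `predictions` field, like the one shown later in this card.

```python
from typing import Callable, Dict, Iterable, List


def screen_names(names: Iterable[str],
                 classify: Callable[[str], Dict]) -> Dict[str, List[str]]:
    """Split names into flagged/allowed using a classifier's `predictions` field."""
    flagged: List[str] = []
    allowed: List[str] = []
    for name in names:
        result = classify(name)
        if result["predictions"] == 1:
            flagged.append(name)
        else:
            allowed.append(name)
    return {"flagged": flagged, "allowed": allowed}


def stub_classify(text: str) -> Dict:
    # Hypothetical stand-in for the model: a keyword check, NOT the real classifier.
    is_hit = "uncensored" in text.lower()
    return {"predictions": int(is_hit), "label": "Sexual" if is_hit else "None"}


report = screen_names(
    ["holiday_photos_2023.zip", "ABC-123-UNCENSORED.mp4"],
    stub_classify,
)
print(report)
```

Passing the classifier in as an argument keeps the filtering logic independent of how the model is loaded.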

### Examples

- Example **English**
```python
predict("Tiffany Doll - Wine Makes Me Anal (31.03.2018)_1080p.mp4")
```
```json
{
    "predictions": 1,
    "label": "Sexual"
}
```

- Example **Chinese**
```python
predict("橙子 · 保安和女业主的一夜春宵。路见不平拔刀相助,救下苏姐,以身相许!")
```
```json
{
    "predictions": 1,
    "label": "Sexual"
}
```

- Example **Japanese**
```python
predict("MILK-217-UNCENSORED-LEAKピタコス Gカップ痴女 完全着衣で濃密5PLAY 椿りか 580 2.TS")
```
```json
{
    "predictions": 1,
    "label": "Sexual"
}
```

- Example **Porn Movie Numbers**
```python
predict("DVAJ-548_CH_SD")
```
```json
{
    "predictions": 1,
    "label": "Sexual"
}
```


## How to Get Started with the Model


### Step 1:
Create a Python file, such as `use_model.py`, in the model directory.
```python
import json

import torch
from transformers import BertForSequenceClassification, BertTokenizer

# Load the tokenizer and the fine-tuned classifier from the Hugging Face Hub.
tokenizer = BertTokenizer.from_pretrained("uget/sexual_content_dection")
model = BertForSequenceClassification.from_pretrained("uget/sexual_content_dection")
model.eval()  # inference mode: disables dropout

label_map = {0: "None", 1: "Sexual"}

def predict(text):
    # Truncate inputs to the model's maximum length of 128 tokens.
    encoding = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
    encoding = {k: v.to(model.device) for k, v in encoding.items()}

    # Gradients are not needed for inference.
    with torch.no_grad():
        outputs = model(**encoding)

    # The index of the highest-scoring logit is the predicted class.
    prediction = torch.argmax(outputs.logits, dim=-1).item()
    result = {"predictions": prediction, "label": label_map[prediction]}
    print(json.dumps(result, indent=4))
    return result

predict("Tiffany Doll - Wine Makes Me Anal (31.03.2018)_1080p.mp4")
```
### Step 2:
Run the script:
```shell
python3 use_model.py
```

Response JSON
```json
{
    "predictions": 1,
    "label": "Sexual"
}
```

### Explanation
The model returns one of two results:
- `predictions` = 0 (label `None`): no sexual content was detected.
- `predictions` = 1 (label `Sexual`): sexual content was detected.
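
If a confidence score is useful alongside the 0/1 result, one option is to apply a softmax to the two output logits and compare the `Sexual` probability with a threshold. The sketch below does this in plain Python. The function names, the `probability` field, the default threshold of 0.5, and the example logits are all assumptions for illustration; the index order (0 = `None`, 1 = `Sexual`) follows the label map used in this card.

```python
import math


def sexual_probability(logits):
    """Softmax over [logit_none, logit_sexual]; returns P(Sexual)."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    return exps[1] / sum(exps)


def decide(logits, threshold=0.5):
    """Map two logits to the card's result format plus a probability."""
    prob = sexual_probability(logits)
    prediction = 1 if prob >= threshold else 0
    label = {0: "None", 1: "Sexual"}[prediction]
    return {"predictions": prediction, "label": label, "probability": prob}


# Hypothetical logits, not real model output.
example = decide([-1.2, 2.3])
print(example)
```

With the default threshold of 0.5 this gives the same class as taking the larger logit; raising the threshold trades recall for fewer false positives.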


## Model Card Contact
Email: [email protected]