---
pipeline_tag: "text-classification"
widget:
- text: "this is a lovely message"
  example_title: "Example 1"
  multi_class: false
- text: "you are an idiot and you and your family should go back to your country"
  example_title: "Example 2"
  multi_class: false
language:
- en
- nl
- fr
- pt
- it
- es
- de
- da
- pl
- af
datasets:
- jigsaw_toxicity_pred
metrics:
- f1
- accuracy
---

# citizenlab/distilbert-base-multilingual-cased-toxicity

This is a multilingual DistilBERT sequence classifier trained on the [JIGSAW Toxic Comment Classification Challenge](https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge) dataset.

## How to use it

```python
from transformers import pipeline

# Load the model and tokenizer from the Hugging Face Hub
model_path = "citizenlab/distilbert-base-multilingual-cased-toxicity"
toxicity_classifier = pipeline("text-classification", model=model_path, tokenizer=model_path)

toxicity_classifier("this is a lovely message")
# [{'label': 'not_toxic', 'score': 0.9954179525375366}]

toxicity_classifier("you are an idiot and you and your family should go back to your country")
# [{'label': 'toxic', 'score': 0.9948776960372925}]
```
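
Since the model is multilingual, the same pipeline can score inputs in any of the languages listed above, and it accepts a batch (list) of texts as well as a single string. A minimal sketch, assuming illustrative example sentences of our own (not taken from the dataset):

```python
from transformers import pipeline

model_path = "citizenlab/distilbert-base-multilingual-cased-toxicity"
toxicity_classifier = pipeline("text-classification", model=model_path, tokenizer=model_path)

# Passing a list returns one prediction per input.
# These non-English sentences are illustrative only.
examples = [
    "quel message adorable",  # French: "what a lovely message"
    "wat een mooi bericht",   # Dutch: "what a nice message"
]
print(toxicity_classifier(examples))
```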

## Evaluation

### Accuracy and F1

```
Accuracy Score = 0.9425
F1 Score (Micro) = 0.9450549450549449
F1 Score (Macro) = 0.8491432341169309
```
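
The card does not include the evaluation script itself; below is a minimal sketch of how figures like these could be computed with scikit-learn, assuming hypothetical `y_true` and `y_pred` lists holding the gold labels and the pipeline's predicted labels for a held-out test set:

```python
from sklearn.metrics import accuracy_score, f1_score

# Placeholder label lists; in practice these would come from running the
# classifier over a held-out split of the JIGSAW data.
y_true = ["toxic", "not_toxic", "not_toxic", "toxic"]
y_pred = ["toxic", "not_toxic", "toxic", "toxic"]

print("Accuracy Score =", accuracy_score(y_true, y_pred))
# Micro F1 aggregates over all instances; macro F1 averages the per-class
# F1 scores, so it is pulled down by the rarer class.
print("F1 Score (Micro) =", f1_score(y_true, y_pred, average="micro"))
print("F1 Score (Macro) =", f1_score(y_true, y_pred, average="macro"))
```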