metadata

pipeline_tag: text-classification
widget:
  - text: this is a lovely message
    example_title: Example 1
    multi_class: false
  - text: you are an idiot and you and your family should go back to your country
    example_title: Example 2
    multi_class: false
language:
  - en
  - nl
  - fr
  - pt
  - it
  - es
  - de
  - da
  - pl
  - af
datasets:
  - jigsaw_toxicity_pred
metrics:
  - f1
  - accuracy

citizenlab/distilbert-base-multilingual-cased-toxicity

This is a multilingual DistilBERT sequence classifier trained on the Jigsaw Toxic Comment Classification Challenge dataset.

How to use it

from transformers import pipeline

model_path = "citizenlab/distilbert-base-multilingual-cased-toxicity"

# Load the model and its tokenizer from the same checkpoint.
toxicity_classifier = pipeline("text-classification", model=model_path, tokenizer=model_path)

print(toxicity_classifier("this is a lovely message"))
# [{'label': 'not_toxic', 'score': 0.9954179525375366}]

print(toxicity_classifier("you are an idiot and you and your family should go back to your country"))
# [{'label': 'toxic', 'score': 0.9948776960372925}]
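The classifier returns a list with one dict per input, each holding a `label` (`toxic` or `not_toxic`) and a `score`, as the outputs above show. A minimal sketch of turning that output into a moderation flag follows. The helper name `flag_toxic` and the `threshold` value of 0.9 are illustrative assumptions, not part of the model or of transformers.

```python
def flag_toxic(predictions, threshold=0.9):
    """Return one boolean per prediction.

    A prediction is flagged only when its label is 'toxic' and its score
    reaches `threshold`. `predictions` is a list of dicts with 'label' and
    'score' keys, the shape shown in the outputs above. The default
    threshold is an arbitrary illustrative choice.
    """
    return [
        p["label"] == "toxic" and p["score"] >= threshold
        for p in predictions
    ]


# Outputs copied from the examples above, so this runs without the model.
sample = [
    {"label": "not_toxic", "score": 0.9954179525375366},
    {"label": "toxic", "score": 0.9948776960372925},
]
print(flag_toxic(sample))  # [False, True]
```

Raising the threshold trades fewer false alarms for more missed toxic messages, so a suitable value would need to be chosen on data from the target application.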

Evaluation

Accuracy

Accuracy Score = 0.9425
F1 Score (Micro) = 0.9450549450549449
F1 Score (Macro) = 0.8491432341169309
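To make the difference between the metrics above concrete, here is a small plain-Python sketch. It uses made-up labels, not the model's evaluation data, and the function names are hypothetical. Micro F1 pools every decision across classes, while macro F1 averages the per-class F1 scores, so a weaker score on a rare class pulls macro F1 down more. How the card's own figures were computed is not stated.

```python
def f1_for_class(y_true, y_pred, cls):
    """F1 for a single class, treating `cls` as the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == cls and p == cls for t, p in pairs)
    fp = sum(t != cls and p == cls for t, p in pairs)
    fn = sum(t == cls and p != cls for t, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def summarize(y_true, y_pred):
    """Accuracy, micro F1 and macro F1 for single-label predictions."""
    classes = sorted(set(y_true) | set(y_pred))
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    errors = len(y_true) - correct
    # Pooled over classes, each error is one false positive (for the
    # predicted class) and one false negative (for the true class).
    micro_f1 = 2 * correct / (2 * correct + 2 * errors)
    macro_f1 = sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)
    return {
        "accuracy": correct / len(y_true),
        "micro_f1": micro_f1,
        "macro_f1": macro_f1,
    }


# Toy, imbalanced example: 8 non-toxic and 2 toxic items, one toxic item missed.
y_true = ["not_toxic"] * 8 + ["toxic"] * 2
y_pred = ["not_toxic"] * 9 + ["toxic"]
print({name: round(value, 4) for name, value in summarize(y_true, y_pred).items()})
# {'accuracy': 0.9, 'micro_f1': 0.9, 'macro_f1': 0.8039}
```

In this toy setup a single missed toxic item leaves accuracy and micro F1 at 0.9 but lowers macro F1 to about 0.80, because the toxic class alone scores 2/3.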