base_model: BenjaminOcampo/model-contrastive-bert__trained-in-ishate__seed-0
datasets:
  - ISHate
language:
  - en
library_name: transformers
license: bsl-1.0
metrics:
  - f1
  - accuracy
tags:
  - hate-speech-detection
  - implicit-hate-speech

This model card accompanies the demo paper "PEACE: Providing Explanations and Analysis for Combating Hate Expressions", accepted at the 27th European Conference on Artificial Intelligence (ECAI 2024): https://www.ecai2024.eu/calls/demos.

The Model

This model is a hate speech detector fine-tuned specifically for detecting implicit hate speech. It is based on the paper "PEACE: Providing Explanations and Analysis for Combating Hate Expressions" by Greta Damo, Nicolás Benjamín Ocampo, Elena Cabrio, and Serena Villata, presented at the 27th European Conference on Artificial Intelligence.

Training Parameters and Experimental Info

The model was trained using the ISHate dataset, focusing on implicit data. Training parameters included:

  • Batch size: 32
  • Weight decay: 0.01
  • Epochs: 4
  • Learning rate: 2e-5

For detailed information on the training process, please refer to the model's paper.
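
As a rough illustration, the hyperparameters listed above can be gathered into a small configuration object. This is only a sketch: the class name `TrainingConfig`, the `total_steps` helper, and the example dataset size are hypothetical and not part of the released training code. The card does not state the optimizer, scheduler, or number of training examples.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Hyperparameters as listed in this model card (hypothetical container)."""

    batch_size: int = 32
    weight_decay: float = 0.01
    epochs: int = 4
    learning_rate: float = 2e-5

    def total_steps(self, num_examples: int) -> int:
        """Optimizer steps for the full run.

        Assumes no gradient accumulation and that the last,
        possibly smaller, batch of each epoch is kept.
        """
        steps_per_epoch = math.ceil(num_examples / self.batch_size)
        return steps_per_epoch * self.epochs


config = TrainingConfig()
# 10_000 is a made-up training-set size, used only for illustration.
print(config.total_steps(num_examples=10_000))
```

A step count like this is typically what a learning-rate scheduler needs, should you want to reproduce a similar setup.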

Usage

You may need transformers version 4.30.2:

pip install transformers==4.30.2

This model was built in plain PyTorch, so loading it requires the following model class:

import torch.nn as nn


class ContrastiveModel(nn.Module):
    """Encoder followed by a linear layer and a two-class classification head."""

    def __init__(self, model):
        super().__init__()
        self.model = model  # Hugging Face encoder
        self.embedding_dim = model.config.hidden_size
        self.fc = nn.Linear(self.embedding_dim, self.embedding_dim)
        self.classifier = nn.Linear(self.embedding_dim, 2)  # Classification layer

    def forward(self, input_ids, attention_mask):
        outputs = self.model(input_ids=input_ids, attention_mask=attention_mask)
        embeddings = outputs.last_hidden_state[:, 0]  # Use the CLS token embedding as the representation
        embeddings = self.fc(embeddings)
        logits = self.classifier(embeddings)  # Apply classification layer

        return embeddings, logits

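Then instantiate the model and the tokenizer. At this point the model weights are still randomly initialized, because AutoModel.from_config builds the architecture without loading a checkpoint: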

from transformers import AutoModel, AutoTokenizer, AutoConfig

repo_name = "BenjaminOcampo/peace_cont_bert"

config = AutoConfig.from_pretrained(repo_name)
contrastive_model = ContrastiveModel(AutoModel.from_config(config))
tokenizer = AutoTokenizer.from_pretrained(repo_name)

Finally, download the trained weights and load them into the model:

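import torch
from huggingface_hub import hf_hub_download

read_token = None  # Set to a Hugging Face access token if the repository requires one
model_tmp_file = hf_hub_download(repo_id=repo_name, filename="model.pt", token=read_token)

state_dict = torch.load(model_tmp_file, map_location="cpu")  # Load on CPU; works without a GPU
contrastive_model.load_state_dict(state_dict)
contrastive_model.eval()  # Disable dropout for inference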

You can then make predictions as with any PyTorch model:

import torch

text = "Are you sure that Islam is a peaceful religion?"
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    _, logits = contrastive_model(inputs["input_ids"], inputs["attention_mask"])

probabilities = torch.softmax(logits, dim=1)
_, predicted_labels = torch.max(probabilities, dim=1)
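
To make the last two lines of the snippet above concrete, here is a minimal plain-Python sketch of what softmax and argmax compute on a pair of logits. The logit values are hypothetical and the helper functions are illustrative only; in practice `torch.softmax` and `torch.max` do this on tensors. Note that this card does not state which class index corresponds to which label, so check the paper or the dataset before interpreting the predicted index.

```python
import math


def softmax(logits):
    """Turn a list of raw scores into probabilities that sum to 1."""
    peak = max(logits)  # Subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    """Return the index of the largest value (the predicted class index)."""
    return max(range(len(values)), key=lambda i: values[i])


# Hypothetical logits for a single input; real values come from the model.
example_logits = [-1.2, 0.8]
probabilities = softmax(example_logits)
predicted_label = argmax(probabilities)
print(probabilities, predicted_label)
```

Because softmax preserves the ordering of the logits, taking the argmax of the logits directly yields the same predicted index; the probabilities are mainly useful when you need a confidence score.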

Datasets

The model was trained on the training split of the ISHate dataset, focusing on implicit hate speech.

Evaluation Results

The model's performance was evaluated using standard metrics, including F1 score and accuracy. For comprehensive evaluation results, refer to the linked paper.

Authors: