---
license: cc-by-nc-sa-4.0
language:
- en
metrics:
- f1
- accuracy
widget:
- text: Girls like attention and they get desperate
tags:
- sexism
datasets:
- tum-nlp/sexism-socialmedia-balanced
---

# BERTweet for sexism detection

This is a fine-tuned BERTweet large ([BERTweet: A pre-trained language model for English Tweets](https://aclanthology.org/2020.emnlp-demos.2/)) model for detecting sexism. The training dataset, [sexism-socialmedia-balanced](https://huggingface.co/datasets/tum-nlp/sexism-socialmedia-balanced), is a **new balanced** version of Explainable Detection of Online Sexism ([**EDOS**](https://github.com/rewire-online/edos)). It consists of 16000 entries in English gathered from two social media platforms, Twitter and Gab. The model achieved a **Macro-F1** score of **0.85** and an **Accuracy** of **0.88** on the test set for the EDOS task.

## How to use

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained('tum-nlp/bertweet-sexism')
model = AutoModelForSequenceClassification.from_pretrained('tum-nlp/bertweet-sexism')

# Create the pipeline for classification
sexism_classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)

# Predict
sexism_classifier("Girls like attention and they get desperate")
```

## Citation

```
@inproceedings{rydelek-etal-2023-adamr,
    title = "{A}dam{R} at {S}em{E}val-2023 Task 10: Solving the Class Imbalance Problem in Sexism Detection with Ensemble Learning",
    author = "Rydelek, Adam and
      Dementieva, Daryna and
      Groh, Georg",
    editor = {Ojha, Atul Kr. and
      Do{\u{g}}ru{\"o}z, A. Seza and
      Da San Martino, Giovanni and
      Tayyar Madabushi, Harish and
      Kumar, Ritesh and
      Sartori, Elisa},
    booktitle = "Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.semeval-1.190",
    doi = "10.18653/v1/2023.semeval-1.190",
    pages = "1371--1381",
    abstract = "The Explainable Detection of Online Sexism task presents the problem of explainable sexism detection through fine-grained categorisation of sexist cases with three subtasks. Our team experimented with different ways to combat class imbalance throughout the tasks using data augmentation and loss alteration techniques. We tackled the challenge by utilising ensembles of Transformer models trained on different datasets, which are tested to find the balance between performance and interpretability. This solution ranked us in the top 40{\%} of teams for each of the tracks.",
}
```

## Licensing Information

[Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License][cc-by-nc-sa].

[![CC BY-NC-SA 4.0][cc-by-nc-sa-image]][cc-by-nc-sa]

[cc-by-nc-sa]: http://creativecommons.org/licenses/by-nc-sa/4.0/
[cc-by-nc-sa-image]: https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png