Train-Test Set: "teknofest_train_final.csv"

Model: "dbmdz/bert-base-turkish-128k-uncased"

Preprocessing

  • A special token (#) is inserted before each uppercase character, and the characters are then lowercased
  • Punctuation marks are removed
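
The two preprocessing steps above can be sketched as follows. This is a minimal illustration, not the author's code: the function name `preprocess` is hypothetical, the Turkish-aware handling of "I"/"İ" is an assumption, and punctuation is stripped before the markers are added (also an assumption) so that the inserted `#` is not removed again.

```python
import string

# Assumption: Turkish-aware lowercasing for the dotted/dotless I pair.
# The source does not say how these characters were handled.
_TR_LOWER = str.maketrans({"I": "ı", "İ": "i"})


def preprocess(text: str) -> str:
    """Mark uppercase characters with '#', lowercase them, drop punctuation."""
    # Remove ASCII punctuation first so the '#' markers added below survive.
    text = "".join(ch for ch in text if ch not in string.punctuation)
    out = []
    for ch in text:
        if ch.isupper():
            # Prefix the marker, then lowercase the character itself.
            out.append("#" + ch.translate(_TR_LOWER).lower())
        else:
            out.append(ch)
    return "".join(out)


print(preprocess("Merhaba, Dünya!"))
```

The marker lets an uncased model keep a trace of the original capitalization, which would otherwise be lost on lowercasing.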

Tokenizer Parameters

max_length=64
padding=True
truncation=True
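
A possible way to pass these parameters to a Hugging Face tokenizer is sketched below. The helper names `load_tokenizer` and `encode` are hypothetical. With `padding=True` each batch is padded to its longest sequence, and `truncation=True` together with `max_length=64` cuts longer inputs to 64 tokens.

```python
# Values copied from the parameter list above.
TOKENIZER_KWARGS = {"max_length": 64, "padding": True, "truncation": True}
MODEL_NAME = "dbmdz/bert-base-turkish-128k-uncased"


def load_tokenizer(name: str = MODEL_NAME):
    """Load the tokenizer; requires the `transformers` package and network access."""
    # Imported lazily so the rest of the sketch works without transformers.
    from transformers import AutoTokenizer

    return AutoTokenizer.from_pretrained(name)


def encode(tokenizer, texts):
    """Tokenize a batch of strings with the settings listed above."""
    return tokenizer(list(texts), **TOKENIZER_KWARGS)
```

Typical usage would be `batch = encode(load_tokenizer(), ["#merhaba #dünya"])`, which returns `input_ids` and `attention_mask` for the batch.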

Training Parameters

  • Epoch: 3
  • Learning Rate: 7e-5
  • Batch-Size: 64
  • Tokenizer Length: 64
  • Loss: BCE
  • Online Hard Example Mining: On
  • Class-Weighting: On (^0.3)
  • Early Stopping: Off
  • Stratified Batch Sampling: On
  • Gradient Accumulation: Off
  • LR Scheduler: Cosine-with-Warmup
  • Warmup Ratio: 0.1
  • Weight Decay: 0.01
  • LLRD: 0.95
  • Label Smoothing: 0.05
  • Gradient Clipping: 1.0
  • MLM Pre-Training: Off
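
Three of the settings above are numeric recipes that a short sketch may make more concrete: class weighting with exponent 0.3, layer-wise learning rate decay (LLRD), and the cosine schedule with warmup. The exact formulas used in training are not given in this README, so everything below is an assumption: weights are taken as inverse relative frequency raised to the power 0.3, LLRD assigns the base rate to the top encoder layer and multiplies by 0.95 for each layer below, and the schedule uses linear warmup followed by a half-cosine decay to zero. The class counts are borrowed from the support column of the results table purely for illustration, and `total_steps` is a hypothetical value.

```python
import math

BASE_LR = 7e-5        # Learning Rate
WARMUP_RATIO = 0.1    # Warmup Ratio
LLRD = 0.95           # layer-wise learning rate decay factor
WEIGHT_POWER = 0.3    # exponent from "Class-Weighting: On (^0.3)"

# Illustrative counts only (support column of the results table).
COUNTS = {"INSULT": 2393, "OTHER": 3528, "PROFANITY": 2376,
          "RACIST": 2033, "SEXIST": 2081}


def class_weights(counts, power=WEIGHT_POWER):
    """Assumed form: (inverse relative class frequency) ** power."""
    total = sum(counts.values())
    n = len(counts)
    return {c: (total / (n * k)) ** power for c, k in counts.items()}


def layer_lrs(num_layers=12, base_lr=BASE_LR, decay=LLRD):
    """Assumed LLRD: top layer gets base_lr, each layer below is scaled by decay."""
    return [base_lr * decay ** (num_layers - 1 - i) for i in range(num_layers)]


def cosine_with_warmup(step, total_steps, warmup_ratio=WARMUP_RATIO):
    """LR multiplier: linear warmup, then half-cosine decay to zero."""
    warmup = int(total_steps * warmup_ratio)
    if step < warmup:
        return step / max(1, warmup)
    progress = (step - warmup) / max(1, total_steps - warmup)
    return 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print(class_weights(COUNTS))
    print(layer_lrs()[0], layer_lrs()[-1])
    # total_steps=1000 is a hypothetical value for illustration.
    print([round(cosine_with_warmup(s, 1000), 3) for s in (0, 50, 100, 550, 1000)])
```

An exponent below 1 softens the weighting: rare classes still count more than frequent ones, but less than with plain inverse frequency.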

CV10 Results

              precision    recall  f1-score   support

      INSULT     0.9172    0.9260    0.9216      2393
       OTHER     0.9681    0.9646    0.9663      3528
   PROFANITY     0.9627    0.9571    0.9599      2376
      RACIST     0.9684    0.9651    0.9667      2033
      SEXIST     0.9618    0.9668    0.9643      2081

    accuracy                         0.9562     12411
   macro avg     0.9557    0.9559    0.9558     12411
weighted avg     0.9563    0.9562    0.9562     12411
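
As a sanity check on how the summary rows relate to the per-class rows, the sketch below recomputes the F1 columns from the table above. It only re-derives numbers already reported. The names `ROWS`, `f1`, `macro_f1` and `weighted_f1` are hypothetical, and small differences in the fourth decimal are expected because the inputs are already rounded.

```python
# (precision, recall, f1, support) copied from the table above.
ROWS = {
    "INSULT":    (0.9172, 0.9260, 0.9216, 2393),
    "OTHER":     (0.9681, 0.9646, 0.9663, 3528),
    "PROFANITY": (0.9627, 0.9571, 0.9599, 2376),
    "RACIST":    (0.9684, 0.9651, 0.9667, 2033),
    "SEXIST":    (0.9618, 0.9668, 0.9643, 2081),
}


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def macro_f1(rows) -> float:
    """Unweighted mean of the per-class F1 scores."""
    return sum(r[2] for r in rows.values()) / len(rows)


def weighted_f1(rows) -> float:
    """Mean of the per-class F1 scores weighted by class support."""
    total = sum(r[3] for r in rows.values())
    return sum(r[2] * r[3] for r in rows.values()) / total


if __name__ == "__main__":
    for name, (p, r, reported, _) in ROWS.items():
        print(f"{name:>10}  recomputed={f1(p, r):.4f}  reported={reported:.4f}")
    print(f"macro F1    = {macro_f1(ROWS):.4f}")
    print(f"weighted F1 = {weighted_f1(ROWS):.4f}")
```

The macro average treats all five classes equally, while the weighted average lets OTHER (the largest class) contribute most.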