collapse_gemma-2-2b_hs2_replace_iter6_sftsd1

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.5097
  • Num Input Tokens Seen: 8170160
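
For readers who prefer perplexity, the evaluation loss above can be converted with a one-line calculation. This is a minimal sketch that assumes the reported loss is the mean per-token cross-entropy in nats, which the card does not state explicitly.

```python
import math

# Final evaluation loss reported in the card.
eval_loss = 2.5097

# Assumption: the loss is mean per-token cross-entropy measured in nats,
# so perplexity is simply exp(loss).
perplexity = math.exp(eval_loss)

print(f"eval perplexity ~ {perplexity:.2f}")
```

If the loss were instead averaged per sequence or measured in bits, the conversion would differ.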

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 1
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
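
The batch-size and scheduler entries above are related by simple arithmetic, sketched below. The device count of 1 and the total of 158 optimizer steps are assumptions: the first is inferred from 8 × 16 = 128, and the second is estimated from the results table (step 155 at epoch 0.9795). The helper `lr_at_step` is a hypothetical illustration of a constant-with-warmup schedule, not code taken from the training run.

```python
import math

# Values taken from the hyperparameter list.
learning_rate = 8e-06
train_batch_size = 8
gradient_accumulation_steps = 16
warmup_ratio = 0.05

# Assumption: a single device, which is consistent with 8 * 16 = 128.
num_devices = 1
total_train_batch_size = (
    train_batch_size * gradient_accumulation_steps * num_devices
)

# Assumption: about 158 optimizer steps in the epoch (estimated, not reported).
estimated_total_steps = 158
warmup_steps = math.ceil(warmup_ratio * estimated_total_steps)


def lr_at_step(step: int) -> float:
    """Hypothetical helper: linear warmup from 0, then a constant rate."""
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    return learning_rate


print(total_train_batch_size, warmup_steps, lr_at_step(4), lr_at_step(50))
```

Under these assumptions the learning rate reaches its full value within the first handful of optimizer steps and stays there for the rest of the epoch.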

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|---------------|--------|------|-----------------|-------------------|
| No log        | 0      | 0    | 1.3956          | 0                 |
| 1.553         | 0.0316 | 5    | 1.3090          | 258336            |
| 1.1605        | 0.0632 | 10   | 1.2408          | 515760            |
| 0.8944        | 0.0948 | 15   | 1.2981          | 775720            |
| 0.5199        | 0.1264 | 20   | 1.5066          | 1026360           |
| 0.3638        | 0.1580 | 25   | 1.6259          | 1281232           |
| 0.217         | 0.1896 | 30   | 1.8240          | 1539752           |
| 0.1148        | 0.2212 | 35   | 1.9752          | 1796440           |
| 0.1057        | 0.2528 | 40   | 2.1133          | 2056024           |
| 0.0565        | 0.2844 | 45   | 2.2720          | 2315344           |
| 0.0582        | 0.3160 | 50   | 2.4049          | 2578576           |
| 0.0373        | 0.3476 | 55   | 2.5018          | 2831728           |
| 0.0341        | 0.3791 | 60   | 2.4419          | 3089328           |
| 0.0415        | 0.4107 | 65   | 2.4454          | 3350752           |
| 0.0285        | 0.4423 | 70   | 2.4645          | 3607904           |
| 0.0276        | 0.4739 | 75   | 2.5049          | 3874552           |
| 0.0304        | 0.5055 | 80   | 2.5111          | 4127008           |
| 0.027         | 0.5371 | 85   | 2.5041          | 4384976           |
| 0.029         | 0.5687 | 90   | 2.5237          | 4651128           |
| 0.0285        | 0.6003 | 95   | 2.5093          | 4909080           |
| 0.0287        | 0.6319 | 100  | 2.5117          | 5165688           |
| 0.0262        | 0.6635 | 105  | 2.5054          | 5415464           |
| 0.0259        | 0.6951 | 110  | 2.4879          | 5674272           |
| 0.0271        | 0.7267 | 115  | 2.4664          | 5934728           |
| 0.027         | 0.7583 | 120  | 2.4789          | 6187376           |
| 0.0288        | 0.7899 | 125  | 2.4795          | 6450096           |
| 0.0247        | 0.8215 | 130  | 2.4943          | 6712368           |
| 0.0248        | 0.8531 | 135  | 2.4960          | 6970504           |
| 0.0282        | 0.8847 | 140  | 2.5069          | 7232472           |
| 0.0266        | 0.9163 | 145  | 2.5055          | 7495824           |
| 0.0311        | 0.9479 | 150  | 2.5049          | 7758216           |
| 0.0237        | 0.9795 | 155  | 2.5113          | 8017088           |
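
In the results above, validation loss reaches its minimum early and then climbs while training loss keeps falling. The sketch below locates that minimum and prints the gap between the two losses for a hand-picked subset of rows. The subset and the variable names are illustrative choices, not part of the original training code.

```python
# (step, training_loss, validation_loss) rows copied from the results table.
# Only a subset is listed to keep the example short.
rows = [
    (0, None, 1.3956),
    (5, 1.553, 1.3090),
    (10, 1.1605, 1.2408),
    (15, 0.8944, 1.2981),
    (20, 0.5199, 1.5066),
    (50, 0.0582, 2.4049),
    (100, 0.0287, 2.5117),
    (155, 0.0237, 2.5113),
]

# Step with the lowest validation loss among the listed rows.
best_step, _, best_val = min(rows, key=lambda r: r[2])
print(f"lowest validation loss {best_val} at step {best_step}")

# Gap between validation and training loss; step 0 has no logged training loss.
for step, train_loss, val_loss in rows:
    if train_loss is not None:
        print(f"step {step:>3}: gap = {val_loss - train_loss:+.4f}")
```

Step 10 is also the minimum over the full table, so anyone choosing a checkpoint by validation loss alone would pick a much earlier point than the final one.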

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1

Safetensors

  • Model size: 2.61B params
  • Tensor type: BF16

Model tree for jkazdan/collapse_gemma-2-2b_hs2_replace_iter6_sftsd1

  • Base model: google/gemma-2-2b
  • Finetuned: this model