collapse_gemma-2-2b_hs2_replace_iter4_sftsd0

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.1872
  • Num Input Tokens Seen: 8213360

Model description

More information needed

Intended uses & limitations

More information needed
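
Pending a fuller write-up, the snippet below is a minimal, untested sketch of how the checkpoint can be loaded for causal generation with the standard Transformers API. The repository ID is taken from the model tree at the bottom of this card; the dtype choice, prompt, and generation length are illustrative assumptions.

```python
# Minimal usage sketch; the repo ID comes from this card, everything else
# (dtype, prompt, generation length) is an illustrative assumption.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jkazdan/collapse_gemma-2-2b_hs2_replace_iter4_sftsd0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```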

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 0
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
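
As a reading aid, here is one way these values could map onto `transformers.TrainingArguments`. Only the hyperparameter values above are documented; the output directory, the single-device assumption, and the `bf16` flag (inferred from the BF16 checkpoint) are assumptions.

```python
# Hypothetical reconstruction of the training configuration; only the
# hyperparameter values come from the list above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_replace_iter4_sftsd0",  # placeholder
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=0,
    gradient_accumulation_steps=16,  # 8 per device x 16 steps = 128 total (single-device assumption)
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    bf16=True,  # assumption, consistent with the BF16 checkpoint
)
```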

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3956          | 0                 |
| 1.5456        | 0.0318 | 5    | 1.3070          | 265304            |
| 1.2703        | 0.0636 | 10   | 1.2218          | 523568            |
| 0.9657        | 0.0954 | 15   | 1.2481          | 782936            |
| 0.6809        | 0.1271 | 20   | 1.3512          | 1047016           |
| 0.5361        | 0.1589 | 25   | 1.4749          | 1317216           |
| 0.3923        | 0.1907 | 30   | 1.5943          | 1576968           |
| 0.2609        | 0.2225 | 35   | 1.7244          | 1835720           |
| 0.1476        | 0.2543 | 40   | 1.9387          | 2096288           |
| 0.0973        | 0.2861 | 45   | 2.0971          | 2356272           |
| 0.1096        | 0.3178 | 50   | 2.1618          | 2620648           |
| 0.0651        | 0.3496 | 55   | 2.1722          | 2877880           |
| 0.0528        | 0.3814 | 60   | 2.1908          | 3133480           |
| 0.0455        | 0.4132 | 65   | 2.2489          | 3397968           |
| 0.0474        | 0.4450 | 70   | 2.2538          | 3659672           |
| 0.0919        | 0.4768 | 75   | 2.2320          | 3919344           |
| 0.0311        | 0.5085 | 80   | 2.1670          | 4187912           |
| 0.0319        | 0.5403 | 85   | 2.2046          | 4447608           |
| 0.0278        | 0.5721 | 90   | 2.2165          | 4705488           |
| 0.0284        | 0.6039 | 95   | 2.2048          | 4979080           |
| 0.0549        | 0.6357 | 100  | 2.1651          | 5237472           |
| 0.0319        | 0.6675 | 105  | 2.1267          | 5496736           |
| 0.0304        | 0.6992 | 110  | 2.1044          | 5759232           |
| 0.0274        | 0.7310 | 115  | 2.0821          | 6021968           |
| 0.0307        | 0.7628 | 120  | 2.0860          | 6280048           |
| 0.0297        | 0.7946 | 125  | 2.1247          | 6547056           |
| 0.0283        | 0.8264 | 130  | 2.1514          | 6801816           |
| 0.0295        | 0.8582 | 135  | 2.1703          | 7057840           |
| 0.0326        | 0.8899 | 140  | 2.1964          | 7323848           |
| 0.0283        | 0.9217 | 145  | 2.1959          | 7580872           |
| 0.0439        | 0.9535 | 150  | 2.1948          | 7845552           |
| 0.0282        | 0.9853 | 155  | 2.1853          | 8107432           |
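
The training loss falls below 0.1 by step 45, while the validation loss rises from its minimum of 1.2218 at step 10 to roughly 2.19 by the end of training, which suggests the model rapidly overfits the fine-tuning data within the first third of the epoch.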

Framework versions

  • Transformers 4.44.0
  • PyTorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1

The checkpoint is stored in the Safetensors format with 2.61B parameters in BF16.

Model tree for jkazdan/collapse_gemma-2-2b_hs2_replace_iter4_sftsd0

  • Base model: google/gemma-2-2b
  • Finetuned: this model