
collapse_gemma-2-2b_hs2_replace_iter2_sftsd0

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.5203
  • Num Input Tokens Seen: 7902160
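
As a quick sanity check, the checkpoint can be loaded with the standard Transformers API. The snippet below is a minimal inference sketch, not an official usage example; it assumes the repository id `jkazdan/collapse_gemma-2-2b_hs2_replace_iter2_sftsd0` and the BF16 weights published with this card.

```python
# Minimal inference sketch (repo id taken from this card; BF16 weights assumed).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jkazdan/collapse_gemma-2-2b_hs2_replace_iter2_sftsd0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the checkpoint is stored in bfloat16
    device_map="auto",
)

prompt = "The quick brown fox"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```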

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 0
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
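
For reference, these settings map onto a standard `transformers` `TrainingArguments` configuration. The sketch below is a plausible reconstruction under that assumption, not the exact training script (the dataset, Trainer wrapper, and hardware layout are not documented here); the `output_dir` value and the `bf16=True` flag are illustrative.

```python
# Hypothetical reconstruction of the hyperparameters above as TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_replace_iter2_sftsd0",  # illustrative
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=0,
    gradient_accumulation_steps=16,   # 8 x 16 = total train batch size of 128 (single device assumed)
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,                        # assumption: matches the BF16 checkpoint
)
```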

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3956          | 0                 |
| 1.7473        | 0.0363 | 5    | 1.2943          | 288056            |
| 1.4345        | 0.0726 | 10   | 1.1947          | 573696            |
| 1.2912        | 0.1089 | 15   | 1.1754          | 864312            |
| 1.1113        | 0.1452 | 20   | 1.1708          | 1155592           |
| 0.9992        | 0.1815 | 25   | 1.2262          | 1442896           |
| 0.8723        | 0.2178 | 30   | 1.2980          | 1738192           |
| 0.6856        | 0.2541 | 35   | 1.3840          | 2026640           |
| 0.5851        | 0.2904 | 40   | 1.4503          | 2311728           |
| 0.544         | 0.3267 | 45   | 1.4799          | 2603472           |
| 0.3782        | 0.3630 | 50   | 1.5256          | 2895216           |
| 0.3556        | 0.3993 | 55   | 1.5141          | 3177072           |
| 0.2995        | 0.4356 | 60   | 1.4836          | 3464608           |
| 0.1964        | 0.4719 | 65   | 1.5114          | 3749472           |
| 0.1595        | 0.5082 | 70   | 1.4883          | 4033064           |
| 0.2194        | 0.5445 | 75   | 1.4652          | 4326968           |
| 0.1651        | 0.5808 | 80   | 1.4571          | 4611416           |
| 0.1319        | 0.6171 | 85   | 1.5090          | 4903168           |
| 0.2408        | 0.6534 | 90   | 1.4362          | 5191528           |
| 0.1663        | 0.6897 | 95   | 1.5008          | 5474696           |
| 0.1661        | 0.7260 | 100  | 1.4591          | 5757304           |
| 0.181         | 0.7623 | 105  | 1.4716          | 6047208           |
| 0.1905        | 0.7985 | 110  | 1.4632          | 6343824           |
| 0.2526        | 0.8348 | 115  | 1.4277          | 6627376           |
| 0.0952        | 0.8711 | 120  | 1.5090          | 6915440           |
| 0.1044        | 0.9074 | 125  | 1.4923          | 7206272           |
| 0.1387        | 0.9437 | 130  | 1.4292          | 7504304           |
| 0.1038        | 0.9800 | 135  | 1.5096          | 7784504           |

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1