---
library_name: transformers
license: llama3.1
base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
tags:
  - alignment-handbook
  - trl
  - cpo
  - generated_from_trainer
datasets:
  - princeton-nlp/llama3-ultrafeedback
model-index:
  - name: llama3.1-cpo-full-0919
    results: []
---

llama3.1-cpo-full-0919

This model is a fine-tuned version of meta-llama/Meta-Llama-3.1-8B-Instruct on the princeton-nlp/llama3-ultrafeedback dataset. It achieves the following results on the evaluation set:

  • Loss: 2.0545
  • Rewards/chosen: -18.3931
  • Rewards/rejected: -18.5452
  • Rewards/accuracies: 0.5261
  • Rewards/margins: 0.1521
  • Logps/rejected: -185.4521
  • Logps/chosen: -183.9312
  • Logits/rejected: -0.7551
  • Logits/chosen: -0.7797
  • Nll Loss: 0.5180
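
Loading the checkpoint for chat-style inference could look like the minimal sketch below. The Hub repository id is inferred from the card name and is an assumption, as are the dtype and generation settings.

```python
# Minimal inference sketch (not part of the original card).
# The repository id below is inferred from the card name and may differ.
import torch
from transformers import pipeline

model_id = "jbjeong91/llama3.1-cpo-full-0919"  # assumed repo id

pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# The base model is instruction-tuned, so chat-format messages are used here.
messages = [
    {"role": "user", "content": "Summarize what CPO fine-tuning changes in one sentence."},
]
out = pipe(messages, max_new_tokens=128)
print(out[0]["generated_text"][-1]["content"])
```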

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-06
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 128
  • total_eval_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
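
These settings correspond to a trl CPO run; a rough sketch is given below. The dataset split names, the prompt/chosen/rejected preprocessing (handled by the alignment-handbook recipe), any CPO-specific options such as beta, and the multi-GPU launch (e.g. via accelerate) are assumptions not stated in this card.

```python
# Sketch of a CPO run matching the listed hyperparameters.
# CPO-specific options (beta, loss_type, NLL weight) are not given in the
# card, so trl defaults are left in place here.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import CPOConfig, CPOTrainer

base = "meta-llama/Meta-Llama-3.1-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Split names are assumptions; the card only names the dataset. The
# alignment-handbook recipe also applies the chat template to build
# prompt/chosen/rejected text columns, which is omitted here.
ds = load_dataset("princeton-nlp/llama3-ultrafeedback")

args = CPOConfig(
    output_dir="llama3.1-cpo-full-0919",
    learning_rate=1e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=8,   # 4 GPUs x 4 per device x 8 = 128 total
    num_train_epochs=1,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    seed=42,
)

trainer = CPOTrainer(
    model=model,
    args=args,
    train_dataset=ds["train"],
    eval_dataset=ds["test"],   # assumed eval split
    tokenizer=tokenizer,       # `processing_class` in newer trl releases
)
trainer.train()
```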

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | Nll Loss |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| No log | 0.0230 | 1 | 2.5065 | -26.5532 | -26.5849 | 0.5174 | 0.0317 | -265.8489 | -265.5324 | -0.2622 | -0.2859 | 0.7676 |
| No log | 0.0460 | 2 | 2.5060 | -26.5180 | -26.5504 | 0.5217 | 0.0324 | -265.5038 | -265.1801 | -0.2634 | -0.2869 | 0.7666 |
| No log | 0.0690 | 3 | 2.5051 | -26.5009 | -26.5278 | 0.5217 | 0.0269 | -265.2777 | -265.0088 | -0.2657 | -0.2893 | 0.7661 |
| No log | 0.0920 | 4 | 2.4982 | -26.3952 | -26.4315 | 0.5239 | 0.0363 | -264.3147 | -263.9518 | -0.2690 | -0.2926 | 0.7632 |
| No log | 0.1149 | 5 | 2.4894 | -26.2651 | -26.3011 | 0.5217 | 0.0360 | -263.0112 | -262.6512 | -0.2750 | -0.2985 | 0.7594 |
| No log | 0.1379 | 6 | 2.4689 | -25.9450 | -25.9855 | 0.5283 | 0.0405 | -259.8551 | -259.4500 | -0.2858 | -0.3086 | 0.7502 |
| No log | 0.1609 | 7 | 2.4511 | -25.7084 | -25.7527 | 0.5283 | 0.0443 | -257.5271 | -257.0843 | -0.2972 | -0.3202 | 0.7433 |
| No log | 0.1839 | 8 | 2.4180 | -25.2215 | -25.2724 | 0.5326 | 0.0510 | -252.7242 | -252.2147 | -0.3254 | -0.3486 | 0.7291 |
| No log | 0.2069 | 9 | 2.3952 | -24.8845 | -24.9393 | 0.5283 | 0.0548 | -249.3929 | -248.8451 | -0.3463 | -0.3701 | 0.7192 |
| 2.6865 | 0.2299 | 10 | 2.3761 | -24.6215 | -24.6782 | 0.5348 | 0.0567 | -246.7821 | -246.2148 | -0.3604 | -0.3845 | 0.7115 |
| 2.6865 | 0.2529 | 11 | 2.3609 | -24.4027 | -24.4705 | 0.5391 | 0.0678 | -244.7050 | -244.0270 | -0.3731 | -0.3976 | 0.7051 |
| 2.6865 | 0.2759 | 12 | 2.3367 | -24.0560 | -24.1306 | 0.5348 | 0.0746 | -241.3063 | -240.5604 | -0.3970 | -0.4218 | 0.6951 |
| 2.6865 | 0.2989 | 13 | 2.3109 | -23.6786 | -23.7645 | 0.5304 | 0.0860 | -237.6454 | -236.7858 | -0.4179 | -0.4434 | 0.6840 |
| 2.6865 | 0.3218 | 14 | 2.2906 | -23.3175 | -23.4031 | 0.5348 | 0.0856 | -234.0311 | -233.1748 | -0.4423 | -0.4679 | 0.6733 |
| 2.6865 | 0.3448 | 15 | 2.2729 | -22.9946 | -23.0933 | 0.5348 | 0.0988 | -230.9332 | -229.9456 | -0.4660 | -0.4917 | 0.6637 |
| 2.6865 | 0.3678 | 16 | 2.2576 | -22.7067 | -22.8056 | 0.5370 | 0.0990 | -228.0565 | -227.0665 | -0.4886 | -0.5142 | 0.6549 |
| 2.6865 | 0.3908 | 17 | 2.2411 | -22.4130 | -22.5166 | 0.5283 | 0.1036 | -225.1658 | -224.1296 | -0.5152 | -0.5408 | 0.6460 |
| 2.6865 | 0.4138 | 18 | 2.2300 | -22.1594 | -22.2652 | 0.5261 | 0.1058 | -222.6522 | -221.5937 | -0.5400 | -0.5656 | 0.6382 |
| 2.6865 | 0.4368 | 19 | 2.2170 | -21.9205 | -22.0355 | 0.5304 | 0.1150 | -220.3547 | -219.2051 | -0.5657 | -0.5915 | 0.6308 |
| 2.3904 | 0.4598 | 20 | 2.2065 | -21.7054 | -21.8209 | 0.5283 | 0.1156 | -218.2092 | -217.0537 | -0.5920 | -0.6175 | 0.6241 |
| 2.3904 | 0.4828 | 21 | 2.1932 | -21.4871 | -21.6107 | 0.5261 | 0.1236 | -216.1072 | -214.8710 | -0.6189 | -0.6441 | 0.6172 |
| 2.3904 | 0.5057 | 22 | 2.1839 | -21.2899 | -21.4129 | 0.5196 | 0.1230 | -214.1287 | -212.8987 | -0.6445 | -0.6694 | 0.6109 |
| 2.3904 | 0.5287 | 23 | 2.1746 | -21.0873 | -21.2117 | 0.5261 | 0.1244 | -212.1172 | -210.8729 | -0.6688 | -0.6940 | 0.6045 |
| 2.3904 | 0.5517 | 24 | 2.1656 | -20.9136 | -21.0398 | 0.5239 | 0.1262 | -210.3979 | -209.1364 | -0.6938 | -0.7184 | 0.5989 |
| 2.3904 | 0.5747 | 25 | 2.1555 | -20.7191 | -20.8481 | 0.5283 | 0.1290 | -208.4814 | -207.1911 | -0.7120 | -0.7365 | 0.5926 |
| 2.3904 | 0.5977 | 26 | 2.1466 | -20.5485 | -20.6790 | 0.5283 | 0.1305 | -206.7897 | -205.4852 | -0.7301 | -0.7545 | 0.5872 |
| 2.3904 | 0.6207 | 27 | 2.1392 | -20.3722 | -20.5040 | 0.5370 | 0.1318 | -205.0401 | -203.7218 | -0.7476 | -0.7720 | 0.5816 |
| 2.3904 | 0.6437 | 28 | 2.1308 | -20.1853 | -20.3216 | 0.5326 | 0.1363 | -203.2164 | -201.8533 | -0.7575 | -0.7818 | 0.5756 |
| 2.3904 | 0.6667 | 29 | 2.1229 | -19.9946 | -20.1315 | 0.5283 | 0.1370 | -201.3155 | -199.9459 | -0.7683 | -0.7925 | 0.5695 |
| 2.3172 | 0.6897 | 30 | 2.1134 | -19.7893 | -19.9304 | 0.5261 | 0.1411 | -199.3041 | -197.8930 | -0.7735 | -0.7976 | 0.5630 |
| 2.3172 | 0.7126 | 31 | 2.1055 | -19.5960 | -19.7401 | 0.5283 | 0.1441 | -197.4013 | -195.9599 | -0.7735 | -0.7977 | 0.5569 |
| 2.3172 | 0.7356 | 32 | 2.0985 | -19.4016 | -19.5462 | 0.5217 | 0.1445 | -195.4615 | -194.0163 | -0.7817 | -0.8060 | 0.5508 |
| 2.3172 | 0.7586 | 33 | 2.0904 | -19.2117 | -19.3617 | 0.5239 | 0.1501 | -193.6172 | -192.1166 | -0.7785 | -0.8030 | 0.5447 |
| 2.3172 | 0.7816 | 34 | 2.0850 | -19.0381 | -19.1813 | 0.5239 | 0.1432 | -191.8132 | -190.3807 | -0.7758 | -0.8003 | 0.5392 |
| 2.3172 | 0.8046 | 35 | 2.0793 | -18.8988 | -19.0437 | 0.5174 | 0.1449 | -190.4374 | -188.9884 | -0.7715 | -0.7964 | 0.5346 |
| 2.3172 | 0.8276 | 36 | 2.0720 | -18.7545 | -18.8980 | 0.5196 | 0.1435 | -188.9801 | -187.5452 | -0.7701 | -0.7952 | 0.5299 |
| 2.3172 | 0.8506 | 37 | 2.0663 | -18.6567 | -18.8053 | 0.5261 | 0.1486 | -188.0533 | -186.5672 | -0.7679 | -0.7927 | 0.5266 |
| 2.3172 | 0.8736 | 38 | 2.0643 | -18.5627 | -18.7139 | 0.5239 | 0.1512 | -187.1391 | -185.6268 | -0.7631 | -0.7882 | 0.5235 |
| 2.3172 | 0.8966 | 39 | 2.0601 | -18.5100 | -18.6606 | 0.5283 | 0.1507 | -186.6065 | -185.0997 | -0.7609 | -0.7857 | 0.5217 |
| 2.1039 | 0.9195 | 40 | 2.0598 | -18.4610 | -18.6128 | 0.5283 | 0.1518 | -186.1283 | -184.6099 | -0.7611 | -0.7860 | 0.5201 |
| 2.1039 | 0.9425 | 41 | 2.0539 | -18.4232 | -18.5801 | 0.5261 | 0.1568 | -185.8007 | -184.2324 | -0.7540 | -0.7789 | 0.5190 |
| 2.1039 | 0.9655 | 42 | 2.0544 | -18.3969 | -18.5526 | 0.5283 | 0.1557 | -185.5258 | -183.9690 | -0.7525 | -0.7777 | 0.5181 |
| 2.1039 | 0.9885 | 43 | 2.0545 | -18.3931 | -18.5452 | 0.5261 | 0.1521 | -185.4521 | -183.9312 | -0.7551 | -0.7797 | 0.5180 |
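
For orientation, the column names above follow trl's CPO metrics. A hedged reading, based on the CPO objective of Xu et al. (2024) as implemented in trl (the β and NLL weight used for this run are not stated in this card):

$$
\mathcal{L}_{\mathrm{CPO}}
= \underbrace{-\log \sigma\!\big(\beta \log \pi_\theta(y_w \mid x) - \beta \log \pi_\theta(y_l \mid x)\big)}_{\text{preference term}}
\;+\;
\underbrace{\big(-\lambda \log \pi_\theta(y_w \mid x)\big)}_{\text{NLL term}}
$$

Under this reading, Logps/chosen and Logps/rejected are the policy log-probabilities of the chosen and rejected responses, Rewards/chosen and Rewards/rejected are β times those log-probabilities (the ≈0.1 ratio visible in the table suggests β = 0.1, an inference rather than a stated setting), Rewards/margins is their difference, Rewards/accuracies is the fraction of pairs with a positive margin, Nll Loss is the NLL term, and Validation Loss combines the preference and NLL terms.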

Framework versions

  • Transformers 4.44.2
  • PyTorch 2.3.1
  • Datasets 2.21.0
  • Tokenizers 0.19.1