
llama3.1-cpo_j-full-0919

This model is a fine-tuned version of meta-llama/Meta-Llama-3.1-8B-Instruct on the princeton-nlp/llama3-ultrafeedback dataset. It achieves the following results on the evaluation set:

  • Loss: 2.0330
  • Rewards/chosen: -17.2716
  • Rewards/rejected: -17.4010
  • Rewards/accuracies: 0.5283
  • Rewards/margins: 0.1295
  • Logps/rejected: -174.0103
  • Logps/chosen: -172.7156
  • Logits/rejected: -0.7823
  • Logits/chosen: -0.8013
  • Nll Loss: 0.4931
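
The checkpoint can be loaded with the standard transformers API. A minimal generation sketch (the repository id jbjeong91/llama3.1-cpo_j-full-0919 and the BF16 weight type come from this card; the prompt and generation settings are purely illustrative):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jbjeong91/llama3.1-cpo_j-full-0919"  # repository id from this card

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # weights are stored in BF16
    device_map="auto",
)

# Llama-3.1-Instruct checkpoints ship a chat template, so prompts can be
# formatted with apply_chat_template before generation.
messages = [{"role": "user", "content": "Summarize preference optimization in one paragraph."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```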

Model description

Full fine-tune of meta-llama/Meta-Llama-3.1-8B-Instruct (8.03B parameters, BF16) trained on preference pairs from princeton-nlp/llama3-ultrafeedback. The reported reward, margin, and NLL metrics are consistent with a CPO-style preference-optimization objective, though this card does not state the exact training objective.

Intended uses & limitations

More information needed

Training and evaluation data

The model was trained and evaluated on princeton-nlp/llama3-ultrafeedback, a preference dataset of chosen/rejected response pairs; the metrics above are reported on its evaluation split.

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-06
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 128
  • total_eval_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
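
These settings map directly onto the Hugging Face TrainingArguments API. A minimal reconstruction sketch (output_dir and bf16 are illustrative assumptions; the card does not name the trainer, which, judging from the model name and the reward/NLL metrics, was plausibly a preference trainer such as TRL's CPOTrainer):

```python
from transformers import TrainingArguments

# Hyperparameters listed above, expressed as TrainingArguments.
# Effective train batch size: 4 (per device) x 4 (GPUs) x 8 (grad accum) = 128;
# effective eval batch size: 4 x 4 = 16.
args = TrainingArguments(
    output_dir="llama3.1-cpo_j-full-0919",  # assumed output directory
    learning_rate=1e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=8,
    num_train_epochs=1,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,  # assumption, consistent with the BF16 tensor type of the checkpoint
)
```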

Training results

Training Loss Epoch Step Validation Loss Rewards/chosen Rewards/rejected Rewards/accuracies Rewards/margins Logps/rejected Logps/chosen Logits/rejected Logits/chosen Nll Loss
No log 0.0230 1 2.8864 -26.5532 -26.5849 0.5174 0.0317 -265.8489 -265.5324 -0.2622 -0.2859 1.1891
No log 0.0460 2 2.8852 -26.5433 -26.5669 0.5217 0.0236 -265.6693 -265.4330 -0.2636 -0.2872 1.1879
No log 0.0690 3 2.8825 -26.4975 -26.5200 0.5130 0.0225 -265.1996 -264.9747 -0.2640 -0.2874 1.1867
No log 0.0920 4 2.8709 -26.3594 -26.3890 0.5261 0.0296 -263.8900 -263.5942 -0.2712 -0.2946 1.1816
No log 0.1149 5 2.8528 -26.2159 -26.2470 0.5261 0.0311 -262.4702 -262.1591 -0.2743 -0.2975 1.1581
No log 0.1379 6 2.8240 -25.8178 -25.8572 0.5239 0.0394 -258.5723 -258.1781 -0.2926 -0.3156 1.1426
No log 0.1609 7 2.7885 -25.5238 -25.5623 0.5217 0.0385 -255.6230 -255.2376 -0.3115 -0.3343 1.1203
No log 0.1839 8 2.7394 -24.9173 -24.9665 0.5348 0.0491 -249.6646 -249.1732 -0.3523 -0.3759 1.0932
No log 0.2069 9 2.6997 -24.5346 -24.5855 0.5304 0.0509 -245.8549 -245.3461 -0.3716 -0.3958 1.0735
3.0132 0.2299 10 2.6736 -24.2278 -24.2821 0.5348 0.0543 -242.8206 -242.2775 -0.3888 -0.4137 1.0604
3.0132 0.2529 11 2.6514 -23.9833 -24.0509 0.5348 0.0676 -240.5090 -239.8333 -0.4023 -0.4275 1.0473
3.0132 0.2759 12 2.6159 -23.5112 -23.5825 0.5348 0.0712 -235.8245 -235.1123 -0.4367 -0.4619 1.0318
3.0132 0.2989 13 2.5754 -22.9830 -23.0617 0.5304 0.0787 -230.6171 -229.8301 -0.4755 -0.5012 0.9972
3.0132 0.3218 14 2.5425 -22.4882 -22.5719 0.5261 0.0837 -225.7186 -224.8815 -0.5193 -0.5447 0.9638
3.0132 0.3448 15 2.5079 -22.0390 -22.1303 0.5348 0.0913 -221.3027 -220.3899 -0.5581 -0.5829 0.9364
3.0132 0.3678 16 2.4829 -21.6296 -21.7212 0.5348 0.0917 -217.2125 -216.2958 -0.5988 -0.6233 0.9076
3.0132 0.3908 17 2.4569 -21.2310 -21.3319 0.5370 0.1010 -213.3194 -212.3095 -0.6344 -0.6586 0.8787
3.0132 0.4138 18 2.4349 -20.8876 -20.9881 0.5304 0.1005 -209.8813 -208.8761 -0.6694 -0.6929 0.8582
3.0132 0.4368 19 2.4174 -20.5639 -20.6694 0.5304 0.1055 -206.6935 -205.6389 -0.7053 -0.7288 0.8530
2.6371 0.4598 20 2.3954 -20.2042 -20.3148 0.5326 0.1105 -203.1477 -202.0424 -0.7411 -0.7640 0.8311
2.6371 0.4828 21 2.3761 -19.8765 -19.9851 0.5239 0.1086 -199.8509 -198.7649 -0.7733 -0.7956 0.8173
2.6371 0.5057 22 2.3550 -19.4793 -19.5897 0.5217 0.1104 -195.8972 -194.7931 -0.7947 -0.8166 0.7966
2.6371 0.5287 23 2.3190 -19.0685 -19.1814 0.5239 0.1129 -191.8135 -190.6849 -0.8095 -0.8311 0.7629
2.6371 0.5517 24 2.2593 -18.7171 -18.8248 0.5283 0.1077 -188.2483 -187.1710 -0.8197 -0.8403 0.7104
2.6371 0.5747 25 2.1824 -18.4482 -18.5591 0.5326 0.1109 -185.5911 -184.4822 -0.8234 -0.8436 0.6203
2.6371 0.5977 26 2.1139 -18.2639 -18.3699 0.5326 0.1059 -183.6985 -182.6391 -0.8213 -0.8404 0.5467
2.6371 0.6207 27 2.0862 -18.1268 -18.2372 0.5326 0.1104 -182.3718 -181.2681 -0.8158 -0.8341 0.5235
2.6371 0.6437 28 2.0741 -18.0305 -18.1407 0.5283 0.1103 -181.4072 -180.3046 -0.8051 -0.8225 0.5133
2.6371 0.6667 29 2.0690 -17.9415 -18.0517 0.5304 0.1101 -180.5167 -179.4155 -0.7987 -0.8158 0.5092
2.3737 0.6897 30 2.0669 -17.8450 -17.9553 0.5304 0.1103 -179.5531 -178.4496 -0.7900 -0.8066 0.5082
2.3737 0.7126 31 2.0595 -17.7753 -17.8928 0.5370 0.1175 -178.9280 -177.7533 -0.7924 -0.8090 0.5009
2.3737 0.7356 32 2.0559 -17.6972 -17.8134 0.5326 0.1162 -178.1344 -176.9719 -0.7821 -0.7989 0.5023
2.3737 0.7586 33 2.0530 -17.6212 -17.7447 0.5283 0.1235 -177.4470 -176.2120 -0.7772 -0.7941 0.4995
2.3737 0.7816 34 2.0495 -17.5594 -17.6781 0.5239 0.1187 -176.7806 -175.5940 -0.7770 -0.7941 0.4961
2.3737 0.8046 35 2.0463 -17.5069 -17.6289 0.5239 0.1220 -176.2891 -175.0691 -0.7765 -0.7938 0.4933
2.3737 0.8276 36 2.0454 -17.4648 -17.5832 0.5283 0.1184 -175.8317 -174.6475 -0.7759 -0.7937 0.4930
2.3737 0.8506 37 2.0385 -17.4124 -17.5404 0.5239 0.1280 -175.4043 -174.1244 -0.7766 -0.7948 0.4914
2.3737 0.8736 38 2.0369 -17.3727 -17.4968 0.5174 0.1241 -174.9679 -173.7269 -0.7789 -0.7972 0.4935
2.3737 0.8966 39 2.0370 -17.3371 -17.4632 0.5239 0.1262 -174.6325 -173.3709 -0.7812 -0.7995 0.4908
2.078 0.9195 40 2.0331 -17.3114 -17.4457 0.5261 0.1343 -174.4572 -173.1142 -0.7830 -0.8020 0.4896
2.078 0.9425 41 2.0353 -17.2892 -17.4183 0.5283 0.1291 -174.1829 -172.8922 -0.7830 -0.8019 0.4943
2.078 0.9655 42 2.0323 -17.2779 -17.4112 0.5348 0.1333 -174.1118 -172.7786 -0.7816 -0.8008 0.4935
2.078 0.9885 43 2.0330 -17.2716 -17.4010 0.5283 0.1295 -174.0103 -172.7156 -0.7823 -0.8013 0.4931
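
Throughout training the reported rewards are exactly 0.1 × the corresponding log-probabilities (e.g. -17.2716 vs. -172.7156 for chosen), consistent with a reference-free reward r = β · log πθ(y | x) with β = 0.1, and the separate NLL column suggests a likelihood term added to the preference loss. A sketch of such an objective, assuming the standard CPO formulation (the exact loss behind the "cpo_j" name is not documented in this card):

$$
\mathcal{L}(\theta) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\Big[\log \sigma\big(\beta \log \pi_\theta(y_w \mid x) - \beta \log \pi_\theta(y_l \mid x)\big)\Big] \;+\; \lambda\,\mathbb{E}\big[-\log \pi_\theta(y_w \mid x)\big]
$$

where $y_w$ and $y_l$ are the chosen and rejected responses and $\lambda$ weights the NLL term.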

Framework versions

  • Transformers 4.44.2
  • PyTorch 2.3.1
  • Datasets 2.21.0
  • Tokenizers 0.19.1

Model weights: 8.03B params, Safetensors format, BF16 tensors.

Model tree for jbjeong91/llama3.1-cpo_j-full-0919

  • Finetuned from: meta-llama/Meta-Llama-3.1-8B-Instruct

Dataset used to train jbjeong91/llama3.1-cpo_j-full-0919

  • princeton-nlp/llama3-ultrafeedback