# llama3.1-cpo_j-full-0919
This model is a fine-tuned version of meta-llama/Meta-Llama-3.1-8B-Instruct on the princeton-nlp/llama3-ultrafeedback dataset. It achieves the following results on the evaluation set:
- Loss: 2.0330
- Rewards/chosen: -17.2716
- Rewards/rejected: -17.4010
- Rewards/accuracies: 0.5283
- Rewards/margins: 0.1295
- Logps/rejected: -174.0103
- Logps/chosen: -172.7156
- Logits/rejected: -0.7823
- Logits/chosen: -0.8013
- Nll Loss: 0.4931
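
The reported Rewards values are, to within rounding, 0.1 × the corresponding Logps values (e.g. -17.2716 ≈ 0.1 × -172.7156), which is consistent with the TRL/CPO convention of reporting rewards as beta times the policy log-probability with the default beta = 0.1. That scaling is inferred from the numbers above, not stated explicitly in this card; the quick check below is a minimal sketch under that assumption.

```python
# Sketch: relate the reward metrics above to the log-probabilities,
# assuming reward = beta * logp with beta = 0.1 (an assumption inferred
# from the reported values, not stated in this card).
beta = 0.1  # assumed

logps_chosen = -172.7156
logps_rejected = -174.0103

reward_chosen = beta * logps_chosen       # ≈ -17.2716 (Rewards/chosen)
reward_rejected = beta * logps_rejected   # ≈ -17.4010 (Rewards/rejected)
margin = reward_chosen - reward_rejected  # ≈ 0.1295   (Rewards/margins)

print(reward_chosen, reward_rejected, margin)
```
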
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-06
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 8
- total_train_batch_size: 128
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
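
The per-device and accumulation settings combine multiplicatively to the reported effective batch size: 4 per device × 4 GPUs × 8 accumulation steps = 128. The sketch below shows one way these values might be passed to TRL's `CPOConfig`; the field names follow `transformers.TrainingArguments`, and anything not listed above (output directory, `beta`, precision, dataset handling) is an assumption for illustration, not a record of the actual training script.

```python
# Hypothetical sketch of the hyperparameters above with TRL's CPOTrainer.
# Only the values listed in this card are taken from the training run;
# everything else is an illustrative assumption.
from trl import CPOConfig, CPOTrainer

args = CPOConfig(
    output_dir="llama3.1-cpo_j-full-0919",  # assumed
    learning_rate=1e-6,
    per_device_train_batch_size=4,   # train_batch_size above
    per_device_eval_batch_size=4,    # eval_batch_size above
    gradient_accumulation_steps=8,   # 4 per device x 4 GPUs x 8 = 128 effective
    num_train_epochs=1,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    seed=42,
)

# trainer = CPOTrainer(model=model, args=args, train_dataset=train_ds, tokenizer=tokenizer)
# trainer.train()
```
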
### Training results
Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | Nll Loss |
---|---|---|---|---|---|---|---|---|---|---|---|---|
No log | 0.0230 | 1 | 2.8864 | -26.5532 | -26.5849 | 0.5174 | 0.0317 | -265.8489 | -265.5324 | -0.2622 | -0.2859 | 1.1891 |
No log | 0.0460 | 2 | 2.8852 | -26.5433 | -26.5669 | 0.5217 | 0.0236 | -265.6693 | -265.4330 | -0.2636 | -0.2872 | 1.1879 |
No log | 0.0690 | 3 | 2.8825 | -26.4975 | -26.5200 | 0.5130 | 0.0225 | -265.1996 | -264.9747 | -0.2640 | -0.2874 | 1.1867 |
No log | 0.0920 | 4 | 2.8709 | -26.3594 | -26.3890 | 0.5261 | 0.0296 | -263.8900 | -263.5942 | -0.2712 | -0.2946 | 1.1816 |
No log | 0.1149 | 5 | 2.8528 | -26.2159 | -26.2470 | 0.5261 | 0.0311 | -262.4702 | -262.1591 | -0.2743 | -0.2975 | 1.1581 |
No log | 0.1379 | 6 | 2.8240 | -25.8178 | -25.8572 | 0.5239 | 0.0394 | -258.5723 | -258.1781 | -0.2926 | -0.3156 | 1.1426 |
No log | 0.1609 | 7 | 2.7885 | -25.5238 | -25.5623 | 0.5217 | 0.0385 | -255.6230 | -255.2376 | -0.3115 | -0.3343 | 1.1203 |
No log | 0.1839 | 8 | 2.7394 | -24.9173 | -24.9665 | 0.5348 | 0.0491 | -249.6646 | -249.1732 | -0.3523 | -0.3759 | 1.0932 |
No log | 0.2069 | 9 | 2.6997 | -24.5346 | -24.5855 | 0.5304 | 0.0509 | -245.8549 | -245.3461 | -0.3716 | -0.3958 | 1.0735 |
3.0132 | 0.2299 | 10 | 2.6736 | -24.2278 | -24.2821 | 0.5348 | 0.0543 | -242.8206 | -242.2775 | -0.3888 | -0.4137 | 1.0604 |
3.0132 | 0.2529 | 11 | 2.6514 | -23.9833 | -24.0509 | 0.5348 | 0.0676 | -240.5090 | -239.8333 | -0.4023 | -0.4275 | 1.0473 |
3.0132 | 0.2759 | 12 | 2.6159 | -23.5112 | -23.5825 | 0.5348 | 0.0712 | -235.8245 | -235.1123 | -0.4367 | -0.4619 | 1.0318 |
3.0132 | 0.2989 | 13 | 2.5754 | -22.9830 | -23.0617 | 0.5304 | 0.0787 | -230.6171 | -229.8301 | -0.4755 | -0.5012 | 0.9972 |
3.0132 | 0.3218 | 14 | 2.5425 | -22.4882 | -22.5719 | 0.5261 | 0.0837 | -225.7186 | -224.8815 | -0.5193 | -0.5447 | 0.9638 |
3.0132 | 0.3448 | 15 | 2.5079 | -22.0390 | -22.1303 | 0.5348 | 0.0913 | -221.3027 | -220.3899 | -0.5581 | -0.5829 | 0.9364 |
3.0132 | 0.3678 | 16 | 2.4829 | -21.6296 | -21.7212 | 0.5348 | 0.0917 | -217.2125 | -216.2958 | -0.5988 | -0.6233 | 0.9076 |
3.0132 | 0.3908 | 17 | 2.4569 | -21.2310 | -21.3319 | 0.5370 | 0.1010 | -213.3194 | -212.3095 | -0.6344 | -0.6586 | 0.8787 |
3.0132 | 0.4138 | 18 | 2.4349 | -20.8876 | -20.9881 | 0.5304 | 0.1005 | -209.8813 | -208.8761 | -0.6694 | -0.6929 | 0.8582 |
3.0132 | 0.4368 | 19 | 2.4174 | -20.5639 | -20.6694 | 0.5304 | 0.1055 | -206.6935 | -205.6389 | -0.7053 | -0.7288 | 0.8530 |
2.6371 | 0.4598 | 20 | 2.3954 | -20.2042 | -20.3148 | 0.5326 | 0.1105 | -203.1477 | -202.0424 | -0.7411 | -0.7640 | 0.8311 |
2.6371 | 0.4828 | 21 | 2.3761 | -19.8765 | -19.9851 | 0.5239 | 0.1086 | -199.8509 | -198.7649 | -0.7733 | -0.7956 | 0.8173 |
2.6371 | 0.5057 | 22 | 2.3550 | -19.4793 | -19.5897 | 0.5217 | 0.1104 | -195.8972 | -194.7931 | -0.7947 | -0.8166 | 0.7966 |
2.6371 | 0.5287 | 23 | 2.3190 | -19.0685 | -19.1814 | 0.5239 | 0.1129 | -191.8135 | -190.6849 | -0.8095 | -0.8311 | 0.7629 |
2.6371 | 0.5517 | 24 | 2.2593 | -18.7171 | -18.8248 | 0.5283 | 0.1077 | -188.2483 | -187.1710 | -0.8197 | -0.8403 | 0.7104 |
2.6371 | 0.5747 | 25 | 2.1824 | -18.4482 | -18.5591 | 0.5326 | 0.1109 | -185.5911 | -184.4822 | -0.8234 | -0.8436 | 0.6203 |
2.6371 | 0.5977 | 26 | 2.1139 | -18.2639 | -18.3699 | 0.5326 | 0.1059 | -183.6985 | -182.6391 | -0.8213 | -0.8404 | 0.5467 |
2.6371 | 0.6207 | 27 | 2.0862 | -18.1268 | -18.2372 | 0.5326 | 0.1104 | -182.3718 | -181.2681 | -0.8158 | -0.8341 | 0.5235 |
2.6371 | 0.6437 | 28 | 2.0741 | -18.0305 | -18.1407 | 0.5283 | 0.1103 | -181.4072 | -180.3046 | -0.8051 | -0.8225 | 0.5133 |
2.6371 | 0.6667 | 29 | 2.0690 | -17.9415 | -18.0517 | 0.5304 | 0.1101 | -180.5167 | -179.4155 | -0.7987 | -0.8158 | 0.5092 |
2.3737 | 0.6897 | 30 | 2.0669 | -17.8450 | -17.9553 | 0.5304 | 0.1103 | -179.5531 | -178.4496 | -0.7900 | -0.8066 | 0.5082 |
2.3737 | 0.7126 | 31 | 2.0595 | -17.7753 | -17.8928 | 0.5370 | 0.1175 | -178.9280 | -177.7533 | -0.7924 | -0.8090 | 0.5009 |
2.3737 | 0.7356 | 32 | 2.0559 | -17.6972 | -17.8134 | 0.5326 | 0.1162 | -178.1344 | -176.9719 | -0.7821 | -0.7989 | 0.5023 |
2.3737 | 0.7586 | 33 | 2.0530 | -17.6212 | -17.7447 | 0.5283 | 0.1235 | -177.4470 | -176.2120 | -0.7772 | -0.7941 | 0.4995 |
2.3737 | 0.7816 | 34 | 2.0495 | -17.5594 | -17.6781 | 0.5239 | 0.1187 | -176.7806 | -175.5940 | -0.7770 | -0.7941 | 0.4961 |
2.3737 | 0.8046 | 35 | 2.0463 | -17.5069 | -17.6289 | 0.5239 | 0.1220 | -176.2891 | -175.0691 | -0.7765 | -0.7938 | 0.4933 |
2.3737 | 0.8276 | 36 | 2.0454 | -17.4648 | -17.5832 | 0.5283 | 0.1184 | -175.8317 | -174.6475 | -0.7759 | -0.7937 | 0.4930 |
2.3737 | 0.8506 | 37 | 2.0385 | -17.4124 | -17.5404 | 0.5239 | 0.1280 | -175.4043 | -174.1244 | -0.7766 | -0.7948 | 0.4914 |
2.3737 | 0.8736 | 38 | 2.0369 | -17.3727 | -17.4968 | 0.5174 | 0.1241 | -174.9679 | -173.7269 | -0.7789 | -0.7972 | 0.4935 |
2.3737 | 0.8966 | 39 | 2.0370 | -17.3371 | -17.4632 | 0.5239 | 0.1262 | -174.6325 | -173.3709 | -0.7812 | -0.7995 | 0.4908 |
2.078 | 0.9195 | 40 | 2.0331 | -17.3114 | -17.4457 | 0.5261 | 0.1343 | -174.4572 | -173.1142 | -0.7830 | -0.8020 | 0.4896 |
2.078 | 0.9425 | 41 | 2.0353 | -17.2892 | -17.4183 | 0.5283 | 0.1291 | -174.1829 | -172.8922 | -0.7830 | -0.8019 | 0.4943 |
2.078 | 0.9655 | 42 | 2.0323 | -17.2779 | -17.4112 | 0.5348 | 0.1333 | -174.1118 | -172.7786 | -0.7816 | -0.8008 | 0.4935 |
2.078 | 0.9885 | 43 | 2.0330 | -17.2716 | -17.4010 | 0.5283 | 0.1295 | -174.0103 | -172.7156 | -0.7823 | -0.8013 | 0.4931 |
### Framework versions
- Transformers 4.44.2
- Pytorch 2.3.1
- Datasets 2.21.0
- Tokenizers 0.19.1
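
With the framework versions above, the model can be loaded like any other causal LM from the Hub. The repository id below is taken from this card's name; the generation settings are illustrative assumptions.

```python
# Illustrative loading/inference sketch; generation parameters are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jbjeong91/llama3.1-cpo_j-full-0919"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Summarize what CPO fine-tuning does in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```
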