---
library_name: transformers
license: llama3.1
base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
tags:
- alignment-handbook
- trl
- cpo
- generated_from_trainer
datasets:
- princeton-nlp/llama3-ultrafeedback
model-index:
- name: llama3.1-cpo-full-0919
  results: []
---

# llama3.1-cpo-full-0919

This model is a version of [meta-llama/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct) fine-tuned with CPO (Contrastive Preference Optimization, via TRL) on the princeton-nlp/llama3-ultrafeedback dataset.
It achieves the following results on the evaluation set:
- Loss: 2.0545
- Rewards/chosen: -18.3931
- Rewards/rejected: -18.5452
- Rewards/accuracies: 0.5261
- Rewards/margins: 0.1521
- Logps/rejected: -185.4521
- Logps/chosen: -183.9312
- Logits/rejected: -0.7551
- Logits/chosen: -0.7797
- Nll Loss: 0.5180
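A minimal sketch of loading the checkpoint for chat-style generation with `transformers` is shown below. The repository id is a placeholder; substitute the actual location where this model is hosted.

```python
# Minimal inference sketch.
# NOTE: "your-org/llama3.1-cpo-full-0919" is a placeholder repository id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/llama3.1-cpo-full-0919"  # placeholder; use the real repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Build a chat prompt with the model's chat template and generate a reply.
messages = [{"role": "user", "content": "Summarize what preference optimization does."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```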
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-06
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 8
- total_train_batch_size: 128
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
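For orientation, the sketch below shows how these hyperparameters map onto TRL's `CPOConfig`/`CPOTrainer`. It is an illustration rather than the exact training script: preprocessing of the dataset into `prompt`/`chosen`/`rejected` text columns (as in the alignment-handbook recipes), the 4-GPU launch, and CPO-specific settings such as `beta` are assumed or omitted.

```python
# Illustrative sketch of the listed hyperparameters in TRL; the actual run used
# the alignment-handbook recipe on 4 GPUs with gradient accumulation.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import CPOConfig, CPOTrainer

base = "meta-llama/Meta-Llama-3.1-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Assumes the dataset has been preprocessed into prompt/chosen/rejected text columns.
dataset = load_dataset("princeton-nlp/llama3-ultrafeedback")

config = CPOConfig(
    output_dir="llama3.1-cpo-full-0919",
    learning_rate=1e-6,
    per_device_train_batch_size=4,   # 4 per device x 4 GPUs x 8 accumulation = 128 effective
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=8,
    num_train_epochs=1,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    seed=42,
)

trainer = CPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],  # split name is an assumption
    tokenizer=tokenizer,
)
trainer.train()
```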
### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | Nll Loss |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|:--------:|
| No log | 0.0230 | 1 | 2.5065 | -26.5532 | -26.5849 | 0.5174 | 0.0317 | -265.8489 | -265.5324 | -0.2622 | -0.2859 | 0.7676 |
| No log | 0.0460 | 2 | 2.5060 | -26.5180 | -26.5504 | 0.5217 | 0.0324 | -265.5038 | -265.1801 | -0.2634 | -0.2869 | 0.7666 |
| No log | 0.0690 | 3 | 2.5051 | -26.5009 | -26.5278 | 0.5217 | 0.0269 | -265.2777 | -265.0088 | -0.2657 | -0.2893 | 0.7661 |
| No log | 0.0920 | 4 | 2.4982 | -26.3952 | -26.4315 | 0.5239 | 0.0363 | -264.3147 | -263.9518 | -0.2690 | -0.2926 | 0.7632 |
| No log | 0.1149 | 5 | 2.4894 | -26.2651 | -26.3011 | 0.5217 | 0.0360 | -263.0112 | -262.6512 | -0.2750 | -0.2985 | 0.7594 |
| No log | 0.1379 | 6 | 2.4689 | -25.9450 | -25.9855 | 0.5283 | 0.0405 | -259.8551 | -259.4500 | -0.2858 | -0.3086 | 0.7502 |
| No log | 0.1609 | 7 | 2.4511 | -25.7084 | -25.7527 | 0.5283 | 0.0443 | -257.5271 | -257.0843 | -0.2972 | -0.3202 | 0.7433 |
| No log | 0.1839 | 8 | 2.4180 | -25.2215 | -25.2724 | 0.5326 | 0.0510 | -252.7242 | -252.2147 | -0.3254 | -0.3486 | 0.7291 |
| No log | 0.2069 | 9 | 2.3952 | -24.8845 | -24.9393 | 0.5283 | 0.0548 | -249.3929 | -248.8451 | -0.3463 | -0.3701 | 0.7192 |
| 2.6865 | 0.2299 | 10 | 2.3761 | -24.6215 | -24.6782 | 0.5348 | 0.0567 | -246.7821 | -246.2148 | -0.3604 | -0.3845 | 0.7115 |
| 2.6865 | 0.2529 | 11 | 2.3609 | -24.4027 | -24.4705 | 0.5391 | 0.0678 | -244.7050 | -244.0270 | -0.3731 | -0.3976 | 0.7051 |
| 2.6865 | 0.2759 | 12 | 2.3367 | -24.0560 | -24.1306 | 0.5348 | 0.0746 | -241.3063 | -240.5604 | -0.3970 | -0.4218 | 0.6951 |
| 2.6865 | 0.2989 | 13 | 2.3109 | -23.6786 | -23.7645 | 0.5304 | 0.0860 | -237.6454 | -236.7858 | -0.4179 | -0.4434 | 0.6840 |
| 2.6865 | 0.3218 | 14 | 2.2906 | -23.3175 | -23.4031 | 0.5348 | 0.0856 | -234.0311 | -233.1748 | -0.4423 | -0.4679 | 0.6733 |
| 2.6865 | 0.3448 | 15 | 2.2729 | -22.9946 | -23.0933 | 0.5348 | 0.0988 | -230.9332 | -229.9456 | -0.4660 | -0.4917 | 0.6637 |
| 2.6865 | 0.3678 | 16 | 2.2576 | -22.7067 | -22.8056 | 0.5370 | 0.0990 | -228.0565 | -227.0665 | -0.4886 | -0.5142 | 0.6549 |
| 2.6865 | 0.3908 | 17 | 2.2411 | -22.4130 | -22.5166 | 0.5283 | 0.1036 | -225.1658 | -224.1296 | -0.5152 | -0.5408 | 0.6460 |
| 2.6865 | 0.4138 | 18 | 2.2300 | -22.1594 | -22.2652 | 0.5261 | 0.1058 | -222.6522 | -221.5937 | -0.5400 | -0.5656 | 0.6382 |
| 2.6865 | 0.4368 | 19 | 2.2170 | -21.9205 | -22.0355 | 0.5304 | 0.1150 | -220.3547 | -219.2051 | -0.5657 | -0.5915 | 0.6308 |
| 2.3904 | 0.4598 | 20 | 2.2065 | -21.7054 | -21.8209 | 0.5283 | 0.1156 | -218.2092 | -217.0537 | -0.5920 | -0.6175 | 0.6241 |
| 2.3904 | 0.4828 | 21 | 2.1932 | -21.4871 | -21.6107 | 0.5261 | 0.1236 | -216.1072 | -214.8710 | -0.6189 | -0.6441 | 0.6172 |
| 2.3904 | 0.5057 | 22 | 2.1839 | -21.2899 | -21.4129 | 0.5196 | 0.1230 | -214.1287 | -212.8987 | -0.6445 | -0.6694 | 0.6109 |
| 2.3904 | 0.5287 | 23 | 2.1746 | -21.0873 | -21.2117 | 0.5261 | 0.1244 | -212.1172 | -210.8729 | -0.6688 | -0.6940 | 0.6045 |
| 2.3904 | 0.5517 | 24 | 2.1656 | -20.9136 | -21.0398 | 0.5239 | 0.1262 | -210.3979 | -209.1364 | -0.6938 | -0.7184 | 0.5989 |
| 2.3904 | 0.5747 | 25 | 2.1555 | -20.7191 | -20.8481 | 0.5283 | 0.1290 | -208.4814 | -207.1911 | -0.7120 | -0.7365 | 0.5926 |
| 2.3904 | 0.5977 | 26 | 2.1466 | -20.5485 | -20.6790 | 0.5283 | 0.1305 | -206.7897 | -205.4852 | -0.7301 | -0.7545 | 0.5872 |
| 2.3904 | 0.6207 | 27 | 2.1392 | -20.3722 | -20.5040 | 0.5370 | 0.1318 | -205.0401 | -203.7218 | -0.7476 | -0.7720 | 0.5816 |
| 2.3904 | 0.6437 | 28 | 2.1308 | -20.1853 | -20.3216 | 0.5326 | 0.1363 | -203.2164 | -201.8533 | -0.7575 | -0.7818 | 0.5756 |
| 2.3904 | 0.6667 | 29 | 2.1229 | -19.9946 | -20.1315 | 0.5283 | 0.1370 | -201.3155 | -199.9459 | -0.7683 | -0.7925 | 0.5695 |
| 2.3172 | 0.6897 | 30 | 2.1134 | -19.7893 | -19.9304 | 0.5261 | 0.1411 | -199.3041 | -197.8930 | -0.7735 | -0.7976 | 0.5630 |
| 2.3172 | 0.7126 | 31 | 2.1055 | -19.5960 | -19.7401 | 0.5283 | 0.1441 | -197.4013 | -195.9599 | -0.7735 | -0.7977 | 0.5569 |
| 2.3172 | 0.7356 | 32 | 2.0985 | -19.4016 | -19.5462 | 0.5217 | 0.1445 | -195.4615 | -194.0163 | -0.7817 | -0.8060 | 0.5508 |
| 2.3172 | 0.7586 | 33 | 2.0904 | -19.2117 | -19.3617 | 0.5239 | 0.1501 | -193.6172 | -192.1166 | -0.7785 | -0.8030 | 0.5447 |
| 2.3172 | 0.7816 | 34 | 2.0850 | -19.0381 | -19.1813 | 0.5239 | 0.1432 | -191.8132 | -190.3807 | -0.7758 | -0.8003 | 0.5392 |
| 2.3172 | 0.8046 | 35 | 2.0793 | -18.8988 | -19.0437 | 0.5174 | 0.1449 | -190.4374 | -188.9884 | -0.7715 | -0.7964 | 0.5346 |
| 2.3172 | 0.8276 | 36 | 2.0720 | -18.7545 | -18.8980 | 0.5196 | 0.1435 | -188.9801 | -187.5452 | -0.7701 | -0.7952 | 0.5299 |
| 2.3172 | 0.8506 | 37 | 2.0663 | -18.6567 | -18.8053 | 0.5261 | 0.1486 | -188.0533 | -186.5672 | -0.7679 | -0.7927 | 0.5266 |
| 2.3172 | 0.8736 | 38 | 2.0643 | -18.5627 | -18.7139 | 0.5239 | 0.1512 | -187.1391 | -185.6268 | -0.7631 | -0.7882 | 0.5235 |
| 2.3172 | 0.8966 | 39 | 2.0601 | -18.5100 | -18.6606 | 0.5283 | 0.1507 | -186.6065 | -185.0997 | -0.7609 | -0.7857 | 0.5217 |
| 2.1039 | 0.9195 | 40 | 2.0598 | -18.4610 | -18.6128 | 0.5283 | 0.1518 | -186.1283 | -184.6099 | -0.7611 | -0.7860 | 0.5201 |
| 2.1039 | 0.9425 | 41 | 2.0539 | -18.4232 | -18.5801 | 0.5261 | 0.1568 | -185.8007 | -184.2324 | -0.7540 | -0.7789 | 0.5190 |
| 2.1039 | 0.9655 | 42 | 2.0544 | -18.3969 | -18.5526 | 0.5283 | 0.1557 | -185.5258 | -183.9690 | -0.7525 | -0.7777 | 0.5181 |
| 2.1039 | 0.9885 | 43 | 2.0545 | -18.3931 | -18.5452 | 0.5261 | 0.1521 | -185.4521 | -183.9312 | -0.7551 | -0.7797 | 0.5180 |

### Framework versions

- Transformers 4.44.2
- Pytorch 2.3.1
- Datasets 2.21.0
- Tokenizers 0.19.1