---
library_name: transformers
license: llama3.1
base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
tags:
  - alignment-handbook
  - trl
  - cpo
  - generated_from_trainer
datasets:
  - princeton-nlp/llama3-ultrafeedback
model-index:
  - name: llama3.1-cpo-full-0919
    results: []
---

llama3.1-cpo-full-0919

This model is a fine-tuned version of meta-llama/Meta-Llama-3.1-8B-Instruct on the princeton-nlp/llama3-ultrafeedback dataset. It achieves the following results on the evaluation set:

  • Loss: 2.0545
  • Rewards/chosen: -18.3931
  • Rewards/rejected: -18.5452
  • Rewards/accuracies: 0.5261
  • Rewards/margins: 0.1521
  • Logps/rejected: -185.4521
  • Logps/chosen: -183.9312
  • Logits/rejected: -0.7551
  • Logits/chosen: -0.7797
  • Nll Loss: 0.5180
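
Loading the checkpoint for chat-style inference could look like the minimal sketch below. The Hub repository id is inferred from the card name and is an assumption, as are the dtype and generation settings.

```python
# Minimal inference sketch (not part of the original card).
# The repository id below is inferred from the card name and may differ.
import torch
from transformers import pipeline

model_id = "jbjeong91/llama3.1-cpo-full-0919"  # assumed repo id

pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# The base model is instruction-tuned, so chat-format messages are used here.
messages = [
    {"role": "user", "content": "Summarize what CPO fine-tuning changes in one sentence."},
]
out = pipe(messages, max_new_tokens=128)
print(out[0]["generated_text"][-1]["content"])
```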

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-06
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 128
  • total_eval_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
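
These settings correspond to a trl CPO run; a rough sketch is given below. The dataset split names, the prompt/chosen/rejected preprocessing (handled by the alignment-handbook recipe), any CPO-specific options such as beta, and the multi-GPU launch (e.g. via accelerate) are assumptions not stated in this card.

```python
# Sketch of a CPO run matching the listed hyperparameters.
# CPO-specific options (beta, loss_type, NLL weight) are not given in the
# card, so trl defaults are left in place here.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import CPOConfig, CPOTrainer

base = "meta-llama/Meta-Llama-3.1-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Split names are assumptions; the card only names the dataset. The
# alignment-handbook recipe also applies the chat template to build
# prompt/chosen/rejected text columns, which is omitted here.
ds = load_dataset("princeton-nlp/llama3-ultrafeedback")

args = CPOConfig(
    output_dir="llama3.1-cpo-full-0919",
    learning_rate=1e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=8,   # 4 GPUs x 4 per device x 8 = 128 total
    num_train_epochs=1,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    seed=42,
)

trainer = CPOTrainer(
    model=model,
    args=args,
    train_dataset=ds["train"],
    eval_dataset=ds["test"],   # assumed eval split
    tokenizer=tokenizer,       # `processing_class` in newer trl releases
)
trainer.train()
```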

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | Nll Loss |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| No log | 0.0230 | 1 | 2.5065 | -26.5532 | -26.5849 | 0.5174 | 0.0317 | -265.8489 | -265.5324 | -0.2622 | -0.2859 | 0.7676 |
| No log | 0.0460 | 2 | 2.5060 | -26.5180 | -26.5504 | 0.5217 | 0.0324 | -265.5038 | -265.1801 | -0.2634 | -0.2869 | 0.7666 |
| No log | 0.0690 | 3 | 2.5051 | -26.5009 | -26.5278 | 0.5217 | 0.0269 | -265.2777 | -265.0088 | -0.2657 | -0.2893 | 0.7661 |
| No log | 0.0920 | 4 | 2.4982 | -26.3952 | -26.4315 | 0.5239 | 0.0363 | -264.3147 | -263.9518 | -0.2690 | -0.2926 | 0.7632 |
| No log | 0.1149 | 5 | 2.4894 | -26.2651 | -26.3011 | 0.5217 | 0.0360 | -263.0112 | -262.6512 | -0.2750 | -0.2985 | 0.7594 |
| No log | 0.1379 | 6 | 2.4689 | -25.9450 | -25.9855 | 0.5283 | 0.0405 | -259.8551 | -259.4500 | -0.2858 | -0.3086 | 0.7502 |
| No log | 0.1609 | 7 | 2.4511 | -25.7084 | -25.7527 | 0.5283 | 0.0443 | -257.5271 | -257.0843 | -0.2972 | -0.3202 | 0.7433 |
| No log | 0.1839 | 8 | 2.4180 | -25.2215 | -25.2724 | 0.5326 | 0.0510 | -252.7242 | -252.2147 | -0.3254 | -0.3486 | 0.7291 |
| No log | 0.2069 | 9 | 2.3952 | -24.8845 | -24.9393 | 0.5283 | 0.0548 | -249.3929 | -248.8451 | -0.3463 | -0.3701 | 0.7192 |
| 2.6865 | 0.2299 | 10 | 2.3761 | -24.6215 | -24.6782 | 0.5348 | 0.0567 | -246.7821 | -246.2148 | -0.3604 | -0.3845 | 0.7115 |
| 2.6865 | 0.2529 | 11 | 2.3609 | -24.4027 | -24.4705 | 0.5391 | 0.0678 | -244.7050 | -244.0270 | -0.3731 | -0.3976 | 0.7051 |
| 2.6865 | 0.2759 | 12 | 2.3367 | -24.0560 | -24.1306 | 0.5348 | 0.0746 | -241.3063 | -240.5604 | -0.3970 | -0.4218 | 0.6951 |
| 2.6865 | 0.2989 | 13 | 2.3109 | -23.6786 | -23.7645 | 0.5304 | 0.0860 | -237.6454 | -236.7858 | -0.4179 | -0.4434 | 0.6840 |
| 2.6865 | 0.3218 | 14 | 2.2906 | -23.3175 | -23.4031 | 0.5348 | 0.0856 | -234.0311 | -233.1748 | -0.4423 | -0.4679 | 0.6733 |
| 2.6865 | 0.3448 | 15 | 2.2729 | -22.9946 | -23.0933 | 0.5348 | 0.0988 | -230.9332 | -229.9456 | -0.4660 | -0.4917 | 0.6637 |
| 2.6865 | 0.3678 | 16 | 2.2576 | -22.7067 | -22.8056 | 0.5370 | 0.0990 | -228.0565 | -227.0665 | -0.4886 | -0.5142 | 0.6549 |
| 2.6865 | 0.3908 | 17 | 2.2411 | -22.4130 | -22.5166 | 0.5283 | 0.1036 | -225.1658 | -224.1296 | -0.5152 | -0.5408 | 0.6460 |
| 2.6865 | 0.4138 | 18 | 2.2300 | -22.1594 | -22.2652 | 0.5261 | 0.1058 | -222.6522 | -221.5937 | -0.5400 | -0.5656 | 0.6382 |
| 2.6865 | 0.4368 | 19 | 2.2170 | -21.9205 | -22.0355 | 0.5304 | 0.1150 | -220.3547 | -219.2051 | -0.5657 | -0.5915 | 0.6308 |
| 2.3904 | 0.4598 | 20 | 2.2065 | -21.7054 | -21.8209 | 0.5283 | 0.1156 | -218.2092 | -217.0537 | -0.5920 | -0.6175 | 0.6241 |
| 2.3904 | 0.4828 | 21 | 2.1932 | -21.4871 | -21.6107 | 0.5261 | 0.1236 | -216.1072 | -214.8710 | -0.6189 | -0.6441 | 0.6172 |
| 2.3904 | 0.5057 | 22 | 2.1839 | -21.2899 | -21.4129 | 0.5196 | 0.1230 | -214.1287 | -212.8987 | -0.6445 | -0.6694 | 0.6109 |
| 2.3904 | 0.5287 | 23 | 2.1746 | -21.0873 | -21.2117 | 0.5261 | 0.1244 | -212.1172 | -210.8729 | -0.6688 | -0.6940 | 0.6045 |
| 2.3904 | 0.5517 | 24 | 2.1656 | -20.9136 | -21.0398 | 0.5239 | 0.1262 | -210.3979 | -209.1364 | -0.6938 | -0.7184 | 0.5989 |
| 2.3904 | 0.5747 | 25 | 2.1555 | -20.7191 | -20.8481 | 0.5283 | 0.1290 | -208.4814 | -207.1911 | -0.7120 | -0.7365 | 0.5926 |
| 2.3904 | 0.5977 | 26 | 2.1466 | -20.5485 | -20.6790 | 0.5283 | 0.1305 | -206.7897 | -205.4852 | -0.7301 | -0.7545 | 0.5872 |
| 2.3904 | 0.6207 | 27 | 2.1392 | -20.3722 | -20.5040 | 0.5370 | 0.1318 | -205.0401 | -203.7218 | -0.7476 | -0.7720 | 0.5816 |
| 2.3904 | 0.6437 | 28 | 2.1308 | -20.1853 | -20.3216 | 0.5326 | 0.1363 | -203.2164 | -201.8533 | -0.7575 | -0.7818 | 0.5756 |
| 2.3904 | 0.6667 | 29 | 2.1229 | -19.9946 | -20.1315 | 0.5283 | 0.1370 | -201.3155 | -199.9459 | -0.7683 | -0.7925 | 0.5695 |
| 2.3172 | 0.6897 | 30 | 2.1134 | -19.7893 | -19.9304 | 0.5261 | 0.1411 | -199.3041 | -197.8930 | -0.7735 | -0.7976 | 0.5630 |
| 2.3172 | 0.7126 | 31 | 2.1055 | -19.5960 | -19.7401 | 0.5283 | 0.1441 | -197.4013 | -195.9599 | -0.7735 | -0.7977 | 0.5569 |
| 2.3172 | 0.7356 | 32 | 2.0985 | -19.4016 | -19.5462 | 0.5217 | 0.1445 | -195.4615 | -194.0163 | -0.7817 | -0.8060 | 0.5508 |
| 2.3172 | 0.7586 | 33 | 2.0904 | -19.2117 | -19.3617 | 0.5239 | 0.1501 | -193.6172 | -192.1166 | -0.7785 | -0.8030 | 0.5447 |
| 2.3172 | 0.7816 | 34 | 2.0850 | -19.0381 | -19.1813 | 0.5239 | 0.1432 | -191.8132 | -190.3807 | -0.7758 | -0.8003 | 0.5392 |
| 2.3172 | 0.8046 | 35 | 2.0793 | -18.8988 | -19.0437 | 0.5174 | 0.1449 | -190.4374 | -188.9884 | -0.7715 | -0.7964 | 0.5346 |
| 2.3172 | 0.8276 | 36 | 2.0720 | -18.7545 | -18.8980 | 0.5196 | 0.1435 | -188.9801 | -187.5452 | -0.7701 | -0.7952 | 0.5299 |
| 2.3172 | 0.8506 | 37 | 2.0663 | -18.6567 | -18.8053 | 0.5261 | 0.1486 | -188.0533 | -186.5672 | -0.7679 | -0.7927 | 0.5266 |
| 2.3172 | 0.8736 | 38 | 2.0643 | -18.5627 | -18.7139 | 0.5239 | 0.1512 | -187.1391 | -185.6268 | -0.7631 | -0.7882 | 0.5235 |
| 2.3172 | 0.8966 | 39 | 2.0601 | -18.5100 | -18.6606 | 0.5283 | 0.1507 | -186.6065 | -185.0997 | -0.7609 | -0.7857 | 0.5217 |
| 2.1039 | 0.9195 | 40 | 2.0598 | -18.4610 | -18.6128 | 0.5283 | 0.1518 | -186.1283 | -184.6099 | -0.7611 | -0.7860 | 0.5201 |
| 2.1039 | 0.9425 | 41 | 2.0539 | -18.4232 | -18.5801 | 0.5261 | 0.1568 | -185.8007 | -184.2324 | -0.7540 | -0.7789 | 0.5190 |
| 2.1039 | 0.9655 | 42 | 2.0544 | -18.3969 | -18.5526 | 0.5283 | 0.1557 | -185.5258 | -183.9690 | -0.7525 | -0.7777 | 0.5181 |
| 2.1039 | 0.9885 | 43 | 2.0545 | -18.3931 | -18.5452 | 0.5261 | 0.1521 | -185.4521 | -183.9312 | -0.7551 | -0.7797 | 0.5180 |
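
For orientation, the column names above follow trl's CPO metrics. A hedged reading, based on the CPO objective of Xu et al. (2024) as implemented in trl (the β and NLL weight used for this run are not stated in this card):

$$
\mathcal{L}_{\mathrm{CPO}}
= \underbrace{-\log \sigma\!\big(\beta \log \pi_\theta(y_w \mid x) - \beta \log \pi_\theta(y_l \mid x)\big)}_{\text{preference term}}
\;+\;
\underbrace{\big(-\lambda \log \pi_\theta(y_w \mid x)\big)}_{\text{NLL term}}
$$

Under this reading, Logps/chosen and Logps/rejected are the policy log-probabilities of the chosen and rejected responses, Rewards/chosen and Rewards/rejected are β times those log-probabilities (the ≈0.1 ratio visible in the table suggests β = 0.1, an inference rather than a stated setting), Rewards/margins is their difference, Rewards/accuracies is the fraction of pairs with a positive margin, Nll Loss is the NLL term, and Validation Loss combines the preference and NLL terms.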

Framework versions

  • Transformers 4.44.2
  • PyTorch 2.3.1
  • Datasets 2.21.0
  • Tokenizers 0.19.1