Qwen2-7B-Instruct-SPPO-Function-call-v2.5

This model is a fine-tuned version of slm-research-vn/Qwen2-7B-Instruct-SPPO-Function-call-v2.1, trained on three datasets: slm-research-vn/dpo-format-function-calling-v3, slm-research-vn/dpo-format-glaive-code-assistant-v3-with-mistral-large-slm-iter4, and argilla/dpo-mix-7k. It achieves the following results on the evaluation set:

Loss: 0.3208
Rewards/chosen: 1.7980
Rewards/rejected: -0.0440
Rewards/accuracies: 0.8853
Rewards/margins: 1.8420
Logps/rejected: -275.4126
Logps/chosen: -225.6960
Logits/rejected: -0.7099
Logits/chosen: -0.6648
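
The reported numbers are consistent with the reward margin being the chosen reward minus the rejected reward. A minimal sketch that checks this arithmetic against the values above (the variable names are illustrative and are not taken from the training code):

```python
# Final evaluation metrics copied from the list above.
rewards_chosen = 1.7980
rewards_rejected = -0.0440
reported_margin = 1.8420

# Assumption: margin = chosen reward - rejected reward.
computed_margin = rewards_chosen - rewards_rejected

# Allow for rounding to four decimal places in the reported values.
margin_matches = abs(computed_margin - reported_margin) < 1e-4
print(f"computed margin = {computed_margin:.4f}, matches report: {margin_matches}")
```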

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-06
train_batch_size: 1
eval_batch_size: 1
seed: 42
distributed_type: multi-GPU
num_devices: 8
gradient_accumulation_steps: 4
total_train_batch_size: 32
total_eval_batch_size: 8
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 1
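
The total batch sizes follow from the per-device sizes, the device count, and gradient accumulation. The sketch below reproduces that arithmetic and illustrates a cosine schedule with linear warmup. The function `cosine_lr_with_warmup` and the value of `total_steps` are hypothetical: the card does not state the total step count, and the trainer's own scheduler may differ in details such as how warmup steps are rounded.

```python
import math

# Values copied from the hyperparameter list above.
train_batch_size = 1
eval_batch_size = 1
num_devices = 8
gradient_accumulation_steps = 4

# Effective batch sizes across all devices.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices
print(total_train_batch_size, total_eval_batch_size)


def cosine_lr_with_warmup(step, total_steps, peak_lr=1e-6, warmup_ratio=0.1):
    """Illustrative schedule: linear warmup to peak_lr, then cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the decay phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# total_steps is an assumed placeholder, not a value from the card.
total_steps = 1000
for step in (0, 50, 100, 550, 1000):
    print(step, cosine_lr_with_warmup(step, total_steps))
```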

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.6553	0.1048	100	0.6206	0.2426	0.0793	0.7735	0.1633	-272.9460	-256.8048	-0.7286	-0.6915
0.4736	0.2095	200	0.4579	1.2344	0.4185	0.8353	0.8160	-266.1621	-236.9672	-0.6975	-0.6532
0.4158	0.3143	300	0.4030	1.6264	0.4492	0.8500	1.1771	-265.5471	-229.1290	-0.7183	-0.6811
0.3913	0.4191	400	0.3698	1.7637	0.3444	0.8559	1.4194	-267.6444	-226.3811	-0.7164	-0.6677
0.3117	0.5238	500	0.3486	1.7529	0.1705	0.8706	1.5824	-271.1227	-226.5988	-0.7171	-0.6770
0.3219	0.6286	600	0.3346	1.7488	0.0498	0.8765	1.6990	-273.5360	-226.6806	-0.7125	-0.6709
0.2924	0.7334	700	0.3259	1.7948	0.0020	0.8824	1.7929	-274.4924	-225.7591	-0.7103	-0.6733
0.3287	0.8381	800	0.3221	1.7998	-0.0221	0.8735	1.8218	-274.9728	-225.6601	-0.7049	-0.6610
0.3149	0.9429	900	0.3215	1.7999	-0.0363	0.8824	1.8362	-275.2581	-225.6584	-0.7051	-0.6616
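
As a rough way to read the table, the sketch below picks the checkpoint with the lowest validation loss, checks that each reported margin agrees with chosen minus rejected up to rounding, and estimates the number of steps in one epoch from the Step and Epoch columns. It assumes that Step counts optimizer updates; the steps-per-epoch figure is an estimate and is not stated in the card.

```python
# Columns: step, epoch, validation_loss, rewards_chosen, rewards_rejected, rewards_margin
rows = [
    (100, 0.1048, 0.6206, 0.2426, 0.0793, 0.1633),
    (200, 0.2095, 0.4579, 1.2344, 0.4185, 0.8160),
    (300, 0.3143, 0.4030, 1.6264, 0.4492, 1.1771),
    (400, 0.4191, 0.3698, 1.7637, 0.3444, 1.4194),
    (500, 0.5238, 0.3486, 1.7529, 0.1705, 1.5824),
    (600, 0.6286, 0.3346, 1.7488, 0.0498, 1.6990),
    (700, 0.7334, 0.3259, 1.7948, 0.0020, 1.7929),
    (800, 0.8381, 0.3221, 1.7998, -0.0221, 1.8218),
    (900, 0.9429, 0.3215, 1.7999, -0.0363, 1.8362),
]

# Checkpoint with the lowest validation loss among the logged steps.
best_row = min(rows, key=lambda r: r[2])

# Largest disagreement between the reported margin and chosen - rejected.
max_gap = max(abs((chosen - rejected) - margin)
              for _, _, _, chosen, rejected, margin in rows)

# Estimate steps per epoch from the last logged row (step / epoch fraction).
last_step, last_epoch = rows[-1][0], rows[-1][1]
steps_per_epoch = last_step / last_epoch

print(f"best step: {best_row[0]} (validation loss {best_row[2]})")
print(f"max margin discrepancy: {max_gap:.4f}")
print(f"estimated steps per epoch: {steps_per_epoch:.0f}")
```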

Framework versions

PEFT 0.12.0
Transformers 4.44.0
PyTorch 2.3.1+cu121
Datasets 2.20.0
Tokenizers 0.19.1
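
To reproduce this environment, the versions above can be turned into pinned requirements and compared with what is installed locally. This is a minimal sketch using only the standard library; the helper names `requirement_lines` and `check_installed` are hypothetical, and mapping "PyTorch" to the `torch` package without the `+cu121` suffix is an assumption about how the CUDA 12.1 build would be installed.

```python
from importlib import metadata

# Versions copied from the list above, keyed by their PyPI package names.
PINNED = {
    "peft": "0.12.0",
    "transformers": "4.44.0",
    "torch": "2.3.1",  # the card lists 2.3.1+cu121 (a CUDA 12.1 build)
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}


def requirement_lines(pins):
    """Format the pins as pip-style requirement strings."""
    return [f"{name}=={version}" for name, version in pins.items()]


def check_installed(pins):
    """Return {package: (installed_version_or_None, matches_pin)}."""
    report = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Compare the public version only, ignoring a local suffix such as "+cu121".
        matches = installed is not None and installed.split("+")[0] == wanted
        report[name] = (installed, matches)
    return report


for line in requirement_lines(PINNED):
    print(line)
for name, (installed, matches) in check_installed(PINNED).items():
    print(f"{name}: installed={installed}, matches pin={matches}")
```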

khongtrunght/Qwen2-7B-Instruct-SPPO-Function-call-v2.5
