How much GPU memory do I need to finetune large-v3?

#150 opened by lanejohn

I am trying to finetune large-v3 using the transformers fine-tuning script:

torchrun --nproc_per_node 3 whisper_transformer_001.py \
  --model_name_or_path="/home/lane/ai/models/openai/whisper/largeV3" \
  --dataset_name="mozilla-foundation/common_voice_2_0" \
  --dataset_config_name="zh-CN" \
  --language="Chinese" \
  --task="transcribe" \
  --train_split_name="train+validation" \
  --eval_split_name="test" \
  --max_steps="400" \
  --output_dir="/home/lane/ai/models/openai/whisper/largeV3-Chinese" \
  --per_device_train_batch_size="1" \
  --per_device_eval_batch_size="1" \
  --logging_steps="5" \
  --learning_rate="1e-5" \
  --warmup_steps="40" \
  --eval_strategy="steps" \
  --eval_steps="100" \
  --save_strategy="steps" \
  --save_steps="100" \
  --generation_max_length="95" \
  --preprocessing_num_workers="8" \
  --max_duration_in_seconds="30" \
  --text_column_name="sentence" \
  --freeze_feature_encoder="False" \
  --gradient_checkpointing \
  --fp16 \
  --overwrite_output_dir \
  --do_train \
  --do_eval \
  --predict_with_generate
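For reference, here is a minimal sketch of the memory-relevant part of that configuration, assuming whisper_transformer_001.py forwards these flags into the standard Seq2SeqTrainingArguments the way the official speech-recognition example script does (the values simply mirror the flags above):

```python
# Minimal sketch: the launch flags above expressed as Seq2SeqTrainingArguments.
# Assumes the script passes them unchanged to the Hugging Face Trainer.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="/home/lane/ai/models/openai/whisper/largeV3-Chinese",
    per_device_train_batch_size=1,   # smallest possible micro-batch per GPU
    per_device_eval_batch_size=1,
    gradient_checkpointing=True,     # recompute activations to save memory
    fp16=True,                       # mixed-precision training
    learning_rate=1e-5,
    warmup_steps=40,
    max_steps=400,
    eval_strategy="steps",
    eval_steps=100,
    save_strategy="steps",
    save_steps=100,
    logging_steps=5,
    generation_max_length=95,
    predict_with_generate=True,
    overwrite_output_dir=True,
)
```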

I have 3 GPUs with 22 GB of memory each, but every time I run this I hit CUDA out of memory.
Has anyone finetuned large-v3? How did you do it, and with how much GPU memory? Thanks.
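For a rough sense of why this runs out of memory, here is a back-of-the-envelope sketch. It assumes ~1.55B parameters for large-v3, full finetuning with AdamW, and mixed precision; with plain DDP each of the 3 ranks holds a complete copy of the training state, so the estimate applies per GPU:

```python
# Back-of-the-envelope per-GPU memory for full finetuning of Whisper
# large-v3 with AdamW under mixed precision (fp16/amp).
# Assumption: ~1.55e9 parameters; plain DDP keeps a full replica of the
# training state on every rank, so this is the cost per GPU.

params = 1.55e9          # approx. parameter count of whisper-large-v3

bytes_per_param = (
    4    # fp32 weights (amp keeps a master copy in fp32)
    + 4  # fp32 gradients
    + 4  # AdamW first moment (exp_avg)
    + 4  # AdamW second moment (exp_avg_sq)
)

state_gb = params * bytes_per_param / 1024**3
print(f"weights + grads + optimizer states: ~{state_gb:.0f} GB per GPU")
# -> ~23 GB before activations, the CUDA context, and temporary buffers,
#    which already exceeds a 22 GB card; activations push it further over.
```

This estimate also lines up with the ~32 GB peak reported below once activations and eval-time generation are included.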

I was able to finetune the large-v3 model on an A100. Max GPU memory consumption was 32 GB with per_device_train_batch_size=2.

Thanks a lot for sharing.