---
license: apache-2.0
datasets:
- BrainGPT/train_valid_split_pmc_neuroscience_2002-2022_filtered_subset
tags:
- text-generation-inference
- peft
---

## Training details:
We fine-tuned Llama2-7b-chat using LoRA, with a batch size of 1 and a chunk size of 2048 tokens. Training used the AdamW optimizer with a learning rate of 2e-5 and 8 gradient accumulation steps. We trained for a single epoch with a warm-up ratio of 0.03, a weight decay of 0.001, and a cosine learning rate scheduler. LoRA adapters with rank 8, alpha 32, and dropout 0.1 were applied to all self-attention blocks and fully-connected layers, yielding a total of 17,891,328 trainable parameters, roughly 0.26% of the base model's parameters. Training used bf16 mixed precision and data parallelism on 4 Nvidia A100 (80GB) GPUs hosted on the Microsoft Azure platform; one epoch of training takes roughly 42 GPU hours.
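
For reference, the configuration above maps onto the Hugging Face `peft` and `transformers` APIs roughly as follows. This is a minimal sketch rather than the exact training script: the `target_modules` list is an assumed mapping of "all self-attention blocks and fully-connected layers" onto Llama-2 module names, and `output_dir` is a placeholder.

```python
# Minimal sketch of the LoRA fine-tuning setup described above.
# Hyperparameters come from the text; module names and paths are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments, Trainer
from peft import LoraConfig, get_peft_model

base = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16)

lora_config = LoraConfig(
    r=8,                # rank 8
    lora_alpha=32,      # alpha 32
    lora_dropout=0.1,   # dropout 0.1
    # Assumed mapping of "self-attention blocks and fully-connected layers"
    # onto Llama-2 module names:
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # ~17.9M trainable, ~0.26% of the base model

args = TrainingArguments(
    output_dir="llama2-7b-chat-neuro-lora",  # placeholder path
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    num_train_epochs=1,
    learning_rate=2e-5,
    weight_decay=0.001,
    warmup_ratio=0.03,
    lr_scheduler_type="cosine",
    optim="adamw_torch",
    bf16=True,
)
# trainer = Trainer(model=model, args=args, train_dataset=...)  # 2048-token chunks
# trainer.train()
```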

## Training data:
Please refer to the dataset card: https://huggingface.co/datasets/BrainGPT/train_valid_split_pmc_neuroscience_2002-2022_filtered_subset