---
library_name: transformers
pipeline_tag: image-text-to-text
license: apache-2.0
datasets:
- joshuachou/SkinCAP
- HemanthKumarK/SKINgpt
language:
- en
tags:
- biology
- skin
- skin disease
- cancer
- medical
---

# Model Card for PaliGemma Dermatology Model

## Model Details

### Model Description

This model is a fine-tuned version of PaliGemma-3B for dermatology-related image and text tasks. It is designed to assist in identifying skin conditions by combining image analysis with natural language processing.

- **Developed by:** Bruce_Wayne
- **Model type:** Vision-language model
- **Finetuned from model:** https://huggingface.co/google/paligemma-3b-pt-224
- **LoRA adapters used:** Yes
- **Intended use:** Medical image analysis, specifically for dermatology

### Please let me know how the model works for you: https://forms.gle/cBA6apSevTyiEbp46

### Thank you

## Uses

### Direct Use

The model can be used directly to analyze dermatology images and suggest potential skin conditions.

## Bias, Risks, and Limitations

**Skin Tone Bias:** The training data may not adequately represent all skin tones, which could lead to biased results.

**Geographic Bias:** Performance may vary with the prevalence of particular conditions in different geographic regions.

## How to Get Started with the Model

```python
import torch
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration
from PIL import Image

# Use the GPU when available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the model and processor
model_id = "brucewayne0459/paligemma_derm"
processor = AutoProcessor.from_pretrained(model_id)
model = PaliGemmaForConditionalGeneration.from_pretrained(model_id).to(device)
model.eval()

# Load a sample image and text input
input_text = "Identify the skin condition?"
input_image_path = "path/to/your/image.jpg"  # Replace with your actual image path
input_image = Image.open(input_image_path).convert("RGB")

# Process the input and move the tensors to the same device as the model
inputs = processor(
    text=input_text,
    images=input_image,
    return_tensors="pt",
    padding="longest",
).to(device)

# Set the maximum number of tokens to generate
max_new_tokens = 50

# Run inference without tracking gradients
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)

# Decode the output
decoded_output = processor.decode(outputs[0], skip_special_tokens=True)
print("Model Output:", decoded_output)
```

## Training Details

### Training Data

The model was fine-tuned on a dataset of dermatological images paired with disease names.

### Training Procedure

The model was fine-tuned with LoRA (Low-Rank Adaptation) for more efficient training. Mixed precision (bfloat16) was used to speed up training and reduce memory usage.

#### Training Hyperparameters

- **Training regime:** Mixed precision (bfloat16)
- **Epochs:** 10
- **Learning rate:** 2e-5
- **Batch size:** 6
- **Gradient accumulation steps:** 4

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

The model was evaluated on a held-out validation set of dermatological images and disease names, distinct from the training data.

#### Metrics

- **Validation Loss:** Tracked throughout training to monitor model performance.
- **Accuracy:** The primary metric for assessing model predictions.

### Results

The model reached a final validation loss of approximately 0.2214, indicating reasonable performance in predicting skin conditions on the dataset used.

## Environmental Impact

- **Hardware Type:** 1 x L4 GPU
- **Hours used:** ~22 hours
- **Cloud Provider:** Lightning AI
- **Compute Region:** USA
- **Carbon Emitted:** 0.9 kg CO2 eq.

## Technical Specifications

### Model Architecture and Objective

- **Architecture:** Vision-language model based on PaliGemma-3B
- **Objective:** To classify and diagnose dermatological conditions from images and text

### Compute Infrastructure

#### Hardware

- **GPU:** 1 x L4 GPU

## Model Card Authors

Bruce_Wayne
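
## Appendix: Effective Batch Size

The training hyperparameters in this card report a batch size of 6 with 4 gradient accumulation steps on a single GPU. The sketch below works through what that implies for the number of samples per optimizer update. The function names and the dataset size of 4000 are hypothetical, chosen only for illustration; the card does not state the actual number of training samples, and the exact step count depends on how the training framework handles partial batches.

```python
import math

# Values reported in the Training Hyperparameters section of this card.
PER_DEVICE_BATCH_SIZE = 6
GRADIENT_ACCUMULATION_STEPS = 4
EPOCHS = 10
NUM_GPUS = 1  # the card reports a single L4 GPU


def effective_batch_size(per_device: int, accumulation: int, num_gpus: int = 1) -> int:
    """Number of samples that contribute to each optimizer update."""
    return per_device * accumulation * num_gpus


def optimizer_steps(num_samples: int, batch: int, epochs: int) -> int:
    """Rough total of optimizer updates, assuming the last partial batch is kept."""
    return math.ceil(num_samples / batch) * epochs


if __name__ == "__main__":
    batch = effective_batch_size(
        PER_DEVICE_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS, NUM_GPUS
    )
    print("Effective batch size:", batch)

    # HYPOTHETICAL dataset size: not taken from the model card.
    hypothetical_num_samples = 4000
    steps = optimizer_steps(hypothetical_num_samples, batch, EPOCHS)
    print("Approximate optimizer steps:", steps)
```

With the reported values, each optimizer update averages gradients over 6 x 4 = 24 samples, which is worth keeping in mind when comparing the 2e-5 learning rate against other fine-tuning setups.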