---
language: en
license: apache-2.0
tags:
- text-generation-inference
- transformers
- ruslanmv
- llama
- trl
base_model: meta-llama/Meta-Llama-3-8B
datasets:
- ruslanmv/ai-medical-chatbot
---

# Medical-Llama3-8B-16bit: Fine-Tuned Llama3 for Medical Q&A

[![](future.jpg)](https://ruslanmv.com/)

This repository provides a fine-tuned version of the powerful Llama3 8B model, specifically designed to answer medical questions in an informative way. It leverages the rich knowledge contained in the AI Medical Chatbot dataset ([ruslanmv/ai-medical-chatbot](https://huggingface.co/datasets/ruslanmv/ai-medical-chatbot)).

**Model & Development**

- **Developed by:** ruslanmv
- **License:** Apache-2.0
- **Finetuned from model:** meta-llama/Meta-Llama-3-8B

**Key Features**

- **Medical Focus:** Optimized to address health-related inquiries.
- **Knowledge Base:** Trained on a comprehensive medical chatbot dataset.
- **Text Generation:** Generates informative and potentially helpful responses.

**Installation**

This model is accessible through the Hugging Face Transformers library. Install it using pip:

```bash
pip install transformers
```

**Usage Example**

Here's a Python code snippet demonstrating how to interact with the `Medical-Llama3-8B-16bit` model and generate answers to your medical questions:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("ruslanmv/Medical-Llama3-8B")
model = AutoModelForCausalLM.from_pretrained("ruslanmv/Medical-Llama3-8B").to("cuda")  # If using GPU

# Format the question with prompt engineering and generate a response
def askme(question):
    medical_prompt = """You are an AI Medical Assistant trained on a vast dataset of health information. Below is a medical question:

Question: {}

Please provide an informative and comprehensive answer:

Answer: """.format(question)

    # The question is already embedded in the prompt, so only the EOS token is appended
    prompt = medical_prompt + tokenizer.eos_token

    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")  # If using GPU
    outputs = model.generate(**inputs, max_new_tokens=64, use_cache=True)  # Increase max_new_tokens for longer responses
    answer = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0].strip()
    return answer

# Example usage
question = "What should I do to reduce my weight gained due to genetic hypothyroidism?"
print(askme(question))
```

**Important Note**

This model is intended for informational purposes only and should not be used as a substitute for professional medical advice. Always consult a qualified healthcare provider for any medical concerns.

**License**

This model is distributed under the Apache License 2.0 (see the LICENSE file for details).

**Contributing**

We welcome contributions to this repository! If you have improvements or suggestions, feel free to open a pull request.

**Disclaimer**

While we strive to provide informative responses, the accuracy of the model's outputs cannot be guaranteed. Always consult a doctor or other healthcare professional for definitive medical advice.
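**Memory-Efficient Loading (Optional)**

If GPU memory is limited, the 8B model can also be loaded in half precision. The sketch below is a minimal, untested example; it assumes a CUDA-capable GPU and that `torch` and `accelerate` are installed. The `torch_dtype` and `device_map` arguments are standard Transformers options rather than anything specific to this model.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("ruslanmv/Medical-Llama3-8B")
model = AutoModelForCausalLM.from_pretrained(
    "ruslanmv/Medical-Llama3-8B",
    torch_dtype=torch.float16,  # load weights in fp16 to roughly halve memory use
    device_map="auto",          # let Accelerate place layers on available devices
)
```

With this loading path, the `askme` function above works unchanged; only the model-loading call differs.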