---
library_name: transformers
license: apache-2.0
language:
- en
---
This is a test release of the DPO version of the [MemGPT](https://github.com/cpacker/MemGPT) language model.
# Model Description
This repository contains a MoE (Mixture of Experts) model built from [Mistral 7B Instruct](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2), routing 2 experts per token. The model is specifically designed for function calling in MemGPT and demonstrates performance comparable to GPT-4 when working with MemGPT.
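As a rough illustration, the routing settings described above can be read from the model configuration. The sketch below assumes a Mixtral-style MoE config and uses the repo id from this model card; both are assumptions and may need adjusting for this checkpoint.
```python
# Minimal sketch: inspecting the MoE routing settings from the model config.
# Assumes a Mixtral-style configuration (field names may differ) and that the
# repo id below matches this model card.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("starsnatched/MemGPT-DPO-MoE-test")
print(config.num_experts_per_tok)  # experts routed per token (2 for this model)
print(config.num_local_experts)    # total experts available in each MoE layer
```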
# Key Features
* Function calling
* Dedicated to working with MemGPT
* Supports medium-length context, up to 8,192 tokens per sequence
# Prompt Format
This model uses the **ChatML** prompt format:
```
<|im_start|>system
{system_instruction}<|im_end|>
<|im_start|>user
{user_message}<|im_end|>
<|im_start|>assistant
{assistant_response}<|im_end|>
```
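If you are building prompts programmatically rather than writing the template by hand, the snippet below is one way to produce a ChatML prompt with the `transformers` tokenizer. It assumes the tokenizer in this repo ships a ChatML chat template, which is worth verifying before relying on it.
```python
# Minimal sketch: building a ChatML prompt via the tokenizer's chat template.
# Assumes the tokenizer bundled with this repo defines a ChatML template;
# the repo id is taken from this model card and may need adjusting.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("starsnatched/MemGPT-DPO-MoE-test")

messages = [
    {"role": "system", "content": "You are MemGPT, an assistant with long-term memory."},
    {"role": "user", "content": "Please remember that my favourite colour is blue."},
]

# tokenize=False returns the formatted string so it can be inspected;
# add_generation_prompt=True appends the opening assistant tag.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```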
# Usage
This model is designed to run on multiple backends, such as [oobabooga's text-generation-webui](https://github.com/oobabooga/text-generation-webui).
Simply install your preferred backend, then load this model.
Then configure MemGPT using `memgpt configure` and chat with MemGPT via the `memgpt run` command!
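For a quick sanity check outside of MemGPT, the model can also be loaded directly with the `transformers` library. This is a minimal sketch, not the official MemGPT workflow; the repo id and generation settings are assumptions, so adjust them to your setup.
```python
# Minimal sketch: loading the model directly with `transformers` for a quick test.
# The repo id below is assumed from this model card; adjust it if needed.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "starsnatched/MemGPT-DPO-MoE-test"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Build a ChatML-formatted prompt and generate a short reply.
prompt = (
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\nHello! Who are you?<|im_end|>\n"
    "<|im_start|>assistant\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```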
# Model Details
* Developed by: @starsnatched
* Model type: This repo contains a language model based on the transformer decoder architecture.
* Language: English
* Contact: For any questions, concerns, or comments about this model, please contact me on Discord, @starsnatched.
# Training Infrastructure
* Hardware: The model in this repo was trained on 2x A100 80GB GPUs.
# Intended Use
The model is designed to be used as the base model for MemGPT agents.
# Limitations and Risks
The model may exhibit unreliable, unsafe, or biased behaviours. Please double-check any outputs this model produces.