
Model description

LLAMA2-stablebeluga-Q4_0/Q8_0 GGML is a quantized build of Stability AI's Stable Beluga, a language model fine-tuned on top of Meta AI's LLAMA-2. The weights were converted to F32 before being quantized to 4 bits (Q4_0) or 8 bits (Q8_0). These alterations make the model more efficient in terms of memory and computational requirements, without significantly compromising its language understanding and generation capabilities.
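To give a sense of what the Q4_0 step does, here is an illustrative sketch of GGML-style block quantization: weights are split into blocks of 32, each block stores one shared scale plus a 4-bit integer per weight. The helper names are mine, and this mirrors the documented Q4_0 layout only at a high level, not the exact on-disk encoding.

```python
import numpy as np

def quantize_q4_0(block):
    """Quantize a block of 32 float weights to 4-bit integers with one shared scale."""
    # Use the element with the largest magnitude to set the scale,
    # so the extreme value maps exactly onto the integer grid.
    max_val = block[np.argmax(np.abs(block))]
    d = max_val / -8.0 if max_val != 0 else 1.0
    # Shift into the unsigned range [0, 15] used by 4-bit storage.
    q = np.clip(np.round(block / d) + 8, 0, 15).astype(np.uint8)
    return d, q

def dequantize_q4_0(d, q):
    """Reconstruct approximate float weights from the scale and 4-bit integers."""
    return d * (q.astype(np.float32) - 8)

# Round-trip a random block and measure the reconstruction error.
rng = np.random.default_rng(0)
w = rng.normal(size=32).astype(np.float32)
d, q = quantize_q4_0(w)
w_hat = dequantize_q4_0(d, q)
err = float(np.max(np.abs(w - w_hat)))
```

The per-weight error is bounded by roughly one quantization step (`abs(d)`), which is why quality degrades only slightly while storage drops from 32 bits per weight to about 4.5.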

Intended uses & limitations

How to use

This model can be used with llama.cpp (or similar) for a variety of natural language understanding and generation tasks. These include, but are not limited to, text completion, text generation, conversation modeling, and semantic similarity estimation.
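A minimal invocation sketch with a GGML-era llama.cpp build is shown below. The model filename is an assumption (substitute whatever file you downloaded), and the prompt template follows the `### User:` / `### Assistant:` format described in the upstream Stable Beluga 2 model card.

```shell
# Run interactive text generation with llama.cpp.
# -m: path to the quantized model file (filename assumed here)
# -p: prompt, using Stable Beluga's chat template
# -n: maximum number of tokens to generate
./main \
  -m ./llama2-stablebeluga-q4_0.bin \
  -p "### User:\nExplain GGML quantization in one paragraph.\n\n### Assistant:\n" \
  -n 256
```

The Q8_0 file can be used the same way; it needs roughly twice the memory of Q4_0 but preserves more of the original model's quality.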

Limitations and bias

While this model is designed to understand and generate human-like text, it has a few limitations:

  1. It might generate incorrect or nonsensical responses if the input prompt is ambiguous or lacks sufficient context.
  2. It is shaped by the data it was trained on and may therefore reflect the biases present in that data.
  3. Despite the conversion and quantization, this model might still require substantial computational resources for large-scale tasks.

Training data

This model was trained on the same data as Stability AI's Stable Beluga 2, which is itself built on the original LLAMA-2. For more details, please refer to the Stable Beluga 2 model card.

Evaluations

Performance is similar to that of the original Stable Beluga model, with a slight drop introduced by quantization (typically smaller for Q8_0 than for Q4_0). More specific evaluation results will be added as they become available.
