---
tags:
- text-generation
license: cc-by-nc-sa-4.0
language:
- ko
base_model: yanolja/KoSOLAR-10.7B-v0.1
pipeline_tag: text-generation
datasets:
- beomi/KoAlpaca-v1.1a
- Edentns/Worktronics-FAQ
---
# **DataVortexS-10.7B-v0.2**
<img src="./DataVortex.png" alt="DataVortex" style="height: 8em;">
## **Model Details**
### **Base Model**
[yanolja/KoSOLAR-10.7B-v0.1](https://huggingface.co/yanolja/KoSOLAR-10.7B-v0.1)
### **Trained On**
- **OS**: Ubuntu 20.04
- **GPU**: 1 × H100 80GB
- **transformers**: v4.36.2
### **Dataset**
- [beomi/KoAlpaca-v1.1a](https://huggingface.co/datasets/beomi/KoAlpaca-v1.1a)
- Edentns/Worktronics-FAQ (private)
### **Instruction format**
The model follows the **Alpaca** instruction format, for example:
```python
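# English gloss of the Korean turns below:
#   system:    "You are an AI assistant that helps people find information."
#   user:      "What is the capital of South Korea?"
#   assistant: "The capital of South Korea is Seoul."
#   user:      "What is the total population of Seoul?"
text = """\
당신은 사람들이 정보를 찾을 수 있도록 도와주는 인공지능 비서입니다.
### Instruction:
대한민국의 수도는 어디야?
### Response:
대한민국의 수도는 서울입니다.
### Instruction:
서울 인구는 총 몇 명이야?
"""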
```
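A prompt in this layout can be assembled from a list of chat turns. The sketch below is only an illustration of the format, not the model's actual chat template: the helper name `build_alpaca_prompt` is hypothetical, and the newline placement is assumed from the example above.

```python
def build_alpaca_prompt(messages):
    """Render chat turns in the Alpaca-style layout shown in this card.

    Hypothetical helper: the real chat template may differ in whitespace.
    """
    headers = {"user": "### Instruction:", "assistant": "### Response:"}
    lines = []
    for message in messages:
        role, content = message["role"], message["content"]
        if role == "system":
            # The system prompt is emitted as plain text with no header.
            lines.append(content)
        elif role in headers:
            lines.append(headers[role])
            lines.append(content)
        else:
            raise ValueError(f"unsupported role: {role!r}")
    return "\n".join(lines) + "\n"


messages = [
    # "You are an AI assistant that helps people find information."
    {"role": "system", "content": "당신은 사람들이 정보를 찾을 수 있도록 도와주는 인공지능 비서입니다."},
    # "What is the capital of South Korea?"
    {"role": "user", "content": "대한민국의 수도는 어디야?"},
    # "The capital of South Korea is Seoul."
    {"role": "assistant", "content": "대한민국의 수도는 서울입니다."},
    # "What is the total population of Seoul?"
    {"role": "user", "content": "서울 인구는 총 몇 명이야?"},
]

prompt = build_alpaca_prompt(messages)
print(prompt)
```

In practice the tokenizer's own chat template should be preferred, since it is the authoritative definition of the format the model was trained on.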
## **Model Benchmark**
### **[Ko-LLM-Leaderboard](https://huggingface.co/spaces/upstage/open-ko-llm-leaderboard)**
| Model | Average | Ko-ARC | Ko-HellaSwag | Ko-MMLU | Ko-TruthfulQA | Ko-CommonGen V2 |
| ---------------------------- | -------- | --------- | ------------ | --------- | ------------- | --------------- |
| DataVortexM-7B-Instruct-v0.1 | 39.81 | 34.13 | 42.35 | 38.73 | 45.46 | 38.37 |
| DataVortexS-10.7B-v0.1 | 0 | 0 | 0 | 0 | 0 | 0 |
| **DataVortexS-10.7B-v0.2** | **43.6** | **38.74** | **50.74** | **38.98** | **44.7** | **44.86** |
| DataVortexS-10.7B-v0.3 | 0 | 0 | 0 | 0 | 0 | 0 |
| DataVortexS-10.7B-v0.4 | 0 | 0 | 0 | 0 | 0 | 0 |
| DataVortexS-10.7B-v1.0 | 0 | 0 | 0 | 0 | 0 | 0 |
| DataVortexTL-1.1B-v0.1 | 0 | 0 | 0 | 0 | 0 | 0 |
| DataVortexS-10.7B-dpo-v0.1 | 0 | 0 | 0 | 0 | 0 | 0 |
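The Average column appears to be the unweighted mean of the five task scores. The short check below works under that assumption and recomputes it for the two rows that report scores; the dictionary name `scores` is illustrative, and rows listed with all zeros are left out.

```python
# Task scores copied from the table above, in column order:
# Ko-ARC, Ko-HellaSwag, Ko-MMLU, Ko-TruthfulQA, Ko-CommonGen V2
scores = {
    "DataVortexM-7B-Instruct-v0.1": [34.13, 42.35, 38.73, 45.46, 38.37],
    "DataVortexS-10.7B-v0.2": [38.74, 50.74, 38.98, 44.7, 44.86],
}

# Assumption: the leaderboard average is a plain arithmetic mean.
averages = {name: sum(values) / len(values) for name, values in scores.items()}

for name, average in averages.items():
    print(f"{name}: {average:.2f}")
```

Both recomputed values agree with the table after rounding (39.81 and 43.6).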
## **Implementation Code**
The tokenizer includes a `chat_template` that implements this instruction format, so prompts can be built with `apply_chat_template`.
You can use the code below.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"  # the device to load the model onto

model = AutoModelForCausalLM.from_pretrained("Edentns/DataVortexS-10.7B-v0.2")
tokenizer = AutoTokenizer.from_pretrained("Edentns/DataVortexS-10.7B-v0.2")

# Conversation so far; the final user turn is the one the model should answer.
messages = [
    # "You are an AI assistant that helps people find information."
    {"role": "system", "content": "당신은 사람들이 정보를 찾을 수 있도록 도와주는 인공지능 비서입니다."},
    # "What is the capital of South Korea?"
    {"role": "user", "content": "대한민국의 수도는 어디야?"},
    # "The capital of South Korea is Seoul."
    {"role": "assistant", "content": "대한민국의 수도는 서울입니다."},
    # "What is the total population of Seoul?"
    {"role": "user", "content": "서울 인구는 총 몇 명이야?"},
]

# Render the messages with the tokenizer's chat template and tokenize them.
encodeds = tokenizer.apply_chat_template(messages, return_tensors="pt")

model_inputs = encodeds.to(device)
model.to(device)

# Sample up to 1000 new tokens, then decode the prompt and continuation together.
generated_ids = model.generate(model_inputs, max_new_tokens=1000, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])
```
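The decoded string contains the whole prompt followed by the model's continuation, so a caller usually wants only the final reply. The following is a minimal sketch of one way to isolate it. It assumes that the chat template writes a `### Response:` header before the model's turn and that the end-of-sequence token is rendered as `</s>`; the helper name `extract_last_response` and the sample string are hypothetical.

```python
def extract_last_response(decoded, marker="### Response:", eos_token="</s>"):
    """Return the text after the last response marker, without the EOS token.

    Hypothetical helper: the marker and EOS string are assumptions about
    how the decoded output is laid out.
    """
    _, found, tail = decoded.rpartition(marker)
    if not found:
        # Marker absent: fall back to the whole string, trimmed.
        return decoded.strip()
    return tail.replace(eos_token, "").strip()


# Placeholder output standing in for `decoded[0]`; not real model output.
sample = (
    "### Instruction:\n"
    "서울 인구는 총 몇 명이야?\n"
    "### Response:\n"
    "(model reply goes here)</s>"
)

print(extract_last_response(sample))
```

An alternative that avoids string matching is to decode only the new tokens, for example `tokenizer.batch_decode(generated_ids[:, model_inputs.shape[1]:], skip_special_tokens=True)`, since `generate` returns the prompt tokens followed by the generated ones for decoder-only models.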
## **License**
The model is licensed under the [cc-by-nc-sa-4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/) license, which allows others to copy, modify, and share the work non-commercially, as long as they give appropriate credit and distribute any derivative works under the same license.
<div align="center">
<a href="https://edentns.com/">
<img src="./Logo.png" alt="Logo" style="height: 3em;">
</a>
</div>