The Mamba in the Llama: Distilling and Accelerating Hybrid Models • arXiv:2408.15237 • Published Aug 2024
KTO: Model Alignment as Prospect Theoretic Optimization • arXiv:2402.01306 • Published Feb 2, 2024
Planning In Natural Language Improves LLM Search For Code Generation • arXiv:2409.03733 • Published Sep 2024
FocusLLM: Scaling LLM's Context by Parallel Decoding • arXiv:2408.11745 • Published Aug 2024
Jamba-1.5: Hybrid Transformer-Mamba Models at Scale • arXiv:2408.12570 • Published Aug 2024
LLM Pruning and Distillation in Practice: The Minitron Approach • arXiv:2408.11796 • Published Aug 2024
Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study • arXiv:2404.10719 • Published Apr 16, 2024
MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning • arXiv:2405.12130 • Published May 20, 2024
S3D: A Simple and Cost-Effective Self-Speculative Decoding Scheme for Low-Memory GPUs • arXiv:2405.20314 • Published May 30, 2024
Contextual Position Encoding: Learning to Count What's Important • arXiv:2405.18719 • Published May 29, 2024