DistAya Community

AI & ML interests

Knowledge Distillation, Pruning, Quantization, KV Cache Compression, Latency, Inference Speed

Deployment Challenges

Multilingual language models face many deployment challenges. Can we engineer multilingual models that match the capabilities of their bulkier counterparts while being more compact, faster at inference, and able to handle massive data batches in real-time production environments?

(Figure: memory variations through time)

Techniques:

  • Pruning

  • Knowledge Distillation (see the sketch after this list)

    • Hidden State-Based Distillation ~ DistillKit | GitHub
    • Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling
    • On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes
    • Minitron: Compact Language Models via Pruning & Knowledge Distillation
    • DistiLLM: Towards Streamlined Distillation for Large Language Models
  • Quantization

    • Quantization Aware Training (QAT)
    • Post Training Quantization (PTQ)
      • KV Cache Quantization
      • Weight & Activation Quantization
  • Low-Rank Factorization

  • Fine-Tuning | GitHub
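
As a concrete reference for the distillation entries above, here is a minimal, hypothetical sketch of logit-based knowledge distillation in PyTorch. The function name, temperature, and mixing weight are illustrative assumptions; this is not the DistillKit, Distil-Whisper, or DistiLLM implementation.

```python
# Hypothetical sketch: blend soft teacher targets with hard-label cross-entropy.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature: float = 2.0, alpha: float = 0.5):
    # Soften both distributions with the temperature before comparing them.
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    # KL(teacher || student), scaled by T^2 so gradient magnitudes stay comparable.
    kd = F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2
    # Standard cross-entropy on the ground-truth labels keeps the student anchored.
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce
```

In a training loop the teacher would run under torch.no_grad() and only the student's parameters would be updated.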

Datasets:

An initial set of 7 datasets has been unified, totalling 6.62M rows across the following sources (a loading sketch follows the list):

  • Bangla_Alpaca_Orca: Bangla
  • Urdu_Instruct_News_Article_Generation: Urdu
  • Urdu_Instruct_News_Headline_Generation: Urdu
  • Urdu_Instruct_News_Category_Classification: Urdu
  • cidar: Arabic
  • Six_Millions_Instruction_Dataset_For_Arabic_Llm_Ft: Arabic
  • instructv3: English
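
To illustrate the unification step, below is a hedged sketch using the Hugging Face datasets library. The Hub repository IDs are placeholders and the shared-column handling is an assumption about the preprocessing, not the community's exact pipeline.

```python
# Hypothetical sketch: unify several instruction datasets into one corpus.
from datasets import load_dataset, concatenate_datasets

SOURCE_REPOS = [
    "some-org/Bangla_Alpaca_Orca",                      # placeholder Hub IDs
    "some-org/Urdu_Instruct_News_Article_Generation",
    # ... remaining sources ...
]

def load_unified(repos=SOURCE_REPOS, split="train"):
    parts = [load_dataset(repo, split=split) for repo in repos]
    # Concatenation needs a shared schema, so keep only columns common to all sources.
    shared = sorted(set.intersection(*(set(d.column_names) for d in parts)))
    parts = [d.select_columns(shared) for d in parts]
    return concatenate_datasets(parts)
```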

Get in touch with the team:
