Michael Goin

mgoin

AI & ML interests

LLM inference optimization, compression, quantization, pruning, distillation

mgoin's activity

New activity in meta-llama/Meta-Llama-3.1-405B-Instruct about 1 month ago

8-kv-heads

4
#17 opened about 2 months ago by ArthurZ
New activity in meta-llama/Meta-Llama-3.1-405B about 1 month ago

8-kv-heads

3
#21 opened about 2 months ago by ArthurZ

run with vllm

8
#4 opened about 2 months ago by kuliev-vitaly
New activity in neuralmagic/Meta-Llama-3.1-8B-Instruct-FP8 about 1 month ago

Not able to run Model using VLLM

1
#3 opened about 1 month ago by Pchaudhary
New activity in neuralmagic/gemma-2-9b-it-FP8 about 1 month ago

getting issue while loading in llm

1
#1 opened about 1 month ago by Abhinav6310
New activity in neuralmagic/Meta-Llama-3.1-8B-Instruct-FP8 about 2 months ago

How to fast inference with FP8

1
#2 opened about 2 months ago by CCRss
New activity in neuralmagic/Meta-Llama-3.1-8B-Instruct-FP8 about 2 months ago

OSError, is the config correct?

2
#1 opened about 2 months ago by jackinthebox52
New activity in mgoin/Nemotron-4-340B-Instruct-hf-FP8 about 2 months ago

Thanks your great work!

2
#1 opened about 2 months ago by bay-llm
New activity in neuralmagic/Mistral-7B-Instruct-v0.3-FP8 about 2 months ago
New activity in nvidia/Minitron-4B-Base about 2 months ago

Where is Minitron-4B-Instruct?

1
#2 opened about 2 months ago by mgoin
New activity in neuralmagic/Meta-Llama-3.1-70B-Instruct-FP8 about 2 months ago

Are these models limited to H100s?

7
#2 opened about 2 months ago by RonanMcGovern
New activity in nvidia/Minitron-8B-Base about 2 months ago

Replace kv_channels with head_dim

#1 opened about 2 months ago by mgoin
New activity in neuralmagic/Mistral-Nemo-Instruct-2407-FP8 2 months ago

Error serving model

3
#2 opened 2 months ago by EvGUT
New activity in neuralmagic/Meta-Llama-3-8B-Instruct-FP8-KV 3 months ago

How to load this model?

1
#1 opened 3 months ago by Frz614

Update README.md

#1 opened 3 months ago by alexmarques
New activity in neuralmagic/SparseLlama-3-8B-pruned_50.2of4 3 months ago

Update README.md

#1 opened 3 months ago by alexmarques
New activity in neuralmagic/Qwen2-72B-Instruct-FP8 3 months ago

Update README.md

#1 opened 3 months ago by abhinavnmagic
New activity in neuralmagic/Mixtral-8x7B-Instruct-v0.1-FP8 3 months ago

Update README.md

#1 opened 3 months ago by abhinavnmagic
New activity in neuralmagic/Meta-Llama-3-8B-Instruct-FP8 3 months ago

Update README.md

#2 opened 3 months ago by abhinavnmagic
New activity in neuralmagic/Meta-Llama-3-70B-Instruct-FP8 3 months ago

Create README.md

#1 opened 3 months ago by abhinavnmagic
New activity in neuralmagic/Meta-Llama-3-8B-Instruct-FP8 3 months ago

Fails to run with nm-vllm

1
#1 opened 4 months ago by clintonruairi
New activity in mgoin/ultrachat_2k 4 months ago
New activity in mgoin/Meta-Llama-3-70B-Instruct-Marlin 5 months ago

What is Marlin?

2
#1 opened 5 months ago by Samvanity
New activity in mgoin/Meta-Llama-3-8B-Instruct-Marlin 5 months ago

Inference Issues

7
#1 opened 5 months ago by qeternity
New activity in neuralmagic/Llama-2-7b-evolcodealpaca 6 months ago

Update README.md

#1 opened 6 months ago by abhinavnmagic

Update README.md

#1 opened 6 months ago by abhinavnmagic

Update README.md

#1 opened 6 months ago by abhinavnmagic

Update README.md

#1 opened 6 months ago by alexmarques

Update README.md

#1 opened 6 months ago by alexmarques
New activity in neuralmagic/Llama-2-7b-pruned70-retrained 6 months ago

Update README.md

#1 opened 6 months ago by alexmarques
New activity in neuralmagic/Llama-2-7b-pruned50-retrained 6 months ago

Update README.md

#1 opened 6 months ago by alexmarques
New activity in neuralmagic/mpt-7b-gsm8k-pruned60-pt 10 months ago
New activity in reciprocate/llama2-7b-gsm8k 12 months ago

Create README.md

#2 opened 12 months ago by mgoin