anonymous-18234781/anonymous_bbnlp_2024_submission

Gemma Scope [review version for BlackboxNLP 2024]

Gemma Scope is a comprehensive, open suite of sparse autoencoders (SAEs) for Gemma 2 9B and 2B. Sparse autoencoders act as a kind of "microscope": they help us break down a model’s internal activations into the underlying concepts, just as biologists use microscopes to study the individual cells of plants and animals.
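As a rough illustration of what "breaking down activations into concepts" means, here is a minimal sketch of a sparse autoencoder forward pass in plain Python. It is a toy under stated assumptions and not the Gemma Scope implementation: the 2-dimensional activation, the four hand-picked latent directions, the tied encoder/decoder weights, and the plain ReLU nonlinearity are all hypothetical choices made to keep the example small.

```python
# Toy sparse autoencoder (SAE) forward pass.
# All weights are hand-picked for illustration; they are NOT Gemma Scope parameters.

def relu(values):
    """Zero out negative pre-activations so only matching latents stay active."""
    return [max(0.0, v) for v in values]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Hypothetical dictionary: 4 latent directions in a 2-dimensional activation space.
# A real SAE has far more latents than activation dimensions and learns them from data.
DECODER = [
    [1.0, 0.0],
    [0.0, 1.0],
    [-1.0, 0.0],
    [0.0, -1.0],
]
ENCODER = DECODER  # tied weights: an assumption made to keep the toy small


def encode(activation):
    """Map a model activation to non-negative latent strengths."""
    return relu([dot(row, activation) for row in ENCODER])


def decode(latents):
    """Rebuild the activation as a weighted sum of latent directions."""
    dim = len(DECODER[0])
    return [
        sum(strength * row[i] for strength, row in zip(latents, DECODER))
        for i in range(dim)
    ]


activation = [2.0, -3.0]
latents = encode(activation)
reconstruction = decode(latents)
active = [i for i, strength in enumerate(latents) if strength > 0]

print("latents:", latents)
print("reconstruction:", reconstruction)
print("active latent indices:", active)
```

In this toy, only two of the four latents fire for the given activation, and the weighted sum of their directions reproduces the input. That is the sense in which an SAE re-expresses an activation as a small number of active latents, each of which can then be inspected individually.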

Check out our Google Colab notebook tutorial for how to use Gemma Scope.

Each sparse autoencoder learns many latent representations that appear to correspond to human-understandable concepts. Here we showcase one latent representation that corresponds to sentences about time and space travel.