Welcome to the Latent Space Age

Peter Vermaercke
Digital Strategist

As AI companies crack open the "black box" of Large Language Models through interpretability research, we're entering the Latent Space Age. Soon we'll be able to understand and control the hidden features that make AI models work, leading to safer AI systems and revolutionary new tools that let us edit content by tweaking semantic properties—similar to how music producers use mixing boards to fine-tune songs.

How interpretability is the next frontier for LLMs and what the Golden Gate Bridge can teach us about the next wave of GenAI tools.

We created Large Language Models (LLMs) first, and only now are we trying to make sense of them. Models like OpenAI's GPT-4 or Google's Gemini 2.0 are marvels of engineering, but even the people who created them don't know precisely how or why they work. That's about to change: interpretability is the next frontier for LLMs.

In a recent interview, Anthropic's co-founder and CEO, Dario Amodei, said: "I used to be a neuroscientist, where I basically looked inside real brains for a living. And now we're looking inside artificial brains for a living. So we will, over the next few months, have some exciting advances in the area of interpretability — where we're really starting to understand how the models operate." Matt Webb wrote an enticing piece on why this could be an enormous breakthrough.

Interpretability is all about helping us humans understand the internal mechanisms and reasoning processes that lead a model to produce a specific output. It's about making the "black box" more transparent. But to do that, we first need to make sense of what goes on inside it.

Numbers as a universal language

Large Language Models are complex because they try to store all known information in a tight space. Storing vast amounts of information efficiently is not a new problem. In 1895, the Belgians Paul Otlet and Henri La Fontaine attempted to organise all the knowledge in the world in one directory: the Universal Bibliographic Repertory. They categorised the data using two dimensions: author and theme. Organising by author was nothing new. For the theme, however, they developed the Universal Decimal Classification (UDC). The UDC replaced traditional keywords with a numerical index, making it language-agnostic and universal. It divides all knowledge into 10 classes, numbered 0 to 9; works on arts and entertainment fall into class 7. Each class has subclasses holding related works, making it easy to discover associated publications. LLMs use a similar technique: they store information as numbers and place related information close together.

The Universal Bibliographic Repertory

The latent space holds many secrets

Instead of hundreds of physical cabinets, LLMs utilise neural networks to store all information about the world. Andrej Karpathy calls it "a lossy compression of the Internet". As humans, we've historically stored knowledge as text. But that's not very efficient: text takes up a lot of space, and it's sequential and one-dimensional, unlike knowledge itself. Instead, neural networks transform information into a vector: a set of coordinates defining its position in space, just like the location of a publication in the Universal Bibliographic Repertory, but with thousands of dimensions. These dimensions work together to represent key patterns and relationships in the information, not through arbitrary categories but through meaningful relationships found in the training data.
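As a rough illustration, here is one way to obtain such vectors today with the open-source sentence-transformers library and check that related content ends up close together. The model name and example sentences are just placeholders; this is a minimal sketch, not how any particular LLM stores its knowledge internally.

```python
# Minimal sketch: turn text into vectors and compare them.
# The model name and sentences are placeholders for illustration only.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # small general-purpose embedder

sentences = [
    "A survey of Flemish Renaissance painting",
    "An introduction to the history of European art",
    "How to fix a leaking kitchen tap",
]
vectors = model.encode(sentences)  # one vector of 384 numbers per sentence

def cosine(a, b):
    """Cosine similarity: closer to 1.0 means the vectors point the same way."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Related content ends up close together in the latent space.
print(cosine(vectors[0], vectors[1]))  # art vs. art: relatively high
print(cosine(vectors[0], vectors[2]))  # art vs. plumbing: relatively low
```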

This multi-dimensional space is called a latent space or embedding space. It's latent because it holds hidden characteristics of the data. These underlying characteristics are called features, and they're not easily observable in the original data. In a typical LLM, we can see, for example, that the difference between the vectors for "man" and "woman" is similar to the difference between "uncle" and "aunt": the model uses a specific direction to encode gender. Knowing this, you can take the vector for "king", subtract the "man" direction, add the "woman" direction, and arrive close to the vector for "queen".

3Blue1Brown – Transformers (how LLMs work) explained visually
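You can try this arithmetic yourself on classic pretrained word vectors. The gensim snippet below is a small sketch; the exact neighbours you get depend on which embedding model you load.

```python
# Sketch of the classic "king - man + woman ≈ queen" arithmetic on
# pretrained GloVe word vectors (any word-embedding model will do).
import gensim.downloader as api

wv = api.load("glove-wiki-gigaword-50")  # downloads ~65 MB on first use

# most_similar computes king - man + woman and returns the nearest words.
print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=3))

# The same gender direction shows up between other word pairs too.
print(wv.similarity("man", "woman"), wv.similarity("uncle", "aunt"))
```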

Riddle me this

The problem is that the latent space holds millions of these hidden features, and they're often difficult to put into words. But now we can fight fire with fire and let models describe themselves. As models get faster, they can make sense of these features at scale and write human-readable labels for them.

In 2023, OpenAI started exploring how LLMs could automatically describe features, using GPT-4 to write explanations for every neuron in GPT-2. They showed that we can identify features in smaller models, but doing the same for recent, much larger foundation models is still very tough. First, these methods are compute-intensive. Second, recent models are far more complex. In an ideal world, each neuron inside a large language model would correspond to a single concept. In reality, individual neurons often represent a mixture of unrelated features. This so-called "polysemanticity" makes it hard to interpret each feature correctly. Describing features at scale is still an open problem.
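As a toy version of that idea (not OpenAI's actual pipeline), you could show a model the snippets that most strongly activate one feature and ask it for a short label. The model name, prompt and helper function below are illustrative assumptions.

```python
# Toy sketch of automated interpretability: ask a model to label a feature
# based on the snippets that activate it most strongly. Not OpenAI's actual
# pipeline; model name and prompt are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def label_feature(top_activating_snippets: list[str]) -> str:
    examples = "\n".join(f"- {s}" for s in top_activating_snippets)
    prompt = (
        "The snippets below all strongly activate the same hidden feature "
        "inside a language model. In at most five words, what concept do "
        f"they have in common?\n{examples}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(label_feature([
    "the famous suspension bridge spanning the strait",
    "fog rolling in over the Golden Gate",
    "the toll plaza on the way into San Francisco",
]))
```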

Semantic editing

Once we understand which features a certain piece of content activates, we can also try to tweak those features. This is what Linus Lee described as semantic editing: directly editing the semantic layers of a text. It could help you realise what makes your style of writing so distinctive, and let you fine-tune exactly those aspects.

Let's imagine a text editor called LaText. Instead of choosing fonts or colours, you'd pick relevant features and adjust them until you reach your desired output. Think of a formality slider that lets you gradually change the level of formality, or a slider that allows you to add wit. The possibilities are endless. Hovering over a feature would show which parts of your text activate it. Just like this fictional editor, we'll see loads of AI-powered products make use of this fine-grained control, delivering more value to the customer in a safer way.

LaText: a fictional semantic text editor. Instead of choosing fonts or colours, you’d pick relevant features and control them.
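To make the idea a little more concrete, here is a purely hypothetical sketch of how a LaText-style editor could represent its sliders internally. No such product or API exists today; the class and field names are made up.

```python
# Hypothetical sketch: each LaText slider maps one latent feature to a
# target strength. Purely illustrative; no such product or API exists.
from dataclasses import dataclass

@dataclass
class FeatureSlider:
    feature: str     # human-readable label for the latent feature
    strength: float  # -1.0 (suppress) .. +1.0 (amplify)

def steering_request(text: str, sliders: list[FeatureSlider]) -> dict:
    """Bundle the text and slider positions into a request for a
    (hypothetical) steering backend that rewrites the text."""
    return {
        "text": text,
        "adjustments": {s.feature: s.strength for s in sliders},
    }

print(steering_request(
    "Dear customer, we hereby inform you of the delay.",
    [FeatureSlider("formality", -0.6), FeatureSlider("wit", +0.3)],
))
```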

Last year, Anthropic released Golden Gate Claude, a version of Claude in which the feature related to the Golden Gate Bridge was amplified. If you asked it how to spend $10, it would recommend driving across the Golden Gate Bridge and paying the toll. After 24 hours, they took the demo offline. They later announced a Steering API for Claude that would allow you to control specific features, but so far it remains unreleased. Golden Gate Claude is very promising, but accurately tweaking isolated features is still an open problem.

Visualisation of Golden Gate Claude (Anthropic)
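Under the hood, this kind of steering is often pictured as nudging the model's internal activations along a feature's direction. The toy below illustrates that idea with random vectors; it is a conceptual sketch, not Anthropic's actual implementation.

```python
# Conceptual toy of activation steering: boost a feature by adding its
# direction vector to a hidden activation. Not Anthropic's implementation.
import numpy as np

rng = np.random.default_rng(0)
hidden_state = rng.normal(size=16)       # toy activation for one token
bridge_direction = rng.normal(size=16)   # toy "Golden Gate Bridge" feature
bridge_direction /= np.linalg.norm(bridge_direction)

def steer(activation, direction, strength):
    """Push the activation along the feature direction by `strength`."""
    return activation + strength * direction

steered = steer(hidden_state, bridge_direction, strength=10.0)

# The steered activation projects far more strongly onto the feature.
print(hidden_state @ bridge_direction, steered @ bridge_direction)
```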

Model creators like OpenAI and Anthropic are very interested in unravelling these hidden features to make models safer. Knowing which features are activated by unwanted behaviour would allow model builders to filter that behaviour out. For example, OpenAI identified features in GPT-2 for "words related to deception or falseness" and "words related to falsehoods, particularly hoaxes". This is an important aspect of alignment: making sure the model's goals are aligned with those of humans.
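If you already had reliable per-feature activation scores for a piece of text, the safety check itself could be as simple as thresholding them. The sketch below assumes such scores exist (obtaining them is the hard interpretability work); the feature names and threshold are invented.

```python
# Sketch of a feature-based safety check. Assumes per-feature activation
# scores are already available; feature names and threshold are invented.
def flag_unsafe(feature_activations: dict[str, float], threshold: float = 0.8) -> list[str]:
    """Return the safety-relevant features that fire above the threshold."""
    watchlist = {"deception", "hoaxes"}
    return [
        name for name, score in feature_activations.items()
        if name in watchlist and score >= threshold
    ]

print(flag_unsafe({"deception": 0.91, "weather": 0.40, "hoaxes": 0.12}))
# -> ['deception']
```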

The next wave of GenAI

Apart from safer and better-aligned models, we will also benefit from these superpowers through new GenAI tools. Imagine being able to fully understand the underlying dimensions of a model and carefully steer its output. This will be especially interesting for mixed or complex content types, like a brand. A brand is defined by its tone of voice, logo, colour palette and more. Imagine being able to tweak a playfulness feature and see all of those elements change together. Or an entire website, where colours, copy, layout and animation can be carefully steered and tuned. I've heard clients say countless times that a concept needs to be "more Apple" or "sexier". But what does that even mean? Now we can use these models to give us an idea and steer it in a certain direction.

New tools will let you produce content the way a music producer produces a song: tweaking each instrument, adding effects and changing the format of the output. That could happen via software, or via dedicated hardware where rows of rotary knobs and sliders allow for precise and intuitive editing. The future of content editing will be selecting the relevant features and tweaking them until you reach your desired output.

@prathyvsh grouped a lot of interesting latent interfaces in an X thread.


AI is a UX problem

Just like most AI problems today, this will be much more a UX problem than a technological one. We'll need to find new design patterns to deal with that much data and metadata: letting people make informed decisions without overloading them with information. Finding the right balance and harnessing that potential is an exciting next step for any product using AI.

Whether you are already building an AI-powered product or you're about to, the field of interpretability is one to keep a close eye on, either to improve the experience of your customers or to make your product safer overall.

We are on the verge of unlocking a whole new world. A world currently hidden just below our feet, waiting to be discovered. It will allow us to understand our world better and change it to our precise wishes.

Welcome to the Latent Space Age!
