“Attention” is a mathematical process that helps make AI models “context-aware”.

It is the backbone of LLMs, calculating how different words interact to convey meaning.

Attention has four primary components: Embeddings, Queries, Keys, and Values. Each consists of learned “weights”.

When combined, these weights allow the model to predict the next word in a sentence.

Embeddings are mathematical representations of words (strictly speaking, of “tokens”: words or pieces of words).

Larger (higher-dimensional) embeddings capture more nuance about how words are used.
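
Here’s a minimal NumPy sketch of the idea, assuming a made-up 5-word vocabulary and an 8-dimensional embedding size (real models learn these numbers; the random values below are just stand-ins):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: a 5-word vocabulary and 8-dimensional embeddings.
vocab = {"the": 0, "cat": 1, "sat": 2, "on": 3, "mat": 4}
d_model = 8

# In a real model these weights are learned; random stand-ins here.
embedding_table = rng.normal(size=(len(vocab), d_model))

sentence = ["the", "cat", "sat"]
x = embedding_table[[vocab[w] for w in sentence]]  # shape: (3, 8), one vector per word
```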

Each embedding vector is multiplied separately by a “Query”, “Key”, and “Value” matrix, producing a Query, Key, and Value vector for every word.
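
Continuing the toy sketch, with random stand-ins for the learned projection matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 8
x = rng.normal(size=(3, d_model))  # stand-in for the 3 word embeddings above

# One learned projection matrix per role (random stand-ins here).
W_q = rng.normal(size=(d_model, d_model))
W_k = rng.normal(size=(d_model, d_model))
W_v = rng.normal(size=(d_model, d_model))

Q = x @ W_q  # a Query vector per word: "what am I looking for?"
K = x @ W_k  # a Key vector per word:   "what do I offer?"
V = x @ W_v  # a Value vector per word: "what do I pass along?"
```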

Comparing each Query vector against every Key vector gives us the “Attention Pattern”: a score for how relevant each word is to updating the meaning of every other word.
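
A sketch of that comparison (the √d scaling is standard “scaled dot-product” practice, added here for completeness):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 8
Q = rng.normal(size=(3, d_model))  # stand-ins for the Query/Key vectors above
K = rng.normal(size=(3, d_model))

# Dot every Query against every Key, scaled by sqrt(d) for numerical stability.
scores = Q @ K.T / np.sqrt(d_model)  # shape: (3, 3)

# Softmax each row so one word's attention scores sum to 1.
attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
attn = attn / attn.sum(axis=-1, keepdims=True)

print(attn.round(2))  # row i: how much word i attends to each word
```

Each row of attn sums to 1: every word splits a fixed budget of attention across the sentence.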

The Value vectors tell us how to update each word’s meaning: they are weighted by the attention scores and summed into an update.
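
Sketching the update with a made-up 3×3 attention pattern and random stand-in Value vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
attn = np.array([[0.6, 0.3, 0.1],
                 [0.2, 0.5, 0.3],
                 [0.1, 0.2, 0.7]])  # a made-up 3x3 attention pattern
V = rng.normal(size=(3, 8))         # stand-in Value vectors

# Each word's update is an attention-weighted sum of all the Value vectors.
update = attn @ V                   # shape: (3, 8)
```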

This result is added back to the original word embedding, so each word now carries context from all of its surrounding words.
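
The whole step, gathered into one small function (same toy shapes and random stand-in weights as before):

```python
import numpy as np

def attention_step(x, W_q, W_k, W_v):
    """One self-attention pass: project, score, weight the Values, add back."""
    Q, K, V = x @ W_q, x @ W_k, x @ W_v
    scores = Q @ K.T / np.sqrt(x.shape[-1])
    attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
    attn = attn / attn.sum(axis=-1, keepdims=True)
    return x + attn @ V  # residual: original embedding + context update

rng = np.random.default_rng(0)
d_model = 8
x = rng.normal(size=(3, d_model))                        # toy word embeddings
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
x_contextual = attention_step(x, W_q, W_k, W_v)          # shape: (3, 8)
```

Note the last line of the function: the attention output is added onto x, not swapped in for it.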

During “training”, this process is used to predict a probability for every possible next word.
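
In sketch form, a hypothetical output (“unembedding”) projection plus a softmax turns the last word’s contextual vector into a probability for every word in the vocabulary:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, d_model = 5, 8
x_last = rng.normal(size=(d_model,))                # contextual vector of the last word
W_unembed = rng.normal(size=(d_model, vocab_size))  # hypothetical output projection

logits = x_last @ W_unembed        # one raw score per vocabulary word
probs = np.exp(logits - logits.max())
probs = probs / probs.sum()        # softmax: a probability for each next word
```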

This probability distribution is compared to the true next word, and the model is “penalized” for incorrect and low-confidence predictions.

This penalty is used to adjust the “weights” so that subsequent predictions become more accurate.
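
A sketch of that penalty using cross-entropy loss and made-up probabilities; the gradient shown is the standard one for softmax outputs:

```python
import numpy as np

probs = np.array([0.10, 0.05, 0.60, 0.20, 0.05])  # made-up next-word probabilities
true_word = 3                                     # index of the actual next word

# Cross-entropy penalty: large when the true word was given low probability.
loss = -np.log(probs[true_word])                  # -log(0.20) = approx. 1.61

# For softmax outputs with cross-entropy, the gradient at the logits is simply
# (predicted probabilities - one-hot truth): push the true word up, the rest down.
one_hot = np.zeros_like(probs)
one_hot[true_word] = 1.0
grad = probs - one_hot

# Each weight then takes a small step against its gradient:
#   w -= learning_rate * d(loss)/d(w)
```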
