“Attention” is a mathematical process that helps make AI models “context-aware”.

It is the backbone of LLMs, calculating how different words interact to convey meaning.

Attention has four primary components: Embeddings, Queries, Keys, and Values. Each consists of learned “weights”.

When combined, these weights allow the model to predict the next word in a sentence.

Embeddings are mathematical representations of words (strictly speaking, of “tokens”: words or pieces of words).

Larger (higher-dimensional) embeddings capture more nuance about how words are used.
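
Here’s a minimal NumPy sketch of the idea, assuming a made-up 5-word vocabulary and an 8-dimensional embedding size (real models learn these numbers; the random values below are just stand-ins):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: a 5-word vocabulary and 8-dimensional embeddings.
vocab = {"the": 0, "cat": 1, "sat": 2, "on": 3, "mat": 4}
d_model = 8

# In a real model these weights are learned; random stand-ins here.
embedding_table = rng.normal(size=(len(vocab), d_model))

sentence = ["the", "cat", "sat"]
x = embedding_table[[vocab[w] for w in sentence]]  # shape: (3, 8), one vector per word
```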

Each embedding vector is multiplied separately by a “Query”, “Key”, and “Value” matrix, producing a Query, Key, and Value vector for every word.
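
Continuing the toy sketch, with random stand-ins for the learned projection matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 8
x = rng.normal(size=(3, d_model))  # stand-in for the 3 word embeddings above

# One learned projection matrix per role (random stand-ins here).
W_q = rng.normal(size=(d_model, d_model))
W_k = rng.normal(size=(d_model, d_model))
W_v = rng.normal(size=(d_model, d_model))

Q = x @ W_q  # a Query vector per word: "what am I looking for?"
K = x @ W_k  # a Key vector per word:   "what do I offer?"
V = x @ W_v  # a Value vector per word: "what do I pass along?"
```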

Comparing each Query vector against every Key vector gives us the “Attention Pattern”: a score for how relevant each word is to updating the meaning of every other word.
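
A sketch of that comparison (the √d scaling is standard “scaled dot-product” practice, added here for completeness):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 8
Q = rng.normal(size=(3, d_model))  # stand-ins for the Query/Key vectors above
K = rng.normal(size=(3, d_model))

# Dot every Query against every Key, scaled by sqrt(d) for numerical stability.
scores = Q @ K.T / np.sqrt(d_model)  # shape: (3, 3)

# Softmax each row so one word's attention scores sum to 1.
attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
attn = attn / attn.sum(axis=-1, keepdims=True)

print(attn.round(2))  # row i: how much word i attends to each word
```

Each row of attn sums to 1: every word splits a fixed budget of attention across the sentence.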

The Value vectors tell us how to update each word’s meaning: they are weighted by the attention scores and summed into an update.
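
Sketching the update with a made-up 3×3 attention pattern and random stand-in Value vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
attn = np.array([[0.6, 0.3, 0.1],
                 [0.2, 0.5, 0.3],
                 [0.1, 0.2, 0.7]])  # a made-up 3x3 attention pattern
V = rng.normal(size=(3, 8))         # stand-in Value vectors

# Each word's update is an attention-weighted sum of all the Value vectors.
update = attn @ V                   # shape: (3, 8)
```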

This result is added back to the original word embedding, so each word now carries context from all of its surrounding words.
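
The whole step, gathered into one small function (same toy shapes and random stand-in weights as before):

```python
import numpy as np

def attention_step(x, W_q, W_k, W_v):
    """One self-attention pass: project, score, weight the Values, add back."""
    Q, K, V = x @ W_q, x @ W_k, x @ W_v
    scores = Q @ K.T / np.sqrt(x.shape[-1])
    attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
    attn = attn / attn.sum(axis=-1, keepdims=True)
    return x + attn @ V  # residual: original embedding + context update

rng = np.random.default_rng(0)
d_model = 8
x = rng.normal(size=(3, d_model))                        # toy word embeddings
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
x_contextual = attention_step(x, W_q, W_k, W_v)          # shape: (3, 8)
```

Note the last line of the function: the attention output is added onto x, not swapped in for it.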

During “training”, this process is used to predict a probability for every possible next word.
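
In sketch form, a hypothetical output (“unembedding”) projection plus a softmax turns the last word’s contextual vector into a probability for every word in the vocabulary:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, d_model = 5, 8
x_last = rng.normal(size=(d_model,))                # contextual vector of the last word
W_unembed = rng.normal(size=(d_model, vocab_size))  # hypothetical output projection

logits = x_last @ W_unembed        # one raw score per vocabulary word
probs = np.exp(logits - logits.max())
probs = probs / probs.sum()        # softmax: a probability for each next word
```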

This probability distribution is compared to the true next word, and the model is “penalized” for incorrect and low-confidence predictions.

This penalty is used to adjust the “weights” so that subsequent predictions become more accurate.
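
A sketch of that penalty using cross-entropy loss and made-up probabilities; the gradient shown is the standard one for softmax outputs:

```python
import numpy as np

probs = np.array([0.10, 0.05, 0.60, 0.20, 0.05])  # made-up next-word probabilities
true_word = 3                                     # index of the actual next word

# Cross-entropy penalty: large when the true word was given low probability.
loss = -np.log(probs[true_word])                  # -log(0.20) = approx. 1.61

# For softmax outputs with cross-entropy, the gradient at the logits is simply
# (predicted probabilities - one-hot truth): push the true word up, the rest down.
one_hot = np.zeros_like(probs)
one_hot[true_word] = 1.0
grad = probs - one_hot

# Each weight then takes a small step against its gradient:
#   w -= learning_rate * d(loss)/d(w)
```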
