Inside GPT – Large Language Models Demystified - Alan Smith - NDC Oslo 2024
Explore how large language models like GPT actually work, from tokens and embeddings to attention mechanisms and prompt engineering, with practical examples and insights.
- Language models work by predicting a statistical probability distribution over the next token in a sequence; they don’t truly “understand” text (see the next-token sketch after this list)
- Tokens are not words - they are common sequences of characters, drawn mostly from English text, with about 50,000 tokens in GPT’s vocabulary (tokenization sketch below)
- Models use embeddings (768-dimensional vectors in GPT-2) to represent tokens mathematically, allowing the model to perform calculations and find relationships between tokens (embedding sketch below)
- Temperature and top-P (nucleus sampling) are key parameters for controlling randomness in token selection - lower temperatures make outputs more deterministic, while higher temperatures allow more creativity (sampling sketch below)
- English is generally the most efficient language for prompts because tokenization is optimized for English, so other languages require more tokens for the same content (token-count comparison below)
- The position and order of tokens are critical - models use positional embeddings to encode where each token sits in the sequence (positional-embedding sketch below)
- Models have a fixed context window and a training data cutoff - they can’t remember beyond their context length or know about events after their training cutoff
- Attention mechanisms allow models to identify relationships between tokens in the input sequence, replacing older RNN/LSTM approaches (attention sketch below)
- Prompt engineering in English tends to work best, even when generating outputs in other languages
- Retrieval augmented generation (RAG) can enhance models by providing them access to external knowledge sources beyond their training data (RAG sketch below)
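A minimal sketch of the first takeaway: the model’s output for a prompt is just a probability distribution over its vocabulary, obtained by running softmax over raw scores (logits). The four-word vocabulary and the logit values below are invented for illustration.

```python
import numpy as np

# Toy vocabulary and made-up logits for a prompt like "The cat sat on the ..."
vocab = ["mat", "roof", "dog", "piano"]
logits = np.array([3.2, 1.5, 0.4, -1.0])  # higher score = more likely (invented values)

# Softmax turns raw scores into a probability distribution over the vocabulary
probs = np.exp(logits - logits.max())
probs /= probs.sum()

for token, p in zip(vocab, probs):
    print(f"{token:>6}: {p:.2%}")

# Greedy decoding simply picks the most probable next token
print("next token:", vocab[int(np.argmax(probs))])
```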
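To see that tokens are character sequences rather than words, the GPT-2 encoding can be inspected with OpenAI’s tiktoken library (assumed installed); each token id decodes to a piece of text that is often only part of a word.

```python
import tiktoken

enc = tiktoken.get_encoding("gpt2")      # GPT-2's BPE tokenizer
print("vocabulary size:", enc.n_vocab)   # roughly 50,000 entries

text = "Tokenization splits text into subword pieces."
ids = enc.encode(text)

# Each id maps to a byte sequence, not necessarily a whole word
pieces = [enc.decode([i]) for i in ids]
print(ids)
print(pieces)
```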
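The embedding idea in miniature: each token maps to a vector, and relationships between tokens become vector arithmetic such as cosine similarity. The 4-dimensional vectors below are hand-made stand-ins for GPT-2’s learned 768-dimensional embeddings.

```python
import numpy as np

# Hand-made 4-dimensional embeddings (GPT-2 learns 768-dimensional ones)
embeddings = {
    "cat":   np.array([0.8, 0.1, 0.6, 0.0]),
    "dog":   np.array([0.7, 0.2, 0.5, 0.1]),
    "piano": np.array([0.0, 0.9, 0.1, 0.7]),
}

def cosine(a, b):
    """Cosine similarity: close to 1.0 means similar direction, 0.0 means unrelated."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print("cat vs dog:  ", cosine(embeddings["cat"], embeddings["dog"]))
print("cat vs piano:", cosine(embeddings["cat"], embeddings["piano"]))
```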
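Temperature and top-P both act on that next-token distribution: temperature rescales the logits before softmax, and nucleus sampling keeps only the smallest set of tokens whose cumulative probability reaches p. A sketch with the same invented logits as above:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["mat", "roof", "dog", "piano"]
logits = np.array([3.2, 1.5, 0.4, -1.0])   # invented scores

def sample(logits, temperature=1.0, top_p=1.0):
    # Temperature rescales logits: <1 sharpens the distribution, >1 flattens it
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()

    # Nucleus (top-p) sampling: keep the most probable tokens whose
    # cumulative probability just reaches top_p, drop the rest
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])
    keep = order[: int(np.searchsorted(cumulative, top_p)) + 1]

    kept_probs = probs[keep] / probs[keep].sum()
    return vocab[rng.choice(keep, p=kept_probs)]

print([sample(logits, temperature=0.2) for _ in range(5)])              # near-deterministic
print([sample(logits, temperature=1.5) for _ in range(5)])              # more varied
print([sample(logits, temperature=1.0, top_p=0.9) for _ in range(5)])   # tail tokens cut off
```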
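The English bias of the tokenizer can be checked by encoding roughly the same sentence in different languages and counting tokens. The sample sentences below are my own translations, so treat the comparison as illustrative.

```python
import tiktoken

enc = tiktoken.get_encoding("gpt2")

# Roughly the same sentence in three languages (illustrative examples)
samples = {
    "English":   "Large language models predict the next token.",
    "Norwegian": "Store språkmodeller forutsier neste token.",
    "Japanese":  "大規模言語モデルは次のトークンを予測します。",
}

for language, text in samples.items():
    print(f"{language:>9}: {len(enc.encode(text))} tokens")
```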
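Positional information in GPT-2 is added by summing a learned position embedding with the token embedding, so the same token gets a different representation at different positions. The tiny random matrices below stand in for learned weights.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, context_length, d_model = 10, 8, 4    # tiny stand-in dimensions

wte = rng.normal(size=(vocab_size, d_model))      # token embedding table
wpe = rng.normal(size=(context_length, d_model))  # position embedding table

token_ids = np.array([3, 7, 3])                   # note: token 3 appears twice
positions = np.arange(len(token_ids))

# Input representation = token embedding + position embedding (as in GPT-2)
x = wte[token_ids] + wpe[positions]

# The two occurrences of token 3 end up with different vectors
print(np.allclose(x[0], x[2]))   # False: position changes the representation
```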
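Scaled dot-product attention, the mechanism that replaced RNN/LSTM recurrence, fits in a few lines of numpy: each token’s query is compared against every key, the scores are softmax-normalized, and the result weights the values. A causal mask is included since GPT-style models only attend to earlier positions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def causal_attention(Q, K, V):
    """Scaled dot-product attention with a causal mask (GPT-style)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)          # similarity of each query to each key

    # Causal mask: position i may only attend to positions <= i
    mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)

    weights = softmax(scores, axis=-1)       # attention weights per token
    return weights @ V                       # weighted sum of the values

rng = np.random.default_rng(0)
seq_len, d_model = 5, 8                      # toy sizes
Q = rng.normal(size=(seq_len, d_model))
K = rng.normal(size=(seq_len, d_model))
V = rng.normal(size=(seq_len, d_model))

print(causal_attention(Q, K, V).shape)       # (5, 8)
```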
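A retrieval-augmented generation pipeline reduced to its essentials: score documents against the question, keep the best matches, and paste them into the prompt sent to the model. The word-overlap scoring and the `call_llm` placeholder are crude stand-ins; a real system would use an embedding model, a vector store, and an actual completion API.

```python
# Minimal RAG sketch: retrieve relevant text, then prepend it to the prompt.
documents = [
    "NDC Oslo is a developer conference held in Norway.",
    "GPT-2 uses 768-dimensional token embeddings.",
    "Nucleus sampling keeps the smallest token set whose probability exceeds p.",
]

def score(query: str, doc: str) -> int:
    """Crude relevance score: count of shared lowercase words
    (stand-in for cosine similarity between real embedding vectors)."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def retrieve(query: str, k: int = 2) -> list[str]:
    return sorted(documents, key=lambda d: score(query, d), reverse=True)[:k]

def build_prompt(question: str) -> str:
    context = "\n".join(retrieve(question))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

prompt = build_prompt("How large are GPT-2 embeddings?")
print(prompt)
# call_llm(prompt)  # placeholder: send the augmented prompt to the model
```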