Inside GPT – Large Language Models Demystified - Alan Smith - NDC Oslo 2024

Explore how large language models like GPT actually work, from tokens and embeddings to attention mechanisms and prompt engineering, with practical examples and insights.

Key takeaways
  • Language models work by predicting the statistical probability distribution of the next token in a sequence; they don’t truly “understand” text

  • Tokens are not words - they are common character sequences, drawn mostly from English text, and GPT’s vocabulary contains about 50,000 of them (see the tokenization sketch after this list)

  • Models use embeddings (768-dimensional vectors in GPT-2) to represent tokens mathematically, allowing the model to perform calculations and find relationships between tokens (see the embedding sketch after this list)

  • Temperature and top-P (nucleus sampling) are the key parameters for controlling randomness in token selection - lower temperature makes outputs more deterministic, while higher temperature allows more varied, creative output (see the sampling sketch after this list)

  • English is generally the most efficient language for prompts because the tokenizer is optimized for English text, so other languages require more tokens for the same content

  • The position and order of tokens in a sequence are critical - models use positional embeddings to understand where each token sits in the sequence

  • Models have a fixed context window and training data cutoff - they can’t remember beyond their context length or know about events after their training cutoff

  • Attention mechanisms allow models to identify relationships between tokens in the input sequence, replacing older RNN/LSTM approaches (see the attention sketch after this list)

  • Prompt engineering in English tends to work best, even when generating outputs in other languages

  • Retrieval augmented generation (RAG) can enhance models by providing them with access to external knowledge sources beyond their training data (sketched below)
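
Tokenization sketch. The points about tokens and English efficiency are easy to check with OpenAI’s open-source tiktoken library; this is a minimal illustration rather than material from the talk, and the example sentences are my own.

```python
# pip install tiktoken
import tiktoken

enc = tiktoken.get_encoding("gpt2")   # the GPT-2 byte-pair encoding
print(enc.n_vocab)                    # 50257 entries in the vocabulary

english = "The quick brown fox jumps over the lazy dog."
norwegian = "Den raske brune reven hopper over den late hunden."

print(len(enc.encode(english)))       # roughly one token per word plus the period
print(len(enc.encode(norwegian)))     # typically more tokens for the same sentence
```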
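Embedding sketch. A rough picture of what a token embedding table is: one vector per vocabulary entry, and vector similarity as a measure of relatedness. The vectors below are random stand-ins (GPT-2’s are learned during training), so the similarity scores here are only illustrative.

```python
import numpy as np

# Toy stand-in for GPT-2's learned embedding table, which is a 50257 x 768 matrix.
rng = np.random.default_rng(0)
vocab = {"king": 0, "queen": 1, "banana": 2}          # hypothetical 3-token vocabulary
embedding_table = rng.normal(size=(len(vocab), 768))  # random here, learned in a real model

def cosine_similarity(a, b):
    """How close two token vectors are in embedding space."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

king = embedding_table[vocab["king"]]
queen = embedding_table[vocab["queen"]]
banana = embedding_table[vocab["banana"]]

# With trained embeddings, related tokens ("king"/"queen") score noticeably higher
# than unrelated ones ("king"/"banana"); random vectors all score near zero.
print(cosine_similarity(king, queen), cosine_similarity(king, banana))
```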
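Sampling sketch. A minimal, hypothetical example of how the next token is picked from a probability distribution and how temperature and top-P shape that choice. The logits are made up and the vocabulary has five tokens instead of ~50,000.

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, top_p=0.9, rng=None):
    """Toy next-token sampler: temperature scaling followed by nucleus (top-p) filtering."""
    if rng is None:
        rng = np.random.default_rng()

    # Temperature scaling: a small temperature sharpens the distribution
    # (near-deterministic); a large temperature flattens it (more variety).
    scaled = logits / max(temperature, 1e-8)

    # Softmax turns the scaled logits into a probability distribution.
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()

    # Nucleus sampling: keep the smallest set of tokens whose cumulative
    # probability reaches top_p, then renormalise and sample from that set.
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])
    keep = order[: int(np.searchsorted(cumulative, top_p)) + 1]
    kept_probs = probs[keep] / probs[keep].sum()
    return int(rng.choice(keep, p=kept_probs))

# Hypothetical logits for a 5-token vocabulary.
logits = np.array([2.0, 1.0, 0.5, 0.1, -1.0])
print([sample_next_token(logits, temperature=0.2) for _ in range(5)])  # almost always token 0
print([sample_next_token(logits, temperature=1.5) for _ in range(5)])  # more varied choices
```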
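Attention sketch. A stripped-down view of how positional embeddings and attention fit together: random vectors, a single head, and no learned projection matrices, so this is not the full GPT architecture, just the core calculation.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """Single-head attention: every token scores its relationship to every other token."""
    scores = q @ k.T / np.sqrt(q.shape[-1])            # pairwise token-to-token scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over each row
    return weights @ v, weights

seq_len, d_model = 4, 8                                # tiny sizes; GPT-2 uses d_model = 768
rng = np.random.default_rng(1)

token_embeddings = rng.normal(size=(seq_len, d_model))
position_embeddings = rng.normal(size=(seq_len, d_model))

# The transformer's input is the sum of token and positional embeddings;
# this is how the model knows where each token sits in the sequence.
x = token_embeddings + position_embeddings

output, attention_weights = scaled_dot_product_attention(x, x, x)
print(attention_weights.round(2))   # each row: how strongly one token attends to the others
```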
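RAG sketch. A toy version of the retrieval-augmented generation idea: fetch relevant text, then place it in the prompt so the model can answer from knowledge outside its training data. The word-overlap retriever and example documents are placeholders; real systems use embedding similarity against a vector store.

```python
def retrieve(question, documents, top_k=1):
    """Toy retriever that ranks documents by word overlap with the question."""
    question_words = set(question.lower().split())
    scored = sorted(documents,
                    key=lambda doc: len(question_words & set(doc.lower().split())),
                    reverse=True)
    return scored[:top_k]

documents = [
    "NDC Oslo 2024 took place in Oslo in June 2024.",
    "GPT-2 represents each token as a 768-dimensional embedding vector.",
]

question = "When did NDC Oslo 2024 take place?"
context = "\n".join(retrieve(question, documents))

# The retrieved text is injected into the prompt, so the model can answer
# from knowledge that was never part of its training data.
prompt = (
    "Answer the question using only the context below.\n\n"
    f"Context:\n{context}\n\n"
    f"Question: {question}"
)
print(prompt)
```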