Roadmap (10 concepts)
- Tokenization
- Embedding
- Self-Attention
- Multi-Head Attention
- Positional Encoding
- Feed-Forward Blocks
- Residual Connections and LayerNorm
- Autoregressive Training
- Evaluation Metrics
- Scaling Laws
Suggested completion time: 2 weeks for readers with a basic background in math and probability.