Learning Path: Transformer Foundations

Suggested completion: 2 weeks for readers with a basic background in math and probability.

Roadmap (10 concepts)

  1. Tokenization
  2. Embedding
  3. Self-Attention (a worked sketch follows this list)
  4. Multi-Head Attention
  5. Positional Encoding
  6. Feed-Forward Blocks
  7. Residual and LayerNorm
  8. Autoregressive Training
  9. Evaluation Metrics
  10. Scaling Laws
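
To ground the core idea before starting the path, here is a minimal NumPy sketch of scaled dot-product self-attention (concept 3). The sequence length, model width, and random weight matrices are illustrative assumptions, not part of the path material.

  import numpy as np

  def self_attention(x, W_q, W_k, W_v):
      # Project each token into query, key, and value spaces.
      Q, K, V = x @ W_q, x @ W_k, x @ W_v
      d_k = Q.shape[-1]
      # Scaled pairwise similarity between every query and every key.
      scores = Q @ K.T / np.sqrt(d_k)
      # Row-wise softmax turns scores into attention weights.
      weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
      weights /= weights.sum(axis=-1, keepdims=True)
      # Each output token is an attention-weighted mix of the values.
      return weights @ V

  rng = np.random.default_rng(0)
  x = rng.normal(size=(4, 8))                      # 4 tokens, model width 8
  W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
  print(self_attention(x, W_q, W_k, W_v).shape)    # -> (4, 8)

Running the same routine over several independently projected heads and concatenating the results gives multi-head attention (concept 4).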