Deep Learning Notes

- Backpropagation
  - Backpropagation on scalars from scratch
  - Manual backpropagation on tensors
- Loss function
  - Maximum likelihood estimation as a loss function
  - Why we add regularization to the loss function
- Optimization
  - Optimization algorithms (SGD with momentum, RMSProp, Adam)
  - Optimizing loss with weight initialization
  - BatchNormalization
  - RMSNorm
  - Diagnostic tools to look out for while training NNs
  - Skip connections
- Training misc
  - Matrix visualization
  - SwiGLU activation (not mine, but offers the best explanation)
- Architecture implementation
  - GPT implementation
  - MoE
  - RoPE
  - KV cache and grouped query attention
- GPU
  - Basic intro to GPU architecture

December 8, 2024 · 1 min · CohleM

Essential blogs

- Training Neural Networks: Karpathy’s advice for training NNs
- Deep Learning Concepts: simple explanations of DL concepts
- How to scale your LLM (must read): https://jax-ml.github.io/scaling-book/
- The Ultra-Scale Playbook: Training LLMs on GPU Clusters: https://huggingface.co/spaces/nanotron/ultrascale-playbook
- Good coding style (shape suffixes): https://medium.com/@NoamShazeer/shape-suffixes-good-coding-style-f836e72e24fd
- How to sample from an LLM (top-k, top-p): https://huggingface.co/blog/how-to-generate
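As a quick reference for the top-k/top-p sampling strategies covered in the last link, here is a minimal NumPy sketch. The function name, defaults, and structure are my own illustration, not taken from any particular library:

```python
import numpy as np

def sample_next_token(logits, k=50, p=0.9, rng=None):
    """Sample a token id from raw logits using top-k, then top-p
    (nucleus) filtering. Illustrative sketch only; names and defaults
    here are assumptions, not from a specific library."""
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64)

    # Top-k: keep only the k highest-scoring tokens.
    top_idx = np.argsort(logits)[::-1][:k]

    # Softmax over the kept logits (shifted for numerical stability).
    probs = np.exp(logits[top_idx] - logits[top_idx].max())
    probs /= probs.sum()

    # Top-p: keep the smallest prefix of tokens (already sorted by
    # probability) whose cumulative mass reaches p; always keep >= 1.
    cutoff = int(np.searchsorted(np.cumsum(probs), p)) + 1
    keep_idx, keep_probs = top_idx[:cutoff], probs[:cutoff]
    keep_probs /= keep_probs.sum()

    return int(rng.choice(keep_idx, p=keep_probs))
```

Applying top-k first and top-p second matches the common pipeline ordering; with a sharply peaked distribution the nucleus collapses to a single token and the choice becomes deterministic.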

January 3, 2025 · 1 min · CohleM