Backpropagation
Loss function
- Maximum likelihood estimate as loss function
- Why we add regularization to loss function
- Go-to video for intuitively understanding Entropy/Cross-Entropy/KL-divergence
Optimization
- Optimization Algorithms (SGD with momentum, RMSProp, Adam)
- Optimizing loss with weight initialization
- Batch Normalization
- RMSNorm
- Diagnostic tools to watch while training a NN
- Skip Connections
Training
Misc
- Matrix Visualization
- SwiGLU activation (not mine, but offers the best explanation)