2025  16

March  1

Interpretability

March 3, 2025 · 11 min · CohleM

February  3

RLHF

February 24, 2025 · 1 min · CohleM

Flops calculation

February 11, 2025 · 3 min · CohleM

Post Training Strategies

February 6, 2025 · 3 min · CohleM

January  12

Notes-while-building-lilLM

January 29, 2025 · 1 min · CohleM

PyTorch Commands I forget from time to time / commands that are essential

January 29, 2025 · 1 min · CohleM

Tokenization

January 22, 2025 · 7 min · CohleM

Papers Summaries

January 21, 2025 · 2 min · CohleM

KV cache and Grouped Query Attention

January 18, 2025 · 11 min · CohleM

RMSNorm

January 15, 2025 · 1 min · CohleM

RoPE

January 15, 2025 · 7 min · CohleM

GPUs

January 8, 2025 · 6 min · CohleM

Mixture of Experts

January 5, 2025 · 14 min · CohleM

DDP and gradient sync

January 3, 2025 · 6 min · CohleM

Gradient Accumulation

January 3, 2025 · 1 min · CohleM

Training Speed Optimization

January 2, 2025 · 3 min · CohleM

2024  9

December  9

skip-connections

December 30, 2024 · 1 min · CohleM

Optimization Algorithms (SGD with momentum, RMSProp, Adam)

December 27, 2024 · 3 min · CohleM

manual-backpropagation-on-tensors

December 24, 2024 · 8 min · CohleM

Matrix Visualization

December 24, 2024 · 2 min · CohleM

Diagnostic-tool-while-training-nn

December 20, 2024 · 6 min · CohleM

BatchNormalization

December 19, 2024 · 4 min · CohleM

Maximum likelihood estimate as loss function

December 16, 2024 · 0 min · CohleM

Backpropagation from scratch

December 8, 2024 · 2 min · CohleM

Why we add regularization to the loss function

December 8, 2024 · 1 min · CohleM