BatchNormalization

In our previous note we saw how important it is for the pre-activation values to be roughly Gaussian (0 mean, unit std). We saw how to initialize our weights with Kaiming init so that our pre-activations start out roughly Gaussian. But how do we keep our pre-activations roughly Gaussian throughout training? Answer: BatchNormalization. Benefits: stable training, prevents vanishing gradients. BatchNormalization: as the name suggests, activations are normalized across the batch; by normalizing across the batch we preserve the Gaussian property of our pre-activations....
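A minimal sketch of the idea, assuming a single linear layer whose pre-activations `hpreact` have shape (batch, hidden); the shapes and names here are illustrative, not taken from the post:

```python
import torch

# Illustrative shapes: batch of 32 examples, 10 inputs, 100 hidden units
x = torch.randn(32, 10)
W = torch.randn(10, 100) / 10**0.5        # Kaiming-style scaling keeps pre-activations near unit std
hpreact = x @ W                           # pre-activations, roughly Gaussian at init

# BatchNorm: normalize each hidden unit across the batch dimension,
# then let the network scale/shift via a learnable gain (gamma) and bias (beta)
bngain = torch.ones(1, 100)
bnbias = torch.zeros(1, 100)
bnmean = hpreact.mean(0, keepdim=True)
bnstd = hpreact.std(0, keepdim=True)
hpreact = bngain * (hpreact - bnmean) / (bnstd + 1e-5) + bnbias
```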

December 19, 2024 · 4 min · CohleM

Maximum likelihood estimate as loss function

December 16, 2024 · 0 min · CohleM

Backpropagation from scratch

Source: The spelled-out intro to neural networks and backpropagation: building micrograd. Backpropagation on paper: it implements backpropagation for two arithmetic operations (multiplication and addition), which are quite straightforward. The implementation is for this equation:

a = Value(2.0, label='a')
b = Value(-3.0, label='b')
c = Value(10.0, label='c')
e = a*b; e.label = 'e'
d = e + c; d.label = 'd'
f = Value(-2.0, label='f')
L = d * f; L.label = 'L'
L

The most important thing to note here is the gradient accumulation step (shown at the bottom-left)....
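A minimal Value class in the spirit of micrograd, just enough to run the expression above and show the `+=` gradient accumulation step; this is a sketch, not the post's exact implementation:

```python
class Value:
    def __init__(self, data, _children=(), label=''):
        self.data = data
        self.grad = 0.0
        self.label = label
        self._prev = set(_children)
        self._backward = lambda: None

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def _backward():
            # d(out)/d(self) = 1 and d(out)/d(other) = 1; use += so gradients
            # accumulate correctly when a node is reused in the graph
            self.grad += 1.0 * out.grad
            other.grad += 1.0 * out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def _backward():
            # d(out)/d(self) = other.data and d(out)/d(other) = self.data
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

# The expression from the excerpt, backpropagated by hand (outermost node first)
a = Value(2.0, label='a')
b = Value(-3.0, label='b')
c = Value(10.0, label='c')
e = a * b; e.label = 'e'
d = e + c; d.label = 'd'
f = Value(-2.0, label='f')
L = d * f; L.label = 'L'

L.grad = 1.0
L._backward(); d._backward(); e._backward()
print(a.grad, b.grad)  # 6.0, -4.0
```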

December 8, 2024 · 2 min · CohleM

Why we add regularization in loss function

It penalizes the weights and prioritizes uniformity across the weights. How does it penalize the weights? When we do backprop and gradient descent, the gradient of the loss w.r.t. each weight gains an extra term, and as we can see it penalizes the weight by reducing its value by a larger amount than the small weight update we get when we only use the data loss. So overall, the model tries to balance the loss (L) as well as keep the weights small....
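A small sketch of how this plays out, assuming an L2 penalty lam * (W**2).sum() added to the data loss (the names `lam` and `W` are illustrative, not from the post):

```python
import torch

torch.manual_seed(0)
W = torch.randn(10, 5, requires_grad=True)
x, y = torch.randn(4, 10), torch.randn(4, 5)
lam = 0.1

loss = ((x @ W - y) ** 2).mean()          # data loss alone
loss_total = loss + lam * (W ** 2).sum()  # data loss + L2 regularization
loss_total.backward()

# W.grad is now d(loss)/dW + 2 * lam * W: larger weights get pushed down harder,
# so a step W -= lr * W.grad both reduces the loss and keeps the weights small.
```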

December 8, 2024 · 1 min · CohleM

# We always start with a dataset to train on. Let's download the tiny shakespeare dataset
!wget https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt

with open('input.txt', 'r', encoding='utf-8') as f:
    data = f.read()

from torch import nn
import torch

vocab = sorted(list(set(data)))
len(data)
stoi = {s:i for i,s in enumerate(vocab)}
itos = {i:s for s,i in stoi.items()}
encode = lambda x: [stoi[i] for i in x]
decode = lambda x: ''.join([itos[i] for i in x])
type(data)  # str
Xtr = data[:int(0....
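A quick usage sketch of the character-level `encode`/`decode` defined above, assuming the Shakespeare text has been downloaded and `stoi`/`itos` built as in the excerpt:

```python
s = "hello world"
ids = encode(s)            # one integer id per character, looked up in stoi
print(decode(ids) == s)    # True: decode inverts encode
```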

40 min