Matrix Visualization

In deep learning, it’s important to visualize a matrix and how it is represented in a dimensional space, because the operations we perform on those matrices become much more intuitive afterwards. Visualizing a two-dimensional matrix: this has to be the most intuitive visualization. [ [12, 63, 10, 42, 70, 31, 34, 8, 34, 5], [10, 97, 100, 39, 64, 25, 86, 22, 31, 25], [28, 44, 82, 61, 70, 94, 22, 88, 89, 56] ] We can simply imagine rows as examples and columns as those examples’ features....
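
A minimal sketch of this view, assuming PyTorch (as used in the other notes): the same 3×10 matrix as a tensor whose first dimension indexes examples and second indexes features.

```python
import torch

# The 3x10 matrix from above, viewed as 3 examples with 10 features each.
X = torch.tensor([
    [12, 63, 10, 42, 70, 31, 34, 8, 34, 5],
    [10, 97, 100, 39, 64, 25, 86, 22, 31, 25],
    [28, 44, 82, 61, 70, 94, 22, 88, 89, 56],
], dtype=torch.float32)

print(X.shape)   # torch.Size([3, 10]) -> (examples, features)
print(X[0])      # first example (a row)
print(X[:, 0])   # first feature across all examples (a column)
```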

December 24, 2024 · 2 min · CohleM

Diagnostic-tool-while-training-nn

Source: Building makemore Part 3: Activations & Gradients, BatchNorm. Things to look out for while training a NN. Take a look at the previous notes to understand this note better. Consider we have this simple 6-layer NN: # Linear Layer g = torch.Generator().manual_seed(2147483647) # for reproducibility class Layer: def __init__(self, fan_in, fan_out, bias=False): self.w = torch.randn((fan_in, fan_out), generator=g) / (fan_in)**(0.5) # applying kaiming init self.bias = bias if bias: self.b = torch....
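
The excerpt cuts off mid-class; below is a minimal runnable sketch of such a linear layer with Kaiming-style scaling. The forward pass, the zero-initialized bias, and the parameters() helper are assumptions filled in for illustration, not quoted from the post.

```python
import torch

g = torch.Generator().manual_seed(2147483647)  # for reproducibility

class Layer:
    def __init__(self, fan_in, fan_out, bias=False):
        # Kaiming-style init: scale weights by 1/sqrt(fan_in) so that
        # pre-activations start out roughly unit-variance.
        self.w = torch.randn((fan_in, fan_out), generator=g) / fan_in**0.5
        self.bias = bias
        if bias:
            self.b = torch.zeros(fan_out)  # assumption: zero-initialized bias

    def __call__(self, x):
        self.out = x @ self.w
        if self.bias:
            self.out += self.b
        return self.out

    def parameters(self):
        return [self.w] + ([self.b] if self.bias else [])

layer = Layer(10, 100)
x = torch.randn(32, 10, generator=g)
print(layer(x).std())  # roughly 1.0 thanks to the 1/sqrt(fan_in) scaling
```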

December 20, 2024 · 6 min · CohleM

BatchNormalization

We saw in our previous note how important it is for the pre-activation values to be roughly Gaussian (zero mean and unit std), and how we can initialize our weights so that the pre-activations are roughly Gaussian by using Kaiming init. But how do we always maintain our pre-activations to be roughly Gaussian? Answer: BatchNormalization. Benefits: stable training, mitigates vanishing gradients. BatchNormalization: as the name suggests, batches are normalized (normalizing across the batch), and by normalizing across the batch we preserve the Gaussian property of our pre-activations....
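
A minimal sketch of the normalization step described above: normalize the pre-activations across the batch dimension so they are roughly Gaussian, then rescale with a learnable gain and bias. The variable names (hpreact, bngain, bnbias) are makemore-style assumptions, not quoted from the post.

```python
import torch

hpreact = torch.randn(32, 100)          # (batch, hidden) pre-activations
bngain = torch.ones(1, 100)             # learnable scale
bnbias = torch.zeros(1, 100)            # learnable shift

mean = hpreact.mean(0, keepdim=True)    # per-feature mean over the batch
std = hpreact.std(0, keepdim=True)      # per-feature std over the batch
hpreact_bn = bngain * (hpreact - mean) / (std + 1e-5) + bnbias

print(hpreact_bn.mean().item(), hpreact_bn.std().item())  # ~0.0, ~1.0
```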

December 19, 2024 · 4 min · CohleM

Maximum likelihood estimate as loss function

December 16, 2024 · 0 min · CohleM

Backpropagation from scratch

Source: The spelled-out intro to neural networks and backpropagation: building micrograd. Backpropagation on paper: it implements backpropagation for two arithmetic operations (multiplication and addition), which are quite straightforward. The implementation is for this equation: a = Value(2.0, label='a') b = Value(-3.0, label='b') c = Value(10.0, label='c') e = a*b; e.label = 'e' d = e + c; d.label = 'd' f = Value(-2.0, label='f') L = d * f; L.label = 'L' L The most important thing to note here is the gradient-accumulation step (shown at the bottom-left)....
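
A minimal micrograd-style sketch covering just the two operations in the excerpt (+ and *), including the gradient-accumulation step (+=) the note highlights. The internal details (the _backward closures and the topological sort) are assumptions in the spirit of the source video, not the post's exact code.

```python
class Value:
    def __init__(self, data, _children=(), label=''):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None
        self._prev = set(_children)
        self.label = label

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += 1.0 * out.grad   # accumulate, don't overwrite
            other.grad += 1.0 * out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # topological order so each node's gradient is complete before use
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for node in reversed(topo):
            node._backward()

a = Value(2.0, label='a'); b = Value(-3.0, label='b'); c = Value(10.0, label='c')
e = a * b; e.label = 'e'
d = e + c; d.label = 'd'
f = Value(-2.0, label='f')
L = d * f; L.label = 'L'
L.backward()
print(L.data, a.grad, b.grad)  # -8.0, 6.0, -4.0
```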

December 8, 2024 · 2 min · CohleM