BatchNormalization
As we saw in our previous note, it is important for the pre-activation values to be roughly Gaussian (zero mean, unit std). We also saw that Kaiming initialization lets us set up the weights so that the pre-activations start out roughly Gaussian. But how do we keep the pre-activations roughly Gaussian throughout training? Answer: BatchNormalization.

Benefits
- stable training
- prevents vanishing gradients

BatchNormalization
As the name suggests, we normalize across the batch: each pre-activation unit is standardized using the mean and standard deviation computed over the current batch, which preserves the Gaussian property of our pre-activations.
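To make the idea concrete, here is a minimal sketch in PyTorch, assuming a single Kaiming-initialized linear layer; the sizes and the `bngain`/`bnbias` names are illustrative, not taken from the note. The pre-activations are standardized with statistics computed over the batch dimension and then scaled and shifted by learnable parameters.

```python
import torch

torch.manual_seed(42)

# Hypothetical sizes for illustration
batch_size, fan_in, fan_out = 32, 10, 100

# Kaiming-style init for the linear layer (as in the previous note)
W = torch.randn(fan_in, fan_out) / fan_in**0.5
x = torch.randn(batch_size, fan_in)

# Pre-activations before normalization
hpre = x @ W

# Batch normalization: standardize each pre-activation unit
# using statistics computed over the batch dimension (dim=0)
bnmean = hpre.mean(0, keepdim=True)
bnstd = hpre.std(0, keepdim=True)
hpre_norm = (hpre - bnmean) / bnstd

# Learnable scale and shift so the network can undo the
# normalization later if that helps training
bngain = torch.ones(1, fan_out)
bnbias = torch.zeros(1, fan_out)
hpre_bn = bngain * hpre_norm + bnbias

print(hpre_bn.mean().item(), hpre_bn.std().item())  # ~0 and ~1
```

In a full layer, `bngain` and `bnbias` would be trained along with the weights, so the network can still learn a non-unit scale or non-zero mean where that is useful.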