Essential blogs

January 3, 2025 · 1 min · CohleM

Training Neural Networks

Karpathy’s advice on training neural networks

Deep Learning Concepts

Simple explanations of deep learning concepts

How to scale your LLM (Must read)

https://jax-ml.github.io/scaling-book/

The Ultra-Scale Playbook: Training LLMs on GPU Clusters

https://huggingface.co/spaces/nanotron/ultrascale-playbook

Good coding style

https://medium.com/@NoamShazeer/shape-suffixes-good-coding-style-f836e72e24fd

How to sample from an LLM (top-k, top-p)

https://huggingface.co/blog/how-to-generate

KL Divergence

https://www.youtube.com/watch?v=q0AkK8aYbLY

The meaning of Loss functions

https://jiha-kim.github.io/posts/the-mean-ing-of-loss-functions/