A collection of 181 coding challenges spanning NumPy, PyTorch, transformers, and distributed systems.
| # | Title | Difficulty | Tags |
|---|---|---|---|
| 1 | Numerically Stable Softmax | Easy | numpy, activation, numerical-stability |
| 2 | Sigmoid | Easy | numpy, activation |
| 3 | ReLU & Variants | Easy | numpy, activation |
| 4 | Cross-Entropy Loss | Easy | numpy, loss |
| 5 | MSE Loss | Easy | numpy, loss |
| 6 | Batch Normalization | Medium | numpy, normalization |
| 7 | Layer Normalization | Medium | numpy, normalization, transformer |
| 8 | Scaled Dot-Product Attention | Medium | numpy, transformer, attention |
| 9 | Adam Optimizer Step | Medium | numpy, optimizer |
| 10 | SGD with Momentum | Medium | numpy, optimizer |
| 11 | Dropout Forward | Medium | numpy, regularization |
| 12 | Backprop: Single Linear Layer | Medium | numpy, backpropagation |
| 13 | Backprop: 2-Layer MLP | Hard | numpy, backpropagation, mlp |
| 14 | Sinusoidal Positional Encoding | Medium | numpy, transformer, positional-encoding |
| 15 | Cosine Annealing LR | Easy | numpy, learning-rate, scheduler |
| 16 | KL Divergence | Easy | numpy, loss, information-theory |
| 17 | He Weight Initialization | Easy | numpy, initialization |
| 18 | L2 Regularization | Easy | numpy, regularization |
| 19 | Gradient Clipping | Easy | numpy, optimization, gradients |
| 20 | Multi-Head Attention | Hard | numpy, transformer, attention |
| 21 | Rotary Position Embedding (RoPE) | Medium | rope, positional-encoding, transformers, attention |
| 22 | Ring All-Reduce | Medium | all-reduce, distributed, collective, ring, data-parallelism |
| 23 | All-Gather | Easy | all-gather, distributed, collective, fsdp, tensor-parallelism |
| 24 | Data Parallelism: Gradient Averaging | Easy | data-parallelism, ddp, gradient, distributed, averaging |
| 25 | Tensor Parallelism (Megatron-LM) | Hard | tensor-parallelism, megatron, column-parallel, row-parallel, distributed |
| 26 | Full Transformer (Encoder-Decoder) | Hard | transformer, encoder-decoder, seq2seq, attention, cross-attention |
| 27 | Flash Attention (Tiled) | Hard | flash-attention, attention, memory-efficient, transformers, tiling |
| 28 | Grouped Query Attention (GQA) | Medium | gqa, attention, llama, mistral, kv-cache, efficiency |
| 29 | KV Cache | Medium | kv-cache, inference, attention, llm-serving |
| 30 | Byte-Pair Encoding (BPE) | Medium | tokenizer, bpe, nlp, vocabulary, gpt, llm |
*Showing challenges 1–30 of 181.*
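To give a sense of what a solution looks like, here are minimal sketches of three representative challenges. First, challenge #1 (Numerically Stable Softmax): the standard trick is to subtract the per-row maximum before exponentiating, which leaves the result unchanged (softmax is shift-invariant) but keeps `np.exp` from overflowing. The function name and signature below are illustrative, not a required interface.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtracting the max is a mathematical no-op (softmax is
    # shift-invariant) but prevents overflow in np.exp.
    shifted = x - np.max(x, axis=axis, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=axis, keepdims=True)

# Logits this large would overflow a naive implementation:
print(softmax(np.array([1000.0, 1001.0, 1002.0])))  # ≈ [0.090 0.245 0.665]
```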
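Challenge #8 (Scaled Dot-Product Attention) builds on the same stable softmax: Attention(Q, K, V) = softmax(QKᵀ / √d_k) · V. A sketch under assumed conventions (the function name and the boolean-mask convention, True meaning "may attend", are assumptions):

```python
import numpy as np

def scaled_dot_product_attention(q, k, v, mask=None):
    # q: (..., seq_q, d_k), k: (..., seq_k, d_k), v: (..., seq_k, d_v)
    d_k = q.shape[-1]
    scores = q @ k.swapaxes(-2, -1) / np.sqrt(d_k)   # (..., seq_q, seq_k)
    if mask is not None:
        # False positions get a large negative score, i.e. ~zero
        # attention weight after the softmax.
        scores = np.where(mask, scores, -1e9)
    scores = scores - scores.max(axis=-1, keepdims=True)  # stable softmax
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v                               # (..., seq_q, d_v)
```

Challenge #20 (Multi-Head Attention) extends this same kernel, running it once per head behind learned linear projections.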
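Challenge #22 (Ring All-Reduce) is typical of the distributed-systems exercises, where the collective is simulated in-process rather than run over a real interconnect. The sketch below (the list-of-arrays representation and function name are assumptions) runs the two classic phases, a reduce-scatter followed by an all-gather, each taking N−1 steps around the ring, so every rank transmits roughly 2·(N−1)/N of the data instead of broadcasting its whole vector.

```python
import numpy as np

def ring_all_reduce(data):
    """Simulated ring all-reduce (sum) over len(data) 'ranks', each
    holding one equal-length vector. Every rank ends with the full sum."""
    n = len(data)
    # Every rank splits its vector into n chunks the same way.
    chunks = [np.array_split(d, n) for d in data]

    # Phase 1: reduce-scatter. At each step, rank r sends chunk
    # (r - step) % n to rank (r + 1) % n, which adds it in. After
    # n - 1 steps, rank r holds the fully-summed chunk (r + 1) % n.
    for step in range(n - 1):
        sends = [(r, (r - step) % n, chunks[r][(r - step) % n].copy())
                 for r in range(n)]  # capture all sends before applying
        for r, c, payload in sends:
            chunks[(r + 1) % n][c] = chunks[(r + 1) % n][c] + payload

    # Phase 2: all-gather. Each rank forwards the chunk it most
    # recently completed or received, overwriting the stale copy.
    for step in range(n - 1):
        sends = [(r, (r + 1 - step) % n, chunks[r][(r + 1 - step) % n])
                 for r in range(n)]
        for r, c, payload in sends:
            chunks[(r + 1) % n][c] = payload

    return [np.concatenate(c) for c in chunks]

# Three "ranks", each with a different vector; all end with the sum.
vecs = [np.arange(6.0) * (rank + 1) for rank in range(3)]
result = ring_all_reduce(vecs)
assert all(np.allclose(r, sum(vecs)) for r in result)
```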