TorchedUp

Problems

181 coding challenges across NumPy, PyTorch, transformers, and distributed systems.

| # | Title | Difficulty | Acceptance | Tags |
|---|-------|------------|------------|------|
| 1 | Numerically Stable Softmax | Easy | - | numpy, activation, numerical-stability |
| 2 | Sigmoid | Easy | - | numpy, activation |
| 3 | ReLU & Variants | Easy | - | numpy, activation |
| 4 | Cross-Entropy Loss | Easy | - | numpy, loss |
| 5 | MSE Loss | Easy | - | numpy, loss |
| 6 | Batch Normalization | Medium | - | numpy, normalization |
| 7 | Layer Normalization | Medium | - | numpy, normalization, transformer |
| 8 | Scaled Dot-Product Attention | Medium | - | numpy, transformer, attention |
| 9 | Adam Optimizer Step | Medium | - | numpy, optimizer |
| 10 | SGD with Momentum | Medium | - | numpy, optimizer |
| 11 | Dropout Forward | Medium | - | numpy, regularization |
| 12 | Backprop: Single Linear Layer | Medium | - | numpy, backpropagation |
| 13 | Backprop: 2-Layer MLP | Hard | - | numpy, backpropagation, mlp |
| 14 | Sinusoidal Positional Encoding | Medium | - | numpy, transformer, positional-encoding |
| 15 | Cosine Annealing LR | Easy | - | numpy, learning-rate, scheduler |
| 16 | KL Divergence | Easy | - | numpy, loss, information-theory |
| 17 | He Weight Initialization | Easy | - | numpy, initialization |
| 18 | L2 Regularization | Easy | - | numpy, regularization |
| 19 | Gradient Clipping | Easy | - | numpy, optimization, gradients |
| 20 | Multi-Head Attention | Hard | - | numpy, transformer, attention |
| 21 | Rotary Position Embedding (RoPE) | Medium | - | rope, positional-encoding, transformers, attention |
| 22 | Ring All-Reduce | Medium | - | all-reduce, distributed, collective, ring, data-parallelism |
| 23 | All-Gather | Easy | - | all-gather, distributed, collective, fsdp, tensor-parallelism |
| 24 | Data Parallelism: Gradient Averaging | Easy | - | data-parallelism, ddp, gradient, distributed, averaging |
| 25 | Tensor Parallelism (Megatron-LM) | Hard | - | tensor-parallelism, megatron, column-parallel, row-parallel, distributed |
| 26 | Full Transformer (Encoder-Decoder) | Hard | - | transformer, encoder-decoder, seq2seq, attention, cross-attention |
| 27 | Flash Attention (Tiled) | Hard | - | flash-attention, attention, memory-efficient, transformers, tiling |
| 28 | Grouped Query Attention (GQA) | Medium | - | gqa, attention, llama, mistral, kv-cache, efficiency |
| 29 | KV Cache | Medium | - | kv-cache, inference, attention, llm-serving |
| 30 | Byte-Pair Encoding (BPE) | Medium | - | tokenizer, bpe, nlp, vocabulary, gpt, llm |

1–30 of 181


© 2026 TorchedUp. All rights reserved.
