This is a curated, evolving list of real LLM and deep learning interview questions and answers, designed by a Staff ML Scientist who is still actively interviewing candidates. Practicing these questions will help you prepare for ML Scientist, ML Engineer, Applied Scientist, and Data Scientist roles at FAANG and similar-tier companies.

| Problem | Topics | Difficulty |
|---|---|---|
| Top-p vs. top-k sampling | LLMs, Sampling | Medium |
| ( Login required ) Tokens vs. Words | LLMs, Tokenization | Easy |
| ( Subscription required ) Negative sampling | Neural networks, Deep learning, Negative sampling | Medium |
| ( Subscription required ) Weight initialization | Neural networks, Deep learning | Medium |
| ( Subscription required ) Vanishing and exploding gradients (mathematical explanation) | Neural networks, Deep learning, Mathematical explanation | Hard |
| ( Subscription required ) Transfer learning vs. knowledge distillation | LLMs, Deep learning, Transfer learning, Knowledge distillation | Medium |
| ( Subscription required ) Transfer learning | Neural networks, Deep learning, Transformers, LLMs, Catastrophic forgetting | Easy |
| ( Subscription required ) Self-supervised learning | Neural networks, Deep learning, Contrastive learning | Easy |
| ( Subscription required ) Positional embeddings | Feature engineering, Deep learning, Transformers, LLMs, Positional embeddings, Positional encodings | Hard |
| ( Subscription required ) Overfitting in neural networks | Neural networks, Deep learning, Overfitting | Medium |
| ( Subscription required ) Normalization in neural networks | Neural networks, Deep learning, Batch normalization, Layer normalization | Medium |
| ( Subscription required ) Neural networks in layman's terms | Neural networks, Deep learning | Easy |
| ( Subscription required ) Activation functions | Neural networks, Deep learning | Easy |
| ( Subscription required ) Multi-head attention and self-attention | Neural networks, Deep learning, Transformers, LLMs | Medium |
| ( Subscription required ) Momentum | Neural networks, Deep learning, Optimization | Hard |
| ( Subscription required ) Minimization of loss function intuition | Neural networks, Deep learning, Optimization | Easy |
| ( Subscription required ) L2 regularization vs. weight decay | Neural networks, Deep learning, Regularization | Hard |
| ( Subscription required ) Gradient descent vs. stochastic gradient descent and learning rate | Gradient descent (GD), Stochastic gradient descent (SGD), Learning rate, Optimization, Deep learning | Medium |
| ( Subscription required ) Gradient descent vs. stochastic gradient descent and local minima | Gradient descent (GD), Stochastic gradient descent (SGD), Local minima, Optimization, Deep learning | Medium |
| ( Subscription required ) Feature crossing | Feature engineering, Deep learning | Easy |
| ( Subscription required ) Examples of encoder and decoder models | LLMs, Transformers, Encoder, Decoder | Easy |
| ( Subscription required ) Attention (intuition) | Neural networks, Deep learning, Transformers, LLMs | Easy |
| ( Subscription required ) Adaptive learning rate | Neural networks, Deep learning, Optimization | Hard |
| ( Subscription required ) Adagrad vs. RMSProp vs. Adam | Neural networks, Deep learning, Optimization | Hard |