This is a curated, evolving list of real machine learning interview questions and answers, compiled by a Staff ML Scientist who still actively interviews candidates. Practicing these questions will help you prepare for ML Scientist, ML Engineer, Applied Scientist, and Data Scientist roles at FAANG and similar-tier companies.
| Problem | Topics | Difficulty |
|---|---|---|
| Imbalanced dataset | Class imbalance | Medium |
| Gradient descent vs. stochastic gradient descent | Gradient descent, Stochastic gradient descent, Minibatch | Medium |
| Top-p vs. top-k sampling | LLMs, Sampling | Medium |
| Bias-variance equation | Bias-variance tradeoff, Formula derivation | Easy |
| ( Login required ) Multiclass evaluation metrics | Multiclass, Diagnostics | Medium |
| ( Login required ) NDCG vs. Mean Average Precision (MAP) vs. Mean Reciprocal Rank (MRR) | Ranking metrics, NDCG, Mean average precision (MAP), Mean reciprocal rank (MRR), Recommender systems | Medium |
| ( Login required ) k-means | k-means | Medium |
| ( Login required ) Range of R² when combining regressions | Linear regression, Goodness of fit, R-squared, Correlation | Medium |
| ( Login required ) ROC vs. PR curve | AUC, ROC, AUPR, Precision, Recall, Evaluation metrics | Medium |
| ( Login required ) L1 (Lasso) vs. L2 (Ridge) regularization | L1, L2, Regularization, Lasso, Ridge | Medium |
| ( Login required ) Example of high bias and high variance | Bias-variance tradeoff | Medium |
| ( Login required ) Tokens vs. Words | LLMs, Tokenization | Easy |
| ( Login required ) AUC ROC and predicted output transformations | AUC ROC | Easy |
| ( Login required ) Linear regression likelihood function | Linear regression, Likelihood function, Formula derivation | Medium |
| ( Login required ) Analyze prediction error | Prediction error, Bias-variance tradeoff, Diagnostics, Learning curves | Medium |
| ( Login required ) Model interpretability | Interpretability | Easy |
| ( Login required ) Encoding categorical features | Categorical features, Embeddings, One-hot encoding, Hashing | Easy |
| ( Subscription required ) Negative sampling | Neural networks, Deep learning, Negative sampling | Medium |
| ( Subscription required ) Non-probability sampling | Sampling, Non-probability | Easy |
| ( Subscription required ) Neural networks in layman's terms | Neural networks, Deep learning | Easy |
| ( Subscription required ) Overfitting in neural networks | Neural networks, Deep learning, Overfitting | Medium |
| ( Subscription required ) Multicollinearity | Multicollinearity, Linear regression | Medium |
| ( Subscription required ) Multi-headed attention and self-attention | Neural networks, Deep learning, Transformers, LLMs | Medium |
| ( Subscription required ) MSE vs. MAE | MSE, MAE | Easy |
| ( Subscription required ) Momentum | Neural networks, Deep learning, Optimization | Hard |
| ( Subscription required ) Missing data | Missing data | Easy |
| ( Subscription required ) Minimization of loss function intuition | Neural networks, Deep learning, Optimization | Easy |
| ( Subscription required ) Random forest feature importance | Feature importance, Explainability, Gini importance, Permutation importance | Medium |
| ( Subscription required ) Weighted and importance sampling | Sampling, Weighted sampling, Importance sampling | Easy |
| ( Subscription required ) Weight initialization | Neural networks, Deep learning | Medium |
| ( Subscription required ) Vanishing and exploding gradients (mathematical explanation) | Neural networks, Deep learning, Mathematical explanation | Hard |
| ( Subscription required ) Transfer learning vs. knowledge distillation | LLMs, Deep learning, Transfer learning, Knowledge distillation | Medium |
| ( Subscription required ) Transfer learning | Neural networks, Deep learning, Transformers, LLMs, Catastrophic forgetting | Easy |
| ( Subscription required ) SMOTE | Imbalanced classification, SMOTE, Data augmentation | Easy |
| ( Subscription required ) Self-supervised learning | Neural networks, Deep learning, Contrastive learning | Easy |
| ( Subscription required ) Random vs. stratified sampling | Sampling, Stratified sampling | Easy |
| ( Subscription required ) Normalization in neural networks | Neural networks, Deep learning, Batch normalization, Layer normalization | Medium |
| ( Subscription required ) Prove that the median minimizes MAE | MAE, Median, Formula derivation, Proof | Hard |
| ( Subscription required ) Principal Component Analysis (PCA) | PCA | Easy |
| ( Subscription required ) Positional embeddings | Feature engineering, Deep learning, Transformers, LLMs, Positional embeddings, Positional encodings | Hard |
| ( Subscription required ) Linear regression assumptions | Linear regression | Easy |
| ( Subscription required ) Outliers | Outliers, Cook’s distance, Regularization | Easy |
| ( Subscription required ) Optimize multiple objectives | Modeling, Multiple objectives | Easy |
| ( Subscription required ) Not enough data to train a model | Data limitations | Easy |
| ( Subscription required ) Normalization vs. Standardization | Linear regression, Standardization, Normalization | Easy |
| ( Subscription required ) Bootstrap | Bootstrap | Easy |
| ( Subscription required ) Examples of encoder and decoder models | LLMs, Transformers, Encoder, Decoder | Easy |
| ( Subscription required ) Ensembles | Ensembles, Numerical example | Medium |
| ( Subscription required ) Discretization drawbacks | Categorical variables, Discretization | Easy |
| ( Subscription required ) Decide between a multinomial vs. a binary modeling approach | Modeling, Multinomial, Binary | Easy |
| ( Subscription required ) Cross validation | Cross validation, Offline evaluation | Easy |
| ( Subscription required ) Correlation with binary variables | Correlation, Hypothesis testing, Point-biserial correlation coefficient | Easy |
| ( Subscription required ) Comparing decision trees with random forests | Decision trees, Random forests | Easy |
| ( Subscription required ) Common causes of data leakage | Data leakage | Medium |
| ( Subscription required ) Exponentially weighted moving average | Exponentially weighted moving average, Formula derivation, Proof | Medium |
| ( Subscription required ) Bias-variance biased estimator | Bias-variance tradeoff | Medium |
| ( Subscription required ) Bayesian vs. frequentist statistics | Bayesian, Frequentist | Easy |
| ( Subscription required ) Baselines | Model evaluation | Easy |
| ( Subscription required ) Attention (intuition) | Neural networks, Deep learning, Transformers, LLMs | Easy |
| ( Subscription required ) Approximate nearest neighbors | ANN, ANNOY, Nearest neighbors | Medium |
| ( Subscription required ) Adaptive learning rate | Neural networks, Deep learning, Optimization | Hard |
| ( Subscription required ) Adagrad vs. RMSProp vs. Adam | Neural networks, Deep learning, Optimization | Hard |
| ( Subscription required ) Active learning | Labels, Label sampling | Medium |
| ( Subscription required ) Hypothesis testing in regression coefficients | Linear regression, Hypothesis testing | Medium |
| ( Subscription required ) Logistic regression and standardization | Logistic regression, Standardization | Easy |
| ( Subscription required ) Linear regression with stochastic gradient descent (formula derivation) | Linear regression, Stochastic gradient descent, Formula derivation | Medium |
| ( Subscription required ) Linear regression with duplicated rows | Linear regression, Statistical significance | Easy |
| ( Subscription required ) Activation functions | Neural networks, Deep learning | Easy |
| ( Subscription required ) L2 regularization vs. weight decay | Neural networks, Deep learning, Regularization | Hard |
| ( Subscription required ) Interpretability | ML interpretability | Easy |
| ( Subscription required ) Intercept | Linear regression, Intercept | Easy |
| ( Subscription required ) Information gain in decision trees | Decision tree, Entropy, Information gain, Formula derivation | Medium |
| ( Subscription required ) Logistic regression assumptions | Logistic regression | Easy |
| ( Subscription required ) How to get more labels | Modeling, Label encoding | Medium |
| ( Subscription required ) Gradient descent vs. stochastic gradient descent and learning rate | Gradient descent (GD), Stochastic gradient descent (SGD), Learning rate, Optimization, Deep learning | Medium |
| ( Subscription required ) Gradient descent vs. stochastic gradient descent and local minima | Gradient descent (GD), Stochastic gradient descent (SGD), Local minima, Optimization, Deep learning | Medium |
| ( Subscription required ) Gradient boosting vs. random forests | Gradient boosting, Random forests, Bagging, Boosting | Medium |
| ( Subscription required ) Gini impurity vs. information gain | Decision tree, Information gain, Gini impurity | Medium |
| ( Subscription required ) Feature engineering in the era of deep learning | Feature engineering | Easy |
| ( Subscription required ) Feature crossing | Feature engineering, Deep learning | Easy |
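As another sample, the "Bias-variance equation" question asks for the standard decomposition of expected squared error. Assuming $y = f(x) + \varepsilon$ with $\mathbb{E}[\varepsilon] = 0$ and $\operatorname{Var}(\varepsilon) = \sigma^2$, it reads:

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible error}}
```

The derivation follows by adding and subtracting $\mathbb{E}[\hat{f}(x)]$ inside the square and noting that the cross terms vanish in expectation.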
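To give a flavor of the answers behind the table, here is a minimal sketch of the "Top-p vs. top-k sampling" question: top-k keeps a fixed number of candidate tokens, while top-p (nucleus) sampling keeps the smallest set whose cumulative probability reaches a threshold. The function names and toy distribution below are illustrative, not from any particular library.

```python
import numpy as np

def top_k_filter(probs, k):
    """Keep only the k highest-probability tokens, then renormalize."""
    idx = np.argsort(probs)[::-1][:k]
    out = np.zeros_like(probs)
    out[idx] = probs[idx]
    return out / out.sum()

def top_p_filter(probs, p):
    """Keep the smallest set of tokens whose cumulative probability
    (in descending order) reaches p, then renormalize."""
    order = np.argsort(probs)[::-1]
    cum = np.cumsum(probs[order])
    cutoff = np.searchsorted(cum, p) + 1  # include the token that crosses p
    out = np.zeros_like(probs)
    keep = order[:cutoff]
    out[keep] = probs[keep]
    return out / out.sum()

probs = np.array([0.5, 0.3, 0.1, 0.05, 0.05])
print(top_k_filter(probs, 2))    # only the two most likely tokens survive
print(top_p_filter(probs, 0.8))  # nucleus: tokens until cumulative mass >= 0.8
```

Note the key interview point: with a flat distribution, top-p keeps many tokens while top-k still keeps exactly k; with a peaked distribution, top-p can keep fewer than k.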