
80 Real Machine Learning Interview Questions and Answers

-- For Applied Scientists, Data Scientists, and Machine Learning Engineers --

This is a curated, evolving list of real machine learning interview questions and answers, designed by a Staff ML Scientist who is still actively interviewing candidates. Practicing these questions will help you prepare for ML Scientist, ML Engineer, Applied Scientist, and Data Scientist roles at FAANG and similar-tier companies.

Problem Topics Difficulty
Imbalanced dataset Class imbalance Medium
Gradient descent vs. stochastic gradient descent Gradient descent, Stochastic gradient descent, Minibatch Medium
Top-p vs. top-k sampling LLMs, Sampling Medium
Bias-variance equation Bias-variance tradeoff, Formula derivation Easy
Multiclass evaluation metrics Multiclass, Diagnostics Medium
NDCG vs. Mean Average Precision (MAP) vs. Mean Reciprocal Rank (MRR) Ranking metrics, NDCG, Mean average precision (MAP), Mean Reciprocal Rank (MRR), Recommender Systems Medium
k-means k-means Medium
Range of R2 when combining regressions Linear regression, Goodness of fit, R-squared, Correlation Medium
ROC vs. PR curve AUC, ROC, AUPR, Precision, Recall, Evaluation metrics Medium
L1 (Lasso) vs. L2 (Ridge) regularization L1, L2, Regularization, Lasso, Ridge Medium
Example of high bias and high variance Bias-variance tradeoff Medium
Tokens vs. Words LLMs, Tokenization Easy
AUC ROC and predicted output transformations AUC ROC Easy
Linear regression likelihood function Linear regression, Likelihood function, Formula derivation Medium
Analyze prediction error Prediction error, Bias-variance tradeoff, Diagnostics, Learning curves Medium
Model interpretability Interpretability Easy
Encoding categorical features Categorical features, Embeddings, One-hot encoding, Hashing Easy
( Subscription required ) Negative sampling Neural networks, Deep learning, Negative sampling Medium
( Subscription required ) Non-probability sampling Sampling, Non-probability Easy
( Subscription required ) Neural networks in layman's terms Neural networks, Deep learning Easy
( Subscription required ) Overfitting in neural networks Neural networks, Deep learning, Overfitting Medium
( Subscription required ) Multicollinearity Multicollinearity, Linear regression Medium
( Subscription required ) Multi-headed attention and self-attention Neural networks, Deep learning, Transformers, LLMs Medium
( Subscription required ) MSE vs. MAE MSE, MAE Easy
( Subscription required ) Momentum Neural networks, Deep learning, Optimization Hard
( Subscription required ) Missing data Missing data Easy
( Subscription required ) Minimization of loss function intuition Neural networks, Deep learning, Optimization Easy
( Subscription required ) Random forest feature importance Feature importance, Explainability, Gini importance, Permutation importance Medium
( Subscription required ) Weighted and importance sampling Sampling, Weighted sampling, Importance sampling Easy
( Subscription required ) Weight initialization Neural networks, Deep learning Medium
( Subscription required ) Vanishing and exploding gradients (mathematical explanation) Neural networks, Deep learning, Mathematical explanation Hard
( Subscription required ) Transfer learning vs. knowledge distillation LLM, Deep learning, Transfer learning, Knowledge distillation Medium
( Subscription required ) Transfer learning Neural networks, Deep learning, Transformers, LLMs, Catastrophic forgetting Easy
( Subscription required ) SMOTE Imbalanced classification, SMOTE, Data augmentation Easy
( Subscription required ) Self-supervised learning Neural networks, Deep learning, Contrastive learning Easy
( Subscription required ) Random vs. stratified sampling Sampling, Stratified sampling Easy
( Subscription required ) Normalization in neural networks Neural networks, Deep learning, Batch normalization, Layer normalization Medium
( Subscription required ) Prove that the median minimizes MAE MAE, Median, Formula derivation, Proof Hard
( Subscription required ) Principal Component Analysis (PCA) PCA Easy
( Subscription required ) Positional embeddings Feature engineering, Deep learning, Transformers, LLMs, Positional embeddings, Positional encodings Hard
( Subscription required ) Linear regression assumptions Linear regression Easy
( Subscription required ) Outliers Outliers, Cook’s distance, Regularization Easy
( Subscription required ) Optimize multiple objectives Modeling, Multiple objectives Easy
( Subscription required ) Not enough data to train a model Data limitations Easy
( Subscription required ) Normalization vs. Standardization Linear regression, Standardization, Normalization Easy
( Subscription required ) Bootstrap Bootstrap Easy
( Subscription required ) Examples of encoder and decoder models LLMs, Transformers, Encoder, Decoder Easy
( Subscription required ) Ensembles Ensembles, Numerical example Medium
( Subscription required ) Discretization drawbacks Categorical variables, Discretization Easy
( Subscription required ) Decide between a multinomial vs. a binary modeling approach Modeling, Multinomial, Binary Easy
( Subscription required ) Cross validation Cross validation, Offline evaluation Easy
( Subscription required ) Correlation with binary variables Correlation, Hypothesis testing, Point-biserial correlation coefficient Easy
( Subscription required ) Comparing decision trees with random forests Decision trees, Random forests Easy
( Subscription required ) Common causes of data leakage Data leakage Medium
( Subscription required ) Exponentially weighted moving average Exponentially weighted moving average, Formula derivation, Proof Medium
( Subscription required ) Bias-variance biased estimator Bias-variance tradeoff Medium
( Subscription required ) Bayesian frequentist statistics Bayesian, Frequentist Easy
( Subscription required ) Baselines Model evaluation Easy
( Subscription required ) Attention (intuition) Neural networks, Deep learning, Transformers, LLMs Easy
( Subscription required ) Approximate nearest neighbors ANN, ANNOY, Nearest neighbors Medium
( Subscription required ) Adaptive learning rate Neural networks, Deep learning, Optimization Hard
( Subscription required ) Adagrad vs. RMSProp vs. Adam Neural networks, Deep learning, Optimization Hard
( Subscription required ) Active learning Labels, Label sampling Medium
( Subscription required ) Hypothesis testing in regression coefficients Linear regression, Hypothesis testing Medium
( Subscription required ) Logistic regression and standardization Logistic regression, Standardization Easy
( Subscription required ) Linear regression with stochastic gradient descent (formula derivation) Linear regression, Stochastic gradient descent, Formula derivation Medium
( Subscription required ) Linear regression with duplicated rows Linear regression, Statistical significance Easy
( Subscription required ) Activation functions Neural networks, Deep learning Easy
( Subscription required ) L2 regularization vs. weight decay Neural networks, Deep learning, Regularization Hard
( Subscription required ) Interpretability ML interpretability Easy
( Subscription required ) Intercept Linear regression, Intercept Easy
( Subscription required ) Information gain in decision trees Decision tree, Entropy, Information gain, Formula derivation Medium
( Subscription required ) Logistic regression assumptions Logistic regression Easy
( Subscription required ) How to get more labels Modeling, Label encoding Medium
( Subscription required ) Gradient descent vs. stochastic gradient descent and learning rate Gradient descent (GD), Stochastic gradient descent (SGD), Learning rate, Optimization, Deep learning Medium
( Subscription required ) Gradient descent vs. stochastic gradient descent and local minima Gradient descent (GD), Stochastic gradient descent (SGD), Local minima, Optimization, Deep learning Medium
( Subscription required ) Gradient boosting vs. random forests Gradient boosting, Random forests, Bagging, Boosting Medium
( Subscription required ) Gini impurity vs. information gain Decision tree, Information Gain, Gini impurity Medium
( Subscription required ) Feature engineering in the era of deep learning Feature engineering Easy
( Subscription required ) Feature crossing Feature engineering, Deep learning Easy
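The answers themselves sit behind the entries above, but as a taste of the code-level fluency these questions probe, here is a minimal NumPy sketch contrasting the two strategies from the "Top-p vs. top-k sampling" question: top-k keeps a fixed number of the most likely tokens, while top-p (nucleus) keeps the smallest set whose cumulative probability reaches p. The function names and the toy distribution are illustrative, not taken from the answer key.

```python
import numpy as np

def top_k_filter(probs, k):
    """Zero out all but the k highest-probability tokens, then renormalize."""
    idx = np.argsort(probs)[::-1][:k]  # indices of the k largest probabilities
    out = np.zeros_like(probs)
    out[idx] = probs[idx]
    return out / out.sum()

def top_p_filter(probs, p):
    """Keep the smallest prefix of tokens (by descending probability)
    whose cumulative probability reaches p, then renormalize."""
    order = np.argsort(probs)[::-1]          # tokens from most to least likely
    cum = np.cumsum(probs[order])            # running cumulative probability
    cutoff = np.searchsorted(cum, p) + 1     # number of tokens in the nucleus
    out = np.zeros_like(probs)
    keep = order[:cutoff]
    out[keep] = probs[keep]
    return out / out.sum()

# Toy next-token distribution over a 4-token vocabulary.
probs = np.array([0.5, 0.3, 0.1, 0.1])
top_k_filter(probs, 2)    # fixed: always exactly 2 tokens survive
top_p_filter(probs, 0.8)  # adaptive: nucleus size depends on how peaked probs is
```

The key interview point this sketch makes concrete: top-k's candidate set has a fixed size regardless of the distribution's shape, whereas top-p's nucleus shrinks for peaked distributions and grows for flat ones.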