Easy, medium, and hard questions covering most topics in machine learning and data science interviews. Solutions dive deep into complicated concepts, with references and simulations where needed.
Problem | Role | Area | Topics | Difficulty | Company | Status |
---|---|---|---|---|---|---|
Sample Ratio Mismatch | DS PDS AS | A/B Testing | Sample Ratio Mismatch | Medium | Duolingo Snap Uber | |
Questions worksheet (recruiter-screen) | DS AS MLE PDS | Application process | Discussion prep, Application prep | Medium | Meta Robinhood Tik Tok | |
Questions worksheet (behavioral) | DS AS MLE PDS | Behavioral | Discussion prep | Hard | Chan-Zuckerberg Discord Uber | |
Minimum remove to make valid parentheses (Leetcode 1249) | DS AS MLE | Data Structures and Algorithms | Stack | Medium | Amazon | |
Remove duplicates in place (Leetcode 26) | DS AS MLE | Data Structures and Algorithms | Array, Two pointers | Easy | ||
Bias-variance equation | DS AS MLE | Machine Learning | Bias-variance tradeoff, Formula derivation | Easy | Fidelity Google Salesforce Workiva | |
Linear regression with gradient descent | DS AS MLE | Machine Learning Coding | Gradient descent, Linear regression | Medium | Apple Hinge Uber | |
Gradient descent vs. stochastic gradient descent | DS AS MLE | Machine Learning | Gradient descent, Stochastic gradient descent, Minibatch | Medium | Paypal | |
Imbalanced dataset | PDS DS AS MLE | Machine Learning | Class imbalance | Medium | Discord Paypal Salesforce | |
Sudden drop in user engagement | DS PDS | Metrics | Problem solving, Root-cause analysis | Medium | Snap | |
Documents edited by AI | DS PDS AS | Probability theory | Bayes rule, Conditional probability | Easy | Indeed Snap Thumbtack Warner Bros | |
Trailing by two: should we go for two or three? | DS PDS AS | Probability theory | Independence, Decision making | Easy | Microsoft | |
Artist maxranks | DS AS PDS | SQL | Join | Easy | Paypal | |
Two-sample t-test | DS PDS AS | Statistics | Hypothesis testing | Easy | ||
Choose a project | DS AS MLE PDS | Technical deep dive | Discussion prep | Hard | ||
Normality assumption | DS PDS AS | A/B Testing | Normality | Medium | ||
Reducing variance in AB testing | DS PDS AS | A/B Testing | Variance | Medium | Apple | |
Simpson’s paradox | DS PDS AS | A/B Testing | Simpson’s paradox | Medium | Fidelity Snap | |
Interview success probability | DS AS MLE PDS | Application process | Application prep | Easy | ||
Offer validity and cooling-off periods | DS AS MLE PDS | Application process | Application prep | Easy | ||
Referrals vs. online applications | DS AS MLE PDS | Application process | Application prep | Easy | ||
Response time | DS AS MLE PDS | Application process | Application prep | Easy | ||
Resume review | DS AS MLE PDS | Application process | Application prep, Resume | Easy | ||
Timeline: from application to offer | DS AS MLE PDS | Application process | Application prep | Easy | ||
Up-level during interview | DS AS MLE PDS | Application process | Application prep | Easy | ||
K Closest Points to Origin (Leetcode 973) | DS AS MLE | Data Structures and Algorithms | Heap | Easy | ||
Longest substring without repeating characters (Leetcode 3) | DS AS MLE | Data Structures and Algorithms | Hash, Sliding Window | Medium | Amazon | |
Number of recent calls (Leetcode 933) | DS AS MLE | Data Structures and Algorithms | Queue, Deque | Easy | ||
Analyze prediction error | DS AS MLE | Machine Learning | Prediction error, Bias-variance tradeoff, Diagnostics, Learning curves | Medium | Google Salesforce Snap Thumbtack | |
AUC ROC and predicted output transformations | DS AS MLE | Machine Learning | AUC ROC | Easy | ||
Sample from random generator | DS AS MLE | Machine Learning Coding | Sample, Uniform, Random number generator | Medium | Google Snap | |
Simulate dynamic coin flips | DS AS MLE | Machine Learning Coding | Simulation | Easy | Cruise | |
Encoding categorical features | PDS DS AS MLE | Machine Learning | Categorical features, Embeddings, One-hot encoding, Hashing | Easy | Apple Hinge Thumbtack | |
Example of high-bias and high variance | DS AS MLE | Machine Learning | Bias-variance tradeoff | Medium | ||
k-means | PDS DS AS MLE | Machine Learning | k-means | Medium | ||
L1 (Lasso) vs. L2 (Ridge) regularization | DS AS MLE | Machine Learning | L1, L2, Regularization, Lasso, Ridge | Medium | Instacart Uber | |
Linear regression likelihood function | DS AS MLE | Machine Learning | Linear regression, Likelihood function, Formula derivation | Medium | Uber | |
Model interpretability | DS AS MLE | Machine Learning | Interpretability | Easy | Paylocity | |
Multiclass evaluation metrics | DS AS MLE | Machine Learning | Multiclass, Diagnostics | Medium | Fidelity Paypal | |
NDCG vs. Mean Average Precision (MAP) vs. Mean Reciprocal Rank (MRR) | AS MLE | Machine Learning | Ranking metrics, NDCG, Mean average precision (MAP), Mean Reciprocal Rank (MRR), Recommender Systems | Medium | Apple Coursera Indeed Paypal | |
Range of R² when combining regressions | PDS DS AS MLE | Machine Learning | Linear regression, Goodness of fit, R-squared, Correlation | Medium | D.E. Shaw | |
ROC vs. PR curve | AS MLE | Machine Learning | AUC, ROC, AUPR, Precision, Recall, Evaluation metrics | Medium | Amazon Google Meta Reddit | |
Measuring counterfactual impact | DS PDS | Metrics | Problem solving, Counterfactual | Hard | Meta Snap | |
Games between two players | DS PDS AS | Probability theory | Recursive relationship | Medium | Amazon | |
Monty Hall | DS PDS AS | Probability theory | Bayes rule, Conditional independence, Prior evidence | Medium | ||
Red and blue balls | DS PDS AS | Probability theory | Counting, Combinations, Repetition, Binomial | Easy | Indeed | |
Sum of normally distributed random variables | DS PDS AS | Probability theory | Normal, PDF, CDF | Medium | Meta | |
Two children I | DS PDS AS | Probability theory | Prior evidence | Easy | Amazon | |
Unfair coin probability | DS PDS AS | Probability theory | Bayes rule, Conditional probability | Easy | ||
Choose house or techno | DS AS PDS | SQL | Logical OR | Easy | ||
Expensive house songs I | DS AS PDS | SQL | Subquery, CTE, Join | Medium | ||
Expensive house songs II | DS AS PDS | SQL | Subquery, CTE, Join, Window functions | Hard | Google Snap Tik Tok | |
Histogram of songs | DS AS PDS | SQL | Recursive CTE, Join | Hard | ||
Songs that did not enter the charts or entered high | DS AS PDS | SQL | Subquery, Join | Medium | ||
Songs that ranked 1 to 50 | DS AS PDS | SQL | Between | Easy | ||
Songs that stay in the charts for a while | DS AS PDS | SQL | Subquery, CTE, Join, ALL, Window functions | Medium | ||
Biased coin | DS PDS AS | Statistics | Expectation, CLT, Binomial, Normal, Bernoulli, Hypothesis testing, CDF | Medium | Meta | |
Gambler’s ruin win probability | DS PDS AS | Statistics | Gambler ruin, Random walk, Expectation | Medium | Meta | |
Manual estimation of flips | DS PDS AS | Statistics | Normal, CDF, Binomial, CLT | Medium | ||
Measuring sticks | DS PDS AS | Statistics | Variance | Medium | Sisu | |
Monotonic draws | DS PDS AS | Statistics | Expectation | Hard | ||
Prussian horses | DS PDS AS | Statistics | Poisson, Hypothesis testing, CDF | Medium | Indeed | |
Relationship between p-val and confidence interval | DS PDS AS | Statistics | Confidence interval, P-value, Hypothesis testing | Easy | ||
Questions worksheet (deep dive) | DS AS MLE PDS | Technical deep dive | Discussion prep | Hard | ||
AA tests | DS PDS AS | A/B Testing | Variance | Easy | Apple Snap Uber | |
Counterfactual definition | DS PDS AS | A/B Testing | Counterfactual | Easy | ||
Equal-sized treatment and control groups | DS PDS AS | A/B Testing | Power, Variance, Sample size | Medium | Snap | |
False discovery control | DS PDS AS | A/B Testing | False discovery rate, Multiple hypotheses testing, Benjamini & Hochberg, Bonferroni | Easy | Discord Pinterest | |
Interference | DS PDS AS | A/B Testing | Interference | Easy | Snap TaskRabbit Uber | |
Multi-armed and contextual bandits in AB testing | DS PDS AS | A/B Testing | Contextual bandits, Multi-armed bandits | Medium | ||
Novelty and primacy effects | DS PDS AS | A/B Testing | Novelty effects, Primacy effects | Easy | Uber | |
Randomization checks | DS PDS AS | A/B Testing | Randomization | Easy | ||
Randomization level | DS PDS AS | A/B Testing | Randomization, Variance | Medium | ||
Climbing stairs (Leetcode 70) | DS AS MLE | Data Structures and Algorithms | Recursion, Dynamic programming | Easy | ||
Find if Path Exists in Graph (Leetcode 1971) | DS AS MLE | Data Structures and Algorithms | DFS, BFS | Medium | Amazon | |
Search in Binary Search Tree (Leetcode 700) | DS AS MLE | Data Structures and Algorithms | Binary Search, Binary Search Tree | Easy | ||
Sort an Array (Leetcode 912) | DS AS MLE | Data Structures and Algorithms | Recursion, Sorting | Medium | ||
Activation functions | AS MLE | Machine Learning | Neural networks, Deep learning | Easy | Cruise | |
Active learning | AS MLE | Machine Learning | Labels, Label sampling | Medium | Dropbox | |
Adagrad vs. RMSProp vs. Adam | AS MLE | Machine Learning | Neural networks, Deep learning, Optimization | Hard | Apple Cruise Instacart | |
Adaptive learning rate | AS MLE | Machine Learning | Neural networks, Deep learning, Optimization | Hard | Cruise Instacart Two Sigma | |
Approximate nearest neighbors | AS MLE | Machine Learning | ANN, ANNOY, Nearest neighbors | Medium | Dropbox Lacework | |
Attention (intuition) | AS MLE | Machine Learning | Neural networks, Deep learning, Transformers, LLMs | Easy | Hinge Lacework Tik Tok | |
Baselines | AS MLE DS | Machine Learning | Model evaluation | Easy | ||
Bayesian frequentist statistics | AS MLE | Machine Learning | Bayesian, Frequentist | Easy | ||
Bias-variance biased estimator | DS AS MLE | Machine Learning | Bias-variance tradeoff | Medium | ||
Bootstrap | PDS DS AS MLE | Machine Learning | Bootstrap | Easy | Google LinkedIn | |
K-means from scratch | DS AS MLE | Machine Learning Coding | k-means | Medium | Amazon Etsy Snap | |
Linear regression with stochastic gradient descent | DS AS MLE | Machine Learning Coding | Stochastic Gradient descent, Linear regression | Medium | ||
Logistic regression with gradient descent | DS AS MLE | Machine Learning Coding | Gradient descent, Logistic regression | Medium | ||
Naive Bayes from scratch | DS AS MLE | Machine Learning Coding | Gaussian Naive Bayes | Medium | Hinge | |
Neural network implementation | AS MLE | Machine Learning Coding | Gradient descent, Neural networks, Neuron | Hard | Uber | |
Principal Component Analysis (PCA) from scratch | DS AS MLE | Machine Learning Coding | Principal Component Analysis (PCA) | Medium | ||
Common causes of data leakage | DS AS MLE | Machine Learning | Data leakage | Medium | ||
Comparing decision trees with random forests | DS AS MLE | Machine Learning | Decision trees, Random forests | Easy | Paypal | |
Correlation with binary variables | DS AS MLE | Machine Learning | Correlation, Hypothesis testing, Point-biserial correlation coefficient | Easy | Meta | |
Cross validation | PDS DS AS MLE | Machine Learning | Cross validation, Offline evaluation | Easy | Amazon Amperity Discord | |
Decide between a multinomial vs. a binary modeling approach | AS MLE | Machine Learning | Modeling, Multinomial, Binary | Easy | ||
Discretization drawbacks | DS AS MLE | Machine Learning | Categorical variables, Discretization | Easy | ||
Ensembles | AS MLE DS | Machine Learning | Ensembles, Numerical example | Medium | ||
Examples of encoder and decoder models | AS MLE | Machine Learning | LLMs, Transformers, Encoder, Decoder | Easy | Fidelity Salesforce | |
Exponentially weighted moving average | AS MLE | Machine Learning | Exponentially weighted moving average, Formula derivation, Proof | Medium | ||
Feature crossing | AS MLE | Machine Learning | Feature engineering, Deep learning | Easy | Meta | |
Feature engineering in the era of deep learning | DS AS MLE | Machine Learning | Feature engineering | Easy | ||
Gini impurity vs. information gain | AS MLE | Machine Learning | Decision tree, Information Gain, Gini impurity | Medium | ||
Gradient boosting vs. random forests | DS AS MLE | Machine Learning | Gradient boosting, Random forests, Bagging, Boosting | Medium | Indeed Instacart Meta Salesforce | |
Gradient descent vs. Stochastic Gradient descent and local minima | AS MLE | Machine Learning | Gradient descent (GD), Stochastic gradient descent (SGD), Local minima, Optimization, Deep learning | Medium | Cruise Discord Google Indeed | |
Gradient descent vs. Stochastic Gradient descent and learning rate | AS MLE | Machine Learning | Gradient descent (GD), Stochastic gradient descent (SGD), Learning rate, Optimization, Deep learning | Medium | Paypal | |
Predict whether a movie will receive good reviews | AS MLE | Machine Learning Hands on | Feature engineering, Data exploration, ML modeling, Logistic regression, One hot encoding | Hard | ||
How to get more labels | AS MLE | Machine Learning | Modeling, Label encoding | Medium | ||
Hypothesis testing in regression coefficients | DS AS MLE | Machine Learning | Linear regression, Hypothesis testing | Medium | ||
Information gain in decision trees | AS MLE | Machine Learning | Decision tree, Entropy, Information Gain, Formula derivation | Medium | ||
Intercept | PDS DS AS MLE | Machine Learning | Linear regression, Intercept | Easy | ||
Interpretability | PDS DS AS MLE | Machine Learning | ML interpretability | Easy | Apple | |
L2 regularization vs. weight decay | AS MLE | Machine Learning | Neural networks, Deep learning, Regularization | Hard | Apple Google Instacart ROKT | |
Linear regression assumptions | PDS DS AS MLE | Machine Learning | Linear regression | Easy | Fidelity | |
Linear regression with duplicated rows | DS AS MLE | Machine Learning | Linear regression, Statistical significance | Easy | ||
Linear regression with stochastic gradient descent (formula derivation) | AS MLE | Machine Learning | Linear regression, Stochastic gradient descent, Formula derivation | Medium | ||
Logistic regression and standardization | DS AS MLE | Machine Learning | Logistic regression, Standardization | Easy | Indeed Paypal | |
Logistic regression assumptions | PDS DS AS MLE | Machine Learning | Logistic regression | Easy | Indeed Paypal Warner Bros | |
Minimization of loss function intuition | AS MLE | Machine Learning | Neural networks, Deep learning, Optimization | Easy | Robinhood Uber | |
Missing data | PDS DS AS MLE | Machine Learning | Missing data | Easy | Apple Flatiron Health | |
Momentum | AS MLE | Machine Learning | Neural networks, Deep learning, Optimization | Hard | Apple Cruise Instacart Paypal | |
MSE vs. MAE | PDS DS AS MLE | Machine Learning | MSE, MAE | Easy | ||
Multi-headed attention and self attention | AS MLE | Machine Learning | Neural networks, Deep learning, Transformers, LLMs | Medium | Amazon Google | |
Multicollinearity | PDS DS AS MLE | Machine Learning | Multicollinearity, Linear regression | Medium | Paypal | |
Negative sampling | AS MLE | Machine Learning | Neural networks, Deep learning, Negative sampling | Medium | Dropbox Reddit | |
Neural networks in layman's terms | AS MLE | Machine Learning | Neural networks, Deep learning | Easy | Fidelity | |
Non-probability sampling | DS AS MLE | Machine Learning | Sampling, Non-probability | Easy | ||
Normalization in neural networks | AS MLE | Machine Learning | Neural networks, Deep learning, Batch normalization, Layer normalization | Medium | Amazon Google Tinder | |
Normalization vs. Standardization | PDS DS AS MLE | Machine Learning | Linear regression, Standardization, Normalization | Easy | ||
Not enough data to train a model | DS AS MLE | Machine Learning | Data limitations | Easy | ||
Optimize multiple objectives | AS MLE | Machine Learning | Modeling, Multiple objectives | Easy | ||
Outliers | PDS DS AS MLE | Machine Learning | Outliers, Cook’s distance, Regularization | Easy | ||
Overfitting in neural networks | AS MLE | Machine Learning | Neural networks, Deep learning, Overfitting | Medium | Apple Google Indeed Instacart | |
Positional embeddings | AS MLE | Machine Learning | Feature engineering, Deep learning, Transformers, LLMs, Positional embeddings, Positional encodings | Hard | ||
Principal Component Analysis (PCA) | PDS DS AS MLE | Machine Learning | PCA | Easy | ||
Prove that a median minimizes MAE | AS MLE | Machine Learning | MAE, Median, Formula derivation, Proof | Hard | LinkedIn Uber | |
Random forest feature importance | AS MLE | Machine Learning | Feature importance, Explainability, Gini importance, Permutation importance | Medium | Discord Grammarly Hinge | |
Random vs. stratified sampling | PDS DS AS MLE | Machine Learning | Sampling, Stratified sampling | Easy | Meta | |
Self-supervised learning | AS MLE | Machine Learning | Neural networks, Deep learning, Contrastive learning | Easy | Dropbox Reddit | |
SMOTE | AS MLE | Machine Learning | Imbalanced classification, SMOTE, Data augmentation | Easy | Paypal Robinhood Snap Thumbtack | |
API patterns | MLE | Machine Learning System Design | APIs, GraphQL, REST | Easy | Lacework | |
Build an ML system to predict Ad clicks | AS MLE | Machine Learning System Design | ML system design, Feature engineering, Data exploration, ML modeling, Monitoring, Deployment, Business metrics | Hard | Meta | |
Cloud vs. on-device deployment | MLE | Machine Learning System Design | Deployment, Cloud, Edge | Medium | ||
Complex vs. simple deployment | MLE | Machine Learning System Design | Deployment | Easy | ||
Crons, schedulers, orchestrators | MLE | Machine Learning System Design | ML infra | Medium | Dropbox | |
Data, model, and pipeline parallelism | MLE | Machine Learning System Design | Parallelism | Medium | ||
Debug an ML model | MLE | Machine Learning System Design | Best practices | Medium | ||
How to speed up inference | MLE | Machine Learning System Design | Inference | Easy | Dropbox Grammarly Robinhood Uber | |
ML system design tools and use cases | MLE | Machine Learning System Design | ML infra, CDN, Kafka, Redis, Dynamo, Cassandra, Chubby, PGVector, DBT, Feast, MLFlow, Statsig, Airflow, Fiddler | Hard | Hinge Reddit | |
Online prediction vs. batch prediction | MLE | Machine Learning System Design | Inference | Medium | Dropbox Reddit Uber | |
Simple model deployment process | MLE | Machine Learning System Design | Deployment, Docker | Easy | ||
Training tracking | MLE | Machine Learning System Design | Best practices | Medium | ||
Types of data distribution shifts | MLE | Machine Learning System Design | Train-serving skew, Covariate shift, Label shift, Concept shift | Medium | Dropbox Lacework | |
Transfer learning | AS MLE | Machine Learning | Neural networks, Deep learning, Transformers, LLMs, Catastrophic forgetting | Easy | ||
Transfer learning vs. knowledge distillation | AS MLE | Machine Learning | LLM, Deep learning, Transfer learning, Knowledge distillation | Medium | ||
Vanishing and exploding gradients (mathematical explanation) | AS MLE | Machine Learning | Neural networks, Deep learning, Mathematical explanation | Hard | Robinhood Salesforce | |
Weight initialization | AS MLE | Machine Learning | Neural networks, Deep learning | Medium | Tinder | |
Weighted and importance sampling | DS AS MLE | Machine Learning | Sampling, Weighted sampling, Importance sampling | Easy | ||
Characteristics of metrics | DS PDS | Metrics | Characteristics of metrics | Easy | Duolingo Snap | |
Types of metrics | DS PDS | Metrics | Types of metrics | Easy | ||
Consecutive tails | DS PDS AS | Probability theory | Permutations, Repetition | Easy | ||
Largest number rolled | DS PDS AS | Probability theory | Counting, Permutations, Repetition | Medium | ||
Median probability | DS PDS AS | Probability theory | Binomial, Uniform, CDF | Medium | Meta | |
Number of emails | DS PDS AS | Probability theory | Poisson distribution | Easy | Cruise | |
Paths to destination | DS PDS AS | Probability theory | Counting, Combinations | Easy | ||
Repeated rolls until 4 | DS PDS AS | Probability theory | Geometric distribution | Easy | ||
Sample digits 1-10 | DS PDS AS | Probability theory | Sample from samples, Uniform | Medium | Compass Google Snap | |
Two fair die rolls | DS PDS AS | Probability theory | Independence, CDF, PMF | Easy | ||
Artists with more songs than others | DS AS PDS | SQL | Subquery, CTE, Join, Window functions | Hard | Meta Snap Tik Tok | |
Concat columns | DS AS PDS | SQL | Concat | Easy | ||
Label recent songs | DS AS PDS | SQL | Case | Easy | ||
Median songs per artist | DS AS PDS | SQL | CTE, Window functions | Hard | ||
Songs in charts with greater durations | DS AS PDS | SQL | Subquery, CTE, Join, Window functions | Hard | ||
Songs per genre | DS AS PDS | SQL | Group by | Easy | ||
Songs with letters | DS AS PDS | SQL | Regexp | Easy | ||
An intuitive way to write power | DS PDS AS | Statistics | Power, Hypothesis testing | Easy | ||
Buy and sell stocks | DS PDS AS | Statistics | Gambler ruin, Expectation, Recursion, Random walk | Medium | D.E. Shaw | |
CI of flipping heads | DS PDS AS | Statistics | Confidence Interval, CLT, Bernoulli trials | Medium | ||
Confidence interval definition | DS PDS AS | Statistics | Confidence interval, Hypothesis testing | Easy | Meta | |
Confidence intervals that overlap | DS PDS AS | Statistics | Confidence interval, Hypothesis testing | Medium | ||
Covariance of dependent variables | DS PDS AS | Statistics | Variance, Uniform, Covariance, Expectation | Medium | ||
Distribution of a CDF | DS PDS AS | Statistics | CDF, Inverse transform | Medium | ||
Dynamic coin flips | DS PDS AS | Statistics | Expectation, Simulation | Hard | Cruise | |
Expected number of consecutive heads | DS PDS AS | Statistics | Expectation | Medium | ||
Number of draws to get greater than 1 | DS PDS AS | Statistics | Normal, Geometric, CDF, Expectation | Medium | ||
P-value definition | DS PDS AS | Statistics | P-value, Hypothesis testing | Easy | Google Meta Pinterest Snap | |
Tests for normality | DS PDS AS | Statistics | Hypothesis testing, Normality | Easy | Duolingo Snap |