Role-specific and topic-specific questions with answers

Easy, medium, and hard questions covering most topics in machine learning and data science interviews. Solutions explain complicated concepts in depth, with references and simulations where needed.

Problem Role Area Topics Difficulty Status Overall rating
Normality assumption PDS DS AS A/B Testing Normality Medium
Reducing variance in AB testing PDS DS AS A/B Testing Variance Medium
Sample Ratio Mismatch PDS DS AS A/B Testing Sample Ratio Mismatch Medium
Simpson’s paradox PDS DS AS A/B Testing Simpson’s paradox Medium
Questions worksheet (recruiter-screen) PDS DS AS MLE Application process Discussion prep, Application prep Medium
Referrals vs. online applications PDS DS AS MLE Application process Application prep Easy
Response time PDS DS AS MLE Application process Application prep Easy
Timeline: from application to offer PDS DS AS MLE Application process Application prep Easy
Questions worksheet (behavioral) PDS DS AS MLE Behavioral Discussion prep Hard
Climbing stairs (Leetcode 70) DS AS MLE Data Structures and Algorithms Recursion, Dynamic programming Easy
Find if Path Exists in Graph (Leetcode 1971) DS AS MLE Data Structures and Algorithms DFS, BFS Medium
K Closest Points to Origin (Leetcode 973) DS AS MLE Data Structures and Algorithms Heap Easy
Longest substring without repeating characters (Leetcode 3) DS AS MLE Data Structures and Algorithms Hash, Sliding Window Medium
Minimum remove to make valid parentheses (Leetcode 1249) DS AS MLE Data Structures and Algorithms Stack Medium
Number of recent calls (Leetcode 933) DS AS MLE Data Structures and Algorithms Queue, Deque Easy
Remove duplicates in place (Leetcode 26) DS AS MLE Data Structures and Algorithms Array, Two pointers Easy
Search in Binary Search Tree (Leetcode 700) DS AS MLE Data Structures and Algorithms Binary Search, Binary Search Tree Easy
Sort an Array (Leetcode 912) DS AS MLE Data Structures and Algorithms Recursion, Sorting Medium
Analyze prediction error DS AS MLE Machine Learning Prediction error, Bias-variance tradeoff, Diagnostics, Learning curves Medium
Bias-variance equation DS AS MLE Machine Learning Bias-variance tradeoff, Formula derivation Easy
Linear regression with gradient descent DS AS MLE Machine Learning Coding Gradient descent, Linear regression Medium
Sample from random generator DS AS MLE Machine Learning Coding Sample, Uniform, Random number generator Medium
Gradient descent vs. stochastic gradient descent DS AS MLE Machine Learning Gradient descent, Stochastic gradient descent, Minibatch Medium
Imbalanced dataset PDS DS AS MLE Machine Learning Class imbalance Medium
k-means PDS DS AS MLE Machine Learning k-means Medium
L1 (Lasso) vs. L2 (Ridge) regularization DS AS MLE Machine Learning L1, L2, Regularization, Lasso, Ridge Medium
Linear regression likelihood function DS AS MLE Machine Learning Linear regression, Likelihood function, Formula derivation Medium
Model interpretability DS AS MLE Machine Learning Interpretability Easy
Multiclass evaluation metrics DS AS MLE Machine Learning Multiclass, Diagnostics Medium
NDCG vs. Mean Average Precision (MAP) vs. Mean Reciprocal Rank (MRR) AS MLE Machine Learning Ranking metrics, NDCG, Mean average precision (MAP), Mean Reciprocal Rank (MRR), Recommender Systems Medium
ROC vs. PR curve AS MLE Machine Learning AUC, ROC, AUPR, Precision, Recall, Evaluation metrics Medium
Measuring counterfactual impact PDS DS Metrics Problem solving, Counterfactual Hard
Sudden drop in user engagement PDS DS Metrics Problem solving, Root-cause analysis Medium
Documents edited by AI PDS DS AS Probability theory Bayes rule, Conditional probability Easy
Monty Hall PDS DS AS Probability theory Bayes rule, Conditional independence, Prior evidence Medium
Red and blue balls PDS DS AS Probability theory Counting, Combinations, Repetition, Binomial Easy
Trailing by two: should we go for two or three? PDS DS AS Probability theory Independence, Decision making Easy
Two children I PDS DS AS Probability theory Prior evidence Easy
Artist maxranks PDS DS SQL Join Easy
Histogram of songs PDS DS SQL Recursive CTE, Join Hard
Songs that did not enter the charts or entered high PDS DS SQL Subquery, Join Medium
Songs that stay in the charts for a while PDS DS SQL Subquery, CTE, Join, ALL, Window functions Medium
Gambler’s ruin win probability PDS DS AS Statistics Gambler’s ruin, Random walk, Expectation Medium
Manual estimation of flips PDS DS AS Statistics Normal, CDF, Binomial, CLT Medium
Measuring sticks PDS DS AS Statistics Variance Medium
Monotonic draws PDS DS AS Statistics Expectation Hard
Relationship between p-value and confidence interval PDS DS AS Statistics Confidence interval, P-value, Hypothesis testing Easy
Two-sample t-test PDS DS AS Statistics Hypothesis testing Easy
Choose a project PDS DS AS MLE Technical deep dive Discussion prep Hard
Questions worksheet (deep dive) PDS DS AS MLE Technical deep dive Discussion prep Hard
   Interview success probability PDS DS AS MLE Application process Application prep Easy
   Offer validity and cooling-off periods PDS DS AS MLE Application process Application prep Easy
   Resume review PDS DS AS MLE Application process Application prep, Resume Easy
   Up-level during interview PDS DS AS MLE Application process Application prep Easy
   AUC ROC and predicted output transformations DS AS MLE Machine Learning AUC ROC Easy
   Simulate dynamic coin flips DS AS MLE Machine Learning Coding Simulation Easy
   Encoding categorical features PDS DS AS MLE Machine Learning Categorical features, Embeddings, One-hot encoding, Hashing Easy
   Example of high-bias and high variance DS AS MLE Machine Learning Bias-variance tradeoff Medium
   Range of R2 when combining regressions PDS DS AS MLE Machine Learning Linear regression, Goodness of fit, R-squared, Correlation Medium
   Games between two players PDS DS AS Probability theory Recursive relationship Medium
   Sum of normally distributed random variables PDS DS AS Probability theory Normal, PDF, CDF Medium
   Unfair coin probability PDS DS AS Probability theory Bayes rule, Conditional probability Easy
   Choose house or techno PDS DS SQL Logical OR Easy
   Expensive house songs I PDS DS SQL Subquery, CTE, Join Medium
   Expensive house songs II PDS DS SQL Subquery, CTE, Join, Window functions Hard
   Songs that ranked 1 to 50 PDS DS SQL Between Easy
   Biased coin PDS DS AS Statistics Expectation, CLT, Binomial, Normal, Bernoulli, Hypothesis testing, CDF Medium
   Prussian horses PDS DS AS Statistics Poisson, Hypothesis testing, CDF Medium
   AA tests PDS DS AS A/B Testing Variance Easy
   Counterfactual definition PDS DS AS A/B Testing Counterfactual Easy
   Equal-sized treatment and control groups PDS DS AS A/B Testing Power, Variance, Sample size Medium
   False discovery control PDS DS AS A/B Testing False discovery rate, Multiple hypotheses testing, Benjamini & Hochberg, Bonferroni Easy
   Interference PDS DS AS A/B Testing Interference Easy
   Multi-armed and contextual bandits in AB testing PDS DS AS A/B Testing Contextual bandits, Multi-armed bandits Medium
   Novelty and primacy effects PDS DS AS A/B Testing Novelty effects, Primacy effects Easy
   Randomization checks PDS DS AS A/B Testing Randomization Easy
   Randomization level PDS DS AS A/B Testing Randomization, Variance Medium
   Activation functions AS MLE Machine Learning Neural networks, Deep learning Easy
   Active learning AS MLE Machine Learning Labels, Label sampling Medium
   Adagrad vs. RMSProp vs. Adam AS MLE Machine Learning Neural networks, Deep learning, Optimization Hard
   Adaptive learning rate AS MLE Machine Learning Neural networks, Deep learning, Optimization Hard
   Approximate nearest neighbors AS MLE Machine Learning ANN, ANNOY, Nearest neighbors Medium
   Attention (intuition) AS MLE Machine Learning Neural networks, Deep learning, Transformers, LLMs Easy
   Baselines AS MLE DS Machine Learning Model evaluation Easy
   Bayesian vs. Frequentist statistics AS MLE Machine Learning Bayesian, Frequentist Easy
   Bias-variance biased estimator DS AS MLE Machine Learning Bias-variance tradeoff Medium
   Bootstrap PDS DS AS MLE Machine Learning Bootstrap Easy
   K-means from scratch DS AS MLE Machine Learning Coding k-means Medium
   Linear regression with stochastic gradient descent DS AS MLE Machine Learning Coding Stochastic Gradient descent, Linear regression Medium
   Logistic regression with gradient descent DS AS MLE Machine Learning Coding Gradient descent, Logistic regression Medium
   Naive Bayes from scratch DS AS MLE Machine Learning Coding Gaussian Naive Bayes Medium
   Neural network implementation AS MLE Machine Learning Coding Gradient descent, Neural networks, Neuron Hard
   Principal Component Analysis (PCA) from scratch DS AS MLE Machine Learning Coding Principal Component Analysis (PCA) Medium
   Common causes of data leakage DS AS MLE Machine Learning Data leakage Medium
   Comparing decision trees with random forests DS AS MLE Machine Learning Decision trees, Random forests Easy
   Correlation with binary variables DS AS MLE Machine Learning Correlation, Hypothesis testing, Point-biserial correlation coefficient Easy
   Cross validation PDS DS AS MLE Machine Learning Cross validation, Offline evaluation Easy
   Decide between a multinomial vs. a binary modeling approach AS MLE Machine Learning Modeling, Multinomial, Binary Easy
   Discretization drawbacks DS AS MLE Machine Learning Categorical variables, Discretization Easy
   Ensembles AS MLE DS Machine Learning Ensembles, Numerical example Medium
   Examples of encoder and decoder models AS MLE Machine Learning LLMs, Transformers, Encoder, Decoder Easy
   Exponentially weighted moving average AS MLE Machine Learning Exponentially weighted moving average, Formula derivation, Proof Medium
   Feature crossing AS MLE Machine Learning Feature engineering, Deep learning Easy
   Feature engineering in the era of deep learning DS AS MLE Machine Learning Feature engineering Easy
   Gini impurity vs. information gain AS MLE Machine Learning Decision tree, Information Gain, Gini impurity Medium
   Gradient boosting vs. random forests DS AS MLE Machine Learning Gradient boosting, Random forests, Bagging, Boosting Medium
   Gradient descent vs. Stochastic Gradient descent and local minima AS MLE Machine Learning Gradient descent (GD), Stochastic gradient descent (SGD), Local minima, Optimization, Deep learning Medium
   Gradient descent vs. Stochastic Gradient descent and learning rate AS MLE Machine Learning Gradient descent (GD), Stochastic gradient descent (SGD), Learning rate, Optimization, Deep learning Medium
   How to get more labels AS MLE Machine Learning Modeling, Label encoding Medium
   Hypothesis testing in regression coefficients DS AS MLE Machine Learning Linear regression, Hypothesis testing Medium
   Information gain in decision trees AS MLE Machine Learning Decision tree, Entropy, Information Gain, Formula derivation Medium
   Intercept PDS DS AS MLE Machine Learning Linear regression, Intercept Easy
   Interpretability PDS DS AS MLE Machine Learning ML interpretability Easy
   L2 regularization vs. weight decay AS MLE Machine Learning Neural networks, Deep learning, Regularization Hard
   Linear regression assumptions PDS DS AS MLE Machine Learning Linear regression Easy
   Linear regression with duplicated rows DS AS MLE Machine Learning Linear regression, Statistical significance Easy
   Linear regression with stochastic gradient descent (formula derivation) AS MLE Machine Learning Linear regression, Stochastic gradient descent, Formula derivation Medium
   Logistic regression and standardization DS AS MLE Machine Learning Logistic regression, Standardization Easy
   Logistic regression assumptions PDS DS AS MLE Machine Learning Logistic regression Easy
   Minimization of loss function intuition AS MLE Machine Learning Neural networks, Deep learning, Optimization Easy
   Missing data PDS DS AS MLE Machine Learning Missing data Easy
   Momentum AS MLE Machine Learning Neural networks, Deep learning, Optimization Hard
   MSE vs. MAE PDS DS AS MLE Machine Learning MSE, MAE Easy
   Multi-headed attention and self attention AS MLE Machine Learning Neural networks, Deep learning, Transformers, LLMs Medium
   Multicollinearity PDS DS AS MLE Machine Learning Multicollinearity, Linear regression Medium
   Negative sampling AS MLE Machine Learning Neural networks, Deep learning, Negative sampling Medium
   Neural networks in layman's terms AS MLE Machine Learning Neural networks, Deep learning Easy
   Non-probability sampling DS AS MLE Machine Learning Sampling, Non-probability Easy
   Normalization in neural networks AS MLE Machine Learning Neural networks, Deep learning, Batch normalization, Layer normalization Medium
   Normalization vs. Standardization PDS DS AS MLE Machine Learning Linear regression, Standardization, Normalization Easy
   Not enough data to train a model DS AS MLE Machine Learning Data limitations Easy
   Optimize multiple objectives AS MLE Machine Learning Modeling, Multiple objectives Easy
   Outliers PDS DS AS MLE Machine Learning Outliers, Cook’s distance, Regularization Easy
   Overfitting in neural networks AS MLE Machine Learning Neural networks, Deep learning, Overfitting Medium
   Positional embeddings AS MLE Machine Learning Feature engineering, Deep learning, Transformers, LLMs, Positional embeddings, Positional encodings Hard
   Principal Component Analysis (PCA) PDS DS AS MLE Machine Learning PCA Easy
   Prove that a median minimizes MAE AS MLE Machine Learning MAE, Median, Formula derivation, Proof Hard
   Random forest feature importance AS MLE Machine Learning Feature importance, Explainability, Gini importance, Permutation importance Medium
   Random vs. stratified sampling PDS DS AS MLE Machine Learning Sampling, Stratified sampling Easy
   Self-supervised learning AS MLE Machine Learning Neural networks, Deep learning, Contrastive learning Easy
   SMOTE AS MLE Machine Learning Imbalanced classification, SMOTE, Data augmentation Easy
   Transfer learning AS MLE Machine Learning Neural networks, Deep learning, Transformers, LLMs, Catastrophic forgetting Easy
   Transfer learning vs. knowledge distillation AS MLE Machine Learning LLM, Deep learning, Transfer learning, Knowledge distillation Medium
   Vanishing and exploding gradients (mathematical explanation) AS MLE Machine Learning Neural networks, Deep learning, Mathematical explanation Hard
   Weight initialization AS MLE Machine Learning Neural networks, Deep learning Medium
   Weighted and importance sampling DS AS MLE Machine Learning Sampling, Weighted sampling, Importance sampling Easy
   Characteristics of metrics PDS DS Metrics Characteristics of metrics Easy
   Types of metrics PDS DS Metrics Types of metrics Easy
   Consecutive tails PDS DS AS Probability theory Permutations, Repetition Easy
   Largest number rolled PDS DS AS Probability theory Counting, Permutations, Repetition Medium
   Median probability PDS DS AS Probability theory Binomial, Uniform, CDF Medium
   Number of emails PDS DS AS Probability theory Poisson distribution Easy
   Paths to destination PDS DS AS Probability theory Counting, Combinations Easy
   Repeated rolls until 4 PDS DS AS Probability theory Geometric distribution Easy
   Sample digits 1-10 PDS DS AS Probability theory Sample from samples, Uniform Medium
   Two fair die rolls PDS DS AS Probability theory Independence, CDF, PMF Easy
   Artists with more songs than others PDS DS SQL Subquery, CTE, Join, Window functions Hard
   Concat columns PDS DS SQL Concat Easy
   Label recent songs PDS DS SQL Case Easy
   Median songs per artist PDS DS SQL CTE, Window functions Hard
   Songs in charts with greater durations PDS DS SQL Subquery, CTE, Join, Window functions Hard
   Songs per genre PDS DS SQL Group by Easy
   Songs with letters PDS DS SQL Regexp Easy
   An intuitive way to write power PDS DS AS Statistics Power, Hypothesis testing Easy
   Buy and sell stocks PDS DS AS Statistics Gambler’s ruin, Expectation, Recursion, Random walk Medium
   CI of flipping heads PDS DS AS Statistics Confidence Interval, CLT, Bernoulli trials Medium
   Confidence interval definition PDS DS AS Statistics Confidence interval, Hypothesis testing Easy
   Confidence intervals that overlap PDS DS AS Statistics Confidence interval, Hypothesis testing Medium
   Covariance of dependent variables PDS DS AS Statistics Variance, Uniform, Covariance, Expectation Medium
   Distribution of a CDF PDS DS AS Statistics CDF, Inverse transform Medium
   Dynamic coin flips PDS DS AS Statistics Expectation, Simulation Hard
   Expected number of consecutive heads PDS DS AS Statistics Expectation Medium
   Number of draws to get greater than 1 PDS DS AS Statistics Normal, Geometric, CDF, Expectation Medium
   P-value definition PDS DS AS Statistics P-value, Hypothesis testing Easy
   Tests for normality PDS DS AS Statistics Hypothesis testing, Normality Easy