Minimum remove to make valid parentheses (Leetcode 1249)
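A minimal stack-based sketch for this problem (the function name `min_remove_to_make_valid` is my own): track the indices of unmatched parentheses in one pass, then rebuild the string without them.

```python
def min_remove_to_make_valid(s: str) -> str:
    stack = []          # indices of '(' not yet matched
    to_remove = set()   # indices of parentheses to drop
    for i, ch in enumerate(s):
        if ch == '(':
            stack.append(i)
        elif ch == ')':
            if stack:
                stack.pop()          # matches the most recent '('
            else:
                to_remove.add(i)     # unmatched ')'
    to_remove.update(stack)          # any '(' left over is unmatched
    return ''.join(ch for i, ch in enumerate(s) if i not in to_remove)
```

Runs in O(n) time and O(n) extra space, e.g. `min_remove_to_make_valid("lee(t(c)o)de)")` returns `"lee(t(c)o)de"`.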
Gradient descent vs. stochastic gradient descent
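A small sketch contrasting the two update rules on least-squares linear regression (function names and the learning rate `lr` are illustrative, not from any particular library): full-batch gradient descent uses the exact gradient over all examples, while SGD uses a noisy gradient from a single randomly chosen example per step.

```python
import numpy as np

def gradient_descent_step(w, X, y, lr=0.01):
    # Full-batch gradient of the mean squared error: touches every example.
    grad = 2 * X.T @ (X @ w - y) / len(y)
    return w - lr * grad

def sgd_step(w, X, y, lr=0.01, rng=np.random.default_rng()):
    # Stochastic step: gradient estimated from one randomly sampled example.
    i = rng.integers(len(y))
    xi, yi = X[i], y[i]
    grad = 2 * xi * (xi @ w - yi)
    return w - lr * grad
```

Each SGD step is far cheaper but noisier; in expectation its gradient matches the full-batch one.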
\[ Error = \sigma^2 + Bias^2(\hat{f}(x)) + Var(\hat{f}(x)) \]
where:
\[ \begin{align} Bias^2(\hat{f}(x)) &= \big( E[\hat{f}(x)] - f(x) \big)^2 \\ Var(\hat{f}(x)) &= E \bigg[ \big( \hat{f}(x) - E[\hat{f}(x)] \big)^2 \bigg] \\ \sigma^2 &= E[(y - f(x))^2] \end{align} \]
(\(\sigma^2\) is the variance of the unobserved error \(y - f(x)\), i.e. the irreducible noise.) High-bias models tend to underfit; high-variance models tend to overfit. Lowering the bias typically increases the variance and vice versa, hence the trade-off.
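A small Monte Carlo sketch of these quantities (the sine target, the polynomial fits, and the evaluation point `x0` are illustrative assumptions): it repeatedly draws training sets, refits an estimator \(\hat{f}\), and estimates \(Bias^2\) and \(Var\) at a single point, showing that a rigid model has higher bias and lower variance than a flexible one.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_f(x):
    # The (normally unknown) target function f(x).
    return np.sin(x)

def bias_variance_at(x0, degree, n_train=30, n_trials=500, noise_sd=0.3):
    # Repeatedly draw training sets, fit a polynomial of the given degree,
    # and collect its predictions at the single point x0.
    preds = np.empty(n_trials)
    for t in range(n_trials):
        x = rng.uniform(0.0, 3.0, n_train)
        y = true_f(x) + rng.normal(0.0, noise_sd, n_train)  # y = f(x) + noise
        coefs = np.polyfit(x, y, degree)
        preds[t] = np.polyval(coefs, x0)
    bias_sq = (preds.mean() - true_f(x0)) ** 2  # (E[f_hat(x0)] - f(x0))^2
    var = preds.var()                           # E[(f_hat(x0) - E[f_hat(x0)])^2]
    return bias_sq, var

# A rigid model (degree 1) vs. a flexible one (degree 10):
for d in (1, 10):
    b2, v = bias_variance_at(x0=1.5, degree=d)
    print(f"degree={d:2d}: bias^2={b2:.4f}, variance={v:.4f}")
```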
Bias-variance tradeoff: formula derivation