Ridge Regression

Online Ridge Regression is the application of the regularized Least Squares method to the online linear regression setting.

The algorithm works as follows (all the vectors are columns):

Initialize vector $b=0$, and matrix $A = aI$;
FOR $t=1,2,\dots$
Read new $x_t \in \mathbb{R}^n$
Output prediction $\gamma_t = b'A^{-1}x_t$
$A = A + x_t x_t'$
Read new $y_t \mathbb{R}$
$b = b+y_tx_t$
END FOR.

The Aggregating Algorithm Regression (AAR) almost repeats RR, with the difference that in AAR the matrix $A$ is updated before the prediction has made. Upper bounds for the the square loss of Online Ridge Regression are proven by Azoury and Warmuth, 2001, Theorem 4.6, and then in the form of equality in Zhdanov and Vovk, 2010 (see also Zhdanov and Kalnishkan, 2010, for the kernelized version). The upper bound is

$$L_T \le \inf_\theta (L_T(\theta)+a||\theta||^2_2) + 4nY^2\ln(TB^2\slash{}a+1),$$

where $L_T(\theta)$ is the loss of any linear function of $x$. The third term in the bound is 4 times larger than the one of the bound for AAR.

After $T$ steps Ridge Regression minimizes the regularized error: its predictor $\theta$ achieves $\min_\theta \left(\sum_{t=1}^T (\theta' x_t - y_t)^2 + a\|\theta\|^2 \right)$, where $a \in \mathbb{R}$.

See also http://en.wikipedia.org/wiki/Ridge_regression.

  • Arthur E. Hoerl and Robert W. Kennard. Ridge Regression: biased estimation for nonorthogonal problems. Technometrics, 42:8086, 2000.
  • Katy S. Azoury and Manfred K. Warmuth. Relative loss bounds for on-line density estimation with the exponential family of distributions. Machine Learning, 43:211246, 2001.
  • Fedor Zhdanov and Vladimir Vovk. Competing with Gaussian linear experts. Technical report, arXiv:0910.4683 [cs.LG], arXiv.org e-Print archive, 2009.
  • Fedor Zhdanov and Yuri Kalnishkan. An identity for kernel ridge regression. In Proceedings of the 21st International Conference on Algorithmic Learning Theory, 2010.