# Gradient Descent

Gradient Descent is an algorithm for online linear regression. In the simplest case, the algorithm predicts a sequence of outcomes $\omega_t \in [0,1]$; the prediction space is $[0,1]$. The master's prediction is a weighted combination of the components of the input vector $x_t$ (the experts' predictions): $\gamma_t = w_t \cdot x_t$. At each step the weights are updated by a rule that depends on the gradient of the loss function: $w^k_{t+1} := w^k_t - \eta \lambda'_\gamma(\omega_t,\gamma_t)\, x^k_t$.
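The update rule can be illustrated with a short sketch. The following Python is a minimal, hypothetical implementation for the square loss $\lambda(\omega,\gamma) = (\omega-\gamma)^2$, whose derivative in the prediction is $\lambda'_\gamma(\omega,\gamma) = 2(\gamma-\omega)$; the class and parameter names are illustrative, not from the source.

```python
import numpy as np

class GDLearner:
    """Online Gradient Descent for linear regression (a sketch; square loss assumed)."""

    def __init__(self, dim: int, eta: float):
        self.w = np.zeros(dim)  # weight vector w_t, one entry per component of x_t
        self.eta = eta          # learning rate eta

    def predict(self, x: np.ndarray) -> float:
        # Master's prediction: gamma_t = w_t . x_t
        return float(self.w @ x)

    def update(self, x: np.ndarray, gamma: float, omega: float) -> None:
        # w_{t+1}^k := w_t^k - eta * lambda'_gamma(omega_t, gamma_t) * x_t^k
        dloss = 2.0 * (gamma - omega)  # derivative of the square loss in gamma
        self.w -= self.eta * dloss * x
```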

For a bounded signal, $\|x_t\|_2 \le B$, and a loss function convex and twice differentiable in the second argument, the Gradient Descent algorithm achieves $L_T \le L_T(\theta) + BLU\sqrt{T}$, where $L_T(\theta)$ is the loss of any linear function $\theta$ of $x$, $L$ is a bound on the derivative of the loss function with respect to the prediction, $|\lambda'_\gamma(\omega_t,\gamma_t)| \le L$, and $U$ is a bound on the norm of the experts: $\|\theta\|_2 \le U$. The bound is achieved with the learning rate $\eta = U/(BL\sqrt{T})$.
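As a rough illustration of the tuned learning rate, the following self-contained sketch runs the update above with $\eta = U/(BL\sqrt{T})$ on synthetic data. The bounds $B$ and $U$, the horizon $T$, and the choice $L = 2$ (a crude bound for the square loss when outcomes and predictions stay in $[0,1]$) are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
T, dim = 1000, 5
B, U = 1.0, 1.0                      # assumed bounds: ||x_t||_2 <= B, ||theta||_2 <= U
L = 2.0                              # assumed |lambda'_gamma| <= 2 for square loss on [0,1]
eta = U / (B * L * np.sqrt(T))       # the tuned learning rate from the bound

theta = rng.normal(size=dim)
theta *= U / np.linalg.norm(theta)   # hypothetical comparison vector with ||theta||_2 = U

w = np.zeros(dim)
loss_master, loss_theta = 0.0, 0.0
for t in range(T):
    x = rng.normal(size=dim)
    x *= B / np.linalg.norm(x)                     # signal scaled so ||x_t||_2 = B
    gamma = float(w @ x)                           # master's prediction gamma_t
    omega = float(np.clip(theta @ x, 0.0, 1.0))    # outcome omega_t in [0,1]
    loss_master += (omega - gamma) ** 2            # contributes to L_T
    loss_theta += (omega - float(theta @ x)) ** 2  # contributes to L_T(theta)
    w -= eta * 2.0 * (gamma - omega) * x           # the Gradient Descent update

print(loss_master - loss_theta, B * L * U * np.sqrt(T))  # regret vs. the bound BLU*sqrt(T)
```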
The Gradient Descent algorithm was investigated for this problem by Cesa-Bianchi et al. (1996); for the square-loss function it is also known as the Least Mean Squares algorithm.

### Bibliography

- N. Cesa-Bianchi, P. M. Long, and M. K. Warmuth. Worst-case quadratic loss bounds for prediction using linear functions and gradient descent. *IEEE Transactions on Neural Networks*, 7(3):604-619, 1996.