Gradient Descent is an algorithm for online linear regression. In the simplest case, the outcomes are numbers $\omega_t\in[0,1]$ and the prediction space is $[0,1]$. The master's prediction is the inner product of the current weight vector and the input vector: $\gamma_t = \langle w_t, x_t \rangle$. At each step the weights are updated by a rule which depends on the derivative of the loss function: $w^k_{t+1} := w^k_t - \eta \lambda'_\gamma(\omega_t,\gamma_t)\, x^k_t$.
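One round of this update can be sketched as follows. This is a minimal illustration assuming the square loss $\lambda(\omega,\gamma)=(\omega-\gamma)^2$, whose derivative in $\gamma$ is $2(\gamma-\omega)$; the function name `gd_step` is an illustrative choice, not from the original text.

```python
import numpy as np

def gd_step(w, x, omega, eta):
    """One round of online Gradient Descent with the square loss
    lambda(omega, gamma) = (omega - gamma)**2 as a concrete example."""
    gamma = float(np.dot(w, x))     # master's prediction: <w_t, x_t>
    d_loss = 2.0 * (gamma - omega)  # lambda'_gamma(omega_t, gamma_t)
    w_next = w - eta * d_loss * x   # w_{t+1} = w_t - eta * lambda'_gamma * x_t
    return gamma, w_next
```

For example, starting from $w_1 = 0$ the first prediction is $\gamma_1 = 0$ regardless of $x_1$, and the first update moves the weights in the direction of $x_1$ scaled by the prediction error.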

For a bounded signal, $\|x_t\|_2 \le B$, and a loss function that is convex and twice differentiable in its second argument, the Gradient Descent algorithm can achieve

$L_T \le L_T(\theta) + BLU\sqrt{T}$,

where $L_T(\theta)$ is the loss of the linear predictor defined by $\theta$, and $L$ is a bound on the derivative of the loss function: $|\lambda'_\gamma(\omega_t,\gamma_t)| \le L$. The optimal learning rate is $\eta = U / (BL\sqrt{T})$, where $U$ bounds the norm of the comparison vector: $\|\theta\|_2 \le U$.
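As a sanity check, the bound with the tuned $\eta$ can be exercised in a short simulation. This is a sketch under stated assumptions: the square loss (for which $|\lambda'_\gamma| \le 2$ when outcomes and predictions stay in $[0,1]$, so $L = 2$), and an illustrative comparison vector `theta` and noise model that are not part of the original statement.

```python
import numpy as np

rng = np.random.default_rng(0)
T, B, U = 2000, 1.0, 1.0
theta = np.array([0.6, -0.8])            # comparison vector, ||theta||_2 = 1 <= U

L = 2.0                                  # square loss on [0,1]: |lambda'_gamma| <= 2
eta = U / (B * L * np.sqrt(T))           # the tuned learning rate

w = np.zeros(2)
alg_loss = comp_loss = 0.0
for _ in range(T):
    x = rng.normal(size=2)
    x /= max(np.linalg.norm(x) / B, 1.0)           # enforce ||x_t||_2 <= B
    omega = float(np.clip(theta @ x + 0.1 * rng.normal(), 0.0, 1.0))
    gamma = float(w @ x)                           # master's prediction
    alg_loss += (omega - gamma) ** 2
    comp_loss += (omega - float(theta @ x)) ** 2
    w -= eta * 2.0 * (gamma - omega) * x           # gradient step

regret = alg_loss - comp_loss
bound = B * L * U * np.sqrt(T)                     # the B*L*U*sqrt(T) term
```

On this synthetic data the cumulative loss of the algorithm stays within the $BLU\sqrt{T}$ margin of the comparator's loss, as the bound predicts.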

The Gradient Descent algorithm was investigated for this problem by Cesa-Bianchi et al. (1996); for the square-loss function it is also known as the Least Mean Squares algorithm.

### Bibliography

• N. Cesa-Bianchi, P.M. Long, and M.K. Warmuth: Worst-case quadratic loss bounds for prediction using linear functions and gradient descent. IEEE Transactions on Neural Networks, 7(3):604-619, 1996.