Gradient Descent

Gradient Descent is an algorithm for online linear regression. In the simplest case, the algorithm predicts a sequence of outcomes $\omega_t \in [0,1]$, and the prediction space is $[0,1]$. The master's prediction $\gamma_t = w_t \cdot x_t$ is a weighted combination of the components of the input vector $x_t$. At each step the weights are updated by a rule that depends on the gradient of the loss function: $w^k_{t+1} := w^k_t - \eta\, \lambda'_\gamma(\omega_t,\gamma_t)\, x^k_t$.
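
As an illustration, below is a minimal sketch of one prediction-update round, assuming the square-loss function $\lambda(\omega,\gamma)=(\omega-\gamma)^2$; the function name gd_step and its signature are my own, not part of the original description.

```python
import numpy as np

def gd_step(w, x, omega, eta):
    """One round of online Gradient Descent (sketch; square loss assumed).

    w     -- current weight vector w_t
    x     -- input vector x_t
    omega -- observed outcome omega_t
    eta   -- learning rate
    Returns the prediction gamma_t and the updated weight vector w_{t+1}.
    """
    gamma = np.dot(w, x)            # master's prediction: weighted combination of x_t
    grad = 2.0 * (gamma - omega)    # lambda'_gamma for the square loss (omega - gamma)^2
    w_next = w - eta * grad * x     # w^k_{t+1} = w^k_t - eta * lambda'_gamma * x^k_t
    return gamma, w_next
```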

For a bounded signal, $\|x_t\|_2 \le B$, and a loss function that is convex and twice differentiable in its second argument, the Gradient Descent algorithm can achieve

⚠ $L_T \le L_T(\theta) + BLU\sqrt{T}$,

where $L_T$ is the cumulative loss of the algorithm, $L_T(\theta)$ is the cumulative loss of any linear function $\theta \cdot x$ of $x$, and $L$ is a bound on the derivative of the loss function: $|\lambda'_\gamma(\omega_t,\gamma_t)| \le L$. The bound is achieved with $\eta = U/(BL\sqrt{T})$. Here $U$ is a bound on the norm of the experts: $\|\theta\|_2 \le U$.
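
For concreteness, the sketch below runs the algorithm over a whole sequence with the tuned learning rate $\eta = U/(BL\sqrt{T})$ and the square loss; the function name online_gd and the way the bounds $B$, $L$, $U$ are passed in are assumptions for illustration only.

```python
import numpy as np

def online_gd(xs, omegas, B, L, U):
    """Online Gradient Descent with eta = U / (B * L * sqrt(T)) (sketch).

    xs, omegas -- sequences of input vectors x_t and outcomes omega_t
    B, L, U    -- assumed bounds: ||x_t||_2 <= B, |lambda'_gamma| <= L, ||theta||_2 <= U
    Returns the cumulative square loss L_T of the algorithm.
    """
    T = len(xs)
    eta = U / (B * L * np.sqrt(T))   # learning rate minimising the regret term B*L*U*sqrt(T)
    w = np.zeros(len(xs[0]))
    total_loss = 0.0
    for x, omega in zip(xs, omegas):
        gamma = np.dot(w, x)                       # prediction gamma_t
        total_loss += (omega - gamma) ** 2         # square loss incurred at step t
        w = w - eta * 2.0 * (gamma - omega) * x    # gradient step on the square loss
    return total_loss
```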

The Gradient Descent algorithm was investigated for this problem by Cesa-Bianchi et al. (1996); for the square-loss function it is also known as the Least Mean Squares algorithm.

Bibliography

  • N. Cesa-Bianchi, P. M. Long, and M. K. Warmuth. Worst-case quadratic loss bounds for prediction using linear functions and gradient descent. IEEE Transactions on Neural Networks, 7(3):604–619, 1996.