Strong Aggregating Algorithm

Main.StrongAggregatingAlgorithm History


Changed line 22 from:
For the [[online linear regression]] problem with bounded signal {$x: ||x_t||_\infty \le B$}, Vovk (2001) proves that for the square-loss function Aggregating Algorithm can achieve
to:
For the [[online linear regression]] problem with bounded signal {$x: ||x_t||_\infty \le B$}, Vovk (2001) proves that for the square-loss function Aggregating Algorithm ([[Aggregating Algorithm Regression]]) can achieve
Changed line 22 from:
For the [[online linear regression]] problem with bounded signal {$x: ||x_t||_\infty \le B$}, Vovk (2001) proves that for the squre-loss function Aggregating Algorithm can achieve
to:
For the [[online linear regression]] problem with bounded signal {$x: ||x_t||_\infty \le B$}, Vovk (2001) proves that for the square-loss function Aggregating Algorithm can achieve
Changed line 7 from:
--> Calculate the ''generalized prediction'' {$g_t(\omega)=\log_{\eta} \sum_{k=1}^K e^{-\eta \lambda(\omega,\gamma_t^k)}w_k, \forall \omega \in \Omega$}.
to:
--> Calculate the ''generalized prediction'' {$g_t(\omega)=-\frac{1}{\eta}\ln\sum_{k=1}^K w_k e^{-\eta \lambda(\omega,\gamma_t^k)}, \forall \omega \in \Omega$}.
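A minimal Python sketch of this computation for the square-loss case (the function name, the two-expert example values and {$\eta=2$} are illustrative assumptions, not part of the algorithm's specification):

[@
import math

def generalized_prediction(omega, expert_predictions, weights, eta):
    """g(omega) = -(1/eta) * ln( sum_k w_k * exp(-eta * loss(omega, gamma_k)) ), square loss."""
    mix = sum(w * math.exp(-eta * (omega - gamma) ** 2)
              for gamma, w in zip(expert_predictions, weights))
    return -math.log(mix) / eta

# two experts with equal weights, eta = 2 (square loss on [0,1])
print(generalized_prediction(0.0, [0.2, 0.7], [0.5, 0.5], 2.0))
print(generalized_prediction(1.0, [0.2, 0.7], [0.5, 0.5], 2.0))
@]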
July 11, 2008, at 09:16 PM by Zhdanov - insert picture
Changed lines 13-14 from:
-> END FOR.
to:
-> END FOR.
%rfloat rframe height=420px% Attach:subs.png | '''Figure 1. Aggregating algorithm in case of two possible outcomes, [[<<]] two experts, and square-loss function'''
June 12, 2008, at 12:02 PM by Zhdanov - add reference to (Vovk 2001)
Changed lines 33-34 from:
* Vladimir Vovk. Aggregating strategies. In: ''Proceedings of the 3rd Annual Workshop on Computational Learning Theory'' (ed by M Fulk and J Case), pp 371 - 383. San Mateo, CA: Morgan Kaufmann, 1990.
* Vladimir Vovk.  A game of prediction with expert advice.  ''Journal of Computer and System Sciences'' 56:153 - 173, 1998.
to:
* [[Profiles.Vovk|Vladimir Vovk]]. Aggregating strategies. In: ''Proceedings of the 3rd Annual Workshop on Computational Learning Theory'' (ed by M Fulk and J Case), pp 371 - 383. San Mateo, CA: Morgan Kaufmann, 1990.
* [[Profiles.Vovk|Vladimir Vovk]].  A game of prediction with expert advice.  ''Journal of Computer and System Sciences'' 56:153 - 173, 1998.
* [[Profiles.Vovk|Vladimir Vovk]]. Competitive on-line statistics. ''International Statistical Review'' 69, 213&#8211;248 (2001)
May 05, 2008, at 07:06 PM by Zhdanov - mixable
Changed line 19 from:
for all {$T$} and {$k$}, where {$L_T$} is the loss suffered by the algorithm over the first {$T$} trials and {$L_T(k)$} is the loss suffered by the {$k$}th expert over the first {$T$} trials.  For the ''mixable'' [[loss function]]s {$c(\eta)=1, a(\eta)=1/ \eta $}.  In particular, {$c(\eta)=1$} and {$a(\eta)=1/2$} for the square-loss and the case when {$\Omega=[0,1], \Gamma=[0,1]$}.
to:
for all {$T$} and {$k$}, where {$L_T$} is the loss suffered by the algorithm over the first {$T$} trials and {$L_T(k)$} is the loss suffered by the {$k$}th expert over the first {$T$} trials.  For the ''[[mixable]]'' [[loss function]]s {$c(\eta)=1, a(\eta)=1/ \eta $}.  In particular, {$c(\eta)=1$} and {$a(\eta)=1/2$} for the square-loss and the case when {$\Omega=[0,1], \Gamma=[0,1]$}.
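For instance, with {$K=10$} experts, the square loss and {$\Omega=\Gamma=[0,1]$}, taking {$\eta=2$} gives {$c(\eta)=1$} and {$a(\eta)=1/2$}, so the guarantee reads {$L_T \le L_T(k) + \frac{\ln 10}{2} \approx L_T(k) + 1.15$} for every expert {$k$} and every {$T$}.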
Changed line 25 from:
where {$L_T(\theta)$} is the loss of the best linear function of {$x$}.
to:
where {$L_T(\theta)$} is the loss of any linear function of {$x$}.
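To make the regression setting concrete, below is a minimal numpy sketch of an Aggregating-Algorithm-for-Regression-style (ridge-type) on-line predictor of the kind this bound is stated for; the exact formula (in particular whether the current signal enters the matrix before predicting) should be checked against Vovk (2001), so treat the form used in the code, and the toy data, as assumptions:

[@
import numpy as np

def aar_style_predict(X_past, y_past, x_new, a=1.0):
    """Predict the outcome for x_new from past signals/outcomes.

    Uses b' (a I + sum_t x_t x_t' + x_new x_new')^{-1} x_new with b = sum_t y_t x_t;
    this ridge-like form with the current signal inside the matrix is the one
    usually quoted for AAR, but verify it against Vovk (2001)."""
    n = x_new.shape[0]
    A = a * np.eye(n) + np.outer(x_new, x_new)
    b = np.zeros(n)
    if len(y_past) > 0:
        X = np.asarray(X_past)
        A += X.T @ X
        b = X.T @ np.asarray(y_past)
    return float(b @ np.linalg.solve(A, x_new))

# toy on-line run with a 3-dimensional bounded signal
rng = np.random.default_rng(0)
X_seen, y_seen = [], []
for t in range(5):
    x_t = rng.uniform(-1.0, 1.0, size=3)
    print(aar_style_predict(X_seen, y_seen, x_t))
    y_t = 0.5 * x_t[0] - 0.2 * x_t[2]
    X_seen.append(x_t)
    y_seen.append(y_t)
@]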
Changed line 21 from:
For the [[online linear regression]] problem with bounded signal {$x: ||x_t||_\infty \le B$}, Vovk (2001) proves that the Aggregating Algorithm can achieve
to:
For the [[online linear regression]] problem with bounded signal {$x: ||x_t||_\infty \le B$}, Vovk (2001) proves that for the squre-loss function Aggregating Algorithm can achieve
Changed line 21 from:
For the [[on-line linear regresssion]] problem with bounded signal {$x: ||x_t||_\infty \le B$}, Vovk (2001) proves that the Aggregating Algorithm can achieve
to:
For the [[online linear regression]] problem with bounded signal {$x: ||x_t||_\infty \le B$}, Vovk (2001) proves that the Aggregating Algorithm can achieve
Changed lines 23-25 from:
{$L_T \le \inf_\theta (L_T(\theta)+a||\theta||^2_2) + nY^2\ln(TB^2/a+1)$}.
to:
{$L_T \le \inf_\theta (L_T(\theta)+a||\theta||^2_2) + nY^2\ln(TB^2/a+1)$},

where {$L_T(\theta)$} is the loss of the best linear function of {$x$}.
April 29, 2008, at 04:54 PM by Zhdanov - on-line linear regression
Changed lines 1-2 from:
Strong Aggregating Algorithm is one of the [[exponential weights algorithms]].  It usually enjoys the strongest performance guarantees, although its computational efficiency may not be as good as some of its competitors'.  In the case of a finite number of experts {$K$}, it makes predictions according to the following scheme (some explanations can be found after the description of the algorithm):
to:
Strong Aggregating Algorithm is one of the [[exponential weights algorithms]].  It usually enjoys the strongest theoretical performance guarantees, although its computational efficiency may not be as good as some of its competitors'.  In the case of a finite number of experts {$K$}, it makes predictions according to the following scheme (some explanations can be found after the description of the algorithm):
Added lines 20-23:

For the [[on-line linear regresssion]] problem with bounded signal {$x: ||x_t||_\infty \le B$}, Vovk (2001) proves that the Aggregating Algorithm can achieve

{$L_T \le \inf_\theta (L_T(\theta)+a||\theta||^2_2) + nY^2\ln(TB^2/a+1)$}.
Changed lines 6-12 from:
--> Experts announce predictions {$\gamma^k \in \Gamma, k=1,\ldots,K$}.
--> Calculate the ''generalized prediction'' {$g(\omega)=\log_{\eta} \sum_{k=1}^K e^{-\eta \lambda(\omega,\gamma^k)}w_k, \forall \omega \in \Omega$}.
--> Algorithm announces {$\gamma = S(g) \in \Gamma$}, where {$S(g)$} is the chosen [[substitution function]].
--> Reality announces {$\omega\in\Omega$}.
--> {$L_t:=L_{t-1}+\lambda(\omega,\gamma)$}.
--> {$L_t^k:=L_{t-1}^k+\lambda(\omega,\gamma^k), k=1,\ldots,K$}.
--> Update the weights {$w_k := w_k e^{-\eta \lambda(\omega,\gamma^k)}$} and normalize them {$w_k := w_k / \sum_{k=1}^K w_k$}, {$k=1,\ldots,K$}.
to:
--> Experts announce predictions {$\gamma_t^k \in \Gamma, k=1,\ldots,K$}.
--> Calculate the ''generalized prediction'' {$g_t(\omega)=\log_{\eta} \sum_{k=1}^K e^{-\eta \lambda(\omega,\gamma_t^k)}w_k, \forall \omega \in \Omega$}.
--> Algorithm announces {$\gamma_t := S(g_t) \in \Gamma$}, where {$S$} is the chosen [[substitution function]].
--> Reality announces {$\omega_t\in\Omega$}.
--> {$L_t:=L_{t-1}+\lambda(\omega_t,\gamma_t)$}.
--> {$L_t^k:=L_{t-1}^k+\lambda(\omega_t,\gamma_t^k), k=1,\ldots,K$}.
--> Update the weights {$w_k := w_k e^{-\eta \lambda(\omega_t,\gamma_t^k)}$} and normalize them {$w_k := w_k / \sum_{k=1}^K w_k$}, {$k=1,\ldots,K$}.
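A self-contained Python sketch of this protocol for the square loss with {$\Omega=\Gamma=[0,1]$} and {$\eta=2$}; the substitution function {$\gamma_t=\frac12+\frac{g_t(0)-g_t(1)}{2}$} used below is one standard choice for this game, and the toy experts and outcomes are illustrative assumptions:

[@
import math

def square_loss(omega, gamma):
    return (omega - gamma) ** 2

def run_aa(expert_predictions, outcomes, eta=2.0):
    """Strong Aggregating Algorithm for the square loss on [0,1].

    expert_predictions[t][k] is expert k's prediction at trial t."""
    K = len(expert_predictions[0])
    w = [1.0 / K] * K                      # initialize weights
    L, L_k = 0.0, [0.0] * K                # cumulative losses
    for gammas, omega in zip(expert_predictions, outcomes):
        # generalized prediction g(omega) = -(1/eta) ln sum_k w_k exp(-eta loss)
        def g(o):
            return -math.log(sum(wk * math.exp(-eta * square_loss(o, gk))
                                 for wk, gk in zip(w, gammas))) / eta
        gamma = min(1.0, max(0.0, 0.5 + (g(0.0) - g(1.0)) / 2.0))   # substitution function
        L += square_loss(omega, gamma)
        L_k = [Lk + square_loss(omega, gk) for Lk, gk in zip(L_k, gammas)]
        w = [wk * math.exp(-eta * square_loss(omega, gk)) for wk, gk in zip(w, gammas)]
        s = sum(w)
        w = [wk / s for wk in w]           # normalize
    return L, L_k

# toy run: two constant experts, six binary outcomes
L, L_k = run_aa([[0.2, 0.8]] * 6, [0, 1, 1, 0, 1, 1])
print(L, min(L_k) + math.log(2) / 2)       # L should not exceed min expert loss + (ln K)/eta
@]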
April 23, 2008, at 04:54 PM by Vovk - small corrections
Changed line 7 from:
--> Calculate generalized prediction {$g(\omega)=\log_\eta \sum\limits_{k=1}^K e^{-\eta \lambda(\omega,\gamma^k)}w_k, \forall \omega \in \Omega$}.
to:
--> Calculate the ''generalized prediction'' {$g(\omega)=\log_{\eta} \sum_{k=1}^K e^{-\eta \lambda(\omega,\gamma^k)}w_k, \forall \omega \in \Omega$}.
Changed line 12 from:
--> Update the weights {$w_k := w_k e^{-\eta \lambda(\omega,\gamma^k)}$} and normalize them {$w_k := w_k / \sum\limits_{k=1}^K w_k$}, {$k=1,\ldots,K$}.
to:
--> Update the weights {$w_k := w_k e^{-\eta \lambda(\omega,\gamma^k)}$} and normalize them {$w_k := w_k / \sum_{k=1}^K w_k$}, {$k=1,\ldots,K$}.
Changed lines 15-16 from:
In this description, {$\lambda(\omega,\gamma): \Omega \times \Gamma \to \mathbb{R}$} is the [[loss function]] chosen for the current game, and {$\eta$} is the ''learning rate'', a parameter of the algorithm.  The protocol above means that Strong Aggregating Algorithm takes the losses suffered by the experts, transfers them into an exponential space by {$e^{-\eta \lambda(\omega,\gamma^k)}$}, mixes them using the weights of experts, and transfers them back from the exponential space; this way we receive the so-called generalized prediction.  It is clear that the generalized prediction is not always a real prediction, and one should apply the [[substitution function]] to it.  The substitution function {$S$} is such a function that for {$\gamma=S(g)$} the {$\lambda(\omega,\gamma) \le g(\omega)$} for all {$\omega \in \Omega$}. For various [[loss function]]s
to:
In this description, {$\lambda(\omega,\gamma): \Omega \times \Gamma \to \mathbb{R}$} is the [[loss function]] chosen for the current game, and {$\eta$} is the ''learning rate'', a parameter of the algorithm.  The protocol above means that Strong Aggregating Algorithm takes the losses suffered by the experts, transfers them into an exponential space by {$e^{-\eta \lambda(\omega,\gamma^k)}$}, mixes them using the weights of experts, and transfers them back from the exponential space; this way we receive the so-called generalized prediction.  It is clear that the generalized prediction is not always a real prediction, and one should apply the [[substitution function]] to it.  The substitution function {$S$} is such a function that for {$\gamma=S(g)$} we have {$\lambda(\omega,\gamma) \le c(\eta) g(\omega)$} for the smallest possible {$c(\eta)\ge1$} and for all {$\omega\in\Omega$}. For various [[loss function]]s it can be shown that
Changed lines 19-20 from:
for all {$T$} and {$k$}, where {$L_T$} is the loss suffered by master over the first {$T$} trials, and {$L_T(k)$} is the loss suffered by the {$k$}-th expert over the first {$T$} trials. For the mixable [[loss function]]s {$c(\eta)=1, a(\eta)=1/ \eta $}, in particular {$\eta=2$} for the square-loss and the case when {$\Omega=[0,1], \Gamma=[0,1]$}.
to:
for all {$T$} and {$k$}, where {$L_T$} is the loss suffered by the algorithm over the first {$T$} trials and {$L_T(k)$} is the loss suffered by the {$k$}th expert over the first {$T$} trials.  For the ''mixable'' [[loss function]]s {$c(\eta)=1, a(\eta)=1/ \eta $}.  In particular, {$c(\eta)=1$} and {$a(\eta)=1/2$} for the square-loss and the case when {$\Omega=[0,1], \Gamma=[0,1]$}.
Changed line 27 from:
* Vladimir Vovk. Aggregating strategies. In: ''Proceedings of the 3rd Annual Workshop on Computational Learning Theory'' (ed by M Fulk and J Case), pp 371-383. San Mateo, CA: Morgan Kaufmann, 1990.
to:
* Vladimir Vovk. Aggregating strategies. In: ''Proceedings of the 3rd Annual Workshop on Computational Learning Theory'' (ed by M Fulk and J Case), pp 371 - 383. San Mateo, CA: Morgan Kaufmann, 1990.
Changed line 7 from:
--> Calculate generalized prediction {$g(\omega)=\log_\eta \sum\limits_{k=1}^K e^{-\eta \lambda(\omega,\gamma^k)}w_k, \forall \omega \in \Omega$.
to:
--> Calculate generalized prediction {$g(\omega)=\log_\eta \sum\limits_{k=1}^K e^{-\eta \lambda(\omega,\gamma^k)}w_k, \forall \omega \in \Omega$}.
Changed lines 1-2 from:
Strong Aggregating Algorithm is one of the [[exponential weights algorithms]].  It usually enjoys the strongest performance guarantees, although its computational efficiency may not be as good as some of its competitors'.  In the case of a finite number of experts {$K$}, it makes predictions according to the following scheme:
to:
Strong Aggregating Algorithm is one of the [[exponential weights algorithms]].  It usually enjoys the strongest performance guarantees, although its computational efficiency may not be as good as some of its competitors'.  In the case of a finite number of experts {$K$}, it makes predictions according to the following scheme (some explanations can be found after the description of the algorithm):
Changed lines 6-17 from:

# Experts announce predictions {$\gamma^k \in \Gamma, k=1,\ldots,K$}.
# Calculate generalized prediction {$g(\omega)=\log_\eta \sum\limits_{k=1}^K e^{-\eta \lambda(\omega,\gamma^k)}w_k, \forall \omega \in \Omega$}, where {$\lambda(\omega,\gamma): \Omega \times \Gamma \to \mathbb{R}$} is a [[loss function]] chosen for the current game.
# Algorithm announces {$\gamma = S(g) \in \Gamma$}, where {$S(g)$} is a [[substitution function]].
# Reality announces {$\omega\in\Omega$}.
# {$L_t:=L_{t-1}+\lambda(\omega,\gamma)$}.
# {$L_t^k:=L_{t-1}^k+\lambda(\omega,\gamma^k), k=1,\ldots,K$}.
# Update weights {$w_k := w_k e^{-\eta \lambda(\omega,\gamma^k)}$}, and normalize them {$w_k := w_k / \sum\limits_{k=1}^K w_k$}, {$k=1,\ldots,K$}.

END FOR.

Here {$\eta$} is a learning rate parameter. The protocol above means, that Strong Aggregating Algorithm takes losses on predictions of experts, transfer them into exponential space by {$e^{-\eta \lambda(\omega,\gamma^k)}$}, then mix it using weights of experts, transfer back from exponential space, and receive the so-called generalized prediction this way. It is clear that the generalized prediction is not always a real prediction, and one should use the [[substitution function]] to make it. The substitution function {$S$} is such a function that for {$\gamma=S(g)$} the {$\lambda(\omega,\gamma) \le g(\omega)$} for all {$\omega \in \Omega$}. For various [[loss function]]s
to:
--> Experts announce predictions {$\gamma^k \in \Gamma, k=1,\ldots,K$}.
--> Calculate generalized prediction {$g(\omega)=\log_\eta \sum\limits_{k=1}^K e^{-\eta \lambda(\omega,\gamma^k)}w_k, \forall \omega \in \Omega$.
--> Algorithm announces {$\gamma = S(g) \in \Gamma$}, where {$S(g)$} is the chosen [[substitution function]].
--> Reality announces {$\omega\in\Omega$}.
--> {$L_t:=L_{t-1}+\lambda(\omega,\gamma)$}.
--> {$L_t^k:=L_{t-1}^k+\lambda(\omega,\gamma^k), k=1,\ldots,K$}.
--> Update the weights {$w_k := w_k e^{-\eta \lambda(\omega,\gamma^k)}$} and normalize them {$w_k := w_k / \sum\limits_{k=1}^K w_k$}, {$k=1,\ldots,K$}.
-> END FOR.

In this description, {$\lambda(\omega,\gamma): \Omega \times \Gamma \to \mathbb{R}$} is the [[loss function]] chosen for the current game, and {$\eta$} is the ''learning rate'', a parameter of the algorithm.  The protocol above means that Strong Aggregating Algorithm takes the losses suffered by the experts, transfers them into an exponential space by {$e^{-\eta \lambda(\omega,\gamma^k)}$}, mixes them using the weights of experts, and transfers them back from the exponential space; this way we receive the so-called generalized prediction.  It is clear that the generalized prediction is not always a real prediction, and one should apply the [[substitution function]] to it. The substitution function {$S$} is such a function that for {$\gamma=S(g)$} the {$\lambda(\omega,\gamma) \le g(\omega)$} for all {$\omega \in \Omega$}. For various [[loss function]]s
Changed lines 1-5 from:
Strong Aggregating Algorithm is one of the [[exponential weights algorithms]], and for the case of finite number of experts has the strongest theoretical bound, proven for this moment. In case we have finite number of experts {$ K $}, it makes a prediction by the following scheme:
# Initialize weights of experts {$w_k:=1/K$}, {$k=1,\ldots,K$}
# Initialize cumulative losses {$L_t:=0, L_t^k:=0, k=1,\ldots,K$}.

FOR {$t=1,2,\dots$}
to:
Strong Aggregating Algorithm is one of the [[exponential weights algorithms]].  It usually enjoys the strongest performance guarantees, although its computational efficiency may not be as good as some of its competitors'.  In the case of a finite number of experts {$K$}, it makes predictions according to the following scheme:

->Initialize weights of experts {$w_k:=1/K$}, {$k=1,\ldots,K$}.
->Initialize cumulative losses {$L_t:=0, L_t^k:=0, k=1,\ldots,K$}.
->FOR {$t=1,2,\dots$}
Changed line 8 from:
# Calculate generalized prediction {$g(\omega)=\log_\eta \sum\limits_{k=1}^K e^{-\eta \lambda(\omega,\gamma^k)}w_k, \forall \omega \in \Omega$}, where {$\lambda(\omega,\gamma)$} is a [[loss function]] chosen for the current game.
to:
# Calculate generalized prediction {$g(\omega)=\log_\eta \sum\limits_{k=1}^K e^{-\eta \lambda(\omega,\gamma^k)}w_k, \forall \omega \in \Omega$}, where {$\lambda(\omega,\gamma): \Omega \times \Gamma \to \mathbb{R}$} is a [[loss function]] chosen for the current game.
Changed line 8 from:
# Calculate generalized prediction {$g(\omega)=\log_\eta \sum\limits_{k=1}^K e^{-\eta \lambda(\omega,\gamma^k)}w_k, \forall \omega \in \Omega$}, where {$\lambda(\omega,\gamma)$} ia s loss function chosen for the current game.
to:
# Calculate generalized prediction {$g(\omega)=\log_\eta \sum\limits_{k=1}^K e^{-\eta \lambda(\omega,\gamma^k)}w_k, \forall \omega \in \Omega$}, where {$\lambda(\omega,\gamma)$} is a [[loss function]] chosen for the current game.
Changed line 8 from:
# Calculate generalized prediction {$g(\omega)=\log_\eta \sum\limits_{k=1}^K e^{-\eta \lambda(\omega,\gamma^k)}w_k, \forall \omega \in \Omega$}
to:
# Calculate generalized prediction {$g(\omega)=\log_\eta \sum\limits_{k=1}^K e^{-\eta \lambda(\omega,\gamma^k)}w_k, \forall \omega \in \Omega$}, where {$\lambda(\omega,\gamma)$} ia s loss function chosen for the current game.
Changed line 1 from:
Strong Aggregating Algorithm is one of the [[exponential weights algorithms]], and for the case of finite number of experts has the strongest theoretical bound. In case we have finite number of experts {$ K $}, it makes a prediction by the following scheme:
to:
Strong Aggregating Algorithm is one of the [[exponential weights algorithms]], and for the case of finite number of experts has the strongest theoretical bound, proven for this moment. In case we have finite number of experts {$ K $}, it makes a prediction by the following scheme:
Changed lines 1-2 from:
Strong Aggregating Algorithm is one of the [[exponential weights algorithms]], and for the case of finite number of experts has the strongest theoretical bound. It makes a prediction by the following scheme:
# Initialize weights {$w_k:=1/K$}, {$k=1,\ldots,K$}
to:
Strong Aggregating Algorithm is one of the [[exponential weights algorithms]], and for the case of finite number of experts has the strongest theoretical bound. In case we have finite number of experts {$ K $}, it makes a prediction by the following scheme:
# Initialize weights of experts {$w_k:=1/K$}, {$k=1,\ldots,K$}
Changed line 9 from:
# Algorithm announces {$\gamma = S(g)$}, where {$S(g)$} is a [[substitution function]].
to:
# Algorithm announces {$\gamma = S(g) \in \Gamma$}, where {$S(g)$} is a [[substitution function]].
Changed line 13 from:
# Update weights {$w_k := w_k e^{-\eta \lambda(\omega,\gamma^k)}$}, and normalize them {$w_k := w_k / \sum\limits_{k=1}^K w_k$}.
to:
# Update weights {$w_k := w_k e^{-\eta \lambda(\omega,\gamma^k)}$}, and normalize them {$w_k := w_k / \sum\limits_{k=1}^K w_k$}, {$k=1,\ldots,K$}.
Added line 3:
# Initialize cumulative losses {$L_t:=0, L_t^k:=0, k=1,\ldots,K$}.
April 20, 2008, at 12:53 PM by Fedor Zhdanov -
Changed lines 28-29 from:
* V. Vovk, Aggregating strategies. In: ''Proceedings of the 3rd Annual Workshop on Computational Learning Theory'' (ed by M Fulk and J Case), pp 371-383. San Mateo, CA: Morgan Kaufmann, 1990.
* V.Vovk, A game of prediction with expert advice. ''Journal of Computer and System Sciences'', '''56''', 153-173 (1998).
to:
* Vladimir Vovk. Aggregating strategies. In: ''Proceedings of the 3rd Annual Workshop on Computational Learning Theory'' (ed by M Fulk and J Case), pp 371-383. San Mateo, CA: Morgan Kaufmann, 1990.
* Vladimir Vovk.  A game of prediction with expert advice.  ''Journal of Computer and System Sciences'' 56:153 - 173, 1998.
April 20, 2008, at 12:49 PM by Fedor Zhdanov -
Changed line 20 from:
for all {$T$} and {$k$}, where {$L_T$} is the loss suffered by master over the first {$T$} trials, and {$L_T(k)$} is the loss suffered by the {$k$}-th expert over the first {$T$} trials. For the mixable [[loss function]]s {$c(\eta)=1, a(\eta)=1/ \eta $}, in particular {$\eta=2$} for the square-loss.
to:
for all {$T$} and {$k$}, where {$L_T$} is the loss suffered by master over the first {$T$} trials, and {$L_T(k)$} is the loss suffered by the {$k$}-th expert over the first {$T$} trials. For the mixable [[loss function]]s {$c(\eta)=1, a(\eta)=1/ \eta $}, in particular {$\eta=2$} for the square-loss and the case when {$\Omega=[0,1], \Gamma=[0,1]$}.
April 20, 2008, at 12:48 PM by Fedor Zhdanov -
Changed lines 20-21 from:
for all {$T$} and {$k$}, where {$L_T$} is the loss suffered by master over the first {$T$} trials, and {$L_T(k)$} is the loss suffered by the {$k$}th expert over the first {$T$} trials. For the mixable [[loss function]]s {$c(\eta)=1, a(\eta)=1/\eta$}. In particular, {$\eta=2$} for the square-loss, {$\eta=1$} for the log-loss.
to:
for all {$T$} and {$k$}, where {$L_T$} is the loss suffered by master over the first {$T$} trials, and {$L_T(k)$} is the loss suffered by the {$k$}-th expert over the first {$T$} trials. For the mixable [[loss function]]s {$c(\eta)=1, a(\eta)=1/ \eta $}, in particular {$\eta=2$} for the square-loss.
Changed lines 28-29 from:
* V. Vovk, Aggregating strategies. In: ''Proceedings of the 3rd Annual Workshop on Computational Learning Theory'' (ed by M Fulk and J Case), pp 371&#8211;383. San Mateo, CA: Morgan Kaufmann, 1990.
* V.Vovk, A game of prediction with expert advice. ''Journal of Computer and System Sciences'' '''56''', 153-173 (1998).
to:
* V. Vovk, Aggregating strategies. In: ''Proceedings of the 3rd Annual Workshop on Computational Learning Theory'' (ed by M Fulk and J Case), pp 371-383. San Mateo, CA: Morgan Kaufmann, 1990.
* V.Vovk, A game of prediction with expert advice. ''Journal of Computer and System Sciences'', '''56''', 153-173 (1998).
April 20, 2008, at 12:44 PM by Fedor Zhdanov -
Changed line 8 from:
# Algorithm announces {$\gamma = S(g)$}, where {$S(g)$} is the [[substitution function]].
to:
# Algorithm announces {$\gamma = S(g)$}, where {$S(g)$} is a [[substitution function]].
Changed lines 25-29 from:
* [[Universal Portfolios Algorithm]]
to:
* [[Universal Portfolios Algorithm]]

!!!Bibliography
* V. Vovk, Aggregating strategies. In: ''Proceedings of the 3rd Annual Workshop on Computational Learning Theory'' (ed by M Fulk and J Case), pp 371&#8211;383. San Mateo, CA: Morgan Kaufmann, 1990.
* V.Vovk, A game of prediction with expert advice. ''Journal of Computer and System Sciences'' '''56''', 153-173 (1998).
April 20, 2008, at 12:37 PM by Fedor Zhdanov -
Added lines 1-21:
Strong Aggregating Algorithm is one of the [[exponential weights algorithms]], and for the case of finite number of experts has the strongest theoretical bound. It makes a prediction by the following scheme:
# Initialize weights {$w_k:=1/K$}, {$k=1,\ldots,K$}

FOR {$t=1,2,\dots$}

# Experts announce predictions {$\gamma^k \in \Gamma, k=1,\ldots,K$}.
# Calculate generalized prediction {$g(\omega)=\log_\eta \sum\limits_{k=1}^K e^{-\eta \lambda(\omega,\gamma^k)}w_k, \forall \omega \in \Omega$}
# Algorithm announces {$\gamma = S(g)$}, where {$S(g)$} is the [[substitution function]].
# Reality announces {$\omega\in\Omega$}.
# {$L_t:=L_{t-1}+\lambda(\omega,\gamma)$}.
# {$L_t^k:=L_{t-1}^k+\lambda(\omega,\gamma^k), k=1,\ldots,K$}.
# Update weights {$w_k := w_k e^{-\eta \lambda(\omega,\gamma^k)}$}, and normalize them {$w_k := w_k / \sum\limits_{k=1}^K w_k$}.

END FOR.

Here {$\eta$} is a learning rate parameter. The protocol above means, that Strong Aggregating Algorithm takes losses on predictions of experts, transfer them into exponential space by {$e^{-\eta \lambda(\omega,\gamma^k)}$}, then mix it using weights of experts, transfer back from exponential space, and receive the so-called generalized prediction this way. It is clear that the generalized prediction is not always a real prediction, and one should use the [[substitution function]] to make it. The substitution function {$S$} is such a function that for {$\gamma=S(g)$} the {$\lambda(\omega,\gamma) \le g(\omega)$} for all {$\omega \in \Omega$}. For various [[loss function]]s

{$L_T \le c(\eta)L_T^k + a(\eta)\ln K$}

for all {$T$} and {$k$}, where {$L_T$} is the loss suffered by master over the first {$T$} trials, and {$L_T(k)$} is the loss suffered by the {$k$}th expert over the first {$T$} trials. For the mixable [[loss function]]s {$c(\eta)=1, a(\eta)=1/\eta$}. In particular, {$\eta=2$} for the square-loss, {$\eta=1$} for the log-loss.
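In the log-loss case, {$\lambda(\omega,\gamma)=-\ln\gamma(\omega)$} with {$\eta=1$}, the generalized prediction is itself attainable: {$g(\omega)=-\ln\sum_{k=1}^K w_k\gamma^k(\omega)$} is exactly the loss of the weighted mixture {$\sum_{k=1}^K w_k\gamma^k$}, so the substitution function can simply return this mixture; this is the connection to the [[Bayesian Merging Scheme]].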

April 13, 2008, at 06:57 PM by Volodya Vovk -
Added lines 1-4:
Important special cases:
* [[Bayesian Merging Scheme]]
* [[Weighted Majority Algorithm]]
* [[Universal Portfolios Algorithm]]