# Strong Aggregating Algorithm

## Main.StrongAggregatingAlgorithm History

Changed line 22 from:

For the [[online linear regression]] problem with bounded signal {$x: ||x_t||_\infty \le B$}, Vovk (2001) proves that for the square-loss function Aggregating Algorithm can achieve

to:

For the [[online linear regression]] problem with bounded signal {$x: ||x_t||_\infty \le B$}, Vovk (2001) proves that for the square-loss function Aggregating Algorithm ([[Aggregating Algorithm Regression]]) can achieve

Changed line 22 from:

For the [[online linear regression]] problem with bounded signal {$x: ||x_t||_\infty \le B$}, Vovk (2001) proves that for the ~~squre~~-loss function Aggregating Algorithm can achieve

to:

For the [[online linear regression]] problem with bounded signal {$x: ||x_t||_\infty \le B$}, Vovk (2001) proves that for the square-loss function Aggregating Algorithm can achieve

Changed line 7 from:

--> Calculate the ''generalized prediction'' {$g_t(\omega)=\~~log_~~{~~\eta~~}~~ ~~\sum_{k=1}^K e^{-\eta \lambda(\omega,\gamma_t^k)}~~w_k~~, \forall \omega \in \Omega$}.

to:

--> Calculate the ''generalized prediction'' {$g_t(\omega)=-\frac{1}{\eta}\ln\sum_{k=1}^K w_k e^{-\eta \lambda(\omega,\gamma_t^k)}, \forall \omega \in \Omega$}.
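
The corrected formula above is easy to evaluate for a finite outcome set; the following minimal sketch (not part of the original page, with `loss` standing for the chosen loss function {$\lambda$}) illustrates it:

```python
import math

def generalized_prediction(weights, expert_preds, outcomes, loss, eta):
    """g_t(omega) = -(1/eta) * ln( sum_k w_k * exp(-eta * loss(omega, gamma_t^k)) )."""
    return {
        omega: -math.log(sum(w * math.exp(-eta * loss(omega, gamma))
                             for w, gamma in zip(weights, expert_preds))) / eta
        for omega in outcomes
    }
```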

July 11, 2008, at 09:16 PM
by - insert picture

Changed lines 13-14 from:

-> END FOR.

to:

-> END FOR.

%rfloat rframe height=420px% Attach:subs.png | '''Figure 1. Aggregating algorithm in case of two possible outcomes, [[<<]] two experts, and square-loss function'''.

June 12, 2008, at 12:02 PM
by - add reference to (Vovk 2001)

Changed lines 33-34 from:

* Vladimir Vovk. Aggregating strategies. In: ''Proceedings of the 3rd Annual Workshop on Computational Learning Theory'' (ed by M Fulk and J Case), pp 371 - 383. San Mateo, CA: Morgan Kaufmann, 1990.

* Vladimir Vovk. A game of prediction with expert advice. ''Journal of Computer and System Sciences'' 56:153 - 173, 1998.

to:

* [[Profiles.Vovk|Vladimir Vovk]]. Aggregating strategies. In: ''Proceedings of the 3rd Annual Workshop on Computational Learning Theory'' (ed by M Fulk and J Case), pp 371 - 383. San Mateo, CA: Morgan Kaufmann, 1990.

* [[Profiles.Vovk|Vladimir Vovk]]. A game of prediction with expert advice. ''Journal of Computer and System Sciences'' 56:153 - 173, 1998.

* [[Profiles.Vovk|Vladimir Vovk]]. Competitive on-line statistics. ''International Statistical Review'' 69, 213–248 (2001).

May 05, 2008, at 07:06 PM
by - mixable

Changed line 19 from:

for all {$T$} and {$k$}, where {$L_T$} is the loss suffered by the algorithm over the first {$T$} trials and {$L_T(k)$} is the loss suffered by the {$k$}th expert over the first {$T$} trials. For the ''mixable'' [[loss function]]s {$c(\eta)=1, a(\eta)=1/ \eta $}. In particular, {$c(\eta)=1$} and {$a(\eta)=1/2$} for the square-loss and the case when {$\Omega=[0,1], \Gamma=[0,1]$}.

to:

for all {$T$} and {$k$}, where {$L_T$} is the loss suffered by the algorithm over the first {$T$} trials and {$L_T(k)$} is the loss suffered by the {$k$}th expert over the first {$T$} trials. For the ''[[mixable]]'' [[loss function]]s {$c(\eta)=1, a(\eta)=1/ \eta $}. In particular, {$c(\eta)=1$} and {$a(\eta)=1/2$} for the square-loss and the case when {$\Omega=[0,1], \Gamma=[0,1]$}.

Changed line 25 from:

where {$L_T(\theta)$} is the loss of ~~the best~~ linear function of {$x$}.

to:

where {$L_T(\theta)$} is the loss of any linear function of {$x$}.

Changed line 21 from:

For the [[online linear regression]] problem with bounded signal {$x: ||x_t||_\infty \le B$}, Vovk (2001) proves that the Aggregating Algorithm can achieve

to:

For the [[online linear regression]] problem with bounded signal {$x: ||x_t||_\infty \le B$}, Vovk (2001) proves that for the squre-loss function Aggregating Algorithm can achieve

Changed line 21 from:

For the [[~~on-line~~ linear ~~regresssion~~]] problem with bounded signal {$x: ||x_t||_\infty \le B$}, Vovk (2001) proves that the Aggregating Algorithm can achieve

to:

For the [[online linear regression]] problem with bounded signal {$x: ||x_t||_\infty \le B$}, Vovk (2001) proves that the Aggregating Algorithm can achieve

Changed lines 23-25 from:

{$L_T \le \inf_\theta (L_T(\theta)+a||\theta||^2_2) + nY^2\ln(TB^2/a+1)$}.

to:

{$L_T \le \inf_\theta (L_T(\theta)+a||\theta||^2_2) + nY^2\ln(TB^2/a+1)$},

where {$L_T(\theta)$} is the loss of the best linear function of {$x$}.

April 29, 2008, at 04:54 PM
by - on-line linear regression

Changed lines 1-2 from:

Strong Aggregating Algorithm is one of the [[exponential weights algorithms]]. It usually enjoys the strongest performance guarantees, although its computational efficiency may not be as good as some of its competitors'. In the case of a finite number of experts {$K$}, it makes predictions according to the following scheme (some explanations can be found after the description of the algorithm):

to:

Strong Aggregating Algorithm is one of the [[exponential weights algorithms]]. It usually enjoys the strongest theoretical performance guarantees, although its computational efficiency may not be as good as some of its competitors'. In the case of a finite number of experts {$K$}, it makes predictions according to the following scheme (some explanations can be found after the description of the algorithm):

Added lines 20-23:

For the [[on-line linear regresssion]] problem with bounded signal {$x: ||x_t||_\infty \le B$}, Vovk (2001) proves that the Aggregating Algorithm can achieve

{$L_T \le \inf_\theta (L_T(\theta)+a||\theta||^2_2) + nY^2\ln(TB^2/a+1)$}.
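
The forecaster behind this bound is usually stated in closed form (see [[Aggregating Algorithm Regression]]); the sketch below is an illustration under the assumption that the standard form is used, in which the current signal {$x_t$} enters the regularized Gram matrix before the prediction is made:

```python
import numpy as np

def aar_predict(X, y, a=1.0):
    """Sketch of the Aggregating Algorithm Regression forecaster:
    gamma_t = x_t^T (a I + sum_{s<=t} x_s x_s^T)^{-1} sum_{s<t} y_s x_s."""
    n = X.shape[1]
    A = a * np.eye(n)          # a*I plus the Gram matrix of the signals seen so far
    b = np.zeros(n)            # sum of y_s * x_s over past trials
    preds = []
    for x_t, y_t in zip(X, y):
        A += np.outer(x_t, x_t)                # x_t is included *before* predicting
        preds.append(float(x_t @ np.linalg.solve(A, b)))
        b += y_t * x_t                         # outcome becomes available after predicting
    return preds
```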

Changed lines 6-12 from:

--> Experts announce predictions {$\gamma^k \in \Gamma, k=1,\ldots,K$}.

--> Calculate the ''generalized prediction'' {$g(\omega)=\log_{\eta} \sum_{k=1}^K e^{-\eta \lambda(\omega,\gamma^k)}w_k, \forall \omega \in \Omega$}.

--> Algorithm announces {$\gamma = S(g) \in \Gamma$}, where {$S~~(g)~~$} is the chosen [[substitution function]].

--> Reality announces {$\omega\in\Omega$}.

--> {$L_t:=L_{t-1}+\lambda(\omega,\gamma)$}.

--> {$L_t^k:=L_{t-1}^k+\lambda(\omega,\gamma^k), k=1,\ldots,K$}.

--> Update the weights {$w_k := w_k e^{-\eta \lambda(\omega,\gamma^k)}$} and normalize them {$w_k := w_k / \sum_{k=1}^K w_k$}, {$k=1,\ldots,K$}.

to:

--> Experts announce predictions {$\gamma_t^k \in \Gamma, k=1,\ldots,K$}.

--> Calculate the ''generalized prediction'' {$g_t(\omega)=\log_{\eta} \sum_{k=1}^K e^{-\eta \lambda(\omega,\gamma_t^k)}w_k, \forall \omega \in \Omega$}.

--> Algorithm announces {$\gamma_t := S(g_t) \in \Gamma$}, where {$S$} is the chosen [[substitution function]].

--> Reality announces {$\omega_t\in\Omega$}.

--> {$L_t:=L_{t-1}+\lambda(\omega_t,\gamma_t)$}.

--> {$L_t^k:=L_{t-1}^k+\lambda(\omega_t,\gamma_t^k), k=1,\ldots,K$}.

--> Update the weights {$w_k := w_k e^{-\eta \lambda(\omega_t,\gamma_t^k)}$} and normalize them {$w_k := w_k / \sum_{k=1}^K w_k$}, {$k=1,\ldots,K$}.

April 23, 2008, at 04:54 PM
by - small corrections

Changed line 7 from:

--> Calculate ~~generalized~~ prediction {$g(\omega)=\log_\eta \sum~~\limits~~_{k=1}^K e^{-\eta \lambda(\omega,\gamma^k)}w_k, \forall \omega \in \Omega$}.

to:

--> Calculate the ''generalized prediction'' {$g(\omega)=\log_{\eta} \sum_{k=1}^K e^{-\eta \lambda(\omega,\gamma^k)}w_k, \forall \omega \in \Omega$}.

Changed line 12 from:

--> Update the weights {$w_k := w_k e^{-\eta \lambda(\omega,\gamma^k)}$} and normalize them {$w_k := w_k / \sum~~\limits~~_{k=1}^K w_k$}, {$k=1,\ldots,K$}.

to:

--> Update the weights {$w_k := w_k e^{-\eta \lambda(\omega,\gamma^k)}$} and normalize them {$w_k := w_k / \sum_{k=1}^K w_k$}, {$k=1,\ldots,K$}.

Changed lines 15-16 from:

In this description, {$\lambda(\omega,\gamma): \Omega \times \Gamma \to \mathbb{R}$} is the [[loss function]] chosen for the current game, and {$\eta$} is the ''learning rate'', a parameter of the algorithm. The protocol above means that Strong Aggregating Algorithm takes the losses suffered by the experts, transfers them into an exponential space by {$e^{-\eta \lambda(\omega,\gamma^k)}$}, mixes them using the weights of experts, and transfers them back from the exponential space; this way we receive the so-called generalized prediction. It is clear that the generalized prediction is not always a real prediction, and one should apply the [[substitution function]] to it. The substitution function {$S$} is such a function that for {$\gamma=S(g)$} ~~the ~~{$\lambda(\omega,\gamma) \le ~~g~~(\~~omega~~)~~$}~~ for ~~all ~~{$\~~omega ~~\~~in \Omega~~$}~~. For various [[loss~~ function]]s

to:

In this description, {$\lambda(\omega,\gamma): \Omega \times \Gamma \to \mathbb{R}$} is the [[loss function]] chosen for the current game, and {$\eta$} is the ''learning rate'', a parameter of the algorithm. The protocol above means that Strong Aggregating Algorithm takes the losses suffered by the experts, transfers them into an exponential space by {$e^{-\eta \lambda(\omega,\gamma^k)}$}, mixes them using the weights of experts, and transfers them back from the exponential space; this way we receive the so-called generalized prediction. It is clear that the generalized prediction is not always a real prediction, and one should apply the [[substitution function]] to it. The substitution function {$S$} is such a function that for {$\gamma=S(g)$} we have {$\lambda(\omega,\gamma) \le c(\eta) g(\omega)$} for the smallest possible {$c(\eta)\ge1$} and for all {$\omega\in\Omega$}. For various [[loss function]]s it can be shown that
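
For the binary square-loss game ({$\Omega=\{0,1\}$}, {$\Gamma=[0,1]$}, {$\eta=2$}) one commonly used substitution function is {$\gamma = 1/2 + (g(0)-g(1))/2$}, clipped to {$[0,1]$}; the sketch below illustrates this particular choice and is an assumption of the standard construction rather than a definition taken from this page:

```python
def substitute_square_loss(g0, g1):
    """Map a generalized prediction (g(0), g(1)) to a real prediction gamma in [0, 1]
    so that (omega - gamma)**2 <= g(omega) for omega in {0, 1} when eta = 2."""
    gamma = 0.5 + (g0 - g1) / 2.0
    return min(1.0, max(0.0, gamma))
```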

Changed lines 19-20 from:

for all {$T$} and {$k$}, where {$L_T$} is the loss suffered by ~~master~~ over the first {$T$} trials~~,~~ and {$L_T(k)$} is the loss suffered by the {$k$}~~-~~th expert over the first {$T$} trials. For the mixable [[loss function]]s {$c(\eta)=1, a(\eta)=1/ \eta $}~~, in~~ particular {$\eta=2$} for the square-loss and the case when {$\Omega=[0,1], \Gamma=[0,1]$}.

to:

for all {$T$} and {$k$}, where {$L_T$} is the loss suffered by the algorithm over the first {$T$} trials and {$L_T(k)$} is the loss suffered by the {$k$}th expert over the first {$T$} trials. For the ''mixable'' [[loss function]]s {$c(\eta)=1, a(\eta)=1/ \eta $}. In particular, {$c(\eta)=1$} and {$a(\eta)=1/2$} for the square-loss and the case when {$\Omega=[0,1], \Gamma=[0,1]$}.

Changed line 27 from:

* Vladimir Vovk. Aggregating strategies. In: ''Proceedings of the 3rd Annual Workshop on Computational Learning Theory'' (ed by M Fulk and J Case), pp 371-383. San Mateo, CA: Morgan Kaufmann, 1990.

to:

* Vladimir Vovk. Aggregating strategies. In: ''Proceedings of the 3rd Annual Workshop on Computational Learning Theory'' (ed by M Fulk and J Case), pp 371 - 383. San Mateo, CA: Morgan Kaufmann, 1990.

Changed line 7 from:

--> Calculate generalized prediction {$g(\omega)=\log_\eta \sum\limits_{k=1}^K e^{-\eta \lambda(\omega,\gamma^k)}w_k, \forall \omega \in \Omega$.

to:

--> Calculate generalized prediction {$g(\omega)=\log_\eta \sum\limits_{k=1}^K e^{-\eta \lambda(\omega,\gamma^k)}w_k, \forall \omega \in \Omega$}.

Changed lines 1-2 from:

Strong Aggregating Algorithm is one of the [[exponential weights algorithms]]. It usually enjoys the strongest performance guarantees, although its computational efficiency may not be as good as some of its competitors'. In the case of a finite number of experts {$K$}, it makes predictions according to the following scheme:

to:

Strong Aggregating Algorithm is one of the [[exponential weights algorithms]]. It usually enjoys the strongest performance guarantees, although its computational efficiency may not be as good as some of its competitors'. In the case of a finite number of experts {$K$}, it makes predictions according to the following scheme (some explanations can be found after the description of the algorithm):

Changed lines 6-17 from:

# Algorithm announces {$\gamma = S(g) \in \Gamma$}, where {$S(g)$} is a [[substitution function]].

# Reality announces {$\omega\in\Omega$}.

# {$L_t:=L_{t-1}+\lambda(\omega,\gamma)$}.

# {$L_t^k:=L_{t-1}^k+\lambda(\omega,\gamma^k), k=1,\ldots,K$}.

# Update weights {$w_k := w_k e^{-\eta \lambda(\omega,\gamma^k)}$}, and normalize them {$w_k := w_k / \sum\limits_{k=1}^K w_k$}, {$k=1,\ldots,K$}.

END FOR.

Here {$\eta$} is a learning rate parameter. The protocol above means, that Strong Aggregating Algorithm takes losses on predictions of experts, transfer them into exponential space by {$e^{-\eta \lambda(\omega,\gamma^k)}$}, then mix it using weights of experts, transfer back from exponential space, and receive the so-called generalized prediction this way. It is clear that the generalized prediction is not always a real prediction, and one should use the [[substitution function]] to make it. The substitution function {$S$} is such a function that for {$\gamma=S(g)$} the {$\lambda(\omega,\gamma) \le g(\omega)$} for all {$\omega \in \Omega$}. For various [[loss function]]s

to:

--> Experts announce predictions {$\gamma^k \in \Gamma, k=1,\ldots,K$}.

--> Calculate generalized prediction {$g(\omega)=\log_\eta \sum\limits_{k=1}^K e^{-\eta \lambda(\omega,\gamma^k)}w_k, \forall \omega \in \Omega$.

--> Algorithm announces {$\gamma = S(g) \in \Gamma$}, where {$S(g)$} is the chosen [[substitution function]].

--> Reality announces {$\omega\in\Omega$}.

--> {$L_t:=L_{t-1}+\lambda(\omega,\gamma)$}.

--> {$L_t^k:=L_{t-1}^k+\lambda(\omega,\gamma^k), k=1,\ldots,K$}.

--> Update the weights {$w_k := w_k e^{-\eta \lambda(\omega,\gamma^k)}$} and normalize them {$w_k := w_k / \sum\limits_{k=1}^K w_k$}, {$k=1,\ldots,K$}.

-> END FOR.

In this description, {$\lambda(\omega,\gamma): \Omega \times \Gamma \to \mathbb{R}$} is the [[loss function]] chosen for the current game, and {$\eta$} is the ''learning rate'', a parameter of the algorithm. The protocol above means that Strong Aggregating Algorithm takes the losses suffered by the experts, transfers them into an exponential space by {$e^{-\eta \lambda(\omega,\gamma^k)}$}, mixes them using the weights of experts, and transfers them back from the exponential space; this way we receive the so-called generalized prediction. It is clear that the generalized prediction is not always a real prediction, and one should apply the [[substitution function]] to it. The substitution function {$S$} is such a function that for {$\gamma=S(g)$} the {$\lambda(\omega,\gamma) \le g(\omega)$} for all {$\omega \in \Omega$}. For various [[loss function]]s

Changed lines 1-5 from:

Strong Aggregating Algorithm is one of the [[exponential weights algorithms]]~~, and for the case of finite number of experts has the strongest theoretical bound, proven for this moment. In case we have finite number of experts {$ K $}, it makes a prediction by the following scheme:~~

# Initialize weights of experts {$w_k:~~=1/K$}, ~~{$k=1,~~\ldots,K~~$}

~~# Initialize ~~cumulative losses {$L_t:=0, L_t^k:=0, k=1,\ldots,K$}.

FOR {$t=1,2,\dots$}

to:

Strong Aggregating Algorithm is one of the [[exponential weights algorithms]]. It usually enjoys the strongest performance guarantees, although its computational efficiency may not be as good as some of its competitors'. In the case of a finite number of experts {$K$}, it makes predictions according to the following scheme:

->Initialize weights of experts {$w_k:=1/K$}, {$k=1,\ldots,K$}.

->Initialize cumulative losses {$L_t:=0, L_t^k:=0, k=1,\ldots,K$}.

->FOR {$t=1,2,\dots$}

Changed line 8 from:

# Calculate generalized prediction {$g(\omega)=\log_\eta \sum\limits_{k=1}^K e^{-\eta \lambda(\omega,\gamma^k)}w_k, \forall \omega \in \Omega$}, where {$\lambda(\omega,\gamma)$} is a [[loss function]] chosen for the current game.

to:

# Calculate generalized prediction {$g(\omega)=\log_\eta \sum\limits_{k=1}^K e^{-\eta \lambda(\omega,\gamma^k)}w_k, \forall \omega \in \Omega$}, where {$\lambda(\omega,\gamma): \Omega \times \Gamma \to \mathbb{R}$} is a [[loss function]] chosen for the current game.

Changed line 8 from:

# Calculate generalized prediction {$g(\omega)=\log_\eta \sum\limits_{k=1}^K e^{-\eta \lambda(\omega,\gamma^k)}w_k, \forall \omega \in \Omega$}, where {$\lambda(\omega,\gamma)$} ~~ia s~~ loss function chosen for the current game.

to:

# Calculate generalized prediction {$g(\omega)=\log_\eta \sum\limits_{k=1}^K e^{-\eta \lambda(\omega,\gamma^k)}w_k, \forall \omega \in \Omega$}, where {$\lambda(\omega,\gamma)$} is a [[loss function]] chosen for the current game.

Changed line 8 from:

# Calculate generalized prediction {$g(\omega)=\log_\eta \sum\limits_{k=1}^K e^{-\eta \lambda(\omega,\gamma^k)}w_k, \forall \omega \in \Omega$}

to:

# Calculate generalized prediction {$g(\omega)=\log_\eta \sum\limits_{k=1}^K e^{-\eta \lambda(\omega,\gamma^k)}w_k, \forall \omega \in \Omega$}, where {$\lambda(\omega,\gamma)$} ia s loss function chosen for the current game.

Changed line 1 from:

Strong Aggregating Algorithm is one of the [[exponential weights algorithms]], and for the case of finite number of experts has the strongest theoretical bound. In case we have finite number of experts {$ K $}, it makes a prediction by the following scheme:

to:

Strong Aggregating Algorithm is one of the [[exponential weights algorithms]], and for the case of finite number of experts has the strongest theoretical bound, proven for this moment. In case we have finite number of experts {$ K $}, it makes a prediction by the following scheme:

Changed lines 1-2 from:

Strong Aggregating Algorithm is one of the [[exponential weights algorithms]], and for the case of finite number of experts has the strongest theoretical bound. ~~It makes a prediction by the following scheme:~~

# Initialize weights {$w_k:=1/K$}, {$k=1,\ldots,K$}

to:

Strong Aggregating Algorithm is one of the [[exponential weights algorithms]], and for the case of finite number of experts has the strongest theoretical bound. In case we have finite number of experts {$ K $}, it makes a prediction by the following scheme:

# Initialize weights of experts {$w_k:=1/K$}, {$k=1,\ldots,K$}

Changed line 9 from:

# Algorithm announces {$\gamma = S(g)$}, where {$S(g)$} is a [[substitution function]].

to:

# Algorithm announces {$\gamma = S(g) \in \Gamma$}, where {$S(g)$} is a [[substitution function]].

Changed line 13 from:

# Update weights {$w_k := w_k e^{-\eta \lambda(\omega,\gamma^k)}$}, and normalize them {$w_k := w_k / \sum\limits_{k=1}^K w_k$}.

to:

# Update weights {$w_k := w_k e^{-\eta \lambda(\omega,\gamma^k)}$}, and normalize them {$w_k := w_k / \sum\limits_{k=1}^K w_k$}, {$k=1,\ldots,K$}.

Added line 3:

# Initialize cumulative losses {$L_t:=0, L_t^k:=0, k=1,\ldots,K$}.

Changed lines 28-29 from:

* ~~V.~~ Vovk~~,~~ Aggregating strategies. In: ''Proceedings of the 3rd Annual Workshop on Computational Learning Theory'' (ed by M Fulk and J Case), pp 371-383. San Mateo, CA: Morgan Kaufmann, 1990.

*~~V~~.~~Vovk,~~ A game of prediction with expert advice. ''Journal of Computer and System Sciences''~~, '''~~56~~''', ~~153~~-173 (~~1998~~)~~.

to:

* Vladimir Vovk. Aggregating strategies. In: ''Proceedings of the 3rd Annual Workshop on Computational Learning Theory'' (ed by M Fulk and J Case), pp 371-383. San Mateo, CA: Morgan Kaufmann, 1990.

* Vladimir Vovk. A game of prediction with expert advice. ''Journal of Computer and System Sciences'' 56:153 - 173, 1998.

Changed line 20 from:

for all {$T$} and {$k$}, where {$L_T$} is the loss suffered by master over the first {$T$} trials, and {$L_T(k)$} is the loss suffered by the {$k$}-th expert over the first {$T$} trials. For the mixable [[loss function]]s {$c(\eta)=1, a(\eta)=1/ \eta $}, in particular {$\eta=2$} for the square-loss.

to:

for all {$T$} and {$k$}, where {$L_T$} is the loss suffered by master over the first {$T$} trials, and {$L_T(k)$} is the loss suffered by the {$k$}-th expert over the first {$T$} trials. For the mixable [[loss function]]s {$c(\eta)=1, a(\eta)=1/ \eta $}, in particular {$\eta=2$} for the square-loss and the case when {$\Omega=[0,1], \Gamma=[0,1]$}.

Changed lines 20-21 from:

for all {$T$} and {$k$}, where {$L_T$} is the loss suffered by master over the first {$T$} trials, and {$L_T(k)$} is the loss suffered by the {$k$}th expert over the first {$T$} trials. For the mixable [[loss function]]s {$c(\eta)=1, a(\eta)=1/\eta$}~~. In~~ particular~~,~~ {$\eta=2$} for the square~~-loss, {$\eta=1$} for the log~~-loss.

to:

for all {$T$} and {$k$}, where {$L_T$} is the loss suffered by master over the first {$T$} trials, and {$L_T(k)$} is the loss suffered by the {$k$}-th expert over the first {$T$} trials. For the mixable [[loss function]]s {$c(\eta)=1, a(\eta)=1/ \eta $}, in particular {$\eta=2$} for the square-loss.

Changed lines 28-29 from:

* V. Vovk, Aggregating strategies. In: ''Proceedings of the 3rd Annual Workshop on Computational Learning Theory'' (ed by M Fulk and J Case), pp 371~~–~~383. San Mateo, CA: Morgan Kaufmann, 1990.

* V.Vovk, A game of prediction with expert advice. ''Journal of Computer and System Sciences'' '''56''', 153-173 (1998).

to:

* V. Vovk, Aggregating strategies. In: ''Proceedings of the 3rd Annual Workshop on Computational Learning Theory'' (ed by M Fulk and J Case), pp 371-383. San Mateo, CA: Morgan Kaufmann, 1990.

* V.Vovk, A game of prediction with expert advice. ''Journal of Computer and System Sciences'', '''56''', 153-173 (1998).

Changed line 8 from:

# Algorithm announces {$\gamma = S(g)$}, where {$S(g)$} is ~~the~~ [[substitution function]].

to:

# Algorithm announces {$\gamma = S(g)$}, where {$S(g)$} is a [[substitution function]].

Changed lines 25-29 from:

* [[Universal Portfolios Algorithm]]

to:

* [[Universal Portfolios Algorithm]]

!!!Bibliography

* V. Vovk, Aggregating strategies. In: ''Proceedings of the 3rd Annual Workshop on Computational Learning Theory'' (ed by M Fulk and J Case), pp 371–383. San Mateo, CA: Morgan Kaufmann, 1990.

* V.Vovk, A game of prediction with expert advice. ''Journal of Computer and System Sciences'' '''56''', 153-173 (1998).

Added lines 1-21:

Strong Aggregating Algorithm is one of the [[exponential weights algorithms]], and for the case of finite number of experts has the strongest theoretical bound. It makes a prediction by the following scheme:

# Initialize weights {$w_k:=1/K$}, {$k=1,\ldots,K$}

FOR {$t=1,2,\dots$}

# Experts announce predictions {$\gamma^k \in \Gamma, k=1,\ldots,K$}.

# Calculate generalized prediction {$g(\omega)=\log_\eta \sum\limits_{k=1}^K e^{-\eta \lambda(\omega,\gamma^k)}w_k, \forall \omega \in \Omega$}

# Algorithm announces {$\gamma = S(g)$}, where {$S(g)$} is the [[substitution function]].

# Reality announces {$\omega\in\Omega$}.

# {$L_t:=L_{t-1}+\lambda(\omega,\gamma)$}.

# {$L_t^k:=L_{t-1}^k+\lambda(\omega,\gamma^k), k=1,\ldots,K$}.

# Update weights {$w_k := w_k e^{-\eta \lambda(\omega,\gamma^k)}$}, and normalize them {$w_k := w_k / \sum\limits_{k=1}^K w_k$}.

END FOR.

Here {$\eta$} is a learning rate parameter. The protocol above means, that Strong Aggregating Algorithm takes losses on predictions of experts, transfer them into exponential space by {$e^{-\eta \lambda(\omega,\gamma^k)}$}, then mix it using weights of experts, transfer back from exponential space, and receive the so-called generalized prediction this way. It is clear that the generalized prediction is not always a real prediction, and one should use the [[substitution function]] to make it. The substitution function {$S$} is such a function that for {$\gamma=S(g)$} the {$\lambda(\omega,\gamma) \le g(\omega)$} for all {$\omega \in \Omega$}. For various [[loss function]]s

{$L_T \le c(\eta)L_T^k + a(\eta)\ln K$}

for all {$T$} and {$k$}, where {$L_T$} is the loss suffered by master over the first {$T$} trials, and {$L_T(k)$} is the loss suffered by the {$k$}th expert over the first {$T$} trials. For the mixable [[loss function]]s {$c(\eta)=1, a(\eta)=1/\eta$}. In particular, {$\eta=2$} for the square-loss, {$\eta=1$} for the log-loss.
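
Putting the steps of this original description together, here is a minimal end-to-end sketch, specialized (as an illustrative assumption, not part of the original page) to the binary square-loss game with {$\eta=2$}, for which the bound above becomes {$L_T \le \min_k L_T(k) + (\ln K)/2$}:

```python
import math

def strong_aggregating_algorithm(expert_preds, outcomes, eta=2.0):
    """Run the protocol above for the square loss (omega - gamma)**2 with
    Omega = {0, 1} and Gamma = [0, 1].  expert_preds[t][k] is expert k's
    prediction at trial t; outcomes[t] is Reality's announcement."""
    loss = lambda omega, gamma: (omega - gamma) ** 2
    K = len(expert_preds[0])
    w = [1.0 / K] * K                       # initial expert weights
    L, L_k = 0.0, [0.0] * K                 # cumulative losses of master and experts
    master = []
    for preds, omega in zip(expert_preds, outcomes):
        # generalized prediction g(omega) = -(1/eta) ln sum_k w_k exp(-eta loss(omega, gamma^k))
        g = {o: -math.log(sum(wk * math.exp(-eta * loss(o, gk))
                              for wk, gk in zip(w, preds))) / eta
             for o in (0, 1)}
        # substitution function for the binary square-loss game (see the earlier sketch)
        gamma = min(1.0, max(0.0, 0.5 + (g[0] - g[1]) / 2.0))
        master.append(gamma)
        # update cumulative losses
        L += loss(omega, gamma)
        L_k = [lk + loss(omega, gk) for lk, gk in zip(L_k, preds)]
        # update and normalize the weights
        w = [wk * math.exp(-eta * loss(omega, gk)) for wk, gk in zip(w, preds)]
        total = sum(w)
        w = [wk / total for wk in w]
    return master, L, L_k
```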

Added lines 1-4:

Important special cases:

* [[Bayesian Merging Scheme]]

* [[Weighted Majority Algorithm]]

* [[Universal Portfolios Algorithm]]
