Weak Teacher

Weak teachers are prediction algorithms for a relaxation of the on-line protocol: the case when Reality provides the true labels $y_n$ of examples with a delay, or only occasionally, for a subset of trials (or both).

Definition

A teaching schedule is a function $L: N \rightarrow \mathbb{N}$ defined on an infinite set $N = \{n_1, n_2, \ldots\} \subseteq \mathbb{N}$, $n_1 < n_2 < \ldots$, satisfying

$L(n) \le n$ for all $n \in N$

and

$m \neq n \Rightarrow L(m) \neq L(n)$ for all $m \in N$ and $n \in N$.

The teaching schedule describes the way the data is disclosed: after trial $n$, Reality provides the label $y_{L(n)}$ for the object $x_{L(n)}$.

The weak teacher, or $L$-taught version $\Gamma^L$, of a confidence predictor $\Gamma$ is defined by

$\Gamma^{L, \epsilon}(x_1, y_1, \ldots, x_{n-1}, y_{n-1}, x_n) = \Gamma^{\epsilon}(x_{L(n_1)}, y_{L(n_1)}, \ldots, x_{L(n_{s(n)})}, y_{L(n_{s(n)})}, x_n)$,

where $s(n) := |\{i \in N : i < n\}|$. In words, a weak teacher is a confidence predictor whose prediction sets are based only on the labels disclosed before the current trial.

An $L$-taught (smoothed) conformal predictor is a confidence predictor that can be represented as $\Gamma^L$ for some (smoothed) conformal predictor $\Gamma$.
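The definition above can be sketched in Python; here `predictor` stands for any confidence predictor taking a training sequence, a new object, and a significance level (all names are illustrative, not taken from any library):

```python
def l_taught(predictor, schedule_domain, L, examples, n, epsilon):
    """Prediction set of the L-taught predictor at trial n.

    `examples` is the full sequence [(x_1, y_1), (x_2, y_2), ...], but only
    the labels disclosed by the teaching schedule before trial n are used.
    Trials are 1-based, matching the notation in the text.
    """
    # Trials m in N with m < n: the label y_{L(m)} has been disclosed.
    disclosed = [m for m in schedule_domain if m < n]
    # Training set consists of the disclosed examples (x_{L(m)}, y_{L(m)}).
    train = [examples[L(m) - 1] for m in disclosed]
    x_n = examples[n - 1][0]
    return predictor(train, x_n, epsilon)
```

With the ideal teacher ($N = \mathbb{N}$, $L(n) = n$) this reduces to the ordinary on-line protocol, since every preceding label has been disclosed.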

Examples

Ideal teacher (TCM). If $N = \mathbb{N}$ and $L(n) = n$ for each $n \in N$, then $\Gamma^L = \Gamma$.

Slow teacher. If $\text{lag}: \mathbb{N} \rightarrow \mathbb{N}$ is an increasing function, $l(n) := n + \text{lag}(n)$, $N := l(\mathbb{N})$, and $L(n) := l^{-1}(n)$ for $n \in N$, then $\Gamma^L$ is a predictor that learns the true label of each object $x_n$, but with a delay of $\text{lag}(n)$ trials.

Lazy teacher. If $N \neq \mathbb{N}$ and $L(n) = n$ for each $n \in N$, then $\Gamma^L$ is given the true labels immediately, but not for every object.
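The three schedules can be sketched as follows (illustrative only; the slow teacher assumes $\text{lag}(k) = k$, so that $l(k) = 2k$ and $N$ is the set of even numbers):

```python
def ideal_L(n):
    # Ideal teacher: N = all naturals, L(n) = n.
    return n

def slow_L(n):
    # Slow teacher with lag(k) = k: l(k) = k + lag(k) = 2k,
    # so N = l(N) = {2, 4, 6, ...} and L(n) = l^{-1}(n) = n // 2.
    assert n % 2 == 0, "n must lie in N = {2, 4, 6, ...}"
    return n // 2

def lazy_L(n):
    # Lazy teacher: labels disclosed immediately, but only on odd trials.
    assert n % 2 == 1, "n must lie in N = {1, 3, 5, ...}"
    return n
```

Each function satisfies the two defining conditions of a teaching schedule on its domain: $L(n) \le n$, and distinct trials disclose distinct labels.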

Validity

With weak teachers, validity in the strongest sense (conservative validity) is lost. However, the following weaker types of validity can be defined:

  • weak validity;
  • strong validity;
  • validity in the sense of the law of the iterated logarithm.

Weak validity

All the statements in this section are made under the randomness assumption.

A randomized confidence predictor $\Gamma$ is asymptotically exact in probability if, for all significance levels $\epsilon$ and all probability distributions $Q$ on $Z$,

$\frac{1}{n} \sum_{i=1}^{n} \text{err}_i^\epsilon(\Gamma, Q^\infty) - \epsilon \rightarrow 0$ in probability,

where $\text{err}_{n} ^{\epsilon}(\Gamma, P)$ is the random variable defined as follows:

$\text{err}_{n}^{\epsilon}(\Gamma, (x_1, y_1, x_2, y_2, \ldots)) := \begin{cases} 1 & \text{if } y_n \notin \Gamma^\epsilon(x_1, y_1, \ldots, x_{n-1}, y_{n-1}, x_n), \\ 0 & \text{otherwise}. \end{cases}$

Similarly, a confidence predictor $\Gamma$ is asymptotically conservative in probability if, for all significance levels $\epsilon$ and all probability distributions $Q$ on $Z$,

$\left(\frac{1}{n} \sum_{i=1}^{n} \text{err}_i^\epsilon(\Gamma, Q^\infty) - \epsilon\right)^{+} \rightarrow 0$ in probability.
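The quantity driven to zero in both definitions can be computed directly: given the observed error indicators $\text{err}_1, \ldots, \text{err}_n$, track the excess of the empirical error rate over $\epsilon$ (the function name is hypothetical):

```python
def running_excess_error(errors, epsilon):
    """Return (1/n) * sum_{i<=n} err_i - epsilon for each n.

    Asymptotic exactness in probability means this sequence tends to 0;
    asymptotic conservativeness means its positive part max(., 0) does.
    `errors` is a sequence of 0/1 error indicators err_1, err_2, ...
    """
    out, total = [], 0
    for n, e in enumerate(errors, start=1):
        total += e
        out.append(total / n - epsilon)
    return out
```

For instance, a single error at trial 1 followed by correct predictions gives an excess that decays toward $-\epsilon$ as $n$ grows.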

Theorem Let $L$ be a teaching schedule with domain $N = \{n_1, n_2, \ldots\}$, $n_1 < n_2 < \ldots$.

  • If $\lim_{k\rightarrow\infty}(n_k/n_{k-1}) = 1$, any $L$-taught smoothed conformal predictor is asymptotically exact in probability.
  • Otherwise, there exists an $L$-taught smoothed conformal predictor which is not asymptotically exact in probability.

Corollary If $\lim_{k\rightarrow\infty}(n_k/n_{k-1}) = 1$, any $L$-taught conformal predictor is asymptotically conservative in probability.
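The dividing line in the theorem can be illustrated numerically: polynomially growing disclosure trials such as $n_k = k^2$ satisfy $n_k/n_{k-1} \rightarrow 1$, while exponentially growing ones such as $n_k = 2^k$ do not (the schedules below are hypothetical examples, not from the text):

```python
def ratios(seq, k_max):
    """Consecutive ratios n_k / n_{k-1} for a schedule domain n_k = seq(k)."""
    ns = [seq(k) for k in range(1, k_max + 1)]
    return [ns[k] / ns[k - 1] for k in range(1, k_max)]

# Polynomial gaps: n_k = k^2, ratios tend to 1, so weak validity holds.
poly = ratios(lambda k: k**2, 1000)

# Exponential gaps: n_k = 2^k, ratios stay at 2, so exactness can fail.
expo = ratios(lambda k: 2**k, 50)
```

Note that the condition constrains only the relative, not the absolute, sparsity of $N$: the lazy teacher with $n_k = 2k$ also satisfies $n_k/n_{k-1} \rightarrow 1$ despite disclosing only half the labels.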