# Weak Teacher

*Weak teachers* are a class of prediction algorithms for a relaxation of the on-line protocol: the case when
Reality provides the true labels $y_n$ of examples with a delay, or only for a subset of trials, or both.

## Definition

A *teaching schedule* is a function $L: N \rightarrow \mathbb{N}$ defined on an infinite set $N = \{n_1, n_2, \ldots\} \subseteq \mathbb{N}$, $n_1 < n_2 < \ldots$, and satisfying

$L(n) \le n$ for all $n \in N$

and

$m \neq n \Rightarrow L(m) \neq L(n)$ for all $m \in N$ and $n \in N$.

The teaching schedule describes the way the data is disclosed: after trial $n \in N$, Reality provides the label $y_{L(n)}$ for the object $x_{L(n)}$.
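As a sketch, the two defining conditions can be checked on a finite prefix of a schedule (the function and variable names below are illustrative, not from the text):

```python
def is_valid_schedule(pairs):
    """Check the two teaching-schedule conditions on a finite prefix.

    `pairs` lists tuples (n, L(n)) with n running over the domain N
    in increasing order.
    """
    disclosed = set()
    for n, ln in pairs:
        if ln > n:            # first condition: L(n) <= n
            return False
        if ln in disclosed:   # second condition: L is injective
            return False
        disclosed.add(ln)
    return True

print(is_valid_schedule([(1, 1), (2, 2), (3, 3)]))  # True: ideal teacher prefix
print(is_valid_schedule([(1, 1), (2, 3)]))          # False: L(2) = 3 > 2
```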

*Weak teacher* or *$L$-taught version* $\Gamma^L$ of a confidence predictor $\Gamma$ is

$\Gamma^{L, \epsilon}(x_1, y_1, \ldots, x_{n-1}, y_{n-1}, x_n) := \Gamma^{\epsilon}(x_{L(n_1)}, y_{L(n_1)}, \ldots, x_{L(n_{s(n)})}, y_{L(n_{s(n)})}, x_n)$,

where $s(n) := |\{i \in N : i < n\}|$. In words, a weak teacher is a confidence predictor whose prediction sets are based only on the true labels disclosed before the current trial.

An *$L$-taught (smoothed) conformal predictor* is a confidence predictor that can be represented as $\Gamma^L$ for some (smoothed) conformal predictor $\Gamma$.
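The definition can be sketched as a wrapper around an underlying predictor. Here `gamma(pairs, x_new, epsilon)` is an assumed interface for $\Gamma$, and the toy predictor at the end is purely illustrative:

```python
def s(n, N):
    """Number of schedule trials in N that precede trial n."""
    return sum(1 for m in N if m < n)

def l_taught(gamma, N, L, xs, ys, n, epsilon):
    """Sketch of Gamma^L: call `gamma` only on the example pairs whose
    labels the schedule has disclosed before trial n.

    N is the schedule's domain, L its map (a dict here); xs and ys are
    0-indexed Python lists standing in for x_1, y_1, x_2, y_2, ...
    """
    pairs = [(xs[L[m] - 1], ys[L[m] - 1]) for m in sorted(N) if m < n]
    assert len(pairs) == s(n, N)          # exactly s(n) disclosed pairs
    return gamma(pairs, xs[n - 1], epsilon)

# Toy "predictor": its prediction set is every label seen in training.
toy_gamma = lambda pairs, x, eps: {y for _, y in pairs}

# A lazy-teacher prefix on N = {1, 3}: y_1 and y_3 disclosed immediately.
N, L = {1, 3}, {1: 1, 3: 3}
xs, ys = [10, 20, 30, 40, 50], ['a', 'b', 'c', 'd', 'e']
print(l_taught(toy_gamma, N, L, xs, ys, 5, 0.05))  # the set {'a', 'c'}
```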

## Examples

**Ideal teacher (TCM).** If $N = \mathbb{N}$ and $L(n) = n$ for each $n \in N$, then $\Gamma^L = \Gamma$.

**Slow teacher.** If $\text{lag}: \mathbb{N} \rightarrow \mathbb{N}$ is an increasing function, $l(n) := n + \text{lag}(n)$, $N := l(\mathbb{N})$, and $L(n) := l^{-1}(n)$ for $n \in N$, then $\Gamma^L$
is a predictor that learns the true label of each object $x_n$, but with a delay of $\text{lag}(n)$ trials.

**Lazy teacher.** If $N \neq \mathbb{N}$ and $L(n) = n$ for each $n \in N$, then $\Gamma^L$ is given the true labels immediately, but not for every object.
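Finite prefixes of the three example schedules can be written out explicitly; the particular choices of lag function and lazy subset below are illustrative, not fixed by the text:

```python
# Ideal teacher: N = the naturals, L(n) = n.
ideal = {n: n for n in range(1, 11)}

# Slow teacher with lag(n) = n, so l(n) = 2n: N = {2, 4, 6, ...} and
# L(2n) = n, i.e. the label y_n is disclosed only after trial 2n.
slow = {2 * n: n for n in range(1, 11)}

# Lazy teacher: labels disclosed immediately, but only on even trials.
lazy = {n: n for n in range(2, 21, 2)}

# Each prefix satisfies L(n) <= n and injectivity on its domain.
for sched in (ideal, slow, lazy):
    assert all(ln <= n for n, ln in sched.items())
    assert len(set(sched.values())) == len(sched)
```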

## Validity

In the case of weak teachers, validity in the strongest sense (conservative validity) is lost. However, the following weaker types of validity can be defined:

- weak validity;
- strong validity;
- validity in the sense of the law of the iterated logarithm.

### Weak validity

All statements in this section are made under the randomness assumption.

A randomized confidence predictor $\Gamma$ is *asymptotically exact in probability* if, for all significance levels $\epsilon$ and all probability distributions $Q$ on $Z$,

$\frac{1}{n} \sum_{i=1}^{n} \text{err}_i^\epsilon(\Gamma, Q^\infty) - \epsilon \rightarrow 0$ in probability,

where $\text{err}_n^\epsilon(\Gamma, P)$ is the random variable defined, for a data sequence $(x_1, y_1, x_2, y_2, \ldots)$ drawn from $P$, by

$\text{err}_n^\epsilon(\Gamma, (x_1, y_1, x_2, y_2, \ldots)) := 1$ if $y_n \notin \Gamma^\epsilon(x_1, y_1, \ldots, x_{n-1}, y_{n-1}, x_n)$, and $0$ otherwise.

Similarly, a confidence predictor $\Gamma$ is *asymptotically conservative in probability* if, for all significance levels $\epsilon$ and all probability distributions $Q$ on $Z$,

$\left(\frac{1}{n} \sum_{i=1}^{n} \text{err}_i^\epsilon(\Gamma, Q^\infty) - \epsilon\right)^{+} \rightarrow 0$ in probability.

**Theorem** *Let $L$ be a teaching schedule with domain $N = \{n_1, n_2, \ldots\}$, $n_1 < n_2 < \ldots$.*

*If $\lim_{k\rightarrow\infty}(n_k/n_{k-1}) = 1$, then any $L$-taught smoothed conformal predictor is asymptotically exact in probability. Otherwise, there exists an $L$-taught smoothed conformal predictor that is not asymptotically exact in probability.*

**Corollary** *If $\lim_{k\rightarrow\infty}(n_k/n_{k-1}) = 1$, any $L$-taught conformal predictor is asymptotically conservative in probability.*
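The theorem's growth condition $n_k/n_{k-1} \rightarrow 1$ can be probed numerically on finite prefixes of candidate domains; the two domains below are illustrative choices, not from the text:

```python
def ratios(domain):
    """Consecutive ratios n_k / n_{k-1} for a finite, increasing prefix."""
    return [b / a for a, b in zip(domain, domain[1:])]

# Polynomial growth n_k = k^2: the ratios tend to 1, so any L-taught
# smoothed conformal predictor on such a domain is asymptotically exact
# in probability.
polynomial = [k * k for k in range(1, 200)]

# Geometric growth n_k = 2^k: the ratios stay at 2, so the condition
# fails and the theorem's negative branch applies.
geometric = [2 ** k for k in range(1, 20)]

print(ratios(polynomial)[-1])  # close to 1
print(ratios(geometric)[-1])   # stays at 2.0
```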