# Weak Teacher

Weak teachers are a class of prediction algorithms for a relaxed version of the on-line protocol: the case when Reality provides the true labels $y_n$ of examples with a delay, or only for a subset of trials, or both.

## Definition

A teaching schedule is a function $L: N \rightarrow \mathbb{N}$ defined on an infinite set $N = \{n_1, n_2, \ldots\} \subseteq \mathbb{N}$, $n_1 < n_2 < \ldots$, and satisfying

$L(n) \le n$ for all $n \in N$

and

$m \neq n \Rightarrow L(m) \neq L(n)$ for all $m, n \in N$.

The teaching schedule describes the way the data is disclosed: at the end of trial $n$, Reality provides the label $y_{L(n)}$ for the object $x_{L(n)}$.
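The two defining conditions can be checked mechanically on a finite prefix of the schedule's domain. The sketch below is illustrative only (the function and variable names are not from the literature):

```python
def is_teaching_schedule(L, N):
    """Check the two teaching-schedule conditions on a finite prefix N.

    L: dict mapping each trial n in N to L(n), the index of the label
    disclosed at the end of trial n; N: list of trials.
    """
    # L(n) <= n: only labels of already-observed objects can be disclosed.
    if any(L[n] > n for n in N):
        return False
    # Injectivity: no label is disclosed twice.
    values = [L[n] for n in N]
    return len(set(values)) == len(values)

# Hypothetical schedule: after every even trial n, disclose y_{n/2}.
N = [2, 4, 6, 8]
L = {n: n // 2 for n in N}
print(is_teaching_schedule(L, N))  # True
```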

The weak teacher, or $L$-taught version $\Gamma^L$, of a confidence predictor $\Gamma$ is defined by

$\Gamma^{L, \epsilon}(x_1, y_1, \ldots, x_{n-1}, y_{n-1}, x_n) := \Gamma^{\epsilon}(x_{L(n_1)}, y_{L(n_1)}, \ldots, x_{L(n_{s(n)})}, y_{L(n_{s(n)})}, x_n)$,

where $s(n) := |\{i : i \in N, i < n\}|$. In words, the weak teacher is a confidence predictor whose prediction sets are based only on the true labels disclosed before the current trial.

An $L$-taught (smoothed) conformal predictor is a confidence predictor that can be represented as $\Gamma^L$ for some (smoothed) conformal predictor $\Gamma$.
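To make the definition concrete, here is a hypothetical helper (the names are illustrative) that assembles the training sequence $(x_{L(n_1)}, y_{L(n_1)}), \ldots, (x_{L(n_{s(n)})}, y_{L(n_{s(n)})})$ available to $\Gamma^L$ at trial $n$, i.e., the examples whose labels have been disclosed before trial $n$:

```python
def taught_training_set(L, N, n, examples):
    """Training set available to the L-taught predictor at trial n.

    L: dict giving L(m) for each m in N; N: list of trials (a finite
    prefix of the schedule's domain); examples: dict mapping trial i
    to the pair (x_i, y_i).  Returns the pairs (x_{L(n_i)}, y_{L(n_i)})
    for the trials n_i in N with n_i < n.
    """
    disclosed = [L[m] for m in sorted(N) if m < n]
    return [examples[i] for i in disclosed]

# Hypothetical data: each label y_k is disclosed at trial 2k.
examples = {i: (f"x{i}", f"y{i}") for i in range(1, 6)}
N = [2, 4]
L = {2: 1, 4: 2}
print(taught_training_set(L, N, 5, examples))  # [('x1', 'y1'), ('x2', 'y2')]
```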

## Examples

Ideal teacher (TCM). If $N = \mathbb{N}$ and $L(n) = n$ for each $n \in N$, then $\Gamma^L = \Gamma$.

Slow teacher. If $\text{lag}: \mathbb{N} \rightarrow \mathbb{N}$ is an increasing function, $l(n) := n + \text{lag}(n)$, $N := l(\mathbb{N})$, and $L(n) := l^{-1}(n)$ for $n \in N$, then $\Gamma^L$ is a predictor that learns the true label of each object $x_n$, but with a delay of $\text{lag}(n)$.

Lazy teacher. If $N \neq \mathbb{N}$ and $L(n) = n$ for each $n \in N$, then $\Gamma^L$ is given the true labels immediately, but not for every object.
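The slow teacher's schedule can be enumerated explicitly. The sketch below (function names are illustrative) builds the pairs $(n, L(n))$ up to a finite horizon from a given lag function:

```python
def slow_teacher_schedule(lag, horizon):
    """Pairs (n, L(n)) of the slow-teacher schedule up to trial `horizon`.

    l(k) = k + lag(k) is the trial at which the label y_k is disclosed;
    the schedule's domain is N = l(N) and L = l^{-1} on N.
    """
    schedule = {}
    k = 1
    while k + lag(k) <= horizon:
        schedule[k + lag(k)] = k  # at trial l(k), Reality discloses y_k
        k += 1
    return schedule

# lag(k) = k gives l(k) = 2k, so N = {2, 4, 6, ...} and L(2k) = k.
print(slow_teacher_schedule(lambda k: k, 10))  # {2: 1, 4: 2, 6: 3, 8: 4, 10: 5}
```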

## Validity

With weak teachers, validity in the strongest sense (conservative validity) is unattainable. However, the following weaker types of validity can be defined:

• weak validity;
• strong validity;
• validity in the sense of the law of the iterated logarithm.

### Weak validity

All statements in this section are made under the randomness assumption.

A randomized confidence predictor $\Gamma$ is asymptotically exact in probability if, for all significance levels $\epsilon$ and all probability distributions $Q$ on $Z$,

$\frac{1}{n} \sum_{i=1}^{n} \text{err}_i^\epsilon(\Gamma, Q^\infty) - \epsilon \rightarrow 0$ in probability,

where $\text{err}_{n}^{\epsilon}(\Gamma, P)$ is the random variable defined as follows:

$\text{err}_{n}^{\epsilon}(\Gamma, (x_1, y_1, x_2, y_2, \ldots)) := 1$ if $y_n \not\in \Gamma^\epsilon(x_1, y_1, \ldots, x_{n-1}, y_{n-1}, x_n)$, and $:= 0$ otherwise.

Similarly, a confidence predictor $\Gamma$ is asymptotically conservative in probability if, for all significance levels $\epsilon$ and all probability distributions $Q$ on $Z$,

$\left(\frac{1}{n} \sum_{i=1}^{n} \text{err}_i^\epsilon(\Gamma, Q^\infty) - \epsilon\right)^{+} \rightarrow 0$ in probability.
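As a toy illustration of asymptotic exactness: under the ideal teacher, a smoothed conformal predictor makes an error at significance level $\epsilon$ independently with probability $\epsilon$ at each trial, so simulating the error sequence as independent Bernoulli($\epsilon$) variables shows the running error frequency settling near $\epsilon$ (the simulation is a sketch, not part of the original text):

```python
import random

def running_error_deviation(eps, n_trials, seed=0):
    """Deviation |(1/n) sum err_i - eps| after n_trials trials, with the
    errors err_i simulated as independent Bernoulli(eps) variables (the
    error pattern of a smoothed conformal predictor under an ideal teacher).
    """
    rng = random.Random(seed)
    errors = sum(rng.random() < eps for _ in range(n_trials))
    return abs(errors / n_trials - eps)

# The deviation shrinks as n grows, illustrating convergence in probability.
print(running_error_deviation(0.05, 100))
print(running_error_deviation(0.05, 100_000))
```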

Theorem. Let $L$ be a teaching schedule with domain $N = \{n_1, n_2, \ldots\}$, $n_1 < n_2 < \ldots$.

• If $\lim_{k\rightarrow\infty}(n_k/n_{k-1}) = 1$, any $L$-taught smoothed conformal predictor is asymptotically exact in probability.
• Otherwise, there exists an $L$-taught smoothed conformal predictor which is not asymptotically exact in probability.
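The theorem's dichotomy hinges on whether the ratios $n_k/n_{k-1}$ tend to $1$. A quick numerical check (illustrative only) contrasts a polynomially growing domain, which satisfies the condition, with a geometrically growing one, which does not:

```python
def domain_ratio(n_of_k, k):
    """Ratio n_k / n_{k-1} for a schedule domain given by k -> n_k."""
    return n_of_k(k) / n_of_k(k - 1)

# Polynomial growth, n_k = k**2: the ratio tends to 1, so every L-taught
# smoothed conformal predictor is asymptotically exact in probability.
print(domain_ratio(lambda k: k**2, 1000))   # ~1.002
# Geometric growth, n_k = 2**k: the ratio stays at 2, so weak validity
# can fail.
print(domain_ratio(lambda k: 2**k, 1000))   # 2.0
```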

Corollary. If $\lim_{k\rightarrow\infty}(n_k/n_{k-1}) = 1$, any $L$-taught conformal predictor is asymptotically conservative in probability.