# Weak Teacher

*Weak teachers* represent a class of prediction algorithms arising from a relaxation of the on-line protocol: the case when Reality provides the true labels $y_n$ of examples with a delay, or only occasionally, for a subset of trials (or both).

## Definition

A *teaching schedule* is a function $L: N \rightarrow \mathbb{N}$ defined on an infinite set $N = \{n_1, n_2, \ldots\} \subseteq \mathbb{N}$, $n_1 < n_2 < \ldots$, satisfying $L(n) \le n$ for all $n \in N$ and $m \neq n \Rightarrow L(m) \neq L(n)$ for all $m, n \in N$.
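The two defining conditions can be checked mechanically on any finite prefix of a schedule. A minimal Python sketch (the function name and the `(n, L(n))` pair encoding are illustrative assumptions, not part of the definition):

```python
def is_teaching_schedule(pairs):
    """Check the two defining properties of a teaching schedule on a
    finite prefix, given as a list of (n, L(n)) pairs with n increasing."""
    seen_labels = set()
    for n, ln in pairs:
        if ln > n:                # must satisfy L(n) <= n
            return False
        if ln in seen_labels:     # must be injective: L(m) != L(n) for m != n
            return False
        seen_labels.add(ln)
    return True

# The identity schedule (ideal teacher) on trials 1..5:
print(is_teaching_schedule([(n, n) for n in range(1, 6)]))   # True
# A schedule that reveals a label before its trial is not allowed:
print(is_teaching_schedule([(1, 2)]))                        # False
```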

The teaching schedule describes the way the data is disclosed: after trial $n$, Reality provides the label $y_{L(n)}$ for the object $x_{L(n)}$.

The *weak teacher*, or $L$-taught version $\Gamma^L$, of a confidence predictor $\Gamma$ is defined by

$\Gamma^{L, \epsilon}(x_1, y_1, \ldots, x_{n-1}, y_{n-1}, x_n) := \Gamma^{\epsilon}(x_{L(n_1)}, y_{L(n_1)}, \ldots, x_{L(n_{s(n)})}, y_{L(n_{s(n)})}, x_n),$

where $s(n) := |\{i \in N : i < n\}|$. In words, a weak teacher is a confidence predictor whose prediction sets are based only on the true labels disclosed by the beginning of the current trial.
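The argument list fed to $\Gamma^\epsilon$ at trial $n$ can be made concrete: the sketch below (names are illustrative) lists the indices $L(n_1), \ldots, L(n_{s(n)})$ of the examples whose labels are available.

```python
def disclosed_examples(L, N, n):
    """Indices L(i) for i in N with i < n: exactly the s(n) examples
    whose true labels are available when predicting at trial n."""
    return [L(i) for i in sorted(N) if i < n]

# Ideal teacher: N = {1, 2, ...}, L(n) = n, so at trial n all of
# y_1, ..., y_{n-1} are known.
print(disclosed_examples(lambda i: i, range(1, 100), 5))      # [1, 2, 3, 4]
# Slow teacher with constant lag 2: N = {3, 4, ...}, L(n) = n - 2.
print(disclosed_examples(lambda i: i - 2, range(3, 100), 5))  # [1, 2]
```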

An *$L$-taught (smoothed) conformal predictor* is a confidence predictor that can be represented as $\Gamma^L$ for some (smoothed) conformal predictor $\Gamma$.

## Examples

**Ideal teacher (TCM).** If $N = \mathbb{N}$ and $L(n) = n$ for each $n \in N$, then $\Gamma^L = \Gamma$.

**Slow teacher.** If $\text{lag}: \mathbb{N} \rightarrow \mathbb{N}$ is an increasing function, $l(n) := n + \text{lag}(n)$, $N := l(\mathbb{N})$ and $L(n) := l^{-1}(n)$ for $n \in N$, then $\Gamma^L$ is a predictor that learns the true label of each object $x_n$, but with a delay equal to $\text{lag}(n)$.
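Under these assumptions the slow-teacher schedule can be built directly from the lag function. A sketch (the forward-search inversion of $l$ is an illustrative choice, valid because $l$ is strictly increasing):

```python
def slow_teacher_schedule(lag):
    """Given an increasing lag function, return L = l^{-1}, where
    l(n) = n + lag(n) and the schedule's domain is N = l(N)."""
    def l(n):
        return n + lag(n)
    def L(m):
        # invert the strictly increasing l by forward search
        n = 1
        while l(n) < m:
            n += 1
        if l(n) != m:
            raise ValueError("argument is not in the domain N = l(N)")
        return n
    return L

L = slow_teacher_schedule(lambda n: 2 * n)   # l(n) = 3n, N = {3, 6, 9, ...}
print(L(3), L(6), L(9))   # 1 2 3: the label y_n is revealed after trial 3n
```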

**Lazy teacher.** If $N \neq \mathbb{N}$ and $L(n) = n$ for each $n \in N$, then $\Gamma^L$ is given the true labels immediately, but not for every object.
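For instance (an illustrative choice of $N$, not from the source), with $N$ the even numbers the predictor sees every second label:

```python
# Lazy teacher sketch: labels are disclosed immediately (L(n) = n)
# but only after even-numbered trials.
N = [2 * k for k in range(1, 50)]
L = lambda n: n

# When predicting at trial 7, only y_2, y_4, y_6 have been disclosed.
available = [L(i) for i in N if i < 7]
print(available)   # [2, 4, 6]
```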

## Validity

In the case of weak teachers, validity in the strongest possible sense (conservative validity) is unattainable. However, the following weaker types of validity can be defined:

- weak validity;
- strong validity;
- validity in the sense of the law of the iterated logarithm.

### Weak validity

All the statements in this section are given under the randomness assumption.

A randomized confidence predictor $\Gamma$ is *asymptotically exact in probability* if, for all significance levels $\epsilon$ and all probability distributions $Q$ on $Z$,

$\frac{1}{n} \sum_{i=1}^{n} \text{err}_i^\epsilon(\Gamma, Q^\infty) - \epsilon \rightarrow 0$

in probability, where $\text{err}_{n}^{\epsilon}(\Gamma, P)$ is the random variable defined as follows: $\text{err}_{n}^{\epsilon}(\Gamma, (x_1, y_1, x_2, y_2, \ldots)) := 1$ if $y_n \not\in \Gamma^\epsilon(x_1, y_1, \ldots, x_{n-1}, y_{n-1}, x_n)$, and $:= 0$ otherwise.
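For a smoothed conformal predictor under the ideal teacher, the errors $\text{err}_n^\epsilon$ are independent Bernoulli variables with parameter $\epsilon$, so the averaged error rate settles at $\epsilon$. A Monte-Carlo sketch of this behaviour (the i.i.d. Bernoulli model is the assumption here):

```python
import random

def average_error_rate(eps, n, seed=0):
    """Simulate err_1, ..., err_n as independent Bernoulli(eps) variables,
    as for a smoothed conformal predictor with an ideal teacher, and
    return the averaged error rate (1/n) * sum err_i."""
    rng = random.Random(seed)
    return sum(1 if rng.random() < eps else 0 for _ in range(n)) / n

# The average error rate converges to eps = 0.05:
print(average_error_rate(0.05, 100_000))
```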

Similarly, a confidence predictor $\Gamma$ is *asymptotically conservative in probability* if, for all significance levels $\epsilon$ and all probability distributions $Q$ on $Z$,

$\left(\frac{1}{n} \sum_{i=1}^{n} \text{err}_i^\epsilon(\Gamma, Q^\infty) - \epsilon\right)^{+} \rightarrow 0$

in probability.

**Theorem.** *Let $L$ be a teaching schedule with domain $N = \{n_1, n_2, \ldots\}$, $n_1 < n_2 < \ldots$. If $\lim_{k\rightarrow\infty}(n_k/n_{k-1}) = 1$, then any $L$-taught smoothed conformal predictor is asymptotically exact in probability. Otherwise, there exists an $L$-taught smoothed conformal predictor which is not asymptotically exact in probability.*
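The theorem's condition $n_k/n_{k-1} \rightarrow 1$ holds for polynomially spaced domains but fails for exponentially spaced ones; a quick illustrative check:

```python
def successive_ratios(ns):
    """Ratios n_k / n_{k-1} for a finite prefix of the schedule domain."""
    return [ns[k] / ns[k - 1] for k in range(1, len(ns))]

# n_k = k^2: ratios tend to 1, so asymptotic exactness is guaranteed.
print(successive_ratios([k * k for k in range(1, 1001)])[-1])
# n_k = 2^k: ratios are constantly 2, so the condition fails.
print(successive_ratios([2 ** k for k in range(1, 11)])[-1])   # 2.0
```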

**Corollary.** *If $\lim_{k\rightarrow\infty}(n_k/n_{k-1}) = 1$, then any $L$-taught conformal predictor is asymptotically conservative in probability.*