Transductive Conformal Predictor

Definition

We assume that Reality outputs successive pairs

$(x_1, y_1), (x_2, y_2), \ldots $

called examples. The objects $x_i$ are elements of a measurable space $X$ and the labels $y_i$ are elements of a measurable space $Y$.

We call $Z := X \times Y$ the example space, $\epsilon \in (0, 1)$ the significance level, and the complementary value $1 - \epsilon$ the confidence level.

A nonconformity measure is a measurable mapping

$A: Z^{(*)} \times Z \to (-\infty, +\infty]$,

where $Z^{(*)}$ is the set of all bags (multisets) of elements of $Z$. Intuitively, this function assigns a numerical score (sometimes called the nonconformity score) indicating how different a new example is from a set of old ones.

A transductive conformal predictor (TCP) determined by a nonconformity measure $(A_n)$ is a confidence predictor $\Gamma: Z^* \times X \times (0,1) \rightarrow 2^Y$ (where $2^Y$ is the set of all subsets of $Y$) obtained by setting

$ \Gamma^\epsilon(x_1, y_1, \ldots, x_{n-1}, y_{n-1}, x_n)$

equal to the set of all labels $y \in  Y$ such that

$|\{i = 1, \ldots, n : \alpha_i \ge \alpha_n\}| / n > \epsilon$,

where

$\alpha_i := A_n(\{(x_1, y_1), \ldots, (x_{i-1}, y_{i-1}), (x_{i+1}, y_{i+1}), \ldots, (x_{n-1}, y_{n-1}), (x_n, y)\}, (x_i, y_i)), \quad i = 1, \ldots, n-1,$

$\alpha_n := A_n(\{(x_1, y_1), \ldots, (x_{n-1}, y_{n-1})\}, (x_n, y))$,

and $\{\ldots\}$ designates the bag (multiset) of examples.
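The definition above can be sketched directly in code. The following is a minimal illustration, not a prescribed implementation: the particular nonconformity measure (distance from an object to the nearest object in the bag carrying the same label) and all names are assumptions made for the example.

```python
def nonconformity(bag, example):
    # Illustrative nonconformity measure A(bag, (x, y)): distance from x
    # to the nearest object in the bag that carries the same label y.
    x, y = example
    same = [abs(x - xi) for xi, yi in bag if yi == y]
    return min(same) if same else float("inf")

def conformal_prediction_set(train, x_new, labels, eps):
    """Prediction set Gamma^eps(x_1, y_1, ..., x_{n-1}, y_{n-1}, x_new)."""
    pred = set()
    for y in labels:
        z = train + [(x_new, y)]   # bag of n examples, with z_n = (x_new, y)
        n = len(z)
        # alpha_i = A(bag without z_i, z_i) for each i = 1, ..., n
        alphas = [nonconformity(z[:i] + z[i + 1:], z[i]) for i in range(n)]
        p_value = sum(a >= alphas[-1] for a in alphas) / n
        if p_value > eps:          # keep y iff its p-value exceeds eps
            pred.add(y)
    return pred
```

For instance, with `train = [(0.0, 'a'), (0.1, 'a'), (1.0, 'b'), (1.1, 'b')]`, the call `conformal_prediction_set(train, 0.05, ['a', 'b'], 0.2)` returns the single-label set `{'a'}`.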

"Transductive conformal predictor" is often abbreviated to "conformal predictor".

The standard assumption for conformal predictors is the randomness assumption (also called the i.i.d. assumption).

Transductive conformal predictors can be generalized to a wider class of confidence predictors, such as inductive conformal predictors and Mondrian conformal predictors.

Desiderata

Validity

All the statements in the section are given under the randomness assumption.

The statement of validity is easiest for smoothed conformal predictors, for which

$ \Gamma^\epsilon(x_1, y_1, \ldots, x_{n-1}, y_{n-1}, x_n)$

is set to the set of all labels $y \in  Y$ such that

$(|\{i = 1, \ldots, n : \alpha_i > \alpha_n\}| + \eta_y\,|\{i = 1, \ldots, n : \alpha_i = \alpha_n\}|) / n > \epsilon$,

where the nonconformity scores $\alpha_i$ are defined as before and $\eta_y \in [0,1]$ is a random number chosen from the uniform distribution on $[0,1]$.
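The smoothed p-value is easy to compute once the nonconformity scores are known. A small sketch (the function name and interface are assumptions), in which the last entry of `alphas` plays the role of $\alpha_n$:

```python
import random

def smoothed_p_value(alphas, rng=random):
    """(|{i: a_i > a_n}| + eta * |{i: a_i = a_n}|) / n,
    where alphas[-1] is the score alpha_n of the candidate (x_n, y)."""
    a_n = alphas[-1]
    greater = sum(a > a_n for a in alphas)
    equal = sum(a == a_n for a in alphas)   # note: counts alpha_n itself
    eta = rng.random()                      # tie-breaking eta_y ~ U[0, 1]
    return (greater + eta * equal) / len(alphas)
```

A label $y$ is included in $\Gamma^\epsilon$ whenever this value exceeds $\epsilon$; randomizing the ties is what makes the error sequence exactly Bernoulli, as stated in the theorem below.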

Theorem All smoothed conformal predictors are exactly valid, in the sense that, for any exchangeable probability distribution $P$ on $Z^{\infty}$ and any significance level $\epsilon \in (0,1)$, the random variables $err_n^{\epsilon}(\Gamma)$, $n = 1, 2, \ldots$, are independent Bernoulli variables with parameter $\epsilon$, where $err_{n}^{\epsilon}(\Gamma)$ is the random variable

$err_{n} ^{\epsilon}(\Gamma)(x_1, y_1, x_2, y_2, \ldots) := 1$ if $y_n \not \in \Gamma^\epsilon(x_1, y_1, \ldots, x_{n-1}, y_{n-1}, x_n)$ ; $0$ otherwise.

The idea of the proof is quite simple: under exchangeability, the smoothed p-values of the true labels are independent and uniformly distributed on $[0,1]$, and an error at trial $n$ occurs exactly when the $n$-th p-value does not exceed $\epsilon$.

Corollary All smoothed conformal predictors are asymptotically exact, in the sense that for any exchangeable probability distribution $P$ on $Z^{\infty}$ and any significance level $\epsilon$,

$\lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} err_i^{\epsilon}(\Gamma) = \epsilon$

with probability one.

Corollary All conformal predictors are asymptotically conservative, in the sense that for any exchangeable probability distribution $P$ on $Z^{\infty}$ and any significance level $\epsilon$,

$\limsup_{n\rightarrow \infty} \frac{1}{n} \sum_{i=1}^{n} err_i^{\epsilon}(\Gamma) \le \epsilon$

with probability one.

To put it simply, in the long run the frequency of erroneous predictions does not exceed $\epsilon$ at significance level $\epsilon$ (i.e., at confidence level $1 - \epsilon$).
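Exact validity can also be checked empirically. The sketch below (the function name, the Gaussian data model, and the mean-distance nonconformity score are all assumptions made for illustration) runs a smoothed conformal predictor online on i.i.d. data and reports the frequency of errors, which should be close to $\epsilon$:

```python
import random

def smoothed_error_rate(n_steps, eps, seed=0):
    # i.i.d. (hence exchangeable) data: label y in {0, 1}, object x ~ N(2y, 1).
    # Illustrative nonconformity score: distance from the object to the mean
    # of the bag's objects that carry the same label.
    rng = random.Random(seed)
    data, errors = [], 0
    for _ in range(n_steps):
        y = rng.randrange(2)
        x = rng.gauss(2.0 * y, 1.0)
        z = data + [(x, y)]        # bag of n examples; z[-1] = (x_n, y_n)

        def score(i):
            xi, yi = z[i]
            rest = [xj for j, (xj, yj) in enumerate(z) if j != i and yj == yi]
            return abs(xi - sum(rest) / len(rest)) if rest else float("inf")

        alphas = [score(i) for i in range(len(z))]
        gt = sum(a > alphas[-1] for a in alphas)
        eq = sum(a == alphas[-1] for a in alphas)
        p = (gt + rng.random() * eq) / len(z)  # smoothed p-value of true label
        if p <= eps:                           # true label outside Gamma^eps
            errors += 1
        data.append((x, y))
    return errors / n_steps
```

With, say, `smoothed_error_rate(200, 0.2)`, the returned frequency is typically within a few percentage points of $0.2$, in line with the theorem.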

One can also give a formal notion of validity for conformal predictors, although the usefulness of this notion is limited: it is intuitively clear that conformal predictors are valid, or even "more than valid", since the number of errors made by a conformal predictor never exceeds the number of errors made by the corresponding smoothed conformal predictor.

Efficiency

Since conformal predictors are automatically valid, the main goal is to improve their efficiency: to make the prediction sets they output as small as possible. In classification problems, a natural measure of efficiency is the number of multiple predictions, i.e., the number of prediction sets containing more than one label. In regression problems, the prediction set is often an interval of values, so a natural measure of efficiency is the length of that interval.
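Both efficiency criteria are straightforward to compute from a predictor's output; a minimal sketch (the function names are assumptions):

```python
def multiple_prediction_rate(prediction_sets):
    # Classification: fraction of prediction sets with more than one label.
    return sum(len(s) > 1 for s in prediction_sets) / len(prediction_sets)

def mean_interval_length(intervals):
    # Regression: mean length of prediction intervals, each a (lower, upper) pair.
    return sum(hi - lo for lo, hi in intervals) / len(intervals)
```

For example, `multiple_prediction_rate([{'a'}, {'a', 'b'}, {'b'}, set()])` is `0.25`, and `mean_interval_length([(0.0, 1.0), (0.0, 3.0)])` is `2.0`.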