Conformal Predictor

Definition

Suppose we have a training sequence of pairs

\[
  (z_1,z_2,\ldots,z_n) = ((x_1, y_1), (x_2, y_2), \ldots, (x_n,y_n))
\]

called observations. The objects $x_i$ are elements of a measurable space $\mathbf{X}$ and the labels $y_i$ are elements of a measurable space $\mathbf{Y}$.

We call $\mathbf{Z}:=\mathbf{X}\times\mathbf{Y}$ the observation space, $\epsilon\in(0,1)$ the significance level, and the complementary value $1 - \epsilon$ the confidence level.

A conformity measure is a measurable mapping

\[
  A: \mathbf{Z}^{(*)} \times \mathbf{Z} \to [-\infty, +\infty],
\]

where $\mathbf{Z}^{(*)}$ is the set of all bags (multisets) of elements of $\mathbf{Z}$. Intuitively, this function assigns a numerical score (sometimes called the conformity score) indicating how similar a new observation is to a multiset of old ones.

The conformal predictor determined by the conformity measure $A$ is a confidence predictor $\Gamma$ whose value on the training sequence $z_1,\ldots,z_n$ and a test object $x_{n+1}\in\mathbf{X}$ at a significance level $\epsilon$ is obtained by setting

\[
  \Gamma^\epsilon(z_1,\ldots,z_n,x_{n+1})
  :=
  \left\{
    y\in\mathbf{Y}
    \mid
    \frac{|\{i=1,\ldots,n+1 : \alpha^y_i \le \alpha^y_{n+1}\}|}{n+1}
    >
    \epsilon
  \right\},
\]

where

\[
  \alpha^y_i := A(\{z_1,\ldots,z_{i-1}, z_{i+1},\ldots,z_n,(x_{n+1},y)\}, z_i), \quad i = 1,\ldots,n,
\]
\[
  \alpha^y_{n+1} := A(\{z_1,\ldots,z_n\}, (x_{n+1},y)),
\]

and $\{\ldots\}$ designates the bag (multiset) of observations.
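
To make the definition concrete, here is a minimal Python sketch of the prediction set $\Gamma^\epsilon$ for a finite label space. The conformity measure is passed in as a function; the nearest-neighbour measure at the end is only an illustrative choice, not part of the definition.

```python
def conformal_prediction_set(train, x_new, labels, conformity, eps):
    """Compute the conformal prediction set Gamma^eps for the test object x_new.

    train      -- list of observations (x_i, y_i), i = 1, ..., n
    x_new      -- the test object x_{n+1}
    labels     -- finite list of candidate labels (the label space Y)
    conformity -- function A(bag, z) scoring how well z conforms to the bag
    eps        -- significance level in (0, 1)
    """
    n = len(train)
    prediction_set = []
    for y in labels:
        augmented = train + [(x_new, y)]  # the bag {z_1, ..., z_n, (x_{n+1}, y)}
        # alpha_i^y = A(augmented bag with z_i removed, z_i), i = 1, ..., n+1
        alphas = [conformity(augmented[:i] + augmented[i + 1:], augmented[i])
                  for i in range(n + 1)]
        # p-value: fraction of conformity scores not exceeding that of (x_{n+1}, y)
        p_value = sum(a <= alphas[-1] for a in alphas) / (n + 1)
        if p_value > eps:
            prediction_set.append(y)
    return prediction_set


def nearest_neighbour_conformity(bag, z):
    """Illustrative conformity measure: minus the distance to the nearest
    object in the bag carrying the same label (assumes numeric objects)."""
    x, y = z
    same = [abs(x - xb) for xb, yb in bag if yb == y]
    return -min(same) if same else float("-inf")
```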

The standard assumption for conformal predictors is the randomness assumption (also called the i.i.d. assumption).

Conformal predictors can be generalized to wider classes of confidence predictors, such as inductive conformal predictors and Mondrian conformal predictors.

Desiderata

Validity

All the statements in this section are made under the randomness assumption in the on-line prediction protocol.

The statement of validity is easiest for smoothed conformal predictors, for which

\[
  \Gamma^\epsilon(z_1,\ldots,z_n,x_{n+1})
  :=
  \left\{
    y\in\mathbf{Y}
    \mid
    \frac{|\{i=1,\ldots,n+1:\alpha^y_i<\alpha^y_{n+1}\}|+\eta_y|\{i=1,\ldots,n+1:\alpha^y_i=\alpha^y_{n+1}\}|}{n+1} > \epsilon
  \right\},
\]

where the conformity scores $\alpha^y_i$ are defined as before and $\eta_y\in[0,1]$ is a random number chosen from the uniform distribution on $[0,1]$. In the rest of this section we will assume that the random numbers $\eta_y$ used at different trials $n$ are independent of each other and of the observations.
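
As a small sketch under the same conformity-score convention, the smoothed p-value for a candidate label can be computed from its scores $\alpha^y_1,\ldots,\alpha^y_{n+1}$ as follows; the tie-breaking number $\eta_y$ is drawn uniformly unless supplied by the caller.

```python
import random

def smoothed_p_value(alphas, eta=None):
    """Smoothed p-value from the conformity scores alpha_1^y, ..., alpha_{n+1}^y.

    alphas -- sequence of n+1 conformity scores; the last entry is alpha_{n+1}^y
    eta    -- tie-breaking number in [0, 1]; drawn uniformly at random if omitted
    """
    if eta is None:
        eta = random.random()
    a_test = alphas[-1]
    strictly_below = sum(a < a_test for a in alphas)  # scores strictly below the test score
    ties = sum(a == a_test for a in alphas)           # includes the test score itself
    return (strictly_below + eta * ties) / len(alphas)


def smoothed_prediction_set(p_values, eps):
    """Keep every candidate label whose smoothed p-value exceeds eps."""
    return [y for y, p in p_values.items() if p > eps]
```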

Theorem All smoothed conformal predictors are exactly valid, in the sense that, for any exchangeable probability distribution $P$ on $\mathbf{Z}^{\infty}$ and any significance level $\epsilon\in(0,1)$, the random variables $\text{err}_n^{\epsilon}(\Gamma)$, $n = 1, 2, \ldots$, are independent Bernoulli variables with parameter $\epsilon$, where $\text{err}_{n} ^{\epsilon}(\Gamma)$ is the random variable

\[
  \text{err}_{n} ^{\epsilon}(\Gamma)(z_1,z_2,\ldots)
  :=
  \begin{cases}
    1 & \text{if $y_{n+1} \not \in \Gamma^\epsilon(z_1,\ldots,z_n,x_{n+1})$}\\
    0 & \text{otherwise}.
  \end{cases}
\]

The idea of the proof is simple: under exchangeability, the smoothed p-value of the true label $y_{n+1}$ is distributed uniformly on $[0,1]$, independently of what happened at the previous trials, and an error occurs exactly when this p-value does not exceed $\epsilon$.

Corollary All smoothed conformal predictors are asymptotically exact, in the sense that for any exchangeable probability distribution $P$ on $\mathbf{Z}^{\infty}$ and any significance level $\epsilon$,

\[
  \lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} \text{err}_i^{\epsilon}(\Gamma) = \epsilon
\]

with probability one.

Corollary All conformal predictors are asymptotically conservative, in the sense that for any exchangeable probability distribution $P$ on $\mathbf{Z}^{\infty}$ and any significance level $\epsilon$,

\[
  \limsup_{n\to \infty} \frac{1}{n} \sum_{i=1}^{n} \text{err}_i^{\epsilon}(\Gamma) \le \epsilon
\]

with probability one.

To put it simply: in the long run, at each significance level $\epsilon$ (equivalently, each confidence level $1-\epsilon$), the frequency of erroneous predictions does not exceed $\epsilon$.
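
The following small simulation is a sketch of how the asymptotic exactness of smoothed conformal predictors can be checked empirically; the data source (objects drawn from $N(y,1)$ with equiprobable labels $0$ and $1$) and the conformity measure are arbitrary illustrative choices.

```python
import random

def empirical_error_rate(n_trials=200, eps=0.1, seed=0):
    """Run a smoothed conformal predictor in the on-line protocol on synthetic
    i.i.d. data and return the frequency of errors (illustration only).

    The conformity score of (x, y) is minus the distance from x to the mean of
    the other objects in the bag carrying the same label.
    """
    rng = random.Random(seed)
    labels = [rng.randint(0, 1) for _ in range(n_trials)]
    data = [(rng.gauss(y, 1.0), y) for y in labels]

    def conformity(bag, z):
        x, y = z
        same = [xb for xb, yb in bag if yb == y]
        return -abs(x - sum(same) / len(same)) if same else float("-inf")

    errors = 0
    for n in range(1, n_trials):
        train, (x_new, y_true) = data[:n], data[n]
        aug = train + [(x_new, y_true)]          # bag augmented with the true label
        alphas = [conformity(aug[:i] + aug[i + 1:], aug[i]) for i in range(n + 1)]
        strictly_below = sum(a < alphas[-1] for a in alphas)
        ties = sum(a == alphas[-1] for a in alphas)
        p = (strictly_below + rng.random() * ties) / (n + 1)
        if p <= eps:                             # error: the true label is excluded
            errors += 1
    return errors / (n_trials - 1)


print(empirical_error_rate())  # by the theorem, the printed frequency is typically close to 0.1
```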

One can also give a formal notion of validity for conformal predictors, although the usefulness of this notion is limited: it is intuitively clear that conformal predictors are valid or "more than valid" since the number of errors made by a conformal predictor never exceeds the number of errors made by the corresponding smoothed conformal predictor.

Efficiency

As conformal predictors are automatically valid, the main goal is to improve their efficiency (also known as predictive efficiency): to make the prediction sets they output as small as possible. In classification problems, a natural measure of efficiency is the number of multiple predictions, i.e., prediction sets containing more than one label. In regression problems, the prediction set is often an interval of values, so a natural measure of efficiency is the length of this interval. There are many possible criteria of efficiency.
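
For instance, the two measures just mentioned can be computed as follows (a minimal sketch; the prediction sets and intervals are assumed to have already been produced by conformal predictors).

```python
def multiple_prediction_rate(prediction_sets):
    """Classification: fraction of prediction sets containing more than one label."""
    return sum(len(s) > 1 for s in prediction_sets) / len(prediction_sets)


def average_interval_length(prediction_intervals):
    """Regression: average length of prediction intervals given as (lower, upper) pairs."""
    return sum(upper - lower for lower, upper in prediction_intervals) / len(prediction_intervals)
```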

There are separate articles about validity, efficiency, and a third desideratum, conditionality.

Conformal transducers

In many cases, it is more convenient to consider conformal transducers, which output, for each training sequence $z_1,\ldots,z_n$, each test object $x_{n+1}\in\mathbf{X}$, and each potential label $y\in\mathbf{Y}$ for $x_{n+1}$, the p-value given by

\[
  p^y
  :=
  \frac{|\{i=1,\ldots,n+1:\alpha^y_i<\alpha^y_{n+1}\}|+\eta_y|\{i=1,\ldots,n+1:\alpha^y_i=\alpha^y_{n+1}\}|}{n+1}
\]

(cf. the definition of the smoothed conformal predictor). Conformal predictors and conformal transducers can be regarded as different ways of packaging the same object. For further details, see Vovk et al. (2005).
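
A sketch of this packaging: the transducer returns a p-value for every candidate label, and the conformal predictor at any significance level is recovered by thresholding. The names and interfaces below are illustrative assumptions, not fixed by the definition.

```python
import random

def conformal_transducer(train, x_new, labels, conformity, rng=random):
    """Return the smoothed p-value p^y for every candidate label y."""
    n = len(train)
    p_values = {}
    for y in labels:
        aug = train + [(x_new, y)]               # bag {z_1, ..., z_n, (x_{n+1}, y)}
        alphas = [conformity(aug[:i] + aug[i + 1:], aug[i]) for i in range(n + 1)]
        eta = rng.random()                       # tie-breaking random number eta_y
        strictly_below = sum(a < alphas[-1] for a in alphas)
        ties = sum(a == alphas[-1] for a in alphas)
        p_values[y] = (strictly_below + eta * ties) / (n + 1)
    return p_values

# The conformal predictor is recovered from the transducer by thresholding:
#   prediction_set = [y for y, p in p_values.items() if p > eps]
```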

Bibliography
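
Vovk, V., Gammerman, A., and Shafer, G. (2005). Algorithmic Learning in a Random World. Springer, New York.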