Inductive Conformal Predictor

Inductive conformal predictors (ICPs) were first discovered as a modification of transductive conformal predictors in order to improve the computational efficiency of the algoithm for large datasets.

Definition

An inductive conformal predictor (ICM) determined by a nonconformity measure $(A_n)$ and an ascending sequence of positive integers $ 0 < m_1 < m_2 < \ldots$ is a confidence predictor $\Gamma: Z^*\times X\times (0,1)\rightarrow 2^Y$ ($2^Y$ is a set of all subsets of $Y$) such that the prediction sets $ \Gamma^\epsilon(x_1, y_1, \ldots, x_{n-1}, y_{n-1}, x_n)$ are computed as follows:

  • if $n \le m_1$, set

$ \Gamma^\epsilon(x_1, y_1, \ldots, x_{n-1}, y_{n-1}, x_n) := \{y \in  Y: |{i = 1, \ldots,n: \alpha_i \ge \alpha_n }|  \slash{}n > \epsilon\}$,

where

$\alpha_i := A_n(\{(x_1, y_1), \ldots, (x_{i-1}, y_{i-1}), (x_{i+1}, y_{i+1}), \ldots, (x_{n-1}, y_{n-1}),(x_n,y)\}, (x_i, y_i)), i = 1, \ldots, n-1,$

$\alpha_n := A_n(\{(x_1, y_1), \ldots, (x_{n-1}, y_{n-1})\}, (x_n, y)).$

  • otherwise, find the $k$ such that $m_k<n \le m_{k+1}$ and set

$\Gamma^{\epsilon}(x_1, y_1, \ldots, x_{n-1}, y_{n-1}, x_n) := \{y \in  Y: |{i = m_k + 1, \ldots,n: \alpha_i \ge \alpha_n }|  \slash{}(n - m_k) > \epsilon\}$,

where the nonconformity scores $\alpha_i$ are defined by

$\alpha_i := A_{m_k +1}(\{(x_1, y_1), \ldots, (x_{m_k}, y_{m_k})\}, (x_i, y_i)), i = m_k + 1, \ldots, n-1,$

$\alpha_n := A_{m_k + 1}(\{(x_1, y_1), \ldots, (x_{m_k}, y_{m_k})\}, (x_n, y))$

and $\{\ldots\}$ designates the bag (multiset) of examples.

The standard assumption for ICPs as well as for transductive conformal predictors is randomness assumption (also called the i.i.d. assumption).

Inductive conformal predictors can be generalized by Mondrian conformal predictors to a wider class of confidence predictors. Transductive conformal predictors can be considered as an important special case of ICPs.

Desiderata

Validity

All the statements in the section are given under the randomness assumption.

Smoothed ICPs are defined analogously to smoothed conformal predictors:

$ \Gamma^\epsilon(x_1, y_1, \ldots, x_{n-1}, y_{n-1}, x_n)$

is set to the set of all labels $y \in  Y$ such that

$(|\{i=1,\ldots,n:\alpha_i>\alpha_n\}|+(\eta_y|\{i=1,\ldots,n:\alpha_i=\alpha_n\}|)\slash{}(n - m_k) > \epsilon$,

where $j = m_k + 1, ..., n$, the nonconformity scores $\alpha_i$ are defined as before and $\eta_y\in[0,1]$ is a random number.

As in the case of TCPs, we can formulate analogous statements for ICPs using the same terminology.

Theorem All smoothed ICPs are exactly valid.

Corollary All smoothed ICPs are asymptotically exact.

Corollary All ICPs are conservatively valid.

Corollary All ICPs are asymptotically conservative.

To put it simply, in the long run the frequency of erroneous predictions given by ICPs does not exceed $\epsilon$ at each confidence level $1 - \epsilon$.

Effeciency

As inductive conformal predictors are automatically valid, the main goal is to improve their efficiency: to make the prediction sets ICPs output as small as possible.

In comparison with corresponding transductive conformal predictors determined by the same non-conformity measure, ICPs are inferior in terms of effeciency (however, the loss in efficiency appears to be small), but outperform the TCPs in computational effeciency.