# Inductive Conformal Predictor

*Inductive conformal predictors (ICPs)* are a modification of conformal predictors designed to improve computational efficiency on large data sets. They are also known as *split conformal predictors*.

## Definition

An *inductive conformal predictor (ICP)* determined by a conformity measure $A$ and an ascending sequence of positive integers $0 < m_1 < m_2 < \ldots$ is a confidence predictor $\Gamma: Z^*\times X\times (0,1)\rightarrow 2^Y$ (where $2^Y$ is the set of all subsets of $Y$) such that the prediction sets $\Gamma^\epsilon(x_1, y_1, \ldots, x_{n}, y_{n}, x_{n+1})$ are computed as follows:

- if $n \le m_1$, set

  $$\Gamma^\epsilon(x_1, y_1, \ldots, x_{n}, y_{n}, x_{n+1}) := \{y \in Y: |\{i = 1, \ldots, n+1: \alpha_i \le \alpha_{n+1}\}| / (n+1) > \epsilon\},$$

  where

  $$\alpha_i := A(\{(x_1, y_1), \ldots, (x_{i-1}, y_{i-1}), (x_{i+1}, y_{i+1}), \ldots, (x_{n}, y_{n}), (x_{n+1}, y)\}, (x_i, y_i)), \quad i = 1, \ldots, n,$$

  $$\alpha_{n+1} := A(\{(x_1, y_1), \ldots, (x_{n}, y_{n})\}, (x_{n+1}, y)).$$

- otherwise, find the $k$ such that $m_k < n \le m_{k+1}$ and set

  $$\Gamma^{\epsilon}(x_1, y_1, \ldots, x_{n}, y_{n}, x_{n+1}) := \{y \in Y: |\{i = m_k + 1, \ldots, n+1: \alpha_i \le \alpha_{n+1}\}| / (n + 1 - m_k) > \epsilon\},$$

  where the *conformity scores* $\alpha_i$ are defined by

  $$\alpha_i := A(\{(x_1, y_1), \ldots, (x_{m_k}, y_{m_k})\}, (x_i, y_i)), \quad i = m_k + 1, \ldots, n,$$

  $$\alpha_{n+1} := A(\{(x_1, y_1), \ldots, (x_{m_k}, y_{m_k})\}, (x_{n+1}, y)),$$

  and $\{\ldots\}$ designates the bag (multiset) of observations.
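The second case of the definition, the one that gives ICPs their computational advantage, can be sketched in a few lines of Python. The conformity measure below (a 1-nearest-neighbour score) and all function names are illustrative assumptions, not part of the definition:

```python
def conformity(training, x, y):
    """A(training_bag, (x, y)): larger values mean (x, y) conforms better.

    Hypothetical 1-NN conformity score on 1-D objects: distance to the
    nearest training example with a different label minus distance to the
    nearest one with the same label.
    """
    same = min((abs(x - xi) for xi, yi in training if yi == y), default=float("inf"))
    other = min((abs(x - xi) for xi, yi in training if yi != y), default=float("inf"))
    return other - same

def icp_prediction_set(data, x_new, labels, m_k, epsilon):
    """Gamma^epsilon(x_1, y_1, ..., x_n, y_n, x_new) for n > m_k."""
    training, calibration = data[:m_k], data[m_k:]
    # Conformity scores alpha_i, i = m_k + 1, ..., n: computed once
    # from the fixed training set, independently of x_new and y.
    alphas = [conformity(training, xi, yi) for xi, yi in calibration]
    prediction = set()
    for y in labels:
        # alpha_{n+1} for the candidate label y.
        alpha_new = conformity(training, x_new, y)
        # Include y if |{i: alpha_i <= alpha_{n+1}}| / (n + 1 - m_k) > epsilon.
        p = sum(a <= alpha_new for a in alphas + [alpha_new]) / (len(alphas) + 1)
        if p > epsilon:
            prediction.add(y)
    return prediction
```

For example, with four training and two calibration observations in two well-separated classes, `icp_prediction_set` returns a singleton at low enough significance levels and the whole label set at very demanding ones.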

The standard assumption for ICPs, as for conformal predictors, is the randomness assumption (also called the i.i.d. assumption).

Inductive conformal predictors can be generalized by Mondrian conformal predictors to a wider class of confidence predictors. Conformal predictors can be considered an important special case of ICPs.

## Desiderata

### Validity

All the statements in this section are given under the randomness assumption.

*Smoothed ICPs* are defined analogously to smoothed conformal predictors: for $m_k < n \le m_{k+1}$, the prediction set $\Gamma^\epsilon(x_1, y_1, \ldots, x_{n}, y_{n}, x_{n+1})$ is the set of all labels $y \in Y$ such that

$$\frac{|\{i = m_k + 1, \ldots, n+1: \alpha_i < \alpha_{n+1}\}| + \eta_y\,|\{i = m_k + 1, \ldots, n+1: \alpha_i = \alpha_{n+1}\}|}{n + 1 - m_k} > \epsilon,$$

where the conformity scores $\alpha_i$ are defined as before and $\eta_y \in [0,1]$ is a random number.
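The tie-breaking in the smoothed definition amounts to counting strict inequalities in full and ties with a random weight $\eta_y$. A minimal sketch (the function name is hypothetical):

```python
import random

def smoothed_p_value(alphas, alpha_new, eta=None):
    """Smoothed p-value for a candidate label.

    `alphas` are the calibration conformity scores alpha_i,
    `alpha_new` is alpha_{n+1} for the candidate label, and
    `eta` plays the role of eta_y in the definition.
    """
    if eta is None:
        eta = random.random()  # eta_y drawn from [0, 1]
    scores = alphas + [alpha_new]
    smaller = sum(a < alpha_new for a in scores)   # strict inequalities
    ties = sum(a == alpha_new for a in scores)     # ties, weighted by eta
    return (smaller + eta * ties) / len(scores)
```

The label $y$ is included in the smoothed prediction set whenever this p-value exceeds $\epsilon$; averaging over $\eta_y$ is what turns conservative validity into exact validity.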

As in the case of conformal predictors, we can formulate analogous statements for ICPs using the same terminology.

**Theorem** *All smoothed ICPs are exactly valid.*

**Corollary** *All smoothed ICPs are asymptotically exact.*

**Corollary** *All ICPs are conservatively valid.*

**Corollary** *All ICPs are asymptotically conservative.*

To put it simply: in the long run, the frequency of erroneous predictions made by ICPs does not exceed $\epsilon$ at each confidence level $1 - \epsilon$.
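This guarantee can be checked empirically in the common regression instantiation of split conformal prediction, where prediction sets are intervals. The toy linear model, the noise distribution, and all names below are assumptions made purely for illustration:

```python
import math
import random

def split_conformal_interval(data, x_new, epsilon):
    """Split/inductive conformal prediction interval for regression.

    The first half of `data` is the training set (fitting a deliberately
    crude least-squares line through the origin); the second half supplies
    calibration residuals used as nonconformity scores.
    """
    m = len(data) // 2
    train, calib = data[:m], data[m:]
    slope = sum(x * y for x, y in train) / sum(x * x for x, y in train)
    scores = sorted(abs(y - slope * x) for x, y in calib)
    # Smallest calibration residual whose p-value still exceeds epsilon.
    k = math.ceil((1 - epsilon) * (len(calib) + 1)) - 1
    q = scores[min(k, len(scores) - 1)]
    return slope * x_new - q, slope * x_new + q

# Empirical check of (conservative) validity: over many i.i.d. draws,
# the 90% interval should cover the true label roughly 90% of the time.
random.seed(0)
trials, hits = 500, 0
for _ in range(trials):
    xs = [random.random() for _ in range(40)]
    data = [(x, 2.0 * x + random.gauss(0, 0.1)) for x in xs]
    x_new = random.random()
    y_new = 2.0 * x_new + random.gauss(0, 0.1)
    lo, hi = split_conformal_interval(data, x_new, epsilon=0.1)
    hits += lo <= y_new <= hi
coverage = hits / trials
```

With 20 calibration examples the interval uses the 19th smallest residual, so the coverage probability is $19/21 \approx 0.905 \ge 0.9$, and the empirical coverage should land close to that value.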

### Efficiency

Since inductive conformal predictors are automatically valid, the main goal is to improve their *efficiency*: to make the prediction sets that ICPs output as small as possible. See criteria of efficiency.

In comparison with the corresponding conformal predictors determined by the same conformity measure, ICPs are inferior in predictive efficiency (although the loss appears to be small), but they outperform CPs in computational efficiency: the conformity scores of the calibration examples are computed once from the fixed training set, rather than being recomputed for every new object and every candidate label.