Conformal Predictive System

Conformal predictive systems were introduced in the technical report by Vovk et al. (2017). Essentially, these are conformal transducers that, for each training sequence and each test object, output p-values $p^y$ that are increasing as a function of the label $y$, which is assumed to be a real number. The function $y\mapsto p^y$ is then called a predictive distribution.

A wide class of conformity measures that often lead to conformal predictive systems is

  $$A((z_1,\ldots,z_n),(x,y)) := y - \hat y,$$

where $\hat y$ is the prediction for the label of $x$ based on the training sequence $z_1,\ldots,z_n$ and $(x,y)$. An even wider class is

  $$A((z_1,\ldots,z_n),(x,y)) := (y - \hat y)/\sigma_y,$$

where $\sigma_y > 0$ is an estimate of the variability or difficulty of $y$ computed from the training sequence and $(x,y)$. (The methods for computing $\hat y$ and $\sigma_y$ are assumed to be invariant with respect to permutations of $z_1,\ldots,z_n$.) The width $p^y(1)-p^y(0)$ of such conformal predictive distributions (the argument is the random number used for smoothing; see the next paragraph) is typically equal to $1/(n+1)$, where $n$ is the length of the training sequence, except for at most $n$ values of $y$.
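
The first class can be illustrated with a minimal sketch, assuming the simplest permutation-invariant predictor: $\hat y$ is taken to be the mean of all labels in the training sequence extended by the candidate label, and the objects are ignored. The names `conformal_p_value`, `train_labels`, and `eta` are illustrative and do not come from the report; `eta` plays the role of the argument in $p^y(1)$ and $p^y(0)$.

```python
import numpy as np

def conformal_p_value(train_labels, y, eta):
    """Smoothed conformal p-value for a candidate label y (illustrative sketch)."""
    # Training labels extended by the candidate label of the test object.
    labels = np.append(np.asarray(train_labels, dtype=float), y)
    # Assumption: y_hat is the mean of the extended labels (permutation-invariant).
    y_hat = labels.mean()
    # Conformity scores alpha_i = y_i - y_hat; the last one belongs to the test example.
    scores = labels - y_hat
    test_score = scores[-1]
    below = np.sum(scores < test_score)
    ties = np.sum(scores == test_score)  # includes the test example itself
    return (below + eta * ties) / len(labels)

train = [1.0, 2.0, 4.0]  # n = 3
width_generic = conformal_p_value(train, 3.0, 1.0) - conformal_p_value(train, 3.0, 0.0)
width_at_tie = conformal_p_value(train, 2.0, 1.0) - conformal_p_value(train, 2.0, 0.0)
```

With these three training labels, `width_generic` is $1/(n+1) = 1/4$, while `width_at_tie` is larger because the candidate label coincides with one of the training labels.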

The formal definition of conformal predictive systems takes account of the fact that, in the case of smoothed conformal predictors, $p^y$ also depends on the random number $\eta\in[0,1]$, and a fuller notation is $p^y(\eta)$. It is also required that $p^y(0)\to0$ as $y\to-\infty$ and $p^y(1)\to1$ as $y\to\infty$.
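
For reference, the dependence on $\eta$ can be written out explicitly. The following is a sketch in the usual notation for smoothed conformal transducers; the symbols $z_{n+1}$ and $\alpha_i^y$ are not used elsewhere on this page and are introduced here only for illustration.

```latex
% z_{n+1} := (x,y) is the test example with the candidate label y
\alpha_i^y := A\bigl((z_1,\ldots,z_{i-1},z_{i+1},\ldots,z_{n+1}),\, z_i\bigr),
\qquad i = 1,\ldots,n+1,

p^y(\eta) := \frac{\bigl|\{i : \alpha_i^y < \alpha_{n+1}^y\}\bigr|
             + \eta\,\bigl|\{i : \alpha_i^y = \alpha_{n+1}^y\}\bigr|}{n+1},

p^y(1) - p^y(0) = \frac{\bigl|\{i : \alpha_i^y = \alpha_{n+1}^y\}\bigr|}{n+1}.
```

The last line shows why the width is $1/(n+1)$ whenever the test example ties only with itself, that is, whenever no training example has the same conformity score as $(x,y)$.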

Notice that in the context of conformal predictive systems the p-values acquire properties of probabilities. In addition, they have some weak properties of object conditionality: e.g., the central prediction regions $\{y\mid\epsilon/2\le p^y\le 1-\epsilon/2\}$ are not empty, except in very pathological cases.
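
A central prediction region can be approximated numerically by evaluating the p-values on a grid of candidate labels. The sketch below makes two simplifying assumptions: the conformity scores are compared by comparing the labels themselves (as happens, for instance, when $\hat y$ is the same for all examples), and the random number is fixed at `eta = 0.5` instead of being drawn at random, so that the result is reproducible. The function names and the grid are illustrative.

```python
import numpy as np

def p_value(train_labels, y, eta=0.5):
    """Rank-based smoothed p-value for a candidate label y (illustrative sketch)."""
    train = np.asarray(train_labels, dtype=float)
    below = np.sum(train < y)
    ties = np.sum(train == y) + 1  # the test example always ties with itself
    return (below + eta * ties) / (len(train) + 1)

def central_region(train_labels, epsilon, grid, eta=0.5):
    """Grid points y with epsilon/2 <= p^y <= 1 - epsilon/2."""
    grid = np.asarray(grid, dtype=float)
    p = np.array([p_value(train_labels, y, eta) for y in grid])
    inside = (p >= epsilon / 2) & (p <= 1 - epsilon / 2)
    return grid[inside]

# Nine training labels 1, ..., 9 and candidate labels 0, 0.5, ..., 10.
region = central_region(np.arange(1, 10), epsilon=0.25, grid=np.linspace(0, 10, 21))
```

Because the p-values are increasing in $y$, the selected grid points form a contiguous block, so `region.min()` and `region.max()` approximate the endpoints of the region up to the grid resolution.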

There are universally consistent predictive distributions (Vovk, 2017).

Conformal decision making

Conformal predictive systems can be used for decision making. In particular, universally consistent predictive distributions can be used for making asymptotically efficient decisions.