Mondrian Conformal Predictor

Mondrian conformal predictors (MCPs) are a wide class of confidence predictors generalising transductive conformal predictors (TCPs) and inductive conformal predictors. Their main new property is validity within categories (see the section Validity below), a kind of validity stronger than the usual conservative validity of TCPs.


A Mondrian taxonomy is a measurable function $\kappa: N \times Z \rightarrow K$, where $N$ is the set of positive integers, $Z$ is the set of examples, and $K$ is an at most countable measurable space (with the discrete $\sigma$-algebra) whose elements are called categories, with the following property: the preimage $\kappa^{-1}(k)$ of each category $k \in K$ is a rectangle $A \times B$ for some $A \subseteq N$ and $B \subseteq Z$. In words, a Mondrian taxonomy defines a division of the Cartesian product $N \times Z$ into categories.
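For example (a Python sketch; both taxonomies below are illustrative choices, not prescribed by the text), the trivial one-category taxonomy and a taxonomy grouping examples by their labels can be written as:

```python
# Two hypothetical Mondrian taxonomies. An example z is a pair (x, y);
# a taxonomy maps the index n and an example to a category.

def trivial_taxonomy(n, z):
    """A single category for everything: recovers an ordinary conformal predictor."""
    return 0

def label_conditional_taxonomy(n, z):
    """The category of an example is its label; the index n is ignored."""
    _x, y = z
    return y
```

Any measurable function of $(n, z)$ with an at most countable range will do, as long as its preimages are rectangles.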

A Mondrian nonconformity measure based on a Mondrian taxonomy $\kappa$ is a family of measurable functions $\{A_n: n \in N\}$ of the type

$A_n: K^{n-1} \times (Z ^ {(*)})^K \times K \times Z \rightarrow \bar{\mathbb{R}}$,

where $(Z ^ {(*)})^K$ is the set of all functions mapping $K$ to the set of all bags (multisets) of elements of $Z$.
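As a concrete sketch (a hypothetical Python example; the function name, the dict representation of the bag function, and the particular score are all choices made here), a function playing the role of $A_n$ might score an example by its distance to the mean of its category's bag; the sequence $\kappa_1, \ldots, \kappa_{n-1}$ is ignored for simplicity:

```python
# Hypothetical Mondrian nonconformity measure. The bag function (Z^(*))^K is
# represented as a dict mapping each category to a list of examples (x, y).

def distance_to_category_mean(bags, category, z):
    """Score = |x - mean x of the examples in the category's bag|."""
    x, _y = z
    xs = [xi for xi, _ in bags.get(category, [])]
    if not xs:
        return 0.0
    return abs(x - sum(xs) / len(xs))
```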

The Mondrian conformal predictor determined by a Mondrian nonconformity measure $\{A_n\}$ and a set of significance levels $\epsilon_k$, $k \in K$, is a confidence predictor $\Gamma: Z^*\times X\times (0,1)^K\rightarrow 2^Y$ ($2^Y$ is the set of all subsets of $Y$) such that the prediction set $\Gamma^{(\epsilon_k: k \in K)} (x_1, y_1, \ldots, x_{n-1}, y_{n-1}, x_n)$ is defined as the set of all labels $y\in Y$ such that

$p_n > \epsilon_{\kappa(n, (x_n, y))}$, where $p_n = \frac{|\{i = 1, \ldots, n: \kappa_i = \kappa_n \,\&\, \alpha_i \ge \alpha_n \}|}{|\{i = 1, \ldots, n: \kappa_i = \kappa_n\}|}$,

where $\kappa_i := \kappa(i, z_i), z_i := (x_i, y_i)$ and

$\alpha_i := A_n(\kappa_1, \ldots, \kappa_{n-1}, (k \mapsto \{z_j: j \in \{1, \ldots, i-1, i+1, \ldots, n\} \& \kappa_j = k\}), \kappa_n, z_i)$

for $i = 1, \ldots, n$ such that $\kappa_i = \kappa_n$ ($\{z_j: \ldots \}$ denotes a multiset).
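Putting the definitions together (a self-contained Python sketch; the label-conditional taxonomy and the distance-to-mean nonconformity score are illustrative choices made here, not prescribed by the definition):

```python
def mondrian_prediction_set(train, x_new, labels, epsilons):
    """Mondrian conformal prediction set for x_new.

    train: observed examples (x_1, y_1), ..., (x_{n-1}, y_{n-1});
    labels: the label space Y; epsilons: dict mapping category -> epsilon_k.
    Taxonomy: the category of an example is its label (illustrative choice).
    Score: |x - mean x of the other same-category examples|.
    """
    predicted = set()
    for y in labels:
        examples = train + [(x_new, y)]
        n = len(examples)
        kappa_n = examples[-1][1]          # label-conditional taxonomy
        # indices i with kappa_i = kappa_n (the only ones p_n depends on)
        idx = [i for i in range(n) if examples[i][1] == kappa_n]
        alphas = {}
        for i in idx:
            others = [examples[j][0] for j in idx if j != i]
            mean = sum(others) / len(others) if others else examples[i][0]
            alphas[i] = abs(examples[i][0] - mean)
        p = sum(1 for i in idx if alphas[i] >= alphas[n - 1]) / len(idx)
        if p > epsilons[kappa_n]:
            predicted.add(y)
    return predicted
```

For instance, with `train = [(0.0, 'a'), (0.2, 'a'), (5.0, 'b'), (5.1, 'b')]` and $\epsilon_k = 0.4$ for both categories, the test object `0.1` receives the prediction set `{'a'}`: its p-value in category `'a'` is $1$, while in category `'b'` it is only $1/3$.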

The standard assumption for MCPs is the randomness assumption (also called the i.i.d. assumption).

Important special cases of MCPs include conformal predictors (the Mondrian taxonomy with a single category), inductive conformal predictors, and label-conditional conformal predictors (the category of an example is its label).
Validity

All the statements in this section hold under the randomness assumption.

Smoothed MCPs are defined analogously to smoothed conformal predictors:

$ \Gamma^{(\epsilon_k: k \in K)}(x_1, y_1, \ldots, x_{n-1}, y_{n-1}, x_n)$

is set to the set of all labels $y \in  Y$ such that

$p_n > \epsilon_{\kappa(n, (x_n, y))}$, where $p_n = \frac{|\{i = 1, \ldots, n: \kappa_i = \kappa_n \,\&\, \alpha_i > \alpha_n \}| + \eta_n\,|\{i = 1, \ldots, n: \kappa_i = \kappa_n \,\&\, \alpha_i = \alpha_n \}|}{|\{i = 1, \ldots, n: \kappa_i = \kappa_n\}|}$,

the nonconformity scores $\alpha_i$ are defined as before and $\eta_n\in[0,1]$ is a random number distributed uniformly on $[0,1]$ (independently of everything else).
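Concretely (a hypothetical Python sketch; the function name and the list-based representation of scores and categories are choices made here), the smoothed p-value can be computed as:

```python
import random

def smoothed_mondrian_p_value(alphas, kappas, eta=None):
    """Smoothed Mondrian p-value of the last example.

    alphas: nonconformity scores alpha_1, ..., alpha_n;
    kappas: categories kappa_1, ..., kappa_n (last entry: test example);
    eta: the tie-breaking random number eta_n (drawn uniformly if None).
    """
    if eta is None:
        eta = random.random()
    alpha_n, kappa_n = alphas[-1], kappas[-1]
    same = [a for a, k in zip(alphas, kappas) if k == kappa_n]
    greater = sum(1 for a in same if a > alpha_n)
    equal = sum(1 for a in same if a == alpha_n)
    return (greater + eta * equal) / len(same)
```

With `eta` fixed at 1 this reduces to the non-smoothed p-value above.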

Theorem. Any smoothed MCP based on a Mondrian taxonomy $\kappa$ is category-wise exact with respect to $\kappa$: for all $n$, the conditional probability distribution of $p_n$ given $\kappa(1, z_1), p_1, \ldots, \kappa(n-1, z_{n-1}), p_{n-1}, \kappa(n, z_n)$ is uniform on $[0, 1]$, where $z_1, z_2, \ldots$ are examples generated from an exchangeable distribution on $Z^{\infty}$ and $p_1, p_2, \ldots$ are the p-values output by the smoothed MCP.

This implies that, in the long run, the frequency of erroneous predictions made by a smoothed MCP on examples of category $k$ equals $\epsilon_k$ for each $k$. Consequently, the frequency of errors made by (non-smoothed) MCPs on examples of category $k$ does not exceed $\epsilon_k$ for each $k$.
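The category-wise exactness can be checked empirically. The following sketch (hypothetical choices throughout: a label-conditional taxonomy, a distance-to-mean nonconformity score, Gaussian data) estimates how often the smoothed p-value of a fresh example falls at or below $\epsilon = 0.1$; by the theorem this frequency should be close to $0.1$:

```python
import random

def smoothed_p(examples):
    """Smoothed Mondrian p-value of the last example, with the category of an
    example taken to be its label and score = |x - mean x of the other
    same-category examples| (choices made for this illustration)."""
    n = len(examples)
    cat = examples[-1][1]
    idx = [i for i in range(n) if examples[i][1] == cat]

    def score(i):
        others = [examples[j][0] for j in idx if j != i]
        if not others:
            return 0.0
        return abs(examples[i][0] - sum(others) / len(others))

    alphas = {i: score(i) for i in idx}
    a_n = alphas[n - 1]
    eta = random.random()
    greater = sum(1 for i in idx if alphas[i] > a_n)
    equal = sum(1 for i in idx if alphas[i] == a_n)
    return (greater + eta * equal) / len(idx)

random.seed(0)
eps, trials, errors = 0.1, 4000, 0
for _ in range(trials):
    data = [(random.gauss(0.0, 1.0), random.randint(0, 1)) for _ in range(12)]
    if smoothed_p(data) <= eps:
        errors += 1
# errors / trials should be empirically close to eps = 0.1
```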