# Mondrian Conformal Predictor

Mondrian conformal predictors (MCPs) form a wide class of confidence predictors generalising conformal predictors and inductive conformal predictors with a new property of validity, validity within categories; for large data sets this kind of validity is stronger than the usual conservative validity of conformal predictors.

## Definition

A Mondrian taxonomy is a measurable function $\kappa: \mathbb{N} \times \mathbf{Z} \to K$, where $\mathbf{Z}$ is a set of observations and $K$ is a countable set of elements called categories, that satisfies some natural properties given in Vovk et al. (2005), Section 4.5. The definition involves partitions into rectangles, reminiscent of Piet Mondrian's paintings (see also Scott Locklin's blog). However, in this article we will use a simpler definition, given in Balasubramanian et al. (2014), Section 2.2.

A simple Mondrian taxonomy is a function $K$ that maps any finite sequence of observations $(z_1,\dots,z_n)$ to a same-length sequence $(\kappa_1,\dots,\kappa_n)$ of categories and that is equivariant with respect to permutations: for any permutation $\pi$ of $\{1,\dots,n\}$, $K(z_{\pi(1)},\dots,z_{\pi(n)}) = (\kappa_{\pi(1)},\dots,\kappa_{\pi(n)})$. We refer to $\kappa_i$ as the category of $z_i$. A $K$-conditional conformity measure is a measurable function $A$ that maps every data sequence $(z_1,\dots,z_n)$ to a same-length sequence $(\alpha_1,\dots,\alpha_n)$ of conformity scores and that is equivariant with respect to permutations leaving the categories intact.

The Mondrian conformal predictor determined by a $K$-conditional conformity measure $A$ and a set of significance levels $\epsilon_k$, $k \in K$, is defined by

$$\Gamma(z_1,\dots,z_{n-1},x_n) := \{y : p^y > \epsilon_{\kappa_n}\},$$

where

$$p^y := \frac{|\{i=1,\dots,n : \kappa_i = \kappa_n \text{ and } \alpha_i \le \alpha_n\}|}{|\{i=1,\dots,n : \kappa_i = \kappa_n\}|},$$

$(\kappa_1,\dots,\kappa_n) := K(z_1,\dots,z_{n-1},(x_n,y))$ and $(\alpha_1,\dots,\alpha_n) := A(z_1,\dots,z_{n-1},(x_n,y))$. (Formally, as defined here, an MCP is a confidence predictor only when all $\epsilon_k$ coincide.)
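As a minimal sketch of this computation (the function names, the pairwise representation of observations, and the toy conformity measure in the usage below are illustrative assumptions, not taken from the sources), the within-category p-value and the prediction set can be computed as follows:

```python
# Illustrative sketch of a Mondrian conformal predictor.
# An observation is a pair z = (x, y); scores[i] is the conformity score
# alpha_i and categories[i] the category kappa_i; the test observation,
# with a postulated label, sits at index -1.

def mondrian_p_value(scores, categories):
    """p-value of the last observation, computed within its category."""
    k_n, a_n = categories[-1], scores[-1]
    same = [a for a, k in zip(scores, categories) if k == k_n]
    return sum(a <= a_n for a in same) / len(same)

def prediction_set(train, x_new, labels, conformity, taxonomy, eps):
    """Set of labels y whose p-value exceeds the significance level
    eps[k] of the test observation's category k."""
    gamma = set()
    for y in labels:
        seq = train + [(x_new, y)]
        scores = conformity(seq)
        cats = [taxonomy(z) for z in seq]
        if mondrian_p_value(scores, cats) > eps[cats[-1]]:
            gamma.add(y)
    return gamma
```

For instance, with the label-conditional taxonomy `lambda z: z[1]` and a conformity measure scoring each observation by (minus) its distance to a hypothetical label prototype, `prediction_set` returns exactly the labels that conform well within their own category.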

The standard assumption for MCPs is the randomness assumption (also called the IID assumption).

A simple example is given by label-conditional conformal predictors, which require the label space $\mathbf{Y}$ to be finite. In this case the set of categories $K$ coincides with the set of labels, and the category of an observation $z_i = (x_i, y_i)$ is simply its label, $\kappa_i = y_i$.

Another example arises when we want to make the analysis conditional on a type of object. For example, if the objects are people, we might want to define $\kappa_i = \mathrm{M}$ if the person $x_i$ is male and $\kappa_i = \mathrm{F}$ if $x_i$ is female.
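Both taxonomies can be written as one-line functions (a sketch under assumptions: observations are pairs `(x, y)`, and the person record with its `"sex"` field is hypothetical):

```python
# An observation is a pair z = (x, y); x is an object, y a label.

# Label-conditional taxonomy: the category is simply the label.
def label_taxonomy(z):
    return z[1]

# Object-conditional taxonomy: condition on an attribute of the object,
# here a hypothetical "sex" field of a person record.
def sex_taxonomy(z):
    return "M" if z[0]["sex"] == "male" else "F"
```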

## Validity

All the statements in this section are given under the randomness assumption.

Smoothed MCPs are defined analogously to smoothed conformal predictors: the definition of $p^y$ is now modified to

$$p^y := \frac{|\{i : \kappa_i = \kappa_n \text{ and } \alpha_i < \alpha_n\}| + \tau\,|\{i : \kappa_i = \kappa_n \text{ and } \alpha_i = \alpha_n\}|}{|\{i : \kappa_i = \kappa_n\}|},$$

and the rest of the definition is the same. Here $\tau \in [0,1]$ are random numbers (chosen independently at different trials and independently of the observations).
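The smoothed p-value differs from the unsmoothed one only in how ties at the test score are counted. A minimal sketch (function and parameter names are illustrative):

```python
import random

def smoothed_mondrian_p_value(scores, categories, tau=None):
    """Smoothed within-category p-value of the last observation:
    ties at the test score are broken by tau, uniform on [0, 1]."""
    if tau is None:
        tau = random.random()
    k_n, a_n = categories[-1], scores[-1]
    same = [a for a, k in zip(scores, categories) if k == k_n]
    smaller = sum(a < a_n for a in same)
    ties = sum(a == a_n for a in same)  # includes the test score itself
    return (smaller + tau * ties) / len(same)
```

With $\tau = 1$ this reduces to the unsmoothed p-value; smaller $\tau$ shrinks the contribution of the tied scores.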

In the next theorem we assume that $\kappa_n$ depends only on $z_n$ (without this assumption, we can still guarantee the right coverage probability, but the independence of errors may be lost).

Theorem. Any smoothed MCP is category-wise exact with respect to $K$; that is, for all $n$, the conditional probability distribution of $p_n$ given $p_1,\dots,p_{n-1}$ is uniform on $[0,1]$, where the observations $z_1, z_2, \dots$ are generated from an exchangeable probability distribution on $\mathbf{Z}^\infty$ and $p_1, p_2, \dots$ are the p-values for the true labels.

This implies that in the long run the frequency of erroneous predictions given by smoothed MCPs on examples of category $k$ is equal to $\epsilon_k$ for each $k \in K$. Therefore, the frequency of errors made by (non-smoothed) MCPs on examples of category $k$ does not exceed $\epsilon_k$ for each $k \in K$.
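The category-wise error frequency can be checked empirically with a small seeded simulation (everything here is an illustrative assumption: the two categories, the Gaussian data, and the distance-to-centre conformity measure; the uniformity of the smoothed p-value then makes each category err with frequency close to $\epsilon = 0.1$):

```python
import random

random.seed(0)

def smoothed_p(scores, tau):
    # Smoothed p-value of the last score within an exchangeable sequence.
    a_n = scores[-1]
    smaller = sum(a < a_n for a in scores)
    ties = sum(a == a_n for a in scores)  # includes the last score itself
    return (smaller + tau * ties) / len(scores)

centres = {"A": 0.0, "B": 5.0}  # hypothetical per-category centres
eps = 0.1
errors = {"A": 0, "B": 0}
counts = {"A": 0, "B": 0}
for _ in range(2000):
    k = random.choice(["A", "B"])
    counts[k] += 1
    # 20 IID observations of category k; the last one plays the test role.
    xs = [random.gauss(centres[k], 1.0) for _ in range(20)]
    scores = [-abs(x - centres[k]) for x in xs]
    if smoothed_p(scores, random.random()) <= eps:
        errors[k] += 1  # true label excluded at level eps: an error
rates = {k: errors[k] / counts[k] for k in centres}
```

Within sampling noise, `rates["A"]` and `rates["B"]` should both be close to 0.1, illustrating validity within each category separately.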

An important feature of MCPs is that the theorem only requires the observations in the same category to be exchangeable: it remains true if the equality in the definition of exchangeability is only required to hold for permutations $\pi$ leaving the categories intact. A nice example (due to Daniil Ryabko) showing the significance of this relaxation is recognizing handwritten text (assuming the text has been divided into separate symbols): the label-conditional conformal predictor will be valid provided that different instances of the same symbol are exchangeable; we do not need to require that the sequence of symbols in the text be exchangeable (this assumption would be patently wrong).

## Applications

## Bibliography

• Vineeth N. Balasubramanian, Shen-Shyang Ho, and Vladimir Vovk, editors (2014). Conformal Prediction for Reliable Machine Learning: Theory, Adaptations, and Applications. Morgan Kaufmann, Chennai.
• Vladimir Vovk, Alexander Gammerman, and Glenn Shafer (2005). Algorithmic Learning in a Random World. Springer, New York.