Validity Idea

This article gives an intuitive explanation of why smoothed conformal predictors are valid in the on-line prediction protocol: the random variables $\text{err}_1,\text{err}_2,\ldots$ are independent Bernoulli with parameter $\epsilon$ (see the article about conformal prediction for the definitions). Our heuristic explanation will ignore the difference between conformal predictors and smoothed conformal predictors.

Let us show first that the probability that $\text{err}_n=1$ is approximately $\epsilon$. We will even show that the conditional probability, given the bag of the first $n$ observations, that $\text{err}_n=1$ is approximately $\epsilon$. The bag of observations determines the bag of the alphas; because of the randomness assumption all orderings of the alphas have the same probability ($1/n!$). An error is made ($\text{err}_n=1$) if and only if the p-value $p_{y_n}$ corresponding to the true label is ${}\le\epsilon$ if and only if $\alpha_n$ is among the $\epsilon n$ largest $\alpha_i$. And the last event has probability $\epsilon$.

Let us see that $\text{err}_1,\text{err}_2,\ldots$ are independent. It suffices to show that, for each $n$, $\text{err}_1,\ldots,\text{err}_n$ are independent. Fix $n$. It suffices to show that $err_n,\ldots,err_1$ are independent. We will show that they are independent given the bag of the first $n$ observations. We already know that the probability of $\text{err}_n=1$, given the bag, is $\epsilon$. Draw $(x_n,y_n)$ from the bag. The same argument shows that the probability of $\text{err}_{n-1}=1$, given the bag and $(x_n,y_n)$, is $\epsilon$. Therefore, the probability of $\text{err}_{n-1}=1$, given the bag and $\text{err}_n$, is $\epsilon$. In the same way we obtain that $\epsilon$ is the probability of $\text{err}_{n-2}=1$, given the bag and $\text{err}_n,\text{err}_{n-1}$.

For a formal proof see, e.g., Vovk et al. (2005), Theorem 8.1. For a less general but easier to understand proof, see Vovk (2002), Theorem 1.