# Venn Predictor

Venn Predictor (or Venn Machine) is a multiprobability classification system. The term multiprobability means that we announce several probability distributions for the new label rather than a single one.

Venn Predictor divides the old examples into categories, assigns the current example to one of the categories, and then uses the frequencies of labels in the chosen category as probabilities for the current object's label.

We observe a sequence of examples $z_1, z_2, \ldots$, where each example is a pair (object, label), $z_i = (x_i, y_i)$. Objects are drawn from a measurable space called the object space, $\mathbf{X}$, and labels are drawn from a measurable space called the label space, $\mathbf{Y}$. Their Cartesian product, $\mathbf{Z} := \mathbf{X} \times \mathbf{Y}$, is called the example space.

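Venn Predictor deals with the case of classification, i.e., the case when the label space $\mathbf{Y}$ is finite. Let us consider a general multiprobability prediction protocol. At each step $n = 1, 2, \ldots$ of the protocol Reality announces an object $x_n \in \mathbf{X}$, Predictor announces a set $P_n \subseteq \mathcal{P}(\mathbf{Y})$ (where $\mathcal{P}(\mathbf{Y})$ is the set of all probability distributions on $\mathbf{Y}$), and then Reality announces the label $y_n \in \mathbf{Y}$.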
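The protocol above can be sketched as a simple loop. This is only an illustration under stated assumptions: `run_protocol` and `toy_predictor` are hypothetical names introduced here, the set of distributions is represented as a list of dictionaries mapping labels to probabilities, and the toy predictor is an arbitrary placeholder that ignores the object and, for each candidate label, reports the label frequencies among the past labels with that candidate appended.

```python
from collections import Counter
from fractions import Fraction


def toy_predictor(history, x_new, labels):
    """Hypothetical placeholder predictor (not part of the source text).

    For each candidate label y it returns the label frequencies among the
    past labels with y appended. The object x_new is ignored.
    """
    past = [y for _, y in history]
    announced = []
    for y in labels:
        counts = Counter(past + [y])
        p = {lab: Fraction(counts[lab], len(past) + 1) for lab in labels}
        if p not in announced:  # keep only distinct distributions
            announced.append(p)
    return announced


def run_protocol(predictor, data, labels):
    """Run the multiprobability prediction protocol over a finite data sequence."""
    history, log = [], []
    for x_n, y_n in data:
        # Reality announces the object x_n; Predictor sees only past examples.
        P_n = predictor(history, x_n, labels)
        # Reality announces the label y_n; record it next to the prediction.
        log.append((P_n, y_n))
        history.append((x_n, y_n))
    return log


if __name__ == "__main__":
    data = [(0, "a"), (1, "b"), (2, "a")]  # made-up toy sequence
    for P_n, y_n in run_protocol(toy_predictor, data, ["a", "b"]):
        print(y_n, [{k: str(v) for k, v in p.items()} for p in P_n])
```

The point of the sketch is the shape of the interaction: the prediction at each step is a set of distributions rather than a single one, and it is announced before the true label is revealed.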

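To define Venn Predictor formally, we need to introduce the concept of taxonomy. This is a sequence $A_n$, $n = 1, 2, \ldots$, where each $A_n$ is a measurable finite partition of the set $\mathbf{Z}^{(n-1)} \times \mathbf{Z}$; here $\mathbf{Z}$ is the example space and $\mathbf{Z}^{(n-1)}$ is the set of all bags of $n-1$ examples. We write $A_n(\omega)$ for the element of the partition $A_n$ that contains $\omega \in \mathbf{Z}^{(n-1)} \times \mathbf{Z}$. Venn Predictor can be defined for every taxonomy $A = (A_n)$.

Suppose we have observed examples $z_1, \ldots, z_{n-1}$ and a new object, $x_n$. We denote $z_i = (x_i, y_i)$. Let us consider the case when the current object has label $y \in \mathbf{Y}$ and let us write (for now) $z_n = (x_n, y)$. At each step of the protocol Venn Predictor divides the examples $z_1, \ldots, z_n$ into categories, assigning $z_i$ and $z_j$ to the same category if and only if

$$A_n(\lbag z_1, \ldots, z_{i-1}, z_{i+1}, \ldots, z_n \rbag, z_i) = A_n(\lbag z_1, \ldots, z_{j-1}, z_{j+1}, \ldots, z_n \rbag, z_j),$$

where $\lbag \ldots \rbag$ denotes a multiset (a bag), i.e., a collection of elements in which each element has a multiplicity, a natural number indicating how many times it occurs in the multiset.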

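The category $T$ containing $z_n = (x_n, y)$ is nonempty (it contains at least this one element). Let $p_y$ be the empirical probability distribution of the labels in this category:

$$p_y(\{y'\}) := \frac{\lvert \{ (x^*, y^*) \in T : y^* = y' \} \rvert}{\lvert T \rvert}, \qquad y' \in \mathbf{Y};$$

this is a probability distribution on $\mathbf{Y}$. Venn Predictor determined by taxonomy $A$ is the multiprobability predictor $P_n := \{ p_y : y \in \mathbf{Y} \}$. The set $P_n$ consists of between one and $\lvert \mathbf{Y} \rvert$ distinct probability distributions on $\mathbf{Y}$.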
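The construction can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: `venn_predict` and `nearest_neighbour_taxonomy` are hypothetical names, objects are taken to be real numbers, a bag is represented as a list whose order the taxonomy must not use, and the taxonomy places an example in the category named by the label of its nearest neighbour among the other examples. Any other function of (bag, example) with finitely many values could be substituted.

```python
from collections import Counter
from fractions import Fraction


def nearest_neighbour_taxonomy(bag, example):
    """Hypothetical taxonomy: category = label of the nearest other example.

    `bag` is a list standing in for a multiset, so the result must not depend
    on its order; ties in distance are therefore broken by label.
    """
    if not bag:
        return None  # with no other examples there is a single category
    x, _ = example
    _, label = min((abs(x - x_other), y_other) for x_other, y_other in bag)
    return label


def venn_predict(taxonomy, examples, x_new, labels):
    """Return {y: p_y}, one empirical distribution per hypothetical label y."""
    predictions = {}
    for y in labels:
        z = list(examples) + [(x_new, y)]  # z_1..z_n with z_n = (x_n, y)
        # Category of each z_i, computed from the bag of the other examples.
        cats = [taxonomy(z[:i] + z[i + 1:], z[i]) for i in range(len(z))]
        # Labels of the examples sharing a category with z_n (never empty).
        same = [z[i][1] for i in range(len(z)) if cats[i] == cats[-1]]
        counts = Counter(same)
        predictions[y] = {lab: Fraction(counts[lab], len(same)) for lab in labels}
    return predictions


if __name__ == "__main__":
    old = [(0, "a"), (1, "a"), (10, "b"), (11, "b")]  # made-up toy data
    for y, p in venn_predict(nearest_neighbour_taxonomy, old, 3, ["a", "b"]).items():
        print(y, {lab: str(prob) for lab, prob in p.items()})
```

On this toy data the new object 3 lands in the category "a" under either hypothetical label, so the two announced distributions give label "a" probability 1 and 2/3 respectively. The gap between them is what the multiprobability prediction conveys about the uncertainty.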

There are many Venn Predictors, one for each taxonomy. Some of them perform better than others on particular datasets.