# MCMC

### Introduction

Markov Chain Monte-Carlo (MCMC) is a method of sampling from a distribution, when it is difficult to sample from the distribution directly. It can be used with Monte-Carlo Integration for numerical calculation of integrals.

It explores the state space `⚠ $X$`

using a Markov Chain mechanism, and produces samples `⚠ $x_i$`

which mimic samples drawn from the distribution `⚠ $p(x)$`

directly. We assume that we can evaluate `⚠ $p(x)$`

up to a normalizing constant, but it is difficult to draw samples from it.

Let us consider the situation when the number of possible states in a Markov chain is `⚠ $n$`

. Then we can choose an initial distribution `⚠ $q(x) \in \mathbb{R}^n$`

and have a transition matrix `⚠ $T \in \mathbb{R}^{n \times n}$`

. Following the transition matrix at the first step, we get the distribution `⚠ $qT$`

. If we repeat the process several times and it converges to a distribution `⚠ $p(x)$`

, we say it converged to an invariant distribution. The process converges if the matrix `⚠ $T$`

has properties of irreducibility and aperiodicity.

**Definition 1.** A transition matrix `⚠ $T$`

is *irreducible* if for any state of the Markov chain there is a positive probability of visiting all other states.

**Definition 2.** A transition matrix `⚠ $T$`

is *aperiodical* if the Markov chain does not get trapped into cycles.

A sufficient condition for the transition matrix to satisfy these requirenments is the *reversibility* condition:

`⚠ $\displaystyle{p(x_i) T(x_j|x_i) = p(x_j) T(x_i|x_j).}$`

In this case `⚠ $p(x)$`

here is the invariant distribution. Indeed, summing both sides over `⚠ $x_j$`

gives us

`⚠ $\displaystyle{p(x_i) = \sum _{x_j} p(x_j) T(x_i|x_j).}$`

In continuous state spaces the transition matrix `⚠ $T$`

becomes an integral kernel `⚠ $K$`

and `⚠ $p(x)$`

becomes the corresponding eigenfunction.

MCMC samples are organized in a way that the desired `⚠ $p(x)$`

is the invariant distribution:

### The Metropolis-Hastings algorithm

Let us have a proposal distribution `⚠ $q(x^* | x)$`

and invariant distribution `⚠ $p(x)$`

. An MH step involves sampling from the proposal distribution a candidate value `⚠ $x^*$`

. The Markov chain then moves towards `⚠ $x^*$`

with acceptance probability `⚠ $A(x,x^*) = \min\{1,\frac{p(x^*)q(x|x^*)}{p(x)q(x^*|x)}\}$`

, otherwise it remains at `⚠ $x$`

.

`⚠ $x_0$`

;
`⚠ $i=0,...,N-1$`

}
`⚠ $u \sim U_{[0,1]}$`

from the uniform distribution.
`⚠ $x^* \sim q(x^* |x_i)$`

from the proposal distribution.
`⚠ $u < A(x_i,x^*)$`

then `⚠ $x_{i+1} = x^*$`

else `⚠ $x_{i+1}=x_i$`

.
If `⚠ $q(x^* | x) = q(x | x^*)$`

the MH algorithm transforms to the Metropolis algorithm.

### Integration using MCMC

To calculate the integral `⚠ $\int f(x) p(x)dx$`

it is possible to sample from the uniform distribution and calculate the integral using Monte-Carlo integration. But MCMC method of sampling from `⚠ $p(x)$`

may lead to much better convergence. First one needs to perform *burn-in* state, when a Markov chain finds a way to sample from a distribution close to `⚠ $p(x)$`

using MH algorithm. Then search for the distribution continues, but the sum `⚠ $f(x_i)$`

is calculated. The resulting approximation of the integral is `⚠ $1/N_2 \sum_{j=N_1}^{N_1+N_2} f(x_j)$`

, where `⚠ $N_1$`

is the length of the burn-in stage, and `⚠ $N_2$`

is the length of the sampling stage.

MCMC is also used to solve optimization problems like `⚠ $\hat x = \arg\max_{x \in X} p(x)$`

by applying a so-called simulated annealing technique.

### Bibliography

- Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N., Teller, A.H., Teller, E. Equations of State Calculations by Fast Computing Machines.
*Journal of Chemical Physics*,**21**(6): 1087–1092 (1953). - Hastings, W.K. Monte Carlo sampling methods using Markov chains and their applications.
*Biometrika*,**57**(1):97–109 (1970). - Christophe Andrieu et al. An Introduction to MCMC for Machine Learning,
*Machine Learning*,**50**: 5–43 (2003).