Originally Conjugate Gradient method was developed for solving certain systems of linear equations

where is a known vector, is unknown and is a square, symmetric, positive-definite matrix. Solving this equation is equivalent to finding minimum of a corresponding square function

This method is very similar to Gradient Descent in the sense that from the initial point we choose a vector which will take us to the next point, where the objective function will decrease. The above is repeated until we reach function minimum, i.e. gradient becomes zero.

Unlike Gradient Descent the direction is not set to , is chosen such that the next direction is conjugate to with respect to Hessian matrix.

A few approaches exist for calculating . The most popular one are ones by Polak-Ribiere(1) and Fletcher-Reeves(2)

So the algorithm looks like this:

1. Initialize to a random value
2. Initialize direction
3. Perform a line search:
4. Move to the next point:
5. Calculate gradient in next point:
6. Calculate new direction:
7. If is not small enough go to 3

Line search on step 3 can be replaced with an explicit solution, in cases where we can calculate Hessian matrix of our objective function. In this case we will get