Sylvester Identity

The Sylvester identity states that for any two matrices $A$ and $B$ such that the both products $AB$ and $BA$ exist (i.e., $A$ is of the same size as $B'$ and therefore the both matrices $AB$ and $BA$ are square), we have

\begin{equation*} \det(I+AB)=\det(I+BA)\enspace. \end{equation*}
In this equation $I$ denotes identity matrices (also called unit matrices) possibly of different sizes; in this article we will use the notation $I$ without a reference to size.

The reader may refer to [Henderson and Searle] for the history of the identity (see eq. (6)).

Proof by Partitioned Matrices

Perhaps the shortest proof of the identity is as follows. The matrix identity

$$\begin{pmatrix} I & B\\ 0 & I \end{pmatrix} \begin{pmatrix} I+BC  & 0\\ -C    & I \end{pmatrix} = \begin{pmatrix} I  & B\\ -C & I \end{pmatrix}= \begin{pmatrix} I     & 0    \\ -C    & I+CB \end{pmatrix} \begin{pmatrix} I & B\\ 0 & I \end{pmatrix} $$
can be checked by direct calculation dealing with block matrices. It remains to take the determinants of the right- and left-hand sides.

Proof by Eigenvalues

Although the proof based on partitioned matrices is concise and easy to check, the intuition behind it is not so easy to grasp. In what remains we present a lengthier but arguably more intuitive proof done in the operator theory fashion promoted by [Axler].

We derive the identity from the following statement of independent interest.

Theorem 1. Matrices $AB$ and $BA$ have the same non-zero eigenvalues with the same multiplicities.

The "non-zero" clause cannot be dropped. Indeed, let $AB$ and $BA$ be of different sizes (this happens when $A$ and $B$ are not square). One of the matrices clearly has more eigenvalues than the other. We will show that the "extra" eigenvalues will be zeroes.

It is easy to see that non-zero eigenvalues of $AB$ and $BA$ are the same. Indeed, by definition if $\lambda\ne 0$ is an eigenvalue of $AB$, then

\begin{equation*} (AB)v=\lambda v \end{equation*}
for some non-zero vector $v$. Multiplying this by $B$ on the left we get
\begin{equation*} (BA)(Bv)=\lambda (Bv)\enspace, \end{equation*}
where $Bv\ne 0$ (or $ABv=0$ and the first equation implies that $v=0$).

It is a bit more difficult to show that the multiplicities are the same. We will rely on the "geometric" definition of multiplicity from [Axler, Chapter 8 (p. 171)]. A vector $v$ is a generalised eigenvector corresponding to an eigenvalue $\lambda$ of a square matrix (or an operator) $C$ if $(C-\lambda I)^dv=0$ for some positive integer $d$. It is easy to see that generalised eigenvectors for the same $\lambda$ form a subspace; the dimension of this subspace is the multiplicity of the eigenvalue $\lambda$.

One can show that it is sufficient to consider powers $d\le n$, where $n$ is the dimension of the space where $C$ operates (see  [Axler, Corollary 8.7, p. 166]. Therefore the space of generalised eigenvectors can be defined as the null space of the operator $(C-\lambda I)^n$; however, we will not need this definition.

The reader who is more familiar with the "algebraic" definition of the multiplicity as that of a root of the characteristic polynomial may want to recall the Jordan form of a matrix in order to see that the definitions are equivalent.

Proof: The idea of the proof is suggested by the argument above. Let $v_1,v_2,\ldots,v_k$ be linearly independent generalised eigenvectors of $AB$ corresponding to the same eigenvalue $\lambda\ne 0$. We will show that $Bv_1,Bv_2,\ldots,Bv_k$ are generalised eigenvectors of $BA$ and are linearly independent.

Let us start with the eigenvector property. For every $i=1,2,\ldots,k$ there is a positive integer $d$ such that $(AB-\lambda I)^dv_i=0$. Let us open up the brackets. The binomial identity implies that

\begin{equation*} (AB-\lambda I)^dv_i = \sum_{j=0}^d\binom{d}{j}(-1)^{d-j}\lambda^{d-j}(AB)^jv_i =0 \enspace. \end{equation*}
Multiplying this equation by $B$ on the left and observing that $B(AB)^j=(BA)^jB$ yields
\begin{equation*} 0 = \sum_{j=0}^d\binom{d}{j}(-1)^{d-j}\lambda^{d-j}(AB)^jBv_i = (BA-\lambda I)^dBv_i \enspace. \end{equation*}

Now suppose that $Bv_1,Bv_2,\ldots,Bv_k$ are linearly dependent and there are coefficients $a_1,a_2,\ldots,a_k$, not all zero, such that $a_1Bv_1+a_2Bv_2+\ldots+a_kBv_k=0$. The vector $u=a_1v_1+a_2v_2+\ldots+a_kv_k$ is not zero because $v_1,v_2,\ldots,v_k$ were assumed to be independent, but $Bu=0$.

Multiplying the last equation by $A$ on the left yields $(AB)u=0$ and subtracting $\lambda u$ yields $(AB-\lambda I)u=-\lambda u$. Thus $u\ne 0$ is an eigenvector of $(AB-\lambda I)$ with a non-zero eigenvalue $-\lambda$. On the other hand, $u$ is a generalised eigenvector of $AB$ and applying $(AB-\lambda I)$ to it repeatedly should turn it to zero (after finitely many iterations). This is a contradiction. $\Box$

We can now prove the Sylvester identity. We will rely on the representation (taken by [Axler] to be the definition) of the determinant as the product of (complex) eigenvalues counting multiplicities. The eigenvalues of $I+AB$ equal the eigenvalues of $AB$ plus 1. All non-zero eigenvalues of $AB$ equal the non-zero eigenvalues of $BA$, while zero eigenvalues are unimportant as they contribute 1s as eigenvalues of $I+AB$ and $I+BA$. The products of eigenvalues are therefore equal.

References

[Henderson and Searle] H.V.Henderson and S.R.Searle. On deriving the inverse of a sum of matrices. SIAM Review, 23(1), 1981.

[Axler] S.Axler. Linear Algebra Done Right. Springer, 2nd edition, 1997.