Metamath Proof Explorer

Table of Contents - 11.7.3. The Cayley-Hamilton theorem

In this section, a direct algebraic proof for the Cayley-Hamilton theorem is provided, according to Wikipedia ("Cayley-Hamilton theorem", 09-Nov-2019, https://en.wikipedia.org/wiki/Cayley%E2%80%93Hamilton_theorem, section "A direct algebraic proof" (this approach is also used for proving Lemma 1.9 in [Hefferon] p. 427):

"This proof uses just the kind of objects needed to formulate the Cayley-Hamilton theorem: matrices with polynomials as entries. The matrix (t * I_n - A) whose determinant is the characteristic polynomial of A is such a matrix, and since polynomials [over a commutative ring] form a commutative ring, it has an adjugate

(1) B = adj(t * I_n - A) .

Then, according to the right-hand fundamental relation of the adjugate, one has

(2) ( t * I_n - A ) x B = det(t * I_n - A) x I_n = p(t) * I_n .

Since B is also a matrix with polynomials in t as entries, one can, for each i, collect the coefficients of t^i in each entry to form a matrix B_i of numbers, such that one has

(3) B = sum_{i = 0 to (n-1)} t^i B_i .

(The way the entries of B are defined makes clear that no powers higher than t^(n-1) occur). While this looks like a polynomial with matrices as coefficients, we shall not consider such a notion; it is just a way to write a matrix with polynomial entries as a linear combination of n constant matrices, and the coefficient t^i has been written to the left of the matrix to stress this point of view.

Now, one can expand the matrix product in our equation by bilinearity

(4) p(t) * I_n = ( t * I_n - A ) x B
= ( t * I_n - A ) x sum_{i = 0 to (n-1)} t^i * B_i = sum_{i = 0 to (n-1)} t * I_n x t^i B_i - sum_{i = 0 to (n-1)} A * t^i B_i = sum_{i = 0 to (n-1)} t^(i+1) * B_i - sum_{i = 0 to (n-1)} t^i * A x B_i = t^n B_n-1 + sum_{i = 1 to (n-1)} t^i * ( B_i-1 - A x B_i ) - A x B₀ .

Writing

(5) p(t) I_n = t^n * I_n + t^(n-1) * c_(n-1) x I_n + ... + t * c₁ I_n + c₀ I_n ,

one obtains an equality of two matrices with polynomial entries, written as linear combinations of constant matrices with powers of t as coefficients. Such an equality can hold only if in any matrix position the entry that is multiplied by a given power t^i is the same on both sides; it follows that the constant matrices with coefficient t^i in both expressions must be equal. Writing these equations then for i from n down to 0, one finds

(6) B_n-1 = I_n , B_i-1 - A x B_i = c_i * I_n for 1 <= i <= n-1 , - A x B₀ = c₀ * I_n .

Finally, multiply the equation of the coefficients of t^i from the left by A^i, and sum up:

(7) A^n B_n-1 + sum_{i
= 1 to (n-1)} ( A^i x B_i-1 - A^(i+1) x B_i ) - A x B₀ = A^n + c_n-1 * A^(n-1) + ... + c₁ * A + c₀ * I_n .

The left-hand sides form a telescoping sum and cancel completely; the right-hand sides add up to p(A):

(8) 0 = p(A) .

This completes the proof."

To formalize this approach, the steps mentioned in Wikipedia must be elaborated in more detail.

The first step is to formalize the preliminaries and the objective of the theorem. In Wikipedia, the Cayley-Hamilton theorem is stated as follows: "... the Cayley-Hamilton theorem ... states that every square matrix over a commutative ring ... satisfies its own characteristic equation." Or in more detail: "If A is a given n x n matrix and I_n is the n x n identity matrix, then the characteristic polynomial of A is defined as p(t) = det(t * I_n - A), where det is the determinant operation and t is a variable for a scalar element of the base ring. Since the entries of the matrix (t * I_n - A) are (linear or constant) polynomials in t, the determinant is also an n-th order monic polynomial in t. The Cayley-Hamilton theorem states that if one defines an analogous matrix equation, p(A), consisting of the replacement of the scalar eigenvalues t with the matrix A, then this polynomial in the matrix A results in the zero matrix,

p(A) = 0.

The powers of A, obtained by substitution from powers of t, are defined by repeated matrix multiplication; the constant term of p(t) gives a multiple of the power A^0, which is defined as the identity matrix. The theorem allows A^n to be expressed as a linear combination of the lower matrix powers of A. When the ring is a field, the Cayley-Hamilton theorem is equivalent to the statement that the minimal polynomial of a square matrix divides its characteristic polynomial."

Actually, the definition of the characteristic polynomial of a square matrix requires some attention. According to df-chpmat, the characteristic polynomial of an x matrix over a ring is defined as

where is the function mapping an x matrix over the polynomial ring over the ring to its determinant, is the variable of the polynomials over , is the x identity matrix as matrix over the polynomial ring over the ring (not the x identity matrix of the matrices over the ring !) and is the matrix over a ring transformed into a constant matrix over the polynomial ring over the ring . Thus is the multiplication of a polynomial matrix with a "scalar", i.e. a polynomial (see chpmatval).

By this definition, it is ensured that , the matrix whose determinat is the characteristic polynomial of the matrix , is actually a matrix over the polynomial ring over the ring , as stated in Wikipedia ("matrix with polynomials as entries"). This matrix is called the characteristic matrix of matrix (see Wikipedia "Polynomial matrix", 16-Nov-2019, https://en.wikipedia.org/wiki/Polynomial_matrix). Following the notation in Wikipedia, we denote the characteristic polynomial of the matrix , formally defined by as "p(M)" in the comments. The characteristric matrix will be denoted by "c(M)", so that "p(M) = det( c(M) )".

After the preliminaries are clarified, the objective of the Cayley-Hamilton theorem must be considered. As described in Wikipedia, the matrix must be "inserted" into its characteristic polynomial in an appropriate way. Since every polynomial can be represented as infinite, but finitely supported sum of monomials scaled by the corresponding coefficients (see ply1coe), also the characteristic polynomial can be written in this way:

p(M) = sum_i ( p_i * M^i )

Here, * is the scalar multiplication in the algebra of the polynomials over the ring , and the coefficients are elements of the ring .

By this, we can "define" the insertion of the matrix M into its characteristic polynomial by "p(M) = sum( p_i * M^i)", see also cayleyhamilton1. Here, * is the scalar multiplication in the algebra of the matrices over the ring .

To prove the Cayley-Hamilton theorem, we have to show that "p(M) = 0", where 0 is the zero matrix.

In this section, the following class variables and informal identifiers (acronyms in the form "A(B)" or "A_B") are used:

<table> <tr> <th>class variable/ informal identifier</th> <th>definiens</th> <th>explanation</th> </tr> <tr> <td> </td> <td></td> <td>An arbitrary finite set, used as dimension for matrices</td> </tr> <tr> <td> </td> <td></td> <td>An arbitrary (commutative) ring: </td> </tr> <tr> <td> B(R) </td> <td> </td> <td>Base set of (the ring) </td> </tr> <tr> <td> </td> <td> </td> <td>Algebra of x matrices over (the ring) </td> </tr> <tr> <td> </td> <td> </td> <td>Base set of the algebra of x matrices, i .e. the set of all x matrices</td> </tr> <tr> <td> </td> <td></td> <td>An arbitrary x matrix</td> </tr> <tr> <td> </td> <td> </td> <td>The algebra of polynomials over (the ring) </td> </tr> <tr> <td> B(P) </td> <td> </td> <td>Base set of the algebra of polynomials, i .e. the set of all polynomials</td> </tr> <tr> <td> , XR </td> <td> </td> <td>The variable of polynomials over (the ring) </td> </tr> <tr> <td> , XA </td> <td> </td> <td>The variable of polynomials over matrices of the algebra </td> </tr> <tr> <td> </td> <td> </td> <td>The group exponentiation for polynomials over (the ring) </td> </tr> <tr> <td> ^ </td> <td> </td> <td>Arbitrary group exponentiation</td> </tr> <tr> <td> </td> <td> </td> <td>The injection of scalars, i.e. elements of (the ring) into the base set of the algebra of polynomials over </td> </tr> <tr> <td> , S(p) </td> <td> </td> <td>The element of (the ring) represented as polynomial over </td> </tr> <tr> <td> </td> <td> </td> <td>Algebra of x matrices over (the polynomial ring) over the ring </td> </tr> <tr> <td> B(Y) </td> <td> </td> <td>Base set of the algebra of polynomial x matrices, i .e. the set of all polynomial x matrices</td> </tr> <tr> <td> </td> <td> </td> <td>Algebra of polynomials over the ring of x matrices over the ring </td> </tr> <tr> <td> B(Q) </td> <td> </td> <td>Base set of the algebra of polynomials over the ring of x matrices over the ring , i .e. the set of all polynomials having x matrices as coefficients</td> </tr> <tr> <td> , +</td> <td> </td> <td>The addition of polynomial matrices</td> </tr> <tr> <td> , -</td> <td> </td> <td>The subtraction of polynomial matrices</td> </tr> <tr> <td> , *Y </td> <td> </td> <td>The multiplication of a polynomial matrix with a scalar ( i. e. a polynomial)</td> </tr> <tr> <td> *A </td> <td> </td> <td>The multiplication of a matrix with a scalar ( i. e. an element of the underlying ring)</td> </tr> <tr> <td> *Q </td> <td> </td> <td>The multiplication of a polynomial over matrices with a scalar ( i. e. a matrix)</td> </tr> <tr> <td> , xY </td> <td> </td> <td>The multiplication of polynomial matrices</td> </tr> <tr> <td> xA </td> <td> </td> <td>The multiplication of matrices</td> </tr> <tr> <td> xQ </td> <td> </td> <td>The multiplication of polynomials over matrices</td> </tr> <tr> <td> , 1Y </td> <td> </td> <td>The identity matrix in the algebra of polynomial matrices over </td> </tr> <tr> <td> 1A </td> <td> </td> <td>The identity matrix in the algebra of matrices over </td> </tr> <tr> <td> , 0Y </td> <td> </td> <td>The zero matrix in the algebra of matrices consisting of polynomials</td> </tr> <tr> <td> </td> <td> </td> <td>The transformation of an x matrix over into a polynomial x matrix over </td> </tr> <tr> <td> T1(M) </td> <td> </td> <td>The matrix M transformed into a polynomial x matrix over </td> </tr> <tr> <td> U(M) </td> <td> </td> <td>The (constant) polynomial x matrix M transformed into a matrix over the ring . Inverse function of : (see m2cpminvid2)</td> </tr> <tr> <td> T2(M) </td> <td> </td> <td>The polynomial x matrix M transformed into a polynomial over the x matrices over </td> </tr> <tr> <td> , c(M) </td> <td> </td> <td>The characteristic matrix of a matrix , i.e. the matrix whose determinant is the characteristic polynomial of the matrix </td> </tr> <tr> <td> </td> <td> </td> <td>The function mapping a matrix over (a ring) to its characteristic polynomial</td> </tr> <tr> <td> , p(M) </td> <td> </td> <td>The characteristic polynomial of a matrix over (a ring) </td> </tr> <tr> <td> </td> <td> </td> <td>The scalar matrix (diagonal matrix) with the characteristic polynomial of a matrix as diagional elements</td> </tr> <tr> <td> </td> <td> </td> <td>The function mapping a matrix consisting of polynomials to its adjugate ("matrix of cofactors")</td> </tr> <tr> <td> , adj(cm(M)) </td> <td> </td> <td>The adjugate of the characteristic matrix of the matrix </td> </tr> </table>

Using this notation, we have: <ol type="a"> <li>"c(M) e. B(Y)", or , see chmatcl </li> <li>"p(M) e. B(P)", or , see chpmatply1 </li> <li>"T(M) e. B(Y)", or , see mat2pmatbas </li> <li> , see maduf </li> <li>"adj(cm(M)) e. B(Y)", or </li> </ol>

Following the proof shown in Wikipedia, the following steps are performed: <ol> <li>Write , the adjugate of the characteristic matrix, as a finite sum of scaled monomials, see pmatcollpw3fi1: adj(cm(M)) = sumi=0 to s ( X_R ^i *_Y T_1(b(i)) ) where b(i) are matrices over the ring , so T_1(b(i)) are constant polynomial matrices. This step corresponds to (3) in Wikipedia. In contrast to Wikipedia, we write as finite sum of not exactly determined number of summands, which may be greater than needed (including summands of value 0). This will be sufficient to provide a representation of as infinite, but finitely supported sum, see step 3.</li> <li>Write , the product of the characteristic matrix and its adjugate as finite sum of scaled monomials, see cpmadugsumfi. This representation is obtained by replacing by the representation resulting from step 1. and performing calculation rules available for the associative algebra of matrices over polynomials over a commutative ring: cm(M) *_Y adj(cm(M)) = sumi=0 to s ( X_R ^i *_Y ( T_1(b(i-1)) - T_1(M) x_Y T_1(b(i)) ) ) + X_R ^(s+1) *_Y ( T_1(b(s)) - T_1(M) x_Y T_1(b(0)) where b(i) are matrices over , so T_1(b(i)) are constant polynomial matrices: cm(M) *_Y adj(cm(M)) = cm(M) *_Y sumi=0 to s ( X_R ^i *_Y T_1(b(i)) ) [see pmatcollpw3fi1 (step 1.)] = ( ( X_A *_Y 1_Y ) - T_1(M) ) *_Y sumi=0 to s ( X_R ^i *_Y T_1(b(i)) ) [def. of cm(M)] = ( X_A *_Y 1_Y ) *_Y sumi=0 to s ( X_R ^i *_Y T_1(b(i)) ) - T_1(M) *_Y sumi=0 to s ( X_R ^i *_Y T_1(b(i)) ) [see rngsubdir] = sumi=0 to s ( X_R ^i *_Y ( T_1(b(i-1)) - T_1(M) x_Y T_1(b(i)) ) ) + X_R ^(s+1) *_Y ( T_1(b(s)) - T_1(M) x_Y T_1(b(0)) [see cpmadugsumlemF] This step corresponds partially to (4) in Wikipedia.</li> <li>Write as infinite, but finitely supported sum of scaled monomials, see cpmadugsum: cm(M) * adj(cm(M)) = sum_i ( X_R ^i *_Y G(i) ) This representation is obtained by defining a function _G_ for the coefficients, which we call "characteristic factor function", see chfacfisf, which covers the special terms and the padding with 0. G(i) is a constant polynomial matrix (see chfacfisfcpmat). This step corresponds partially to (4) in Wikipedia, with summands of value 0 added.</li> <li>Write , the scalar matrix (diagonal matrix) with the characteristic polynomial of a matrix as diagional elements, as infinite, but finitely supported sum of scaled monomials. See cpmidgsum: p(m) *_Y I_Y = sum_i ( X_R ^i *_Y ( S(p_i) *_Y I_Y ) ) The proof of cpmidgsum is making use of pmatcollpwscmat, because is a scalar/diagonal polynomial matrix with the characteristic polynomial "p(M)" as diagonal entries (since p_i is an element of the ring , S(p_i) is a (constant) polynomial). This corresponds to (5) in Wikipedia, with summands of value 0 added. </li> <li>Transform the sum representation of from step 3. into polynomials over matrices: T_2(cm(M) * adj(cm(M))) = sum_i ( U(G(i)) *_Q X_A ^i ) [see cpmadumatpoly] where U(G(i)) is a matrix over the ring .</li> <li>Transform the sum representation of from step 4. into polynomials over matrices: T_2(p(m) *_Y I_Y) = sum_i ( p_i *_A I_A ) *_Q X_A ^i ) [see cpmidpmat]</li> <li>Equate the sum representations resulting from steps 5. and 6. by using cpmadurid to obtain the equation sum_i ( U(G(i)) *_Q X_A ^i ) = sum_i ( p_i *_A I_A ) *_Q X_A ^i ): sum_i ( U(G(i)) *_Q X_A ^i ) = T_2(cm(M) * adj(cm(M))) [see step 5.] = T_2(p(m) *_Y I_Y) [see cpmadurid] = sum_i ( p_i *_A I_A ) *_Q X_A ^i ) [see step 6.] <it>Note that this step is contained in the proof of chcoeffeq, see step 9.</it> This step corresponds to the conclusion from (4) and (5) in Wikipedia, with summands of value 0 added.</li> <li>Compare the sum representations of step 7. to obtain the equations U(G(i)) = p_i *_A I_A , see chcoeffeqlem. This corresponds to (6) in Wikipedia. Since the coefficients of the transformed representations and the original representations are identical, the equations of the coefficients are also valid for the original representations of steps 3. and 4.</li> <li>Multiply the equations of the coefficients from step 8. from the left by M^i, and sum up, see chcoeffeq: sum_i ( M^i x_A U(G(i)) ) = sum_i ( M^i x_A ( p_i *_A I_A) ) This corresponds to (7) in Wikipedia.</li> <li>Transform the right hand side of the equation in step 9. into an appropriate form, see cayhamlem3: sum_i ( p_i *_A M^i ) = sum_i ( M^i x_A ( p_i *_A I_A) ) [see cayhamlem2] = sum_i ( M^i x_A U(G(i)) ) [see chcoeffeq]</li> <li>Apply the theorem for telescoping sums, see telgsumfz, to the sum sum_i ( T_1(M)^i x_Y G(i) ), which results in an equation to 0: sum_i ( T_1(M)^i x_Y G(i) ) = 0_Y, see cayhamlem1: sum_i ( T_1(M)^i x_Y G(i) ) = sumi=1 to s ( T_1(M)^i x_Y T_1(b(i-1)) - T_1(M)^(i+1) x_Y T_1(b(i)) ) + ( T_1(M)^(s+1) x_Y T_1(b(s)) - T_1(M) x_Y T_1(b(0)) ) [see chfacfpmmulgsum2] = ( T_1(M) x_Y T_1(b(0)) - T_1(M)^(s+1) x_Y T_1(b(s)) ) + ( T_1 M)^(s+1) x_Y T_1(b(s)) - T_1(M) x_Y T_1(b(0)) ) [see telgsumfz] = 0_Y [see grpnpncan0] This step corresponds partially to (8) in Wikipedia.</li> <li>Since is a ring homomorphism (see mat2pmatrhm), the left hand side of the equation in step 10. can be transformed into a representation appropriate to apply the result of step 11., see cayhamlem4: sum_i ( p_i *_A M^i ) = sum_i ( M^i x_A U(G(i)) ) [see cayhamlem3 (step 10.)] = U(T_1(sum_i ( M^i x_A U(G(i)) ))) [see m2cpminvid] = U(sum_i T_1( M^i x_A U(G(i)) )) [see gsummptmhm] = U(sum_i ( T_1(M^i) x_Y T_1(U(G(i))) )) [see rhmmul] = U(sum_i ( T_1(M)^i x_Y T_1(U(G(i))) )) [see mhmmulg] = U(sum_i ( T_1(M)^i x_Y G(i) )) [see m2cpminvid2] </li> <li>Finally, combine the results of steps 11. and 12., and use the fact that (and therefore also its inverse ) is an injective ring homomorphism (see mat2pmatf1 and mat2pmatrhm) to transform the equality resulting from steps 11. and 12. into the desired equation sum_i ( p_i *_A M^i ) = 0_A , see cayleyhamilton resp. cayleyhamilton0: sum_i ( p_i *_A M^i ) = U(sum_i ( T_1(M)^i x_Y G(i) )) [see cayhamlem4 (step 12.)] = U(0_Y ) [see cayhamlem1 (step 11.)] = 0_A [see m2cpminv0]</li> </ol> The transformations in steps 5., 6., 10., 12. and 13. are not mentioned in the proof provided in Wikipedia, since it makes no distinction between a matrix over a ring and its representation as matrix over the polynomial ring over the ring in general!