In this section, a direct algebraic proof for the Cayley-Hamilton theorem is
provided, according to Wikipedia ("Cayley-Hamilton theorem", 09-Nov-2019,
https://en.wikipedia.org/wiki/Cayley%E2%80%93Hamilton_theorem, section
"A direct algebraic proof" (this approach is also used for proving Lemma 1.9 in
[Hefferon] p. 427):
"This proof uses just the kind of objects needed to formulate the
Cayley-Hamilton theorem: matrices with polynomials as entries. The matrix
(t * In - A) whose determinant is the characteristic polynomial of A is such a
matrix, and since polynomials [over a commutative ring] form a commutative
ring, it has an adjugate
(1) B = adj(t * In - A) .
Then, according to the right-hand fundamental relation of the adjugate, one has
(2) ( t * In - A ) x B = det(t * In - A) x In = p(t) * In .
Since B is also a matrix with polynomials in t as entries, one can, for each i,
collect the coefficients of t^i in each entry to form a matrix Bi of numbers,
such that one has
(3) B = sumi = 0 to (n-1) t^i Bi .
(The way the entries of B are defined makes clear that no powers higher than
t^(n-1) occur). While this looks like a polynomial with matrices as
coefficients, we shall not consider such a notion; it is just a way to write a
matrix with polynomial entries as a linear combination of n constant matrices,
and the coefficient t^i has been written to the left of the matrix to stress
this point of view.
Now, one can expand the matrix product in our equation by bilinearity
(4) p(t) * In = ( t * In - A ) x B
= ( t * In - A ) x sumi = 0 to (n-1) t^i * Bi
= sumi = 0 to (n-1) t * In x t^i Bi
- sumi = 0 to (n-1) A * t^i Bi
= sumi = 0 to (n-1) t^(i+1) * Bi
- sumi = 0 to (n-1) t^i * A x Bi
= t^n Bn-1
+ sumi = 1 to (n-1) t^i * ( Bi-1 - A x Bi )
- A x B0 .
Writing
(5) p(t) In = t^n * In + t^(n-1) * c(n-1) x In + ...
+ t * c1 In + c0 In ,
one obtains an equality of two matrices with polynomial entries, written as
linear combinations of constant matrices with powers of t as coefficients.
Such an equality can hold only if in any matrix position the entry that is
multiplied by a given power t^i is the same on both sides; it follows that the
constant matrices with coefficient t^i in both expressions must be equal.
Writing these equations then for i from n down to 0, one finds
(6) Bn-1 = In ,
Bi-1 - A x Bi = ci * In for 1 <= i <= n-1 ,
- A x B0 = c0 * In .
Finally, multiply the equation of the coefficients of t^i from the left by A^i,
and sum up:
(7) A^n Bn-1 + sumi
= 1 to (n-1) ( A^i x Bi-1 - A^(i+1) x Bi )
- A x B0 = A^n + cn-1 * A^(n-1) + ... + c1 * A + c0 * In .
The left-hand sides form a telescoping sum and cancel completely; the
right-hand sides add up to p(A):
(8) 0 = p(A) .
This completes the proof."
To formalize this approach, the steps mentioned in Wikipedia must be elaborated
in more detail.
The first step is to formalize the preliminaries and the objective of the
theorem. In Wikipedia, the Cayley-Hamilton theorem is stated as follows:
"... the Cayley-Hamilton theorem ... states that every square matrix over a
commutative ring ... satisfies its own characteristic equation." Or in more
detail: "If A is a given n x n matrix and In is the n x n identity matrix,
then the characteristic polynomial of A is defined as p(t) = det(t * In - A),
where det is the determinant operation and t is a variable for a scalar element
of the base ring. Since the entries of the matrix (t * In - A) are (linear or
constant) polynomials in t, the determinant is also an n-th order monic
polynomial in t. The Cayley-Hamilton theorem states that if one defines an
analogous matrix equation, p(A), consisting of the replacement of the scalar
eigenvalues t with the matrix A, then this polynomial in the matrix A results
in the zero matrix,
p(A) = 0.
The powers of A, obtained by substitution from powers of t, are defined by
repeated matrix multiplication; the constant term of p(t) gives a multiple of
the power A^0, which is defined as the identity matrix. The theorem allows A^n
to be expressed as a linear combination of the lower matrix powers of A. When
the ring is a field, the Cayley-Hamilton theorem is equivalent to the statement
that the minimal polynomial of a square matrix divides its characteristic
polynomial."
Actually, the definition of the characteristic polynomial of a square matrix
requires some attention. According to df-chpmat, the characteristic
polynomial of an x matrix over a ring is defined as
where is the function mapping an x matrix
over the polynomial ring over the ring to its determinant,
is the variable of the polynomials over ,
is the x identity matrix as matrix over the polynomial ring over
the ring (not the x identity matrix of the matrices over the
ring !) and is the matrix
over a ring transformed into a constant matrix over the polynomial
ring over the ring . Thus is the multiplication of a polynomial
matrix with a "scalar", i.e. a polynomial (see chpmatval).
By this definition, it is ensured that ,
the matrix whose determinant is the characteristic polynomial of the matrix
, is actually a matrix over the polynomial ring over the ring ,
as stated in Wikipedia ("matrix with polynomials as entries"). This matrix is
called the characteristic matrix of matrix (see Wikipedia "Polynomial
matrix", 16-Nov-2019, https://en.wikipedia.org/wiki/Polynomial_matrix).
Following the notation in Wikipedia, we denote the characteristic polynomial of
the matrix , formally defined by as
"p(M)" in the comments. The characteristric matrix will be denoted by "c(M)", so that "p(M) = det( c(M) )".
After the preliminaries are clarified, the objective of the Cayley-Hamilton
theorem must be considered. As described in Wikipedia, the matrix must be
"inserted" into its characteristic polynomial in an appropriate way. Since
every polynomial can be represented as infinite, but finitely supported sum
of monomials scaled by the corresponding coefficients (see ply1coe), also
the characteristic polynomial can be written in this way:
p(M) = sumi ( pi * M^i )
Here, * is the scalar multiplication in the algebra of the polynomials over the
ring , and the coefficients are elements of the ring .
By this, we can "define" the insertion of the matrix M into its characteristic
polynomial by "p(M) = sum( pi * M^i)", see also cayleyhamilton1.
Here, * is the scalar multiplication in the algebra of the matrices over the
ring .
To prove the Cayley-Hamilton theorem, we have to show that "p(M) = 0", where 0
is the zero matrix.
In this section, the following class variables and informal identifiers
(acronyms in the form "A(B)" or "AB") are used:
<table>
<tr>
<th>class variable/ informal identifier</th>
<th>definiens</th>
<th>explanation</th>
</tr>
<tr>
<td> </td>
<td></td>
<td>An arbitrary finite set, used as dimension for matrices</td>
</tr>
<tr>
<td> </td>
<td></td>
<td>An arbitrary (commutative) ring: </td>
</tr>
<tr>
<td> B(R) </td>
<td> </td>
<td>Base set of (the ring) </td>
</tr>
<tr>
<td> </td>
<td> </td>
<td>Algebra of x matrices over (the ring) </td>
</tr>
<tr>
<td> </td>
<td> </td>
<td>Base set of the algebra of x matrices, i .e. the set of all
x matrices</td>
</tr>
<tr>
<td> </td>
<td></td>
<td>An arbitrary x matrix</td>
</tr>
<tr>
<td> </td>
<td> </td>
<td>The algebra of polynomials over (the ring) </td>
</tr>
<tr>
<td> B(P) </td>
<td> </td>
<td>Base set of the algebra of polynomials, i .e. the set of all
polynomials</td>
</tr>
<tr>
<td> , X<sub>R</sub> </td>
<td> </td>
<td>The variable of polynomials over (the ring) </td>
</tr>
<tr>
<td> , X<sub>A</sub> </td>
<td> </td>
<td>The variable of polynomials over matrices of the algebra </td>
</tr>
<tr>
<td> </td>
<td> </td>
<td>The group exponentiation for polynomials over (the ring) </td>
</tr>
<tr>
<td> ^ </td>
<td> </td>
<td>Arbitrary group exponentiation</td>
</tr>
<tr>
<td> </td>
<td> </td>
<td>The injection of scalars, i.e. elements of (the ring) into the
base set of the algebra of polynomials over </td>
</tr>
<tr>
<td> , S(p) </td>
<td> </td>
<td>The element of (the ring) represented as polynomial over
</td>
</tr>
<tr>
<td> </td>
<td> </td>
<td>Algebra of x matrices over (the polynomial ring)
over the ring </td>
</tr>
<tr>
<td> B(Y) </td>
<td> </td>
<td>Base set of the algebra of polynomial x matrices, i .e. the
set of all polynomial x matrices</td>
</tr>
<tr>
<td> </td>
<td> </td>
<td>Algebra of polynomials over the ring of x matrices over
the ring </td>
</tr>
<tr>
<td> B(Q) </td>
<td> </td>
<td>Base set of the algebra of polynomials over the ring of x
matrices over the ring , i .e. the set of all polynomials having
x matrices as coefficients</td>
</tr>
<tr>
<td> , +</td>
<td> </td>
<td>The addition of polynomial matrices</td>
</tr>
<tr>
<td> , -</td>
<td> </td>
<td>The subtraction of polynomial matrices</td>
</tr>
<tr>
<td> , *<sub>Y</sub> </td>
<td> </td>
<td>The multiplication of a polynomial matrix with a scalar
( i. e. a polynomial)</td>
</tr>
<tr>
<td> *<sub>A</sub> </td>
<td> </td>
<td>The multiplication of a matrix with a scalar ( i. e. an element of the
underlying ring)</td>
</tr>
<tr>
<td> *<sub>Q</sub> </td>
<td> </td>
<td>The multiplication of a polynomial over matrices with a scalar ( i. e.
a matrix)</td>
</tr>
<tr>
<td> , x<sub>Y</sub> </td>
<td> </td>
<td>The multiplication of polynomial matrices</td>
</tr>
<tr>
<td> x<sub>A</sub> </td>
<td> </td>
<td>The multiplication of matrices</td>
</tr>
<tr>
<td> x<sub>Q</sub> </td>
<td> </td>
<td>The multiplication of polynomials over matrices</td>
</tr>
<tr>
<td> , 1<sub>Y</sub> </td>
<td> </td>
<td>The identity matrix in the algebra of polynomial matrices over
</td>
</tr>
<tr>
<td> 1<sub>A</sub> </td>
<td> </td>
<td>The identity matrix in the algebra of matrices over </td>
</tr>
<tr>
<td> , 0<sub>Y</sub> </td>
<td> </td>
<td>The zero matrix in the algebra of matrices consisting of
polynomials</td>
</tr>
<tr>
<td> </td>
<td> </td>
<td>The transformation of an x matrix over into a
polynomial x matrix over </td>
</tr>
<tr>
<td> T<sub>1</sub>(M) </td>
<td> </td>
<td>The matrix M transformed into a polynomial x matrix over
</td>
</tr>
<tr>
<td> U(M) </td>
<td> </td>
<td>The (constant) polynomial x matrix M transformed into a
matrix over the ring . Inverse function of :
(see m2cpminvid2)</td>
</tr>
<tr>
<td> T<sub>2</sub>(M) </td>
<td> </td>
<td>The polynomial x matrix M transformed into a polynomial
over the x matrices over </td>
</tr>
<tr>
<td> , c(M) </td>
<td> </td>
<td>The characteristic matrix of a matrix , i.e. the matrix whose
determinant is the characteristic polynomial of the matrix </td>
</tr>
<tr>
<td> </td>
<td> </td>
<td>The function mapping a matrix over (a ring) to its
characteristic polynomial</td>
</tr>
<tr>
<td> , p(M) </td>
<td> </td>
<td>The characteristic polynomial of a matrix over (a ring) </td>
</tr>
<tr>
<td> </td>
<td> </td>
<td>The scalar matrix (diagonal matrix) with the characteristic polynomial
of a matrix as diagional elements</td>
</tr>
<tr>
<td> </td>
<td> </td>
<td>The function mapping a matrix consisting of polynomials to its
adjugate ("matrix of cofactors")</td>
</tr>
<tr>
<td> , adj(cm(M)) </td>
<td> </td>
<td>The adjugate of the characteristic matrix of the matrix </td>
</tr>
</table>
Using this notation, we have:
<ol type="a">
<li>"c(M) e. B(Y)", or , see chmatcl </li>
<li>"p(M) e. B(P)", or , see chpmatply1 </li>
<li>"T(M) e. B(Y)", or ,
see mat2pmatbas </li>
<li> , see maduf </li>
<li>"adj(cm(M)) e. B(Y)", or </li>
</ol>
Following the proof shown in Wikipedia, the following steps are performed:
<ol>
<li>Write , the adjugate of the characteristic matrix, as a finite sum
of scaled monomials, see pmatcollpw3fi1:<br>
adj(cm(M)) = sum<sub>i=0 to s</sub> ( X_R ^i *_Y T_1(b(i)) )<br>
where b(i) are matrices over the ring , so T_1(b(i)) are constant
polynomial matrices.<br>
This step corresponds to (3) in Wikipedia. In contrast to Wikipedia, we
write as finite sum of not exactly determined number of summands,
which may be greater than needed (including summands of value 0). This
will be sufficient to provide a representation of as
infinite, but finitely supported sum, see step 3.</li>
<li>Write , the product of the characteristic matrix and its
adjugate as finite sum of scaled monomials, see cpmadugsumfi. This
representation is obtained by replacing by the representation
resulting from step 1. and performing calculation rules available for the
associative algebra of matrices over polynomials over a commutative ring:
<br>
cm(M) *_Y adj(cm(M))
= sum<sub>i=0 to s</sub> ( X_R ^i *_Y ( T_1(b(i-1))
- T_1(M) x_Y T_1(b(i)) ) )
+ X_R ^(s+1) *_Y ( T_1(b(s)) - T_1(M) x_Y T_1(b(0))<br>
where b(i) are matrices over , so T_1(b(i)) are constant polynomial
matrices:<br>
cm(M) *_Y adj(cm(M))<br>
= cm(M) *_Y sum<sub>i=0 to s</sub> ( X_R ^i *_Y T_1(b(i)) )
[see pmatcollpw3fi1 (step 1.)]<br>
= ( ( X_A *_Y 1_Y ) - T_1(M) )
*_Y sum<sub>i=0 to s</sub> ( X_R ^i *_Y T_1(b(i)) )
[def. of cm(M)]<br>
= ( X_A *_Y 1_Y ) *_Y sum<sub>i=0 to s</sub> ( X_R ^i *_Y T_1(b(i)) )
- T_1(M) *_Y sum<sub>i=0 to s</sub> ( X_R ^i *_Y T_1(b(i)) )
[see rngsubdir]<br>
= sum<sub>i=0 to s</sub> ( X_R ^i *_Y ( T_1(b(i-1))
- T_1(M) x_Y T_1(b(i)) ) )
+ X_R ^(s+1) *_Y ( T_1(b(s)) - T_1(M) x_Y T_1(b(0))
[see cpmadugsumlemF]<br>
This step corresponds partially to (4) in Wikipedia.</li>
<li>Write as infinite, but finitely supported sum of scaled
monomials, see cpmadugsum:<br>
cm(M) * adj(cm(M)) = sum_i ( X_R ^i *_Y G(i) )<br>
This representation is obtained by defining a function _G_ for the
coefficients, which we call "characteristic factor function", see
chfacfisf, which covers the special terms and the padding with 0.
G(i) is a constant polynomial matrix (see chfacfisfcpmat). This step
corresponds partially to (4) in Wikipedia, with summands of value 0
added.</li>
<li>Write , the scalar matrix (diagonal matrix) with
the characteristic polynomial of a matrix as diagional elements, as
infinite, but finitely supported sum of scaled monomials. See
cpmidgsum:<br>
p(m) *_Y I_Y = sum_i ( X_R ^i *_Y ( S(p_i) *_Y I_Y ) )<br>
The proof of cpmidgsum is making use of pmatcollpwscmat, because
is a scalar/diagonal polynomial matrix with the
characteristic polynomial "p(M)" as diagonal entries (since p_i is
an element of the ring , S(p_i) is a (constant) polynomial). This
corresponds to (5) in Wikipedia, with summands of value 0 added. </li>
<li>Transform the sum representation of from step 3. into
polynomials over matrices:<br>
T_2(cm(M) * adj(cm(M))) = sum_i ( U(G(i)) *_Q X_A ^i )
[see cpmadumatpoly]<br>
where U(G(i)) is a matrix over the ring .</li>
<li>Transform the sum representation of from step 4. into polynomials
over matrices:<br>
T_2(p(m) *_Y I_Y) = sum_i ( p_i *_A I_A ) *_Q X_A ^i )
[see cpmidpmat]</li>
<li>Equate the sum representations resulting from steps 5. and 6. by using
cpmadurid to obtain the equation<br>
sum_i ( U(G(i)) *_Q X_A ^i ) = sum_i ( p_i *_A I_A ) *_Q X_A ^i ):<br>
sum_i ( U(G(i)) *_Q X_A ^i )<br>
= T_2(cm(M) * adj(cm(M))) [see step 5.]<br>
= T_2(p(m) *_Y I_Y) [see cpmadurid]<br>
= sum_i ( p_i *_A I_A ) *_Q X_A ^i ) [see step 6.]<br>
<it>Note that this step is contained in the proof of chcoeffeq, see
step 9.</it> This step corresponds to the conclusion from (4) and (5)
in Wikipedia, with summands of value 0 added.</li>
<li>Compare the sum representations of step 7. to obtain the equations
U(G(i)) = p_i *_A I_A , see chcoeffeqlem. This corresponds to (6) in
Wikipedia. Since the coefficients of the transformed representations and
the original representations are identical, the equations of the
coefficients are also valid for the original representations of steps 3.
and 4.</li>
<li>Multiply the equations of the coefficients from step 8. from the left by
M^i, and sum up, see chcoeffeq:<br>
sum_i ( M^i x_A U(G(i)) ) = sum_i ( M^i x_A ( p_i *_A I_A) )<br>
This corresponds to (7) in Wikipedia.</li>
<li>Transform the right hand side of the equation in step 9. into an
appropriate form, see cayhamlem3:<br>
sum_i ( p_i *_A M^i )<br>
= sum_i ( M^i x_A ( p_i *_A I_A) ) [see cayhamlem2]<br>
= sum_i ( M^i x_A U(G(i)) ) [see chcoeffeq]</li>
<li>Apply the theorem for telescoping sums, see telgsumfz, to the sum
sum_i ( T_1(M)^i x_Y G(i) ), which results in an equation to 0:<br>
sum_i ( T_1(M)^i x_Y G(i) ) = 0_Y, see cayhamlem1:<br>
sum_i ( T_1(M)^i x_Y G(i) )<br>
= sum<sub>i=1 to s</sub>
( T_1(M)^i x_Y T_1(b(i-1)) - T_1(M)^(i+1) x_Y T_1(b(i)) )<br>
+ ( T_1(M)^(s+1) x_Y T_1(b(s)) - T_1(M) x_Y T_1(b(0)) )
[see chfacfpmmulgsum2]<br>
= ( T_1(M) x_Y T_1(b(0)) - T_1(M)^(s+1) x_Y T_1(b(s)) )
+ ( T_1 M)^(s+1) x_Y T_1(b(s)) - T_1(M) x_Y T_1(b(0)) )
[see telgsumfz]<br>
= 0_Y [see grpnpncan0]
This step corresponds partially to (8) in Wikipedia.</li>
<li>Since is a ring homomorphism (see mat2pmatrhm), the left hand
side of the equation in step 10. can be transformed into a representation
appropriate to apply the result of step 11., see cayhamlem4:<br>
sum_i ( p_i *_A M^i ) <br>
= sum_i ( M^i x_A U(G(i)) ) [see cayhamlem3 (step 10.)] <br>
= U(T_1(sum_i ( M^i x_A U(G(i)) ))) [see m2cpminvid] <br>
= U(sum_i T_1( M^i x_A U(G(i)) )) [see gsummptmhm] <br>
= U(sum_i ( T_1(M^i) x_Y T_1(U(G(i))) )) [see rhmmul] <br>
= U(sum_i ( T_1(M)^i x_Y T_1(U(G(i))) )) [see mhmmulg] <br>
= U(sum_i ( T_1(M)^i x_Y G(i) )) [see m2cpminvid2] <br>
</li>
<li>Finally, combine the results of steps 11. and 12., and use the fact that
(and therefore also its inverse ) is an injective ring
homomorphism (see mat2pmatf1 and mat2pmatrhm) to transform the
equality resulting from steps 11. and 12. into the desired equation
sum_i ( p_i *_A M^i ) = 0_A , see cayleyhamilton resp.
cayleyhamilton0:<br>
sum_i ( p_i *_A M^i ) <br>
= U(sum_i ( T_1(M)^i x_Y G(i) )) [see cayhamlem4 (step 12.)]<br>
= U(0_Y ) [see cayhamlem1 (step 11.)]<br>
= 0_A [see m2cpminv0]</li>
</ol>
The transformations in steps 5., 6., 10., 12. and 13. are not mentioned in the
proof provided in Wikipedia, since it makes no distinction between a matrix
over a ring and its representation as matrix over the polynomial ring
over the ring in general!