
Matrix inverse

We have already briefly seen that the inverse of a matrix \(\mathbf{A}\), denoted \(\mathbf{A}^{-1}\), satisfies the equation:

\[\mathbf{A}\mathbf{A}^{-1} = \mathbf{I} \]

where \(\mathbf{I}\) is the identity matrix. This means that when we multiply a matrix by its inverse, we get the identity matrix, so it acts like a multiplicative inverse. Why do we care about the inverse of a matrix? The inverse is useful for solving linear equations of the form \(\mathbf{A}\mathbf{x} = \mathbf{b}\), where \(\mathbf{A}\) is an \(n \times n\) square matrix, \(\mathbf{x}\) is a column vector of unknowns, and \(\mathbf{b}\) is a column vector. If \(\mathbf{A}\) is invertible (i.e., has an inverse), we can “divide both sides” by \(\mathbf{A}\) by multiplying both sides on the left by \(\mathbf{A}^{-1}\):

\[\mathbf{A}^{-1}\mathbf{A}\mathbf{x} = \mathbf{A}^{-1}\mathbf{b} \]

Since \(\mathbf{A}^{-1}\mathbf{A} = \mathbf{I}\), the identity matrix, this simplifies to

\[\mathbf{x} = \mathbf{A}^{-1}\mathbf{b} \]

So the inverse gives us a direct formula for solving \(\mathbf{A}\mathbf{x} = \mathbf{b}\), provided the inverse exists. Moreover, if the inverse exists, it must be unique.

Proof

Let’s prove that the inverse of a matrix is unique. We assume we have two inverses \(\mathbf{A}^{-1}\) and \(\mathbf{B}^{-1}\) for the matrix \(\mathbf{A}\). Then we can see that:

\[\begin{align*} \mathbf{A} \cdot \mathbf{A}^{-1} &= \mathbf{I} \\ \mathbf{A} \cdot \mathbf{B}^{-1} &= \mathbf{I} \end{align*} \]

Now we can multiply both sides of the second equation on the left by \(\mathbf{A}^{-1}\) (using that \(\mathbf{A}^{-1}\mathbf{A} = \mathbf{I}\)):

\[\begin{align*} \mathbf{A}^{-1} \cdot \mathbf{A} \cdot \mathbf{B}^{-1} &= \mathbf{A}^{-1} \cdot \mathbf{I} \\ \mathbf{I} \cdot \mathbf{B}^{-1} &= \mathbf{A}^{-1} \\ \mathbf{B}^{-1} &= \mathbf{A}^{-1} \end{align*} \]

So we can see that the two inverses are equal and thus the inverse of a matrix is unique.
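Before asking when the inverse exists, here is a quick NumPy sketch (not part of the original notes, with illustrative numbers) of the solving idea from above: it computes \(\mathbf{x} = \mathbf{A}^{-1}\mathbf{b}\) explicitly and compares it with np.linalg.solve, which solves the system without forming the inverse.

```python
import numpy as np

# A small invertible system A x = b (illustrative values)
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])

A_inv = np.linalg.inv(A)              # explicit inverse
x_via_inverse = A_inv @ b             # x = A^{-1} b
x_via_solve = np.linalg.solve(A, b)   # solves A x = b directly (numerically preferred)

assert np.allclose(A @ x_via_inverse, b)       # the solution satisfies the system
assert np.allclose(x_via_inverse, x_via_solve) # both approaches agree
print(x_via_inverse)                           # [0.2 0.6]
```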

Now the question is when does the inverse exist? For this let’s take a step back and start with a simple \(1 \times 1\) matrix, which is just a normal number, say \(a \in \mathbb{R}\). The inverse of \(a\) is the number \(a^{-1}\) such that:

\[a \cdot a^{-1} = a^{-1} \cdot a = 1 \]

The only way this is possible is if \(a \neq 0\), and in that case \(a^{-1} = \frac{1}{a}\). If \(a = 0\), then there is no inverse because \(0 \cdot x = 0\) for any \(x\).

So we have a condition for when the inverse of a \(1 \times 1\) matrix exists: it must be non-zero. Now let’s derive a similar condition for \(2 \times 2\) matrices. So let’s consider the following general \(2 \times 2\) matrix:

\[\mathbf A=\begin{bmatrix}a & b\\ c & d\end{bmatrix} \]

We want \(\mathbf A^{-1}\) such that \(\mathbf A\mathbf A^{-1}=\mathbf I\). So in other words, we want to find a matrix that satisfies the following:

\[\mathbf{A}\mathbf{A}^{-1} = \begin{bmatrix}a & b\\ c & d\end{bmatrix}\begin{bmatrix}w & x\\ y & z\end{bmatrix} = \begin{bmatrix}aw +by & ax+bz\\ cw+dy & cx+dz\end{bmatrix} = \begin{bmatrix}1 & 0\\ 0 & 1\end{bmatrix} \]

Calculating out the multiplications and equating to the identity matrix gives us the following system of equations:

\[\begin{cases} aw + by = 1 \\ ax + bz = 0 \\ cw + dy = 0 \\ cx + dz = 1 \end{cases}\]

This system of equations can be solved using various methods, such as substitution or elimination. We will first use the first and third equations to solve for \(w\) and \(y\):

\[\begin{align*} cw + dy &= 0 \\ dy &= -cw \\ y &= -\frac{c}{d}w \quad (\text{assuming } d \neq 0) \\ aw + b\left(-\frac{c}{d}w\right) &= 1 \\ aw - \frac{bc}{d}w &= 1 \\ w\left(a - \frac{bc}{d}\right) &= 1 \\ w\left(ad - bc\right) &= d \\ w &= \frac{d}{ad - bc} \quad (\text{assuming } ad - bc \neq 0) \end{align*} \]

Now we can substitute \(w\) back into the equations to find \(y\):

\[\begin{align*} y &= -\frac{c}{d}w \\ y &= -\frac{c}{d}\left(\frac{d}{ad - bc}\right) \\ y &= -\frac{c}{ad - bc} \quad (\text{assuming } ad - bc \neq 0) \end{align*} \]

So we have \(w\) and \(y\) in terms of \(a, b, c, d\). Now we can use the second and fourth equations to find \(x\) and \(z\):

\[\begin{align*} ax + bz &= 0 \\ bz &= -ax \\ z &= -\frac{a}{b}x \quad (\text{assuming } b \neq 0) \\ cx + dz &= 1 \\ cx + d\left(-\frac{a}{b}x\right) &= 1 \\ cx - \frac{da}{b}x &= 1 \\ x\left(c - \frac{da}{b}\right) &= 1 \\ x\left(cb - da\right) &= b \\ x &= \frac{b}{cb - da} = -\frac{b}{ad - bc} \quad (\text{assuming } ad - bc \neq 0) \end{align*} \]

Now substituting \(x\) back into the equation for \(z\) gives us:

\[\begin{align*} z &= -\frac{a}{b}x \\ z &= -\frac{a}{b}\left(-\frac{b}{ad - bc}\right) \\ z &= \frac{a}{ad - bc} \quad (\text{assuming }ad - bc \neq 0) \end{align*} \]

Notice that all four entries \(w, x, y, z\) share the common factor \(\frac{1}{ad - bc}\). This leads us to the final formula for the inverse of a \(2 \times 2\) matrix:

\[\mathbf A^{-1} = \begin{bmatrix} w & x\\ y & z\end{bmatrix} = \begin{bmatrix} \frac{d}{ad - bc} & -\frac{b}{ad - bc}\\ -\frac{c}{ad - bc} & \frac{a}{ad - bc}\end{bmatrix} = \frac{1}{ad - bc}\begin{bmatrix} d & -b\\ -c & a\end{bmatrix} \]

where \(ad - bc\) is the so-called determinant of the \(2 \times 2\) matrix \(\mathbf A\). So the determinant must be non-zero for the inverse to exist. If the rank of the matrix is \(0\) then we have the zero matrix and the inverse does not exist because \(ad - bc = 0\). If the rank is \(1\), so there is some linear dependence between the rows or columns (say the second column is a multiple of the first), then we also have \(ad - bc = 0\) because of the following:

\[\begin{align*} b &= \lambda a \\ d &= \lambda c \\ ad - bc &= a(\lambda c) - (\lambda a)c \\ &= \lambda ac - \lambda ac \\ &= 0 \end{align*} \]

So for the inverse to exist, the matrix must be square (\(2 \times 2\)) and have full rank (rank 2). This generalizes to larger matrices as well: the inverse exists if and only if the matrix is square and has full rank, which is equivalent to the determinant being non-zero, i.e., \(\operatorname{det}(\mathbf A) \neq 0\).
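As a small illustration (not part of the original notes, with placeholder matrices), here is a minimal Python sketch of the \(2 \times 2\) formula, checked against NumPy's built-in inverse:

```python
import numpy as np

def inverse_2x2(A):
    """Invert a 2x2 matrix using the (ad - bc) formula derived above."""
    a, b = A[0]
    c, d = A[1]
    det = a * d - b * c
    if np.isclose(det, 0.0):
        raise ValueError("Matrix is singular (determinant is zero).")
    return (1.0 / det) * np.array([[d, -b],
                                   [-c, a]])

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])                            # determinant -2, so invertible
print(inverse_2x2(A))
print(np.allclose(inverse_2x2(A), np.linalg.inv(A)))  # True

B = np.array([[1.0, 2.0],
              [2.0, 4.0]])                            # rank 1, determinant 0
print(np.linalg.matrix_rank(B))                       # 1, so not invertible
```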

Todo

Above is a special case of MCA algorithm?

How can we generalize this to larger matrices? The key is that the inverse exists if and only if the matrix is square and has full rank, which means that the determinant is non-zero. We will explore this in more detail in the next sections?

We have seen that in general matrix multiplication is not commutative, i.e., \(\mathbf A\mathbf B \neq \mathbf B\mathbf A\), but it is associative, i.e., \(\mathbf A(\mathbf B\mathbf C) = (\mathbf A\mathbf B)\mathbf C\). However, the inverse of a matrix has some special properties that we will explore in the next sections, one of which is that the inverse can be multiplied on either side of the matrix, so we have:

\[\mathbf{A}^{-1}\mathbf{A} = \mathbf{I} = \mathbf{A}\mathbf{A}^{-1} \]
Proof

This follows by the same argument as the uniqueness of the inverse. Let’s assume we have a right inverse \(\mathbf{B}\) and a left inverse \(\mathbf{C}\) for the matrix \(\mathbf{A}\), such that:

\[\mathbf{A}\mathbf{B} = \mathbf{I} = \mathbf{C}\mathbf{A} \]

Then we can get the following using the associative property of matrix multiplication:

\[\begin{align*} \mathbf{AB} &= \mathbf{I} \\ \mathbf{C}(\mathbf{AB}) &= \mathbf{C}\mathbf{I} \\ (\mathbf{C}\mathbf{A})\mathbf{B} &= \mathbf{C} \\ \mathbf{IB} &= \mathbf{C} \\ \mathbf{B} &= \mathbf{C} \end{align*} \]

So any right inverse equals any left inverse, which means the unique inverse \(\mathbf{A}^{-1}\) works on both sides.

Another key property is that the inverse of the inverse is the original matrix, i.e.,

\[(\mathbf{A}^{-1})^{-1} = \mathbf{A} \]
Proof

So we want to find a matrix \(\mathbf{B}\) such that:

\[\mathbf{A}^{-1}\mathbf{B} = \mathbf{I} = \mathbf{B}\mathbf{A}^{-1} \]

However, by the definition of the inverse and the fact that it can be multiplied on either side, we already know that \(\mathbf{A}\) itself satisfies exactly these equations:

\[\mathbf{A}^{-1}\mathbf{A} = \mathbf{I} = \mathbf{A}\mathbf{A}^{-1} \]

Since the inverse is unique, it follows that \(\mathbf{B} = \mathbf{A}\), which means that the inverse of the inverse is the original matrix.

Another simple but useful property, especially when working with symmetric matrices, is that the transpose of the inverse is the inverse of the transpose:

\[(\mathbf{A}^T)^{-1} = (\mathbf{A}^{-1})^T \]
Proof

First let’s start with the definition and take the transpose of both sides:

\[\begin{align*} \mathbf{A}\mathbf{A}^{-1} &= \mathbf{I} \\ (\mathbf{A}\mathbf{A}^{-1})^T &= \mathbf{I}^T \end{align*} \]

The transpose of the identity matrix is itself, and using the property of the transpose of a product, where we have to reverse the order of multiplication, we get:

\[\begin{align*} (\mathbf{A}\mathbf{A}^{-1})^T &= \mathbf{I}^T \\ (\mathbf{A}^{-1})^T\mathbf{A}^T &= \mathbf{I} \\ \end{align*} \]

So \((\mathbf{A}^{-1})^T\) is a right inverse of \(\mathbf{A}^T\). But we can also show that it is a left inverse and therefore the inverse of the transpose is two-sided:

\[\begin{align*} (\mathbf{A}^{-1}\mathbf{A})^T &= \mathbf{I}^T \\ \mathbf{A}^T(\mathbf{A}^{-1})^T &= \mathbf{I} \end{align*} \]

So we have shown that \((\mathbf{A}^T)^{-1} = (\mathbf{A}^{-1})^T\). In other words the inverse of the transpose is the transpose of the inverse.
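A quick numerical check of this property (not from the original notes, using an arbitrary invertible matrix as an illustration):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])        # determinant -2, so invertible

lhs = np.linalg.inv(A.T)          # (A^T)^{-1}
rhs = np.linalg.inv(A).T          # (A^{-1})^T
print(np.allclose(lhs, rhs))      # True (up to floating point error)
```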

For the transpose of a product, we have the property that:

\[(\mathbf{A}\mathbf{B})^T = \mathbf{B}^T\mathbf{A}^T \]

For the inverse of a product, we also have a similar property, so the inverse of a product is the product of the inverses in reverse order:

\[(\mathbf{A}\mathbf{B})^{-1} = \mathbf{B}^{-1}\mathbf{A}^{-1} \]

This also has the key consequence that if we have two invertible matrices \(\mathbf{A}\) and \(\mathbf{B}\), then their product \(\mathbf{A}\mathbf{B}\) is also invertible, and its inverse is given by the product of the inverses in reverse order. This also means that the rank inequality:

\[\operatorname{rank}(\mathbf{A}\mathbf{B}) \leq \min(\operatorname{rank}(\mathbf{A}), \operatorname{rank}(\mathbf{B})) \]

holds with equality if both matrices are invertible, so for invertible \(n \times n\) matrices we have

\[\operatorname{rank}(\mathbf{A}\mathbf{B}) = \operatorname{rank}(\mathbf{A}) = \operatorname{rank}(\mathbf{B}) = n \]
Proof

We can show this using the associative property of matrix multiplication. Let’s assume we have two invertible matrices \(\mathbf{A}\) and \(\mathbf{B}\) with inverses \(\mathbf{A}^{-1}\) and \(\mathbf{B}^{-1}\). We can then verify that the inverse of the product is the product of the inverses in reverse order:

\[\begin{align*} (\mathbf{A}\mathbf{B})(\mathbf{B}^{-1}\mathbf{A}^{-1}) &= \mathbf{A}(\mathbf{B}\mathbf{B}^{-1})\mathbf{A}^{-1} \\ &= \mathbf{A}\mathbf{I}\mathbf{A}^{-1} \\ &= \mathbf{A}\mathbf{A}^{-1} \\ &= \mathbf{I} \end{align*} \]

Similarly for the other side:

\[\begin{align*} (\mathbf{B}^{-1}\mathbf{A}^{-1})(\mathbf{A}\mathbf{B}) &= \mathbf{B}^{-1}(\mathbf{A}^{-1}\mathbf{A})\mathbf{B} \\ &= \mathbf{B}^{-1}\mathbf{I}\mathbf{B} \\ &= \mathbf{B}^{-1}\mathbf{B} \\ &= \mathbf{I} \end{align*} \]
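Again, a quick numerical sanity check (illustrative matrices, not from these notes) that the product of two invertible matrices is invertible and that the inverse reverses the order:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])   # determinant 5
B = np.array([[1.0, 2.0],
              [3.0, 4.0]])   # determinant -2

lhs = np.linalg.inv(A @ B)                  # (AB)^{-1}
rhs = np.linalg.inv(B) @ np.linalg.inv(A)   # B^{-1} A^{-1}
print(np.allclose(lhs, rhs))                # True

# The product of two invertible 2x2 matrices stays full rank:
print(np.linalg.matrix_rank(A @ B))         # 2
```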
Todo

We can also derive this property from linear transformations.

The last property to show is that the inverse of a sum of matrices is not equal to the sum of the inverses, i.e.,

\[(\mathbf{A} + \mathbf{B})^{-1} \neq \mathbf{A}^{-1} + \mathbf{B}^{-1} \]
Proof

To prove this statement, we can use a simple counterexample with two invertible matrices:

\[\mathbf{A} = \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix}, \quad \mathbf{B} = \begin{bmatrix} 3 & 0 \\ 0 & 4 \end{bmatrix}\]

Compute their inverses:

\[\mathbf{A}^{-1} = \begin{bmatrix} 1 & 0 \\ 0 & \frac{1}{2} \end{bmatrix}, \quad \mathbf{B}^{-1} = \begin{bmatrix} \frac{1}{3} & 0 \\ 0 & \frac{1}{4} \end{bmatrix}\]

Compute the sum of inverses:

\[\mathbf{A}^{-1} + \mathbf{B}^{-1} = \begin{bmatrix} 1 + \frac{1}{3} & 0 \\ 0 & \frac{1}{2} + \frac{1}{4} \end{bmatrix} = \begin{bmatrix} \frac{4}{3} & 0 \\ 0 & \frac{3}{4} \end{bmatrix}\]

Now compute the sum \(\mathbf{A} + \mathbf{B}\) and its inverse:

\[\begin{align*} \mathbf{A} + \mathbf{B} = \begin{bmatrix} 1 + 3 & 0 \\ 0 & 2 + 4 \end{bmatrix} = \begin{bmatrix} 4 & 0 \\ 0 & 6 \end{bmatrix} \\ (\mathbf{A} + \mathbf{B})^{-1} = \begin{bmatrix} \frac{1}{4} & 0 \\ 0 & \frac{1}{6} \end{bmatrix} \end{align*}\]

Clearly,

\[(\mathbf{A} + \mathbf{B})^{-1} = \begin{bmatrix} \frac{1}{4} & 0 \\ 0 & \frac{1}{6} \end{bmatrix} \neq \begin{bmatrix} \frac{4}{3} & 0 \\ 0 & \frac{3}{4} \end{bmatrix} = \mathbf{A}^{-1} + \mathbf{B}^{-1}\]

Thus, in general the inverse of the sum is not equal to the sum of the inverses.

Inverse Theorem

We can summarize the above conditions for the existence of an inverse in the so-called Inverse Theorem. This theorem states that for an \(n\times n\) matrix \(\mathbf{A}\) the following statements are equivalent:

  1. \(\mathbf{A}\) is invertible (\(\exists\,\mathbf{A}^{-1}\)).
  2. \(\operatorname{rank}(\mathbf{A})=n\) (full rank).
  3. The columns of \(\mathbf{A}\) are linearly independent.
  4. The rows of \(\mathbf{A}\) are linearly independent.
  5. For every \(\mathbf{b}\in\mathbb{R}^n\) the system \(\mathbf{A}\mathbf{x}=\mathbf{b}\) has a unique solution \(\mathbf{x}=\mathbf{A}^{-1}\mathbf{b}\).
  6. The linear transformation \(\mathbf{A}:\mathbb{R}^n\to\mathbb{R}^n\) is a bijection (one-to-one and onto).

Inverse of Diagonal Matrices

We know that the inverse of the identity matrix is itself, i.e., \(\mathbf{I}^{-1} = \mathbf{I}\). For a diagonal matrix, the inverse is almost as easy: if we multiply a diagonal matrix with an arbitrary matrix from the left, the result is just the rows of the arbitrary matrix scaled by the diagonal elements. So to get the identity matrix we just need rows that, once scaled, become the rows of the identity matrix. Let’s look at the \(3 \times 3\) diagonal matrix:

\[\begin{bmatrix} d_1 & 0 & 0 \\ 0 & d_2 & 0 \\ 0 & 0 & d_3 \end{bmatrix}\begin{bmatrix} x_1 & x_2 & x_3 \\ y_1 & y_2 & y_3 \\ z_1 & z_2 & z_3 \end{bmatrix} = \begin{bmatrix} d_1x_1 & d_1x_2 & d_1x_3 \\ d_2y_1 & d_2y_2 & d_2y_3 \\ d_3z_1 & d_3z_2 & d_3z_3 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \]

From this we get the following equations:

\[\begin{cases} d_1x_1 = 1 \\ d_2y_2 = 1 \\ d_3z_3 = 1 \\ d_1x_2 = 0 \\ \vdots \\ d_3z_2 = 0 \end{cases} \]

To solve the equations where we need a zero we can just set the corresponding variable to zero, e.g., \(x_2 = 0\), \(y_1 = 0\), etc. For the equations where we need a one, we can solve for the variable in terms of the diagonal elements, e.g., \(x_1 = \frac{1}{d_1}\), \(y_2 = \frac{1}{d_2}\), etc. This gives us the inverse of the diagonal matrix:

\[\begin{bmatrix} d_1 & 0 & 0 \\ 0 & d_2 & 0 \\ 0 & 0 & d_3 \end{bmatrix}^{-1} = \begin{bmatrix} \frac{1}{d_1} & 0 & 0 \\ 0 & \frac{1}{d_2} & 0 \\ 0 & 0 & \frac{1}{d_3} \end{bmatrix} \]

So the inverse of a diagonal matrix is simply the diagonal matrix with the reciprocal of the diagonal elements. However, this only works if all diagonal elements are non-zero, otherwise the inverse does not exist as we would have division by zero.
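A small NumPy sketch of this (the diagonal entries are arbitrary non-zero placeholders):

```python
import numpy as np

d = np.array([2.0, 5.0, 0.5])                   # diagonal entries, all non-zero
D = np.diag(d)

D_inv = np.diag(1.0 / d)                        # reciprocal of each diagonal entry
print(np.allclose(D @ D_inv, np.eye(3)))        # True: D D^{-1} = I
print(np.allclose(D_inv, np.linalg.inv(D)))     # True: matches the general inverse
```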

Inverse of Symmetric Matrices

Apart from diagonal matrices we have also seen symmetric matrices as a special case. A matrix is symmetric if it is equal to its transpose, i.e., \(\mathbf{A} = \mathbf{A}^T\). This already satisfies one precondition for the inverse, because for this to hold the matrix must be square, so it has the same number of rows and columns.

Let’s assume the matrix is symmetric and has an inverse, then it turns out that the inverse is also symmetric. So in other words, if \(\mathbf{A}\) is a symmetric matrix and \(\mathbf{A}^{-1}\) exists, then:

\[(\mathbf{A}^{-1})^T = \mathbf{A}^{-1} \]
Proof

To show this we can use the properties of the inverse and the transpose that we have already established. Specifically, we have already shown that the inverse of the transpose is the transpose of the inverse:

\[(\mathbf{A}^{-1})^T = (\mathbf{A}^T)^{-1} \]

Since \(\mathbf{A}\) is symmetric, we have \(\mathbf{A}^T = \mathbf{A}\). Therefore, we can substitute this into the equation and get:

\[(\mathbf{A}^{-1})^T = (\mathbf{A})^{-1} \]

However, not every symmetric matrix is invertible. For a symmetric matrix to be invertible, it must also be full rank. If the matrix is symmetric and has full rank, then it has an inverse that is also symmetric.

Example

The following symmetric matrix is not full rank and therefore not invertible:

\[\begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix} \]

The second row is a multiple of the first row, so the matrix is not full rank. It is indeed symmetric, since the transpose of the matrix is:

\[\begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}^T = \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix} \]
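The following NumPy sketch (not from the original notes) illustrates both points: the inverse of an invertible symmetric matrix is again symmetric, and the example matrix above is rank 1 and hence not invertible. The construction \(\mathbf{M}^T\mathbf{M} + \mathbf{I}\) is just a convenient way to obtain an invertible symmetric matrix.

```python
import numpy as np

M = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [3.0, 1.0]])
A = M.T @ M + np.eye(2)              # symmetric (and positive definite), hence invertible

A_inv = np.linalg.inv(A)
print(np.allclose(A, A.T))           # True: A is symmetric
print(np.allclose(A_inv, A_inv.T))   # True: its inverse is symmetric as well

B = np.array([[1.0, 2.0],
              [2.0, 4.0]])           # the symmetric matrix from the example above
print(np.linalg.matrix_rank(B))      # 1, so B is not full rank and not invertible
```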

Inverse of Triangular Matrices

Let’s now look at the inverse of triangular matrices. As with diagonal matrices, triangular matrices (if invertible) have a particularly nice structure for their inverse. Let \(\mathbf{L}\) be an \(n\times n\) lower triangular matrix with entries \(l_{ij}\), that is,

\[l_{ij} = 0 \qquad \text{if } i < j. \]

Suppose that \(\mathbf{L}\) is invertible, and denote its inverse by \(\mathbf{M} = \mathbf{L}^{-1}\), with entries \(m_{ij}\). Then it turns out that \(\mathbf{M}\) is also lower triangular. To see this, let us look at an explicit calculation:

\[\mathbf{L} = \begin{bmatrix} a & 0 & 0 \\ b & c & 0 \\ d & e & f \end{bmatrix} \quad \mathbf{M} = \begin{bmatrix} m_{11} & 0 & 0 \\ m_{21} & m_{22} & 0 \\ m_{31} & m_{32} & m_{33} \end{bmatrix} \]

We solve \(\mathbf{L}\mathbf{M} = \mathbf{I}\). We can do this step by step by writing out the equations for each column of \(\mathbf{M}\). The first column of \(\mathbf{M}\) must satisfy:

\[\begin{bmatrix} a & 0 & 0 \\ b & c & 0 \\ d & e & f \end{bmatrix} \begin{bmatrix} m_{11} \\ m_{21} \\ m_{31} \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} \]

We then can write out the equations for each row of the first column:

  • Row 1: \(a m_{11} = 1 \implies m_{11} = \frac{1}{a}\)
  • Row 2: \(b m_{11} + c m_{21} = 0 \implies m_{21} = -\frac{b}{c} m_{11} = -\frac{b}{a c}\)
  • Row 3: \(d m_{11} + e m_{21} + f m_{31} = 0 \implies m_{31} = -\frac{d m_{11} + e m_{21}}{f} = -\frac{d/a + e \cdot (-b/(a c))}{f} = -\frac{d}{a f} + \frac{e b}{a c f}\)

And so on for the other columns. We can always solve the system by back substitution for triangular matrices. For the second column, we have:

  • Row 1: \(a m_{12} = 0 \implies m_{12} = 0\)
  • Row 2: \(b m_{12} + c m_{22} = 1 \implies m_{22} = \frac{1}{c}\)
  • Row 3: \(d m_{12} + e m_{22} + f m_{32} = 0 \implies m_{32} = -\frac{e}{c f}\)

And for the third column:

  • Row 1: \(a m_{13} = 0 \implies m_{13} = 0\)
  • Row 2: \(b m_{13} + c m_{23} = 0 \implies m_{23} = 0\)
  • Row 3: \(d m_{13} + e m_{23} + f m_{33} = 1 \implies m_{33} = \frac{1}{f}\)

Thus, we can see that the inverse \(\mathbf{M}\) is also lower triangular and for the \(3 \times 3\) case we have:

\[\mathbf{M} = \begin{bmatrix} \frac{1}{a} & 0 & 0 \\ -\frac{b}{a c} & \frac{1}{c} & 0 \\ -\frac{d}{a f} + \frac{e b}{a c f} & -\frac{e}{c f} & \frac{1}{f} \end{bmatrix} \]

Notice that, when solving for \(m_{ii}\), we get the equation:

\[l_{ii} m_{ii} = 1 \implies m_{ii} = \frac{1}{l_{ii}} \]

Therefore, if any of the diagonal elements \(l_{ii} = 0\), then the fraction is not defined and \(\mathbf{L}\) is not invertible. If all \(l_{ii} \neq 0\), then \(\mathbf{L}\) is invertible and its inverse is also lower triangular. Later on this will also match our intuition that the determinant of a triangular matrix is the product of its diagonal elements and if the determinant is zero (so one of the diagonal elements is zero), then the matrix is not invertible.

We can also show that the inverse of an upper triangular matrix is also upper triangular. The proof is similar to the one above. For the first column of \(\mathbf{M} = \mathbf{U}^{-1}\) we have:

\[\mathbf{U}\begin{bmatrix} m_{11} \\ m_{21} \\ m_{31} \end{bmatrix} = \begin{bmatrix} a & b & c \\ 0 & d & e \\ 0 & 0 & f \end{bmatrix} \begin{bmatrix} m_{11} \\ m_{21} \\ m_{31} \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} \]

which leads to the equations (here starting with the last row makes it easier):

  • Row 3: \(f m_{31} = 0 \implies m_{31} = 0\)
  • Row 2: \(d m_{21} + e m_{31} = 0 \implies m_{21} = 0\)
  • Row 1: \(a m_{11} + b m_{21} + c m_{31} = 1 \implies m_{11} = \frac{1}{a}\)

etc. for the other columns. Putting this all together, we get:

\[\mathbf{M} = \begin{bmatrix} \frac{1}{a} & -\frac{b}{a d} & -\frac{c}{a f} + \frac{b e}{a d f} \\ 0 & \frac{1}{d} & -\frac{e}{d f} \\ 0 & 0 & \frac{1}{f} \end{bmatrix} \]

This shows that the inverse of an upper triangular matrix is also upper triangular. So the key takeaway is that if a triangular matrix is invertible, then its inverse is also triangular (of the same type) and that a triangular matrix is invertible if and only if all its diagonal elements are non-zero.
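A short NumPy check of this takeaway, with an arbitrary lower triangular matrix as an illustration (not from the original notes):

```python
import numpy as np

L = np.array([[2.0, 0.0, 0.0],
              [1.0, 3.0, 0.0],
              [4.0, 5.0, 6.0]])      # lower triangular, non-zero diagonal

L_inv = np.linalg.inv(L)
print(np.allclose(np.triu(L_inv, k=1), 0.0))          # True: inverse is lower triangular
print(np.allclose(np.diag(L_inv), 1.0 / np.diag(L)))  # True: diagonal entries are reciprocals
```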

Nilpotent Matrices

We know that from the properties of the rank that the rank of a product of two matrices is at most the minimum of the ranks of the two matrices:

\[\operatorname{rank}(\mathbf{A}\mathbf{B}) \leq \min(\operatorname{rank}(\mathbf{A}), \operatorname{rank}(\mathbf{B})) \]

But what if, rather than multiplying by a different matrix, we take the same matrix multiple times, so what is the rank of \(\mathbf{A}^2\)? From the property above it follows that the rank either stays the same or decreases, i.e.,

\[\operatorname{rank}(\mathbf{A}^2) \leq \operatorname{rank}(\mathbf{A}) \]

An interesting case is when the matrix is full rank, so in other words, the matrix is invertible. We have already seen that the product of two invertible matrices is also invertible, so we have:

\[(\mathbf{A}\mathbf{B})^{-1} = \mathbf{B}^{-1}\mathbf{A}^{-1} \]

If we now take \(\mathbf{B} = \mathbf{A}\), then we have:

\[(\mathbf{A}^2)^{-1} = \mathbf{A}^{-1}\mathbf{A}^{-1} = (\mathbf{A}^{-1})^2 \]

We can also generalize this to higher powers of the matrix, so for any integer \(k \geq 1\) we can show that the inverse of the power of the matrix is the power of the inverse of the matrix, i.e.,

\[(\mathbf{A}^k)^{-1} = (\mathbf{A}^{-1})^k \]

This means that if the matrix is full rank, and therefore invertible, all its powers are also invertible and thus full rank, so the rank inequality above holds with equality:

\[\operatorname{rank}(\mathbf{A}^k) = \operatorname{rank}(\mathbf{A}) \]
Proof

We can prove this by induction, for \(k=1\) the case is obvious as we have \((\mathbf{A}^1)^{-1} = \mathbf{A}^{-1}\). We then assume that the statement holds for some \(i \geq 1\) as our induction hypothesis, i.e.,

\[(\mathbf{A}^i)^{-1} = (\mathbf{A}^{-1})^i \]

Now we want to show that the statement holds for \(i+1\), so we have:

\[\mathbf{A}^{i+1} = \mathbf{A} \cdot \mathbf{A}^i \]

Both \(\mathbf{A}\) and \(\mathbf{A}^i\) are invertible by our base case and induction hypothesis, so we can take the inverse of the product:

\[(\mathbf{A}^{i+1})^{-1} = (\mathbf{A}^i)^{-1} \cdot \mathbf{A}^{-1} = (\mathbf{A}^{-1})^i \mathbf{A}^{-1} = (\mathbf{A}^{-1})^{i+1} \]

Thus, by induction, \(\mathbf{A}^k\) is invertible for all \(k \geq 1\) and \((\mathbf{A}^k)^{-1} = (\mathbf{A}^{-1})^k\).

Example

Let’s take a simple example of a \(2 \times 2\) matrix:

\[\mathbf{A} = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \]

The inverse of this matrix is:

\[\mathbf{A}^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix} = \frac{1}{-2} \begin{bmatrix} 4 & -2 \\ -3 & 1 \end{bmatrix} = \begin{bmatrix} -2 & 1 \\ 1.5 & -0.5 \end{bmatrix} \]

Now, let’s compute the power of the matrix:

\[\mathbf{A}^2 = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} = \begin{bmatrix} 7 & 10 \\ 15 & 22 \end{bmatrix} \]

The inverse of this matrix is:

\[\mathbf{A}^{-2} = \left(\mathbf{A}^2\right)^{-1} = \frac{1}{7 \cdot 22 - 10 \cdot 15} \begin{bmatrix} 22 & -10 \\ -15 & 7 \end{bmatrix} = \begin{bmatrix} 5.5 & -2.5 \\ -3.75 & 1.75 \end{bmatrix} \]

Now, let’s compute the inverse of the original matrix raised to the power of 2:

\[(\mathbf{A}^{-1})^2 = \begin{bmatrix}-2 & 1 \\ 1.5 & -0.5 \end{bmatrix} \begin{bmatrix}-2 & 1 \\ 1.5 & -0.5 \end{bmatrix} = \begin{bmatrix} 4 + 1.5 & -2 - 0.5 \\ -3 - 0.75 & 1.5 + 0.25 \end{bmatrix} = \begin{bmatrix} 5.5 & -2.5 \\ -3.75 & 1.75 \end{bmatrix} \]

So as we can see, the inverse of the power of the matrix is equal to the power of the inverse of the matrix.
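The same check can be done in NumPy (using the matrix from the example; np.linalg.matrix_power computes the matrix power):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

k = 2
lhs = np.linalg.inv(np.linalg.matrix_power(A, k))   # (A^k)^{-1}
rhs = np.linalg.matrix_power(np.linalg.inv(A), k)   # (A^{-1})^k
print(np.allclose(lhs, rhs))                        # True
print(rhs)                                          # [[ 5.5  -2.5 ] [-3.75  1.75]]
```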

So once a matrix is invertible, it stays invertible for all powers \(k \geq 1\), and the inverse of the power is the power of the inverse. On the other hand, if a matrix is not invertible, then taking powers of it will never make it invertible: the rank will either stay the same or decrease.

Intuitively, you can think of this as the matrix being “stuck” in a state where it cannot be inverted, and taking powers of it will not change that state, only make it “more stuck” in a sense, as the rank will decrease or stay the same. Some matrices are so “stuck” that they eventually become the zero matrix and thus have rank zero; such a matrix is called a nilpotent matrix. We define a nilpotent matrix as a square matrix \(\mathbf{A}\) such that for some positive integer \(k\),

\[\mathbf{A}^k = \mathbf{O} \]

where \(\mathbf{O}\) is the zero matrix. We already know that a nilpotent matrix is not invertible, but let’s prove this formally.

Proof

Suppose \(\mathbf{A}\) is nilpotent, so \(\mathbf{A}^k = \mathbf{0}\) for some \(k \geq 1\). We claim \(\mathbf{A}\) is not invertible. We do this by contradiction. Let’s assume that \(\mathbf{A}\) is invertible. Then by the result above, \(\mathbf{A}^k\) is also invertible. However, we have:

\[\mathbf{A}^k = \mathbf{0} \]

So the zero matrix is invertible, i.e., there exists \(\mathbf{B}\) such that

\[\mathbf{0} \mathbf{B} = \mathbf{I} \]

But this is impossible, since \(\mathbf{0} \mathbf{B} = \mathbf{0}\) for any \(\mathbf{B}\), and \(\mathbf{0} \neq \mathbf{I}\). So we have a contradiction, which means our assumption that \(\mathbf{A}\) is invertible must be false and it is therefore not invertible.

Example

Let

\[\mathbf{A} = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \]

Then

\[\mathbf{A}^2 = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} 0 \cdot 0 + 1 \cdot 0 & 0 \cdot 1 + 1 \cdot 0 \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} \]

So \(\mathbf{A}\) is nilpotent (\(\mathbf{A}^2 = 0\)).

Warning

However, don’t conclude that every matrix that is not invertible is nilpotent; this is not the case. A matrix can be non-invertible without being nilpotent. The simplest example is probably:

\[\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \]

This matrix is not invertible as it is not full rank, but it is also not nilpotent, as it never becomes the zero matrix for any power \(k \geq 1\); it always stays the same:

\[\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \]
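A small NumPy sketch (not from the original notes) contrasting the nilpotent matrix from the example with this non-nilpotent, non-invertible matrix:

```python
import numpy as np

N = np.array([[0.0, 1.0],
              [0.0, 0.0]])   # nilpotent: N^2 = 0
P = np.array([[1.0, 0.0],
              [0.0, 0.0]])   # singular but not nilpotent: P^k = P for all k >= 1

print(np.linalg.matrix_rank(N), np.linalg.matrix_rank(N @ N))  # 1 0
print(np.linalg.matrix_rank(P), np.linalg.matrix_rank(P @ P))  # 1 1
print(np.allclose(N @ N, 0))   # True: the square is the zero matrix
print(np.allclose(P @ P, P))   # True: squaring changes nothing
```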

Left and Right Inverse for Non-Square Matrices

Todo

These are explained in the pseudo inverse section.

not two-sided like normal inverse, only one sided

if matrix is Tall then we can have a left inverse, if matrix is Wide then we can have a right inverse.

\[(T^TT)^{-1}T^TT=I \]

\(T^TT\) needs to be full rank, which is the case if \(T\) has full column rank. Show why this holds. What is the derivation of this formula?

The right inverse is similar for wide matrices:

\[WW^T(WW^T)^{-1} = I \]
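As a rough sketch of these formulas (to be expanded together with the pseudoinverse section), assuming a tall matrix with full column rank and a wide matrix with full row rank; the numbers are just placeholders:

```python
import numpy as np

# A tall matrix with full column rank
T = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 2.0]])

# Left inverse: (T^T T)^{-1} T^T, defined when T^T T is invertible
T_left = np.linalg.inv(T.T @ T) @ T.T
print(np.allclose(T_left @ T, np.eye(2)))   # True: it is a left inverse

# A wide matrix with full row rank
W = T.T
# Right inverse: W^T (W W^T)^{-1}
W_right = W.T @ np.linalg.inv(W @ W.T)
print(np.allclose(W @ W_right, np.eye(2)))  # True: it is a right inverse
```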

Inverse via Cofactor Matrix

MCA algorithm to compute inverse of a matrix?

Inverse via Gaussian Elimination

Inverse via Determinant

Inverse via Eigenvalues

Inverse via SVD
