
Matrices

You can think of a matrix as a rectangular arrangement of objects in rows and columns, like a chessboard. For computer scientists, a matrix can also be imagined as a 2D array. A matrix with \(m\) rows and \(n\) columns is called an \(m \times n\) matrix. The number of rows and columns therefore defines the size or dimensions of a matrix. The objects in a matrix are commonly referred to as elements or entries. A matrix containing only real-numbered elements is called a real matrix and can be defined as follows:

\[\mathbf{A} \in \mathbb{R}^{m \times n} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \]

The elements of a matrix are indexed by their row and then their column. So \(a_{ij}\), or sometimes also \((\mathbf{A})_{ij}\), is the element in the \(i\)th row and \(j\)th column. This means that a matrix can also be defined as \(\mathbf{A} = [a_{ij}]_{1 \leq i \leq m, 1 \leq j \leq n}\).

Example \[\begin{align*} \mathbf{A} = \begin{bmatrix}1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} \\ \mathbf{B} = \begin{bmatrix}1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix} \end{align*} \]
Warning

The number of rows and columns of a matrix are always written in the order of rows first and columns second! So a matrix with 2 rows and 3 columns is written as a \(2 \times 3\) matrix, not a \(3 \times 2\) matrix, so \(\mathbf{A} \in \mathbb{R}^{2 \times 3}\).

A matrix with only one row is called a row vector or \(n\)-tuple and a matrix with only one column is called a column vector. When talking about vectors we usually mean column vectors. A matrix that has only one row and one column can be thought of as a normal number, called a scalar, and therefore we omit the matrix brackets. To differentiate between the three types we usually use bold capital letters for matrices, bold lowercase letters for column vectors and normal lowercase letters for scalars.

Example

A column vector, i.e. our usual vector, can be written as follows:

\[\mathbf{v} \in \mathbb{R}^{n \times 1} = \begin{bmatrix}v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} \]

A row vector can be written as follows:

\[\mathbf{v} \in \mathbb{R}^{1 \times n} = \begin{bmatrix}v_1 & v_2 & \cdots & v_n \end{bmatrix} \]

A scalar can be written as follows:

\[\mathbf{v} \in \mathbb{R}^{1 \times 1} = \begin{bmatrix}v \end{bmatrix} = v \]

Outer Product

We can actually construct a matrix from two vectors. This is called the outer product of two vectors. The outer product is different from the dot product, which results in a scalar. Where the dot product multiplies a row vector with a column vector, the outer product multiplies a column vector with a row vector. The result of the outer product is a matrix where each element is the product of the corresponding elements in the two vectors.

So if we have two vectors \(\mathbf{x} \in \mathbb{R}^{m \times 1}\) and \(\mathbf{y} \in \mathbb{R}^{n \times 1}\), then we can multiply them together as follows to get a matrix \(\mathbf{A} \in \mathbb{R}^{m \times n}\). The outer product can be denoted as a matrix multiplication or with the symbol \(\otimes\).

\[\mathbf{A} = \mathbf{x}\mathbf{y}^T = \mathbf{x} \otimes \mathbf{y} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_m \end{bmatrix} \begin{bmatrix} y_1 & y_2 & \dots & y_n \end{bmatrix} = \begin{bmatrix} x_1y_1 & x_1y_2 & \dots & x_1y_n \\ x_2y_1 & x_2y_2 & \dots & x_2y_n \\ \vdots & \vdots & \ddots & \vdots \\ x_my_1 & x_my_2 & \dots & x_my_n \end{bmatrix} \]

Or more formally:

\[(\mathbf{A})_{ij} = (\mathbf{x}\mathbf{y}^T)_{ij} = (\mathbf{x} \otimes \mathbf{y})_{ij} = x_iy_j \]

From above we can see that the outer product of two vectors results in a matrix where the columns are the first vector scaled by the components of the second vector and the rows are the second vector scaled by the components of the first vector. So the matrix forms a dependent set of vectors, i.e. the columns/rows of the matrix are linearly dependent. Because the size of the largest set of linearly independent vectors is 1, the rank of the matrix is 1.

Example \[\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \begin{bmatrix} 4 & 5 & 6 \end{bmatrix} = \begin{bmatrix} 1 \cdot 4 & 1 \cdot 5 & 1 \cdot 6 \\ 2 \cdot 4 & 2 \cdot 5 & 2 \cdot 6 \\ 3 \cdot 4 & 3 \cdot 5 & 3 \cdot 6 \end{bmatrix} = \begin{bmatrix} 4 & 5 & 6 \\ 8 & 10 & 12 \\ 12 & 15 & 18 \end{bmatrix} \]
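To make the formula concrete in code, here is a minimal C++ sketch of the outer product; the function name outerProduct and the storage of the result as a vector of rows are just choices for this illustration:

#include <vector>

// Outer product of x (length m) and y (length n): returns an m x n matrix
// where entry (i, j) is x[i] * y[j].
std::vector<std::vector<double>> outerProduct(const std::vector<double>& x,
                                              const std::vector<double>& y) {
    std::vector<std::vector<double>> A(x.size(), std::vector<double>(y.size()));
    for (std::size_t i = 0; i < x.size(); i++) {
        for (std::size_t j = 0; j < y.size(); j++) {
            A[i][j] = x[i] * y[j];
        }
    }
    return A;
}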

Square Matrix

As you can imagine a square matrix is a matrix where the number of rows and columns are equal. The number of rows or columns \(n\) is referred to as the order of the square matrix. So a square matrix of order \(n\) is an \(n \times n\) matrix, i.e. \(\mathbf{A} \in \mathbb{R}^{n \times n}\). Square matrices have a number of useful properties and are therefore often used in many different applications.

\[\mathbf{A} \in \mathbb{R}^{n \times n} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix} \]
Example \[\mathbf{A} = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} \]

Null/Zero Matrix

If a matrix only contains elements that are zero it is called a zero or null matrix. A null matrix is denoted as \(\mathbf{O}\) as the dimensions are usually implied by the context. If you want to be explicit you can also write \(\mathbf{O}_{m \times n}\) where \(m\) is the number of rows and \(n\) is the number of columns.

\[\mathbf{O} = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} \]

If the matrix is a column vector we usually use the lower case letter \(\mathbf{o}\) to denote a null or zero vector.

\[\mathbf{o} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \]

Diagonal Matrix

A diagonal matrix is a square matrix where all the elements that are not on the main diagonal are zero. The main diagonal is the diagonal that goes from the top left to the bottom right of the matrix, in other words where the row and column index are equal, i.e. \(a_{ij}\) where \(i = j\).

So a diagonal matrix is where \(a_{ij} = 0\) for all \(i \neq j\) and \(\mathbf{A} \in \mathbb{R}^{n \times n}\). If a diagonal matrix is defined by the following elements \(d_{11}, d_{22}, \ldots, d_{nn}\) then the matrix can be written as:

\[\mathbf{D} = \text{diag}(d_{11}, d_{22}, \ldots, d_{nn}) = \begin{bmatrix} d_{11} & 0 & \cdots & 0 \\ 0 & d_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & d_{nn} \end{bmatrix} \]
Example \[\mathbf{D} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix} = \text{diag}(1, 2, 3) \]

The definition of a diagonal matrix can also be extended to non-square matrices. In this case the matrix is still a diagonal matrix if all the elements that are not on the main diagonal are zero.

Example

The following non-square matrices are still diagonal matrices:

\[\mathbf{D}_1 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 4 & 0 \\ 0 & 0 & -3 \\ 0 & 0 & 0 \end{bmatrix} \text{ and } \mathbf{D}_2 = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 4 & 0 & 0 & 0 \\ 0 & 0 & -3 & 0 & 0 \end{bmatrix} \]

Identity Matrix

An identity matrix, sometimes also called a unit matrix or the identity of order \(n\), is a square diagonal matrix where all the diagonal elements are equal to \(1\). The identity matrix is often denoted as \(\mathbf{I}\) or \(\mathbf{I}_n\), where \(n\) is the number of rows and columns in the matrix.

Example \[\mathbf{I_3} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \text{diag}(1, 1, 1) \]

Matrix Addition

Two matrices can be added together if they have the same dimensions, i.e. \(\mathbf{A} \in \mathbb{R}^{m \times n}\) and \(\mathbf{B} \in \mathbb{R}^{m \times n}\). The addition of two matrices is defined as the element-wise addition of the matrices:

\[\mathbf{A} + \mathbf{B} = \begin{bmatrix} a_{11} + b_{11} & a_{12} + b_{12} & \cdots & a_{1n} + b_{1n} \\ a_{21} + b_{21} & a_{22} + b_{22} & \cdots & a_{2n} + b_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} + b_{m1} & a_{m2} + b_{m2} & \cdots & a_{mn} + b_{mn} \end{bmatrix} \]

Or more formally:

\[\mathbf{A} + \mathbf{B} = (\mathbf{A} + \mathbf{B})_{ij} = (\mathbf{A})_{ij} + (\mathbf{B})_{ij} \]
Example \[\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} + \begin{bmatrix} 7 & 8 & 9 \\ 10 & 11 & 12 \end{bmatrix} = \begin{bmatrix} 8 & 10 & 12 \\ 14 & 16 & 18 \end{bmatrix} \]
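As a small illustration, element-wise addition translates directly into two nested loops. This is a minimal C++ sketch, assuming both matrices are stored as vectors of rows and have equal dimensions:

#include <vector>

// Element-wise sum of two m x n matrices stored as vectors of rows.
std::vector<std::vector<double>> add(const std::vector<std::vector<double>>& A,
                                     const std::vector<std::vector<double>>& B) {
    std::vector<std::vector<double>> C(A.size(), std::vector<double>(A[0].size()));
    for (std::size_t i = 0; i < A.size(); i++) {
        for (std::size_t j = 0; j < A[0].size(); j++) {
            C[i][j] = A[i][j] + B[i][j];
        }
    }
    return C;
}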

Additive Identity

As you can expect the addition of any matrix with the null matrix results in the original matrix:

\[\mathbf{A} + \mathbf{O} = \mathbf{A} \]

Additive Inverse

We can also expect to find a matrix \(\mathbf{B}\) that when added to \(\mathbf{A}\) results in the null matrix:

\[\mathbf{A} + \mathbf{B} = \mathbf{O} \]

The matrix \(\mathbf{B}\) is called the additive inverse of \(\mathbf{A}\) and is denoted as \(-\mathbf{A}\) where each element in the matrix is defined as the negative of the corresponding element in the matrix \(\mathbf{A}\):

\[(-\mathbf{A})_{ij} = -(\mathbf{A})_{ij} \]

By using the additive inverse we can also subtract two matrices. The subtraction of two matrices is defined as the addition of the first matrix and the additive inverse of the second matrix:

\[\mathbf{A} - \mathbf{B} = \mathbf{A} + (-\mathbf{B}) \]
Example \[\mathbf{A} - \mathbf{B} = \begin{bmatrix} 7 & 8 & 9 \\ 10 & 11 & 12 \end{bmatrix} - \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} = \begin{bmatrix} 7 - 1 & 8 - 2 & 9 - 3 \\ 10 - 4 & 11 - 5 & 12 - 6 \end{bmatrix} = \begin{bmatrix} 6 & 6 & 6 \\ 6 & 6 & 6 \end{bmatrix} \]

Scalar Multiplication

A matrix can be multiplied by a scalar, i.e. a constant. This is defined as the element-wise multiplication of the matrix with the scalar:

\[s \cdot \mathbf{A} = \begin{bmatrix} s \cdot a_{11} & s \cdot a_{12} & \cdots & s \cdot a_{1n} \\ s \cdot a_{21} & s \cdot a_{22} & \cdots & s \cdot a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ s \cdot a_{m1} & s \cdot a_{m2} & \cdots & s \cdot a_{mn} \end{bmatrix} \]

Or more formally:

\[s \cdot \mathbf{A} = (s \cdot \mathbf{A})_{ij} = s \cdot (\mathbf{A})_{ij} \]
Example \[5 \cdot \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} = \begin{bmatrix} 5 \cdot 1 & 5 \cdot 2 & 5 \cdot 3 \\ 5 \cdot 4 & 5 \cdot 5 & 5 \cdot 6 \end{bmatrix} = \begin{bmatrix} 5 & 10 & 15 \\ 20 & 25 & 30 \end{bmatrix} \]

Matrix Multiplication

Matrix multiplication is a bit more complex than addition and scalar multiplication. One might think that multiplying two matrices would be as simple as multiplying each element of the first matrix with the corresponding element in the second matrix, and would therefore, just like addition, be an element-wise operation only defined for matrices with the same dimensions. However, this is not the case; that operation is the Hadamard product of two matrices. The reason for the more complex definition is the relationship that matrix multiplication has with linear transformations, which we will come to later.

Matrix multiplication is only defined for matrices where the number of columns in the first matrix is equal to the number of rows in the second matrix. We will see why this is the case later. The result of multiplying two matrices together is a new matrix where the number of rows is equal to the number of rows in the first matrix and the number of columns is equal to the number of columns in the second matrix. So the dimensions in a matrix multiplication are defined as follows:

\[\mathbf{A} \in \mathbb{R}^{\color{red}{m} \times \color{blue}{n}} \text{ and } \mathbf{B} \in \mathbb{R}^{\color{blue}{n} \times \color{green}{p}} \Rightarrow \mathbf{C} = \mathbf{A} \cdot \mathbf{B} \in \mathbb{R}^{\color{red}{m} \times \color{green}{p}} \]
Dimensions of a matrix multiplication visualized.

The actual calculation of the elements in the resulting matrix is a bit more complex. The element in the \(i\)-th row and \(j\)-th column in the resulting matrix is defined as the sum of the products of the elements in the \(i\)-th row of the first matrix and the \(j\)-th column of the second matrix. So the element \(c_{ij}\) in the resulting matrix is calculated as follows:

\[c_{ij} = a_{i1} \cdot b_{1j} + a_{i2} \cdot b_{2j} + \cdots + a_{in} \cdot b_{nj} = \sum_{k=1}^n a_{ik} \cdot b_{kj} \]

or more formally:

\[\mathbf{C} = \mathbf{A} \cdot \mathbf{B} = (\mathbf{A} \cdot \mathbf{B})_{ij} = \sum_{k=1}^n (\mathbf{A})_{ik} \cdot (\mathbf{B})_{kj} = \sum_{k=1}^n a_{ik} \cdot b_{kj} \]
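The summation formula maps directly onto three nested loops. The following is a minimal C++ sketch of the naive algorithm, which performs \(m \cdot n \cdot p\) scalar multiplications; the row-wise storage is again an assumption of this illustration:

#include <vector>

// Naive matrix multiplication: A is m x n, B is n x p, result C is m x p,
// with c_ij = sum over k of a_ik * b_kj.
std::vector<std::vector<double>> multiply(const std::vector<std::vector<double>>& A,
                                          const std::vector<std::vector<double>>& B) {
    std::size_t m = A.size(), n = B.size(), p = B[0].size();
    std::vector<std::vector<double>> C(m, std::vector<double>(p, 0.0));
    for (std::size_t i = 0; i < m; i++) {
        for (std::size_t j = 0; j < p; j++) {
            for (std::size_t k = 0; k < n; k++) {
                C[i][j] += A[i][k] * B[k][j];
            }
        }
    }
    return C;
}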

There are three different ways to think about matrix multiplication, the element view, the row view and the column view. The first way would be to think of how a single element in the resulting matrix is calculated. A single element in row \(i\) and column \(j\) is calculated as the sum of the products of the elements in the \(i\)-th row of the first matrix and the \(j\)-th column of the second matrix. Later on you will see that this is the dot product of the \(i\)-th row and the \(j\)-th column as vectors.

Element-wise view of matrix multiplication.
Example \[\begin{align*} \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \begin{bmatrix} a & b \\ c & d \end{bmatrix} &= \begin{bmatrix} 1a + 2c & \quad\quad \\ \quad\quad & \quad\quad \\ \end{bmatrix} \\ &= \begin{bmatrix} 1a + 2c & 1b + 2d \\ 3a + 4c & 3b + 4d \end{bmatrix} \end{align*} \]

The second way to think about matrix multiplication is to see how a column in the resulting matrix is calculated. The first column in the resulting matrix corresponds to the linear combination of the columns in the left matrix where the weights are the elements in the first column of the right matrix. This pattern carries on for the other columns in the resulting matrix.

Column-wise view of matrix multiplication.
Example \[\begin{align*} \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \begin{bmatrix} a & b \\ c & d \end{bmatrix} &= \begin{bmatrix} a \begin{bmatrix} 1 \\ 3 \end{bmatrix} + c \begin{bmatrix} 2 \\ 4 \end{bmatrix} & \quad\quad\quad\quad\quad \\ \end{bmatrix} \\ &= \begin{bmatrix} a \begin{bmatrix} 1 \\ 3 \end{bmatrix} + c \begin{bmatrix} 2 \\ 4 \end{bmatrix} & b \begin{bmatrix} 1 \\ 3 \end{bmatrix} + d \begin{bmatrix} 2 \\ 4 \end{bmatrix} \end{bmatrix} = \begin{bmatrix} 1a + 2c & 1b + 2d \\ 3a + 4c & 3b + 4d \end{bmatrix} \end{align*} \]

The third way to think about matrix multiplication is to see how a row in the resulting matrix is calculated. The first row in the resulting matrix corresponds to the linear combination of the rows in the right matrix where the weights are the elements in the first row of the left matrix. This pattern carries on for the other rows in the resulting matrix.

Row-wise view of matrix multiplication.
Example \[\begin{align*} \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \begin{bmatrix} a & b \\ c & d \end{bmatrix} &= \begin{bmatrix} 1 \begin{bmatrix} a & b \end{bmatrix} + 2 \begin{bmatrix} c & d \end{bmatrix} \\ \quad\quad\quad\quad\quad \end{bmatrix} \\ &= \begin{bmatrix} 1 \begin{bmatrix} a & b \end{bmatrix} + 2 \begin{bmatrix} c & d \end{bmatrix} \\ 3 \begin{bmatrix} a & b \end{bmatrix} + 4 \begin{bmatrix} c & d \end{bmatrix} \end{bmatrix} = \begin{bmatrix} 1a + 2c & 1b + 2d \\ 3a + 4c & 3b + 4d \end{bmatrix} \end{align*} \]

A fourth view of matrix multiplication is that it is the sum of the outer products of the columns of the first matrix and the rows of the second matrix. So you can interpret each outer product as a layer of the resulting matrix. This in turn shows that any matrix can be written as a sum of rank 1 matrices.

Matrix multiplication as the sum of the outer products of the columns of the first matrix and the rows of the second matrix.
Example

The matrix multiplication of two matrices \(\mathbf{A}\) and \(\mathbf{B}\) can be written as the sum of the outer products of the columns of \(\mathbf{A}\) and the rows of \(\mathbf{B}\).

\[\begin{align*} \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \begin{bmatrix} a & b \\ c & d \end{bmatrix} &= \begin{bmatrix} 1 \\ 3 \end{bmatrix} \begin{bmatrix} a & b \end{bmatrix} + \begin{bmatrix} 2 \\ 4 \end{bmatrix} \begin{bmatrix} c & d \end{bmatrix} \\ &= \begin{bmatrix} 1 \cdot a & 1 \cdot b \\ 3 \cdot a & 3 \cdot b \end{bmatrix} + \begin{bmatrix} 2 \cdot c & 2 \cdot d \\ 4 \cdot c & 4 \cdot d \end{bmatrix} = \begin{bmatrix} a + 2c & b + 2d \\ 3a + 4c & 3b + 4d \end{bmatrix} \end{align*} \]

Commutative Property

Matrix multiplication is not commutative so:

\[\mathbf{A} \cdot \mathbf{B} \neq \mathbf{B} \cdot \mathbf{A} \]

This means that the order in which you multiply is important! This already becomes apparent when you look at the dimensions of the matrices. If you multiply a \(2 \times 3\) matrix with a \(3 \times 2\) matrix you get a \(2 \times 2\) matrix. However, if you multiply a \(3 \times 2\) matrix with a \(2 \times 3\) matrix you get a \(3 \times 3\) matrix.

Example \[\begin{align*} \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} \cdot \begin{bmatrix} 7 & 8 \\ 9 & 10 \\ 11 & 12 \end{bmatrix} &= \begin{bmatrix} 58 & 64 \\ 139 & 154 \end{bmatrix} \\ \begin{bmatrix} 7 & 8 \\ 9 & 10 \\ 11 & 12 \end{bmatrix} \cdot \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} &= \begin{bmatrix} 39 & 54 & 69 \\ 49 & 68 & 87 \\ 59 & 82 & 105 \end{bmatrix} \end{align*} \]

Even if the dimensions of the matrices are the same the result of the matrix multiplication can be different depending on the order of the matrices.

\[\begin{align*} \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \cdot \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix} &= \begin{bmatrix} 19 & 22 \\ 43 & 50 \end{bmatrix} \\ \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix} \cdot \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} &= \begin{bmatrix} 23 & 34 \\ 31 & 46 \end{bmatrix} \end{align*} \]

For the element \(c_{11}\) we can clearly see why the order of the matrices is important:

\[\begin{align*} c_{11} &= 1 \cdot 5 + 2 \cdot 7 = 19 \\ c_{11} &= 5 \cdot 1 + 6 \cdot 3 = 23 \\ 19 &\neq 23 \end{align*} \]

There are however some special pairs of matrices for which the order does not matter. Such matrices are said to commute with each other, so:

\[\mathbf{A} \cdot \mathbf{B} = \mathbf{B} \cdot \mathbf{A} \]
Example

For the three matrices \(\mathbf{A}\), \(\mathbf{B}\) and \(\mathbf{C}\):

\[\mathbf{A} = \begin{bmatrix} 2 & 6 \\ 1 & 7 \end{bmatrix} \quad \mathbf{B} = \begin{bmatrix} -3 & -1 \\ 2 & 1 \end{bmatrix} \quad \mathbf{C} = \begin{bmatrix} 15 & 6 \\ 1 & 20 \end{bmatrix} \]

We can calculate the following:

\[\begin{align*} \mathbf{AB} = \begin{bmatrix} 6 & 4 \\ 11 & 6 \end{bmatrix} \quad \mathbf{BA} = \begin{bmatrix} -7 & -25 \\ 5 & 19 \end{bmatrix} \\ \mathbf{AC} = \begin{bmatrix} 36 & 132 \\ 22 & 146 \end{bmatrix} \quad \mathbf{CA} = \begin{bmatrix} 36 & 132 \\ 22 & 146 \end{bmatrix} \\ \end{align*} \]

We can see that the matrices \(\mathbf{A}\) and \(\mathbf{C}\) commute with each other.

\[\mathbf{A} \cdot \mathbf{C} = \mathbf{C} \cdot \mathbf{A} \quad \text{and} \quad \mathbf{A} \cdot \mathbf{B} \neq \mathbf{B} \cdot \mathbf{A} \]

Associative Property

Matrix multiplication is associative so:

\[(\mathbf{A} \cdot \mathbf{B}) \cdot \mathbf{C} = \mathbf{A} \cdot (\mathbf{B} \cdot \mathbf{C}) \]

This means that the order in which you group the matrices in a multiplication does not matter. However, this does not mean that the order in which you multiply the matrices does not matter; as we have seen before, matrix multiplication is not commutative.

The associative property is very useful for linear transformations as it allows you to combine multiple transformations into a single transformation.

Proof

We have the following matrices:

\[\mathbf{A} \in \mathbb{R}^{m \times n} \quad \mathbf{B} \in \mathbb{R}^{n \times p} \quad \mathbf{C} \in \mathbb{R}^{p \times q} \]

Then we can see that the following two equations become the same:

\[\begin{align*} ((\mathbf{A} \mathbf{B}) \mathbf{C})_{ik} &= \sum_{l=1}^p (\mathbf{A} \mathbf{B})_{il} c_{lk} = \sum_{l=1}^p \sum_{j=1}^n a_{ij} b_{jl} c_{lk} \\ (\mathbf{A} (\mathbf{B} \mathbf{C}))_{ik} &= \sum_{j=1}^n a_{ij} (\mathbf{B} \mathbf{C})_{jk} = \sum_{j=1}^n \sum_{l=1}^p a_{ij} b_{jl} c_{lk} \end{align*} \]

Distributive Property

Matrix multiplication is left distributive and right distributive over addition, so:

\[\begin{align*} \mathbf{A} \cdot (\mathbf{B} + \mathbf{C}) = \mathbf{A} \cdot \mathbf{B} + \mathbf{A} \cdot \mathbf{C} \\ (\mathbf{A} + \mathbf{B}) \cdot \mathbf{C} = \mathbf{A} \cdot \mathbf{C} + \mathbf{B} \cdot \mathbf{C} \end{align*} \]
Proof

For the matrices \(\mathbf{A}\), \(\mathbf{B}\) and \(\mathbf{C}\):

\[\mathbf{A} \in \mathbb{R}^{m \times n} \quad \mathbf{B} \in \mathbb{R}^{m \times n} \quad \mathbf{C} \in \mathbb{R}^{n \times p} \]

Then we can see that the following two equations become the same:

\[\begin{align*} ((\mathbf{A} + \mathbf{B}) \cdot \mathbf{C})_{ij} &= \sum_{k=1}^n (\mathbf{A} + \mathbf{B})_{ik} c_{kj} \\ &= \sum_{k=1}^n (a_{ik} + b_{ik}) c_{kj} = \sum_{k=1}^n (a_{ik} c_{kj} + b_{ik} c_{kj}) = \sum_{k=1}^n a_{ik} c_{kj} + \sum_{k=1}^n b_{ik} c_{kj} \\ &= (\mathbf{A} \cdot \mathbf{C})_{ij} + (\mathbf{B} \cdot \mathbf{C})_{ij} \end{align*} \]

Multiplicative Identity

With regard to matrix multiplication the identity matrix is a special matrix. The identity matrix functions as the multiplicative identity, also called the neutral element, for matrix multiplication. Meaning that if you multiply any matrix with the identity matrix you get the same matrix back. The identity matrix also commutes with any matrix.

\[\mathbf{I} \cdot \mathbf{A} = \mathbf{A} \cdot \mathbf{I} = \mathbf{A} \]

The reason as to why the identity matrix functions as the multiplicative identity can quite easily be seen if you think of the first row in the identity matrix as selecting all the elements in the first row of the second matrix. Then the second row in the identity matrix selects all the elements in the second row of the second matrix and so on. So it can be thought of as the 1 in each row/column of the identity matrix selecting the corresponding row/column in the second matrix.

Example \[\mathbf{I} \cdot \mathbf{A} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \cdot \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} = \begin{bmatrix} (1 \cdot 1 + 0 \cdot 4 + 0 \cdot 7) & (1 \cdot 2 + 0 \cdot 5 + 0 \cdot 8) & (1 \cdot 3 + 0 \cdot 6 + 0 \cdot 9) \\ (0 \cdot 1 + 1 \cdot 4 + 0 \cdot 7) & (0 \cdot 2 + 1 \cdot 5 + 0 \cdot 8) & (0 \cdot 3 + 1 \cdot 6 + 0 \cdot 9) \\ (0 \cdot 1 + 0 \cdot 4 + 1 \cdot 7) & (0 \cdot 2 + 0 \cdot 5 + 1 \cdot 8) & (0 \cdot 3 + 0 \cdot 6 + 1 \cdot 9) \end{bmatrix} = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} \]

The same is true for the other way around but the identity matrix selects the columns of the first matrix instead of the rows:

\[\mathbf{A} \cdot \mathbf{I} = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} \cdot \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} (1 \cdot 1 + 2 \cdot 0 + 3 \cdot 0) & (1 \cdot 0 + 2 \cdot 1 + 3 \cdot 0) & (1 \cdot 0 + 2 \cdot 0 + 3 \cdot 1) \\ (4 \cdot 1 + 5 \cdot 0 + 6 \cdot 0) & (4 \cdot 0 + 5 \cdot 1 + 6 \cdot 0) & (4 \cdot 0 + 5 \cdot 0 + 6 \cdot 1) \\ (7 \cdot 1 + 8 \cdot 0 + 9 \cdot 0) & (7 \cdot 0 + 8 \cdot 1 + 9 \cdot 0) & (7 \cdot 0 + 8 \cdot 0 + 9 \cdot 1) \end{bmatrix} = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} \]

Multiplying Diagonal Matrices

We have seen that multiplying matrices can be quite complex, but diagonal matrices have a very simple multiplication rule. This simplicity arises because the only non-zero elements are on the main diagonal, which makes calculations straightforward. The most obvious case of this is when multiplying a matrix by the identity matrix (also a diagonal matrix). Here we saw that the identity matrix acts as if it selects the corresponding rows or columns of the other matrix. If we change one of the values on the diagonal of the identity matrix to a different value, we can see that it also selects the corresponding rows or columns of the other matrix but scales them by that value (for the identity matrix this value is always 1):

\[\begin{bmatrix} a & 0 & 0 \\ 0 & b & 0 \\ 0 & 0 & c \end{bmatrix} \cdot \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} = \begin{bmatrix} (a \cdot 1 + 0 \cdot 4 + 0 \cdot 7) & (a \cdot 2 + 0 \cdot 5 + 0 \cdot 8) & (a \cdot 3 + 0 \cdot 6 + 0 \cdot 9) \\ (0 \cdot 1 + b \cdot 4 + 0 \cdot 7) & (0 \cdot 2 + b \cdot 5 + 0 \cdot 8) & (0 \cdot 3 + b \cdot 6 + 0 \cdot 9) \\ (0 \cdot 1 + 0 \cdot 4 + c \cdot 7) & (0 \cdot 2 + 0 \cdot 5 + c \cdot 8) & (0 \cdot 3 + 0 \cdot 6 + c \cdot 9) \end{bmatrix} = \begin{bmatrix} a & 2a & 3a \\ 4b & 5b & 6b \\ 7c & 8c & 9c \end{bmatrix} \]

Or for the other way around:

\[\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} \cdot \begin{bmatrix} a & 0 & 0 \\ 0 & b & 0 \\ 0 & 0 & c \end{bmatrix} = \begin{bmatrix} (1 \cdot a + 2 \cdot 0 + 3 \cdot 0) & (1 \cdot 0 + 2 \cdot b + 3 \cdot 0) & (1 \cdot 0 + 2 \cdot 0 + 3 \cdot c) \\ (4 \cdot a + 5 \cdot 0 + 6 \cdot 0) & (4 \cdot 0 + 5 \cdot b + 6 \cdot 0) & (4 \cdot 0 + 5 \cdot 0 + 6 \cdot c) \\ (7 \cdot a + 8 \cdot 0 + 9 \cdot 0) & (7 \cdot 0 + 8 \cdot b + 9 \cdot 0) & (7 \cdot 0 + 8 \cdot 0 + 9 \cdot c) \end{bmatrix} = \begin{bmatrix} a & 2b & 3c \\ 4a & 5b & 6c \\ 7a & 8b & 9c \end{bmatrix} \]

We can also write this in a more general and formal way. Let \(\mathbf{D} = \operatorname{diag}(d_{11}, \ldots, d_{nn})\) be a diagonal matrix of size \(n \times n\), and let \(\mathbf{A} \in \mathbb{R}^{n \times m}\) be any \(n \times m\) matrix. Then the product \(\mathbf{D}\mathbf{A}\) is:

\[(\mathbf{D}\mathbf{A})_{ij} = \sum_{k=1}^n d_{ik} a_{kj} \]

But since \(d_{ik} = 0\) for \(i \neq k\), only \(k = i\) contributes:

\[(\mathbf{D}\mathbf{A})_{ij} = d_{ii} a_{ij} \]

So, multiplying from the left scales the \(i\)-th row of \(\mathbf{A}\) by \(d_{ii}\). And then the product for the other side, \(\mathbf{A}\mathbf{D}\), is:

\[(\mathbf{A}\mathbf{D})_{ij} = \sum_{k=1}^n a_{ik} d_{kj} \]

But again, \(d_{kj} = 0\) for \(k \neq j\), so only \(k = j\) contributes:

\[(\mathbf{A}\mathbf{D})_{ij} = a_{ij} d_{jj} \]

So, multiplying from the right scales the \(j\)-th column of \(\mathbf{A}\) by \(d_{jj}\).
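Computationally this means a diagonal matrix never has to be stored or multiplied as a full matrix. A minimal C++ sketch, under the assumption that the diagonal matrix is kept as a plain vector d of its diagonal entries and the other matrix as a vector of rows:

#include <vector>

// Left multiplication D * A: scale the i-th row of A by d[i].
void scaleRows(std::vector<std::vector<double>>& A, const std::vector<double>& d) {
    for (std::size_t i = 0; i < A.size(); i++)
        for (std::size_t j = 0; j < A[i].size(); j++)
            A[i][j] *= d[i];
}

// Right multiplication A * D: scale the j-th column of A by d[j].
void scaleCols(std::vector<std::vector<double>>& A, const std::vector<double>& d) {
    for (std::size_t i = 0; i < A.size(); i++)
        for (std::size_t j = 0; j < A[i].size(); j++)
            A[i][j] *= d[j];
}

Both loops only touch each element once, so the cost is proportional to the number of entries of the matrix rather than the cost of a full matrix multiplication.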

Example

Let

\[\mathbf{D} = \operatorname{diag}(2, 0, 3) = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 3 \end{bmatrix} \quad \mathbf{A} = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix} \]

Then

\[\mathbf{D}\mathbf{A} = \begin{bmatrix} 2 \cdot 1 & 2 \cdot 2 \\ 0 \cdot 3 & 0 \cdot 4 \\ 3 \cdot 5 & 3 \cdot 6 \end{bmatrix} = \begin{bmatrix} 2 & 4 \\ 0 & 0 \\ 15 & 18 \end{bmatrix} \]

Or an example where we multiply from the right:

\[\mathbf{A} = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} \quad \mathbf{D} = \operatorname{diag}(10, 0, -1) = \begin{bmatrix} 10 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & -1 \end{bmatrix} \]

Then

\[\mathbf{A}\mathbf{D} = \begin{bmatrix} 1 \cdot 10 & 2 \cdot 0 & 3 \cdot (-1) \\ 4 \cdot 10 & 5 \cdot 0 & 6 \cdot (-1) \end{bmatrix} = \begin{bmatrix} 10 & 0 & -3 \\ 40 & 0 & -6 \end{bmatrix} \]

We can also look at the case where both matrices are diagonal. In this case, the multiplication is even simpler because the only non-zero elements are on the diagonal. The product of two diagonal matrices is another diagonal matrix where each diagonal element is the product of the corresponding diagonal elements of the two matrices. This again can be quickly seen when looking at an example:

\[\begin{bmatrix} d_1 & 0 & 0 \\ 0 & d_2 & 0 \\ 0 & 0 & d_3 \end{bmatrix} \cdot \begin{bmatrix} e_1 & 0 & 0 \\ 0 & e_2 & 0 \\ 0 & 0 & e_3 \end{bmatrix} = \begin{bmatrix} (d_1 \cdot e_1 + 0 \cdot 0 + 0 \cdot 0) & (d_1 \cdot 0 + 0 \cdot e_2 + 0 \cdot 0) & (d_1 \cdot 0 + 0 \cdot 0 + 0 \cdot e_3) \\ (0 \cdot e_1 + d_2 \cdot 0 + 0 \cdot 0) & (0 \cdot 0 + d_2 \cdot e_2 + 0 \cdot 0) & (0 \cdot 0 + d_2 \cdot 0 + 0 \cdot e_3) \\ (0 \cdot e_1 + 0 \cdot 0 + d_3 \cdot 0) & (0 \cdot 0 + 0 \cdot e_2 + d_3 \cdot 0) & (0 \cdot 0 + 0 \cdot 0 + d_3 \cdot e_3) \end{bmatrix} = \begin{bmatrix} d_1 e_1 & 0 & 0 \\ 0 & d_2 e_2 & 0 \\ 0 & 0 & d_3 e_3 \end{bmatrix} \]

This is then equivalent to the element-wise product (Hadamard product) of the diagonal elements of the two matrices. Formally, if \(\mathbf{D} = \operatorname{diag}(d_1, \ldots, d_n)\) and \(\mathbf{E} = \operatorname{diag}(e_1, \ldots, e_n)\), then:

\[(\mathbf{D}\mathbf{E})_{ij} = \sum_{k=1}^n d_{ik} e_{kj} \]

But \(d_{ik} = 0\) unless \(i = k\), and \(e_{kj} = 0\) unless \(k = j\), so the only nonzero term is when \(i = k = j\):

\[(\mathbf{D}\mathbf{E})_{ij} = d_{ii} e_{ii} \text{ if } i = j, \text{ else } 0 \]

Thus,

\[\mathbf{D}\mathbf{E} = \operatorname{diag}(d_1 e_1, \ldots, d_n e_n) \]
Example

Let

\[\mathbf{D} = \operatorname{diag}(2, 3, 4),\quad \mathbf{E} = \operatorname{diag}(7, 8, 9) \]

Then

\[\mathbf{D}\mathbf{E} = \operatorname{diag}(2 \cdot 7,\, 3 \cdot 8,\, 4 \cdot 9) = \operatorname{diag}(14,\, 24,\, 36) \]

Mixing Vectors and Matrices

Because a vector is a special case of a matrix, we can also multiply vectors with matrices. A vector can be thought of as a matrix with either one row or one column. This means that the multiplication between a vector and a matrix is defined in the same way as the multiplication between two matrices so the dimensions of the two must be compatible. If we have a matrix \(\mathbf{A} \in \mathbb{R}^{m \times n}\) and a column vector \(\mathbf{x} \in \mathbb{R}^{n \times 1}\), then we can multiply them together as follows:

\[\begin{align*} \mathbf{A}\mathbf{x} &= \begin{bmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \dots & a_{mn} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \\ &= \begin{bmatrix} a_{11}x_1 + a_{12}x_2 + \dots + a_{1n}x_n \\ a_{21}x_1 + a_{22}x_2 + \dots + a_{2n}x_n \\ \vdots \\ a_{m1}x_1 + a_{m2}x_2 + \dots + a_{mn}x_n \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix} \end{align*} \]

As we can see from the definition above, the result is another column vector with \(m\) components. We can also see that the resulting vector is a linear combination of the columns of the matrix \(\mathbf{A}\) where the weights are the components of the vector \(\mathbf{x}\). This is why when the matrix is square the multiplication can often be called a linear transformation, because it transforms the vector into another vector (under certain conditions). If the matrix is not square, then the multiplication can still be a linear transformation but the resulting vector will have a different number of components (dimension) than the original vector. This is why it is sometimes also called a projection rather than a transformation as it projects the vector onto a lower or higher dimensional space (specifically the column space of the matrix).

If the vector is on the left side of the matrix, then the vector needs to be transposed for the multiplication to work, i.e. the dimensions of the vector must be compatible with the matrix. So if we have a row vector \(\mathbf{x}^T \in \mathbb{R}^{1 \times n}\) and a matrix \(\mathbf{A} \in \mathbb{R}^{n \times m}\), then we can multiply them together as follows:

\[\begin{align*} \mathbf{x}^T\mathbf{A} &= \begin{bmatrix} x_1 & x_2 & \dots & x_n \end{bmatrix} \begin{bmatrix} a_{11} & a_{12} & \dots & a_{1m} \\ a_{21} & a_{22} & \dots & a_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nm} \end{bmatrix} \\ &= \begin{bmatrix} x_1a_{11} + x_2a_{21} + \dots + x_na_{n1} & \dots & x_1a_{1m} + x_2a_{2m} + \dots + x_na_{nm} \end{bmatrix} = \begin{bmatrix} y_1 & y_2 & \dots & y_m \end{bmatrix} \end{align*} \]

Just like when right multiplying a vector by a matrix, the result is a row vector with \(m\) components. We can also see that the resulting vector is a linear combination of the rows of the matrix \(\mathbf{A}\) where the weights are the components of the vector \(\mathbf{x}\). So the multiplication is also a linear transformation but this time it transforms the vector into another vector by projecting it onto the row space of the matrix.

Example

A right multiplication of a matrix and a vector:

\[\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \begin{bmatrix} 5 \\ 6 \end{bmatrix} = \begin{bmatrix} 1 \cdot 5 + 2 \cdot 6 \\ 3 \cdot 5 + 4 \cdot 6 \end{bmatrix} = \begin{bmatrix} 17 \\ 39 \end{bmatrix} \]

A left multiplication of a matrix and a vector:

\[\begin{bmatrix} 5 & 6 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} = \begin{bmatrix} 5 \cdot 1 + 6 \cdot 3 & 5 \cdot 2 + 6 \cdot 4 \end{bmatrix} = \begin{bmatrix} 23 & 34 \end{bmatrix} \]
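As a small C++ sketch, the matrix-vector product is just the general multiplication with the inner dimension collapsed to a single column (row-wise storage is again an assumption of this illustration):

#include <vector>

// y = A * x, where A is m x n and x has n components; y has m components,
// with y_i = sum over j of a_ij * x_j.
std::vector<double> multiplyVector(const std::vector<std::vector<double>>& A,
                                   const std::vector<double>& x) {
    std::vector<double> y(A.size(), 0.0);
    for (std::size_t i = 0; i < A.size(); i++)
        for (std::size_t j = 0; j < x.size(); j++)
            y[i] += A[i][j] * x[j];
    return y;
}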

Matrix Exponentiation by Squaring

Using exponentiation by squaring you can quickly calculate the power of any number. The idea is to make use of the fact that multiplication is associative so:

\[a^{10} = a^{5} \cdot a^{5} = (a^{2} \cdot a^{2} \cdot a) \cdot (a^{2} \cdot a^{2} \cdot a) \]

So rather than performing 9 multiplications, if we memorize the intermediate results we only need to perform the following multiplications:

\[\begin{align*} a^{2} &= a \cdot a \\ a^{4} &= a^{2} \cdot a^{2} \\ a^{5} &= a^{4} \cdot a \\ a^{10} &= a^{5} \cdot a^{5} \end{align*} \]

Which is only 4 multiplications rather than 9. This can be generalized to any power \(k\):

\[a^k = \begin{cases} 1 & \text{if } k = 0 \\ a & \text{if } k = 1 \\ (a^{\frac{k}{2}}) \cdot (a^{\frac{k}{2}}) & \text{if } k \text{ is even} \\ a \cdot (a^{\frac{k-1}{2}}) \cdot (a^{\frac{k-1}{2}}) & \text{if } k \text{ is odd} \end{cases} \]

This can also be applied to matrices. The matrix exponentiation by squaring is defined in the same way as for numbers:

\[\mathbf{A}^k = \begin{cases} \mathbf{I} & \text{if } k = 0 \\ \mathbf{A} & \text{if } k = 1 \\ (\mathbf{A}^{\frac{k}{2}}) \cdot (\mathbf{A}^{\frac{k}{2}}) & \text{if } k \text{ is even} \\ \mathbf{A} \cdot (\mathbf{A}^{\frac{k-1}{2}}) \cdot (\mathbf{A}^{\frac{k-1}{2}}) & \text{if } k \text{ is odd} \end{cases} \]

This results in a time complexity of \(O(\log k)\) for calculating the power of a matrix. This is much faster than the naive approach which would take \(O(k)\) time.
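A minimal recursive C++ sketch of matrix exponentiation by squaring; it assumes a multiply function like the naive one sketched earlier and a square matrix of order n:

#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Assumed to perform ordinary matrix multiplication (see the earlier sketch).
Matrix multiply(const Matrix& A, const Matrix& B);

// A^k via exponentiation by squaring: O(log k) matrix multiplications.
Matrix matrixPower(const Matrix& A, unsigned long long k) {
    std::size_t n = A.size();
    if (k == 0) {                          // A^0 = I
        Matrix I(n, std::vector<double>(n, 0.0));
        for (std::size_t i = 0; i < n; i++) I[i][i] = 1.0;
        return I;
    }
    if (k == 1) return A;
    Matrix half = matrixPower(A, k / 2);   // A^(k/2), rounded down
    Matrix result = multiply(half, half);
    if (k % 2 == 1) result = multiply(result, A);  // odd k: one extra factor of A
    return result;
}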

Todo

We can calculate the fibonacci numbers using matrix exponentiation by squaring. The Fibonacci numbers can be represented as a matrix multiplication:

\[\begin{bmatrix} F_{n-1} \\ F_n \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} F_{n-2} \\ F_{n-1} \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix}^{n-1} \begin{bmatrix} F_0 \\ F_1 \end{bmatrix} \]

for \(n \geq 2\).
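As a hedged usage sketch of the matrixPower function from above, the \(n\)-th Fibonacci number can be read off the powered matrix, since \(\begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix}^n = \begin{bmatrix} F_{n-1} & F_n \\ F_n & F_{n+1} \end{bmatrix}\); using doubles here only keeps the result exact for small \(n\):

// F(n) from the identity [[0,1],[1,1]]^n = [[F(n-1), F(n)], [F(n), F(n+1)]].
double fibonacci(unsigned long long n) {
    Matrix M = {{0.0, 1.0}, {1.0, 1.0}};
    Matrix P = matrixPower(M, n);   // O(log n) multiplications of 2x2 matrices
    return P[0][1];                 // the (1,2) entry of M^n is F(n)
}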

Matrix Chain Multiplication

Todo

This is not yet complete and is just a rough scribbling

When you want to multiply multiple matrices the number of multiplications can quickly add up. However, you can use the associative property to group the matrices in a way that you can minimize the number of multiplications. So given some matrices \(\mathbf{A}_1, \mathbf{A}_2, \ldots, \mathbf{A}_n\) where the matrix \(\mathbf{A}_i\) has dimensions \(\mathbb{R}^{m_{i-1} \times m_i}\) we want to find the product of all the matrices with the least amount of multiplications.

Show example with 3 matrices of dimensions \(k \times 1\), \(1 \times k\) and \(k \times 1\). Results in either \(O(k)\) or \(O(k^2)\) multiplications.

We can write a dynamic programming algorithm to find the optimal way to parenthesize the product. The algorithm is called the matrix chain multiplication algorithm and runs in \(O(n^3)\) time, where \(n\) here is the number of matrices in the chain, not their dimensions.
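A hedged C++ sketch of the standard dynamic programming recurrence; here dims is assumed to hold the chain dimensions so that matrix \(\mathbf{A}_i\) has size dims[i-1] x dims[i], and cost[i][j] is the minimum number of scalar multiplications for the sub-chain \(\mathbf{A}_i \cdots \mathbf{A}_j\):

#include <vector>
#include <climits>
#include <algorithm>

// dims has length n+1; matrix i (1-based) has dimensions dims[i-1] x dims[i].
// Returns the minimum number of scalar multiplications to compute A_1 * ... * A_n.
long long matrixChainCost(const std::vector<long long>& dims) {
    std::size_t n = dims.size() - 1;
    std::vector<std::vector<long long>> cost(n + 1, std::vector<long long>(n + 1, 0));
    for (std::size_t len = 2; len <= n; len++) {           // length of the sub-chain
        for (std::size_t i = 1; i + len - 1 <= n; i++) {
            std::size_t j = i + len - 1;
            cost[i][j] = LLONG_MAX;
            for (std::size_t k = i; k < j; k++) {           // split point A_i..A_k | A_{k+1}..A_j
                long long c = cost[i][k] + cost[k + 1][j]
                            + dims[i - 1] * dims[k] * dims[j];
                cost[i][j] = std::min(cost[i][j], c);
            }
        }
    }
    return cost[1][n];
}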

Strassen’s Algorithm

Todo

This needs to be written. We use \(O(n^{\log_2 7}) \approx O(n^{2.81})\) time to multiply two matrices using Strassen’s algorithm. This is faster than the naive algorithm which uses \(O(n^3)\) time. The algorithm works by reducing the number of arithmetic operations needed to multiply two matrices by breaking them down into smaller subproblems. It does this by recursively dividing the matrices into smaller matrices and combining the results in a clever way.

The reduction in the number of arithmetic operations however comes at the price of a somewhat reduced numerical stability, and the algorithm also requires significantly more memory compared to the naive algorithm. Both initial matrices must have their dimensions expanded to the next power of 2, which results in storing up to four times as many elements, and the seven auxiliary matrices each contain a quarter of the elements in the expanded ones.

Winograd’s Algorithm

Todo

This needs to be written. Further improves this to \(O(n^{2.37})\) time?

Triangular Matrices

A triangular matrix is a square matrix where all the elements above or below the main diagonal are \(0\). If all the elements above the diagonal are \(0\) it is called a lower triangular matrix because it only has elements below the diagonal that are non-zero:

\[(\mathbf{L})_{ij} = 0 \text{ for all } i < j \]

If all the elements below the diagonal are \(0\) it is called an upper triangular matrix because it only has elements above the diagonal that are non-zero:

\[(\mathbf{U})_{ij} = 0 \text{ for all } i > j \]

Importantly, note that the main diagonal can contain zeroes and the matrix is still triangular. The main diagonal is not required to be non-zero.

Example

A lower triangular matrix \(\mathbf{L}\):

\[\mathbf{L} = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 3 & 0 \\ 4 & 5 & 6 \end{bmatrix} \]

An upper triangular matrix \(\mathbf{U}\):

\[\mathbf{U} = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 0 & 5 \\ 0 & 0 & 6 \end{bmatrix} \]

It is rather obvious that adding two triangular matrices of the same type results in a triangular matrix of the same type. So adding two lower triangular matrices results in a lower triangular matrix and adding two upper triangular matrices results in an upper triangular matrix. However, what about multiplying triangular matrices? Let's start with the multiplication of two triangular matrices of the same type, in this case lower triangular matrices, and look at it visually first:

\[\begin{align*} \begin{bmatrix} 1 & 0 & 0 \\ 2 & 3 & 0 \\ 4 & 5 & 6 \end{bmatrix} \cdot \begin{bmatrix} a & 0 & 0 \\ b & c & 0 \\ d & e & f \end{bmatrix} &= \begin{bmatrix} (1 \cdot a + 0 \cdot b + 0 \cdot d) & (1 \cdot 0 + 0 \cdot c + 0 \cdot e) & (1 \cdot 0 + 0 \cdot 0 + 0 \cdot f) \\ (2 \cdot a + 3 \cdot b + 0 \cdot d) & (2 \cdot 0 + 3 \cdot c + 0 \cdot e) & (2 \cdot 0 + 3 \cdot 0 + 0 \cdot f) \\ (4 \cdot a + 5 \cdot b + 6 \cdot d) & (4 \cdot 0 + 5 \cdot c + 6 \cdot e) & (4 \cdot 0 + 5 \cdot 0 + 6 \cdot f) \end{bmatrix} \\ &= \begin{bmatrix} a & 0 & 0 \\ (2a + 3b) & (3c) & 0 \\ (4a + 5b + 6d) & (5c + 6e) & (6f) \end{bmatrix} \end{align*} \]

So we can see that the product of two lower triangular matrices is also a lower triangular matrix. We can also look at this more formally. Let \(\mathbf{L}_1\) and \(\mathbf{L}_2\) be \(n \times n\) lower triangular matrices:

\[(\mathbf{L}_1)_{ij} = 0 \text{ for } i < j \qquad (\mathbf{L}_2)_{ij} = 0 \text{ for } i < j \]

The product \(\mathbf{L}_1\mathbf{L}_2\) is then given by:

\[(\mathbf{L}_1\mathbf{L}_2)_{ij} = \sum_{k=1}^n (\mathbf{L}_1)_{ik} (\mathbf{L}_2)_{kj} \]

The question is then which entries \((\mathbf{L}_1\mathbf{L}_2)_{ij}\) can be nonzero? If \(i < j\), then for all \(k\):

  • For \(k \leq i\), we have \(k \leq i < j\), so \((\mathbf{L}_2)_{kj} = 0\) because \(\mathbf{L}_2\) is lower triangular.
  • For \(k > i\), \((\mathbf{L}_1)_{ik} = 0\) because \(\mathbf{L}_1\) is lower triangular.
  • So every term in the sum vanishes.

So putting all this together, we see that \((\mathbf{L}_1\mathbf{L}_2)_{ij} = 0\) for \(i < j\) which is again the definition of a lower triangular matrix, so the product is lower triangular.

Example

Let

\[\mathbf{L}_1 = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 3 & 0 \\ 4 & 5 & 6 \end{bmatrix},\quad \mathbf{L}_2 = \begin{bmatrix} 7 & 0 & 0 \\ 8 & 9 & 0 \\ 10 & 11 & 12 \end{bmatrix} \]

Then

\[\begin{align*} \mathbf{L}_1\mathbf{L}_2 &= \begin{bmatrix} (1\cdot 7 + 0\cdot 8 + 0\cdot 10) & (1\cdot 0 + 0\cdot 9 + 0\cdot 11) & (1\cdot 0 + 0\cdot 0 + 0\cdot 12) \\ (2\cdot 7 + 3\cdot 8 + 0\cdot 10) & (2\cdot 0 + 3\cdot 9 + 0\cdot 11) & (2\cdot 0 + 3\cdot 0 + 0\cdot 12) \\ (4\cdot 7 + 5\cdot 8 + 6\cdot 10) & (4\cdot 0 + 5\cdot 9 + 6\cdot 11) & (4\cdot 0 + 5\cdot 0 + 6\cdot 12) \end{bmatrix} \\ &= \begin{bmatrix} 7 & 0 & 0 \\ 14 + 24 & 27 & 0 \\ 28 + 40 + 60 & 45 + 66 & 72 \end{bmatrix} \\ &= \begin{bmatrix} 7 & 0 & 0 \\ 38 & 27 & 0 \\ 128 & 111 & 72 \end{bmatrix} \end{align*} \]
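Since \((\mathbf{L}_1)_{ik}\) vanishes for \(k > i\) and \((\mathbf{L}_2)_{kj}\) vanishes for \(k < j\), only the indices \(j \leq k \leq i\) contribute, which a program can exploit to skip roughly half the work. A minimal C++ sketch:

#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Product of two n x n lower triangular matrices; only k in [j, i] can
// contribute, so entries with i < j stay zero and the result is lower triangular.
Matrix multiplyLowerTriangular(const Matrix& L1, const Matrix& L2) {
    std::size_t n = L1.size();
    Matrix C(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; i++)
        for (std::size_t j = 0; j <= i; j++)
            for (std::size_t k = j; k <= i; k++)
                C[i][j] += L1[i][k] * L2[k][j];
    return C;
}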

We can also look at the case of multiplying two upper triangular matrices. Again, we can see this visually:

\[\begin{align*} \begin{bmatrix} 1 & 2 & 3 \\ 0 & 4 & 5 \\ 0 & 0 & 6 \end{bmatrix} \cdot \begin{bmatrix} a & b & c \\ 0 & d & e \\ 0 & 0 & f \end{bmatrix} &= \begin{bmatrix} (1 \cdot a + 2 \cdot 0 + 3 \cdot 0) & (1 \cdot b + 2 \cdot d + 3 \cdot 0) & (1 \cdot c + 2 \cdot e + 3 \cdot f) \\ (0 \cdot a + 4 \cdot 0 + 5 \cdot 0) & (0 \cdot b + 4 \cdot d + 5 \cdot 0) & (0 \cdot c + 4 \cdot e + 5 \cdot f) \\ (0 \cdot a + 0 \cdot 0 + 6 \cdot 0) & (0 \cdot b + 0 \cdot d + 6 \cdot 0) & (0 \cdot c + 0 \cdot e + 6 \cdot f) \end{bmatrix} \\ &=\begin{bmatrix} a & (b + 2d) & (c + 2e + 3f) \\ 0 & 4d & (5e + 6f) \\ 0 & 0 & 6f \end{bmatrix} \end{align*} \]

Or formally, let \(\mathbf{U}_1\) and \(\mathbf{U}_2\) be \(n \times n\) upper triangular matrices:

\[(\mathbf{U}_1)_{ij} = 0 \text{ for } i > j \qquad (\mathbf{U}_2)_{ij} = 0 \text{ for } i > j \]

And then in the product \(\mathbf{U}_1\mathbf{U}_2\), if \(i > j\), then for any \(k\):

  • \((\mathbf{U}_1)_{ik} = 0\) unless \(i \leq k\).
  • \((\mathbf{U}_2)_{kj} = 0\) unless \(k \leq j\).
  • Thus a nonzero term requires \(i \leq k \leq j\); but if \(i > j\), there is no \(k\) satisfying this.

So, \((\mathbf{U}_1\mathbf{U}_2)_{ij} = 0\) for \(i > j\) which again by the definition of upper triangular matrices means the product is upper triangular.

Example

Let

\[\mathbf{U}_1 = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 4 & 5 \\ 0 & 0 & 6 \end{bmatrix},\quad \mathbf{U}_2 = \begin{bmatrix} 7 & 8 & 9 \\ 0 & 10 & 11 \\ 0 & 0 & 12 \end{bmatrix} \]

Then

\[\begin{align*} \mathbf{U}_1\mathbf{U}_2 &= \begin{bmatrix} (1\cdot 7 + 2\cdot 0 + 3\cdot 0) & (1\cdot 8 + 2\cdot 10 + 3\cdot 0) & (1\cdot 9 + 2\cdot 11 + 3\cdot 12) \\ (0\cdot 7 + 4\cdot 0 + 5\cdot 0) & (0\cdot 8 + 4\cdot 10 + 5\cdot 0) & (0\cdot 9 + 4\cdot 11 + 5\cdot 12) \\ (0\cdot 7 + 0\cdot 0 + 6\cdot 0) & (0\cdot 8 + 0\cdot 10 + 6\cdot 0) & (0\cdot 9 + 0\cdot 11 + 6\cdot 12) \end{bmatrix} \\ &= \begin{bmatrix} 7 & 8 + 20 & 9 + 22 + 36 \\ 0 & 40 & 44 + 60 \\ 0 & 0 & 72 \end{bmatrix} \\ &= \begin{bmatrix} 7 & 28 & 67 \\ 0 & 40 & 104 \\ 0 & 0 & 72 \end{bmatrix} \end{align*} \]

Now what about multiplying a lower triangular matrix with an upper triangular matrix, or vice versa? Unfortunately, the product of a lower triangular matrix and an upper triangular matrix is not triangular in general; it just results in a general matrix. The exception is when one of them is diagonal: remember that multiplying by a diagonal matrix only scales the rows or columns of the other matrix, so the product stays the same type of triangular matrix.

Example

Let

\[\mathbf{U} = \begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix},\quad \mathbf{L} = \begin{bmatrix} 4 & 0 \\ 5 & 6 \end{bmatrix} \]

Then

\[\begin{align*} \mathbf{U}\mathbf{L} &= \begin{bmatrix} (1\cdot 4 + 2\cdot 5) & (1\cdot 0 + 2\cdot 6) \\ (0\cdot 4 + 3\cdot 5) & (0\cdot 0 + 3\cdot 6) \end{bmatrix} \\ &= \begin{bmatrix} 4 + 10 & 0 + 12 \\ 0 + 15 & 0 + 18 \end{bmatrix} \\ &= \begin{bmatrix} 14 & 12 \\ 15 & 18 \end{bmatrix} \end{align*} \]

Lastly we can also look at the case of multiplying a triangular matrix with a general matrix. This is where the triangular matrix “mixes” the rows or columns of the general matrix, depending on whether we multiply from the left or right. To see this let’s first look at the case of multiplying a lower triangular matrix with a general matrix from the left:

\[\begin{align*} \begin{bmatrix} a & 0 & 0 \\ b & c & 0 \\ d & e & f \end{bmatrix} \cdot \begin{bmatrix} x_1 & y_1 & z_1 \\ x_2 & y_2 & z_2 \\ x_3 & y_3 & z_3 \end{bmatrix} &= \begin{bmatrix} (a \cdot x_1 + 0 \cdot x_2 + 0 \cdot x_3) & (a \cdot y_1 + 0 \cdot y_2 + 0 \cdot y_3) & (a \cdot z_1 + 0 \cdot z_2 + 0 \cdot z_3) \\ (b \cdot x_1 + c \cdot x_2 + 0 \cdot x_3) & (b \cdot y_1 + c \cdot y_2 + 0 \cdot y_3) & (b \cdot z_1 + c \cdot z_2 + 0 \cdot z_3) \\ (d \cdot x_1 + e \cdot x_2 + f \cdot x_3) & (d \cdot y_1 + e \cdot y_2 + f \cdot y_3) & (d \cdot z_1 + e \cdot z_2 + f \cdot z_3) \end{bmatrix} \\ &= \begin{bmatrix} a x_1 & a y_1 & a z_1 \\ (b x_1 + c x_2) & (b y_1 + c y_2) & (b z_1 + c z_2) \\ (d x_1 + e x_2 + f x_3) & (d y_1 + e y_2 + f y_3) & (d z_1 + e z_2 + f z_3) \end{bmatrix} \end{align*} \]

We can notice that each row of the resulting matrix is a linear combination of that row and the rows above it in the original matrix, where the coefficients come from the corresponding row of the lower triangular matrix. Formally, if \(\mathbf{L}\) is lower triangular, then \((\mathbf{L}\mathbf{A})_{ij}\) depends only on rows \(1, 2, \ldots, i\) of \(\mathbf{A}\).

If we multiply a lower triangular matrix from the right, we can see the following:

\[\begin{align*} \begin{bmatrix} x_1 & y_1 & z_1 \\ x_2 & y_2 & z_2 \\ x_3 & y_3 & z_3 \end{bmatrix} \cdot \begin{bmatrix} a & 0 & 0 \\ b & c & 0 \\ d & e & f \end{bmatrix} &= \begin{bmatrix} (x_1 \cdot a + y_1 \cdot b + z_1 \cdot d) & (x_1 \cdot 0 + y_1 \cdot c + z_1 \cdot e) & (x_1 \cdot 0 + y_1 \cdot 0 + z_1 \cdot f) \\ (x_2 \cdot a + y_2 \cdot b + z_2 \cdot d) & (x_2 \cdot 0 + y_2 \cdot c + z_2 \cdot e) & (x_2 \cdot 0 + y_2 \cdot 0 + z_2 \cdot f) \\ (x_3 \cdot a + y_3 \cdot b + z_3 \cdot d) & (x_3 \cdot 0 + y_3 \cdot c + z_3 \cdot e) & (x_3 \cdot 0 + y_3 \cdot 0 + z_3 \cdot f) \end{bmatrix} \\ &= \begin{bmatrix} (x_1 a + y_1 b + z_1 d) & (y_1 c + z_1 e) & (z_1 f) \\ (x_2 a + y_2 b + z_2 d) & (y_2 c + z_2 e) & (z_2 f) \\ (x_3 a + y_3 b + z_3 d) & (y_3 c + z_3 e) & (z_3 f) \end{bmatrix} \end{align*} \]

Here we can see that each column of the resulting matrix is a linear combination of that column and the columns to the right of it in the original matrix, where the coefficients come from the corresponding column of the lower triangular matrix. Formally, if \(\mathbf{L}\) is lower triangular, then \((\mathbf{A}\mathbf{L})_{ij}\) depends only on columns \(j, \ldots, n\) of \(\mathbf{A}\), where \(n\) is the number of columns in \(\mathbf{A}\).

Now let’s look at the case of multiplying an upper triangular matrix with a general matrix from the left:

\[\begin{align*} \begin{bmatrix} a & b & c \\ 0 & d & e \\ 0 & 0 & f \end{bmatrix} \cdot \begin{bmatrix} x_1 & y_1 & z_1 \\ x_2 & y_2 & z_2 \\ x_3 & y_3 & z_3 \end{bmatrix} &= \begin{bmatrix} (a \cdot x_1 + b \cdot x_2 + c \cdot x_3) & (a \cdot y_1 + b \cdot y_2 + c \cdot y_3) & (a \cdot z_1 + b \cdot z_2 + c \cdot z_3) \\ (0 \cdot x_1 + d \cdot x_2 + e \cdot x_3) & (0 \cdot y_1 + d \cdot y_2 + e \cdot y_3) & (0 \cdot z_1 + d \cdot z_2 + e \cdot z_3) \\ (0 \cdot x_1 + 0 \cdot x_2 + f \cdot x_3) & (0 \cdot y_1 + 0 \cdot y_2 + f \cdot y_3) & (0 \cdot z_1 + 0 \cdot z_2 + f \cdot z_3) \end{bmatrix} \\ &= \begin{bmatrix} (a x_1 + b x_2 + c x_3) & (a y_1 + b y_2 + c y_3) & (a z_1 + b z_2 + c z_3) \\ (d x_2 + e x_3) & (d y_2 + e y_3) & (d z_2 + e z_3) \\ (f x_3) & (f y_3) & (f z_3) \end{bmatrix} \end{align*} \]

In this case we can see that each row of the resulting matrix is a linear combination of that row and the rows below it in the original matrix, where the coefficients come from the corresponding row of the upper triangular matrix. Formally, if \(\mathbf{U}\) is upper triangular, then \((\mathbf{U}\mathbf{A})_{ij}\) depends only on rows \(i, \ldots, n\) of \(\mathbf{A}\), where \(n\) is the number of rows in \(\mathbf{A}\).

If we multiply an upper triangular matrix from the right, we can see the following:

\[\begin{align*} \begin{bmatrix} x_1 & y_1 & z_1 \\ x_2 & y_2 & z_2 \\ x_3 & y_3 & z_3 \end{bmatrix} \cdot \begin{bmatrix} a & b & c \\ 0 & d & e \\ 0 & 0 & f \end{bmatrix} &= \begin{bmatrix} (x_1 \cdot a + y_1 \cdot 0 + z_1 \cdot 0) & (x_1 \cdot b + y_1 \cdot d + z_1 \cdot 0) & (x_1 \cdot c + y_1 \cdot e + z_1 \cdot f) \\ (x_2 \cdot a + y_2 \cdot 0 + z_2 \cdot 0) & (x_2 \cdot b + y_2 \cdot d + z_2 \cdot 0) & (x_2 \cdot c + y_2 \cdot e + z_2 \cdot f) \\ (x_3 \cdot a + y_3 \cdot 0 + z_3 \cdot 0) & (x_3 \cdot b + y_3 \cdot d + z_3 \cdot 0) & (x_3 \cdot c + y_3 \cdot e + z_3 \cdot f) \end{bmatrix} \\ &= \begin{bmatrix} (x_1 a) & (x_1 b + y_1 d) & (x_1 c + y_1 e + z_1 f) \\ (x_2 a) & (x_2 b + y_2 d) & (x_2 c + y_2 e + z_2 f) \\ (x_3 a) & (x_3 b + y_3 d) & (x_3 c + y_3 e + z_3 f) \end{bmatrix} \end{align*} \]

So each column of the resulting matrix is a linear combination of that column and the columns to the left of it in the original matrix, where the coefficients come from the corresponding column of the upper triangular matrix. Formally, if \(\mathbf{U}\) is upper triangular, then \((\mathbf{A}\mathbf{U})_{ij}\) depends only on columns \(1, \ldots, j\) of \(\mathbf{A}\).

So in summary we have:

  • Adding two triangular matrices of the same type results in a triangular matrix of the same type. Adding a lower triangular matrix with an upper triangular matrix results in a general matrix.
  • Multiplying two triangular matrices of the same type results in a triangular matrix of the same type. Multiplying a lower triangular matrix with an upper triangular matrix results in a general matrix.
  • Multiplying a lower triangular matrix with a general matrix from the left results in a matrix where each row is a linear combination of that row and the rows above it in the original matrix, with coefficients from the rows of the lower triangular matrix. Multiplying a lower triangular matrix with a general matrix from the right results in a matrix where each column is a linear combination of that column and the columns to the right of it in the original matrix, with coefficients from the columns of the lower triangular matrix.
  • Multiplying an upper triangular matrix with a general matrix from the left results in a matrix where each row is a linear combination of that row and the rows below it in the original matrix, with coefficients from the rows of the upper triangular matrix. Multiplying an upper triangular matrix with a general matrix from the right results in a matrix where each column is a linear combination of that column and the columns to the left of it in the original matrix, with coefficients from the columns of the upper triangular matrix.

Transpose

The transpose of a matrix is a matrix where the rows and columns are swapped. The transpose of the matrix \(\mathbf{A}\) is written as \(\mathbf{A}^T\). Formally the element in the \(i\)-th row and \(j\)-th column in the matrix \(\mathbf{A}\) becomes the element in the \(j\)-th row and \(i\)-th column in the matrix \(\mathbf{A}^T\).

\[\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \Rightarrow \mathbf{A}^T = \begin{bmatrix} a_{11} & a_{21} & \cdots & a_{m1} \\ a_{12} & a_{22} & \cdots & a_{m2} \\ \vdots & \vdots & \ddots & \vdots \\ a_{1n} & a_{2n} & \cdots & a_{mn} \end{bmatrix} \]

Or also more formally:

\[(\mathbf{A}^T)_{ij} = (\mathbf{A})_{ji} \]

This results in the transpose of a row vector being a column vector and the transpose of a column vector being a row vector. The same goes for a matrix where the transpose of a matrix results in a matrix with its dimensions swapped. So the transpose of a \(m \times n\) matrix is a \(n \times m\) matrix.

Example

Transposing a matrix:

\[\mathbf{A} = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} \Rightarrow \mathbf{A}^T = \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix} \]

Transposing a column vector is very useful when you want to write vectors in a readable way in a text, as you can see. Most commonly in textbooks when a vector is defined it is defined as a transposed column vector.

\[\mathbf{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} \Rightarrow \mathbf{v}^T = \begin{bmatrix} v_1 & v_2 & \cdots & v_n \end{bmatrix} \]

Transposing a row vector:

\[\mathbf{v} = \begin{bmatrix} v_1 & v_2 & \cdots & v_n \end{bmatrix} \Rightarrow \mathbf{v}^T = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} \]

There are a few useful properties of the transpose of a matrix that are worth remembering. The first is that transposing a matrix twice gives back the original matrix:

\[(\mathbf{A}^T)^T = \mathbf{A} \]
Example \[\mathbf{A} = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} \Rightarrow \mathbf{A}^T = \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix} \Rightarrow (\mathbf{A}^T)^T = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} = \mathbf{A} \]

Another important property is that when you apply a transpose to some expression in brackets the order of the matrices is reversed. For addition this does not matter as it is commutative but for multiplication it does matter. So:

\[\begin{align*} (\mathbf{A} + \mathbf{B})^T &= \mathbf{A}^T + \mathbf{B}^T \\ (\mathbf{A} \cdot \mathbf{B})^T &= \mathbf{B}^T \cdot \mathbf{A}^T \\ (\mathbf{ABCD})^T &= \mathbf{D}^T \cdot \mathbf{C}^T \cdot \mathbf{B}^T \cdot \mathbf{A}^T \end{align*} \]

Symmetric Matrices

If a matrix is equal to its transpose it is called a symmetric matrix. So \(\mathbf{A} = \mathbf{A}^T\) and for each element in the matrix \(a_{ij} = a_{ji}\). As you can quite easily imagine a prerequisite for a matrix to be symmetric is that it is a square matrix as otherwise the transpose of the matrix would have different dimensions.

Another thing that makes sense is that a diagonal matrix is always symmetric as all the elements that are not on the diagonal are zero.

Example

A symmetric matrix:

\[\mathbf{A} = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 5 \\ 3 & 5 & 6 \end{bmatrix} \Rightarrow \mathbf{A}^T = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 5 \\ 3 & 5 & 6 \end{bmatrix} = \mathbf{A} \]

A diagonal matrix is also symmetric:

\[\mathbf{A} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix} \Rightarrow \mathbf{A}^T = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix} = \mathbf{A} \]

Skew-Symmetric Matrix

A skew-symmetric matrix is a matrix where the elements of the matrix are equal to the negative of the elements in the transpose of the matrix. So \(\mathbf{A} = -\mathbf{A}^T\) and for each element in the matrix \(a_{ij} = -a_{ji}\).

Example \[\mathbf{A} = \begin{bmatrix} 0 & 2 & -3 \\ -2 & 0 & 5 \\ 3 & -5 & 0 \end{bmatrix} \Rightarrow \mathbf{A}^T = \begin{bmatrix} 0 & -2 & 3 \\ 2 & 0 & -5 \\ -3 & 5 & 0 \end{bmatrix} = -\mathbf{A} \]

Transposing on a Computer

When you want the transpose of a matrix you don’t actually need to perform any operations. You can just change the way you access the elements of the matrix. Rather than accessing the elements row by row you can access them column by column.

Reading a matrix and its transpose.
Reading a matrix and its transpose.

Depending on the size of the matrix and how many times you need to access the elements of the matrix this can be a lot faster than actually transposing the matrix. However, if you need to access the elements of the matrix multiple times it is probably faster to transpose the matrix first and then access the elements due to memory locality.

To transpose a square matrix in-place you can use the following algorithm which you can think of as swapping the elements of the matrix along the diagonal:

// In-place transpose of a square n x n matrix: swap each element above the
// main diagonal with its mirror element below the diagonal.
for (int i = 0; i < n; i++) {
    for (int j = i + 1; j < n; j++) {
        swap(A[i][j], A[j][i]);
    }
}

For a non-square matrix you need to use a slightly more complex algorithm.
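A minimal C++ sketch of the simpler out-of-place alternative, which works for any \(m \times n\) matrix by writing the result into a new \(n \times m\) array (row-wise storage is an assumption of this illustration); a true in-place transpose of a non-square matrix stored contiguously requires following permutation cycles and is considerably more involved:

#include <vector>

// Out-of-place transpose: B[j][i] = A[i][j] for an m x n matrix A.
std::vector<std::vector<double>> transpose(const std::vector<std::vector<double>>& A) {
    std::size_t m = A.size(), n = A[0].size();
    std::vector<std::vector<double>> B(n, std::vector<double>(m));
    for (std::size_t i = 0; i < m; i++)
        for (std::size_t j = 0; j < n; j++)
            B[j][i] = A[i][j];
    return B;
}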

Inverse

The inverse of a matrix is a very important and special matrix that when multiplied with the original matrix results in the identity matrix. So in other words this special matrix is the multiplicative inverse of the original matrix. The inverse of a matrix is denoted as \(\mathbf{A}^{-1}\).

\[\mathbf{A} \cdot \mathbf{A}^{-1} = \mathbf{A}^{-1} \cdot \mathbf{A} = \mathbf{I} \]

Not all matrices have an inverse. A matrix that has an inverse is called a regular/invertible/non-singular matrix. A matrix that does not have an inverse is called a degenerate/singular/non-invertible matrix. For now we will not discuss how to calculate the inverse of a matrix or when a matrix is invertible, but we will do so later on in the corresponding sections on matrix rank, determinants and matrix inversion.

Example

Whether a matrix is invertible is usually not obvious just from looking at it. For example, the following random-looking matrix is invertible:

\[\mathbf{A} = \begin{bmatrix} 1 & 2 & 1 \\ 4 & 4 & 5 \\ 6 & 7 & 7 \end{bmatrix} \quad \text{with} \quad \mathbf{A}^{-1} = \begin{bmatrix} -7 & -7 & 6 \\ 2 & 1 & -1 \\ 4 & 5 & -4 \end{bmatrix} \]
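We can verify this by multiplying the two matrices. For example, the \((1,1)\) entry of the product is \(1 \cdot (-7) + 2 \cdot 2 + 1 \cdot 4 = 1\), and working out all entries gives the identity matrix:

\[\mathbf{A}\mathbf{A}^{-1} = \begin{bmatrix} 1 & 2 & 1 \\ 4 & 4 & 5 \\ 6 & 7 & 7 \end{bmatrix} \begin{bmatrix} -7 & -7 & 6 \\ 2 & 1 & -1 \\ 4 & 5 & -4 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \mathbf{I} \]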

But the following simple matrix is not invertible, because its two rows are identical and therefore linearly dependent:

\[\mathbf{A} = \begin{bmatrix} 2 & 4 \\ 2 & 4 \end{bmatrix} \]

Permutation Matrix

A permutation matrix is a special type of square matrix that rearranges (permutes) the rows or columns of another matrix or vector. Permutations themselves are simply reorderings of elements. We can formally define a permutation as a function \(\pi\) that maps the set \(\{1, 2, \dots, n\}\) onto itself, i.e., \(\pi: \{1, 2, \dots, n\} \to \{1, 2, \dots, n\}\), where each number appears exactly once as an output.

Example

For example, the permutation \(\pi\) defined by \(\pi(1) = 2\), \(\pi(2) = 1\), \(\pi(3) = 3\) swaps the first two elements and leaves the third unchanged.

Because a permutation is just a reordering, we can also “reorder” the elements back to their original positions using the inverse permutation. Another way of seeing this is to notice that a permutation is bijective, meaning there is a one-to-one correspondence between inputs and outputs. So by definition an inverse function \(\pi^{-1}\) exists such that \(\pi^{-1}(\pi(i)) = i\) for all \(i\). In our example, the inverse permutation \(\pi^{-1}\) is defined by \(\pi^{-1}(1) = 2\), \(\pi^{-1}(2) = 1\), and \(\pi^{-1}(3) = 3\).

Permutations are also composable, meaning you can apply one permutation after another. If you have two permutations \(\pi_1\) and \(\pi_2\), their composition is defined as:

\[(\pi_2 \circ \pi_1)(i) = \pi_2(\pi_1(i)) \]

Again, because we are just reordering elements, the composition of two permutations is also a permutation. The inverse of a composition is the composition of the inverses in reverse order, since we first want to undo the permutation that was applied last:

\[(\pi_2 \circ \pi_1)^{-1} = \pi_1^{-1} \circ \pi_2^{-1} \]

So if we want to undo the effect of applying \(\pi_1\) followed by \(\pi_2\), we first apply \(\pi_2^{-1}\) and then \(\pi_1^{-1}\):

\[(\pi_2 \circ \pi_1)^{-1}(i) = \pi_1^{-1}(\pi_2^{-1}(i)) \]
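Using the permutation from the example above as \(\pi_1\) (swap the first two elements) together with a cyclic shift \(\pi_2\) defined by \(\pi_2(1) = 2\), \(\pi_2(2) = 3\), \(\pi_2(3) = 1\), we get:

\[\begin{align*} (\pi_2 \circ \pi_1)(1) &= \pi_2(\pi_1(1)) = \pi_2(2) = 3 \\ (\pi_2 \circ \pi_1)(2) &= \pi_2(\pi_1(2)) = \pi_2(1) = 2 \\ (\pi_2 \circ \pi_1)(3) &= \pi_2(\pi_1(3)) = \pi_2(3) = 1 \end{align*} \]

Applying \(\pi_1^{-1} \circ \pi_2^{-1}\) (with \(\pi_2^{-1}(1) = 3\), \(\pi_2^{-1}(2) = 1\), \(\pi_2^{-1}(3) = 2\) and \(\pi_1^{-1} = \pi_1\)) maps \(3 \mapsto 1\), \(2 \mapsto 2\) and \(1 \mapsto 3\), which indeed undoes the composition.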

We can also extend this idea to permuting the components of a vector. For this we represent the permutation as a matrix; recall that multiplying a vector by a matrix transforms it into another vector. A permutation matrix \(\mathbf{P} \in \mathbb{R}^{n \times n}\) is obtained by permuting the rows of the identity matrix \(\mathbf{I}_n\) according to a permutation \(\pi\). Formally, the entries of a permutation matrix are defined as:

\[P_{ij} = \begin{cases} 1 & \text{if } j = \pi(i) \\ 0 & \text{otherwise} \end{cases} \]

This means each row and each column has exactly one entry equal to \(1\), and all other entries are \(0\). We can interpret this as follows: the \(i\)th row of \(\mathbf{P}\) is the \(\pi(i)\)th row of the identity matrix, so the \(1\) in column \(\pi(i)\) of row \(i\) says that the \(\pi(i)\)th entry of the input vector ends up in position \(i\) of the output. Equivalently, looking at the \(j\)th column, the row in which its single \(1\) appears tells you to which output position the \(j\)th input entry is moved.
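In code a permutation matrix is rarely stored as a full \(n \times n\) matrix; it is enough to store the permutation \(\pi\) itself as an index array. A minimal sketch of applying it to a vector, directly using \((\mathbf{P}\mathbf{x})_i = x_{\pi(i)}\) (0-indexed, function name just for illustration):

#include <vector>

// y[i] = x[pi[i]], i.e. the i-th output entry is the pi(i)-th input entry.
std::vector<double> applyPermutation(const std::vector<int>& pi, const std::vector<double>& x) {
    std::vector<double> y(x.size());
    for (int i = 0; i < (int)pi.size(); i++) {
        y[i] = x[pi[i]];
    }
    return y;
}

For the \(3 \times 3\) permutation matrix in the example below, \(\pi\) would be stored as {1, 2, 0} and the function returns \((x_2, x_3, x_1)\), matching \(\mathbf{P}\mathbf{x}\).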

Example

Let’s use a \(3 \times 3\) permutation matrix:

\[\mathbf{P} = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix} \]

Suppose we apply \(\mathbf{P}\) to the vector \(\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}\), then the following happens:

  • The first row of \(\mathbf{P}\) is \((0\ 1\ 0)\): This means the first entry of the output vector is \(0 \cdot x_1 + 1 \cdot x_2 + 0 \cdot x_3 = x_2\). So, the \(1\) in column 2 means “place \(x_2\) into the first position”.
  • The second row is \((0\ 0\ 1)\): The output’s second entry is \(x_3\) (the \(1\) in column 3).
  • The third row is \((1\ 0\ 0)\): The output’s third entry is \(x_1\) (the \(1\) in column 1).
\[\mathbf{P}\mathbf{x} = \begin{bmatrix} x_2 \\ x_3 \\ x_1 \end{bmatrix} \]

The properties of the permutation function above also carry over to permutation matrices. If \(\mathbf{P}_{\pi_1}\) and \(\mathbf{P}_{\pi_2}\) are the permutation matrices of \(\pi_1\) and \(\pi_2\), then their product is again a permutation matrix, and it corresponds to a composition of the two permutations. With the convention used here, where \((\mathbf{P}_{\pi}\mathbf{x})_i = x_{\pi(i)}\), the order of the composition is reversed: applying \(\mathbf{P}_{\pi_1}\) first and then \(\mathbf{P}_{\pi_2}\) gives \((\mathbf{P}_{\pi_2}\mathbf{P}_{\pi_1}\mathbf{x})_i = x_{\pi_1(\pi_2(i))}\), so:

\[\mathbf{P}_{\pi_2} \mathbf{P}_{\pi_1} = \mathbf{P}_{\pi_1 \circ \pi_2} \]

And the inverse of a permutation matrix is simply the permutation matrix for the inverse permutation:

\[\mathbf{P}_{\pi}^{-1} = \mathbf{P}_{\pi^{-1}} \]
Example

Suppose you have \(\mathbf{P}\) as above, which permutes \((x_1, x_2, x_3) \to (x_2, x_3, x_1)\). How do we undo this? We need a permutation matrix \(\mathbf{Q}\) such that:

\[\mathbf{Q}\mathbf{P}\mathbf{x} = \mathbf{x} \]

Let’s work this out explicitly:

\(\mathbf{P}\mathbf{x} = \begin{bmatrix} x_2 \\ x_3 \\ x_1 \end{bmatrix}\)

We want \(\mathbf{Q}\) so that:

\[\mathbf{Q} \begin{bmatrix} x_2 \\ x_3 \\ x_1 \end{bmatrix} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \]

So, what must \(\mathbf{Q}\) do? It must move \(x_2\) (currently in position 1) back to position 2, \(x_3\) (currently in position 2) back to position 3, and \(x_1\) (currently in position 3) back to position 1.

So, the inverse permutation matrix \(\mathbf{Q}\) will have \(1\)s in the following places:

  • Row 1: column 3 (to put \(x_1\) in position 1),
  • Row 2: column 1 (to put \(x_2\) in position 2),
  • Row 3: column 2 (to put \(x_3\) in position 3):
\[\mathbf{Q} = \begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} \]

We can check that this works:

\[\mathbf{Q} \begin{bmatrix} x_2 \\ x_3 \\ x_1 \end{bmatrix} = \begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} x_2 \\ x_3 \\ x_1 \end{bmatrix} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \]

If we compare the inverse matrix \(\mathbf{Q}\) to the original \(\mathbf{P}\), we notice that they are closely related:

\[\mathbf{P} = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix} \qquad \mathbf{Q} = \begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} \]

Specifically, \(\mathbf{Q}\) is the transpose of \(\mathbf{P}\):

\[\mathbf{P}^T = \begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} = \mathbf{Q} \]

This is always true for permutation matrices: their inverse is simply their transpose. The reason is that the entry \((\mathbf{P}\mathbf{P}^T)_{ij}\) is the dot product of rows \(i\) and \(j\) of \(\mathbf{P}\), and since the rows are distinct standard basis vectors this dot product is \(1\) when \(i = j\) and \(0\) otherwise, so \(\mathbf{P}\mathbf{P}^T = \mathbf{I}\).

Example

Let’s see this with a \(4 \times 4\) permutation matrix. Suppose \(\pi(1) = 3\), \(\pi(2) = 1\), \(\pi(3) = 4\), \(\pi(4) = 2\). Then

\[\mathbf{P} = \begin{bmatrix} 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 \end{bmatrix} \]

Applying \(\mathbf{P}\) to a column vector:

\[\mathbf{P} \begin{bmatrix} a \\ b \\ c \\ d \end{bmatrix} = \begin{bmatrix} c \\ a \\ d \\ b \end{bmatrix} \]

The inverse permutation is \(\pi^{-1}(1) = 2\), \(\pi^{-1}(2) = 4\), \(\pi^{-1}(3) = 1\), \(\pi^{-1}(4) = 3\), and:

\[\mathbf{P}^{-1} = \mathbf{P}^T = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \]

Multiplying back:

\[\mathbf{P}^T \begin{bmatrix} c \\ a \\ d \\ b \end{bmatrix} = \begin{bmatrix} a \\ b \\ c \\ d \end{bmatrix} \]

So applying \(\mathbf{P}^T\) undoes the permutation applied by \(\mathbf{P}\).
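In the index-array representation from the earlier sketch, taking the transpose corresponds to inverting the permutation array. A minimal sketch, again with an illustrative name:

#include <vector>

// pi maps i -> pi[i], so the inverse must map pi[i] -> i.
std::vector<int> inversePermutation(const std::vector<int>& pi) {
    std::vector<int> pinv(pi.size());
    for (int i = 0; i < (int)pi.size(); i++) {
        pinv[pi[i]] = i;
    }
    return pinv;
}

Applying applyPermutation with pinv to an already permuted vector restores the original order, mirroring \(\mathbf{P}^T(\mathbf{P}\mathbf{x}) = \mathbf{x}\).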

From the definition of matrix multiplication we noticed that multiplying a matrix by another matrix from the left is like applying a linear transformation to its rows, and multiplying it by another matrix from the right is like applying a linear transformation to its columns. This means that multiplying by a permutation matrix from the left permutes the rows of the matrix, and multiplying by a permutation matrix from the right permutes its columns.

Example

Suppose we want to swap the first and second rows of the matrix \(\mathbf{A}\):

\[\mathbf{A} = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} \]

The corresponding permutation matrix is:

\[\mathbf{P} = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} \]

Multiplying on the left swaps the first and second rows:

\[\mathbf{P}\mathbf{A} = \begin{bmatrix} 4 & 5 & 6 \\ 1 & 2 & 3 \\ 7 & 8 & 9 \end{bmatrix} \]

Similarly, multiplying on the right swaps the first and second columns:

\[\mathbf{A}\mathbf{P} = \begin{bmatrix} 2 & 1 & 3 \\ 5 & 4 & 6 \\ 8 & 7 & 9 \end{bmatrix} \]
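In practice such row permutations are also done with an index array rather than an actual matrix product. A minimal sketch, using the same 0-indexed convention and an illustrative name:

#include <vector>

// Row i of the result is row pi[i] of A, which is exactly the effect of
// multiplying A by the permutation matrix of pi from the left.
std::vector<std::vector<double>> permuteRows(const std::vector<int>& pi,
                                             const std::vector<std::vector<double>>& A) {
    std::vector<std::vector<double>> B(A.size());
    for (int i = 0; i < (int)A.size(); i++) {
        B[i] = A[pi[i]];
    }
    return B;
}

With pi = {1, 0, 2} this reproduces \(\mathbf{P}\mathbf{A}\) from the example. Permuting columns (multiplication from the right) works analogously, except that with the convention used here the columns are indexed by the inverse permutation.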

Frobenius Norm

The Frobenius norm is a way to measure the size of a matrix. It is defined as the square root of the sum of the squares of all the elements in the matrix. So for a matrix \(\mathbf{A} \in \mathbb{R}^{m \times n}\) the Frobenius norm is defined as follows:

\[\|\mathbf{A}\|_F = \sqrt{\sum_{i=1}^m \sum_{j=1}^n a_{ij}^2} \]

You can also think of it as just taking the matrix and flattening it into a vector and then calculating the length of that vector, i.e. the Euclidean/L2 norm of the vector.

Example

If we define the matrix \(\mathbf{A}\):

\[\mathbf{A} = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} \]

Then the Frobenius norm of \(\mathbf{A}\) is:

\[\|\mathbf{A}\|_F = \sqrt{1^2 + 2^2 + 3^2 + 4^2 + 5^2 + 6^2} = \sqrt{91} \approx 9.539 \]
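The definition translates directly into code. A minimal sketch that reproduces the value from the example (the function name is just for illustration):

#include <cmath>
#include <vector>

// Square root of the sum of squares of all elements of A.
double frobeniusNorm(const std::vector<std::vector<double>>& A) {
    double sum = 0.0;
    for (const auto& row : A)
        for (double a : row)
            sum += a * a;
    return std::sqrt(sum);
}

// frobeniusNorm({{1, 2, 3}, {4, 5, 6}}) returns sqrt(91), approximately 9.539.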

Trace

The trace of a matrix is the sum of all the diagonal elements in the matrix and is denoted as \(\text{tr}(\mathbf{A})\). Because it is the sum of the diagonal elements it is only defined for square matrices, i.e. \(\mathbf{A} \in \mathbb{R}^{n \times n}\):

\[\text{tr}(\mathbf{A}) = \sum_{i=1}^n a_{ii} = a_{11} + a_{22} + \cdots + a_{nn} \]
Example \[\text{tr}(\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}) = 1 + 5 + 9 = 15 \]
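The trace is equally direct to compute; a minimal sketch, assuming a square matrix stored as a 2D vector (illustrative name):

#include <vector>

// Sum of the diagonal elements a_ii of a square matrix A.
double trace(const std::vector<std::vector<double>>& A) {
    double sum = 0.0;
    for (int i = 0; i < (int)A.size(); i++)
        sum += A[i][i];
    return sum;
}

// trace({{1, 2, 3}, {4, 5, 6}, {7, 8, 9}}) returns 15.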
Todo

Add properties of the trace and a proof that it is the sum of the eigenvalues.

Todo

Parts below here are still a work in progress and might not belong here.

Orthogonal / Orthonormal Matrix

The naming here is confusing. By the usual convention, an orthogonal matrix is a square matrix whose columns are not just orthogonal to each other but also normalized, i.e. have a length of \(1\), so that \(\mathbf{Q}^T\mathbf{Q} = \mathbf{Q}\mathbf{Q}^T = \mathbf{I}\). The term orthonormal matrix is sometimes used for the same thing, or for a non-square matrix whose columns are orthonormal.

Todo

Add more to this section, such as examples; these matrices become very important later on.

Householder Matrix

Block Matrix

Bi- and Tri-Diagonal Matrix

Band Matrix

Toeplitz and Hankel Matrix
