Rank of a Matrix
The rank of a matrix is a key quantity with many important consequences, for example that only full-rank square matrices can be inverted. For now, however, we will simply define it as the number of linearly independent rows or columns in the matrix. The number of linearly independent rows is often called the row rank, and the number of linearly independent columns the column rank.
This can intuitively be interpreted as the amount of information contained in the matrix, as each linearly independent row or column adds new information that cannot be derived from the others. Later on we will see that the rank also corresponds to the number of pivots in the reduced row echelon form of the matrix, which gives a more practical way to compute it, and that it equals the dimension of the row space or column space of the matrix. We commonly denote the rank of a matrix \(A\) as follows:
\[r = \text{rank}(\mathbf{A}) \]Determining the rank of a matrix can be done easily by hand by transforming the matrix into its reduced row echelon form (RREF) with Gaussian elimination and counting the number of non-zero rows. The rank can also be computed using various algorithms or singular value decomposition (SVD), which are more efficient for larger matrices.
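In code, the rank is usually computed numerically rather than by hand. As a minimal sketch (the matrix below is just an illustrative example), NumPy's `matrix_rank` does exactly this: it computes the SVD and counts the singular values above a small tolerance:

```python
import numpy as np

# Example matrix: its third row is 2 * (second row) - (first row),
# so only two rows are linearly independent.
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0]])

# matrix_rank computes the SVD and counts singular values above a tolerance.
print(np.linalg.matrix_rank(A))  # 2
```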
Because the rank is defined as the number of linearly independent rows or columns, it is important to note that the rank of a matrix is always less than or equal to the minimum of the number of rows and columns in the matrix. In other words, if a matrix has \(m\) rows and \(n\) columns, then:
\[\text{rank}(\mathbf{A}) \leq \min(m, n) \]For a square matrix with \(n\) rows and \(n\) columns, the rank can therefore be at most \(n\). If the rank is equal to \(n\), then the matrix is said to be full rank, meaning that all rows and columns are linearly independent. If a matrix is tall, so that \(m > n\), then the rank can be at most \(n\); if all the columns are linearly independent, the rank is \(n\) and the matrix is said to have full column rank. If the matrix is wide, so that \(m < n\), then the rank can be at most \(m\); if all the rows are linearly independent, the rank is \(m\) and the matrix is said to have full row rank.
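As a quick numerical sanity check of this bound, here is a short sketch with made-up tall and wide matrices:

```python
import numpy as np

# Tall matrix: 4 rows, 2 columns (m > n), so the rank is at most 2.
tall = np.array([[1.0, 0.0],
                 [0.0, 1.0],
                 [1.0, 1.0],
                 [2.0, 3.0]])
print(np.linalg.matrix_rank(tall))  # 2 -> full column rank

# Wide matrix: 2 rows, 4 columns (m < n), so the rank is again at most 2.
wide = tall.T
print(np.linalg.matrix_rank(wide))  # 2 -> full row rank
```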
For a matrix \(\mathbf{A}\) to have \(\text{rank}(\mathbf{A}) = 0\), it must be the zero matrix, i.e. all entries are zero. This is because any non-zero row or column is, on its own, already a linearly independent set, so as soon as the matrix has a single non-zero entry its rank is at least 1.
We have also already seen how to construct a matrix \(\mathbf{B}\) with \(\text{rank}(\mathbf{B}) = 1\) by taking the outer product of two vectors \(\mathbf{u}\) and \(\mathbf{v}\), i.e. \(\mathbf{B} = \mathbf{u}\mathbf{v}^T\). This is because the resulting matrix will have all rows as scalar multiples of the first row, hence it has only one linearly independent row:
\[\begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_m \end{bmatrix} \begin{bmatrix} v_1 & v_2 & \ldots & v_n \end{bmatrix} = \begin{bmatrix} u_1 v_1 & u_1 v_2 & \ldots & u_1 v_n \\ u_2 v_1 & u_2 v_2 & \ldots & u_2 v_n \\ \vdots & \vdots & \ddots & \vdots \\ u_m v_1 & u_m v_2 & \ldots & u_m v_n \end{bmatrix} \]Another way of describing the rank of a matrix is as the dimension of the row space or column space of the matrix. The row space is the subspace spanned by the rows of the matrix, while the column space is the subspace spanned by the columns of the matrix. The rank is then equal to the dimension of these subspaces, which is the number of linearly independent vectors needed to span the subspace. This means that the rank of a matrix is also equal to the number of pivots in the reduced row echelon form of the matrix, which is a practical way to compute the rank.
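Coming back to the outer-product construction above, it is easy to verify numerically that it produces a rank-1 matrix (the vectors here are chosen arbitrarily for illustration):

```python
import numpy as np

# Two arbitrary non-zero vectors.
u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0])

# The outer product u v^T: every row is a scalar multiple of v.
B = np.outer(u, v)
print(B)
print(np.linalg.matrix_rank(B))  # 1
```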
These examples are kept very simple to give some intuition for the rank; how to actually calculate the rank of a matrix is discussed in the other sections. First, let us start with some simple matrices:
\[\begin{align*} &\begin{bmatrix} 1 \\ 2 \\ 4 \end{bmatrix} \, &\begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} \, &\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \, &\begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix} \\ &r=1 \, &r=0 \, &r=3 \, &r=1 \end{align*} \]And then some less obvious ones:
\[\begin{align*} &\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} \, &\begin{bmatrix} 1 & 3 \\ 2 & 6 \\ 4 & 12 \end{bmatrix} \, &\begin{bmatrix} 1 & 3.1 \\ 2 & 6 \\ 4 & 12 \end{bmatrix} \, &\begin{bmatrix} 1 & 3 & 2 \\ 6 & 6 & 1 \\ 4 & 2 & 0 \end{bmatrix} \\ &r=2 \, &r=1 \, &r=2 \, &r=3 \end{align*} \]
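All of these ranks can be verified numerically; here is a short sketch using the matrices from above:

```python
import numpy as np

examples = [
    np.array([[1.0], [2.0], [4.0]]),                    # single column        -> r = 1
    np.zeros((3, 3)),                                   # zero matrix          -> r = 0
    np.eye(3),                                          # identity             -> r = 3
    np.ones((3, 3)),                                    # identical rows       -> r = 1
    np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], float), # row3 = 2*row2 - row1 -> r = 2
    np.array([[1, 3], [2, 6], [4, 12]], float),         # col2 = 3*col1        -> r = 1
    np.array([[1, 3.1], [2, 6], [4, 12]]),              # no longer a multiple -> r = 2
    np.array([[1, 3, 2], [6, 6, 1], [4, 2, 0]], float), # non-zero determinant -> r = 3
]

print([np.linalg.matrix_rank(M) for M in examples])  # [1, 0, 3, 1, 2, 1, 2, 3]
```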
Let's explore some fundamental properties of matrix rank, each with an explanation and example. The first property is that the rank of a matrix is invariant under certain operations, which means that the rank does not change when we perform these operations on the matrix. A first example is swapping rows or columns: this does not change the rank of the matrix, because swapping rows or columns does not affect the linear independence of the rows or columns.
Another example is multiplying a matrix by a nonzero scalar. This operation does not change its rank:
\[\text{rank}(c\mathbf{A}) = \text{rank}(\mathbf{A}), \quad \text{for } c \in \mathbb{R} \setminus \{0\} \]The intuition is that multiplying all elements of the matrix by \(c\) does not change the linear relationships between the rows or columns. If a set of rows is linearly independent, multiplying them by a nonzero scalar keeps them linearly independent. Importantly, if \(c = 0\), then the matrix becomes the zero matrix, which has rank 0, hence the condition that \(c \neq 0\).
Suppose we have a matrix \(\mathbf{A}\) with rank \(r\). This means there are \(r\) linearly independent columns (or rows) \(\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r\), i.e. the only scalars \(\lambda_1, \lambda_2, \ldots, \lambda_r\) for which
\[\lambda_1 \mathbf{v}_1 + \lambda_2 \mathbf{v}_2 + \ldots + \lambda_r \mathbf{v}_r = \mathbf{o} \]are \(\lambda_1 = \lambda_2 = \ldots = \lambda_r = 0\). If we multiply the entire matrix by a nonzero scalar \(c\), these vectors become \(c\mathbf{v}_1, c\mathbf{v}_2, \ldots, c\mathbf{v}_r\). Any linear combination of them can be written as:
\[\lambda_1 (c\mathbf{v}_1) + \lambda_2 (c\mathbf{v}_2) + \ldots + \lambda_r (c\mathbf{v}_r) = c(\lambda_1 \mathbf{v}_1 + \lambda_2 \mathbf{v}_2 + \ldots + \lambda_r \mathbf{v}_r) \]For this to equal \(\mathbf{o}\) with \(c \neq 0\), the combination \(\lambda_1 \mathbf{v}_1 + \ldots + \lambda_r \mathbf{v}_r\) must itself be \(\mathbf{o}\), which forces all \(\lambda_i = 0\). So the vectors \(c\mathbf{v}_1, c\mathbf{v}_2, \ldots, c\mathbf{v}_r\) are still linearly independent, and thus the rank remains \(r\).
For example, take
\[\mathbf{A} = \begin{bmatrix} 1 & 0 \\ 2 & 3 \end{bmatrix}, \quad c = 5, \quad c\mathbf{A} = \begin{bmatrix} 5 & 0 \\ 10 & 15 \end{bmatrix} \]Both \(\mathbf{A}\) and \(c\mathbf{A}\) have rank 2, since in both cases the rows (and columns) are linearly independent.
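A quick numerical check of this property, using the (arbitrarily chosen) matrix and scalar from the example above:

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [2.0, 3.0]])
c = 5.0

print(np.linalg.matrix_rank(A))      # 2
print(np.linalg.matrix_rank(c * A))  # 2 -> scaling by a non-zero scalar keeps the rank
print(np.linalg.matrix_rank(0 * A))  # 0 -> the excluded case c = 0 collapses everything
```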
The rank of the sum of two matrices satisfies:
\[\text{rank}(\mathbf{A + B}) \leq \text{rank}(\mathbf{A}) + \text{rank}(\mathbf{B}) \]The intuition here is that when you add two matrices, the resulting matrix cannot have more independent rows or columns than the two original matrices have in total, since every row (or column) of the sum is just the sum of the corresponding rows (or columns) of \(\mathbf{A}\) and \(\mathbf{B}\).
Remember that we said the rank is also the same as the dimension of the row space or column space of the matrix. When you add two matrices, the resulting matrix's column (or row) space is spanned by the union of the column (or row) spaces of the two matrices. The maximum number of linearly independent vectors in this union cannot exceed the sum of the number of independent vectors in each individual matrix. Thus we have:
\[C_{\mathbf{A} + \mathbf{B}} \subseteq C_{\mathbf{A}} + C_{\mathbf{B}} \]where \(C_{\mathbf{A}}\), \(C_{\mathbf{B}}\) and \(C_{\mathbf{A} + \mathbf{B}}\) are the column spaces of the matrices. And then bringing it back to the rank, we have:
\[\text{rank}(\mathbf{A} + \mathbf{B}) \leq \text{dim}(C_{\mathbf{A}}) + \text{dim}(C_{\mathbf{B}}) = \text{rank}(\mathbf{A}) + \text{rank}(\mathbf{B}) \]Notice that equality can only hold if the column spaces of \(\mathbf{A}\) and \(\mathbf{B}\) intersect only in the zero vector, meaning they do not share any non-zero vectors. If they do share a non-zero vector, then the rank of the sum will be strictly less than the sum of the ranks.
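Before the worked example, here is a quick numerical sanity check of the inequality with randomly generated low-rank matrices (a sketch, not a proof):

```python
import numpy as np

rng = np.random.default_rng(0)

for _ in range(1000):
    # Random 4x4 matrices built as products of 4x2 and 2x4 factors,
    # so each has rank at most 2.
    A = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 4))
    B = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 4))
    assert np.linalg.matrix_rank(A + B) <= np.linalg.matrix_rank(A) + np.linalg.matrix_rank(B)

print("rank(A + B) <= rank(A) + rank(B) held in every trial")
```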
Let
\[\mathbf{A} = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix},\quad \mathbf{B} = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \]Both have rank 1. But
\[\mathbf{A} + \mathbf{B} = \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix} \]This matrix still has rank 1 (the second column is a scalar multiple of the first), so in this case,
\[\text{rank}(\mathbf{A} + \mathbf{B}) = 1 \leq 1 + 1 = 2 \]If you try with
\[\mathbf{A} = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix},\quad \mathbf{B} = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} \]where the non-zero column of \(\mathbf{A}\) (the first) and the non-zero column of \(\mathbf{B}\) (the second) are linearly independent, so the two column spaces intersect only in the zero vector, then:
\[\mathbf{A} + \mathbf{B} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \]which is the identity matrix of rank 2, and so
\[\text{rank}(\mathbf{A} + \mathbf{B}) = 2 = 1 + 1 \]Another very important property of the rank of a matrix is that the rank of a matrix and its transpose are always equal:
\[\text{rank}(\mathbf{A}^T) = \text{rank}(\mathbf{A}) \]This has the important implication that the row rank (the number of linearly independent rows) is equal to the column rank (the number of linearly independent columns) for any matrix. It might seem plausible that, for a rectangular or even a square matrix, you could have more independent columns than independent rows, with the rows forced into dependencies. However, this cannot actually happen, so we have:
\[\text{column rank}(\mathbf{A}) = \text{row rank}(\mathbf{A}) = \text{rank}(\mathbf{A}) \]Proving this rigorously is a bit tricky, but remember that after transforming a matrix into its reduced row echelon form (RREF), the number of leading 1s (pivots) is equal to the number of independent columns and therefore the dimension of the column space, while the number of non-zero rows in the RREF is equal to the number of independent rows, i.e. the dimension of the row space. Since every pivot lies in exactly one non-zero row, these two counts coincide, so the number of independent rows equals the number of independent columns. If we then transpose the matrix, the rows become columns and the columns become rows, so the number of independent rows of the original matrix is equal to the number of independent columns of the transposed matrix, and vice versa.
Let
\[\mathbf{A} = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} \]the row rank is 2 (the two rows are not multiples of each other) and the column rank is also 2 (the first two columns are not multiples of each other). If we then transpose the matrix:
\[\mathbf{A}^T = \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix} \]Column rank of \(\mathbf{A}^T\) is also 2 (no column is a scalar multiple of another) and the row rank is also 2 (the two rows are not multiples of each other). So, rank\((\mathbf{A}^T) = \)rank\((\mathbf{A})\).
If we perform Gauss-Jordan elimination on \(\mathbf{A}\) to get it into RREF, we get the following:
\[RREF(\mathbf{A}) = \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & 2 \end{bmatrix} \]which has 2 leading 1s, so the column rank is 2. The two non-zero rows also indicate that the row rank is 2. If we then transpose the matrix and perform Gauss-Jordan elimination again, we get:
\[RREF(\mathbf{A}^T) = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix} \]which also has 2 leading 1s, so the column rank is 2. The two non-zero rows also indicate that the row rank is 2. So we have shown that the rank of a matrix and its transpose are equal.
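The same computation can be reproduced with SymPy, whose `rref` method returns the reduced row echelon form together with the indices of the pivot columns (a small sketch using the matrix from above):

```python
import sympy as sp

A = sp.Matrix([[1, 2, 3],
               [4, 5, 6]])

# rref() returns (RREF matrix, tuple of pivot column indices).
rref_A, pivots_A = A.rref()
rref_AT, pivots_AT = A.T.rref()

print(rref_A)   # Matrix([[1, 0, -1], [0, 1, 2]])
print(rref_AT)  # Matrix([[1, 0], [0, 1], [0, 0]])
print(len(pivots_A), len(pivots_AT))  # 2 2 -> rank(A) = rank(A^T)
```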
We also have that the rank of the product of two matrices is at most the minimum of their ranks:
\[\text{rank}(\mathbf{A}\mathbf{B}) \leq \min(\text{rank}(\mathbf{A}), \text{rank}(\mathbf{B})) \]Intuitively, multiplying \(\mathbf{B}\) by \(\mathbf{A}\) cannot create more independent rows or columns than either matrix already has. Remember that when defining matrix multiplication we said that the columns of the product are linear combinations of the columns of the first matrix, and the rows of the product are linear combinations of the rows of the second matrix. This means that the column space of the product is contained in the column space of \(\mathbf{A}\), and the row space of the product is contained in the row space of \(\mathbf{B}\). Since the rank is the common dimension of the column space and the row space, the rank of the product can be at most the minimum of the two ranks.
Let
\[\mathbf{A} = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix},\quad \mathbf{B} = \begin{bmatrix} 1 & 2 \\ 3 & 6 \end{bmatrix} \]\(\text{rank}(\mathbf{A}) = 1\), \(\text{rank}(\mathbf{B}) = 1\) (second column of \(\mathbf{B}\) is twice the first). Their product is then:
\[\mathbf{A}\mathbf{B} = \begin{bmatrix} 1 & 2 \\ 0 & 0 \end{bmatrix} \]and has rank 1, which is less than or equal to the minimum of the ranks of \(\mathbf{A}\) and \(\mathbf{B}\). Notice that each row of the product is a linear combination of the rows of \(\mathbf{B}\): the first row is exactly the first row of \(\mathbf{B}\), and the second row is zero, so it does not add any new information.
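Checking the product property numerically with the matrices from this example:

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 0.0]])
B = np.array([[1.0, 2.0],
              [3.0, 6.0]])

r_A = np.linalg.matrix_rank(A)
r_B = np.linalg.matrix_rank(B)
r_AB = np.linalg.matrix_rank(A @ B)

print(r_A, r_B, r_AB)         # 1 1 1
print(r_AB <= min(r_A, r_B))  # True
```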
Lastly, probably one of the most important properties of the rank is that the rank of the product of a matrix and its transpose is equal to the rank of the original matrix. This property is very useful, as we will see later in the context of the singular value decomposition (SVD) and other applications. The property can be stated as follows:
\[\text{rank}(\mathbf{A}) = \text{rank}(\mathbf{A}^T\mathbf{A}) = \text{rank}(\mathbf{A}\mathbf{A}^T) \]To see why this is the case, I suggest you check out the other sections, which show the following:
- The column space of \(\mathbf{A}\) is the same as the column space of \(\mathbf{A}^T\mathbf{A}\) (this is shown in the corresponding section), and because the rank is the dimension of the column space, the rank of \(\mathbf{A}\) is equal to the rank of \(\mathbf{A}^T\mathbf{A}\). So if \(\mathbf{A}\in \mathbb{R}^{m \times n}\) has full column rank, i.e. \(\text{rank}(\mathbf{A}) = n\), then \(\mathbf{A}^T\mathbf{A} \in \mathbb{R}^{n \times n}\) also has rank \(n\) and is therefore full rank and invertible.
- The row space of \(\mathbf{A}\) is the same as the row space of \(\mathbf{A}\mathbf{A}^T\) (this is shown in the corresponding section), and because the rank is the dimension of the row space, the rank of \(\mathbf{A}\) is equal to the rank of \(\mathbf{A}\mathbf{A}^T\). So if \(\mathbf{A}\in \mathbb{R}^{m \times n}\) has full row rank, i.e. \(\text{rank}(\mathbf{A}) = m\), then \(\mathbf{A}\mathbf{A}^T \in \mathbb{R}^{m \times m}\) also has rank \(m\) and is therefore full rank and invertible.
Let
\[\mathbf{A} = \begin{bmatrix} 1 & 2 \\ 3 & 6 \end{bmatrix} \]Clearly, the second column is twice the first, so rank is 1.
Compute \(\mathbf{A}^T\mathbf{A}\):
\[\mathbf{A}^T = \begin{bmatrix} 1 & 3 \\ 2 & 6 \end{bmatrix} \]So
\[\mathbf{A}^T\mathbf{A} = \begin{bmatrix} 1 & 3 \\ 2 & 6 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ 3 & 6 \end{bmatrix} = \begin{bmatrix} 1\cdot1 + 3\cdot3 & 1\cdot2 + 3\cdot6 \\ 2\cdot1 + 6\cdot3 & 2\cdot2 + 6\cdot6 \end{bmatrix} = \begin{bmatrix} 10 & 20 \\ 20 & 40 \end{bmatrix} \]Second column is twice the first, so the rank is still 1.
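The same can be confirmed numerically, including the \(\mathbf{A}\mathbf{A}^T\) case that we did not compute by hand:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 6.0]])

print(np.linalg.matrix_rank(A))        # 1
print(np.linalg.matrix_rank(A.T @ A))  # 1
print(np.linalg.matrix_rank(A @ A.T))  # 1 -> all three ranks agree
```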
Regularization
We can make a matrix such as \(\mathbf{A}^T\mathbf{A}\) full rank by adding a scaled identity matrix to it; this is called regularization. It is often used in machine learning, for example in ridge regression, where it both guarantees that the matrix is invertible and helps prevent overfitting by making the solution more robust. The regularization parameter \(\lambda > 0\) controls the strength of the regularization and can be tuned to achieve the desired level of robustness.
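A minimal sketch of this idea, using the rank-1 matrix from the previous example and an arbitrarily chosen \(\lambda\):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 6.0]])   # rank 1, so A^T A is singular
lam = 0.1                    # regularization parameter, chosen arbitrarily here

gram = A.T @ A               # rank 1, not invertible
regularized = gram + lam * np.eye(2)

print(np.linalg.matrix_rank(gram))         # 1
print(np.linalg.matrix_rank(regularized))  # 2 -> full rank, now invertible
```

For any \(\lambda > 0\), the eigenvalues of \(\mathbf{A}^T\mathbf{A}\), which are never negative, are all shifted up by \(\lambda\), so none of them can be zero and the regularized matrix is guaranteed to be invertible.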