Continuous Functions
In mathematics we say a real function \(f\) is continuous if it does not have any abrupt changes in value, meaning that small changes in the input lead to small changes in the output. Informally, most people describe it by saying that the graph of the function can be drawn without lifting the pen from the paper, so there are no jumps or holes. This is a very intuitive description, but we can also define continuity more formally. We group these functions into the set of all continuous functions with domain \(D \subseteq \mathbb{R}\), denoted as:
\[C(D) = \{f : D \to \mathbb{R} \mid f \text{ is continuous} \} \]
Epsilon-Delta Definition of Continuity
If we have a function \(f: D \to \mathbb{R}\) where \(D \subseteq \mathbb{R}\), then we say that \(f\) is continuous at a point \(x_0 \in D\) if for every \(\epsilon > 0\) there exists a \(\delta > 0\) such that for all \(x \in D\) the following holds:
\[|x - x_0| < \delta \implies |f(x) - f(x_0)| < \epsilon \]So in other words, the image of the delta-neighborhood of \(x_0\) is contained in the epsilon-neighborhood of \(f(x_0)\):
\[f((x_0 - \delta, x_0 + \delta)) \subseteq (f(x_0) - \epsilon, f(x_0) + \epsilon) \]
This can be interpreted as the function \(f\) being continuous at the point \(x_0\) if we can make the output values of \(f\) as close to \(f(x_0)\) as we want by choosing input values sufficiently close to \(x_0\). In other words, the difference between \(f(x)\) and \(f(x_0)\) can be made arbitrarily small by keeping \(x\) close enough to \(x_0\). This idea of there not being any drastic changes in the output value of \(f\) when we change the input value \(x\) slightly around \(x_0\) is what makes a function continuous at a point. It is also what a lot of people have in mind when they think of continuous functions: informally, the function has no jumps or holes at that point and can be drawn without lifting the pen from the paper.
We then say the entire function \(f\) is continuous on the set \(D\) if it is continuous at every point \(x_0 \in D\).
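To make the epsilon-delta game a bit more tangible, here is a small numerical sketch (not a proof, since it only samples finitely many points): it checks whether a candidate \(\delta\) keeps all sampled outputs within \(\epsilon\) of \(f(x_0)\). The function name `check_epsilon_delta`, the sample count, and the test values are just illustrative choices.

```python
import numpy as np

def check_epsilon_delta(f, x0, epsilon, delta, num_samples=10_000):
    """Sample the delta-neighborhood of x0 and check |f(x) - f(x0)| < epsilon.

    This can refute a candidate delta but never rigorously prove continuity,
    since it only tests finitely many points.
    """
    xs = np.linspace(x0 - delta, x0 + delta, num_samples)
    xs = xs[np.abs(xs - x0) < delta]  # keep only the open delta-neighborhood
    return bool(np.all(np.abs(f(xs) - f(x0)) < epsilon))

# For the identity function, delta = epsilon works ...
print(check_epsilon_delta(lambda x: x, x0=2.0, epsilon=0.1, delta=0.1))   # True
# ... while a delta that is too large lets some outputs escape the epsilon-band.
print(check_epsilon_delta(lambda x: x, x0=2.0, epsilon=0.1, delta=0.5))   # False
```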
We are given the following function:
\[f(x) = x \]We claim that it is continuous at every point \(x_0 \in \mathbb{R}\). To verify this, we need:
\[|x - x_0| < \delta \implies |f(x) - f(x_0)| < \epsilon \]Which is the case as \(|f(x) - f(x_0)| = |x - x_0|\) and we can therefore choose \(\delta = \epsilon\) for any \(\epsilon > 0\).
We want to show that \(f(x) = |x|\) is continuous at all points \(x_0 \in \mathbb{R}\). For this we need to show that for every \(\epsilon > 0\) there exists a \(\delta > 0\) such that for all \(x \in \mathbb{R}\):
\[|x - x_0| < \delta \implies \big|\,|x| - |x_0|\,\big| < \epsilon \]We start with the reverse triangle inequality, which states that for any real numbers \(x\) and \(x_0\):
\[\big|\,|x| - |x_0|\,\big| \leq |x - x_0| \]This means that the change in the output of the absolute value function is at most the change in the input.
So, if we make \(|x - x_0| < \delta\), then we automatically have:
\[\big|\,|x| - |x_0|\,\big| \leq |x - x_0| < \delta \]To ensure that \(\big|\,|x| - |x_0|\,\big| < \epsilon\), it is sufficient to choose \(\delta = \epsilon\).
**Therefore, for any \(\epsilon > 0\), we set \(\delta = \epsilon\).** Then, for any \(x \in \mathbb{R}\) with \(|x - x_0| < \delta\):
\[\big|\,|x| - |x_0|\,\big| \leq |x - x_0| < \delta = \epsilon \]This shows that \(f(x) = |x|\) is continuous at every point \(x_0 \in \mathbb{R}\) by the epsilon-delta definition.
Continuity as a Limit
We have already seen the “epsilon-delta” definition of continuity, but there is another way to characterize continuity at a point, namely, by using limits. This limit-based formulation is not only equivalent to the epsilon-delta definition but is also very useful both for calculations and for building intuition.
A function \(f : D \to \mathbb{R}\) is continuous at a point \(x_0 \in D\) if and only if for every sequence \((a_n)_{n \geq 1}\) in \(D\) that converges to \(x_0\), the following holds:
\[\lim_{n \to \infty} a_n = x_0 \implies \lim_{n \to \infty} f(a_n) = f(x_0) \]In other words, if we take any sequence of input values that get arbitrarily close to \(x_0\), then the sequence of output values must get arbitrarily close to \(f(x_0)\). We can also write this as:
\[f\left( \lim_{n \to \infty} a_n \right ) = \lim_{n \to \infty} f(a_n) \]This is sometimes called the sequential characterization. This limit-based definition is equivalent to the epsilon-delta definition. To see why, note that the epsilon-delta definition says: for every \(\epsilon > 0\), there is a \(\delta > 0\) such that if \(|x - x_0| < \delta\) then \(|f(x) - f(x_0)| < \epsilon\). But a sequence \((a_n)\) converging to \(x_0\) means that for every \(\delta > 0\), there is an \(N\) such that for all \(n > N\), \(|a_n - x_0| < \delta\). So, if \(f\) is continuous at \(x_0\), then \(|f(a_n) - f(x_0)| < \epsilon\) for all large \(n\), which means that \(\lim_{n \to \infty} f(a_n) = f(x_0)\).
The intuition is that the behavior of \(f\) at \(x_0\) is “predictable” from nearby values if it is continuous: the function value at \(x_0\) agrees with the limiting values as we approach \(x_0\). If a function is not continuous at \(x_0\), then you can find at least one sequence that converges to \(x_0\) for which the output values do not converge to \(f(x_0)\). This gives a very convenient way to prove that a function is not continuous at a point \(x_0\): it suffices to find a single sequence \(a_n \to x_0\) such that \(f(a_n)\) does not converge to \(f(x_0)\), which serves as a counterexample. This is often easier than working with epsilons and deltas, especially for piecewise or step functions.
First we show that the epsilon-delta definition implies the sequential definition, and then we show the reverse implication.
Suppose \(f\) is continuous at \(x_0\) in the epsilon-delta sense. Let \((a_n)\) be a sequence in \(D\) with \(a_n \to x_0\). Fix \(\epsilon > 0\). By continuity, there exists \(\delta > 0\) such that \(|x - x_0| < \delta\) implies \(|f(x) - f(x_0)| < \epsilon\). Since \(a_n \to x_0\), there exists \(N\) such that for all \(n > N\), \(|a_n - x_0| < \delta\). Therefore, for all \(n > N\),
\[|f(a_n) - f(x_0)| < \epsilon \]which means \(f(a_n) \to f(x_0)\). Hence we have shown that the epsilon-delta definition implies the sequential definition. Now we show the reverse implication.
Suppose that for every sequence \(a_n \to x_0\), we have \(f(a_n) \to f(x_0)\). Suppose, for contradiction, that the epsilon-delta definition fails at \(x_0\). Then there exists \(\epsilon_0 > 0\) such that for every \(\delta > 0\), there exists \(x \in D\) with \(|x - x_0| < \delta\) but \(|f(x) - f(x_0)| \geq \epsilon_0\). For each \(n \in \mathbb{N}\), choose \(\delta = 1/n\) and pick \(a_n\) such that \(|a_n - x_0| < 1/n\) but \(|f(a_n) - f(x_0)| \geq \epsilon_0\). Then \(a_n \to x_0\) but \(f(a_n)\) does not converge to \(f(x_0)\), which is a contradiction. Therefore, the epsilon-delta definition must also hold.
So, both definitions are equivalent.
We have seen using the epsilon-delta definition that the function \(f(x) = |x|\) is continuous at every point \(x_0 \in \mathbb{R}\). Now we can also show this using the limit definition.
So \(f\) is continuous at \(x_0\) if for every sequence \((a_n)\) with \(a_n \to x_0\), it follows that
\[\lim_{n \to \infty} f(a_n) = f(x_0) \]For \(f(x) = |x|\), this means we want to show:
\[\lim_{n \to \infty} |a_n| = |x_0| \]whenever \(a_n \to x_0\). Let \((a_n)\) be any sequence with \(a_n \to x_0\) as \(n \to \infty\). So we want to show that \(|a_n| \to |x_0|\). Let \(\epsilon > 0\) be given. Since \(a_n \to x_0\), there exists \(N \in \mathbb{N}\) such that for all \(n > N\):
\[|a_n - x_0| < \epsilon \]But by the reverse triangle inequality,
\[\big|\,|a_n| - |x_0|\,\big| \leq |a_n - x_0| \]So for all \(n > N\):
\[\big|\,|a_n| - |x_0|\,\big| \leq |a_n - x_0| < \epsilon \]Therefore, \(\big|\,|a_n| - |x_0|\,\big| \to 0\) as \(n \to \infty\), which means \(|a_n| \to |x_0|\). Thus,
\[\lim_{n \to \infty} f(a_n) = |x_0| = f(x_0) \]Therefore \(f(x) = |x|\) is continuous at every point \(x_0 \in \mathbb{R}\).
Next let’s look at a discontinuous function to illustrate the limit definition of continuity:
\[g(x) = \begin{cases} 1, & x \geq 0 \\ 0, & x < 0 \end{cases} \]Is \(g\) continuous at \(x_0 = 0\)? For this we take the sequence \(a_n = -\frac{1}{n}\), so \(a_n \to 0\) from the left. Then \(g(a_n) = 0\) for all \(n\). But \(g(0) = 1\). Thus:
\[\lim_{n \to \infty} g(a_n) = 0 \neq 1 = g(0) \]So \(g\) is not continuous at \(x_0 = 0\).
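The sequence argument above can also be checked numerically. The following minimal sketch (the particular values of \(n\) are arbitrary choices) evaluates \(g\) along \(a_n = -\tfrac{1}{n}\):

```python
# g is the step function from above; a_n = -1/n approaches 0 from the left.
def g(x):
    return 1 if x >= 0 else 0

for n in (1, 10, 100, 1000):
    a_n = -1 / n
    print(f"n = {n:4d}   a_n = {a_n:+.4f}   g(a_n) = {g(a_n)}   g(0) = {g(0)}")
# g(a_n) stays at 0 for every n, so it cannot converge to g(0) = 1.
```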
We define \(h(x)\) as the indicator of rationals:
\[h(x) = \begin{cases} 1, & x \in \mathbb{Q} \\ 0, & x \notin \mathbb{Q} \end{cases} \]This function is also called the Dirichlet function, and it is not continuous at any point. First we show that it is not continuous at any number \(x_0 \in \mathbb{I}\), where \(\mathbb{I}\) is the set of irrational numbers \(\mathbb{I} = \mathbb{R} \setminus \mathbb{Q}\). For this we construct the sequence \((a_n)\) whose \(n\)-th element is \(x_0\) truncated after the first \(n-1\) decimal places. So if \(x_0 = \pi\) then \(a_1 = 3\), \(a_2 = 3.1\), \(a_3 = 3.14\), \(a_4 = 3.141\), etc. Then we can see that the sequence converges to \(x_0\) as \(n \to \infty\). But for all \(n\) we have \(h(a_n) = 1\), because every number in the sequence is a finite decimal and therefore rational, so we have:
\[\lim_{n \to \infty} h(a_n) = 1 \neq 0 = h(x_0) \]To prove that \(h\) is not continuous at any rational number \(x_0 \in \mathbb{Q}\), we can use a similar argument. We construct the sequence \(a_n = x_0 + \frac{\pi}{n}\). So if \(x_0 = 1\), then \(a_1 = 1 + \frac{\pi}{1} = 1 + \pi\), \(a_2 = 1 + \frac{\pi}{2}\), \(a_3 = 1 + \frac{\pi}{3}\), etc. Then we can see that as \(n \to \infty\), the additive term \(\frac{\pi}{n}\) becomes arbitrarily small, and thus the sequence converges to \(x_0\). But for all \(n\) we have \(h(a_n) = 0\), because all the numbers in the sequence are irrational, so we have:
\[\lim_{n \to \infty} h(a_n) = 0 \neq 1 = h(x_0) \]So \(h\) is not continuous at any rational point either, and hence not continuous at any point of \(\mathbb{R}\).
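Purely as an illustration of the two sequences used above (a computer cannot test irrationality, so we simply record which values are known to be rational or irrational; the choice of five terms is arbitrary), one might tabulate the first few terms:

```python
from fractions import Fraction
import math

# Decimal truncations a_n of pi are rational, so h(a_n) = 1, while h(pi) = 0.
for n in range(1, 6):
    a_n = Fraction(int(math.pi * 10 ** (n - 1)), 10 ** (n - 1))  # 3, 31/10, 314/100, ...
    print(f"a_{n} = {float(a_n):.5f}   rational, so h(a_n) = 1")
print(f"limit: pi = {math.pi:.5f}   irrational, so h(pi) = 0")

# The points b_n = 1 + pi/n are irrational, so h(b_n) = 0, while h(1) = 1.
for n in range(1, 6):
    print(f"b_{n} = {1 + math.pi / n:.5f}   irrational, so h(b_n) = 0")
```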
The limit-based definition of continuity is not only a useful way to describe continuity at a point, but also connects directly to the algebraic properties of limits that you probably already know from calculus. This allows us to use all the powerful results about limits to prove facts about continuous functions. For example, the usual limit properties from calculus immediately give us analogous properties for continuous functions. Let \(f, g : D \to \mathbb{R}\) be functions both continuous at \(x_0 \in D\), and let \(c \in \mathbb{R}\) be a constant. Then:
- Sum Rule: \(f+g\) is continuous at \(x_0\), since
  \[\lim_{x \to x_0} [f(x) + g(x)] = \lim_{x \to x_0} f(x) + \lim_{x \to x_0} g(x) \]
- Difference Rule: \(f-g\) is continuous at \(x_0\), since
  \[\lim_{x \to x_0} [f(x) - g(x)] = \lim_{x \to x_0} f(x) - \lim_{x \to x_0} g(x) \]
- Product Rule: \(f \cdot g\) is continuous at \(x_0\), since
  \[\lim_{x \to x_0} [f(x)g(x)] = \left(\lim_{x \to x_0} f(x)\right) \left(\lim_{x \to x_0} g(x)\right) \]
- Quotient Rule: If \(g(x_0) \neq 0\), then \(\frac{f}{g}\) is continuous at \(x_0\), since
  \[\lim_{x \to x_0} \frac{f(x)}{g(x)} = \frac{\lim_{x \to x_0} f(x)}{\lim_{x \to x_0} g(x)} \]
- Constant Multiple Rule: \(c \cdot f\) is continuous at \(x_0\), since
  \[\lim_{x \to x_0} [c \cdot f(x)] = c \cdot \lim_{x \to x_0} f(x) \]
- Composition Rule: If \(g\) is continuous at \(x_0\) and \(f\) is continuous at \(g(x_0)\), then \(f \circ g\) is continuous at \(x_0\), since
  \[\lim_{x \to x_0} f(g(x)) = f\left(\lim_{x \to x_0} g(x)\right) = f(g(x_0)) \]
By definition, \(f\) and \(g\) are continuous at \(x_0\) if:
\[\lim_{x \to x_0} f(x) = f(x_0) \qquad \text{and} \qquad \lim_{x \to x_0} g(x) = g(x_0) \]From the sum rule we have:
\[\lim_{x \to x_0} [f(x) + g(x)] = \lim_{x \to x_0} f(x) + \lim_{x \to x_0} g(x) = f(x_0) + g(x_0) = (f+g)(x_0) \]Thus, by the limit definition,
\[\lim_{x \to x_0} [f(x) + g(x)] = (f+g)(x_0) \]So \(f+g\) is continuous at \(x_0\).
We have seen that the identity function \(f(x) = x\) and the constant function \(c(x) = 1\) are continuous everywhere. Combining these we get the function \(g(x) = x + 1\), which is also continuous everywhere. We can then also either multiply this function by a constant, say \(c = 2\), to get \(h(x) = 2(x + 1)\) or add the identity function again to show that \(f + g = 2x + 1\) is continuous everywhere.
The same can be done to get \(h(x) = x^2\) and \(k(x) = x^3\). Both are continuous everywhere, so \(h + k = x^2 + x^3\) and \(h \cdot k = x^5\) are continuous everywhere.
As a result, any polynomial function is continuous everywhere on \(\mathbb{R}\), since it is built from constant functions and the identity function using sums, products, and scalar multiples, all of which preserve continuity.
Another less obvious but very useful property is that if \(f\) and \(g\) are continuous at \(x_0\), then \(|f|\), \(\max(f, g)\), and \(\min(f, g)\) are also continuous at \(x_0\). For this we first show that \(|f|\) is continuous at \(x_0\). By definition of continuity we need to show that for every sequence \(a_n \to x_0\), we have:
\[\lim_{n \to \infty} |f(a_n)| = |f(x_0)| \]We can use the fact that \(f\) is continuous at \(x_0\), which means:
\[\lim_{n \to \infty} f(a_n) = f(x_0) \]Now, since the absolute value function \(|x|\) is continuous everywhere, we can use the composition rule:
\[\lim_{n \to \infty} |f(a_n)| = |f(x_0)| \]So \(|f|\) is continuous at \(x_0\). We can then write \(\max\) and \(\min\) of \(f\) and \(g\) as follows:
\[\max(f(x), g(x)) = \frac{f(x) + g(x)}{2} + \frac{|f(x) - g(x)|}{2} \]and
\[\min(f(x), g(x)) = \frac{f(x) + g(x)}{2} - \frac{|f(x) - g(x)|}{2} \]Since sums, differences, and the absolute value of continuous functions are continuous, so are \(\max\) and \(\min\).
Let \(f(x) = x^2 - 4\), \(g(x) = x^3\). Both are continuous everywhere. Then \(|f(x)|\) is continuous everywhere (as the absolute value of a continuous function), and both \(\max(f(x), g(x))\) and \(\min(f(x), g(x))\) are also continuous everywhere.
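For this concrete pair \(f\) and \(g\), we can sanity-check the two identities above numerically at a few sample points; this is only a spot check (assuming floating-point rounding is negligible), not a proof:

```python
def f(x):
    return x**2 - 4

def g(x):
    return x**3

# Check max(f, g) = (f + g)/2 + |f - g|/2 and min(f, g) = (f + g)/2 - |f - g|/2.
for x in (-2.0, -0.5, 0.0, 1.0, 3.0):
    max_formula = (f(x) + g(x)) / 2 + abs(f(x) - g(x)) / 2
    min_formula = (f(x) + g(x)) / 2 - abs(f(x) - g(x)) / 2
    assert abs(max_formula - max(f(x), g(x))) < 1e-12
    assert abs(min_formula - min(f(x), g(x))) < 1e-12
print("max/min identities hold at all sampled points")
```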
Intermediate Value Theorem
The intermediate value theorem is a consequence of continuity. It states that if \(I \subseteq \mathbb{R}\) is an interval and \(f: I \to \mathbb{R}\) is continuous, then for any two points \(a, b \in I\) with \(a < b\) and \(f(a) < f(b)\), and for every value \(y\) in the interval \((f(a), f(b))\), there exists at least one point \(c \in (a, b)\) such that \(f(c) = y\). In other words, the function takes every value between \(f(a)\) and \(f(b)\) at least once in the interval \((a, b)\), so the function can not “skip” any values in that interval.

Let \(f: [a, b] \to \mathbb{R}\) be continuous and assume without loss of generality that \(f(a) < y < f(b)\). (If \(f(a) > f(b)\), the proof is the same after swapping \(a\) and \(b\).)
We will construct two sequences \((a_n)\) and \((b_n)\) in \([a, b]\) such that \(a_n < b_n\) for all \(n\), \(a_n\) is increasing, \(b_n\) is decreasing, and \(\lim_{n \to \infty} a_n = \lim_{n \to \infty} b_n = c\) for some \(c \in [a, b]\).
Let \(a_1 = a\), \(b_1 = b\).
For each \(n \geq 1\), define \(m_n = \frac{a_n + b_n}{2}\) (the midpoint).
- If \(f(m_n) < y\), set \(a_{n+1} = m_n\), \(b_{n+1} = b_n\).
- If \(f(m_n) > y\), set \(a_{n+1} = a_n\), \(b_{n+1} = m_n\).
- If \(f(m_n) = y\), then \(c = m_n\) and we are done.
This process creates nested closed intervals \([a_n, b_n]\) of length \(|b_n - a_n| = (b-a)/2^{n-1}\).
By construction:
- \(f(a_n) < y < f(b_n)\) for all \(n\),
- \(a_n\) increases, \(b_n\) decreases, and \(b_n - a_n \to 0\) as \(n \to \infty\).
Therefore, \(c = \lim_{n \to \infty} a_n = \lim_{n \to \infty} b_n \in [a, b]\).
Now, by continuity of \(f\) at \(c\), since \(a_n \leq c \leq b_n\) and \(a_n, b_n \to c\), we have \(f(a_n) \to f(c)\) and \(f(b_n) \to f(c)\).
But for all \(n\), \(f(a_n) < y < f(b_n)\), so taking the limit on both sides gives \(f(c) \leq y \leq f(c)\), and thus \(f(c) = y\).
Therefore, there exists \(c \in [a, b]\) such that \(f(c) = y\). Because \(f(a) < y < f(b)\), \(c\) cannot be equal to \(a\) or \(b\), so \(c \in (a, b)\).
From the intermediate value theorem it follows that if we have a continuous function \(f: [a, b] \to \mathbb{R}\) with \(f(a) < 0\) and \(f(b) > 0\), then there exists at least one point \(c \in (a, b)\) such that \(f(c) = 0\). In other words, the function has at least one zero (a root) in the interval \((a, b)\).

As we have also already seen what polynomial functions look like, we can use this to conclude that every polynomial of odd degree has at least one real root: for large negative \(x\) the polynomial is negative and for large positive \(x\) it is positive (or vice versa), and since it is continuous it must cross zero somewhere in between. Polynomials of even degree need not have any real roots; for example \(x^2 + 1\) has no real roots, only two complex roots.
Show that the polynomial \(f(x) = 4x^3 - 6x^2 + 3x - 2\) has at least one real root in the interval \((1, 2)\) using the intermediate value theorem. We can use the intermediate value theorem because all polynomial functions are continuous. For this let’s check the values of the function at the endpoints of the interval:
\[f(1) = 4(1)^3 - 6(1)^2 + 3(1) - 2 = 4 - 6 + 3 - 2 = -1 \]\[f(2) = 4(2)^3 - 6(2)^2 + 3(2) - 2 = 32 - 24 + 6 - 2 = 12 \]Since \(f(1) < 0\) and \(f(2) > 0\), by the intermediate value theorem, there exists at least one point \(c \in (1, 2)\) such that \(f(c) = 0\). So the polynomial has at least one real root in the interval \((1, 2)\).
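The nested-interval construction from the proof can be turned directly into a numerical procedure for locating such a root. Below is a minimal bisection sketch; the tolerance and iteration cap are arbitrary choices, and it assumes \(f(a) < 0 < f(b)\):

```python
def bisect(f, a, b, tol=1e-10, max_iter=200):
    """Halve [a, b] while keeping a sign change, as in the proof above."""
    assert f(a) < 0 < f(b), "need f(a) < 0 < f(b)"
    for _ in range(max_iter):
        m = (a + b) / 2
        if f(m) == 0 or (b - a) / 2 < tol:
            return m
        if f(m) < 0:
            a = m  # the sign change, and hence a root, lies in [m, b]
        else:
            b = m  # the sign change lies in [a, m]
    return (a + b) / 2

f = lambda x: 4 * x**3 - 6 * x**2 + 3 * x - 2
root = bisect(f, 1.0, 2.0)
print(root, f(root))  # root is roughly 1.22, and f(root) is essentially 0
```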
Extreme Value Theorem
The extreme value theorem is another rather intuitive but powerful theorem, like the intermediate value theorem. It states that if we have a continuous function \(f\) defined on a compact interval \(I = [a, b]\), so \(I\) is closed and bounded, then \(f\) attains at least one global maximum and one global minimum on the interval \(I\). More formally, if \(f: I \to \mathbb{R}\) is continuous on \(I\), then there exist points \(x_{max}, x_{min} \in I\) such that:
\[f(x_{min}) \leq f(x) \leq f(x_{max}) \quad \forall x \in I \]The intuition behind this theorem is that if a function is continuous on a closed and bounded interval, then it cannot “escape” the interval and must therefore reach its maximum and minimum values at some points in the interval.
An important consequence of this theorem is that, because \(f\) attains a global maximum and a global minimum, it is bounded on the interval \(I\); in other words we have:
\[f(x_{min}) = \inf \{f(x) \mid x \in I\} \quad \text{and} \quad f(x_{max}) = \sup \{f(x) \mid x \in I\} \]for some \(x_{min}, x_{max} \in I\). This means that the function does not go to infinity or negative infinity on the interval \(I\), so it is bounded. Combining this with the intermediate value theorem, it follows that the image of a continuous function on a closed interval is itself a closed interval, whose endpoints are the global minimum and maximum of the function on \(I\) as shown above.
Recall that an interval \([a, b]\) is compact if it is both closed (contains its endpoints \(a\) and \(b\)) and bounded (the endpoints are finite real numbers). If either property fails, the extreme value theorem can fail as well. For instance:
- On the open interval \((a, b)\), the function \(f(x) = x\) does not attain a maximum (since the supremum is \(b\), but \(b \notin (a, b)\)).
- On the interval \([0, 1)\), the function \(f(x) = x\) has a minimum at \(0\) but no maximum, since \(1\) is not included.
Let \(f: [a, b] \to \mathbb{R}\) be continuous. We show that \(f\) attains both its maximum and its minimum on \([a, b]\). We focus on the maximum as the minimum follows by applying the argument to \(-f\).
First we want to show that \(f\) is bounded on \([a, b]\). Suppose, for contradiction, that \(f\) is not bounded above. Then for every \(n \in \mathbb{N}\), there exists \(x_n \in [a, b]\) such that \(f(x_n) > n\). Thus, we can construct a sequence \((x_n)_{n \geq 1}\) in \([a, b]\) with \(f(x_n) > n\) for each \(n\).
Because \([a, b]\) is closed and bounded (hence compact), by the Bolzano–Weierstrass theorem there exists a convergent subsequence \((x_{n_k})_{k \geq 1}\) with \(x_{n_k} \to x^* \in [a, b]\). Since \(f\) is continuous at \(x^*\), we have
\[f(x_{n_k}) \to f(x^*) \]as \(k \to \infty\). But \(f(x_{n_k}) > n_k \to \infty\), so \(f(x_{n_k}) \to \infty\), which contradicts the fact that \(f(x^*)\) is a real number. Thus, \(f\) is bounded above on \([a, b]\). By the same argument, \(f\) is bounded below.
Now we know that \(f\) is bounded above, so we can define the supremum of \(f\) on the interval \([a, b]\). Let \(M := \sup \{f(x) \mid x \in [a, b]\}\). Since \(f\) is bounded above, \(M\) exists and is finite. Now, by the definition of the supremum, for every \(n \in \mathbb{N}\), there exists \(x_n \in [a, b]\) such that
\[f(x_n) > M - \frac{1}{n} \]Again, the sequence \((x_n)\) lies in \([a, b]\), which is compact. Thus, there exists a convergent subsequence \((x_{n_k})\) such that \(x_{n_k} \to x_{\max} \in [a, b]\). Since \(f\) is continuous,
\[f(x_{n_k}) \to f(x_{\max}) \]But for all \(k\),
\[f(x_{n_k}) > M - \frac{1}{n_k} \]So as \(k \to \infty\), \(f(x_{n_k}) \to M\). Therefore,
\[f(x_{\max}) = M \]which means \(f\) attains its supremum at \(x_{\max} \in [a, b]\).
We can apply the same argument to \(-f\), whose maximum is \(-m\), where \(m = \inf \{f(x) \mid x \in [a, b]\}\). Therefore, \(f\) attains its minimum at some \(x_{\min} \in [a, b]\).
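As a rough numerical illustration of the theorem (a grid search only approximates the extrema that the theorem guarantees; the function, interval, and grid size are arbitrary choices):

```python
import numpy as np

# f(x) = x^3 - x is continuous on the compact interval [-1, 1], so it attains
# a maximum and a minimum there; a fine grid locates them approximately.
f = lambda x: x**3 - x
xs = np.linspace(-1.0, 1.0, 100_001)
ys = f(xs)
print("approx. maximiser:", xs[np.argmax(ys)], "max value:", ys.max())
print("approx. minimiser:", xs[np.argmin(ys)], "min value:", ys.min())
# Expected: maximum near x = -1/sqrt(3), minimum near x = +1/sqrt(3).
```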
If we look at the function \(f(x) = \frac{1}{x}\) on the interval \(I = (0, 1]\), note this is not a compact interval because it is not closed. We can see that the function only has a global minimum at \(x = 1\), where \(f(1) = 1\), and no global maximum, because it approaches infinity as \(x\) approaches 0. So the function is not even bounded on this interval. This shows why the theorem requires the interval to be closed and bounded; otherwise the function may fail to have a global maximum or minimum.
If we look at the function \(f(x) = x^2\) on the interval \(I = [0, \infty)\), then we can see that the function has a global minimum at \(x = 0\) where \(f(0) = 0\) and no global maximum because it approaches infinity as \(x\) approaches infinity. So the function is not bounded on this interval.
Another example is the function \(f(x) = x\) on the interval \(I = [0, 1)\). The function has a global minimum at \(x = 0\), where \(f(0) = 0\), but no global maximum, since its values approach 1 as \(x\) approaches 1 without ever reaching it. So the function is bounded on this interval but does not attain its supremum. However, if we take the interval \(I = [0, 1]\), then the function has a global maximum at \(x = 1\), where \(f(1) = 1\), and a global minimum at \(x = 0\), where \(f(0) = 0\).
Inverse Function Theorem
The inverse function theorem states that if we have a continuous function \(f: I \to \mathbb{R}\) that is strictly monotonic on a closed interval \(I = [a, b]\), then it has an inverse function \(f^{-1}: f(I) \to I\) that is also continuous and strictly monotonic. More specifically, if \(f\) is strictly increasing then \(f^{-1}\) is also strictly increasing, and if \(f\) is strictly decreasing then \(f^{-1}\) is also strictly decreasing.
For there to be an inverse function \(f^{-1}\), the function \(f\) must be bijective, so it must be injective and surjective. Because the function is strictly monotonic, it is injective: a function is injective if it does not map two different input values to the same output value, and if \(x \neq y\) for \(x, y \in I\), then either \(x < y\) or \(x > y\), and therefore either \(f(x) < f(y)\) or \(f(x) > f(y)\) by strict monotonicity; in both cases \(f(x) \neq f(y)\).
To show the function is surjective onto its image is a bit more subtle: we need to show that for every \(y \in f(I)\) there exists an \(x \in I\) such that \(f(x) = y\), and more importantly, that the image \(f(I)\) is itself an interval. Say \(f\) is strictly increasing. By the intermediate value theorem, for every value \(y\) with \(f(a) < y < f(b)\) there exists a \(z \in (a, b)\) such that \(f(z) = y\). Together with the endpoints \(f(a)\) and \(f(b)\) themselves, this shows that \(f\) maps \([a, b]\) onto the closed interval \([f(a), f(b)]\), so \(f: I \to f(I)\) is surjective.
So because \(f\) is continuous and strictly monotonic on a closed interval, it is bijective onto its image, and therefore it has an inverse function \(f^{-1}: f(I) \to I\).
But what about the monotonicity of the inverse function? If \(f\) is strictly increasing, then for every \(x_1 < x_2\) we have \(f(x_1) < f(x_2)\). The inverse function must then also be strictly increasing: suppose \(y_1 < y_2\) but \(f^{-1}(y_1) \geq f^{-1}(y_2)\); applying the strictly increasing function \(f\) would give \(y_1 \geq y_2\), a contradiction. So the inverse function is also strictly increasing. The same argument holds for the case where \(f\) is strictly decreasing; then the inverse function is also strictly decreasing.
We now show that if \(f\) is continuous and strictly monotonic, then \(f^{-1}\) is also continuous on \(J = f(I)\). So let \(y_0 \in J\), and let \(x_0 = f^{-1}(y_0)\). We want to show that for every sequence \((y_n)\) in \(J\) such that \(y_n \to y_0\), we have \(f^{-1}(y_n) \to f^{-1}(y_0)\).
Suppose \(y_n \to y_0\) in \(J\), and let \(x_n = f^{-1}(y_n)\). We claim that \(x_n \to x_0\). If not, there would exist a subsequence \((x_{n_k})\) converging to some \(x^* \neq x_0\) (such a subsequence exists by Bolzano–Weierstrass, since \(I = [a, b]\) is closed and bounded, and its limit satisfies \(x^* \in I\)). But then by continuity \(f(x_{n_k}) \to f(x^*)\), so \(y_{n_k} \to f(x^*)\); since also \(y_{n_k} \to y_0 = f(x_0)\), we get \(f(x^*) = f(x_0)\). But \(f\) is injective, so \(x^* = x_0\), a contradiction. Thus \(x_n \to x_0\).
Therefore, \(f^{-1}\) is continuous at every point \(y_0 \in J\).
We can actually extend this result to open intervals and also to unbounded intervals.
If we look at the polynomial functions \(f: [0, \infty) \to [0, \infty)\) defined by \(f(x) = x^n\) for \(n \in \mathbb{N}\): because we are only looking at the interval \([0, \infty)\), the function is continuous and strictly increasing on this interval. Therefore, it has an inverse function \(f^{-1}(y) = y^{1/n}\), which is called the \(n\)-th root function. This function is also continuous on the interval \([0, \infty)\).
We already know that \(f\) is continuous because it is a polynomial function; it remains to show that it is strictly increasing. For this we show that for every \(x_1, x_2 \in [0, \infty)\) with \(x_1 < x_2\), we have \(f(x_1) < f(x_2)\). This is true because if \(0 \leq x_1 < x_2\), then \(x_1^n < x_2^n\) for all \(n \in \mathbb{N}\), so the function is strictly increasing.
Lastly we can show that it is surjective. For this we need to show that for every \(y \in [0, \infty)\) there exists an \(x \in [0, \infty)\) such that \(f(x) = y\) without using the root function as this is what we are trying to prove.
For \(y = 0\), we can take \(x = 0\), then \(f(0) = 0^n = 0\). For \(y > 0\), we can define \(g(x) = x^n - y\). Then \(g\) is continuous on \([0, \infty)\) and \(g(0) = -y < 0\). As \(x \to \infty\), \(x^n \to \infty\) so \(g(x) \to \infty\).
By the intermediate value theorem, since \(g(0) < 0\) and \(\lim_{x \to \infty} g(x) = +\infty\), and \(g(x)\) is continuous, there must exist some \(x_0 \in (0, \infty)\) such that \(g(x_0) = 0\), i.e. \(x_0^n = y\). Thus, for every \(y \in [0, \infty)\), there is \(x \in [0, \infty)\) with \(x^n = y\), so \(f(x) = x^n\) is surjective onto \([0, \infty)\).
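The surjectivity argument is itself constructive: grow an upper endpoint \(b\) until \(b^n \geq y\) (possible since \(x^n \to \infty\)), then bisect \(g(x) = x^n - y\) exactly as in the intermediate value theorem sketch earlier. A minimal sketch, with an arbitrary tolerance:

```python
def nth_root(y, n, tol=1e-12):
    """Approximate the n-th root of y >= 0 using the intermediate value theorem."""
    if y == 0:
        return 0.0
    b = 1.0
    while b**n < y:       # enlarge the interval until g(b) = b**n - y >= 0
        b *= 2
    a = 0.0
    while b - a > tol:    # bisect: keep the half on which g changes sign
        m = (a + b) / 2
        if m**n < y:
            a = m
        else:
            b = m
    return (a + b) / 2

print(nth_root(2.0, 2))   # roughly 1.414213..., the square root of 2
print(nth_root(27.0, 3))  # roughly 3.0
```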
Another example is the exponential function \(f: \mathbb{R} \to (0, \infty)\) defined by \(f(x) = e^x = \exp(x)\). We know we can represent the exponential function as an infinite series:
\[e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!} = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \ldots \]We can quickly see that \(\exp(0) = 1\), and that \(\exp(x) \geq 1 > 0\) for all \(x \geq 0\), because all the terms of the series are then non-negative. To handle negative \(x\), we can use the fact that \((\exp(a))(\exp(b)) = \exp(a + b)\) for all \(a, b \in \mathbb{R}\). If we take \(a = x\) and \(b = -x\), we have \(\exp(x) \exp(-x) = \exp(x - x) = \exp(0) = 1\). This means that \(\exp(-x) = \frac{1}{\exp(x)}\), so in particular the exponential function can never be zero, as otherwise the reciprocal would be undefined. Since \(\exp\) is continuous, positive at \(0\), and never zero, the intermediate value theorem gives \(\exp(x) > 0\) for all \(x \in \mathbb{R}\).
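The partial sums of this series converge quickly, so even a modest truncation approximates \(\exp\) well; here is a small sketch (the cutoff of 20 terms and the sample points are arbitrary choices):

```python
import math

def exp_series(x, terms=20):
    """Partial sum of the exponential series up to the given number of terms."""
    return sum(x**k / math.factorial(k) for k in range(terms))

for x in (-2.0, 0.0, 1.0, 3.0):
    print(f"x = {x:+.1f}   series = {exp_series(x):.6f}   math.exp = {math.exp(x):.6f}")
```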
Next we want to show that the exponential function is strictly increasing. For this we can use the fact that the derivative of the exponential function is itself, so \(\exp'(x) = e^x > 0\) for all \(x \in \mathbb{R}\). A function with a strictly positive derivative is strictly increasing: if \(x > y\) then \(\exp(x) > \exp(y)\). In particular, the function is injective.
Showing that it is surjective onto \((0, \infty)\) is more tedious, and showing that it is continuous even more so, so we do not go into the details here.
Now that we have defined \(\ln(x)\) as the inverse of \(e^x\) on \((0, \infty)\), we can prove its key properties. We’ll do this by translating statements about \(e^x\) into statements about \(\ln(x)\), using the definition \(e^{\ln(x)} = x\) for \(x > 0\) and \(\ln(e^y) = y\) for all \(y \in \mathbb{R}\).
- Logarithm of 1: \(\ln(1) = 0\), since \(e^0 = 1\).
- Logarithm of a product: \(\ln(xy) = \ln(x) + \ln(y)\) for all \(x, y > 0\), since \(e^{\ln(x) + \ln(y)} = e^{\ln(x)} \, e^{\ln(y)} = xy\).
- Logarithm of a quotient: \(\ln\left(\tfrac{x}{y}\right) = \ln(x) - \ln(y)\) for all \(x, y > 0\).
- Power as exponential: \(x^a = e^{a \ln(x)}\) for \(x > 0\) and \(a \in \mathbb{R}\).
- Logarithm of a power: \(\ln(x^a) = a \ln(x)\) for \(x > 0\) and \(a \in \mathbb{R}\).
- Exponent Laws: For \(x > 0\), \(a, b \in \mathbb{R}\):