Differentiable Functions
We know that for a linear function with one variable we have the standard equation:
\[y = mx + b \]Where \(m\) is the slope of the line and \(b\) is the y-intercept. For a line defined by a function \(f(x)\), finding the slope is easy: we just rearrange the equation into this standard form. And because it is a line, the slope is constant, i.e. it is the same for all points \(x_0\) in the domain of the function. However, what if we wanted to find the slope of a non-linear function at some point \(x_0\)? This is what the derivative is for.
Add image of a linear function with slope and y-intercept, and a non-linear function with a point \(x_0\) and the slope at that point.
First let’s just start with an approximation of the slope. For this we can define a secant line that goes through two points on the function. The slope of this secant line is then the so called difference quotient. So if we want to find the slope of the function \(f\) at the point \(x_0\), we can take a second point \(x_1\) and calculate the slope of the secant line that goes through the points \(P_0(x_0, f(x_0))\) and \(P_1(x_1, f(x_1))\). We can then define the slope of the secant line as the difference of the function values divided by the difference of the \(x\) values:
\[m = \frac{f(x_1) - f(x_0)}{x_1 - x_0} = \frac{\Delta f}{\Delta x} \]Hence it is called the difference quotient.
Add image of different secant lines with different \(x_1\) values.
Depending on the choice of \(x_1\) this will give us a different slope. We can see that if we choose \(x_1\) very close to \(x_0\), we get a better approximation of the slope at the point \(x_0\). This leads us to the next idea: finding the slope of the tangent line at the point \(x_0\). The slope of this tangent line is the derivative of the function at the point \(x_0\). We get the tangent line by taking the limit of the difference quotient as \(x_1\) approaches \(x_0\). A common way of writing this is to define the point \(x_1 = x_0 + \Delta x\) with a small change denoted by \(\Delta x\) or \(h\). To approach \(x_0\) we then let \(\Delta x\) or \(h\) go to zero. This gives us the following definition of the derivative:
\[m = \lim_{\Delta x \to 0} \frac{f(x_0 + \Delta x) - f(x_0)}{\Delta x} = \lim_{h \to 0} \frac{f(x_0 + h) - f(x_0)}{h} \]Add image of tangent line at point \(x_0\) with the slope defined by the limit of the difference quotient. What is actually the origin of secant and tangent lines? Probably from trigonometry.
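To make the limit concrete, here is a minimal numerical sketch in Python (the sample function \(f(x)=x^2\) and the point \(x_0=1\) are arbitrary choices for illustration) showing how the difference quotient settles towards the slope of the tangent line as \(h\) shrinks:

```python
def difference_quotient(f, x0, h):
    """Slope of the secant line through (x0, f(x0)) and (x0 + h, f(x0 + h))."""
    return (f(x0 + h) - f(x0)) / h

f = lambda x: x ** 2   # sample function, chosen only for illustration
x0 = 1.0               # point at which we approximate the slope

for h in [1.0, 0.1, 0.01, 0.001, 1e-6]:
    print(f"h = {h:<8}  difference quotient = {difference_quotient(f, x0, h):.6f}")
# The printed values approach 2.0, the slope of the tangent line of x^2 at x0 = 1.
```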
Using the derivative, which is just the slope of the tangent line, we can then find the equation of the tangent line at the point \(x_0\); all we are missing is the y-intercept \(b\). We get this by rearranging the equation of the tangent line into the point-slope form:
\[b = f(x_0) - m \cdot x_0 \]So the equation of the tangent line is:
\[T(x) = f(x_0) + m \cdot (x - x_0) \]Where \(m\) is the derivative of the function at the point \(x_0\). Now we can formally define differentiable functions. We say that a function \(f\) is differentiable at the point \(x_0\) if \(x_0\) is an accumulation point of the domain of \(f\) and the limit of the difference quotient exists, i.e. the derivative exists at the point \(x_0\):
\[f \text{ is differentiable at } x_0 \iff \lim_{h \to 0} \frac{f(x_0 + h) - f(x_0)}{h} \text{ exists} \]We denote this limit as \(f'(x_0)\) or \(\frac{df}{dx}(x_0)\), which is the derivative of the function \(f\) at the point \(x_0\). We say a function is differentiable on \(D\) if it is differentiable at every point in the domain \(D\). If this is the case then we can define the derivative function \(f'\) as:
\[f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} \quad \forall x \in D \]Let us look at the derivative of some common functions to get a better understanding of differentiable functions. The constant function is defined as follows:
\[f(x) = c \quad \forall x \in D \]We can calculate the derivative of this function at any point \(x_0\) in the domain \(D\):
\[\begin{align*} \lim_{x \to x_0} \frac{f(x) - f(x_0)}{x - x_0} &= \lim_{x \to x_0} \frac{c - c}{x - x_0} \\ &= \lim_{x \to x_0} \frac{0}{x - x_0} \\ &= 0 \end{align*} \]Therefore the derivative of the constant function for any \(c \in \mathbb{R}\) and any point \(x_0 \in D\) is:
\[f'(x_0) = 0 \]A (non-constant) linear function has the form
\[f(x) = mx + b \quad \forall x \in D \]We can find the derivative at an arbitrary point \(x_0 \in D\) as follows:
\[\begin{align*} \lim_{x \to x_0} \frac{f(x) - f(x_0)}{x - x_0} &= \lim_{x \to x_0} \frac{m x + b - (m x_0 + b)}{x - x_0} \\ &= \lim_{x \to x_0} \frac{m(x - x_0)}{x - x_0} \\ &= \lim_{x \to x_0} m = m. \end{align*} \]Hence the derivative of any linear function is the slope itself:
\[f'(x_0) = m. \]Because the slope is constant, the function is differentiable everywhere and \(f'(x)=m\) for all \(x \in D\).
Next we consider the simple quadratic (squared) function:
\[f(x) = x^{2} \quad \forall x \in D. \]To find the derivative at an arbitrary point \(x_0\in D\) we apply the limit definition:
\[\begin{align*} \lim_{x \to x_0} \frac{f(x) - f(x_0)}{x - x_0} &= \lim_{x \to x_0} \frac{x^{2} - x_0^{2}}{x - x_0} \\ &= \lim_{x \to x_0} \frac{(x - x_0)(x + x_0)}{x - x_0} \\ &= \lim_{x \to x_0} (x + x_0) \\ &= 2x_0. \end{align*} \]Hence the derivative of \(f(x)=x^{2}\) at any point \(x \in D\) is:
\[f'(x) = 2x. \]Because the tangent at any point \(x\) has slope \(2x\), the function is differentiable everywhere on its domain. An alternative approach would have been to use the formulation with \(h\) approaching zero instead of \(x\) approaching \(x_0\):
\[\begin{align*} \lim_{h \to 0} \frac{f(x_0 + h) - f(x_0)}{h} &= \lim_{h \to 0} \frac{(x_0 + h)^{2} - x_0^{2}}{h} \\ &= \lim_{h \to 0} \frac{x_0^{2} + 2x_0 h + h^{2} - x_0^{2}}{h} \\ &= \lim_{h \to 0} \frac{2x_0 h + h^{2}}{h} \\ &= \lim_{h \to 0} (2x_0 + h) \\ &= 2x_0. \end{align*} \]The absolute-value function is defined piecewise:
\[f(x)= \begin{cases} x & \text{if } x \geq 0,\\ -x & \text{if } x < 0. \end{cases} \]So to find the derivative we need to look at the two cases separately. We start with the case where \(x_0>0\):
\[\begin{align*} \lim_{x \to x_0} \frac{f(x)-f(x_0)}{x-x_0} &= \lim_{x \to x_0} \frac{x - x_0}{x - x_0} = 1. \end{align*} \]So \(f'(x_0)=1\). If we look at the case where \(x_0<0\):
\[\begin{align*} \lim_{x \to x_0} \frac{f(x)-f(x_0)}{x-x_0} &= \lim_{x \to x_0} \frac{-x + x_0}{x - x_0} = \lim_{x \to x_0} \frac{-(x - x_0)}{x - x_0} = -1. \end{align*} \]So \(f'(x_0)=-1\). Lastly we need to look at the case where \(x_0=0\). In this case we have to look at the left and right limits separately as
\[\lim_{x \to 0^{+}} \frac{f(x)-f(0)}{x} = 1, \qquad \lim_{x \to 0^{-}} \frac{f(x)-f(0)}{x} = -1, \]and because the two one-sided limits are not equal, the limit of the difference quotient as \(x\) approaches \(0\) does not exist, so the derivative also does not exist at \(x=0\). Putting this together we get the following piecewise definition of the derivative of the absolute value function:
\[f'(x)= \begin{cases} 1, & x>0,\\ -1, & x<0,\\ \text{undefined}, & x=0. \end{cases} \]Thus \(|x|\) is differentiable everywhere except for \(x=0\).
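A quick numerical check (a Python sketch) of the one-sided difference quotients of \(|x|\) at \(0\) makes the mismatch of the two one-sided limits visible:

```python
f = abs  # the absolute value function

for h in [0.1, 0.01, 0.001]:
    right = (f(0 + h) - f(0)) / h     # difference quotient approaching 0 from the right
    left = (f(0 - h) - f(0)) / (-h)   # difference quotient approaching 0 from the left
    print(f"h = {h}: right-sided = {right}, left-sided = {left}")
# Every right-sided quotient is +1 and every left-sided one is -1,
# so the two one-sided limits disagree and the derivative at 0 does not exist.
```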
Van der Waerden function. Everywhere continuous but nowhere differentiable.
Tangent as an Approximation
It turns out that the tangent line is actually a pretty good approximation of the function in the area around the point \(x_0\). This is useful because the tangent line is a simple linear function that we can easily work with and calculate with, whereas the original function might be more complex and expensive to compute. So we can approximate the function as follows:
\[f(x) = T(x) + R_{x_0}(x) \]Where \(T(x)\) is the tangent line at the point \(x_0\) and \(R_{x_0}(x)\) is the remainder term. If \(x \neq x_0\), then we can rewrite this as:
\[\begin{align*} f(x) &= T(x) + R_{x_0}(x) \\ f(x) &= (f(x_0) + f'(x_0)(x - x_0)) + R_{x_0}(x) \\ f(x) - f(x_0) &= f'(x_0)(x - x_0) + R_{x_0}(x) \\ \frac{f(x) - f(x_0)}{x - x_0} &= f'(x_0) + \frac{R_{x_0}(x)}{x - x_0} \\ \lim_{x \to x_0} \frac{f(x) - f(x_0)}{x - x_0} &= f'(x_0) + \lim_{x \to x_0} \frac{R_{x_0}(x)}{x - x_0} \\ f'(x_0) &= f'(x_0) + \lim_{x \to x_0} \frac{R_{x_0}(x)}{x - x_0} \\ \lim_{x \to x_0} \frac{R_{x_0}(x)}{x - x_0} &= 0 \end{align*} \]This means that as \(x\) approaches \(x_0\), the remainder term approaches zero faster than the difference \(x - x_0\) does. So for the limit to be zero, the remainder term must become small even compared to \(x - x_0\). This is a very useful property of differentiable functions and allows us to use the tangent line as an approximation of the function in the area around the point \(x_0\). If we define the function \(r\) based on the remainder term as follows:
\[r(x) = \begin{cases} \frac{R_{x_0}(x)}{x - x_0} & \text{if } x \neq x_0 \\ 0 & \text{if } x = x_0 \end{cases} \]Then because we have \(\lim_{x \to x_0} r(x) = 0 = r(x_0)\), the function \(r\) is continuous at the point \(x_0\): the limit is equal to the value, “filling” the original discontinuity hole. So if the function \(f\) is differentiable at \(x_0\), then by the definition there exists some \(m = f'(x_0)\) and a function \(r\) that is continuous at \(x_0\) with \(r(x_0) = 0\) such that:
\[f(x) = f(x_0) + f'(x_0)(x - x_0) + r(x)(x - x_0) \]where \(r(x_0) = 0\) and \(f(x_0) + f'(x_0)(x - x_0)\) is the equation of the tangent line at the point \(x_0\), written in the form \(mx + b\). This leads to the so-called Weierstrass differentiability criterion, which states that \(f\) is differentiable at \(x_0\) if and only if there exist some \(c \in \mathbb{R}\) and a function \(r(x)\) that is continuous at \(x_0\) with \(r(x_0) = 0\) such that the following holds:
\[f(x) = f(x_0) + c (x - x_0) + r(x)(x - x_0) \]Then \(f\) is differentiable at \(x_0\) with \(f'(x_0) = c\), and this \(c\) is unique. The uniqueness can be shown by assuming that there is another constant \(\tilde{c}\) with a matching remainder term \(s(x)\) and subtracting the two representations.
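The following Python sketch (using \(f = \exp\) and \(x_0 = 0\), both arbitrary choices for illustration) compares \(f\) with its tangent line \(T\) and shows that the quotient \(R_{x_0}(x)/(x - x_0)\) indeed shrinks to zero as \(x\) approaches \(x_0\):

```python
import math

f = math.exp        # sample function
f_prime = math.exp  # its derivative (exp' = exp, derived later in the text)
x0 = 0.0

def tangent(x):
    """Tangent line T(x) = f(x0) + f'(x0) * (x - x0)."""
    return f(x0) + f_prime(x0) * (x - x0)

for dx in [0.5, 0.1, 0.01, 0.001]:
    x = x0 + dx
    remainder = f(x) - tangent(x)   # R_{x0}(x)
    print(f"x - x0 = {dx:<6}  R = {remainder:.8f}  R / (x - x0) = {remainder / dx:.8f}")
# R / (x - x0) tends to 0, i.e. the tangent line is a good local approximation of f.
```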
Continuity of Differentiable Functions
Intuitively, differentiability is the “stronger” property than continuity: if you can zoom in far enough that a function looks perfectly linear, then its graph also can't have any jumps or holes at that point. However, we can also formally show this by using the definition of differentiability. Specifically, we can show that if a function \(f\) is differentiable at a point \(x_0\), then it is also continuous at that point, because then there exists some function \(g\) that is continuous at \(x_0\) such that the following holds:
\[f(x) = f(x_0) + g(x)(x - x_0) \]Where \(g\) is continuous at \(x_0\) with \(g(x_0)=f'(x_0)\), the derivative of \(f\) at the point \(x_0\) (this is exactly the decomposition from the Weierstrass criterion with \(g(x) = f'(x_0) + r(x)\)). For example, for the linear function \(f(x) = mx + b\), \(g\) is simply the constant function \(g(x) = m\), the slope of the tangent line. From this it follows that if \(x_0\) is an accumulation point of the domain and the derivative exists, then the function is continuous at that point, since \(\lim_{x \to x_0} f(x) = f(x_0) + g(x_0) \cdot 0 = f(x_0)\).
\[f \text{ is differentiable at } x_0 \implies f \text{ is continuous at } x_0 \]Then if a function is differentiable for all \(x \in D\), then it is also continuous for all \(x \in D\):
\[f \text{ is differentiable on } D \implies f \text{ is continuous on } D \]However, the converse is not true, i.e. a function can be continuous at a point but not differentiable at that point. A common example of this is the absolute value function which is continuous everywhere as \(\lim_{x \to 0} |x| = 0 = |0|\) but not differentiable at \(x_0 = 0\) because the left- and right-sided limits of the difference quotient at that point are not equal:
\[\lim_{x \to 0^{+}} \frac{|x| - |0|}{x - 0} = 1 \neq -1 = \lim_{x \to 0^{-}} \frac{|x| - |0|}{x - 0} \]We already know that the exponential function is continuous everywhere. However, if we didn’t know this, we could show that it is differentiable at any point \(x_0\) in its domain which would then imply that it is also continuous at that point. We can find the derivative of the exponential function \(f(x) = e^x = \exp(x)\) at an arbitrary point \(x_0\) as follows. First we look at the difference:
\[\begin{align*} f(x_0 + h) - f(x_0) &= \exp(x_0 + h) - \exp(x_0) \\ &= \exp(x_0) \cdot \exp(h) - \exp(x_0) \\ &= \exp(x_0) (\exp(h) - 1) \end{align*} \]Then we can divide this by \(h\) and take the limit as \(h\) approaches zero:
\[\begin{align*} \lim_{h \to 0} \frac{f(x_0 + h) - f(x_0)}{h} &= \lim_{h \to 0} \frac{\exp(x_0) (\exp(h) - 1)}{h} \\ &= \exp(x_0) \cdot \lim_{h \to 0} \frac{\exp(h) - 1}{h} \\ &= \exp(x_0) \cdot 1 = \exp(x_0) \end{align*} \]So the derivative of the exponential function at any point \(x_0\) is the exponential function itself:
\[\exp'(x_0) = \exp(x_0) \]The tricky part here is showing that the limit \(\lim_{h \to 0} \frac{\exp(h) - 1}{h} = 1\) exists. This can be shown using the definition of the exponential function as a power series:
\[\begin{align*} \exp(h) = \sum_{n=0}^{\infty} \frac{h^n}{n!} = 1 + h + \frac{h^2}{2!} + \frac{h^3}{3!} + \ldots \\ \exp(h) - 1 = h + \frac{h^2}{2!} + \frac{h^3}{3!} + \ldots \\ \frac{\exp(h) - 1}{h} = 1 + \frac{h}{2!} + \frac{h^2}{3!} + \ldots \end{align*} \]As \(h\) approaches zero, every term except the leading \(1\) vanishes, so the limit is indeed \(1\). We can also look at the sine and cosine functions which are also continuous everywhere. We can find the derivative of the sine function at an arbitrary point \(x_0\) in its domain by using the useful addition formula \(\sin(a+b)=\sin a\cos b+\cos a\sin b\):
\[\begin{align*} \sin'(x_0) &=\lim_{h\to 0}\frac{\sin(x_0+h)-\sin x_0}{h} \\ &=\lim_{h\to 0}\frac{\sin x_0\cos h+\cos x_0\sin h-\sin x_0}{h}\\ &=\sin x_0\underbrace{\lim_{h\to 0}\frac{\cos h-1}{h}}_{=0} +\cos x_0\underbrace{\lim_{h\to 0}\frac{\sin h}{h}}_{=1}\\ &=\cos x_0 \end{align*} \]Key here is that \(\lim_{h\to 0}\frac{\sin h}{h}=1\) and \(\lim_{h\to 0}\frac{\cos h-1}{h}=0\), which again can be shown using their underlying power series definitions of the sine and cosine functions:
\[\begin{align*} \sin(h) = h - \frac{h^3}{3!} + \frac{h^5}{5!} - \ldots \\ \frac{\sin(h)}{h} = 1 - \frac{h^2}{3!} + \frac{h^4}{5!} - \ldots \\ \lim_{h \to 0} \frac{\sin(h)}{h} = 1 \text{ and } \\ \cos(h) = 1 - \frac{h^2}{2!} + \frac{h^4}{4!} - \ldots \\ \frac{\cos(h) - 1}{h} = -\frac{h}{2!} + \frac{h^3}{4!} - \ldots \\ \lim_{h \to 0} \frac{\cos(h) - 1}{h} = 0 \end{align*} \]Therefore we have:
\[\sin'(x) = \cos x \]Because the cosine function is defined for all \(x \in \mathbb{R}\), the sine function is differentiable everywhere on \(\mathbb{R}\) and thus continuous everywhere on \(\mathbb{R}\) as well.
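As a numerical sanity check, a symmetric variant of the difference quotient (a small Python sketch with an arbitrary step size \(h\)) reproduces \(\exp'(x) = \exp(x)\) and \(\sin'(x) = \cos(x)\) at a few sample points:

```python
import math

def numeric_derivative(f, x, h=1e-6):
    """Symmetric difference quotient (f(x + h) - f(x - h)) / (2h) approximating f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

for x in [0.0, 0.5, 1.0, 2.0]:
    print(f"x = {x}: exp' ~ {numeric_derivative(math.exp, x):.6f}  vs  exp(x) = {math.exp(x):.6f}")
    print(f"x = {x}: sin' ~ {numeric_derivative(math.sin, x):.6f}  vs  cos(x) = {math.cos(x):.6f}")
```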
Proceeding analogously for the cosine function we have:
\[\begin{align*} \cos'(x_0) &=\lim_{h\to 0}\frac{\cos(x_0+h)-\cos x_0}{h}\\ &=\lim_{h\to 0}\frac{\cos x_0\cos h-\sin x_0\sin h-\cos x_0}{h}\\ &=\cos x_0\underbrace{\lim_{h\to 0}\frac{\cos h-1}{h}}_{=0} -\sin x_0\underbrace{\lim_{h\to 0}\frac{\sin h}{h}}_{=1}\\ &=-\sin x_0 \end{align*} \]Therefore we have:
\[\cos'(x) = -\sin x \]Constant Rule
We have already seen in one of the examples that the derivative of a constant function is zero. This is a fundamental rule in calculus and is known as the constant rule.
\[f(x)=c \quad \Longrightarrow \quad f'(x)=0,\qquad \forall c \in\mathbb R \]Factor Rule
Another rule that is very useful is the factor rule. It states that if we have a function that is a constant multiple of another function, then the derivative of that function is just the constant multiplied by the derivative of the other function. More formally:
\[f(x)=c\cdot g(x) \quad \Longrightarrow \quad f'(x)=c \cdot g'(x),\qquad \forall c \in\mathbb R \]We can see this by looking at the limit definition of the derivative at any point \(x_0\) in the domain of \(g\),
\[\begin{align*} f'(x_0) &=\lim_{h\to 0}\frac{f(x_0+h)-f(x_0)}{h} \\ &=\lim_{h\to 0}\frac{c g(x_0+h)-c g(x_0)}{h} \\ &=c \lim_{h\to 0}\frac{g(x_0+h)-g(x_0)}{h} \\ &=c g'(x_0) \end{align*} \]As an example, let’s look at the derivative of the following function:
\[f(x) = 5\sin x \]Because 5 is a constant it just gets factored out of the limit and we are left with the derivative of the sine function, which we already know is \(\cos x\). So we can write:
\[\begin{align*} f'(x) &= \lim_{h\to 0}\frac{5\sin(x+h)-5\sin x}{h} \\ &= 5\lim_{h\to 0}\frac{\sin(x+h)-\sin x}{h} \\ &= 5\cos x \end{align*} \]Summation Rule
The summation rule states that the derivative of the sum of two functions is the sum of the derivatives of those functions. More formally, if we have two functions \(a\) and \(b\), then:
\[f(x)=a(x)\pm b(x)\quad\Longrightarrow\quad f'(x)=a'(x)\pm b'(x) \]Again, we can see this by looking at the limit definition of the derivative at any point \(x_0\) in the domain of \(a\) and \(b\):
\[\begin{align*} f'(x_0) &=\lim_{h\to 0}\frac{a(x_0+h)\!\pm\! b(x_0+h)-[a(x_0)\!\pm\! b(x_0)]}{h}\\ &=\lim_{h\to 0}\left[\frac{a(x_0+h)-a(x_0)}{h}\right] \pm \lim_{h\to 0}\left[\frac{b(x_0+h)-b(x_0)}{h}\right]\\ &=a'(x_0)\pm b'(x_0) \end{align*} \]Let us look at the following function:
\[f(x)=3x^{2}+5x-7 \]We already know that the derivative of \(x^2\) is \(2x\) and the derivative of \(x\) is \(1\). So we can apply the summation rule along with the factor rule to find the derivative of \(f\):
\[\begin{align*} f'(x) &= 3\cdot 2x + 5\cdot 1 + 0 \\ &= 6x + 5 \end{align*} \]Product Rule
We often need to differentiate the product of two functions. The product rule states that if \(f(x)\) is the product of two functions \(a(x)\) and \(b(x)\), then the derivative of \(f\) is given by:
\[f(x)=a(x)b(x) \quad\Longrightarrow\quad f'(x)=a'(x)b(x)+a(x)b'(x) \]Again we can derive this by looking at the limit definition of the derivative at any point \(x_0\) in the domain of \(a\) and \(b\):
\[\begin{align*} f'(x_0) &=\lim_{h\to 0}\frac{a(x_0+h)b(x_0+h)-a(x_0)b(x_0)}{h}\\ &=\lim_{h\to 0}\frac{a(x_0+h)b(x_0+h)-a(x_0)b(x_0+h) +a(x_0)b(x_0+h)-a(x_0)b(x_0)}{h}\\ &=\lim_{h\to 0}\bigg[ b(x_0+h)\frac{a(x_0+h)-a(x_0)}{h} +a(x_0)\frac{b(x_0+h)-b(x_0)}{h}\bigg]\\ &=b(x_0)a'(x_0)+a(x_0)b'(x_0) \end{align*} \]We can apply the product rule to find the derivative of the following function:
\[f(x)=x^{2}\sin x. \]We can identify \(a(x)=x^{2}\) and \(b(x)=\sin x\). We already know that the derivative of \(x^{2}\) is \(2x\) and the derivative of \(\sin x\) is \(\cos x\). So we can apply the product rule to get the following:
\[\begin{align*} f'(x) &= a'(x)b(x) + a(x)b'(x) \\ &= 2x\sin x + x^{2}\cos x. \end{align*} \]Quotient Rule
Just like we have the product rule for the product of two functions, we also have a quotient rule for the quotient of two functions. So if \(f(x)\) is the quotient of two functions \(a(x)\) and \(b(x)\) and \(b(x) \neq 0\) then the derivative of \(f\) is given by:
\[f(x)=\frac{a(x)}{b(x)} \quad\Longrightarrow\quad f'(x)=\frac{a'(x)b(x)-a(x)b'(x)}{[b(x)]^{2}} \]Again we can derive this at any point in the domain of \(a\) and \(b\) where \(b(x) \neq 0\). Importantly, we write \(f = a \cdot b^{-1}\) and use the product rule together with the derivative of the reciprocal function, \((b^{-1})' = -b^{-2}\, b'\):
\[\begin{align*} f'(x) &=a'(x)b^{-1}(x)+a(x)(b^{-1}(x))'\\ &=a'(x)b^{-1}(x)-a(x)b^{-2}(x)b'(x)\\ &=\frac{a'(x)}{b(x)}-\frac{a(x)b'(x)}{[b(x)]^{2}}\\ &=\frac{a'(x)b(x)-a(x)b'(x)}{[b(x)]^{2}} \end{align*} \]The tangent function is defined as the quotient of the sine and cosine functions:
\[f(x)=\tan x=\frac{\sin x}{\cos x}. \]So we can apply the quotient rule to find the derivative of the tangent function with our known derivatives of the sine and cosine functions:
\[\begin{align*} f'(x) &=\frac{\cos x \cdot \cos x - \sin x \cdot (-\sin x)}{(\cos x)^{2}} \\ &=\frac{\cos^{2} x + \sin^{2} x}{(\cos x)^{2}} \\ &=\frac{1}{(\cos x)^{2}} = \sec^{2} x. \end{align*} \]Note that we used the Pythagorean identity \(\sin^{2} x + \cos^{2} x = 1\) to simplify the expression and that the tangent function is not defined for \(x = (2k + 1)\frac{\pi}{2}\) where \(k \in \mathbb{Z}\) because the cosine function is zero at these points, making the denominator zero.
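If SymPy is available, the product and quotient rule results from above can be double-checked symbolically (a sketch only; the zero outputs confirm that the computed derivatives agree with our hand-derived formulas):

```python
import sympy as sp

x = sp.symbols('x')

# Product rule example: f(x) = x**2 * sin(x), hand result 2x sin(x) + x**2 cos(x)
f = x ** 2 * sp.sin(x)
print(sp.simplify(sp.diff(f, x) - (2 * x * sp.sin(x) + x ** 2 * sp.cos(x))))  # 0

# Quotient rule example: tan(x) = sin(x) / cos(x), hand result 1 / cos(x)**2
g = sp.sin(x) / sp.cos(x)
print(sp.simplify(sp.diff(g, x) - 1 / sp.cos(x) ** 2))                        # 0
```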
We can then do the same for the cotangent function which is defined as the quotient of the cosine and sine functions. Here again we can apply the quotient rule with our known derivatives of the sine and cosine functions and note that the cotangent function is not defined for \(x = k\pi\) where \(k \in \mathbb{Z}\) because the sine function is zero at these points, making the denominator zero:
\[f(x)=\cot x=\frac{\cos x}{\sin x}. \]Which gives us:
\[\begin{align*} f'(x) &=\frac{\sin x \cdot (-\sin x) - \cos x \cdot \cos x}{(\sin x)^{2}} \\ &=\frac{-\sin^{2} x - \cos^{2} x}{(\sin x)^{2}} \\ &=-\frac{1}{(\sin x)^{2}} = -\csc^{2} x. \end{align*} \]Power Rule
We have already seen the power rule in effect when we looked at the derivative of the quadratic function \(f(x) = x^2\). The power rule states that if we have a function that is a power of \(x\), i.e. \(f(x) = x^n\) where \(n\) is a positive integer, then the derivative of that function is as follows:
\[f(x)=x^{n} \quad \Longrightarrow \quad f'(x)=nx^{n-1} \quad \forall n \in \mathbb{N} \]You can remember this rule by thinking of the exponent as a coefficient that gets multiplied in front of the term and then the exponent gets reduced by one. To show this, we can use a proof by induction on the positive integer \(n\). We have already seen that it holds for the base cases where \(n=0\), \(n=1\) and \(n=2\). So next we assume that the rule holds for some positive integer \(n=k\), i.e. \(f(x) = x^k\) and \(f'(x) = kx^{k-1}\). Then we need to show that it also holds for \(n=k+1\):
\[\begin{align*} x^{k+1} &= x \cdot x^{k} \\ (x^{k+1})' &= (x \cdot x^{k})' \\ &= x' \cdot x^{k} + x \cdot (x^{k})' \\ &= 1 \cdot x^{k} + x \cdot k x^{k-1} \\ &= x^{k} + kx^{k} \\ &= (k+1)x^{k}. \end{align*} \]Here we used the product rule and the induction hypothesis for \((x^{k})'\). Thus the rule holds for all positive integers \(n \in \mathbb N\). Once we have seen the chain rule, we can extend this to all real numbers \(n \in \mathbb R\) using the exponential function.
Let us look at the following function:
\[f(x) = x^{5}. \]We can apply the power rule to find the derivative of this function to get:
\[f'(x) = 5x^{4}. \]Inverse Function Rule
We have already seen some examples of inverse functions. The inverse function rule generalizes differentiating them to any invertible function. It states that if \(f\) is a bijective function with inverse \(f^{-1}\), the derivative of \(f\) exists at a point \(x_0\) in its domain with \(f'(x_0) \neq 0\), and \(f^{-1}\) is continuous at the point \(y_0 = f(x_0)\), which is an accumulation point of the codomain, then the derivative of the inverse function \(f^{-1}\) at the point \(y_0\) exists and is given by:
\[f^{-1}(y_0) = x_0 \quad \Longrightarrow \quad (f^{-1})'(y_0) = \frac{1}{f'(x_0)}. \]We have already seen the derivative of \(f(x) = x^2\), which is \(f'(x) = 2x\). Now let’s find the derivative of its inverse function \(f^{-1}(y) = \sqrt{y}\):
\[\begin{align*} f^{-1}(y) &= \sqrt{y} \\ (f^{-1})'(y) &= \frac{1}{f'(x_0)} = \frac{1}{2x_0} \\ &= \frac{1}{2\sqrt{y}}. \end{align*} \]Here we used \(x_0 = f^{-1}(y) = \sqrt{y}\). We have seen the derivative of the exponential function \(f(x) = e^x\), which is \(f'(x) = e^x\). Now let’s find the derivative of its inverse function \(f^{-1}(y) = \ln(y)\):
\[\begin{align*} f^{-1}(y) &= \ln(y) \\ (f^{-1})'(y) &= \frac{1}{f'(x_0)} \\ &= \frac{1}{e^{\ln(y)}} \\ &= \frac{1}{y}. \end{align*} \]Chain Rule
The chain rule is probably the most important but also the most complex rule for differentiation. It allows us to differentiate composite functions, i.e. functions that are composed of other functions. This has many applications and is for example key to the backpropagation algorithm in machine learning. The chain rule states that if we have functions \(h: D \to E\) and \(g: E \to \mathbb{R}\) such that \(x_0\) is an accumulation point of \(D\) and \(h(x_0)\) is an accumulation point of \(E\), then the derivative of the composition \(g \circ h: D \to \mathbb{R}\) is given by:
\[f(x)=g(h(x)) \quad \Longrightarrow \quad f'(x)=g'(h(x))\cdot h'(x) \]or in other words:
\[(g \circ h)'(x) = g'(h(x)) \cdot h'(x) \]Proving this seems rather complicated. Also, what about the intuitive notation with \(dy\), \(dx\) etc. just being multiplied?
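One way to make the chain rule very concrete, and to see its connection to backpropagation, is a minimal forward-mode automatic differentiation sketch with dual numbers (the names here are made up for illustration, and only the few operations needed below are implemented). Every operation propagates the derivative part exactly according to the chain and product rules, so evaluating a function on a dual number returns both the value and the derivative; the composite used here is the same one worked through symbolically just below.

```python
import math

class Dual:
    """Dual number: value + deriv * eps with eps^2 = 0. The deriv part carries f'(x)."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # product rule: (a * b)' = a' * b + a * b'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__

def d_sin(u):  # chain rule: (sin(h(x)))' = cos(h(x)) * h'(x)
    return Dual(math.sin(u.value), math.cos(u.value) * u.deriv)

def d_exp(u):  # chain rule: (exp(h(x)))' = exp(h(x)) * h'(x)
    return Dual(math.exp(u.value), math.exp(u.value) * u.deriv)

# f(x) = exp(x^3 + sin(x))
def f(u):
    return d_exp(u * u * u + d_sin(u))

x0 = Dual(0.5, 1.0)  # seed derivative dx/dx = 1
result = f(x0)
print(result.value, result.deriv)

# analytic check: f'(x) = exp(x^3 + sin(x)) * (3x^2 + cos(x))
print(math.exp(0.5 ** 3 + math.sin(0.5)) * (3 * 0.5 ** 2 + math.cos(0.5)))
```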
Let’s look at the following function:
\[f(x) = \exp(x^3 + \sin x). \]Then we can set \(g(x) = \exp(x)\) and \(h(x) = x^3 + \sin x\). We can then apply the chain rule to find the derivative of \(f\):
\[\begin{align*} g(x) &= \exp(x) \\ g'(x) &= \exp(x) \\ h(x) &= x^3 + \sin x \\ h'(x) &= 3x^2 + \cos x \\ f'(x) &= g'(h(x)) \cdot h'(x) \\ &= \exp(x^3 + \sin x) \cdot (3x^2 + \cos x). \end{align*} \]Another example would be the following function:
\[f(x) = (17x^7 + x^5 + 2 + e^x)^{2025}. \]We can set \(g(x) = x^{2025}\) and \(h(x) = 17x^7 + x^5 + 2 + e^x\). Then we can apply the chain rule to find the derivative of \(f\):
\[\begin{align*} g(x) &= x^{2025} \\ g'(x) &= 2025x^{2024} \\ h(x) &= 17x^7 + x^5 + 2 + e^x \\ h'(x) &= 119x^6 + 5x^4 + e^x \\ f'(x) &= g'(h(x)) \cdot h'(x) \\ &= 2025(17x^7 + x^5 + 2 + e^x)^{2024} \cdot (119x^6 + 5x^4 + e^x). \end{align*} \]Power Rule with Chain Rule
The chain rule can be used to show that the power rule also holds for all real numbers \(n \in \mathbb R\) (for \(x > 0\), so that \(\ln(x)\) is defined). We can rewrite the power function as follows:
\[f(x) = x^n = \exp(n \cdot \ln(x)). \]We can set \(g(x) = \exp(x)\) and \(h(x) = n \cdot \ln(x)\) and then apply the chain rule to find the derivative of \(f\):
\[\begin{align*} g(x) &= \exp(x) \\ g'(x) &= \exp(x) \\ h(x) &= n \cdot \ln(x) \\ h'(x) &= n \cdot \frac{1}{x} \\ f'(x) &= g'(h(x)) \cdot h'(x) \\ &= \exp(n \cdot \ln(x)) \cdot n \cdot \frac{1}{x} \\ &= x^n \cdot n \cdot \frac{1}{x} \\ &= n \cdot x^{n-1}. \end{align*} \]So we can now use the power rule also for negative integers and real numbers. For example, if we have \(f(x) = \frac{1}{x^2}\), we can rewrite this as \(f(x) = x^{-2}\). First let’s find the derivative using the quotient rule:
\[\begin{align*} f'(x) &= \frac{a'(x)b(x) - a(x)b'(x)}{[b(x)]^2} \\ &= \frac{0 \cdot x^2 - 1 \cdot 2x}{(x^2)^2} \\ &= \frac{-2x}{x^4} \\ &= -\frac{2}{x^3}. \end{align*} \]If we use the power rule instead, we can rewrite the function as \(f(x) = x^{-2}\) and then apply the power rule:
\[\begin{align*} f'(x) &= -2 \cdot x^{-3} \\ &= -2 \cdot \frac{1}{x^3}. \end{align*} \]We can also use the power rule for fractional exponents in other words for roots of \(x\). For example, if we have \(f(x) = \sqrt[3]{x} = x^{1/3}\), we can apply the power rule to find the derivative:
\[\begin{align*} f(x) &= x^{1/3} \\ f'(x) &= \frac{1}{3} \cdot x^{1/3 - 1} \\ &= \frac{1}{3} \cdot x^{-2/3} \\ &= \frac{1}{3} \cdot \frac{1}{\sqrt[3]{x^2}}. \end{align*} \]Higher Order Derivatives
We have seen the definition of the derivative of a function at a point, which gives us the slope of the tangent line to the graph of the function at that point. We can extend this idea to higher order derivatives, which give us further information about how the function is changing at that point. If a function \(f: D \to \mathbb{R}\) is differentiable on \(D\), then we say that the function is n times differentiable for \(n \geq 2\) if the \((n-1)\)-th derivative of \(f\) is differentiable. We can then define the n-th derivative of \(f\) as the derivative of the \((n-1)\)-th derivative of \(f\):
\[f^{(n)}(x) = (f^{(n-1)})'(x). \]We further say that the function \(f\) is n times continuously differentiable if the n-th derivative of \(f\) is also continuous. Because these functions are very useful, we group them into classes based on the order of their derivatives. We denote the class of functions that are continuously differentiable once as \(C^1\), twice as \(C^2\), and so on up to \(C^k\) for \(k\) times continuously differentiable functions. This degree of continuity and differentiability is also often referred to as the smoothness of the function. The class of infinitely differentiable functions is denoted as \(C^\infty\), which we call the smooth functions. It obviously follows that if a function is \(C^k\), then it is also \(C^j\) for all \(j < k\) so we get an inclusion chain:
\[C^\infty \subset C^k \subset C^{k-1} \subset \ldots \subset C^2 \subset C^1 \subset C^0. \]Just like for the first derivative there are some rules for higher order derivatives. Specifically we have:
- Summation Rule: The n-th derivative of the sum of two functions is the sum of the n-th derivatives of those functions.
- Product Rule: The n-th derivative of the product of two n times differentiable functions can be computed using the so-called general Leibniz rule (a numerical check of it follows after this list): \[(a \cdot b)^{(n)}(x) = \sum_{k=0}^{n} \binom{n}{k} a^{(k)}(x)\, b^{(n-k)}(x) \]
- Quotient Rule (existence): If \(a\) and \(b\) are n times differentiable and \(b(x) \neq 0\) for all \(x \in D\), then the function \(f(x) = \frac{a(x)}{b(x)}\) is n times differentiable on \(D\).
- Chain Rule: If \(f: D \to E\) and \(g: E \to \mathbb{R}\) are both n times differentiable, then the composition \(g \circ f: D \to \mathbb{R}\) is also n times differentiable.
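As promised above, here is a small Python check of the general Leibniz rule (the concrete pair \(a(x)=e^x\), \(b(x)=x^3\) and the order \(n=3\) are arbitrary choices; the direct formula for \((x^3 e^x)'''\) was computed by hand for comparison):

```python
import math

n = 3

def a_deriv(k, x):
    """k-th derivative of a(x) = exp(x): every derivative is exp(x)."""
    return math.exp(x)

def b_deriv(k, x):
    """k-th derivative of b(x) = x**3: x**3, 3x**2, 6x, 6, then 0."""
    return [x ** 3, 3 * x ** 2, 6 * x, 6.0][k] if k <= 3 else 0.0

for x in [0.0, 1.0, 2.0]:
    leibniz = sum(math.comb(n, k) * a_deriv(k, x) * b_deriv(n - k, x) for k in range(n + 1))
    direct = math.exp(x) * (x ** 3 + 9 * x ** 2 + 18 * x + 6)  # (x^3 * e^x)''' by hand
    print(f"x = {x}: Leibniz sum = {leibniz:.6f}, direct third derivative = {direct:.6f}")
```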
Meaning of the Derivative
So far we have interpreted the derivative at a point \(x_0\) as the slope of the tangent line to the graph of \(f\) at that point. In geometric terms this tells us how the curve tilts there; in analytic terms it measures the instantaneous rate of change of the function value with respect to its input. This rate of change is crucial in understanding how functions behave and makes the derivative an important tool to extract information about the shape of a graph, locate extreme values, and understand key theorems of differential calculus.
Extreme Values and Critical Points
Intuitively, a graph must “level out” before it can turn around, i.e. before the slope of the graph goes from positive to negative (or vice versa). The places where this levelling out might happen are called critical points. So let \(f:D \to \mathbb R\) be defined on an interval containing \(x_0\); then the point \(x_0\) is called a critical point of \(f\) if either of the following conditions holds:
\[f'(x_0)=0 \quad\text{or}\quad f'(x_0)\text{ does not exist}. \]If \(f\) is continuous at a critical point, then such a critical point \(x_0\) is a candidate for a local extremum (maximum or minimum) of \(f\). You can think of a local extremum as a point where the graph “peaks” or “bottoms out”. It is called local because a function can have multiple such points in its domain, and they are only guaranteed to be the highest or lowest point within a small neighbourhood around the point. If the point is the highest or lowest point over the entire domain of the function, then it is called a global maximum or minimum (a global extremum).
So we formally say that a function \(f\) has a local maximum at \(x_0\) if there exists some \(\delta>0\) such that for all \(x\) in the interval \((x_0-\delta,x_0+\delta)\) we have \(f(x) \leq f(x_0)\).
Similarly, we say that \(f\) has a local minimum at \(x_0\) if there exists some \(\delta>0\) such that for all \(x\) in the interval \((x_0-\delta,x_0+\delta)\) we have \(f(x) \geq f(x_0)\).
So from this it follows that if \(f\) is differentiable at \(x_0\) and \(f'(x_0) > 0\), then there exists some \(\delta>0\) such that:
- for all \(x\) in the interval \((x_0-\delta,x_0)\), we have \(f(x) < f(x_0)\) (the function values just before \(x_0\) are smaller), and
- for all \(x\) in the interval \((x_0,x_0+\delta)\), we have \(f(x) > f(x_0)\) (the function values just after \(x_0\) are larger).
If we have \(f'(x_0) < 0\), then we can also similarly show that there exists some \(\delta>0\) such that:
- for all \(x\) in the interval \((x_0-\delta,x_0)\), we have \(f(x) > f(x_0)\) (the function values just before \(x_0\) are larger), and
- for all \(x\) in the interval \((x_0,x_0+\delta)\), we have \(f(x) < f(x_0)\) (the function values just after \(x_0\) are smaller).
From this it follows that if \(f\) is differentiable at an interior point \(x_0\) and has a local extremum there, then we must have \(f'(x_0)=0\): otherwise the observations above would give us both strictly larger and strictly smaller function values arbitrarily close to \(x_0\). The converse is not true, however: \(f'(x_0)=0\) alone does not guarantee a local extremum, and it also does not tell us whether a maximum or a minimum is present. For that we use the so-called first-derivative test below.
This proof comes from the definition of continuity and the fact that the function can't just drastically change at that point.
If we have a function \(f: D \to \mathbb{R}\) where \(D \subseteq \mathbb{R}\), then we say that \(f\) is continuous at a point \(x_0 \in D\) if for every \(\epsilon > 0\) there exists a \(\delta > 0\) such that for all \(x \in D\) the following holds:
\[|x - x_0| < \delta \implies |f(x) - f(x_0)| < \epsilon \]So in other words, the image in the delta-neighborhood of \(x_0\) is contained in the epsilon-neighborhood of \(f(x_0)\):
\[f((x_0 - \delta, x_0 + \delta)) \subseteq (f(x_0) - \epsilon, f(x_0) + \epsilon) \]First-Derivative Test
Suppose \(f\) is continuous at \(x_0\) and differentiable on \((x_0-\delta,x_0)\cup(x_0,x_0+\delta)\).
- If \(f'\) changes sign around \(x_0\) from positive (increasing) to negative (decreasing) at \(x_0\), then \(f\) attains a local maximum at \(x_0\).
- If \(f'\) changes sign from negative to positive, \(f\) attains a local minimum at \(x_0\).
- If \(f'\) does not change sign, \(x_0\) is not an extremum; the graph merely flattens out.
This test will lead us to the second-derivative test which gives us a more direct way to determine whether a critical point is a local maximum or minimum.
Add some images and examples
- the constant function and x^2 maybe
Rolle’s Theorem
Rolle’s theorem is a rather interesting but intuitive theorem that links the global behaviour of a function on an interval to the existence of a flat tangent line somewhere inside that interval. Specifically, it states that if a function is continuous on a closed interval and differentiable on the open interval, and if the function has the same value at the endpoints of the interval, then there exists at least one point in the interior of the interval where the derivative is zero, i.e. the tangent line is horizontal. Formally it can be stated as follows for a continuous function \(f:[a,b]\to\mathbb R\) that is differentiable on the open interval \((a,b)\):
\[f(a)=f(b) \quad\Longrightarrow\quad \exists c\in(a,b): f'(c)=0. \]Because \(f\) is continuous on a compact interval, it attains a maximum and a minimum somewhere in \([a,b]\) by the extreme value theorem. Let \(f(c)\) and \(f(d)\) be the maximum and minimum values, respectively, where \(c,d\in[a,b]\). We can distinguish two cases:
- If both the maximum and the minimum occur at the endpoints, then \(f(c)=f(d)\) because \(f(a)=f(b)\), therefore \(f\) is constant and \(f'(x)=0\) everywhere in \((a,b)\).
- Otherwise, at least one extremum occurs at an interior point \(c\). That point is critical, so \(f'(c)=0\).
Add some images and examples
- the constant function and x^2 maybe
Mean Value Theorem
We have already met Rolle’s theorem as the special case where the function assumes equal values at the endpoints. Removing that extra condition leads to another important theorem, the mean value theorem (MVT). It states that if a function is continuous on a closed interval and differentiable on the open interval, then there exists at least one point in the interior of the interval where the instantaneous slope (derivative) equals the average slope over the entire interval. Formally, for a continuous function \(f:[a,b]\to\mathbb R\) that is differentiable on the open interval \((a,b)\), there exists at least one point \(c\) in the interior of the interval \((a,b)\) such that:
\[f'(c)=\frac{f(b)-f(a)}{b-a}. \]The proof uses Rolle’s theorem: apply it to the auxiliary function \(\varphi(x) = f(x) - \frac{f(b)-f(a)}{b-a}(x-a)\), which is continuous on \([a,b]\), differentiable on \((a,b)\) and satisfies \(\varphi(a)=\varphi(b)=f(a)\). Rolle’s theorem then gives a point \(c \in (a,b)\) with \(\varphi'(c)=0\), which is exactly \(f'(c)=\frac{f(b)-f(a)}{b-a}\).
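For a concrete function, such a point \(c\) can also be located numerically. A minimal sketch in Python (the function \(f(x)=x^3\) on \([0,2]\) is an arbitrary example; since \(f'\) is increasing here, a simple bisection on \(f'(x)\) minus the average slope works):

```python
f = lambda x: x ** 3            # sample function on [a, b]
f_prime = lambda x: 3 * x ** 2  # its derivative
a, b = 0.0, 2.0

average_slope = (f(b) - f(a)) / (b - a)  # slope of the chord, here 4.0

# Bisection for a point where f'(x) equals the average slope on (a, b)
lo, hi = a, b
for _ in range(60):
    mid = (lo + hi) / 2
    if f_prime(mid) < average_slope:
        lo = mid
    else:
        hi = mid

c = (lo + hi) / 2
print(c, f_prime(c), average_slope)  # c ~ 2 / sqrt(3) ~ 1.1547 with f'(c) ~ 4.0
```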
From this it follows that the MVT is a generalization of Rolle’s theorem, and it can be used to find points where the instantaneous rate of change matches the average rate of change over an interval. One such example is for the constant function, where the average slope is zero, and therefore the derivative is also zero at every point in the interval. We can turn this around to get:
\[f'(x)=0 \forall x\in(a,b) \quad\Longrightarrow\quad f \text{ is constant on }[a,b]. \]Add some images, example and the proof
Another clear consequence of the MVT is that if we have two functions \(f,g\) that are continuous on \([a,b]\) and differentiable on \((a,b)\), then the following holds:
\[f'(x)=g'(x) \quad \forall x\in(a,b) \quad\Longrightarrow\quad f(x) = g(x) + C \text{ for some constant } C. \]So if they have the same derivative at all points in the interval, then they are just shifted versions of each other. The easiest way to see this is to think of two lines with the same slope but different intercepts.
Add some images, example and the proof
Another consequence of the MVT is that if \(f'(x) \geq 0\) for all \(x \in (a,b)\), then the function is monotonically increasing on the interval \([a,b]\), and if \(f'(x) \leq 0\) for all \(x \in (a,b)\), then the function is monotonically decreasing on the interval \([a,b]\). This can be stated as follows:
\[\begin{align*} f'(x)\geq 0 \forall x\in(a,b) &&\Longrightarrow && f \text{ is monotonically increasing on }[a,b],\\ f'(x)\leq 0 \forall x\in(a,b) &&\Longrightarrow && f \text{ is monotonically decreasing on }[a,b]. \end{align*} \]We can also extend this to strict monotonicity, i.e. if \(f'(x) > 0\) for all \(x \in (a,b)\), then the function is strictly increasing on the interval \([a,b]\), and if \(f'(x) < 0\) for all \(x \in (a,b)\), then the function is strictly decreasing on the interval \([a,b]\). However, it is important to note that this is only an implication, not an equivalence. We can have a strictly increasing function where the derivative is not greater than zero at all points, for example \(f(x) = x^3\) is strictly increasing on \(\mathbb{R}\), but \(f'(0) = 0\).
Lastly we can also use the MVT to derive some useful estimates for the function values based on the derivative. This is called the Lipschitz estimate. It states that if the derivative of a function is bounded by some constant \(M\) on an interval, then the function is Lipschitz continuous on that interval. This means that the function does not oscillate too wildly and can be controlled by a linear function. More formally we can say that if \(f\) is continuous on \([a,b]\) and differentiable on \((a,b)\), and if the derivative is bounded by some constant \(M\), i.e. \(|f'(x)| \leq M\) for all \(x \in (a,b)\), then for all \(x,y \in [a,b]\) we have:
\[|f(x)-f(y)|\leq M\,|x-y|. \]Convex and Concave Functions
Concave and convex functions are important concepts in calculus and optimization, as they describe the curvature of a function’s graph. Especially in optimization, convex functions have nice properties that make them easier to work with, such as having a unique global minimum. Concave functions, on the other hand, are useful for modeling situations where we want to maximize a quantity.
You can think of a convex function as a “bowl” shape that opens upwards, while a concave function is like an “arch” shape that opens downwards. The key property of these functions is how they relate to their chords, which are straight lines connecting two points on the graph. Intuitively a convex graph never lies above its chords, whereas a concave graph never lies below them. We can define a function \(f:I \to \mathbb{R}\) to be convex or concave on an interval \(I\) based on how it behaves with respect to these chords. A function is convex on an interval \(I\) if for any two points \(x_0, x_1 \in I\) and any \(t \in [0, 1]\), the following holds:
\[f\bigl((1-t)x_0+tx_1\bigr)\leq(1-t)f(x_0)+tf(x_1). \]The idea is that the function value at a linear combination of \(x_0\) and \(x_1\) should be less than or equal to the corresponding linear combination of the function values at those points. This means that the graph of the function lies below (or on) the chord connecting \((x_0, f(x_0))\) and \((x_1, f(x_1))\). The \(t\) parameter represents a point on the line segment between \(x_0\) and \(x_1\), where \(t=0\) corresponds to \(x_0\) and \(t=1\) corresponds to \(x_1\).
If we replace the inequality with a strict inequality (for \(x_0 \neq x_1\) and \(t \in (0,1)\)), we get the definition of a strictly convex function. Similarly, a function is concave on an interval \(I\) if for any two points \(x_0, x_1 \in I\) and any \(t \in [0, 1]\), the following holds:
\[f\bigl((1-t)x_0+tx_1\bigr)\geq(1-t)f(x_0)+tf(x_1). \]Add some images
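A sample-based check of the chord inequality can be written in a few lines of Python (a sketch only; checking a grid of \(t\) values can of course only give evidence, not a proof of convexity):

```python
import math

def satisfies_chord_inequality(f, x0, x1, num_t=101):
    """Check f((1-t) x0 + t x1) <= (1-t) f(x0) + t f(x1) on a grid of t in [0, 1]."""
    for i in range(num_t):
        t = i / (num_t - 1)
        lhs = f((1 - t) * x0 + t * x1)
        rhs = (1 - t) * f(x0) + t * f(x1)
        if lhs > rhs + 1e-12:
            return False
    return True

print(satisfies_chord_inequality(lambda x: x ** 2, -1.0, 2.0))  # True:  x^2 is convex
print(satisfies_chord_inequality(math.exp, -1.0, 2.0))          # True:  exp is convex
print(satisfies_chord_inequality(math.log, 0.5, 3.0))           # False: ln is concave
```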
As already mentioned, every local minimum of a convex function is a global minimum (and for a strictly convex function the minimizer is unique); analogously, every local maximum of a concave function is a global maximum. However, on an open interval a convex or concave function does not necessarily attain a minimum or maximum at all. This is because the endpoints of the interval are not included, so the function can keep decreasing or increasing towards the boundary without ever reaching a smallest or largest value.
Because a convex function is “bowl-shaped” and curves upwards, the slope of the function is increasing as we move from left to right. This leads us to the following tests for convexity and concavity of a differentiable function \(f\) on \(I\), based on the behavior of the derivative:
\[\begin{align*} f\text{ convex }\Longleftrightarrow f' \text{ is monotonically increasing on }I,\\ f\text{ concave }\Longleftrightarrow f' \text{ is monotonically decreasing on }I. \end{align*} \]There must be some proof.
This is what leads to the second-derivative test for convexity and concavity. If the second derivative of a function is positive, then the function is convex, and if the second derivative is negative, then the function is concave. If the second derivative is zero, then the test is inconclusive, and we cannot determine whether the function is convex or concave. We can summarize this as follows:
\[\begin{align*} f''(x)\geq 0 \quad \forall x \in I \quad &\Longrightarrow \quad f\text{ is convex},\\ f''(x)\leq 0 \quad \forall x \in I \quad &\Longrightarrow \quad f\text{ is concave}. \end{align*} \]Accordingly, if the second derivative is strictly positive, then the function is strictly convex, and if it is strictly negative, then the function is strictly concave.
Add an illustration showing \(f''>0\) (curving upward) versus \(f''<0\) (curving downward).
However, it is important to note that this test only applies if the function is twice differentiable, i.e. the second derivative exists. If the function is not twice differentiable, then we cannot use this test to determine convexity or concavity. So we can still have convex functions for which the above test does not hold. From this it also follows that if a function is convex or concave then it does not necessarily mean that the second derivative is positive or negative. However, if the second derivative exists and is positive or negative, then the function is convex or concave, respectively.
We can now use this to derive the second derivative test for local extrema. If we have a critical point \(x_0\) where the first derivative is zero, i.e. \(f'(x_0) = 0\), then we can use the second derivative to determine whether it is a local minimum or maximum. The intuition behind this is rather obvious. If we know it is a critical point, then we just need to check the curvature of the graph at that point. If the second derivative is positive so \(f''(x_0) > 0\), then the graph is curving upwards, and therefore \(x_0\) is a local minimum. If the second derivative is negative so \(f''(x_0) < 0\), then the graph is curving downwards, and therefore \(x_0\) is a local maximum. If the second derivative is zero, then we cannot determine whether it is a local extremum or not, and we need to use other methods to analyze the function further.
An interesting property of convex and concave functions is that they are closed under certain operations. This means that if we apply certain operations to convex or concave functions, the result will still be a convex or concave function:
- Scalar multiplication: If \(f\) is convex and \(\lambda \geq 0\), then \(\lambda f\) is still convex; likewise, a concave function stays concave when multiplied by \(\lambda \geq 0\). So as long as the scalar is non-negative, the curvature is preserved. Multiplying by a negative scalar flips the curvature: if \(f\) is convex, then \(\lambda f\) with \(\lambda < 0\) is concave, and vice versa.
- Sum: The sum of two convex (or concave) functions remains convex (or concave). Strictness is also preserved if at least one summand is strictly convex/concave.
\(ax+b\) is both convex and concave, but not strictly convex or concave.
\(x^2\) is convex, actually even strictly convex.
\(e^x\) is convex, actually even strictly convex.
\(\ln(x)\) is concave, actually even strictly concave.
Saddle Points
We have already seen that if \(f'(x_0) = 0\) and \(f''(x_0) > 0\), then the graph is locally convex around \(x_0\) and \(x_0\) is a strict local minimum. Similarly, if \(f'(x_0) = 0\) and \(f''(x_0) < 0\), then the graph is locally concave around \(x_0\) and \(x_0\) is a strict local maximum.
But what happens when \(f''(x_0)=0\)? The point can still be an extremum, but it can also be neither a maximum nor a minimum. If it isn’t an extremum, we call the point a saddle point (or horizontal inflection). A saddle point is a point where the graph flattens out but does not change direction, meaning it goes from increasing to flat to increasing again (or vice versa).
So an inflection point is any point where the curvature changes sign (that is, where \(f''\) changes sign). If in addition \(f'(x_0)=0\), the inflection is horizontal and therefore a saddle point. If \(f'(x_0)\neq 0\), the graph still bends but does not flatten.
Let’s look at \(f(x) = x^3\).
The first derivative is \(f'(x) = 3x^2\), which is zero at \(x=0\). The second derivative is \(f''(x) = 6x\), which is also zero at \(x=0\). However, the third derivative is \(f'''(x) = 6\), which is positive. From the graph of the function we can see that at \(x=0\) the graph flattens out, but it does not change direction. The graph is increasing on both sides of \(x=0\), so this point is a saddle point.
Then let’s also look at \(f(x) = x^4\).
The first derivative is \(f'(x) = 4x^3\), which is zero at \(x=0\). The second derivative is \(f''(x) = 12x^2\), which is also zero at \(x=0\). However, the third derivative is \(f'''(x) = 24x\), which is also zero at \(x=0\). The fourth derivative is \(f^{(4)}(x) = 24\), which is positive. From the graph of the function we can see that at \(x=0\) the graph flattens out and then changes direction: it is decreasing to the left of \(x=0\) and increasing to the right, so this point is a strict local minimum.
Higher-order Derivative Test
From our small example above with the saddle point and local extremum we can see that the second derivative test is not always sufficient to determine whether a point is a saddle point or a local extremum. In fact, we can use higher-order derivatives to determine the nature of the critical point. This is known as the higher-order derivative test. The idea is to look at the first non-zero derivative at the critical point and its order to determine whether the point is a saddle point or a local extremum. More formally, if we have a point \(x_0\) where the function \(f\) is \(C^{k+1}\) (i.e. it has \(k+1\) continuous derivatives) and the first \(k\) derivatives are zero at that point, i.e.
\[f'(x_0)=f''(x_0)=\dots=f^{(k)}(x_0)=0 \]Then we can look at the \((k+1)\)-th derivative \(f^{(k+1)}(x_0)\) to determine the nature of the critical point:
- If \(k\) is odd and \(f^{(k+1)}(x_0) > 0\), then \(x_0\) is a strict local minimum. If \(f^{(k+1)}(x_0) < 0\), then \(x_0\) is a strict local maximum.
- If \(k\) is even and \(f^{(k+1)}(x_0) \neq 0\), then \(x_0\) is not a local extremum but a saddle point. Turned around: if \(k\) is even and \(x_0\) is a local extremum, then it follows that \(f^{(k+1)}(x_0) = 0\).
Can this be shown?
Matching this up with the previous examples: for \(f(x)=x^3\) at \(x_0=0\) the first non-zero derivative is \(f'''(0)=6\), so \(k=2\) is even and the point is a saddle point; for \(f(x)=x^4\) at \(x_0=0\) the first non-zero derivative is \(f^{(4)}(0)=24>0\), so \(k=3\) is odd and the point is a strict local minimum.
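Given the values of the successive derivatives at a critical point, the test is easy to mechanize. A Python sketch (the function and variable names are made up for illustration; the list is assumed to start with \(f'(x_0)=0\) and to come from a sufficiently smooth function):

```python
def classify_critical_point(derivs_at_x0):
    """Higher-order derivative test.

    derivs_at_x0: [f'(x0), f''(x0), f'''(x0), ...] at a critical point x0.
    The order of the first non-zero derivative decides the nature of the point.
    """
    assert derivs_at_x0[0] == 0, "x0 must be a critical point, i.e. f'(x0) = 0"
    for order, value in enumerate(derivs_at_x0, start=1):
        if value != 0:
            if order % 2 == 0:  # first non-zero derivative has even order -> extremum
                return "strict local minimum" if value > 0 else "strict local maximum"
            return "saddle point"  # first non-zero derivative has odd order (>= 3)
    return "inconclusive: all supplied derivatives are zero"

print(classify_critical_point([0, 0, 6]))      # f(x) = x^3 at 0 -> saddle point
print(classify_critical_point([0, 0, 0, 24]))  # f(x) = x^4 at 0 -> strict local minimum
```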
L’Hôpital’s Rule
L’Hôpital’s rule is a powerful tool in calculus for evaluating limits of functions that initially yield indeterminate forms, such as \(\frac{0}{0}\) or \(\frac{\infty}{\infty}\). It allows us to compute the limit of a ratio of two functions by taking the limit of the ratio of their derivatives instead. This rule is particularly useful when dealing with limits that involve functions that are difficult to evaluate directly. However, before we look at L’Hôpital’s rule, we need to understand a stronger version of the Mean Value Theorem, the Extended Mean Value Theorem (also known as Cauchy’s Mean Value Theorem). The mean value theorem stated the following: if \(f\) is continuous on the closed interval \([a,b]\) and differentiable on the open interval \((a,b)\), then there exists at least one point \(\xi\) in the open interval \((a,b)\) such that:
\[f'(\xi) = \frac{f(b) - f(a)}{b - a}. \]The extension of the theorem is to two functions, \(f\) and \(g\), that are continuous on the closed interval \([a,b]\) and differentiable on the open interval \((a,b)\). The theorem states that if \(g'(x) \neq 0\) for all \(x \in (a,b)\), then there exists at least one point \(\xi\) in the open interval \((a,b)\) such that:
\[g'(\xi)(f(b) - f(a)) = f'(\xi)(g(b) - g(a)). \]If we assume that \(g(b) \neq g(a)\), we can rearrange this to obtain:
\[\frac{f(b) - f(a)}{g(b) - g(a)} = \frac{f'(\xi)}{g'(\xi)}. \]What is the intuitive meaning of this? What is the interpretation?
We can prove this by using Rolle’s theorem. We define a new function \(h\) based on the functions \(f\) and \(g\) as follows:
\[h(x) = (g(b) - g(a))f(x) - (f(b) - f(a))g(x). \]Then we can see that \(h(a) = h(b) = f(a)g(b) - g(a)f(b)\). Because \(f\) and \(g\) are continuous on \([a,b]\) and differentiable on \((a,b)\), the function \(h\) is also continuous on \([a,b]\) and differentiable on \((a,b)\). So it fullfills the conditions of Rolle’s theorem. By Rolle’s theorem, there exists at least one point \(\xi\) in the open interval \((a,b)\) such that \(h'(\xi) = 0\). This then results in the following equation:
\[\begin{align*} 0 &= h'(\xi) \\ &= (g(b) - g(a))f'(\xi) - (f(b) - f(a))g'(\xi) \\ (g(b) - g(a))f'(\xi) &= (f(b) - f(a))g'(\xi) \\ \frac{f(b) - f(a)}{g(b) - g(a)} &= \frac{f'(\xi)}{g'(\xi)}. \end{align*} \]Using this extended mean value theorem, we can now state L’Hôpital’s rule. If we have two functions \(f,g: [a,b]\to\mathbb R\) that are differentiable on \((a,b)\) and fulfill the following conditions:
- \(g'(x) \neq 0\) for all \(x \in (a,b)\),
- \(\lim_{x\to b^-} f(x) = \lim_{x\to b^-} g(x) = 0\) or \(\pm\infty\)
- \(\lim_{x\to b^-} \frac{f'(x)}{g'(x)} = L\) exists (finite or \(\pm\infty\)),
then we can conclude that:
\[\lim_{x\to b^-}\frac{f(x)}{g(x)}= \lim_{x\to b^-}\frac{f'(x)}{g'(x)} = L. \]This means that if the limit of the ratio of the functions \(f\) and \(g\) as \(x\) approaches \(b\) is an indeterminate form, we can evaluate the limit by taking the limit of the ratio of their derivatives instead. The same statement holds for right-hand limits \(x\to a^+\) or for \(b=\infty\) / \(b=-\infty\) (replace the interval accordingly).
No idea yet how to prove this, but the proof will use the extended mean value theorem.
There are some important things to note about L’Hôpital’s rule:
- Indeterminate forms: L’Hôpital’s rule applies only to limits that yield indeterminate forms, specifically \(\frac{0}{0}\) or \(\frac{\infty}{\infty}\). If the limit yields a determinate form, L’Hôpital’s rule is not applicable.
- Reformulation: If the limit yields a form such as \(0\cdot\infty\), \(\infty-\infty\), \(0^0\), \(1^\infty\), or \(\infty^0\), we can often rewrite the expression, using reciprocals or logarithms, into a suitable fraction of the form \(\frac{0}{0}\) or \(\frac{\infty}{\infty}\), and then apply L’Hôpital’s rule.
- Repeated application: If after one application the new limit again has an indeterminate form, we can apply the rule again (provided the conditions hold each time).
Let us look at the following limit for all \(a > 0\):
\[\lim_{x \to \infty} \frac{\ln x}{x^a}. \]This limit yields the indeterminate form \(\frac{\infty}{\infty}\) as \(x \to \infty\) so we can apply L’Hôpital’s rule:
\[\begin{align*} \lim_{x \to \infty} \frac{\ln x}{x^a} &= \lim_{x \to \infty} \frac{\frac{1}{x}}{ax^{a-1}} \\ &= \lim_{x \to \infty} \frac{1}{ax^a} \\ &= 0. \end{align*} \]This limit is a very important one, as it shows that the logarithm grows much slower than any polynomial function as \(x \to \infty\). This is especially useful in complexity analysis of algorithms, where we often encounter logarithmic and polynomial functions.
Next we can look at the limit:
\[\lim_{x \to 0^+} x \ln x. \]Importantly this limit is going towards zero from the right, so we get the indeterminate form \(0 \cdot (-\infty)\). We can rewrite this as a fraction to apply L’Hôpital’s rule:
\[\lim_{x \to 0^+} x \ln x = \lim_{x \to 0^+} \frac{\ln x}{\frac{1}{x}}. \]Which now has the form \(\frac{-\infty}{\infty}\) as \(x \to 0^+\) so we can now apply L’Hôpital’s rule:
\[\begin{align*} \lim_{x \to 0^+} \frac{\ln x}{\frac{1}{x}} &= \lim_{x \to 0^+} \frac{\frac{1}{x}}{-\frac{1}{x^2}} \\ &= \lim_{x \to 0^+} (-x) \\ &= 0. \end{align*} \]If we look at the limit:
\[\lim_{x \to 1} \frac{x^3 - 1}{x^2 - 1}. \]Then as \(x \to 1\) we get the indeterminate form \(\frac{0}{0}\) so we can apply L’Hôpital’s rule:
\[\begin{align*} \lim_{x \to 1} \frac{x^3 - 1}{x^2 - 1} &= \lim_{x \to 1} \frac{3x^2}{2x} \\ &= \lim_{x \to 1} \frac{3x}{2} \\ &= \frac{3}{2}. \end{align*} \]We could’ve also solved this limit by factoring the numerator and denominator:
\[\begin{align*} \lim_{x \to 1} \frac{x^3 - 1}{x^2 - 1} &= \lim_{x \to 1} \frac{(x-1)(x^2 + x + 1)}{(x-1)(x+1)} \\ &= \lim_{x \to 1} \frac{x^2 + x + 1}{x + 1} \\ &= \frac{1^2 + 1 + 1}{1 + 1} \\ &= \frac{3}{2}. \end{align*} \]The reason we couldn't just plug in \(x=1\) from the start is that the function is not defined at that point, as both the numerator and the denominator are zero there. However, we can still evaluate the limit by using L’Hôpital’s rule or by factoring the expression.
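As a final sanity check, evaluating the quotient at points approaching \(1\) from both sides (a tiny Python sketch) shows the values tending to \(3/2\):

```python
f = lambda x: (x ** 3 - 1) / (x ** 2 - 1)

for dx in [0.1, 0.01, 0.001, 1e-6]:
    print(f"x = 1 + {dx}: {f(1 + dx):.6f}    x = 1 - {dx}: {f(1 - dx):.6f}")
# Both one-sided sequences of values approach 1.5 = 3/2, matching the limit above.
```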