Random Variables
So far we have seen how to work with probabilities of events and how to calculate them. However, we are often interested in quantifying outcomes of random experiments. This is where random variables come into play. A random variable is a function that assigns a numerical value to each outcome of a random experiment. More formally we define a random variable (r.v. or RV) \(X\) as a function as follows:
\[\begin{align*} X: \Omega &\to \mathbb{R} \\ \omega &\mapsto a \in \mathbb{R} \end{align*} \]where \(\Omega\) is the sample space of the random experiment and \(a\) is a real number. Importantly, for the random variable to be well defined, the function \(X\) must be measurable. In other words, for all \(a \in \mathbb{R}\), the following set must be measurable:
\[\{\omega \in \Omega | X(\omega) \leq a\} \in \mathcal{F} \]where measurable means that the set is in the sigma-algebra \(\mathcal{F}\) of the sample space. Because it is in the sigma-algebra, it means that we can assign a probability to the event that the random variable takes on a value less than or equal to \(a\). This is important because it allows us to use the random variable to model the outcomes of the random experiment in a probabilistic way. Here as always we need to be careful if our set of outcomes is not countable as then the sigma-algebra can get complicated and the definition of the random variable can be more complex.
Depending on the values the random variable takes, we can define different types of random variables. For example, if the random variable maps the outcomes to a finite or countable set of values, we call it a discrete random variable. If it maps the outcomes to an uncountable set of values, such as a real interval, we call it a continuous random variable. We will see more about this later.
As a simple example let’s look at a fun gambling game we can play with our friends. The game is as follows: we throw a die and if the die shows a 1, 2 or 3 we lose 1 point. If we throw a 4 nothing happens. If we throw a 5 or 6 we win 2 points. We can now define a random variable \(X\) to quantify our profit and describe the outcome of the game. The random variable \(X\) is defined as follows:
\[\forall \omega \in \Omega: X(\omega) = \begin{cases} -1 & \text{ if } \omega = 1, 2, 3 \\ 0 & \text{ if } \omega = 4 \\ 2 & \text{ if } \omega = 5, 6 \end{cases} \]So if we throw a 5 we get \(X(5) = 2\) points. If we throw a 1 we get \(X(1) = -1\) points. Because the random variable's range is finite, with only 3 possible values, we have a discrete random variable.
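To make this concrete, here is a minimal Python sketch (just an illustration, the names `omega` and `profit` are our own choices) where the function `profit` plays the role of the random variable \(X\) on the sample space of the die throw:

```python
import random

# Sample space of the die throw.
omega = [1, 2, 3, 4, 5, 6]

def profit(outcome: int) -> int:
    """The random variable X: maps each outcome of the die to a profit."""
    if outcome in (1, 2, 3):
        return -1
    if outcome == 4:
        return 0
    return 2  # outcome is 5 or 6

# X is just a function on the sample space ...
print([profit(w) for w in omega])      # [-1, -1, -1, 0, 2, 2]
# ... and evaluating it on a random outcome simulates one round of the game.
print(profit(random.choice(omega)))
```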
Importantly, we said that for all \(a \in \mathbb{R}\) the set \(\{\omega \in \Omega | X(\omega) \leq a\}\) must be in our sigma-algebra. So depending on our sigma-algebra a random variable can be well defined for it or not. Consider the following sigma-algebras:
- \(\mathcal{F}_1 = \mathcal{P}(\Omega)\), the power set of \(\Omega\). This is the largest sigma-algebra and contains all possible events.
- \(\mathcal{F}_2 = \{\emptyset, \Omega\}\), the trivial sigma-algebra. This is the smallest sigma-algebra and contains only the empty set and the whole sample space.
- \(\mathcal{F}_3 = \{\emptyset, \{1, 2, 3\}, \{4, 5, 6\}, \Omega\}\).
- \(\mathcal{F}_4 = \{\emptyset, \{1, 2, 3\}, \{1, 2, 3, 4\}, \{4, 5, 6\}, \{5, 6\}, \{1, 2, 3, 5, 6\}, \{4\}, \Omega\}\).
If we collect the sets of outcomes \(\{\omega \in \Omega | X(\omega) \leq a\}\) for all possible thresholds \(a\), we get the following:
\[\{\omega \in \Omega | X(\omega) \leq a\} = \begin{cases} \emptyset & \text{ if } a < -1 \\ \{1, 2, 3\} & \text{ if } -1 \leq a < 0 \\ \{1, 2, 3, 4\} & \text{ if } 0 \leq a < 2 \\ \{1, 2, 3, 4, 5, 6\} & \text{ if } a \geq 2 \end{cases} \]So we notice that the random variable is not well defined for \(\mathcal{F}_2\) as the set \(\{1, 2, 3\}\) is not in the sigma-algebra and not for \(\mathcal{F}_3\) as the set \(\{1, 2, 3, 4\}\) is not in the sigma-algebra. However, it is well defined for \(\mathcal{F}_1\) and \(\mathcal{F}_4\). So we can use the random variable \(X\) to quantify our profit in the game if we use these sigma-algebras. In most cases we will use the power set of the sample space as our sigma-algebra, so we can use any random variable we want. However, it is important to keep in mind that the random variable must be well defined for the sigma-algebra we are using and that we can not always use the power set as our sigma-algebra.
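For a finite sample space this measurability check can also be done mechanically. The following sketch (purely illustrative) enumerates the sets \(\{X \leq a\}\) and tests whether they belong to each of the four sigma-algebras above; since the range of \(X\) is finite it suffices to check the thresholds \(a \in \{-1, 0, 2\}\):

```python
from itertools import chain, combinations

omega = {1, 2, 3, 4, 5, 6}
X = {1: -1, 2: -1, 3: -1, 4: 0, 5: 2, 6: 2}

def powerset(s):
    """All subsets of s, as a set of frozensets."""
    s = list(s)
    return {frozenset(c) for c in chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))}

F1 = powerset(omega)                              # power set
F2 = {frozenset(), frozenset(omega)}              # trivial sigma-algebra
F3 = {frozenset(), frozenset({1, 2, 3}), frozenset({4, 5, 6}), frozenset(omega)}
F4 = {frozenset(), frozenset({1, 2, 3}), frozenset({1, 2, 3, 4}), frozenset({4, 5, 6}),
      frozenset({5, 6}), frozenset({1, 2, 3, 5, 6}), frozenset({4}), frozenset(omega)}

def is_measurable(X, sigma_algebra):
    """Check {X <= a} ∈ F for every threshold a.

    For a below all values the preimage is the empty set, which is in every
    sigma-algebra, and between values the preimage does not change, so it
    suffices to check the values that X actually takes.
    """
    for a in set(X.values()):
        preimage = frozenset(w for w in X if X[w] <= a)
        if preimage not in sigma_algebra:
            return False
    return True

for name, F in [("F1", F1), ("F2", F2), ("F3", F3), ("F4", F4)]:
    print(name, is_measurable(X, F))  # True, False, False, True
```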
If we have a specific event \(A\) in our sample space \(\Omega\), we can also define a special random variable called an indicator variable. The indicator variable is defined as follows:
\[\forall \omega \in \Omega: 1_A(\omega) = \begin{cases} 1 & \text{ if } \omega \in A \\ 0 & \text{ if } \omega \notin A \end{cases} \]Again we only have two possible values for the indicator variable, 0 and 1, hence we have a discrete random variable. If we also analyze the set \(\{\omega \in \Omega | 1_A(\omega) \leq a\}\) we get the following sets:
\[\{\omega \in \Omega | 1_A(\omega) \leq a\} = \begin{cases} \emptyset & \text{ if } a < 0 \\ A^c & \text{ if } 0 \leq a < 1 \\ \Omega & \text{ if } a \geq 1 \end{cases} \]So we notice that for the indicator variable to be well defined we need the set \(A^c\) to be in our sigma-algebra, and since a sigma-algebra is closed under complements this is equivalent to having the event \(A\) itself in the sigma-algebra. So if we have a sigma-algebra that does not contain the event \(A\), we cannot use the indicator variable to quantify it.
Probability of Random Variables
We have seen that a random variable is a function that assigns a numerical value to each outcome of a random experiment and we have seen that we can assign a probability to an event. In this section we want to combine these two concepts and assign a probability to a random variable. We have already seen that for all values \(a \in \mathbb{R}\) of a random variable \(X\) we can define the following set:
\[\{\omega \in \Omega | X(\omega) \leq a\} \in \mathcal{F} \]This set contains all the outcomes of the random experiment for which the random variable \(X\) takes on a value less than or equal to \(a\). Because this set is in the sigma-algebra, we can assign a probability to it using the probability measure \(\P\) defined on \(\mathcal{F}\). More specifically the probability of the random variable \(X\) taking on a value less than or equal to \(a\) is defined as follows:
\[\P(\{\omega \in \Omega | X(\omega) \leq a\}) \]To simplify the notation we usually omit the dependence on the outcome \(\omega\) and the brackets and just simply write:
\[\P(X \leq a) \]This is the probability of the random variable \(X\) taking on a value less than or equal to \(a\). We can also extend this notation to the case where the random variable \(X\) takes on a value in an interval \((a, b]\):
\[\P(X \in (a, b]) = \P(a < X \leq b) = \P(\{\omega \in \Omega | a < X(\omega) \leq b\}) \]Or also more rarely used to the case where the random variable \(X\) takes on a value in some set \(A\):
\[\P(X \in A) = \P(\{\omega \in \Omega | X(\omega) \in A\}) \]These are all valid probabilities as long as the corresponding sets of outcomes are in the sigma-algebra. This is also why we can write things like \(\P(X = a)\), \(\P(X < a)\) or \(\P(X > a)\): the sets \(\{X < a\}\), \(\{X = a\}\) and \(\{X > a\}\) can all be built from sets of the form \(\{X \leq a\}\) using complements and countable unions and intersections, so they are measurable as well.
If we go back to our example of the gambling game we can now calculate the different probabilities of the random variable \(X\) taking on a value less than or equal to \(a\). We defined the random variable \(X\) as follows:
\[\forall \omega \in \Omega: X(\omega) = \begin{cases} -1 & \text{ if } \omega = 1, 2, 3 \\ 0 & \text{ if } \omega = 4 \\ 2 & \text{ if } \omega = 5, 6 \end{cases} \]We have also already analyzed the different sets of outcomes for the different values of \(a\):
\[\{\omega \in \Omega | X(\omega) \leq a\} = \begin{cases} \emptyset & \text{ if } a < -1 \\ \{1, 2, 3\} & \text{ if } -1 \leq a < 0 \\ \{1, 2, 3, 4\} & \text{ if } 0 \leq a < 2 \\ \{1, 2, 3, 4, 5, 6\} & \text{ if } a \geq 2 \end{cases} \]Because throwing a fair die is a Laplace experiment, we know that the probability of each outcome is \(\frac{1}{6}\). So we can now calculate the different probabilities:
\[\begin{align*} \P(X \leq -1) &= \P(\{1, 2, 3\}) = \frac{3}{6} = \frac{1}{2} \\ \P(X \leq 0) &= \P(\{1, 2, 3, 4\}) = \frac{4}{6} = \frac{2}{3} \\ \P(X \leq 2) &= \P(\{1, 2, 3, 4, 5, 6\}) = \frac{6}{6} = 1 \\ \end{align*} \]Since the set \(\{X \leq a\}\) only changes at the values \(-1\), \(0\) and \(2\), all other thresholds follow from these cases, for example \(\P(X \leq 1) = \P(\{1, 2, 3, 4\}) = \frac{2}{3}\), and probabilities such as \(\P(X > 0) = 1 - \P(X \leq 0) = \frac{1}{3}\) follow by taking complements.
We can also do the same and define the probability of an indicator variable for an event \(A\):
\[\begin{align*} \P(1_A \leq 0) &= \P(\{\omega \in \Omega | 1_A(\omega) \leq 0\}) = \P(A^c) = 1 - \P(A) \\ \P(1_A \leq 1) &= \P(\{\omega \in \Omega | 1_A(\omega) \leq 1\}) = \P(\Omega) = 1 \\ \P(1_A \leq 2) &= \P(\{\omega \in \Omega | 1_A(\omega) \leq 2\}) = \P(\Omega) = 1 \\ \end{align*} \]Note also that \(\P(1_A = 1) = \P(A)\), so the indicator variable lets us express the probability of the event \(A\) itself.
Almost Sure Events
So far we have said that an event \(A\) is a sure event if it contains all the outcomes of the sample space \(\Omega\) and therefore has the probability \(\P(A) = 1\). However, we can also define a weaker notion called an almost sure event, commonly abbreviated as “a.s.”. An event \(A\) is said to occur almost surely if the following holds:
\[\P(A) = 1 \]Therefore an equivalent characterization of an almost sure event \(A\) is that the complementary event \(A^c\) has the following probability:
\[\P(A^c) = 0 \]Now you might be wondering what the difference between a sure event and an almost sure event is. A sure event contains all the outcomes of the sample space \(\Omega\) and therefore has probability \(\P(A) = 1\). An almost sure event, on the other hand, need not contain all the outcomes of the sample space \(\Omega\) but still has probability \(\P(A) = 1\). In other words, an almost sure event can be a proper subset of the sample space \(\Omega\) and still have probability 1.
The classic example is tossing a fair coin infinitely many times: the event “heads appears at least once” has probability 1, but it is not a sure event, since the outcome consisting only of tails is a possible outcome of the experiment, it just has probability 0. The infinite monkey theorem is of the same flavour: a monkey typing random keys forever will almost surely type the complete works of Shakespeare at some point, even though there are infinitely many key sequences in which it never does.
We can also extend this idea from events to random variables. For example, we say that \(X \leq a\) almost surely if the following holds:
\[\P(X \leq a) = 1 \]This means that the set of outcomes for which the random variable \(X\) takes on a value less than or equal to \(a\) is an almost sure event, i.e. it has probability 1.
The same can be done for two random variables \(X\) and \(Y\). We can say that \(X \leq Y\) almost surely if the following holds:
\[\P(X \leq Y) = 1 \]The underlying set of outcomes is a bit more complex here as we have two random variables, it is \(\{\omega \in \Omega | X(\omega) \leq Y(\omega)\}\). A simple example would be if we defined the random variable \(X\) as the outcome of a die throw and the random variable \(Y\) as twice that outcome, then \(X(\omega) \leq Y(\omega)\) holds for every outcome and therefore in particular almost surely.
Discrete Random Variables
We have already seen that if a random variable \(X\) takes on a finite or countable set of values, it is a discrete random variable. Common examples of discrete random variables are the outcome of a die throw, the outcome of a coin flip, or a count of how many times something happens. We will go into more detail about some special discrete random variables later that have earned their own names.
Probability Mass Function (PMF)
If the random variable is discrete, in other words if the range of the random variable is countable, we can define the probability mass function, short PMF, sometimes also called the density function or simply the discrete distribution of \(X\). The PMF is a function that assigns a probability to each value of the random variable. This means that the PMF maps a value in the range of the random variable to a probability between 0 and 1. More formally, the PMF maps the values as follows:
\[\begin{align*} p_X: \mathbb{R} &\to [0, 1] \\ a &\mapsto p_X(a) \end{align*} \]For every value \(a\) of the random variable \(X\) the PMF is defined as follows:
\[p_X(a) = \P(X = a) = \P(\{\omega \in \Omega | X(\omega) = a\}) = \P(X \in \{a\}) \]where \(a\) is a value of the random variable \(X\). Sometimes we also write \(f_X(a)\) instead of \(p_X(a)\), or if the random variable is clear from the context we just write \(p(a)\) or \(f(a)\). Because the PMF describes a probability measure, it must satisfy the corresponding properties, one of which is that the sum of all probabilities must be equal to 1, so we have:
\[\sum_{a \in X(\Omega)} p_X(a) = 1 \]where the sum runs over the range \(X(\Omega)\) of the random variable. It is important to remember that the PMF is only defined for discrete random variables. For continuous random variables we will define a different function called the probability density function (PDF) which we will see later. Because the random variable is discrete, the graph of the PMF is a series of discrete points; this is commonly shown using a table or a bar graph.
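As a small illustration, the following sketch computes the PMF of the gambling-game random variable under the Laplace model (fair die assumed) and checks that the probabilities sum to 1:

```python
from collections import Counter
from fractions import Fraction

omega = [1, 2, 3, 4, 5, 6]
X = {1: -1, 2: -1, 3: -1, 4: 0, 5: 2, 6: 2}

# p_X(a) = P(X = a) = |{w : X(w) = a}| / |Omega| under the Laplace model.
counts = Counter(X[w] for w in omega)
pmf = {a: Fraction(c, len(omega)) for a, c in counts.items()}

print(pmf)                # {-1: Fraction(1, 2), 0: Fraction(1, 6), 2: Fraction(1, 3)}
print(sum(pmf.values()))  # 1, as required for a PMF
```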


You might be wondering why this does not work for continuous random variables. The reason is rather simple and comes from the definition of a probability measure and the behaviour of summing over an uncountable set.
This was briefly touched on when we discussed why we need a sigma-algebra and the fact that we cannot assign a probability to every subset of an uncountable sample space.
Cumulative Distribution Function (CDF)
For any random variable \(X\) we can define the cumulative distribution function, short CDF, sometimes also called the distribution function. The CDF is a function that assigns a probability to the random variable \(X\) taking on a value less than or equal to \(a\) and is defined for all values \(a \in \mathbb{R}\) as follows:
\[\begin{align*} F_X: \mathbb{R} &\to [0, 1] \\ a &\mapsto F_X(a) \end{align*} \]Importantly, the difference between the PMF and the CDF is that the CDF assigns a probability to the random variable \(X\) taking on a value less than or equal to \(a\) and not just equal to \(a\). This also means that it accumulates probability over a whole range of values of the random variable \(X\) and not just a single value. More formally we can define the CDF as follows:
\[F_X(a) = \P(X \leq a) = \P(\{\omega \in \Omega | X(\omega) \leq a\}) \]We will first focus on the case where the random variable is discrete. In this case we can define the CDF as follows:
\[\P(X \leq a) = F_X(a) = \sum_{y \in X(\Omega), y \leq a} p_X(y) = \sum_{y \in X(\Omega), y \leq a} \P(X = y) \]where we sum the probabilities over all values of the random variable \(X\) that are less than or equal to \(a\). Intuitively this makes sense: the events \(\{X = y\}\) for the different values \(y \leq a\) are disjoint and their union is \(\{X \leq a\}\), so their probabilities simply add up.
The CDF is a non-decreasing function, in other words it is monotonically increasing: it is always increasing or constant, but never decreasing. This comes from the fact that if \(a \leq b\) then \(\{\omega \in \Omega | X(\omega) \leq a\} \subseteq \{\omega \in \Omega | X(\omega) \leq b\}\), so by monotonicity of the probability measure we have:
\[\P(X \leq a) \leq \P(X \leq b) \]If we have \(a < b\) then we also have the following:
\[\P(a < X \leq b) = F_X(b) - F_X(a) \]This is the so-called basic identity and can be proven directly by decomposing \(\{X \leq b\}\) into a disjoint union:
\[\begin{align*} \{X \leq b\} &= \{X \leq a\} \cup \{a < X \leq b\} \\ \P(X \leq b) &= \P(\{X \leq a\} \cup \{a < X \leq b\}) \\ F_X(b) &= F_X(a) + \P(a < X \leq b) \\ \P(a < X \leq b) &= F_X(b) - F_X(a) \end{align*} \]If we go back to our example of the gambling game we already saw some of the probabilities of the random variable \(X\) taking on a value less than or equal to \(a\). We can now define the CDF for the random variable \(X\) for all values of \(a \in \mathbb{R}\):
\[F_X(a) = \begin{cases} 0 & \text{ if } a < -1 \\ \frac{1}{2} & \text{ if } -1 \leq a < 0 \\ \frac{2}{3} & \text{ if } 0 \leq a < 2 \\ 1 & \text{ if } a \geq 2 \end{cases} \]We can also see the basic identity in effect. For example if we want to know the probability that we get points, so \(X > 0\), we can use the basic identity with \(a = 0\) and \(b = 2\):
\[\P(X > 0) = \P(0 < X \leq 2) = F_X(2) - F_X(0) = 1 - \frac{2}{3} = \frac{1}{3} \]This also matches our intuition, as we have 2 outcomes that give us points and 4 outcomes that do not, so the probability of getting points is \(\frac{2}{6} = \frac{1}{3}\). Visually this can be interpreted as taking the accumulated probability up to the point \(b\) and then subtracting the accumulated probability up to the point \(a\).
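The CDF of the gambling game and the basic identity can be checked with a short sketch (illustrative only; the PMF values are the ones computed earlier):

```python
from fractions import Fraction

# PMF of the gambling-game random variable X (fair die).
pmf = {-1: Fraction(1, 2), 0: Fraction(1, 6), 2: Fraction(1, 3)}

def cdf(a):
    """F_X(a) = P(X <= a): sum the PMF over all values <= a."""
    return sum(p for value, p in pmf.items() if value <= a)

print(cdf(-2), cdf(-1), cdf(0), cdf(1), cdf(2))  # 0, 1/2, 2/3, 2/3, 1

# Basic identity: P(0 < X <= 2) = F_X(2) - F_X(0) = 1/3.
print(cdf(2) - cdf(0))
```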
Again we can also do the same and define the CDF for an indicator variable for an event \(A\).
\[F_{1_A}(a) = \begin{cases} 0 & \text{ if } a < 0 \\ 1 - \P(A) & \text{ if } 0 \leq a < 1 \\ 1 & \text{ if } a \geq 1 \end{cases} \]From the examples above we notice that for a function to be a CDF it must fulfill three properties:
- \(F_X(a)\) is a non-decreasing function.
- \(F_X(a)\) is a right-continuous function.
- As the variable value \(a\) goes to infinity the CDF goes to 1 and as \(a\) goes to minus infinity the CDF goes to 0.
The first property is rather intuitive as we are accumulating probability over ever larger sets of outcomes. The cleanest way to prove it is by monotonicity of the probability measure: if \(a \leq b\) we have \(\{\omega \in \Omega | X(\omega) \leq a\} \subseteq \{\omega \in \Omega | X(\omega) \leq b\}\), so we have:
\[\P(X \leq a) \leq \P(X \leq b) \iff F_X(a) \leq F_X(b) \]The second property is important because it means that the CDF is continuous from the right. This means it can have jumps, but the limit from the right at any point exists and equals the value of the function at that point. A CDF can also be continuous from the left, but this is not required. More formally we can say that for all \(a \in \mathbb{R}\):
\[F_X(a) = \lim_{b \to a^+} F_X(b) \]The last property, like the normalization of the PMF, guarantees that the CDF describes a valid probability distribution. The values of the CDF must be between 0 and 1, in other words the CDF must be bounded between 0 and 1, and it must go to 1 as \(a\) goes to infinity and to 0 as \(a\) goes to minus infinity. This is important so it can be used to assign probabilities to events.
The proofs of properties 2 and 3 both follow from the continuity of the probability measure along monotone sequences of events. For right-continuity fix \(a \in \mathbb{R}\) and note that the sets \(\{X \leq a + \frac{1}{n}\}\) decrease to \(\{X \leq a\}\) as \(n \to \infty\), so \(F_X(a + \frac{1}{n}) = \P(X \leq a + \frac{1}{n}) \to \P(X \leq a) = F_X(a)\). For the limits, the sets \(\{X \leq n\}\) increase to \(\Omega\) and the sets \(\{X \leq -n\}\) decrease to \(\emptyset\), so \(F_X(n) \to \P(\Omega) = 1\) and \(F_X(-n) \to \P(\emptyset) = 0\); together with monotonicity this gives the limits as \(a \to \pm\infty\).
Conditional Random Variables
Conditioning a random variable works in the same way as conditioning events: we restrict ourselves to the set of outcomes that satisfy the condition. For an event \(B\) with \(\P(B) > 0\) we can for example look at the conditional probability \(\P(X \leq a | B) = \frac{\P(\{X \leq a\} \cap B)}{\P(B)}\).
Transforming Random Variables
We can remap random variables to new random variables, or create a new random variable from multiple random variables. This is commonly done by applying a function \(g\) to the values of the random variable. More formally, if \(g: \mathbb{R} \to \mathbb{R}\) is a (measurable) function, then \(Y = g(X)\), defined by \(Y(\omega) = g(X(\omega))\) for all \(\omega \in \Omega\), is again a random variable.
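For a discrete random variable the PMF of \(Y = g(X)\) can be computed directly from the PMF of \(X\) by collecting the probability mass of all values that \(g\) maps to the same point. A minimal Python sketch of this, reusing the gambling-game PMF and the illustrative choice \(g(x) = x^2\):

```python
from collections import defaultdict
from fractions import Fraction

# PMF of the gambling-game random variable X.
pmf_X = {-1: Fraction(1, 2), 0: Fraction(1, 6), 2: Fraction(1, 3)}

def g(x):
    return x * x  # an example transformation

# PMF of Y = g(X): collect the probability mass of all x with g(x) = y.
pmf_Y = defaultdict(Fraction)
for x, p in pmf_X.items():
    pmf_Y[g(x)] += p

print(dict(pmf_Y))  # {1: Fraction(1, 2), 0: Fraction(1, 6), 4: Fraction(1, 3)}
```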
Independent Random Variables
Just like with events, we can also define the concept of independence for random variables. We say that \(X_1, X_2, \ldots, X_n\) are independent random variables if the following holds:
\[\forall a_1, a_2, \ldots, a_n \in \mathbb{R}: \P(X_1 \leq a_1, X_2 \leq a_2, \ldots, X_n \leq a_n) = \P(X_1 \leq a_1) \cdot \P(X_2 \leq a_2) \cdots \P(X_n \leq a_n) = \prod_{i=1}^n \P(X_i \leq a_i) \]This is very similar to the definition of independence for events. The only difference is that we are looking at the joint probability of the random variables taking on a value less than or equal to their respective values \(a_i\). We discuss joint distributions here in more detail.
It actually turns out that the random variables \(X_1, X_2, \ldots, X_n\) are independent if and only if for any choice of intervals \(I_1, I_2, \ldots, I_n\) the probability that each random variable \(X_i\) takes on a value in the interval \(I_i\) is equal to the product of the probabilities of each random variable taking on a value in their respective intervals. More formally we can say that:
\[\forall I_1, I_2, \ldots, I_n \subseteq \mathbb{R}: \P(X_1 \in I_1, X_2 \in I_2, \ldots, X_n \in I_n) = \P(X_1 \in I_1) \cdot \P(X_2 \in I_2) \cdots \P(X_n \in I_n) = \prod_{i=1}^n \P(X_i \in I_i) \]As an example, suppose we throw two independent dice and consider the Laplace model where \(\Omega = \{1, 2, 3, 4, 5, 6\}^2\) and \(\mathcal{F} = \mathcal{P}(\Omega)\). We can define the random variables \(X\) and \(Y\) as the outcome of the first and second die respectively and then \(Z\) as the sum of the two dice. So for each outcome \(\omega = (x, y)\) we define the random variables as follows:
\[X(\omega) = x \text{ and } Y(\omega) = y \text{ and } Z(\omega) = x + y \]Let’s first start by comparing the random variables \(X\) and \(Y\). To check whether they are independent we look at intervals \(I, J \subseteq \{1, 2, 3, 4, 5, 6\}\) and check whether the events \(\{X \in I\}\) and \(\{Y \in J\}\) are independent, i.e. we look at the joint probability of the random variables \(X\) and \(Y\) taking on a value in the intervals \(I\) and \(J\). So we have:
\[\begin{align*} \P(X \in I, Y \in J) &= \P(\{(x, y) \in \Omega | x \in I, y \in J\}) = \P(I \times J) \\ &= \frac{|I \times J|}{|\Omega|} = \frac{|I| \cdot |J|}{36} \\ &= \frac{|I|}{6} \cdot \frac{|J|}{6} = \frac{|I \times \{1, 2, 3, 4, 5, 6\}|}{36} \cdot \frac{|\{1, 2, 3, 4, 5, 6\} \times J|}{36} \\ &= \P(X \in I) \P(Y \in J) \end{align*} \]So we have shown that the random variables \(X\) and \(Y\) are independent. For any values \(x, y \in \mathbb{R}\) we can choose the intervals \(I\) and \(J\) such that \(\{X \in I\} = \{X \leq x\}\) and \(\{Y \in J\} = \{Y \leq y\}\), and therefore the random variables \(X\) and \(Y\) are also independent in the sense of the CDF definition above. So we have:
\[\P(X \leq x, Y \leq y) = \P(X \leq x) \P(Y \leq y) \]This also makes intuitively sense as the two dice are physically independent and throwing one die does not affect the outcome of the other die. However, what about the random variable \(Z\)? We can also check if the random variable \(Z\) is independent of one of the other random variables. To show this we can look at a specific example and check if the random variable \(Z\) is independent of the random variable \(X\). So we can look at the intervals \(I = \{1\}\) and \(J = \{1,2\}\). We can then check the following:
\[\begin{align*} \P(X \leq 1, Z \leq 2) &= \P(\{(1, 1)\}) = \frac{1}{36} \\ &\neq \frac{1}{6} \cdot \frac{1}{36} = \P(X \leq 1) \P(Z \leq 2) \end{align*} \]So we have shown that the random variable \(Z\) is not independent of the random variable \(X\). This also makes intuitive sense, as the value of \(Z\) is partly determined by the value of \(X\).
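These calculations can also be verified by brute-force enumeration over the 36 equally likely outcomes. A possible sketch, where the chosen events mirror the examples above:

```python
from fractions import Fraction
from itertools import product

# Laplace model for two dice: all 36 outcomes are equally likely.
omega = list(product(range(1, 7), range(1, 7)))

def prob(event):
    """Probability of an event, given as a predicate on outcomes (x, y)."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

X = lambda w: w[0]          # first die
Y = lambda w: w[1]          # second die
Z = lambda w: w[0] + w[1]   # sum of the two dice

# X and Y: the joint probability factors, e.g. for X <= 2 and Y <= 3.
print(prob(lambda w: X(w) <= 2 and Y(w) <= 3))                # 1/6
print(prob(lambda w: X(w) <= 2) * prob(lambda w: Y(w) <= 3))  # 1/6 as well

# X and Z: the factorization fails for X <= 1 and Z <= 2.
print(prob(lambda w: X(w) <= 1 and Z(w) <= 2))                # 1/36
print(prob(lambda w: X(w) <= 1) * prob(lambda w: Z(w) <= 2))  # 1/216
```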
When reading literature on probability and statistics you will often come across the abbreviation “i.i.d.”. This stands for independent and identically distributed. This is a stricter notion than independence and means that the random variables are independent of each other and have the same distribution. The i.i.d. assumption is often used in statistics and machine learning to simplify the analysis of data and models. More formally, we say that random variables \(X_1, X_2, \ldots, X_n\) are i.i.d. if they are independent and the following holds:
\[\forall i, j: F_{X_i} = F_{X_j} \]An example of this would be two coins that are thrown independently of each other. We can define the random variables \(X_1\) and \(X_2\) as the outcome of the first and second coin respectively. The two coins are then i.i.d. as they are independent of each other and identically distributed, i.e. they have the same probability of showing heads or tails. The i.i.d. condition does not require the coins to be fair: for example, each coin could have a 60% chance of showing heads and a 40% chance of showing tails. However, both coins would need to have the same probability measure, so both would need the 60% chance of heads and 40% chance of tails, for them to be identically distributed.
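As an illustration of the i.i.d. assumption, the following sketch simulates two independent Bernoulli coins with an assumed bias of 0.6 (purely for the example) and checks empirically that the marginal frequencies match and that the joint frequency approximately factorizes:

```python
import random

random.seed(0)
p = 0.6          # probability of heads for both coins (identically distributed)
n = 100_000      # number of simulated throws

# Each throw produces two independent Bernoulli(p) values.
throws = [(random.random() < p, random.random() < p) for _ in range(n)]

p1 = sum(x1 for x1, _ in throws) / n                # estimate of P(X1 = 1)
p2 = sum(x2 for _, x2 in throws) / n                # estimate of P(X2 = 1)
p_joint = sum(x1 and x2 for x1, x2 in throws) / n   # estimate of P(X1 = 1, X2 = 1)

print(p1, p2)            # both close to 0.6 (identically distributed)
print(p_joint, p1 * p2)  # close to each other (independence)
```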
Independence can also be broken down using marginal distributions: the joint distribution of independent random variables factors into the product of their marginal distributions. For the sum \(X + Y\) of two independent random variables the distribution is given by the convolution of the individual distributions; we will come back to both of these ideas when we discuss joint distributions in more detail.
Grouping Random Variables
If \(X_1, X_2, \ldots, X_n\) are independent random variables then so is any subset of them, for example \(X_1, X_2\) or \(X_1, X_2, X_3\).
We can also split them into disjoint groups and apply a function to each group, and the resulting random variables are still independent. The key point is that each function may only depend on the random variables in its own group: for example \(g(X_1, X_2)\) and \(h(X_3, X_4)\) are independent, whereas two functions that share a random variable are in general not independent.
Bernoulli Distribution
We have already seen Bernoulli experiments as a special case of a random experiment. A Bernoulli experiment is an experiment that has only two possible outcomes, usually called success and failure. We can also define a random variable and therefore a distribution for a Bernoulli experiment. We denote a random variable \(X\) as a Bernoulli random variable with parameter \(p\) as \(X \sim \text{Ber}(p)\), where \(p\) is the probability of success. The Bernoulli random variable takes the value 1 with probability \(p\) and the value 0 with probability \(1 - p\). In other words, we can say that a Bernoulli random variable is a discrete random variable that takes values in \(W=\{0,1\}\) with the following probabilities:
\[\P(X = 0) = 1 - p \text{ and } \P(X = 1) = p \]The PMF of the Bernoulli distribution therefore consists of just two bars, one of height \(1 - p\) at 0 and one of height \(p\) at 1, and the CDF is a step function that jumps from 0 to \(1 - p\) at 0 and from \(1 - p\) to 1 at 1.
We can define a Bernoulli random variable for shooting a penalty in a football game. We can define the random variable \(X\) as follows:
\[X(\omega) = \begin{cases} 0 & \text{ if the penalty is missed} \\ 1 & \text{ if the penalty is scored} \end{cases} \]with \(X \sim \text{Ber}(p)\), where \(p\) is the probability of scoring the penalty. So if we have a penalty taker that scores 80% of the time, we can model the penalty as \(X \sim \text{Ber}(0.8)\).
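A tiny simulation sketch of this Bernoulli model (the 80% scoring probability is just the assumed parameter from the example):

```python
import random

random.seed(42)
p = 0.8  # probability that the penalty is scored (assumed value)

def penalty() -> int:
    """One draw from X ~ Ber(p): 1 if the penalty is scored, 0 if it is missed."""
    return 1 if random.random() < p else 0

shots = [penalty() for _ in range(10_000)]
print(sum(shots) / len(shots))  # empirical frequency of scoring, close to 0.8
```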
Binomial Distribution
We have already seen Bernoulli experiments as a special case of a random experiment and the corresponding random variable \(X \sim \text{Ber}(p)\) that defines the Bernoulli distribution.
The idea of the binomial distribution is to extend the Bernoulli distribution to a sequence of \(n\) independent Bernoulli experiments. We can define a random variable \(X\) as a binomial random variable with parameters \(n\) and \(p\) as \(X \sim \text{Bin}(n, p)\), where \(n\) is the number of experiments and \(p\) is the probability of success. The range of the binomial random variable is the set of integers from 0 to \(n\), so it takes values in the set \(W=\{0, 1, 2, \ldots, n\}\) and represents the number of successes in \(n\) independent Bernoulli experiments. So if we have a sequence of \(n\) independent Bernoulli experiments with probability of success \(p\), we can define the probability of having exactly \(k\) successes as follows:
\[\P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k} \]where \(\binom{n}{k}\) is the binomial coefficient.
The formula can be derived using combinatorics: by independence, any particular sequence of \(k\) successes and \(n - k\) failures has probability \(p^k (1 - p)^{n - k}\), and there are \(\binom{n}{k}\) such sequences.
That the probabilities sum to 1 follows directly from the binomial theorem, as \(\sum_{k=0}^{n} \binom{n}{k} p^k (1 - p)^{n - k} = (p + (1 - p))^n = 1\).
The binomial random variable can also be written as the sum of \(n\) independent Bernoulli random variables, \(X = X_1 + X_2 + \ldots + X_n\) with \(X_i \sim \text{Ber}(p)\), where each \(X_i\) indicates whether the \(i\)-th experiment was a success. In particular for \(n = 1\) the binomial distribution reduces to the Bernoulli distribution.
Similarly, the sum of two independent binomial random variables with the same success probability is again binomial: if \(X \sim \text{Bin}(n, p)\) and \(Y \sim \text{Bin}(m, p)\) are independent, then \(X + Y \sim \text{Bin}(n + m, p)\).
As an example, consider a multiple choice exam with 10 questions and 4 possible answers per question. If we guess every answer at random, the number of correct answers is \(X \sim \text{Bin}(10, \frac{1}{4})\), so the probability of getting exactly 6 questions right is \(\P(X = 6) = \binom{10}{6} \left(\frac{1}{4}\right)^6 \left(\frac{3}{4}\right)^4 \approx 0.016\).
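A short sketch that evaluates the binomial PMF directly from the formula, applies it to the exam example and checks that the probabilities sum to 1:

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Multiple-choice exam: 10 questions, 4 answers each, guessing at random (p = 1/4).
print(binom_pmf(6, 10, 0.25))                          # ~0.0162
print(sum(binom_pmf(k, 10, 0.25) for k in range(11)))  # 1.0 (up to rounding)
```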
Geometric Distribution
Negative Binomial Distribution
Hypergeometric Distribution
Coupon Collector Problem
Poisson Distribution
Continuous Random Variables
A CDF is always right-continuous; if it is additionally left-continuous at every point, then it is continuous. For a continuous random variable the CDF is continuous everywhere, so it has no jumps.
Probability Density Function (PDF)
Cumulative Distribution Function (CDF)
Uniform Distribution
Exponential Distribution
Normal Distribution
Standard Normal Distribution
Quantiles
Z-Score
Constructing Random Variables
We have seen that a CDF is a function that assigns a probability to the random variable \(X\) taking on a value less than or equal to \(a\) and fulfills the following 3 properties:
- \(F_X(a)\) is a non-decreasing function.
- \(F_X(a)\) is a right-continuous function.
- \(\lim_{a \to -\infty} F_X(a) = 0\) and \(\lim_{a \to +\infty} F_X(a) = 1\).
The question is now if we are given some function \(F:\mathbb{R} \to [0, 1]\) that fulfills these properties, does a random variable \(X\) exist such that \(F_X(a) = F(a)\)? The practical consequence would be that we can then simulate an arbitrary random variable \(X\) given only the CDF.
The answer is yes, and the construction is usually done in stages. First one constructs Bernoulli random variables, i.e. fair coin flips. From an infinite sequence of independent coin flips \(X_1, X_2, \ldots\) one can then construct a uniform random variable on \((0, 1)\) via its binary expansion \(U = \sum_{n=1}^{\infty} 2^{-n} X_n\). Finally, from a uniform random variable \(U\) one constructs a random variable with an arbitrary CDF \(F\) using the generalized inverse \(X = F^{-1}(U)\) with \(F^{-1}(u) = \inf\{x \in \mathbb{R} | F(x) \geq u\}\); one can show that this \(X\) has exactly the CDF \(F\). This last step is known as inverse transform sampling, and a small sketch of it follows.
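A minimal sketch of the last step, inverse transform sampling, for the discrete CDF of the gambling game (illustrative only; the jump points and heights are the ones computed earlier):

```python
import random
from collections import Counter

# The CDF of the gambling-game random variable as a step function:
# jumps of size 1/2 at -1, 1/6 at 0 and 1/3 at 2.
values = [-1, 0, 2]
cum_probs = [1/2, 2/3, 1.0]   # F_X evaluated at the jump points

def sample():
    """Inverse transform sampling: X = inf{x : F_X(x) >= U} with U uniform on (0, 1)."""
    u = random.random()
    for value, F in zip(values, cum_probs):
        if u <= F:
            return value
    return values[-1]  # guard against floating point edge cases

random.seed(1)
draws = [sample() for _ in range(60_000)]
print({v: c / len(draws) for v, c in sorted(Counter(draws).items())})
# roughly {-1: 0.5, 0: 0.167, 2: 0.333}, matching the PMF of X
```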