Probability

Table of Contents

1. Probability
  1.1. Properties of Probability
2. Conditional Probability
  2.1. Law of Total Probability
  2.2. Independence

Intuitively, probability is a measure of likelihood or chance. We can think of probability from two viewpoints:

Probability can be viewed as proportion. The probability of an event is the proportion of that event to the total. For example, the probability of drawing a red ball from an urn of red and blue balls is the proportion of red balls to total balls.

Probability can also be viewed as frequency. The probability of an event is how frequently that event occurs compared to all events in our sample space. For example, we expect flipping a coin to land heads approximately half of the time, so the probability is \(\frac{1}{2}\).

The set of all possible outcomes is called the sample space, and is denoted by \(\Omega\). An event in an experiment is a subset of \(\Omega\). The key question we want to address is what the probability of an event is.

Mathematically, given an event \(E \subseteq \Omega\), we assign a number \(\mathbb{P}(E) \in [0,1]\); the map \(\mathbb{P}\) is called the probability measure. Then, we can define the discrete probability space \((\Omega, \mathbb{P})\) such that:

  1. \(\Omega\) is a sample space
  2. A probability \(\mathbb{P}(\{\omega\})\) is assigned to each \(\omega \in \Omega\) such that:
    • \(\mathbb{P}(\{\omega\}) \in [0, 1] \; \forall \omega \in \Omega\)
    • \(\mathbb{P}(\Omega) = \sum_{\omega \in \Omega}\mathbb{P}(\{\omega\}) = 1\)
Example: Biased coin tosses

A biased coin with \(\mathbb{P}(\text{Heads})=p\) is tossed \(n\) times. We want to find \(\mathbb{P}(E_k)\), where \(E_k\) is the event of getting \(k\) heads.

We can define \(S=\{H, T\}\), and our sample space is the set of all length-\(n\) strings over \(S\). Then \(E_k\) corresponds to all the strings with \(k\) heads and \(n-k\) tails. So, \(\mathbb{P}(E_k) = \binom{n}{k}p^k(1-p)^{n-k}\).
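This binomial formula is easy to check numerically. Below is a minimal sketch (the function name `prob_k_heads` is illustrative, not from the source); a useful sanity check is that the probabilities over all \(k\) sum to 1.

```python
from math import comb

def prob_k_heads(n: int, k: int, p: float) -> float:
    """P(E_k): probability of exactly k heads in n tosses of a p-biased coin."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Sanity check: E_0, ..., E_n partition the sample space, so their
# probabilities must sum to 1.
total = sum(prob_k_heads(10, k, 0.3) for k in range(11))
```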

Example: Birthday Paradox

Let \(B_k\) be the event that at least 2 people in a group of \(k\) people have the same birthday. Then, \(B_k^C\) is the event of no collision, i.e. no two people have the same birthday.

The total number of birthday distributions is \(n^k\), where \(n\) is the number of possible birthdays. The number of ways to have no collision is \(n(n-1)(n-2)\cdots (n-k+1)\). Then

\begin{align} \mathbb{P}(B_k^C) = \frac{n(n-1)(n-2)\cdots (n-k+1)}{n^k} \approx e^{-\binom{k}{2}\frac{1}{n}} \end{align}

for large \(n\). With \(n=365\), this gives \(\mathbb{P}(B_{23}) > 0.5\): in a group of only 23 people, a shared birthday is more likely than not.
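The exact product and the exponential approximation above can be compared directly. This is a small sketch (function name `p_no_collision` is my own label for \(\mathbb{P}(B_k^C)\)):

```python
from math import comb, exp

def p_no_collision(n: int, k: int) -> float:
    """Exact P(B_k^C): k people, n equally likely birthdays, no collision."""
    prob = 1.0
    for i in range(k):
        prob *= (n - i) / n  # multiply factors n(n-1)...(n-k+1), each over n
    return prob

n = 365
exact = 1 - p_no_collision(n, 23)      # P(B_23), exact
approx = 1 - exp(-comb(23, 2) / n)     # 1 - e^{-C(k,2)/n} approximation
```

Both values land just above one half, confirming the claim for 23 people.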

1.1. Properties of Probability

An event \(B \subseteq \Omega\) is said to be partitioned into \(n\) events \(B_1, \dots , B_n\) if the following conditions are satisfied:

  1. \(B = B_1 \cup B_2 \cup \cdots \cup B_n\)
  2. \(B_i \cap B_j = \varnothing \; \forall i \neq j\) (that is, \(B_1, \dots, B_n\) are mutually exclusive)

Then, the following properties are satisfied for any valid probability space:

  • Non-negativity: \(\mathbb{P}(A) \geq 0 \; \forall A \subseteq \Omega\)
  • Countable Additivity: if \(B_1, \dots, B_n\) partition \(B \subseteq \Omega\), then
\begin{align} \mathbb{P}(B) = \sum_{k=1}^n \mathbb{P}(B_k) \notag \end{align}
  • Normalization: \(\mathbb{P}(\Omega) = 1\)

2. Conditional Probability

Consider two events \(A, B \subseteq \Omega\). \(B\) can then be partitioned into \(A \cap B\) and \(B \setminus (A \cap B)\). Conditional probability asks what the probability of \(A\) is given that \(B\) happens.

Conditioning on event \(B\) changes the probability space from \((\Omega, \mathbb{P})\) to \((B, \mathbb{P}_B)\). Then, the conditional probability of \(A\) given \(B\) is defined as:

\begin{align} \mathbb{P}(A | B) = \mathbb{P}_B(A \cap B) = \frac{\mathbb{P}(A \cap B)}{\mathbb{P}(B)} \end{align}

Rearranging this definition gives the multiplication rule:

\begin{align} \boxed{\mathbb{P}(A \cap B) = \mathbb{P}(A | B)\mathbb{P}(B)} \end{align}
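A classic instance of the multiplication rule is drawing cards without replacement. The following sketch (my own example, not from the source) computes the probability of drawing two aces in a row from a standard 52-card deck:

```python
from fractions import Fraction

# B = "first card is an ace", A = "second card is an ace".
P_B = Fraction(4, 52)          # 4 aces among 52 cards
P_A_given_B = Fraction(3, 51)  # 3 aces left among 51 cards

# Multiplication rule: P(A and B) = P(A | B) * P(B)
P_A_and_B = P_A_given_B * P_B
```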

2.1. Law of Total Probability

Suppose \(B_1, \dots, B_n\) partition \(\Omega\). Then, for any \(A \subseteq \Omega\), we have:

\begin{align} \boxed{\mathbb{P}(A) = \sum_{i=1}^n \mathbb{P}(A | B_i)\mathbb{P}(B_i)} \end{align}

This holds because the events \(A \cap B_i\) are disjoint and their union is \(A\), so \(\mathbb{P}(A)\) is the sum of \(\mathbb{P}(A \cap B_i) = \mathbb{P}(A | B_i)\mathbb{P}(B_i)\) over the partition:

(Figure: probability1.png — \(A\) decomposed into the disjoint pieces \(A \cap B_i\))
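The Law of Total Probability can be checked on a small two-stage experiment. This hypothetical two-urn setup is my own illustration: pick one of two urns with equal probability, then draw a ball, with \(A\) = "draw a red ball" and \(B_i\) = "urn \(i\) was chosen":

```python
from fractions import Fraction

# B_1, B_2 partition Omega: exactly one urn is chosen.
P_B = [Fraction(1, 2), Fraction(1, 2)]          # P(B_i)
P_A_given_B = [Fraction(3, 4), Fraction(1, 4)]  # P(A | B_i): red-ball fraction per urn

# Law of Total Probability: P(A) = sum_i P(A | B_i) * P(B_i)
P_A = sum(pa * pb for pa, pb in zip(P_A_given_B, P_B))
```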

2.2. Independence

Consider any two events \(A, B \subseteq \Omega\) and suppose that the chance of \(A\) does not depend on whether or not \(B\) occurs, i.e. \(\mathbb{P}(A | B) = \mathbb{P}(A | B^C)\). Then, since \(B\) and \(B^C\) partition \(\Omega\), by the Law of Total Probability:

\begin{align} \mathbb{P}(A) &= \mathbb{P}(A | B)\mathbb{P}(B) + \mathbb{P}(A|B^C)\mathbb{P}(B^C) \notag \\ &= \mathbb{P}(A|B)\left(\mathbb{P}(B)+\mathbb{P}(B^C)\right) \notag \\ &= \mathbb{P}(A|B) \notag \end{align}

Then, if \(A\) and \(B\) are independent, by the multiplication rule we know that \(\mathbb{P}(A \cap B) = \mathbb{P}(A | B)\mathbb{P}(B)\). But \(\mathbb{P}(A | B) = \mathbb{P}(A)\), so:

\begin{align} \boxed{\mathbb{P}(A\cap B) = \mathbb{P}(A)\mathbb{P}(B)} \end{align}
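The product rule for independent events can be verified by enumeration on a small sample space. A minimal sketch with two independent fair coin flips (the setup is illustrative):

```python
from fractions import Fraction
from itertools import product

# Sample space: all outcomes of two fair coin flips, each with probability 1/4.
omega = list(product("HT", repeat=2))
P = {w: Fraction(1, 4) for w in omega}

def prob(event):
    return sum(P[w] for w in event)

A = [w for w in omega if w[0] == "H"]   # first flip is heads
B = [w for w in omega if w[1] == "H"]   # second flip is heads
AB = [w for w in A if w in B]           # A intersect B
```

Enumerating confirms \(\mathbb{P}(A \cap B) = \frac{1}{4} = \frac{1}{2} \cdot \frac{1}{2} = \mathbb{P}(A)\mathbb{P}(B)\).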
Last modified: 2026-03-19 13:58