10.13 Sample Space, Sigma-Algebra, Frequency
Probability spaces
Introduction
Consider a box containing 4 red balls, 6 blue balls and 10 green balls. We can calculate the probability of drawing a red, blue or green ball.
But what is a probability as a mathematical object?
Would the answer be the same if some balls are of different sizes?
Note that when we ask about a probability, we must first specify the event whose probability we are interested in. While the probability of a specific event (e.g. 'the ball is red') is a number in \([0, 1]\), the probability itself is a map that attaches a number to each event. There are three possible outcomes of this experiment: red, blue and green.
Sample Space
The set of all elementary events is called a sample space and is denoted by \(\Omega\).
Elements of \(\Omega\), that is elementary events, are denoted by \(\omega \in \Omega\).
As a mathematical object, \(\Omega\) is any non-empty set - in this case, \(\Omega = \{\text{red, blue, green}\}\).
Example
- We throw a coin. There are two results: heads or tails. Thus \(\Omega = \{H, T\}, |\Omega|=2\).
- We throw a die. There are six possible results. Thus \(\Omega = \{1, 2, 3, 4, 5, 6\}, |\Omega|=6\).
Often we are not interested in a concrete result of an experiment but we just want to know if this result belongs to a subset of \(\Omega\). Such subsets are called events and we denote them with capital letters: \(A, B, C, D\) etc.
- We throw two dice. Let \(A\) be an event that the sum of spots is equal to 5. Then \(\Omega = \{(i, j) | 1 \le i \le 6, 1 \le j \le 6\}, A = \{(1, 4), (2, 3), (3, 2), (4, 1)\}\).
- We throw a coin until we get heads. Let \(A\) be an event where at most 3 trials were done. \(\Omega = \{(H), (T,H), (T,T,H), \dots \}. A = \{(H), (T,H), (T,T,H)\}\).
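To make the two-dice example concrete, here is a minimal Python sketch (the variable names `omega` and `A` are purely illustrative) that enumerates the sample space and the event 'the sum of spots equals 5':

```python
from itertools import product

# Sample space for throwing two dice: all ordered pairs (i, j) with 1 <= i, j <= 6.
omega = set(product(range(1, 7), repeat=2))

# Event A: the sum of spots equals 5.
A = {outcome for outcome in omega if sum(outcome) == 5}

print(len(omega))  # 36
print(sorted(A))   # [(1, 4), (2, 3), (3, 2), (4, 1)]
```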
Since we'll be working with sets, we'll need to perform operations on them. So, let's review the notation.
Remark
- \(\Omega\) - the sample space
- \(\emptyset\) - the impossible event
- \(A \cap B\) - events \(A\) and \(B\) both occurred
- \(A \cap B = \emptyset\) - events \(A\) and \(B\) are mutually exclusive
- \(A \cup B\) - either \(A\) or \(B\) occurred
- \(A^c = \Omega \setminus A\) - \(A\) did not happen
- \(A \setminus B = A \cap B^c\) - \(A\) happened and \(B\) did not happen
- \(A \subset B\) - the occurrence of event \(A\) implies the occurrence of event \(B\)
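These event operations correspond directly to set operations. A minimal Python sketch (the die model and variable names are only illustrative):

```python
# Sample space and two events for a single die throw.
omega = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}      # 'the result is even'
B = {1, 2, 3}      # 'the result is at most 3'

print(A & B)           # intersection: both occurred -> {2}
print(A | B)           # union: A or B occurred -> {1, 2, 3, 4, 6}
print(omega - A)       # complement of A: A did not happen -> {1, 3, 5}
print(A - B)           # A happened and B did not -> {4, 6}
print(A & B == set())  # False: A and B are not mutually exclusive
print({2, 4} <= A)     # True: the event {2, 4} is a sub-event of A
```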
For example, we can ask for the probability that 'the ball is either blue or green' (which is equivalent to 'the ball is not red').
In words, an event is a statement that you can tell whether it is true or not, after seeing the outcome of the experiment.
In this case, the possible events are:
- 'The ball is none of the three colours' – mathematically, this is denoted by the empty set \(\emptyset\), since it contains none of the possible outcomes;
- 'The ball is red' – denoted by \(\{\text{red}\}\);
- 'The ball is blue' – denoted by \(\{\text{blue}\}\);
- 'The ball is green' – denoted by \(\{\text{green}\}\);
- 'The ball is either red or blue' – denoted by \(\{\text{red, blue}\}\);
- 'The ball is either red or green' – denoted by \(\{\text{red, green}\}\);
- 'The ball is either blue or green' – denoted by \(\{\text{blue, green}\}\);
- 'The ball is any of red, blue or green' – denoted by \(\{\text{red, blue, green}\} = \Omega\).
From this exhaustive list, it is clear that all events are subsets of \(\Omega\) and in fact, in this case at least, all subsets of \(\Omega\) are events.
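Since every subset of \(\Omega\) is an event here, the full list of events can be generated mechanically. A short Python sketch (purely illustrative):

```python
from itertools import combinations

omega = {"red", "blue", "green"}

# All subsets of omega, i.e. the power set 2^omega: exactly the events listed above.
events = [set(c) for r in range(len(omega) + 1) for c in combinations(omega, r)]

for event in events:
    print(event if event else "the impossible event (empty set)")
print(len(events))  # 2**3 = 8 events in total
```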
Assume we have a fixed sample space \(\Omega\). We want to distinguish a family of events.
We call the collection of all events the event space, usually denoted by \(\mathcal{F}\).
A first choice is to take \(2^\Omega\) = all subsets of \(\Omega\). This is a good choice when \(\Omega\) is an at most countable set.
When \(|\Omega| > \aleph_0\), \(2^\Omega\) is too big and there are problems with defining a probability on \(2^\Omega\). The set of all subsets of a given set \(\Omega\) is denoted by \(\mathcal{P}(\Omega)\) or \(2^\Omega\) and is called the power set.
We will discuss these problems later; they are the reason we need to choose a smaller family. On the other hand, \(\mathcal{F}\) should be closed under taking unions, intersections and complements.
Definition of \(\sigma\)-algebra
A family \(\mathcal{F}\) of subsets of \(\Omega\) is called a \(\sigma\)-algebra if:
- \(\emptyset \in \mathcal{F}\)
- \(A \in \mathcal{F} \implies A^c \in \mathcal{F}\)
- \(A_1, A_2, \dots, A_n, \dots \in \mathcal{F} \implies \bigcup_{n=1}^\infty A_n \in \mathcal{F}\)
A pair \((\Omega, \mathcal{F})\) is called a measurable space.
Remark
For any \(\Omega\) the pair \((\Omega, 2^\Omega)\) is a measurable space. Also for any nonempty subset \(A \subset \Omega\) the smallest \(\sigma\)-algebra that contains \(A\) is \(\sigma(A) = \{\emptyset, A, A^c, \Omega\}\).
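For a finite \(\Omega\) the three axioms can be checked by brute force. Below is a small Python sketch; the helper name `is_sigma_algebra` is hypothetical, and for a finite family closure under countable unions reduces to closure under pairwise unions:

```python
def is_sigma_algebra(omega, family):
    """Check the sigma-algebra axioms for a family of subsets of a finite omega.

    For a finite family, closure under countable unions reduces to closure
    under pairwise unions (and, by induction, under all finite unions).
    """
    family = {frozenset(s) for s in family}
    if frozenset() not in family:
        return False                                     # axiom 1: empty set
    if any(frozenset(omega - a) not in family for a in family):
        return False                                     # axiom 2: complements
    if any(a | b not in family for a in family for b in family):
        return False                                     # axiom 3: unions
    return True

omega = frozenset({1, 2, 3, 4, 5, 6})
A = frozenset({2, 4, 6})

# The smallest sigma-algebra containing A: sigma(A) = {empty set, A, A^c, omega}.
sigma_A = [set(), A, omega - A, omega]
print(is_sigma_algebra(omega, sigma_A))            # True
print(is_sigma_algebra(omega, [set(), A, omega]))  # False: A^c is missing
```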
Fact
Let \((\Omega, \mathcal{F})\) be a measurable space. Then
- \(A, B \in \mathcal{F} \implies A \setminus B \in \mathcal{F}\)
- \(A_1, A_2, \dots, A_n, \dots \in \mathcal{F} \implies \bigcap_{n=1}^\infty A_n \in \mathcal{F}\)
Proof
- \(A \setminus B = A \cap B^c = (A^c \cup B)^c \in \mathcal{F}\).
- \(\bigcap_{k=1}^{\infty} A_k = \left(\bigcup_{k=1}^{\infty} A_k^c\right)^c \in \mathcal{F}\), since \(A_k^c \in \mathcal{F}\) for every \(k\).
Definition of Borel \(\sigma\)-algebra of \(\mathbb{R}^d\)
Borel \(\sigma\)-algebra of \(\mathbb{R}^d\), \(\mathcal{B}(\mathbb{R}^d)\), is the smallest \(\sigma\)-algebra that contains all open subsets of \(\mathbb{R}^d\). Elements of \(\mathcal{B}(\mathbb{R}^d)\) are called Borel subsets.
Example
Let \(d=1\), \(\mathcal{B}(\mathbb{R})\) is the smallest \(\sigma\)-algebra that contains all open sets in \(\mathbb{R}\).
- Intervals \((a,b)\), where \(a,b \in \mathbb{R}\) are in \(\mathcal{B}(\mathbb{R})\),
- \((a,b] \in \mathcal{B}(\mathbb{R})\) as \((a,b] = \bigcap_{n=1}^{\infty} (a, b + \frac{1}{n})\),
- \([a,b) \in \mathcal{B}(\mathbb{R})\) as \([a,b) = \bigcap_{n=1}^{\infty} (a - \frac{1}{n}, b)\),
- \([a,b] \in \mathcal{B}(\mathbb{R})\) as \([a,b] = \bigcap_{n=1}^{\infty} (a - \frac{1}{n}, b + \frac{1}{n})\),
- \(\{a\} \in \mathcal{B}(\mathbb{R})\) as \(\{a\} = \bigcap_{n=1}^{\infty} (a - \frac{1}{n}, a + \frac{1}{n})\),
- \((-\infty, a) \in \mathcal{B}(\mathbb{R})\) as \((-\infty, a) = \bigcup_{n=1}^{\infty} (-n, a)\), and hence \([a, \infty) = (-\infty, a)^c \in \mathcal{B}(\mathbb{R})\),
- \((a, \infty) \in \mathcal{B}(\mathbb{R})\) as \((a, \infty) = \bigcup_{n=1}^{\infty} (a, n)\), and hence \((-\infty, a] = (a, \infty)^c \in \mathcal{B}(\mathbb{R})\),
- \(\mathbb{Q} \in \mathcal{B}(\mathbb{R})\) as a countable union of points,
- \(\mathbb{R} \setminus \mathbb{Q} \in \mathcal{B}(\mathbb{R})\) as the complement of \(\mathbb{Q}\).
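These countable intersections can be probed numerically. A small Python sketch (illustrative only; the countable intersection is approximated by the first \(N\) intervals) checking which points survive in \(\bigcap_{n=1}^{N}(a - \frac{1}{n}, b + \frac{1}{n})\):

```python
def in_all_shrinking_intervals(x, a, b, N=10_000):
    """Check x in (a - 1/n, b + 1/n) for n = 1..N, approximating the countable intersection."""
    return all(a - 1 / n < x < b + 1 / n for n in range(1, N + 1))

a, b = 0.0, 1.0
for x in (-0.5, 0.0, 0.5, 1.0, 1.0005):
    print(x, in_all_shrinking_intervals(x, a, b))
# x = 0.0, 0.5 and 1.0 pass (they lie in [a, b]); x = -0.5 and x = 1.0005 eventually fall out.
```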
In fact, \(\mathcal{B}(\mathbb{R})\), like the power set, is a very large family: for example \(\mathbb{R} \in \mathcal{B}(\mathbb{R})\) and also \((\sqrt{\pi},\sqrt{\pi}+1) \in \mathcal{B}(\mathbb{R})\).
Remark
Not every subset of \(\mathbb{R}^d\) is a Borel subset. An example of a non-Borel set is the Vitali set \(V \subset [0,1]\).
Probability
We next define the notion of probability. First, however, in order to get some intuition we consider the frequency of an event. Assume that we can repeat an experiment \(n\) times. Each repetition happens under the same conditions.
Frequency
We define the relative frequency of an event \(A \in \mathcal{F}\) in a series of \(n\) experiments by
$$ f_n(A) = \frac{\#\{\text{experiments in which } A \text{ occurred}\}}{n}. $$
Properties of the frequency
When \(n\) is large we expect that \(f_n(A)\) is close to the chance \(A\) occurs in a single trial. We easily check that \(f_n\) takes values in \([0, 1]\) and
- \(f_n(\Omega) = 1\)
- If \(A_1, \dots, A_j\) are pairwise disjoint, then \(f_n\left(\bigcup_{k=1}^j A_k\right) = \sum_{k=1}^j f_n(A_k).\)
This is because the number of experiments in which \(\bigcup_{k=1}^j A_k\) happened equals \(\sum_{k=1}^{j} (\#\text{ experiments in which } A_{k} \text{ happened})\); dividing both sides by \(n\) gives the claimed additivity.
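A quick simulation of the ball experiment from the introduction illustrates both the definition of \(f_n\) and its additivity. A minimal Python sketch (the box composition is taken from the introduction; everything else is illustrative):

```python
import random

random.seed(0)
box = ["red"] * 4 + ["blue"] * 6 + ["green"] * 10   # the box from the introduction

n = 100_000
draws = [random.choice(box) for _ in range(n)]      # n repetitions under the same conditions

def f(event):
    """Relative frequency f_n(A): fraction of draws whose outcome lies in the event."""
    return sum(outcome in event for outcome in draws) / n

print(f({"red"}))                    # close to 4/20 = 0.2
print(f({"red", "blue", "green"}))   # f_n(Omega) = 1.0
# Additivity for the disjoint events {'red'} and {'blue'}:
print(abs(f({"red", "blue"}) - (f({"red"}) + f({"blue"}))) < 1e-12)  # True
```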
These are fundamental properties that a probability map should have. Are these sufficient for infinite probability spaces? Let us consider the following:
Example
Let \(\Omega = \mathbb{N}^* = \{1, 2, ...\}\) be the positive natural numbers and \(\mathcal{F} = \mathcal{P}(\Omega)\). Suppose that \(P(\{n\}) = \frac{1}{2^n}\), for every \(n \geq 1\). What would we expect the probability of the event \(\{2n \mid n \geq 1\}\) ('the outcome is an even number') to be?
Solution
\(P(\{2n|n \geq 1\}) = \sum_{n=1}^\infty P(\{2n\}) = \sum_{n=1}^\infty \frac{1}{2^{2n}} = \sum_{n=1}^\infty \frac{1}{4^n}\) (Note that the event \(\{n\}\) corresponds to 'the outcome is n').
Then \(\sum_{n=1}^\infty \frac{1}{4^n} = \frac{1}{4} \frac{1}{1 - \frac{1}{4}} = \frac{1}{4} \frac{1}{\frac{3}{4}} = \frac{1}{3}\)
The computation above cannot be justified, unless we extend the property of finite additivity to also hold for countable unions of disjoint events. Indeed, this is what we do!
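The arithmetic above can be double-checked with exact rational numbers. A short Python sketch using `fractions.Fraction` (illustrative only; the infinite sum is approximated by a long partial sum):

```python
from fractions import Fraction

# Partial sums of P({2}) + P({4}) + ... = sum over n of (1/4)^n, which converges to 1/3.
partial = Fraction(0)
for n in range(1, 31):
    partial += Fraction(1, 4) ** n

print(float(partial))             # 0.3333333333...
print(Fraction(1, 3) - partial)   # the remaining tail, equal to (1/3) * (1/4)**30
```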
Probability measure
Given a sample space \(\Omega\) and an event space \(\mathcal{F}\), a function \(P : \mathcal{F} \rightarrow \mathbb{R}\) is called a probability measure if it satisfies
- \(P(B) \in [0, 1]\) for every \(B \in \mathcal{F}\);
- \(P(\Omega) = 1\);
- (Countable additivity) For every sequence of disjoint events \(A_n \in \mathcal{F}\), \(n \geq 1\) (i.e. \(A_m \cap A_n = \emptyset\) for all \(m \neq n\)),
$$ P\left(\bigcup_{n=1}^{\infty} A_{n}\right) = \sum_{n=1}^{\infty} P(A_{n}). $$
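For a finite model the axioms can be verified directly, since countable additivity then reduces to finite additivity. A Python sketch with a hypothetical helper name `is_probability_measure`:

```python
from fractions import Fraction
from itertools import combinations

def is_probability_measure(omega, P):
    """Check the axioms on a finite omega with event space 2^omega.

    P maps frozensets to Fractions; countable additivity reduces here to
    additivity over disjoint pairs (finite additivity follows by induction).
    """
    events = [frozenset(c) for r in range(len(omega) + 1)
              for c in combinations(omega, r)]
    if any(not (0 <= P(e) <= 1) for e in events):
        return False
    if P(frozenset(omega)) != 1:
        return False
    return all(P(a | b) == P(a) + P(b)
               for a in events for b in events if not a & b)

omega = {1, 2, 3, 4, 5, 6}
uniform = lambda e: Fraction(len(e), 6)
print(is_probability_measure(omega, uniform))   # True

biased = lambda e: Fraction(len(e), 5)          # P(omega) = 6/5 > 1: not a measure
print(is_probability_measure(omega, biased))    # False
```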
We now give the definition of an abstract probability space.
Probability space
A probability space is defined as the triplet \((\Omega, \mathcal{F}, P)\), where
- \(\Omega\) (the sample space) is the set of all possible outcomes of the experiment (we always assume that it is not empty);
- \(\mathcal{F}\) is an event space (a \(\sigma\)-algebra) of subsets of \(\Omega\);
- \(P\) is a probability measure on \(\mathcal{F}\).
Theorem
Assume \((\Omega, \mathcal{F}, P)\) is a probability space, and \(A, B, A_1, ..., A_n, ... \in \mathcal{F}\), then
- \(P(\emptyset) = 0\)
- \(P(A^c) = 1 - P(A)\)
- If \(A \subseteq B\), then \(P(B \setminus A) = P(B) - P(A)\) and \(P(B) \geq P(A)\)
- \(P(A \cup B) = P(A) + P(B) - P(A \cap B)\)
- \(P(\bigcup_{k=1}^\infty A_k) \leq \sum_{k=1}^\infty P(A_k)\)
Proof.
- \(\Omega \cup \emptyset = \Omega\), \(\Omega \cap \emptyset = \emptyset \implies 1 = P(\Omega) = P(\Omega) + P(\emptyset) \implies P(\emptyset) = 0\).
- \(\Omega = A \cup A^c\) and \(A \cap A^c = \emptyset \implies 1 = P(\Omega) = P(A) + P(A^c) \implies P(A^c) = 1 - P(A)\).
- \(B = (B \setminus A) \cup A\) and \((B \setminus A) \cap A = \emptyset \implies P(B) = P(B \setminus A) + P(A)\).
- \(A \cup B = (A\setminus B)\cup(A\cap B)\cup (B\setminus A)=(A \setminus (A \cap B)) \cup (A \cap B) \cup (B \setminus (A \cap B))\), a union of pairwise disjoint events. Hence, since \(A\cap B\subseteq A\) and \(A\cap B\subseteq B\),
  \(P(A \cup B) = P(A \setminus (A \cap B)) + P(A \cap B) + P(B \setminus (A \cap B)) = (P(A) - P(A \cap B)) + P(A \cap B) + (P(B) - P(A \cap B)) = P(A) + P(B) - P(A \cap B)\).
- Let \(B_1 = A_1\), \(B_2 = A_2 \setminus B_1\), \(B_3 = A_3 \setminus (B_1 \cup B_2)\), ..., \(B_n = A_n \setminus \bigcup_{k=1}^{n-1} B_k\). The \(B_k\)'s are mutually exclusive, \(\bigcup_{k=1}^{\infty} A_k = \bigcup_{k=1}^{\infty} B_k\), and \(B_n \subset A_n\) for all \(n\). Thus \(P(\bigcup_{k=1}^{\infty} A_k) = P(\bigcup_{k=1}^{\infty} B_k) = \sum_{k=1}^{\infty} P(B_k) \le \sum_{k=1}^{\infty} P(A_k)\).
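The properties in the theorem are easy to confirm numerically on a small finite model. A Python sketch (the uniform die model is only an illustration):

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}

def P(event):
    """Uniform probability on a die: P(A) = |A| / |Omega|."""
    return Fraction(len(event), len(omega))

A = {2, 4, 6}   # 'the result is even'
B = {4, 5, 6}   # 'the result is at least 4'

print(P(set()) == 0)                              # property 1
print(P(omega - A) == 1 - P(A))                   # property 2
print(P(B - {4, 6}) == P(B) - P({4, 6}))          # property 3, with {4, 6} a subset of B
print(P(A | B) == P(A) + P(B) - P(A & B))         # property 4
print(P(A | B) <= P(A) + P(B))                    # property 5 (finite version)
```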
We can use countable additivity to compute the probability of a union of disjoint events. How can we compute the probability of an arbitrary union of events? The following theorem gives us a way to do this.
Theorem (Inclusion-Exclusion Formula)
Assume \(A_1, A_2, \ldots, A_n \in \mathcal{F}\). Then
$$ P\left(\bigcup_{k=1}^{n} A_{k}\right) = \sum_{k=1}^{n} (-1)^{k-1} \sum_{1 \le i_1 < \cdots < i_k \le n} P(A_{i_1} \cap \cdots \cap A_{i_k}) $$
$$ = \sum_{i=1}^{n} P(A_{i}) - \sum_{i<j}P(A_{i} \cap A_{j}) + \sum_{i<j<k}P(A_{i} \cap A_{j} \cap A_{k}) - \cdots + (-1)^{n+1}P(A_{1} \cap A_{2} \cap \cdots \cap A_{n}), $$
where the sum \(\sum_{1 \le i_1 < \cdots < i_k \le n}\) is to be interpreted as the sum going through all \(k\)-tuples \((i_1, \ldots, i_k)\) of numbers from \(\{1, \ldots, n\}\) with no repetition (the inequalities are strict).
This formula uses concise notation and is not straightforward to interpret at first. To understand it better, let us consider some specific cases.
For \(n = 2\):
\(P\left[\bigcup_{k=1}^2 A_k\right] = \sum_{k=1}^2 (-1)^{k-1} \sum_{1 \le i_1 < \cdots < i_k \le 2} P(A_{i_1} \cap \cdots \cap A_{i_k})\)
\(= \sum_{1 \le i \le 2} P(A_i) - \sum_{1 \le i_1 < i_2 \le 2} P(A_{i_1} \cap A_{i_2})\) \(= P(A_1) + P(A_2) - P(A_1 \cap A_2).\)
For \(n = 3\):
\(P\left[\bigcup_{k=1}^3 A_k\right] = \sum_{k=1}^3 (-1)^{k-1} \sum_{1 \le i_1 < \cdots < i_k \le 3} P(A_{i_1} \cap \cdots \cap A_{i_k})\)
\(= \sum_{1 \le i \le 3} P(A_i) - \sum_{1 \le i_1 < i_2 \le 3} P(A_{i_1} \cap A_{i_2}) + \sum_{1 \le i_1 < i_2 < i_3 \le 3} P(A_{i_1} \cap A_{i_2} \cap A_{i_3})\)
\(= P(A_1) + P(A_2) + P(A_3) - P(A_1 \cap A_2) - P(A_1 \cap A_3)\) \(- P(A_2 \cap A_3) + P(A_1 \cap A_2 \cap A_3).\)
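The \(n = 3\) case can also be verified directly on concrete events. A short Python sketch with three events for a single fair die (illustrative only):

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}
P = lambda event: Fraction(len(event), len(omega))   # uniform probability on the die

A1 = {1, 2, 3}   # 'at most 3'
A2 = {2, 4, 6}   # 'even'
A3 = {3, 6}      # 'divisible by 3'

lhs = P(A1 | A2 | A3)
rhs = (P(A1) + P(A2) + P(A3)
       - P(A1 & A2) - P(A1 & A3) - P(A2 & A3)
       + P(A1 & A2 & A3))
print(lhs, rhs, lhs == rhs)   # both sides equal 5/6 in this example
```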
Proof (by induction). By the previous theorem (property 4) we know that the formula is true for \(n=2\).
Assume it is true for \(n \ge 2\). We need to show that it is true for \(n+1\).
\(P(A_1 \cup A_2 \cup \cdots \cup A_n \cup A_{n+1}) =\) \(P((A_1 \cup A_2 \cup \cdots \cup A_n) \cup A_{n+1})\)
\(= P(A_1 \cup A_2 \cup \cdots \cup A_n) + P(A_{n+1}) - P((A_1 \cup A_2 \cup \cdots \cup A_n) \cap A_{n+1})\)
\(= P(A_1 \cup A_2 \cup \cdots \cup A_n) + P(A_{n+1}) - P((A_1 \cap A_{n+1}) \cup \cdots \cup (A_n \cap A_{n+1}))\).
Now apply the induction hypothesis to \(P(A_1 \cup \dots \cup A_n)\) and \(P((A_1 \cap A_{n+1}) \cup \dots \cup (A_n \cap A_{n+1}))\). The full expansion becomes:
\(P(A_1 \cup A_2 \cup \dots \cup A_n \cup A_{n+1}) = \sum_{k=1}^{n} (-1)^{k-1} \sum_{1 \le i_1 < \dots < i_k \le n} P(A_{i_1} \cap \dots \cap A_{i_k}) + P(A_{n+1})\)
\(- \sum_{k=1}^{n} (-1)^{k-1} \sum_{1 \le i_1 < \dots < i_k \le n} P(A_{i_1} \cap \dots \cap A_{i_k} \cap A_{n+1})\)
Collecting the terms (the last sum contributes exactly the intersections that involve \(A_{n+1}\), with the sign shifted by one) we obtain the inclusion-exclusion formula for \(n+1\) sets, which completes the induction.
Example: Equiprobability
\(\Omega\) - a finite set, \(\mathcal{F} = 2^\Omega\), all elementary events have the same probability. Then for all \(\omega \in \Omega\) and for all \(A \subseteq \Omega\) (\(A \in \mathcal{F}\))
- \(P(\{\omega\}) = \frac{1}{|\Omega|}\)
- \(P(A) = \frac{|A|}{|\Omega|}\)
Since \(\forall \omega_1, \omega_2 \in \Omega\), \(P(\{\omega_1\}) = P(\{\omega_2\})\), let \(p \in [0, 1]\) such that \(p := P(\{\omega\}), \forall \omega \in \Omega\).
Since \(P\) is a probability measure, \(1 = P(\Omega) = \sum_{\omega \in \Omega}P(\{\omega\}) = \sum_{\omega \in \Omega} p=p \sum_{\omega \in \Omega}1 = p|\Omega|\).
Then \(p = \frac{1}{|\Omega|}\)
For \(A \subseteq \Omega\) we then have \(P(A) = \sum_{\omega \in A} P(\{\omega\}) = \sum_{\omega \in A} p = p \sum_{\omega \in A} 1 = p|A| = \frac{|A|}{|\Omega|}\).
Hence for any \(A \in \mathcal{F}\), \(P(A) = \frac{|A|}{|\Omega|}\).
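The formula \(P(A) = \frac{|A|}{|\Omega|}\) is exactly how classical probabilities such as the two-dice example are computed. A minimal Python sketch (illustrative):

```python
from fractions import Fraction
from itertools import product

# Equiprobable model for two dice: every ordered pair (i, j) has probability 1/36.
omega = list(product(range(1, 7), repeat=2))

def P(event):
    """P(A) = |A| / |Omega| under equiprobability."""
    return Fraction(len(event), len(omega))

A = [w for w in omega if sum(w) == 5]   # 'the sum of spots equals 5'
print(P(A))                             # 1/9, i.e. 4/36
```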
Example
\(\Omega = \{\omega_1, \ldots, \omega_n, \ldots\}\) - a countable set. Let \(p_1, \ldots, p_n, \ldots\) be a sequence of non-negative numbers s.t. \(\sum_{k=1}^{\infty} p_k = 1\). We can choose \(\mathcal{F} = 2^{\Omega}\) and \(P(\{\omega_i\}) = p_i\). This choice defines the probability space \((\Omega, \mathcal{F}, P)\), and for any \(A \in \mathcal{F}\) we have
$$ P(A) = \sum_{i\,:\,\omega_i \in A} p_i, $$
where the sum runs over all indices \(i\) such that \(\omega_i \in A\).
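A Python sketch of this construction for the particular choice \(p_k = 2^{-k}\) (the names are illustrative and the infinite sums are truncated at a large index):

```python
from fractions import Fraction

# Countable sample space omega_1, omega_2, ... identified with 1, 2, ...,
# with weights p_k = 1/2^k (truncated at a large K for this illustration).
K = 60
p = {k: Fraction(1, 2) ** k for k in range(1, K + 1)}

def P(A):
    """P(A) = sum of p_i over the indices i with omega_i in A."""
    return sum((p[i] for i in A if i in p), Fraction(0))

print(float(sum(p.values())))          # ~1.0: the truncated weights almost sum to 1
print(float(P(range(2, K + 1, 2))))    # ~1/3: probability of an even outcome
```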
Example: Uniform probability
We choose \(\Omega \subseteq \mathbb{R}^d\), that is, \(\Omega\) is a Borel subset of \(\mathbb{R}^d\), and we assume that \(0 < |\Omega| < \infty\), where \(|\Omega| = \int_{\mathbb{R}^d} \mathbf{1}_\Omega\) is the Lebesgue measure\(^1\). Let \(\mathcal{F} = \mathcal{B}(\Omega)\), the smallest \(\sigma\)-algebra that contains all open subsets of \(\Omega\), and for \(A \in \mathcal{F} = \mathcal{B}(\Omega)\) set
$$ P(A) = \frac{|A|}{|\Omega|}. $$
Then \((\Omega, \mathcal{F}, P)\) is a probability space. We use it to describe experiments in which a point is chosen from \(\Omega\) uniformly at random: \(|\Omega|\) is the total area (or volume) and \(|A|\) is the area of the chosen region \(A\).
\(^{1}\)This integral is actually the Lebesgue integral. For a Riemann integrable function, the Lebesgue integral is equal to the Riemann integral. We will always consider subsets of \(\mathbb{R}^d\) whose characteristic functions are Riemann integrable.
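A classic illustration of the uniform probability is estimating \(P(A) = \frac{|A|}{|\Omega|}\) by random sampling, in the spirit of the frequency \(f_n\). A minimal Monte Carlo sketch in Python (purely illustrative), with \(\Omega = [0,1]^2\) and \(A\) the quarter-disc of radius 1:

```python
import math
import random

random.seed(0)
n = 1_000_000

# Omega = [0, 1]^2 with |Omega| = 1; A = quarter disc {x^2 + y^2 <= 1} with |A| = pi / 4.
hits = sum(1 for _ in range(n)
           if random.random() ** 2 + random.random() ** 2 <= 1)

print(hits / n)      # relative frequency of A, close to P(A) = |A| / |Omega|
print(math.pi / 4)   # 0.7853981...
```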