Probability Basics

Investment Theory
Lorenzo Naranjo

Fall 2024

Introduction

  • Investment theory aims to understand how to allocate resources to different assets whose future payoffs are uncertain.
  • To model the future uncertainty of prices and cash flows, we can rely on well-established mathematical concepts that summarize the expected rewards and risks of investing in a portfolio of financial assets.
  • In what follows, we assume there is only a finite number of future possibilities to simplify the mathematical exposition.
    • For example, the future price of a stock can go up, stay constant, or go down.
  • However, all the results presented in this chapter hold if we relax this assumption and allow for an infinite number of future outcomes.

Outcomes and Events

Sets

  • A set is a collection of objects.
  • The objects of a set can be anything you want.
    • For example, a set may contain numbers, letters, cars, or pictures.
  • In our case, we will be concerned with sets that contain future possibilities or outcomes that can occur with positive probability.
  • The notes contain examples of all the operations that we can perform with sets A and B such as
    • unions A \cup B and intersections A \cap B
    • differences A \setminus B and complements \overline{A}
    • power sets \mathcal{P}(A) and cartesian products A \times B
    • computing the number of elements of a set |A|

The Sample Space

  • In probability theory, a finite sample space is a non-empty finite set denoted by \Omega.
  • The sample space includes all possible outcomes that can occur.
  • A probability measure is a function that assigns to each element \omega of \Omega a number in [0, 1] so that \sum_{\omega \in \Omega} \operatorname{P}(\omega) = 1.

Events

  • An event A is a subset of \Omega, and we define the probability of that event occurring as \operatorname{P}(A) = \sum_{\omega \in A} \operatorname{P}(\omega).
  • Such a finite probability space is denoted by (\Omega, \operatorname{P}).
  • An immediate consequence of the previous expression is that \operatorname{P}(\Omega) = 1.

Disjoint Events

  • Furthermore, if A and B are disjoint subsets of \Omega we have that \begin{aligned} \operatorname{P}(A \cup B) & = \sum_{\omega \in A \cup B} \operatorname{P}(\omega) \\ & = \sum_{\omega \in A} \operatorname{P}(\omega) + \sum_{\omega \in B} \operatorname{P}(\omega) \\ & = \operatorname{P}(A) + \operatorname{P}(B). \end{aligned}
  • If we denote by \overline{A} the complement of A in \Omega, the last expression implies that \operatorname{P}(A) + \operatorname{P}(\overline{A}) = 1.
  • Also, because \overline{\Omega} = \emptyset, we have that \operatorname{P}(\Omega) + \operatorname{P}(\emptyset) = 1, which implies that \operatorname{P}(\emptyset) = 0.

Example 1 If \Omega = \{ \omega_{1}, \omega_{2}, \omega_{3} \}, then \begin{aligned} \mathcal{P}(\Omega) & = \{ \emptyset, \{\omega_{1}\}, \{\omega_{2}\}, \{\omega_{3}\}, \{\omega_{1}, \omega_{2}\}, \{\omega_{2}, \omega_{3}\}, \{\omega_{1}, \omega_{3}\}, \{\omega_{1}, \omega_{2}, \omega_{3}\}\} \end{aligned} defines the collection of all possible events that we can measure. Note that the cardinality of \mathcal{P}(\Omega) grows exponentially with the size of \Omega.

The function \operatorname{P} such that \operatorname{P}(\omega_{1}) = 1/2, \operatorname{P}(\omega_{2}) = 1/4, and \operatorname{P}(\omega_{3}) = 1/4 defines a probability measure on \Omega.

We have, for example, that \operatorname{P}(\{\omega_{1}, \omega_{3}\}) = 1/2 + 1/4 = 3/4.
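As a quick check of Example 1, here is a minimal Python sketch (not part of the original notes) that encodes the probability measure as a dictionary and computes event probabilities by summing over outcomes; the names P and prob_event are illustrative only.

```python
# Probability measure on Omega = {w1, w2, w3} from Example 1.
P = {"w1": 1/2, "w2": 1/4, "w3": 1/4}

# A valid probability measure assigns numbers in [0, 1] that sum to 1.
assert abs(sum(P.values()) - 1.0) < 1e-12

def prob_event(event, P):
    """P(A) = sum of P(omega) over all outcomes omega in the event A."""
    return sum(P[w] for w in event)

print(prob_event({"w1", "w3"}, P))   # 0.75, as computed in the notes
print(prob_event(set(P), P))         # P(Omega) = 1.0
print(prob_event(set(), P))          # P(empty set) = 0.0
```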

Random Variables

Definition

  • If (\Omega, \operatorname{P}) is a finite probability space, a random variable is a real-valued function defined on \Omega.

Example 2 Consider a sample space with four possible outcomes \Omega = \{ \omega_{1}, \omega_{2}, \omega_{3}, \omega_{4} \}. The table below describes the possible values of three random variables denoted by X, Y and Z.

Outcome X Y Z
\omega_{1} -10 20 15
\omega_{2} -5 10 -10
\omega_{3} 5 0 15
\omega_{4} 10 0 -10

Note that the information sets generated by each random variable are different.

Expectation and Variance

  • If X is a random variable defined on a finite probability space (\Omega, \operatorname{P}), the expectation (or expected value) of X is defined to be \operatorname{E}X = \sum_{\omega \in \Omega} X(\omega) \operatorname{P}(\omega), whereas the variance of X is \operatorname{V}(X) = \operatorname{E}(X - \operatorname{E}X)^{2}.
  • The standard deviation is the square-root of the variance, i.e., \sigma_{X} = \sqrt{\operatorname{V}(X)}.

Example 3 Consider the sample space \Omega = \{ \omega_{1}, \omega_{2}, \omega_{3} \} in which we define the probability measure \operatorname{P} such that \operatorname{P}(\omega_{1}) = 1/2, \operatorname{P}(\omega_{2}) = 1/4, and \operatorname{P}(\omega_{3}) = 1/4. Two random variables X and Y are defined on \Omega according to the table below.

Outcome Probability X Y
\omega_{1} 1/2 10 2
\omega_{2} 1/4 8 40
\omega_{3} 1/4 4 20

Using this information, we can compute \operatorname{E}(X) = 8, \operatorname{E}(Y) = 16, \operatorname{V}(X) = 6, \operatorname{V}(Y) = 246. The standard deviations of X and Y are \sigma_{X} = \sqrt{6} \approx 2.45 and \sigma_{Y} = \sqrt{246} \approx 15.68, respectively.
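The numbers in Example 3 can be reproduced with a short Python sketch (an illustration, not part of the original notes), assuming NumPy is available; the array names are arbitrary.

```python
import numpy as np

# Outcome probabilities and the values of X and Y from Example 3.
p = np.array([0.50, 0.25, 0.25])
x = np.array([10.0, 8.0, 4.0])
y = np.array([2.0, 40.0, 20.0])

ex = np.sum(x * p)                # E(X) = sum over omega of X(omega) P(omega) = 8
ey = np.sum(y * p)                # E(Y) = 16
vx = np.sum((x - ex) ** 2 * p)    # V(X) = E(X - EX)^2 = 6
vy = np.sum((y - ey) ** 2 * p)    # V(Y) = 246

print(ex, ey, vx, vy)
print(np.sqrt(vx), np.sqrt(vy))   # standard deviations: ~2.45 and ~15.68
```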

Covariance

  • The covariance between two random variables X and Y defined on a probability space (\Omega, \operatorname{P}) is defined as \operatorname{Cov}(X, Y) = \operatorname{E}(X - \operatorname{E}X) (Y - \operatorname{E}Y), and their correlation is \rho_{X, Y} = \frac{\operatorname{Cov}(X, Y)}{\sigma_{X} \sigma_{Y}}.

Example 4 Continuing with Example 3, we have that \operatorname{Cov}(X, Y) = -18. Thus, \rho_{X, Y} \approx -0.47.
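The covariance and correlation in Example 4 can be checked the same way; the sketch below (an illustration, not from the notes) reuses the Example 3 data and assumes NumPy.

```python
import numpy as np

# Same probabilities and values as in Example 3.
p = np.array([0.50, 0.25, 0.25])
x = np.array([10.0, 8.0, 4.0])
y = np.array([2.0, 40.0, 20.0])

ex, ey = np.sum(x * p), np.sum(y * p)
vx = np.sum((x - ex) ** 2 * p)
vy = np.sum((y - ey) ** 2 * p)

# Cov(X, Y) = E[(X - EX)(Y - EY)], computed outcome by outcome.
cov = np.sum((x - ex) * (y - ey) * p)
corr = cov / np.sqrt(vx * vy)

print(cov)    # -18.0
print(corr)   # ~ -0.47
```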

Some Properties of Covariance

  • The covariance of X and Y can also be expressed as \operatorname{Cov}(X, Y) = \operatorname{E}(X Y) - \operatorname{E}(X) \operatorname{E}(Y).
  • The covariance of X with itself is equal to its variance, \operatorname{Cov}(X, X) = \operatorname{V}(X).
  • The covariance is linear in each of its arguments \operatorname{Cov}(\alpha X, \beta Y) = \alpha \beta \operatorname{Cov}(X, Y).
  • The covariance of X with a constant is zero.

Some Properties of Correlation

  • The correlation between any two random variables is always between -1 and 1.
  • The correlation of X and X is equal to 1, whereas the correlation of X and -X is -1.

Probability Mass Function

Definition

  • For discrete random variables, the probability mass function (or pmf) is a real-valued function that specifies the probability that the random variable X is equal to a certain value x, i.e., p_{X}(x) = \operatorname{P}(\{\omega \in \Omega : X(\omega) = x\}).

Example 5 Suppose we assign a probability measure \operatorname{P} to the outcomes underlying the random variables X and Y defined in Example 2, according to the table below.

Outcome \operatorname{P} X Y
\omega_{1} 0.10 -10 20
\omega_{2} 0.30 -5 10
\omega_{3} 0.40 5 0
\omega_{4} 0.20 10 0

We have that the probability mass function of X and Y are \begin{aligned} p_{X}(x) = \begin{cases} 0.10 & \text{if } x = -10, \\ 0.30 & \text{if } x = -5, \\ 0.40 & \text{if } x = 5, \\ 0.20 & \text{if } x = 10. \end{cases} \end{aligned} \qquad \begin{aligned} p_{Y}(y) = \begin{cases} 0.60 & \text{if } y = 0, \\ 0.30 & \text{if } y = 10, \\ 0.10 & \text{if } y = 20. \end{cases} \end{aligned}
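To see how a pmf arises from outcome-level data, here is a small Python sketch (illustrative only; the variable names are not from the notes) that aggregates the probabilities in Example 5 by the value each random variable takes.

```python
from collections import defaultdict

# Outcome-level description from Example 5: (probability, X value, Y value).
outcomes = [(0.10, -10, 20), (0.30, -5, 10), (0.40, 5, 0), (0.20, 10, 0)]

# p_X(x) = P(X = x): add up the probabilities of all outcomes with X = x,
# and similarly for Y.
p_X, p_Y = defaultdict(float), defaultdict(float)
for prob, x, y in outcomes:
    p_X[x] += prob
    p_Y[y] += prob

print(dict(p_X))   # {-10: 0.1, -5: 0.3, 5: 0.4, 10: 0.2}
print(dict(p_Y))   # {20: 0.1, 10: 0.3, 0: 0.6}, up to floating-point rounding
```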

PMF Plot

  • It is sometimes easier to visualize the probability mass function by plotting the probability of different values of the random variable.
(a) The function p_{X}(x) defines the probability of X being equal to each x \in \{-10, -5, 5, 10\}.
(b) The function p_{Y}(y) defines the probability of Y being equal to each y \in \{0, 10, 20\}.
Figure 1: The figure plots the probability mass function of the random variables X and Y.

Some Comments

  • It is apparent from the pictures that p_{X}(x) = 0 if x \notin \{-10, -5, 5, 10\}.
  • Indeed, the set \{\omega \in \Omega : X(\omega) = x \} is empty for all x not equal to -10, -5, 5, or 10.
  • Similarly, p_{Y}(y) = 0 if y \notin \{0, 10, 20\}.
  • To simplify notation, we will often write \{X = x\} to denote the set \{\omega \in \Omega : X(\omega) = x\}.
  • Using this notation, we have that p_{X}(x) = \operatorname{P}(X = x).

Expectation

If a random variable X takes m distinct values x_{1}, x_{2}, \ldots, x_{m}, we can rewrite its expectation as \operatorname{E}(X) = \sum_{i = 1}^{m} x_{i} p_X(x_{i}), \tag{1} which is the form commonly used in statistics.
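As a sketch of Equation 1 (not part of the original notes), the expectation of X from Example 5 can be computed directly from its pmf:

```python
# pmf of X from Example 5.
p_X = {-10: 0.10, -5: 0.30, 5: 0.40, 10: 0.20}

# E(X) = sum over i of x_i * p_X(x_i), as in Equation 1.
ex = sum(x * p for x, p in p_X.items())
print(ex)   # 1.5
```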

Joint Probability Mass Function

  • For two random variables X and Y defined on (\Omega, \operatorname{P}), the set \{X = x, Y = y\} denotes all outcomes in \Omega that satisfy \{X = x\} and \{Y = y\}.
  • Therefore, we have that \{X = x, Y = y\} = \{X = x\} \cap \{Y = y\}.
  • The function p_{X, Y}(x, y) = \operatorname{P}(X = x, Y = y) is called the joint probability mass function of X and Y.

Example 6 The joint pmf of the random variables defined in Example 5 is given in the table below.

\small \begin{array}{c|ccc} X \setminus Y & 0 & 10 & 20 \\ \hline -10 & 0 & 0 & 0.1 \\ -5 & 0 & 0.3 & 0 \\ 5 & 0.4 & 0 & 0 \\ 10 & 0.2 & 0 & 0 \end{array} The function p_{X, Y}(x, y) has many zeros because in Example 5 there are only four outcomes. Any other pair of values (x, y) has probability zero of occurring.

Example 7 We can generate any joint pmf for two random variables as long as the sum of all probabilities is equal to one. The table below reports the joint probabilities of a random variable X taking values in \{-1, 0, 1\} and a random variable Y taking values in \{0, 1, 2, 3\}. \small \begin{array}{c|cccc} X \setminus Y & 0 & 1 & 2 & 3 \\ \hline -1 & 0.12500 & 0.09375 & 0.06250 & 0.03125 \\ 0 & 0.06250 & 0.12500 & 0.12500 & 0.06250 \\ 1 & 0.03125 & 0.06250 & 0.09375 & 0.12500 \end{array}

In this case the underlying probability space has at least 3 \times 4 = 12 possible outcomes.
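A joint pmf like the one in Example 7 is naturally stored as a two-dimensional array; the minimal sketch below (assuming NumPy, with illustrative names) checks that the probabilities sum to one.

```python
import numpy as np

# Joint pmf from Example 7: rows index x in {-1, 0, 1}, columns index y in {0, 1, 2, 3}.
p_xy = np.array([
    [0.12500, 0.09375, 0.06250, 0.03125],
    [0.06250, 0.12500, 0.12500, 0.06250],
    [0.03125, 0.06250, 0.09375, 0.12500],
])

# A valid joint pmf sums to one over all (x, y) pairs.
print(p_xy.sum())   # 1.0
```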

Joint PMF Plot

  • To plot the joint pmf of two random variables we need a three-dimensional graph.

Figure 2: The figure plots the joint probability mass function of X and Y in Example 7.

Covariance of Two Random Variables

  • We can use the joint pmf to compute the expectation of a function of two random variables. Indeed, we have that \operatorname{E}(f(X, Y)) = \sum_{i = 1}^{m} \sum_{j = 1}^{n} f(x_{i}, y_{j}) p_{X, Y}(x_{i}, y_{j}).
  • If we write \mu_{X} = \operatorname{E}(X) and \mu_{Y} = \operatorname{E}(Y), the covariance of X and Y can be computed as \operatorname{Cov}(X, Y) = \sum_{i = 1}^{m} \sum_{j = 1}^{n} (x_{i} - \mu_{X})(y_{j} - \mu_{Y}) p_{X, Y}(x_{i}, y_{j}).
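The double-sum formulas above can be sketched in Python as follows (an illustration using the Example 7 joint pmf, not part of the original notes; NumPy is assumed).

```python
import numpy as np

# Joint pmf and values from Example 7 (rows: x in {-1, 0, 1}; columns: y in {0, 1, 2, 3}).
x_vals = np.array([-1.0, 0.0, 1.0])
y_vals = np.array([0.0, 1.0, 2.0, 3.0])
p_xy = np.array([
    [0.12500, 0.09375, 0.06250, 0.03125],
    [0.06250, 0.12500, 0.12500, 0.06250],
    [0.03125, 0.06250, 0.09375, 0.12500],
])

# Means via E(f(X, Y)) = sum_i sum_j f(x_i, y_j) p_{X,Y}(x_i, y_j).
mu_x = sum(x_vals[i] * p_xy[i, j] for i in range(3) for j in range(4))
mu_y = sum(y_vals[j] * p_xy[i, j] for i in range(3) for j in range(4))

# Cov(X, Y) = sum_i sum_j (x_i - mu_x)(y_j - mu_y) p_{X,Y}(x_i, y_j).
cov = sum((x_vals[i] - mu_x) * (y_vals[j] - mu_y) * p_xy[i, j]
          for i in range(3) for j in range(4))

print(mu_x, mu_y, cov)   # 0.0, 1.5, 0.3125
```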

Marginal Probability Mass Functions

  • The joint pmf contains all the information about X and Y since we can recover the marginal (individual) pmfs of X and Y from it.
  • Indeed, we have that p_{X}(x) = \sum_{j = 1}^{n} p_{X, Y}(x, y_{j}) and p_{Y}(y) = \sum_{i = 1}^{m} p_{X, Y}(x_{i}, y).
  • It is important to note that the joint pmf not only contains the individual information of two random variables but also captures their mutual dependence.
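The marginal pmfs are obtained by summing the joint pmf over the other variable; here is a minimal sketch using the Example 7 array (assuming NumPy, illustrative names).

```python
import numpy as np

# Joint pmf from Example 7 (rows: x in {-1, 0, 1}; columns: y in {0, 1, 2, 3}).
p_xy = np.array([
    [0.12500, 0.09375, 0.06250, 0.03125],
    [0.06250, 0.12500, 0.12500, 0.06250],
    [0.03125, 0.06250, 0.09375, 0.12500],
])

# p_X(x) sums the joint pmf over y (across each row);
# p_Y(y) sums it over x (down each column).
p_x = p_xy.sum(axis=1)   # [0.3125, 0.375, 0.3125]
p_y = p_xy.sum(axis=0)   # [0.21875, 0.28125, 0.28125, 0.21875]
print(p_x, p_y)
```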

Independence

Definition

  • We say that two events A and B are independent if \operatorname{P}(A \cap B) = \operatorname{P}(A) \operatorname{P}(B).
  • To understand the concept better, suppose that the weather tomorrow can be either sunny, fair or rainy, and that a certain stock can either go up or down in price.
  • We can define \begin{aligned} W & = \{\text{sunny}, \text{fair}, \text{rainy}\}, \\ S & = \{\text{up}, \text{down}\}. \end{aligned}
  • The set of outcomes can be described as all possible pairwise combinations of weather tomorrow and the stock price movement.

Weather Events

  • We can then define the weather events \begin{aligned} \text{Sunny} & = \{(\text{sunny}, \text{up}), (\text{sunny}, \text{down})\}, \\ \text{Fair} & = \{(\text{fair}, \text{up}), (\text{fair}, \text{down})\}, \\ \text{Rainy} & = \{(\text{rainy}, \text{up}), (\text{rainy}, \text{down})\}. \end{aligned}
  • The table below describes the probabilities for tomorrow’s weather.
Weather Sunny Fair Rainy
Probability 0.3 0.5 0.2

Stock Price Events

  • Similarly, the stock events can be defined as \begin{aligned} \text{Up} & = \{(\text{sunny}, \text{up}), (\text{fair}, \text{up}), (\text{rainy}, \text{up})\}, \\ \text{Down} & = \{(\text{sunny}, \text{down}), (\text{fair}, \text{down}), (\text{rainy}, \text{down})\}. \\ \end{aligned}
  • The probabilities of the stock price going up or down are described in the table below.
Stock Up Down
Probability 0.6 0.4

Weather and Stock Price Movements Should be Independent

  • If the weather does not affect the likelihood of the stock going up or down, we should expect that on sunny days the stock goes up 60% of the time and down 40% of the time.
  • That is, if the weather tomorrow and the stock price movement are independent events, we should expect \operatorname{P}(\text{Stock} \cap \text{Weather}) = \operatorname{P}(\text{Stock}) \operatorname{P}(\text{Weather}), where \text{Stock} is either \text{Up} or \text{Down}, and \text{Weather} is either \text{Sunny}, \text{Fair}, or \text{Rainy}.

Joint Probabilities Consistent with Independence

  • The table below describes the combined probabilities of the stock price movement and the weather tomorrow that are consistent with the independence of those events.
Stock\Weather Sunny Fair Rainy
Up 0.18 0.30 0.12
Down 0.12 0.20 0.08
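The table of joint probabilities is simply the outer product of the two marginal distributions, as the following sketch illustrates (assuming NumPy; not part of the original notes).

```python
import numpy as np

# Marginal probabilities of the stock move and tomorrow's weather.
p_stock = np.array([0.6, 0.4])           # Up, Down
p_weather = np.array([0.3, 0.5, 0.2])    # Sunny, Fair, Rainy

# Under independence, each joint probability is the product of the marginals.
joint = np.outer(p_stock, p_weather)
print(joint)
# [[0.18 0.3  0.12]
#  [0.12 0.2  0.08]]
```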

Independence of Random Variables

  • Two random variables X and Y are independent if the events \{X = x\} and \{Y = y\} are independent for all values x and y.
  • Thus, if X and Y are independent we have that \operatorname{P}(X = x, Y = y) = \operatorname{P}(X = x) \operatorname{P}(Y = y), or equivalently p_{X, Y}(x, y) = p_{X}(x) p_{Y}(y).
  • An important consequence of independence is that if X and Y are two independent random variables, then \operatorname{E}(XY) = \operatorname{E}(X) \operatorname{E}(Y).

Independence and Zero Covariance

  • If X and Y are independent, their covariance is equal to zero.
  • Indeed, \begin{aligned} \operatorname{Cov}(X, Y) & = \operatorname{E}(XY) - \operatorname{E}(X) \operatorname{E}(Y) \\ & = \operatorname{E}(X) \operatorname{E}(Y) - \operatorname{E}(X) \operatorname{E}(Y) \\ & = 0. \end{aligned}
  • The converse is not true: zero covariance does not imply independence.

Example 8 Consider two random variables X and Y defined according to the table below.

Outcome \operatorname{P} X Y
\omega_{1} 0.40 -1 0
\omega_{2} 0.30 1 1
\omega_{3} 0.30 1 -1

We have that \operatorname{E}(X) = 0.2, \operatorname{E}(Y) = 0, and \operatorname{E}(XY) = 0. Therefore, \operatorname{Cov}(X, Y) = 0 - 0.2 \times 0 = 0, which shows that X and Y are uncorrelated.

However, the two random variables are not independent. If we know that X = -1 then we know that Y = 0. Similarly, learning that Y = 1 tells us that X = 1.
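The numbers in Example 8 are easy to verify; the sketch below (illustrative, assuming NumPy) confirms the zero covariance and exhibits one pair of events that violates independence.

```python
import numpy as np

# Outcome-level data from Example 8.
p = np.array([0.4, 0.3, 0.3])
x = np.array([-1.0, 1.0, 1.0])
y = np.array([0.0, 1.0, -1.0])

ex, ey = np.sum(x * p), np.sum(y * p)
cov = np.sum(x * y * p) - ex * ey
print(ex, ey, cov)   # ~0.2, 0.0, 0.0 -> Cov(X, Y) = 0, so X and Y are uncorrelated

# Not independent: no outcome has X = -1 and Y = 1, so P(X = -1, Y = 1) = 0,
# while P(X = -1) * P(Y = 1) = 0.4 * 0.3 = 0.12.
```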

Linear Combinations

Linear Combinations of Two Random Variables

  • In investment theory, we usually study linear combinations of random variables of the form Z = \alpha X + \beta Y.
  • The expectation of Z is just a linear combination of the expectations of X and Y, \operatorname{E}Z = \alpha \operatorname{E}X + \beta \operatorname{E}Y.
  • The variance of Z, though, includes not only the variances of X and Y but also their covariances, \operatorname{V}(Z) = \alpha^{2} \operatorname{V}(X) + \beta^{2} \operatorname{V}(Y) + 2 \alpha \beta \operatorname{Cov}(X, Y).
  • This is an important result which is at the heart of portfolio diversification.
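The variance formula for Z = \alpha X + \beta Y can be verified numerically; the sketch below (not from the notes) uses the Example 3 data with arbitrarily chosen weights and assumes NumPy.

```python
import numpy as np

# Data from Example 3, with illustrative weights alpha and beta.
p = np.array([0.50, 0.25, 0.25])
x = np.array([10.0, 8.0, 4.0])
y = np.array([2.0, 40.0, 20.0])
alpha, beta = 0.5, 0.5

# Variance of Z = alpha X + beta Y computed directly outcome by outcome ...
z = alpha * x + beta * y
vz_direct = np.sum((z - np.sum(z * p)) ** 2 * p)

# ... and via alpha^2 V(X) + beta^2 V(Y) + 2 alpha beta Cov(X, Y).
ex, ey = np.sum(x * p), np.sum(y * p)
vx = np.sum((x - ex) ** 2 * p)
vy = np.sum((y - ey) ** 2 * p)
cov = np.sum((x - ex) * (y - ey) * p)
vz_formula = alpha**2 * vx + beta**2 * vy + 2 * alpha * beta * cov

print(vz_direct, vz_formula)   # both equal 54.0
```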

General Case

  • More generally, consider the random variables X_{1}, X_{2}, \ldots, X_{n}, and form a new random variable X such that X = \alpha_{1} X_{1} + \alpha_{2} X_{2} + \ldots + \alpha_{n} X_{n}, where \alpha_{i} \in \mathbb{R} for all i \in \{1, 2, \ldots, n\}.
  • The expectation of X is a linear combination of the expectations of X_{1}, X_{2}, \ldots, X_{n}.
  • The variance of X, though, takes into account all covariances between X_{i} and X_{j}, for i, j = 1, 2, \ldots, n.
  • Indeed, we have that \operatorname{V}(X) = \sum_{i = 1}^{n} \sum_{j = 1}^{n} \alpha_{i} \alpha_{j} \operatorname{Cov}(X_{i}, X_{j}).
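The double sum is a quadratic form in the weights; the sketch below (with a hypothetical covariance matrix chosen purely for illustration, not from the notes) computes it both with explicit loops and with matrix multiplication.

```python
import numpy as np

# Hypothetical covariance matrix of (X_1, X_2, X_3) and a weight vector alpha.
cov_matrix = np.array([
    [0.04, 0.01, 0.00],
    [0.01, 0.09, 0.02],
    [0.00, 0.02, 0.16],
])
alpha = np.array([0.5, 0.3, 0.2])

# V(X) = sum_i sum_j alpha_i alpha_j Cov(X_i, X_j) ...
var_loops = sum(alpha[i] * alpha[j] * cov_matrix[i, j]
                for i in range(3) for j in range(3))

# ... which is the quadratic form alpha' Sigma alpha.
var_matrix = alpha @ cov_matrix @ alpha
print(var_loops, var_matrix)   # the two computations agree (~0.0299)
```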

Independence Again

  • If the random variables X_{1}, X_{2}, \ldots, X_{n} are independent from each other, we have that \operatorname{Cov}(X_{i}, X_{j}) = 0 for all i \neq j.
  • Recognizing that \operatorname{Cov}(X_{i}, X_{i}) = \operatorname{V}(X_{i}), the variance of X = \alpha_{1} X_{1} + \alpha_{2} X_{2} + \ldots + \alpha_{n} X_{n}, is \operatorname{V}(X) = \sum_{i = 1}^{n} \alpha_{i}^{2} \operatorname{V}(X_{i}).

Example 9 Suppose that X_{1}, X_{2}, \ldots, X_{n} are independent random variables with the same variance denoted by \sigma^{2}. Define X to be the sum of these random variables so that X = X_{1} + X_{2} + \ldots + X_{n}. Thus, \operatorname{V}(X) = \sum_{i = 1}^{n} \operatorname{V}(X_{i}) = n \sigma^{2}. This is the result that we use in finance to annualize the variance computed using monthly or daily stock returns.
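As a numerical illustration of this annualization rule (the 5% monthly volatility below is an assumed figure, not from the notes):

```python
import math

# Suppose monthly returns are independent with the same variance and a
# standard deviation of 5% per month.
sigma_monthly = 0.05
var_annual = 12 * sigma_monthly**2      # n * sigma^2 with n = 12 months
sigma_annual = math.sqrt(var_annual)    # equivalently sigma * sqrt(12)

print(var_annual)    # 0.03
print(sigma_annual)  # ~0.173, i.e. about 17.3% per year
```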