Probability Basics
This chapter introduces basic probability concepts that are crucial to understanding modern investment theory. Investment theory aims to understand how to allocate resources to different assets whose future payoffs are uncertain. To model the future uncertainty of prices and cash flows, we can rely on well-established mathematical concepts that summarize the expected rewards and risks of investing in a portfolio of financial assets.
In what follows, we assume there is only a finite number of future possibilities in order to simplify the mathematical exposition. For example, the future price of a stock can go up, stay constant, or go down. However, all the results presented in this chapter hold if we relax this assumption and allow for an infinite number of future outcomes.
Sets
A set is a collection of objects. The objects of a set can be anything you want. For example, a set may contain numbers, letters, cars, or pictures. In our case, we will be concerned with sets that contain future possibilities or outcomes that can occur.
One way to define a set is to enumerate its elements. For example, the set of all integers from 1 to 10 is \[ A = \{1, 2, 3, 4, 5, 6, 7, 8, 9, 10\}. \] Once we have defined a set, we can answer if an object is an element of the set or not. For example, the number 3 is an element of \(A\) whereas the number 20 is not. We use the symbol \(\in\) to denote membership of a set and \(\notin\) to denote the contrary. Therefore, we have that \(3 \in A\) and \(20 \notin A.\)
Some sets can have an infinite number of elements. For example, the natural numbers are defined as \[ \mathbb{N} = \{0, 1, 2, 3, \ldots\}, \] where the triple dots mean that if \(n\) is in \(\mathbb{N},\) then \(n+1\) is also in \(\mathbb{N}.\)
Since all elements of \(A\) are also members of \(\mathbb{N},\) we say that \(A\) is a subset of \(\mathbb{N}\) and write it as \(A \subset \mathbb{N}.\) Using this terminology, we can redefine the set \(A\) defined above in a more Pythonic way: \[ A = \{ n \in \mathbb{N} : 0 < n < 11 \}. \] If we are studying sets of natural numbers, it makes sense to define the universe to be \(\mathbb{N},\) and the sets under study will be subsets of the universe.
Now, define the set \(B\) as \[ B = \{6, 7, 8, 9, 10, 11, 12, 13, 14, 15\}. \]
The intersection between \(A\) and \(B\) is the set denoted \(A \cap B\) whose members are both in \(A\) and \(B.\) Using the sets defined above, we have that \[ A \cap B = \{6, 7, 8, 9, 10\}. \] The union of the sets \(A\) and \(B\) is the set denoted \(A \cup B\) whose members are either in \(A,\) in \(B,\) or in both. Thus, using our previously defined sets we have that \[ A \cup B = \{1, 2, 3, \ldots , 14, 15\}. \] The set difference of \(A\) and \(B\) is the set denoted \(A \setminus B\) whose members are in \(A\) but are not members of \(B.\) Thus, \[ A \setminus B = \{1, 2, 3, 4, 5\} \] and \[ B \setminus A = \{11, 12, 13, 14, 15\}. \]

The complement of \(A\) is the set denoted by \(\overline{A}\) whose members are not in \(A.\) Of course, this statement only makes sense if we define a universe where the elements not in \(A\) can live. If the universe is \(\mathbb{N},\) then \[ \overline{A} = \mathbb{N} \setminus A = \{0\} \cup \{11, 12, 13, \ldots\}. \] Similarly, \[ \overline{B} = \{0, 1, 2, 3, 4, 5\} \cup \{16, 17, 18, \ldots\}. \]

Note that if you take all the elements of \(A\) out of \(A\) you end up with an empty set, that is, \(A \setminus A = \{\}.\) We typically denote the empty set by \(\emptyset,\) but it is good to keep in mind that \(\emptyset = \{\}.\) In our universe of natural numbers, no natural number is a member of the empty set. We can write this formally as \(n \notin \emptyset,\) \(\forall n \in \mathbb{N}.\) Since the empty set has no elements, it is vacuously a subset of any subset of \(\mathbb{N}.\)
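To make these operations concrete, here is a minimal Python sketch (the variable names mirror the sets above; since Python sets must be finite, the universe is truncated to the natural numbers up to 20):

```python
# Sets A and B from the text; U is a finite stand-in for the natural numbers.
U = set(range(21))                # {0, 1, ..., 20}
A = {n for n in U if 0 < n < 11}  # {1, ..., 10}, the "Pythonic" definition
B = set(range(6, 16))             # {6, 7, ..., 15}

print(3 in A, 20 in A)   # True False  (membership and non-membership)
print(A & B)             # intersection: {6, 7, 8, 9, 10}
print(A | B)             # union: {1, 2, ..., 15}
print(A - B)             # set difference: {1, 2, 3, 4, 5}
print(U - A)             # complement of A within the finite universe U
print(A - A == set())    # True: removing every element of A leaves the empty set
```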
The cardinality of the set \(A,\) denoted by \(|A|,\) counts the number of elements in \(A.\) We then have that \(|A| = |B| = 10,\) and for the set \(C = \{1, 2, 3\}\) used in the next example, \(|C| = 3.\) The empty set has cardinality 0, whereas the cardinality of \(\mathbb{N}\) is denoted \(\aleph_{0}.\)
The power set of a set \(C,\) denoted by \(\mathcal{P}(C),\) is the set containing all possible subsets of \(C.\) For example, if \(C = \{1, 2, 3\},\) then \[ \mathcal{P}(C) = \{\{\}, \{1\}, \{2\}, \{3\}, \{1, 2\}, \{2, 3\}, \{1, 3\}, \{1, 2, 3\}\}. \] Clearly, the power sets of \(A\) and \(B\) are much bigger. For a given set \(A,\) the cardinality of its power set is \(2^{|A|}.\) Therefore, \(\mathcal{P}(A)\) and \(\mathcal{P}(B)\) contain each \(2^{10} = 1024\) different sets.
Finally, the cartesian product of \(A\) and \(B\) is the set denoted by \(A \times B\) whose members are all the pairwise combinations of the elements of \(A\) and \(B.\) \[ \begin{array}{c|cccc} A \times B & 6 & 7 & \dots & 15 \\ \hline 1 & (1, 6) & (1, 7) & \dots & (1, 15) \\ 2 & (2, 6) & (2, 7) & \dots & (2, 15) \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 10 & (10, 6) & (10, 7) & \dots & (10, 15) \end{array} \]
The cardinality of \(A \times B\) is equal to the product of the cardinalities of \(A\) and \(B,\) i.e., \(|A \times B| = |A| \times |B|.\)
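The power set and the Cartesian product are easy to build with Python's standard itertools module; the sketch below uses a small helper, `power_set`, which is ours and not a built-in function:

```python
from itertools import chain, combinations, product

def power_set(s):
    """Return all subsets of s, each represented as a tuple."""
    items = list(s)
    return list(chain.from_iterable(combinations(items, r) for r in range(len(items) + 1)))

C = {1, 2, 3}
A = set(range(1, 11))
B = set(range(6, 16))

print(len(power_set(C)))                # 8 = 2 ** |C|
print(len(power_set(A)))                # 1024 = 2 ** 10
A_times_B = list(product(A, B))         # Cartesian product A x B as a list of pairs
print(len(A_times_B), len(A) * len(B))  # 100 100
```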
Outcomes and Events
In probability theory, a finite sample space is a non-empty finite set denoted by \(\Omega.\) The sample space includes all possible outcomes that can occur. A probability measure is a function that assigns to each element \(\omega\) of \(\Omega\) a number in \([0, 1]\) so that \[ \sum_{\omega \in \Omega} \prob(\omega) = 1. \] An event \(A\) is a subset of \(\Omega,\) and we define the probability of that event occurring as \[ \prob(A) = \sum_{\omega \in A} \prob(\omega). \tag{1}\] Such a finite probability space is denoted by \((\Omega, \prob).\)
An immediate consequence of Eq. 1 is that \(\prob(\Omega) = 1.\) Furthermore, if \(A\) and \(B\) are disjoint subsets of \(\Omega,\) we have that \[ \begin{aligned} \prob(A \cup B) & = \sum_{\omega \in A \cup B} \prob(\omega) \\ & = \sum_{\omega \in A} \prob(\omega) + \sum_{\omega \in B} \prob(\omega) \\ & = \prob(A) + \prob(B). \end{aligned} \] If we denote by \(\overline{A}\) the complement of \(A\) in \(\Omega,\) the last expression implies that \(\prob(A) + \prob(\overline{A}) = 1.\) Because \(\overline{\Omega} = \emptyset,\) we also have that \(\prob(\Omega) + \prob(\emptyset) = 1,\) or \(\prob(\emptyset) = 0.\)
Example 1 If \(\Omega = \{ \omega_{1}, \omega_{2}, \omega_{3} \},\) then \[ \begin{aligned} \mathcal{P}(\Omega) & = \{ \emptyset, \{\omega_{1}\}, \{\omega_{2}\}, \{\omega_{3}\}, \{\omega_{1}, \omega_{2}\}, \{\omega_{2}, \omega_{3}\}, \{\omega_{1}, \omega_{3}\}, \{\omega_{1}, \omega_{2}, \omega_{3}\}\} \end{aligned} \] defines the collection of all possible events that we can measure. As we saw previously, the cardinality of \(\mathcal{P}(\Omega)\) grows exponentially with the size of \(\Omega.\)
The function \(\prob\) such that \(\prob(\omega_{1}) = 1/2,\) \(\prob(\omega_{2}) = 1/4,\) and \(\prob(\omega_{3}) = 1/4\) defines a probability measure on \(\Omega.\)
We have, for example, that \[ \prob(\{\omega_{1}, \omega_{3}\}) = 1/2 + 1/4 = 3/4. \]
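A minimal Python sketch of this finite probability space represents \(\prob\) as a dictionary over outcome labels and an event as a set of outcomes (the names `P` and `prob_event` are ours):

```python
from fractions import Fraction

# Probability measure of Example 1, keyed by outcome label.
P = {"w1": Fraction(1, 2), "w2": Fraction(1, 4), "w3": Fraction(1, 4)}

def prob_event(event, P):
    """Probability of an event (a set of outcomes), following Eq. 1."""
    return sum(P[w] for w in event)

print(sum(P.values()))              # 1, so P is a valid probability measure
print(prob_event({"w1", "w3"}, P))  # 3/4
```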
Random Variables
Definition
If \((\Omega, \prob)\) is a finite probability space, a random variable is a real-valued function defined on \(\Omega.\)
Example 2 Consider a sample space with four possible outcomes \(\Omega = \{ \omega_{1}, \omega_{2}, \omega_{3}, \omega_{4} \}.\) The table below describes the possible values of three random variables denoted by \(X,\) \(Y\) and \(Z.\)
Outcome | \(X\) | \(Y\) | \(Z\) |
---|---|---|---|
\(\omega_{1}\) | -10 | 20 | 15 |
\(\omega_{2}\) | -5 | 10 | -10 |
\(\omega_{3}\) | 5 | 0 | 15 |
\(\omega_{4}\) | 10 | 0 | -10 |
Observing the value of \(X\) provides perfect information about which outcome occurred. For example, if \(X = 5\) then we know that \(\omega_{3}\) occurred.
Knowing the values of \(Y\) or \(Z,\) on the other hand, does not provide the same amount of information. If we learn that \(Y = 0\) we only know that either \(\omega_{3}\) or \(\omega_{4}\) occurred. If we denote by \(\mathcal{F}_{Y}\) the set of events that can be generated by \(Y,\) we have that \[ \mathcal{F}_{Y} = \{ \emptyset, \{\omega_{1}\}, \{\omega_{2}\}, \{\omega_{1}, \omega_{2}\}, \{\omega_{3}, \omega_{4}\}, \{\omega_{1}, \omega_{3}, \omega_{4}\}, \{\omega_{2}, \omega_{3}, \omega_{4}\}, \Omega\}. \] The information set provided by \(Z\) is even smaller, since \[ \mathcal{F}_{Z} = \{ \emptyset, \{\omega_{1}, \omega_{3}\}, \{\omega_{2}, \omega_{4}\}, \Omega\}. \]
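A short sketch can recover these information sets by representing each random variable of Example 2 as a dictionary from outcomes to values and grouping outcomes by value (the helper `preimages` is ours):

```python
from collections import defaultdict

X = {"w1": -10, "w2": -5, "w3": 5, "w4": 10}
Y = {"w1": 20, "w2": 10, "w3": 0, "w4": 0}
Z = {"w1": 15, "w2": -10, "w3": 15, "w4": -10}

def preimages(rv):
    """Group the outcomes by the value the random variable assigns to them."""
    groups = defaultdict(set)
    for outcome, value in rv.items():
        groups[value].add(outcome)
    return dict(groups)

print(preimages(X))  # every value of X pins down a single outcome
print(preimages(Y))  # Y = 0 only tells us that w3 or w4 occurred
print(preimages(Z))  # Z only distinguishes {w1, w3} from {w2, w4}
```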
Expectation and Variance
If \(X\) is a random variable defined on a finite probability space \((\Omega, \prob),\) the expectation (or expected value) of \(X\) is defined to be \[ \ev X = \sum_{\omega \in \Omega} X(\omega) \prob(\omega), \] whereas the variance of \(X\) is \[ \var(X) = \ev (X - \ev X)^{2}. \] The standard deviation is the square-root of the variance, i.e., \(\sigma_{X} = \sqrt{\var(X)}.\)
Example 3 Consider the sample space \(\Omega = \{ \omega_{1}, \omega_{2}, \omega_{3} \}\) in which we define the probability measure \(\prob\) such that \(\prob(\omega_{1}) = 1/2,\) \(\prob(\omega_{2}) = 1/4,\) and \(\prob(\omega_{3}) = 1/4.\) Two random variables \(X\) and \(Y\) are defined on \(\Omega\) according to the table below.
Outcome | Probability | \(X\) | \(Y\) |
---|---|---|---|
\(\omega_{1}\) | 1/2 | 10 | 2 |
\(\omega_{2}\) | 1/4 | 8 | 40 |
\(\omega_{3}\) | 1/4 | 4 | 20 |
Using this information, we can compute the expectation of each random variable.
\[ \begin{aligned} \ev X & = \frac{1}{2} \times 10 + \frac{1}{4} \times 8 + \frac{1}{4} \times 4 = 8, \\ \ev Y & = \frac{1}{2} \times 2 + \frac{1}{4} \times 40 + \frac{1}{4} \times 20 = 16. \end{aligned} \] Having computed the expectations of \(X\) and \(Y\), we can compute their variances as \[ \begin{aligned} \var(X) & = \frac{1}{2} \times (10 - 8)^{2} + \frac{1}{4} \times (8 - 8)^{2} + \frac{1}{4} \times (4 - 8)^{2} = 6, \\ \var(Y) & = \frac{1}{2} \times (2 - 16)^{2} + \frac{1}{4} \times (40 - 16)^2 + \frac{1}{4} \times (20 - 16)^2 = 246. \end{aligned} \] Finally, the standard deviations of \(X\) and \(Y\) are \(\sigma_{X} = \sqrt{6} \approx 2.45\) and \(\sigma_{Y} = \sqrt{246} \approx 15.68,\) respectively.
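These numbers are easy to verify with a short Python sketch based on the table of Example 3 (the helper functions are ours):

```python
import math

P = {"w1": 0.5, "w2": 0.25, "w3": 0.25}
X = {"w1": 10, "w2": 8, "w3": 4}
Y = {"w1": 2, "w2": 40, "w3": 20}

def expectation(rv, P):
    """E(X) as the probability-weighted sum of the values of the random variable."""
    return sum(rv[w] * P[w] for w in P)

def variance(rv, P):
    """Var(X) = E[(X - E X)^2]."""
    mu = expectation(rv, P)
    return sum((rv[w] - mu) ** 2 * P[w] for w in P)

print(expectation(X, P), expectation(Y, P))  # 8.0 16.0
print(variance(X, P), variance(Y, P))        # 6.0 246.0
print(math.sqrt(variance(X, P)), math.sqrt(variance(Y, P)))  # approx. 2.45 and 15.68
```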
Covariance
The covariance between two random variables \(X\) and \(Y\) defined on a probability space \((\Omega, \prob)\) is defined as \[ \cov(X, Y) = \ev (X - \ev X) (Y - \ev Y), \] and their correlation is \[ \rho_{X, Y} = \frac{\cov(X, Y)}{\sigma_{X} \sigma_{Y}}. \] The correlation between any two random variables is always between -1 and 1.
Proof
Let \(\sigma_{X}\) and \(\sigma_{Y}\) denote the standard deviations of \(X\) and \(Y,\) respectively, and assume both are strictly positive. We can then compute \[ \begin{aligned} \ev ((X - \ev X) \sigma_{Y} + (Y - \ev Y) \sigma_{X})^{2} & = (\sigma_{X}^{2} \sigma_{Y}^{2} + 2 \sigma_{X} \sigma_{Y} \cov(X, Y) + \sigma_{Y}^{2} \sigma_{X}^{2}) \\ & = 2 \sigma_{X} \sigma_{Y} (\sigma_{X} \sigma_{Y} + \cov(X, Y)). \end{aligned} \] Since the left-hand side is the expectation of a squared quantity, it is non-negative, which implies \(\sigma_{X} \sigma_{Y} + \cov(X, Y) \geq 0,\) or \(- \sigma_{X} \sigma_{Y} \leq \cov(X, Y).\)
Similarly, \[ \begin{aligned} \ev ((X - \ev X) \sigma_{Y} - (Y - \ev Y) \sigma_{X})^{2} & = (\sigma_{X}^{2} \sigma_{Y}^{2} - 2 \sigma_{X} \sigma_{Y} \cov(X, Y) + \sigma_{Y}^{2} \sigma_{X}^{2}) \\ & = 2 \sigma_{X} \sigma_{Y} (\sigma_{X} \sigma_{Y} - \cov(X, Y)), \end{aligned} \] which implies \(\sigma_{X} \sigma_{Y} - \cov(X, Y) \geq 0\) or \(\cov(X, Y) \leq \sigma_{X} \sigma_{Y}.\)
Thus, we conclude that \[ -1 \leq \frac{\cov(X, Y)}{\sigma_{X} \sigma_{Y}} \leq 1. \]

Example 4 Continuing with Example 3, we have that \[ \cov(X, Y) = \frac{1}{2} \times (10 - 8)(2 - 16) + \frac{1}{4} \times (8 - 8)(40 - 16) + \frac{1}{4} \times (4 - 8)(20 - 16) = -18. \] Thus, \(\rho_{X, Y} \approx -0.47.\)
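Continuing the sketch from Example 3 (and reusing the `expectation` and `variance` helpers defined there), the covariance and correlation can be checked numerically:

```python
def covariance(rv1, rv2, P):
    """Cov(X, Y) = E[(X - E X)(Y - E Y)]."""
    mu1, mu2 = expectation(rv1, P), expectation(rv2, P)
    return sum((rv1[w] - mu1) * (rv2[w] - mu2) * P[w] for w in P)

cov_xy = covariance(X, Y, P)
rho_xy = cov_xy / math.sqrt(variance(X, P) * variance(Y, P))
print(cov_xy, round(rho_xy, 2))  # -18.0 -0.47
```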
The covariance of \(X\) and \(Y\) can also be expressed as \[ \cov(X, Y) = \ev(X Y) - \ev(X) \ev(Y). \]
Proof
\[ \begin{aligned} \cov(X, Y) & = \ev(X - \ev(X))(Y - \ev(Y)) \\ & = \ev[X (Y - \ev(Y))] - \ev[\ev(X) (Y - \ev(Y))] \\ & = \ev(XY) - \ev[X \ev(Y)] - \ev(X) \ev(Y - \ev(Y)) \\ & = \ev(XY) - \ev(X) \ev(Y). \end{aligned} \]

Probability Mass Function
For discrete random variables, the probability mass function (or pmf) is a real-valued function that specifies the probability that the random variable \(X\) is equal to a certain value \(x,\) i.e., \[ p_{X}(x) = \prob(\omega \in \Omega : X(\omega) = x). \]
Example 5 Suppose we define a probability measure \(\prob\) on the sample space of Example 2 so that the random variables \(X\) and \(Y\) are distributed according to the table below.
Outcome | \(\prob\) | \(X\) | \(Y\) |
---|---|---|---|
\(\omega_{1}\) | 0.10 | -10 | 20 |
\(\omega_{2}\) | 0.30 | -5 | 10 |
\(\omega_{3}\) | 0.40 | 5 | 0 |
\(\omega_{4}\) | 0.20 | 10 | 0 |
We have that the probability mass function of \(X\) is \[ p_{X}(x) = \begin{cases} 0.10 & \text{if } x = -10, \\ 0.30 & \text{if } x = -5, \\ 0.40 & \text{if } x = 5, \\ 0.20 & \text{if } x = 10. \end{cases} \]
The probability mass function of \(Y\) assigns positive probability to only three different values: \[ p_{Y}(y) = \begin{cases} 0.60 & \text{if } y = 0, \\ 0.30 & \text{if } y = 10, \\ 0.10 & \text{if } y = 20. \end{cases} \]
It is sometimes easier to visualize the probability mass function by plotting the probability of different values of the random variable.
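For instance, a minimal matplotlib sketch of the pmf of \(X\) from Example 5 (the plotting choices are ours, and an analogous plot can be drawn for \(Y\)):

```python
import matplotlib.pyplot as plt

# Possible values of X and their probabilities from Example 5.
x_values = [-10, -5, 5, 10]
x_probs = [0.10, 0.30, 0.40, 0.20]

plt.stem(x_values, x_probs)  # one vertical line per possible value of X
plt.xlabel("x")
plt.ylabel("$p_X(x)$")
plt.title("Probability mass function of X")
plt.show()
```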
It is apparent from the plots that \(p_{X}(x) = 0\) if \(x \notin \{-10, -5, 5, 10\}.\) Indeed, the set \(\{\omega \in \Omega : X(\omega) = x \}\) is empty for all \(x\) not equal to \(-10,\) \(-5,\) \(5,\) or \(10.\) Similarly, \(p_{Y}(y) = 0\) if \(y \notin \{0, 10, 20\}.\)
To simplify notation, we will often write \(\{X = x\}\) to denote the set \(\{\omega \in \Omega : X(\omega) = x\}.\) Using this notation, we have that \(p_{X}(x) = \prob(X = x).\)
If a random variable \(X\) takes \(m\) different values \(x_{1}, x_{2}, \ldots, x_{m},\) we can rewrite its expectation as \[ \ev(X) = \sum_{i = 1}^{m} x_{i} p_X(x_{i}), \tag{2}\] which is the form commonly used in statistics.
For two random variables \(X\) and \(Y\) defined on \((\Omega, \prob),\) the set \(\{X = x, Y = y\}\) denotes all outcomes in \(\Omega\) that satisfy \(\{X = x\}\) and \(\{Y = y\}.\) Therefore, we have that \[ \{X = x, Y = y\} = \{X = x\} \cap \{Y = y\}. \] The function \[ p_{X, Y}(x, y) = \prob(X = x, Y = y) \] is called the joint probability mass function of \(X\) and \(Y.\)
Example 6 The joint pmf of the random variables defined in Example 5 is given in the table below.
\[ \begin{array}{c|cccc} X \setminus Y & 0 & 10 & 20 \\ \hline -10 & 0 & 0 & 0.1 \\ -5 & 0 & 0.3 & 0 \\ 5 & 0.4 & 0 & 0 \\ 10 & 0.2 & 0 & 0 \end{array} \] The function \(p_{X, Y}(x, y)\) has many zeros since in Example 5 there are only four outcomes. Any other combination of values has probability zero of occurring.
Example 7 We can generate any joint pmf for two random variables as long as the sum of all probabilities is equal to one. The table below reports the joint probabilities of a random variable \(X\) taking values in \(\{-1, 0, 1\}\) and a random variable \(Y\) taking values in \(\{0, 1, 2, 3\}.\) \[ \begin{array}{c|cccc} X \setminus Y & 0 & 1 & 2 & 3 \\ \hline -1 & 0.12500 & 0.09375 & 0.06250 & 0.03125 \\ 0 & 0.06250 & 0.12500 & 0.12500 & 0.06250 \\ 1 & 0.03125 & 0.06250 & 0.09375 & 0.12500 \end{array} \]
In this case the underlying probability space has at least \(3 \times 4 = 12\) possible outcomes. The figure below plots the joint pmf of \(X\) and \(Y.\)
To plot the joint pmf of two random variables we need a three-dimensional graph.
We can use the joint pmf to compute the expectation of a function of two random variables. Indeed, we have that \[ \ev(f(X, Y)) = \sum_{i = 1}^{m} \sum_{j = 1}^{n} f(x_{i}, y_{j}) p_{X, Y}(x_{i}, y_{j}). \tag{3}\] If we write \(\mu_{X} = \ev(X)\) and \(\mu_{Y} = \ev(Y),\) Eq. 3 implies that the covariance of \(X\) and \(Y\) can be computed as \[ \cov(X, Y) = \sum_{i = 1}^{m} \sum_{j = 1}^{n} (x_{i} - \mu_{X})(y_{j} - \mu_{Y}) p_{X, Y}(x_{i}, y_{j}). \] The joint pmf contains all the information about \(X\) and \(Y\) since we can recover the individual pmfs of \(X\) and \(Y\) from it. Indeed, we have that \[ p_{X}(x) = \sum_{j = 1}^{n} p_{X, Y}(x, y_{j}) \] and \[ p_{Y}(y) = \sum_{i = 1}^{m} p_{X, Y}(x_{i}, y). \] It is important to note that the joint pmf not only contains the individual information of the two random variables but also captures their mutual dependence.
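A numpy sketch based on the joint pmf of Example 7 shows how the marginal pmfs, the expectations, and the covariance all follow from the joint table (the array layout mirrors the table above; the variable names are ours):

```python
import numpy as np

x_vals = np.array([-1, 0, 1])
y_vals = np.array([0, 1, 2, 3])

# Joint pmf from Example 7: rows index the values of X, columns the values of Y.
p_xy = np.array([
    [0.12500, 0.09375, 0.06250, 0.03125],
    [0.06250, 0.12500, 0.12500, 0.06250],
    [0.03125, 0.06250, 0.09375, 0.12500],
])

p_x = p_xy.sum(axis=1)  # marginal pmf of X: sum over the values of Y
p_y = p_xy.sum(axis=0)  # marginal pmf of Y: sum over the values of X

mu_x = x_vals @ p_x     # E(X), as in Eq. 2
mu_y = y_vals @ p_y     # E(Y)

# Covariance computed directly from the joint pmf.
cov_xy = ((x_vals[:, None] - mu_x) * (y_vals[None, :] - mu_y) * p_xy).sum()

print(p_x, p_y)
print(mu_x, mu_y, cov_xy)
```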
Independence
We say that two events \(A\) and \(B\) are independent if \(\prob(A \cap B) = \prob(A) \prob(B).\)
Example 8 Suppose that the weather tomorrow can be either sunny, fair or rainy. In addition, a certain stock tomorrow can either go up or down in price.
We can define \[ W = \{\text{sunny}, \text{fair}, \text{rainy}\} \] and \[ S = \{\text{up}, \text{down}\}. \] The set of outcomes can be described as all possible pairwise combinations of weather tomorrow and the stock price movement. The sample space \(\Omega\) is then the cartesian product of \(W\) and \(S,\) i.e., \(\Omega = W \times S.\)
We can then define the weather events \[ \begin{aligned} \text{Sunny} & = \{(\text{sunny}, \text{up}), (\text{sunny}, \text{down})\}, \\ \text{Fair} & = \{(\text{fair}, \text{up}), (\text{fair}, \text{down})\}, \\ \text{Rainy} & = \{(\text{rainy}, \text{up}), (\text{rainy}, \text{down})\}. \end{aligned} \]
The table below describes the probabilities for tomorrow’s weather.
Weather | Sunny | Fair | Rainy |
---|---|---|---|
Probability | 0.3 | 0.5 | 0.2 |
Similarly, the stock events can be defined as \[ \begin{aligned} \text{Up} & = \{(\text{sunny}, \text{up}), (\text{fair}, \text{up}), (\text{rainy}, \text{up})\}, \\ \text{Down} & = \{(\text{sunny}, \text{down}), (\text{fair}, \text{down}), (\text{rainy}, \text{down})\}. \\ \end{aligned} \]
The probabilities of the stock price going up or down are described in the table below.
Stock | Up | Down |
---|---|---|
Probability | 0.6 | 0.4 |
If the weather does not affect the likelihood of the stock going up or down, we should expect the stock to go up on 60% of sunny days and to go down on the remaining 40% of those days.
That is, if the weather tomorrow and the stock price movement are independent events, we should expect \[ \prob(\text{Stock} \cap \text{Weather}) = \prob(\text{Stock}) \prob(\text{Weather}), \] where \(\text{Stock}\) is either \(\text{Up}\) or \(\text{Down},\) and \(\text{Weather}\) is either \(\text{Sunny},\) \(\text{Fair},\) or \(\text{Rainy}.\)
The table below describes the joint probabilities of the stock price movement and the weather tomorrow that are consistent with the independence of those events.
Stock\Weather | Sunny | Fair | Rainy |
---|---|---|---|
Up | 0.18 | 0.30 | 0.12 |
Down | 0.12 | 0.20 | 0.08 |
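A short numpy sketch reproduces this table as the outer product of the two marginal probability vectors (the labels and variable names are ours):

```python
import numpy as np

weather = ["Sunny", "Fair", "Rainy"]
p_weather = np.array([0.3, 0.5, 0.2])

stock = ["Up", "Down"]
p_stock = np.array([0.6, 0.4])

# Under independence, each joint probability is the product of the marginals.
joint = np.outer(p_stock, p_weather)

for i, s in enumerate(stock):
    for j, w in enumerate(weather):
        print(f"P({s}, {w}) = {joint[i, j]:.2f}")
```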
The previous example shows how to generate independent events out of two finite probability spaces \((\Omega_{1}, \prob_{1})\) and \((\Omega_{2}, \prob_{2}).\) If we define \(\Omega = \Omega_{1} \times \Omega_{2}\) and let \(\prob(\omega_{1}, \omega_{2}) = \prob_{1}(\omega_{1}) \prob_{2}(\omega_{2})\) for each \(\omega_{1} \in \Omega_{1}\) and \(\omega_{2} \in \Omega_{2},\) the pair \((\Omega, \prob)\) is a well-defined finite probability space. In this new probability space, the events \(A = \{\omega_{1}\} \times \Omega_{2}\) and \(B = \Omega_{1} \times \{\omega_{2}\}\) are independent for any \(\omega_{1} \in \Omega_{1}\) and \(\omega_{2} \in \Omega_{2}.\)
Proof
We have that \[ \begin{aligned} \prob(A) & = \sum_{\omega_{2} \in \Omega_{2}} \prob(\omega_{1}, \omega_{2}) \\ & = \sum_{\omega_{2} \in \Omega_{2}} \prob_{1}(\omega_{1}) \prob_{2}(\omega_{2}) \\ & = \prob_{1}(\omega_{1}) \sum_{\omega_{2} \in \Omega_{2}} \prob_{2}(\omega_{2}) \\ & = \prob_{1}(\omega_{1}). \end{aligned} \] Similarly, \(\prob(B) = \prob_{2}(\omega_{2}).\) Since \(A \cap B = \{(\omega_{1}, \omega_{2})\},\) we have that \(\prob(A \cap B) = \prob_{1}(\omega_{1}) \prob_{2}(\omega_{2}) = \prob(A) \prob(B).\)

Example 9 The sample space \(\Omega\) is always independent of any event \(A \subset \Omega\) since \(\prob(A \cap \Omega) = \prob(A) = \prob(A) \prob(\Omega).\) Intuitively, some outcome in \(\Omega\) always occurs, regardless of whether \(A\) occurs or not.
Two random variables \(X\) and \(Y\) are independent if the events \(\{X = x\}\) and \(\{Y = y\}\) are independent for all values \(x\) and \(y.\) Thus, if \(X\) and \(Y\) are independent we have that \[ \prob(X = x, Y = y) = \prob(X = x) \prob(Y = y), \] or equivalently \[ p_{X, Y}(x, y) = p_{X}(x) p_{Y}(y). \]
An important consequence of independence is that if \(X\) and \(Y\) are two independent random variables, then \[ \ev(XY) = \ev(X) \ev(Y). \tag{4}\]
Proof
If the domains of \(X\) and \(Y\) are \(\{x_{1}, x_{2}, \ldots, x_{m}\}\) and \(\{y_{1}, y_{2}, \ldots, y_{n}\},\) respectively, we can then write \[ \begin{aligned} \ev(X) & = \sum_{i = 1}^{m} x_{i} p_{X}(x_{i}), \\ \ev(Y) & = \sum_{j = 1}^{n} y_{j} p_{Y}(y_{j}). \end{aligned} \] Thus, \[ \begin{aligned} \ev(X) \ev(Y) & = \left( \sum_{i = 1}^{m} x_{i} p_{X}(x_{i}) \right) \left( \sum_{j = 1}^{n} y_{j} p_{Y}(y_{j}) \right) \\ & = \sum_{i = 1}^{m} \sum_{j = 1}^{n} x_{i} y_{j} p_{X}(x_{i}) p_{Y}(y_{j}) \\ & = \sum_{i = 1}^{m} \sum_{j = 1}^{n} x_{i} y_{j} p_{X, Y}(x_{i}, y_{j}) \\ & = \ev(X Y), \end{aligned} \] where the substitution \(p_{X}(x_{i}) p_{Y}(y_{j}) = p_{X, Y}(x_{i}, y_{j})\) uses the independence of \(X\) and \(Y.\)

Eq. 4 implies that if \(X\) and \(Y\) are independent, their covariance is equal to zero.
Proof
\[ \begin{aligned} \cov(X, Y) & = \ev(XY) - \ev(X) \ev(Y) \\ & = \ev(X) \ev(Y) - \ev(X) \ev(Y) \\ & = 0. \end{aligned} \]

The converse, however, is not true: zero covariance does not imply independence.
Example 10 Consider two random variables \(X\) and \(Y\) defined in the table below.
Outcome | \(\prob\) | \(X\) | \(Y\) |
---|---|---|---|
\(\omega_{1}\) | 0.40 | -1 | 0 |
\(\omega_{2}\) | 0.30 | 1 | 1 |
\(\omega_{3}\) | 0.30 | 1 | -1 |
We have that \[ \begin{aligned} \ev(X) & = 0.4 \times (-1) + 0.6 \times 1 = 0.2, \\ \ev(Y) & = 0.4 \times 0 + 0.3 \times 1 + 0.3 \times (-1) = 0, \\ \ev(XY) & = 0.4 \times (-1) \times 0 + 0.3 \times 1 \times 1 + 0.3 \times 1 \times (-1) = 0. \end{aligned} \] Therefore, \(\cov(X, Y) = 0 - 0.2 \times 0 = 0,\) which shows that \(X\) and \(Y\) are uncorrelated.
However, the two random variables are not independent. If we know that \(X = -1\) then we know that \(Y = 0.\) Similarly, learning that \(Y = 1\) tells us that \(X = 1.\)
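The sketch below checks both claims for Example 10: the covariance is zero, yet the joint probabilities do not factor into the product of the marginals (the helper and variable names are ours):

```python
P = {"w1": 0.4, "w2": 0.3, "w3": 0.3}
X = {"w1": -1, "w2": 1, "w3": 1}
Y = {"w1": 0, "w2": 1, "w3": -1}

def expectation(rv, P):
    return sum(rv[w] * P[w] for w in P)

XY = {w: X[w] * Y[w] for w in P}  # the random variable X * Y, outcome by outcome
cov_xy = expectation(XY, P) - expectation(X, P) * expectation(Y, P)
print(cov_xy)  # 0.0: X and Y are uncorrelated

# Independence would require P(X = -1, Y = 0) == P(X = -1) * P(Y = 0).
p_joint = sum(P[w] for w in P if X[w] == -1 and Y[w] == 0)  # 0.4
p_x = sum(P[w] for w in P if X[w] == -1)                    # 0.4
p_y = sum(P[w] for w in P if Y[w] == 0)                     # 0.4
print(p_joint, p_x * p_y)  # 0.4 vs. 0.16, so X and Y are not independent
```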
Linear Combinations
In investment theory, we usually study linear combinations of random variables of the form \(Z = \alpha X + \beta Y.\) The expectation of \(Z\) is just a linear combination of the expectations of \(X\) and \(Y,\) \[ \ev Z = \alpha \ev X + \beta \ev Y. \tag{5}\] The variance of \(Z,\) though, includes not only the variances of \(X\) and \(Y\) but also their covariance, \[ \var(Z) = \alpha^{2} \var(X) + \beta^{2} \var(Y) + 2 \alpha \beta \cov(X, Y). \tag{6}\] This is an important result which lies at the heart of portfolio diversification.
Proof
The expectation of \(Z\) is computed as \[ \begin{aligned} \ev Z & = \ev (\alpha X + \beta Y) \\ & = \sum_{\omega \in \Omega} (\alpha X(\omega) + \beta Y(\omega)) \prob(\omega) \\ & = \alpha \sum_{\omega \in \Omega} X(\omega) \prob(\omega) + \beta \sum_{\omega \in \Omega} Y(\omega) \prob(\omega) \\ & = \alpha \ev X + \beta \ev Y. \end{aligned} \]
The variance of \(Z\) is computed as \[ \begin{aligned} \var(Z) & = \var(\alpha X + \beta Y) \\ & = \ev (\alpha X + \beta Y - \ev (\alpha X + \beta Y))^{2} \\ & = \ev (\alpha (X - \ev X) + \beta (Y - \ev Y))^{2} \\ & = \ev (\alpha^{2} (X - \ev X)^{2} + \beta^{2} (Y - \ev Y)^{2} + 2 \alpha \beta (X - \ev X)(Y - \ev Y)) \\ & = \alpha^{2} \ev (X - \ev X)^{2} + \beta^{2} \ev (Y - \ev Y)^{2} + 2 \alpha \beta \ev (X - \ev X)(Y - \ev Y) \\ & = \alpha^{2} \var(X) + \beta^{2} \var(Y) + 2 \alpha \beta \cov(X, Y). \end{aligned} \]

More generally, consider the random variables \(X_{1}, X_{2}, \ldots, X_{n},\) and form a new random variable \(X\) such that \[ X = \alpha_{1} X_{1} + \alpha_{2} X_{2} + \ldots + \alpha_{n} X_{n}, \] where \(\alpha_{i} \in \mathbb{R}\) for all \(i \in \{1, 2, \ldots, n\}.\)
The expectation of \(X\) is a linear combination of the expectations of \(X_{1}, X_{2}, \ldots, X_{n}.\) The variance of \(X,\) though, takes into account all covariances between \(X_{i}\) and \(X_{j},\) for \(i, j = 1, 2, \ldots, n.\) Indeed, we have that \[ \var(X) = \sum_{i = 1}^{n} \sum_{j = 1}^{n} \alpha_{i} \alpha_{j} \cov(X_{i}, X_{j}). \tag{7}\] The previous expression can be simplified if the random variables \(X_{1}, X_{2}, \ldots, X_{n}\) are independent of each other. In that case, we have that \(\cov(X_{i}, X_{j}) = 0\) for all \(i \neq j.\) Recognizing that \(\cov(X_{i}, X_{i}) = \var(X_{i}),\) Eq. 7 implies that \[ \var(X) = \sum_{i = 1}^{n} \alpha_{i}^{2} \var(X_{i}). \tag{8}\]
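In matrix form, Eq. 7 is the quadratic form \(\alpha^{\top} \Sigma \alpha,\) where \(\Sigma\) is the covariance matrix of \(X_{1}, X_{2}, \ldots, X_{n}.\) The numpy sketch below evaluates it for an illustrative covariance matrix and weight vector (neither comes from the text):

```python
import numpy as np

# Illustrative covariance matrix of three random variables (not from the text).
Sigma = np.array([
    [0.04, 0.01, 0.00],
    [0.01, 0.09, 0.02],
    [0.00, 0.02, 0.16],
])
alpha = np.array([0.5, 0.3, 0.2])  # illustrative weights alpha_1, alpha_2, alpha_3

# Variance of X = alpha_1 X_1 + alpha_2 X_2 + alpha_3 X_3, as in Eq. 7.
var_x = alpha @ Sigma @ alpha
print(var_x)
```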
Example 11 Suppose that \(X_{1}, X_{2}, \ldots, X_{n}\) are independent random variables with the same variance denoted by \(\sigma^{2}.\) Define \(X\) to be the sum of these random variables so that \[ X = X_{1} + X_{2} + \ldots + X_{n}. \] Eq. 8 implies that \[ \var(X) = \sum_{i = 1}^{n} \var(X_{i}) = n \sigma^{2}. \]
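A quick simulation sketch is consistent with this result (the distribution, \(n,\) \(\sigma,\) and the sample size below are arbitrary choices of ours):

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma, n_samples = 10, 2.0, 100_000

# n independent random variables with common variance sigma ** 2, sampled many times.
samples = rng.normal(loc=0.0, scale=sigma, size=(n_samples, n))
total = samples.sum(axis=1)  # one draw of X = X_1 + X_2 + ... + X_n per row

print(total.var(), n * sigma ** 2)  # the sample variance should be close to n * sigma ** 2
```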