11. Statistics Preliminaries#

11.1. The Normal Distribution#

Fig. 11.1 The figure shows the density function of a normally distributed random variable with mean \(\mu\) and standard deviation \(\sigma.\)

We say that a real-valued random variable (RV) \(X\) is normally distributed with mean \(\mu\) and standard deviation \(\sigma\) if its probability density function (PDF) is:

\[\begin{equation*} f(x) = \frac{1}{\sqrt{2 \pi \sigma^{2}}} e^{-\frac{(x - \mu)^{2}}{2 \sigma^{2}}} \end{equation*}\]

and we usually write \(X \sim \Normal(\mu, \sigma^{2}).\) The parameters \(\mu\) and \(\sigma\) are related to the first and second moments of \(X.\)

Property 11.1 (Moments of the Normal Distribution)

The parameter \(\mu\) is the mean or expectation of \(X,\) while \(\sigma\) denotes its standard deviation. The variance of \(X\) is given by \(\sigma^{2}.\)

As with any real-valued random variable \(X,\) in order to compute the probability that \(X \leq x\) we need to integrate the density function from \(-\infty\) to \(x \colon\)

\[\begin{equation*} \prob(X \leq x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2 \pi \sigma^{2}}} e^{-\frac{(u - \mu)^{2}}{2 \sigma^{2}}} du. \end{equation*}\]

The function \(F(x) = \prob(X \leq x)\) is called the cumulative distribution function of \(X\). The Leibniz integral rule implies that \(F^{\prime}(x) = f(x).\)

11.1.1. The Standard Normal Distribution#

Fig. 11.2 The blue shaded area represents \(\cdf(z).\)

An important special case of normally distributed random variables arises when \(\mu = 0\) and \(\sigma = 1.\) In this case we say that \(Z \sim \Normal(0, 1)\) has the standard normal distribution. Its cumulative distribution function, usually denoted by the capital Greek letter \(\Phi\) (phi), is defined by the integral:

\[\begin{equation*} \cdf(z) = \prob(Z \leq z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2 \pi}} e^{-\frac{x^{2}}{2}} \, dx. \end{equation*}\]

Since the integral cannot be solved in closed form, the probability must be obtained from a table or a computer. For example, in Python we can compute \(\cdf(-0.4)\) by typing the following:

from scipy.stats import norm
norm.cdf(-0.4)
0.3445782583896758

If you prefer to use Excel, type =NORM.S.DIST(-0.4,TRUE) in a cell, which yields the same answer.

11.1.2. Left-Tail Probability#

Knowing how to compute or approximate \(\cdf(z)\) allows us to compute \(\prob(X \leq x)\) when \(X \sim \Normal(\mu, \sigma^{2})\) since \(Z = \frac{X - \mu}{\sigma} \sim \Normal(0, 1) \colon\)

\[\begin{align*} \prob(X \leq x) & = \prob\left( \frac{X - \mu}{\sigma} \leq \frac{x - \mu}{\sigma} \right) \\ & = \prob\left( Z \leq \frac{x - \mu}{\sigma} \right) \\ & = \cdf\left(\frac{x - \mu}{\sigma}\right) \end{align*}\]

where \(Z = \dfrac{X - \mu}{\sigma} \sim \Normal(0, 1)\) is called a Z-score.

Example 11.1

Suppose that \(X \sim \Normal(\mu, \sigma^{2})\) with \(\mu = 10\) and \(\sigma = 25.\) What is the probability that \(X \leq 0\)?

\[\begin{align*} \prob(X \leq 0) & = \prob\left( Z \leq \tfrac{0 - 10}{25} \right) \\ & = \cdf(-0.40) \\ & = 0.3446. \end{align*}\]
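This calculation can be checked directly in Python. As a convenience, `scipy.stats.norm` accepts the mean and standard deviation through its `loc` and `scale` arguments, so no manual standardization is needed:

```python
from scipy.stats import norm

# P(X <= 0) for X ~ Normal(mu = 10, sigma = 25), without standardizing by hand
p = norm.cdf(0, loc=10, scale=25)
print(p)  # approximately 0.3446
```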

11.1.3. Right-Tail Probability#

Fig. 11.3 The right-tail probability is the probability of the whole distribution, which is one, minus the left-tail probability.

For a random variable \(X,\) the right-tail probability is defined as \(\prob(X > x).\) Since \(\prob(X \leq x) + \prob(X > x) = 1,\) we have that:

\[\begin{equation*} \prob(X > x) = 1 - \prob(X \leq x). \end{equation*}\]

Example 11.2

Suppose that \(X \sim \Normal(\mu, \sigma^{2})\) with \(\mu = 10\) and \(\sigma = 25\). What is the probability that \(X > 12\)?

\[\begin{align*} \prob(X \leq 12) & = \prob\left( Z \leq \tfrac{12 - 10}{25} \right) \\ & = \cdf(0.08) \\ & = 0.5319. \end{align*}\]

Therefore, \(\prob(X > 12) = 1 - 0.5319 = 0.4681.\)
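Right-tail probabilities can also be computed in one step with scipy's survival function `norm.sf`, which returns \(1 - \cdf(z)\) directly:

```python
from scipy.stats import norm

# P(X > 12) for X ~ Normal(mu = 10, sigma = 25)
p = norm.sf(12, loc=10, scale=25)  # survival function: 1 - cdf
print(p)  # approximately 0.4681
```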

11.1.4. Interval Probability#

Fig. 11.4 If you subtract the area to the left of \(x_{1}\) from the area to the left of \(x_{2},\) you obtain the probability that \(x_{1} < X \leq x_{2}.\)

The probability that a random variable \(X\) falls within an interval \((x_{1}, x_{2}]\) is given by \(\prob(x_{1} < X \leq x_{2}) = \prob(X \leq x_{2}) - \prob(X \leq x_{1}).\)

Example 11.3

Suppose that \(X \sim \Normal(\mu, \sigma^{2})\) with \(\mu = 10\) and \(\sigma = 25\). What is the probability that \(2 < X \leq 14\)?

\[\begin{align*} \prob(X \leq 14) & = \prob\left( Z \leq \tfrac{14 - 10}{25} \right) \\ & = \cdf(0.16) \\ & = 0.5636, \\ \prob(X \leq 2) & = \prob\left( Z \leq \tfrac{2 - 10}{25} \right) \\ & = \cdf(-0.32) \\ & = 0.3745. \end{align*}\]

Therefore, \(\prob(2 < X \leq 14) = 0.5636 - 0.3745 = 0.1891\).
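In Python, the interval probability is just the difference of two calls to `norm.cdf`:

```python
from scipy.stats import norm

# P(2 < X <= 14) = P(X <= 14) - P(X <= 2) for X ~ Normal(mu = 10, sigma = 25)
p = norm.cdf(14, loc=10, scale=25) - norm.cdf(2, loc=10, scale=25)
print(p)  # approximately 0.1891
```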

11.1.5. Percentiles#

Fig. 11.5 The right-tail percentile is the value \(z_{\alpha}\) that gives an area to the right equal to \(\alpha\).

For a standard normal variable \(Z\), a right-tail percentile is the value \(z_{\alpha}\) above which we obtain a certain probability \(\alpha.\) Mathematically, this means finding \(z_{\alpha}\) such that:

\[ \prob(Z > z_{\alpha}) = \alpha \Leftrightarrow \prob(Z \leq z_{\alpha}) = 1 - \alpha. \]

This implies that \(\cdf(z_{\alpha}) = 1 - \alpha\), or \(z_{\alpha} = \cdf^{-1}(1 - \alpha)\), where \(\cdf^{-1}(\cdot)\) denotes the inverse function of \(\cdf(\cdot)\). Again, there is no closed-form expression for this function and we need a computer to obtain the values. For example, say that \(\alpha = 0.025\). In Python we could compute \(z_{\alpha} = \cdf^{-1}(0.975)\) by using the function ppf included in scipy.stats.norm as follows:

from scipy.stats import norm
norm.ppf(0.975)
1.959963984540054

In Excel, the function =NORM.S.INV(0.975) provides the same result.

The following table shows common values for \(z_{\alpha}\):

| \(\boldsymbol{\alpha}\) | \(\boldsymbol{z_{\alpha}}\) |
|:-----:|:-----:|
| 0.050 | 1.64 |
| 0.025 | 1.96 |
| 0.010 | 2.33 |
| 0.005 | 2.58 |
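The values in this table can be reproduced with `norm.ppf`:

```python
from scipy.stats import norm

# z_alpha = cdf^{-1}(1 - alpha) for the common significance levels above
for alpha in [0.050, 0.025, 0.010, 0.005]:
    z = norm.ppf(1 - alpha)
    print(f"{alpha:.3f}  {z:.2f}")
```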

Fig. 11.6 The areas on each side are both equal to \(\alpha/2.\)

A \((1 - \alpha)\) two-sided confidence interval (CI) defines left and right percentiles such that the probability on each side is \(\alpha/2.\) For a standard normal variable \(Z,\) the symmetry of its PDF implies:

\[\begin{equation*} \prob(Z \leq -z_{\alpha/2}) = \prob(Z > z_{\alpha/2}) = \alpha/2 \end{equation*}\]

Example 11.4

Since \(z_{2.5\%} = 1.96\), the 95% confidence interval of \(Z\) is \([-1.96, 1.96]\). This means that if we randomly sample this variable 100,000 times, approximately 95,000 observations will fall inside this interval.
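The sampling claim in the example can be checked with a quick simulation; the following is a sketch in which the seed and the sample size of 100,000 are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # arbitrary seed, for reproducibility
z = rng.standard_normal(100_000)

# fraction of draws inside [-1.96, 1.96]; should be close to 0.95
coverage = np.mean((z >= -1.96) & (z <= 1.96))
print(coverage)
```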

If \(X \sim \Normal(\mu, \sigma^{2})\), its confidence interval is determined by \(\xi\) and \(\zeta\) such that:

\[\begin{align*} & \prob(X \leq \xi) = \alpha / 2 \\ & \hspace{0.3in} \Rightarrow \prob(Z \leq \tfrac{\xi - \mu}{\sigma}) = \alpha/2, \\ & \prob(X > \zeta) = \alpha / 2 \\ & \hspace{0.3in} \Rightarrow \prob(Z > \tfrac{\zeta - \mu}{\sigma}) = \alpha/2, \end{align*}\]

which implies that \(-z_{\alpha/2} = \tfrac{\xi - \mu}{\sigma}\) and \(z_{\alpha/2} = \tfrac{\zeta - \mu}{\sigma}\). The \((1 - \alpha)\) confidence interval for \(X\) is then \([\mu - z_{\alpha/2}\sigma, \mu + z_{\alpha/2}\sigma]\).

Example 11.5

Suppose that \(X \sim \Normal(\mu, \sigma^{2})\) with \(\mu = 10\) and \(\sigma = 25\). Since \(z_{2.5\%} = 1.96\), the 95% confidence interval of \(X\) is:

\[\begin{equation*} [10-1.96(25), 10+1.96(25)] = [-39, 59]. \end{equation*}\]
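The same endpoints can be obtained directly as the 2.5% and 97.5% quantiles using `norm.ppf` with `loc` and `scale`:

```python
from scipy.stats import norm

mu, sigma = 10, 25
# 95% CI endpoints: the 2.5% and 97.5% quantiles of Normal(10, 25^2)
lo = norm.ppf(0.025, loc=mu, scale=sigma)
hi = norm.ppf(0.975, loc=mu, scale=sigma)
print(lo, hi)  # approximately -39 and 59
```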

11.2. The Lognormal Distribution#

If \(X \sim \Normal(\mu, \sigma^{2})\), then \(Y = e^{X}\) is said to be lognormally distributed with the same parameters. The pdf of a lognormally distributed random variable \(Y\) can be obtained from the pdf of \(X\).

Fig. 11.7 The figure shows the difference between a normal and a lognormal PDF with the same parameters.

Property 11.2 (Lognormal Density)

If \(Y\) is lognormally distributed with parameters \(\mu\) and \(\sigma^{2}\), the PDF of \(Y\) is given by:

\[\begin{equation*} f(y) = \frac{1}{y \sqrt{2 \pi \sigma^{2}}} e^{-\frac{(\ln(y) - \mu)^{2}}{2 \sigma^{2}}}. \end{equation*}\]

Unlike the normal density, the lognormal density function is not symmetric around its mean. Normally distributed variables can take values in \((-\infty, \infty)\), whereas lognormally distributed variables are always positive.
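In scipy the lognormal distribution is parameterized through `s` (our \(\sigma\)) and `scale` (\(e^{\mu}\)). As a quick check, the density formula in Property 11.2 agrees with `lognorm.pdf`; the parameter values below are purely illustrative:

```python
import numpy as np
from scipy.stats import lognorm

mu, sigma, y = 4.0, 1.5, 50.0  # illustrative values

# density from the formula in Property 11.2
f = np.exp(-(np.log(y) - mu) ** 2 / (2 * sigma**2)) / (y * np.sqrt(2 * np.pi * sigma**2))

# scipy's parameterization: s = sigma, scale = exp(mu)
g = lognorm.pdf(y, s=sigma, scale=np.exp(mu))
print(f, g)  # the two values should match
```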

11.2.1. Computing Probabilities#

We can use the fact that the logarithm of a lognormal random variable is normally distributed to compute cumulative probabilities.

Example 11.6

Let \(Y = e^{4 + 1.5 Z}\) where \(Z \sim \Normal(0, 1)\). What is the probability that \(Y \leq 100\)?

\[\begin{align*} \prob(Y \leq 100) & = \prob(e^{X} \leq 100) \\ & = \prob(X \leq \ln(100)) \\ & = \prob\left(Z \leq \tfrac{\ln(100) - 4}{1.5}\right) \\ & = \cdf(0.4034) \\ & = 0.6567 \end{align*}\]
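A sketch of this computation in Python, either by standardizing as above or by calling scipy's lognormal directly:

```python
import numpy as np
from scipy.stats import norm, lognorm

mu, sigma = 4.0, 1.5

# standardize: P(Y <= 100) = cdf((ln(100) - mu) / sigma)
p1 = norm.cdf((np.log(100) - mu) / sigma)

# or use scipy's lognormal: s = sigma, scale = exp(mu)
p2 = lognorm.cdf(100, s=sigma, scale=np.exp(mu))
print(p1, p2)  # both approximately 0.6567
```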

11.2.2. Confidence Interval#

Let \(Y = e^{\mu + \sigma Z}\) where \(Z \sim \Normal(0, 1)\). We have that:

\[\begin{align*} & -z_{\alpha/2} < Z \leq z_{\alpha/2} \\ & \hspace{0.4in} \Rightarrow \mu - \sigma z_{\alpha/2} < \mu + \sigma Z \leq \mu + \sigma z_{\alpha/2} \\ & \hspace{0.4in} \Rightarrow e^{\mu - \sigma z_{\alpha/2}} < e^{\mu + \sigma Z} \leq e^{\mu + \sigma z_{\alpha/2}} \end{align*}\]

The \((1 - \alpha)\) confidence interval for \(Y\) is \([e^{\mu - \sigma z_{\alpha/2}}, e^{\mu + \sigma z_{\alpha/2}}]\).

Example 11.7

Let \(Y = e^{4 + 1.5 Z}\) where \(Z \sim \Normal(0, 1)\). The 95% confidence interval for \(Y\) is:

\[ [e^{4 - 1.96(1.5)}, e^{4 + 1.96(1.5)}] = [2.89, 1032.71]. \]
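The endpoints can be checked with `lognorm.ppf`:

```python
import numpy as np
from scipy.stats import lognorm

mu, sigma = 4.0, 1.5
# 95% CI: the 2.5% and 97.5% quantiles of the lognormal
lo = lognorm.ppf(0.025, s=sigma, scale=np.exp(mu))
hi = lognorm.ppf(0.975, s=sigma, scale=np.exp(mu))
print(lo, hi)  # approximately 2.89 and 1032.71
```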

11.2.3. Moments#

Property 11.3 (Moments of a Lognormal Distribution)

Let \(Y = e^{\mu + \sigma Z}\) where \(Z \sim \Normal(0, 1)\). We have that:

\[\begin{align*} \ev(Y) & = e^{\mu + 0.5 \sigma^{2}} \\ \var(Y) & = e^{2\mu + \sigma^{2}} (e^{\sigma^{2}} - 1) \\ \stdev(Y) & = \ev(Y) \sqrt{e^{\sigma^{2}} - 1} \end{align*}\]

Example 11.8

Let \(Y = e^{4 + 1.5 Z}\) where \(Z \sim \Normal(0, 1)\). The expectation and standard deviation of \(Y\) are:

\[\begin{align*} \ev(Y) & = e^{4 + 0.5(1.5^{2})} = 168.17 \\ \stdev(Y) & = 168.17 \sqrt{e^{1.5^{2}} - 1} = 489.95 \end{align*}\]
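These values agree with the moments scipy computes for its built-in lognormal distribution:

```python
import numpy as np
from scipy.stats import lognorm

mu, sigma = 4.0, 1.5
Y = lognorm(s=sigma, scale=np.exp(mu))  # frozen lognormal distribution

# compare Property 11.3 with scipy's mean() and std()
print(Y.mean(), np.exp(mu + 0.5 * sigma**2))           # approximately 168.17
print(Y.std(), Y.mean() * np.sqrt(np.exp(sigma**2) - 1))  # approximately 489.95
```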

11.2.4. Partial Expectations#

When pricing a call option, the payoff is positive if the option is in-the-money and zero otherwise. We usually use an indicator function to quantify this behavior:

\[\begin{equation*} \1{Y > K} = \begin{cases} 0 & \text{if $Y \leq K$} \\ 1 & \text{if $Y > K$} \end{cases} \end{equation*}\]

Property 11.4 (Partial Expectations)

Let \(Y = e^{X}\) where \(X \sim \Normal(\mu, \sigma^{2})\). Then we have that:

\[\begin{align*} \ev\left(Y \1{Y > K}\right) & = e^{\mu + \frac{1}{2}\sigma^{2}} \cdf\left(\frac{\mu + \sigma^{2} - \ln(K)}{\sigma}\right) \\ \ev\left(K \1{Y > K}\right) & = K \cdf\left(\frac{\mu - \ln(K)}{\sigma}\right) \end{align*}\]
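The first formula in Property 11.4 can be sanity-checked by Monte Carlo simulation; in the sketch below the seed, the sample size, and the level K = 100 are arbitrary illustrative choices:

```python
import numpy as np
from scipy.stats import norm

mu, sigma, K = 4.0, 1.5, 100.0  # illustrative parameters
rng = np.random.default_rng(seed=0)
Y = np.exp(mu + sigma * rng.standard_normal(1_000_000))

# closed-form partial expectation E[Y 1{Y > K}] from Property 11.4
exact = np.exp(mu + 0.5 * sigma**2) * norm.cdf((mu + sigma**2 - np.log(K)) / sigma)

# Monte Carlo estimate of the same quantity
approx = np.mean(np.where(Y > K, Y, 0.0))
print(exact, approx)  # the two values should be close
```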

11.3. Practice Problems#

Exercise 11.1

Suppose that \(X\) is a normally distributed random variable with mean \(\mu=12\) and standard deviation \(\sigma=20\).

  1. What is the probability that \(X \leq 0\)?

  2. What is the probability that \(X \leq -4\)?

  3. What is the probability that \(X > 8\)?

  4. What is the probability that \(4 < X \leq 10\)?

Exercise 11.2

Suppose that \(X\) is a normally distributed random variable with mean \(\mu=10\) and standard deviation \(\sigma=20\). Compute the

  1. 90%,

  2. 95%, and

  3. 99%

confidence interval for \(X\).

Exercise 11.3

Suppose that \(X=\ln(Y)\) is a normally distributed random variable with mean \(\mu=3.9\) and standard deviation \(\sigma=15\).

  1. What is the probability that \(Y \leq 6\)?

  2. What is the probability that \(Y > 4\)?

  3. What is the probability that \(3 < Y \leq 12\)?

  4. What is the probability that \(Y \leq 0\)?

Exercise 11.4

Suppose that \(X=\ln(Y)\) is a normally distributed random variable with mean \(\mu=2.7\) and standard deviation \(\sigma=1\). Compute the

  1. 90%,

  2. 95%, and

  3. 99%

confidence interval for \(X\) and report the corresponding values for \(Y\).

Exercise 11.5

Let \(Y = e^{\mu + \sigma Z}\) where \(\mu = 1\), \(\sigma = 2\) and \(Z \sim \Normal(0, 1)\). Compute:

  1. \(\ev(Y)\)

  2. \(\stdev(Y) = \sqrt{\ev(Y^{2}) - \ev(Y)^{2}}\)

  3. \(\ev(Y^{0.3})\)

  4. \(\ev(Y^{-1})\)