The Index Model

Investment Theory

Lorenzo Naranjo

Fall 2024

Introduction

Markowitz analysis requires estimates of all covariances and expected returns.
Estimating the covariance between two assets is not hard and can be done with precision, but estimating all the pairwise covariances among five hundred different assets requires a very large number of estimates!
- Indeed, for five hundred stocks we would have to estimate 500 \times 499 / 2 = 124{,}750 different covariances.
One way to circumvent this problem is to realize that stock returns exhibit a factor structure driven by the market.
The index model is like the CAPM but makes an additional assumption.
- All firm-specific risks are uncorrelated not only with the market but also with each other.
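To get a feel for the savings, the short sketch below compares the number of pairwise covariances with the number of inputs that pin down the whole covariance matrix under the index model. The count of 2n + 1 inputs (one beta and one firm-specific variance per stock, plus the market variance) is my own tally based on the model's assumptions, and the function names are hypothetical.

```python
def n_pairwise_covariances(n: int) -> int:
    """Number of distinct covariances among n assets: n(n-1)/2."""
    return n * (n - 1) // 2


def n_index_model_inputs(n: int) -> int:
    """Inputs that determine the covariance matrix under the index model:
    n betas, n firm-specific variances, and one market variance."""
    return 2 * n + 1


for n in (2, 50, 500):
    print(n, n_pairwise_covariances(n), n_index_model_inputs(n))
```

For five hundred stocks, the first count is in the hundreds of thousands whereas the second is about one thousand.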

The Model

The single index model is a linear regression between the excess returns of a stock and the excess returns of the market portfolio.
Let’s denote by r_{i} the return of stock i over a given period, and define R_{i} = r_{i} - r_{f} as the excess return over the risk-free asset.
Similarly, denote by R_{m} the excess return of the market over the risk-free rate.
The index model postulates R_{i} = \alpha_{i} + \beta_{i} R_{m} + e_{i}, \tag{1} where \operatorname{E}(e_{i}) = 0 and \operatorname{Cov}(R_{m}, e_{i}) = 0.

The Additional Assumption

In the single index model, the beta with the market captures the exposure of all securities to systematic risk, which is the risk shared by all securities.
Therefore, the error term is idiosyncratic.
So far, everything looks like the CAPM.
The single index model also assumes that \operatorname{Cov}(e_{i}, e_{j}) = 0 for any two securities i and j whose returns are not perfectly correlated.
- This might not be true in practice.
- Many large firms might be exposed to an earthquake unrelated to the market, making their idiosyncratic risks correlated.

Variance Decomposition

The assumptions of the single index model imply that the variance of R_{i} can be split into two parts: \begin{aligned} \sigma_{i}^{2} & = \operatorname{V}(\alpha_{i} + \beta_{i} R_{m} + e_{i}) \\ & = \beta_{i}^{2} \operatorname{V}(R_{m}) + \operatorname{V}(e_{i}) + 2 \beta_{i} \operatorname{Cov}(R_{m}, e_{i}) \\ & = \beta_{i}^{2} \sigma_{M}^{2} + \sigma^{2}(e_{i}). \end{aligned} \tag{2}
The first component of \sigma_{i}^{2} is the systematic variance, which depends on the beta of the security but also the variance of the market.
The second component is the idiosyncratic (firm-specific) variance, which is typically computed as the difference between the total variance of i and the systematic variance.

Example 1 (Variance Decomposition) Suppose that you have the following regression for stock A: R_{A} = \alpha_{A} + \beta_{A} R_{M} + e_{A}, where \alpha_{A} = 0.02, \beta_{A} = 1.2, \sigma(e_{A}) = 30\% and \sigma_{M} = 25\%. The variance of A can be computed as follows: \sigma_{A}^{2} = 1.2^{2} \times 0.25^{2} + 0.30^{2} = 0.09 + 0.09 = 0.18. In the previous expression, the systematic and idiosyncratic variances are the same. The standard deviation of A is then \sigma_{A} = \sqrt{0.18} = 42.43\%.
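The arithmetic in Example 1 can be reproduced with a few lines of Python. This is only a sketch using the numbers stated in the example; the variable names are my own.

```python
import math

beta_a = 1.2       # beta of stock A
sigma_e_a = 0.30   # firm-specific volatility of A
sigma_m = 0.25     # market volatility

systematic_var = beta_a**2 * sigma_m**2    # beta^2 times market variance
idiosyncratic_var = sigma_e_a**2           # variance of the error term
total_var = systematic_var + idiosyncratic_var
sigma_a = math.sqrt(total_var)

print(f"systematic variance    = {systematic_var:.4f}")
print(f"idiosyncratic variance = {idiosyncratic_var:.4f}")
print(f"total variance         = {total_var:.4f}")
print(f"volatility of A        = {sigma_a:.2%}")
```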

Computing the Beta

The beta of the model can be estimated from the covariance of R_{i} and R_{m}, and the variance of R_{m}.
Indeed, we have that \begin{aligned} \operatorname{Cov}(R_{i}, R_{m}) & = \operatorname{Cov}(\beta_{i} R_{m}, R_{m}) + \operatorname{Cov}(e_{i}, R_{m}) \\ & = \beta_{i} \operatorname{V}(R_{m}), \end{aligned} where we use the fact that \operatorname{Cov}(R_{m}, R_{m}) = \operatorname{V}(R_{m}).
Thus, in the index model we must have that \beta_{i} = \frac{\operatorname{Cov}(R_{i}, R_{m})}{\operatorname{V}(R_{m})}, \tag{3} just like in the CAPM.
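As a rough illustration of equation (3), the sketch below simulates excess returns from a hypothetical index model and recovers the beta from the sample covariance and variance. The parameter values and the normal shocks are assumptions chosen only for this example, not estimates from data.

```python
import random
import statistics

random.seed(0)

# Hypothetical data-generating process, chosen only for illustration.
true_alpha, true_beta = 0.01, 1.2
r_m = [random.gauss(0.008, 0.045) for _ in range(5000)]
r_i = [true_alpha + true_beta * x + random.gauss(0.0, 0.05) for x in r_m]


def sample_cov(x, y):
    """Sample covariance with the usual n - 1 denominator."""
    mx, my = statistics.mean(x), statistics.mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


# Equation (3): beta = Cov(R_i, R_m) / V(R_m)
beta_hat = sample_cov(r_i, r_m) / statistics.variance(r_m)
print(f"beta estimate = {beta_hat:.3f}")
```

With a long simulated sample, the estimate should land close to the beta used to generate the data.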

The Security Characteristic Line (SCL)

The alpha of the security can then be computed as \alpha_{i} = \operatorname{E}(R_{i}) - \beta_{i} \operatorname{E}(R_{M}). \tag{4}
The line y = \alpha_{i} + \beta_{i} x is called the security characteristic line (SCL) of security i.
- If we plot this line using a line chart, \alpha_{i} is the intercept and \beta_{i} is the slope coefficient of the line.
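Equation (4) follows from taking expectations on both sides of equation (1) and using \operatorname{E}(e_{i}) = 0: \begin{aligned} \operatorname{E}(R_{i}) & = \alpha_{i} + \beta_{i} \operatorname{E}(R_{m}) + \operatorname{E}(e_{i}) \\ & = \alpha_{i} + \beta_{i} \operatorname{E}(R_{m}), \end{aligned} so that solving for \alpha_{i} gives the expression for the alpha.
As a purely hypothetical illustration (these numbers do not come from the data), a stock with \operatorname{E}(R_{i}) = 1.5\% per month, \beta_{i} = 1.2 and \operatorname{E}(R_{m}) = 0.8\% per month would have \alpha_{i} = 1.5\% - 1.2 \times 0.8\% = 0.54\% per month.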

Understanding the Alpha

In the single index model, if the alpha is different from zero then there is a potential arbitrage opportunity.
- Portfolio L: Collect all securities with positive alpha and form a portfolio.
- Portfolio S: Collect all securities with negative alpha and form a portfolio.
If the number of securities in each portfolio is large, then the variance of the firm-specific risk goes to zero.
Make the beta of each portfolio the same, say equal to one, by investing or borrowing in the risk-free asset.
By going long L and short S, we have a zero-cost portfolio with no risk that earns a positive return!
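The argument can be sketched in two steps, assuming for illustration that each portfolio is equally weighted across n securities, that all firm-specific variances are bounded above by some \bar{\sigma}^{2}, and that both portfolio betas are positive before rescaling (none of these details are stated above).
First, since the residuals are uncorrelated across securities, \sigma^{2}(e_{L}) = \operatorname{V}\left(\frac{1}{n} \sum_{i=1}^{n} e_{i}\right) = \frac{1}{n^{2}} \sum_{i=1}^{n} \sigma^{2}(e_{i}) \leq \frac{\bar{\sigma}^{2}}{n} \to 0 as n grows, and similarly for S.
Second, once both betas are scaled to one, \begin{aligned} R_{L} - R_{S} & = (\alpha_{L} + R_{m} + e_{L}) - (\alpha_{S} + R_{m} + e_{S}) \\ & = (\alpha_{L} - \alpha_{S}) + (e_{L} - e_{S}) \\ & \approx \alpha_{L} - \alpha_{S} > 0, \end{aligned} because \alpha_{L} > 0 > \alpha_{S} and the market exposure cancels.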

R-Squared

Following the statistics literature, the proportion of systematic variance to total variance is called the R-squared of security i and can be expressed as \text{R-squared} = \frac{\beta_{i}^{2} \sigma_{M}^{2}}{\sigma_{i}^{2}} = 1 - \frac{\sigma^{2}(e_{i})}{\sigma_{i}^{2}}. \tag{5}
Therefore, the R-squared can also be expressed as one minus the proportion of idiosyncratic variance to total variance.
Since the single index model aims to decompose the total variance of a security into two orthogonal components, the R-squared gives us the proportions of this decomposition.

Example 2 (Decomposing the Variance) You run the regression R_{B} = \alpha_{B} + \beta_{B} R_{M} + e_{B}. Your regression package reports that \alpha_{B} = -0.01, \beta_{B} = 0.8 and the R-squared is 0.4. If the volatility of the market is 25% per year, the systematic variance is 0.8^{2} \times 0.25^{2} = 0.04. Since 40% of the variance is systematic, we have that \sigma_{B}^{2} = \frac{0.04}{0.4} = 0.10, which implies that \sigma_{B} = \sqrt{0.10} = 31.62\% per year. We also know that 60% of the variance is firm-specific, which means that \sigma^{2}(e_{B}) = 0.6 \times 0.10 = 0.06, so that \sigma(e_{B}) = \sqrt{0.06} = 24.49\% per year.
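The steps of Example 2 can be checked numerically. The sketch below only uses the beta, the R-squared and the market volatility given in the example; the variable names are my own.

```python
import math

beta_b = 0.8      # beta reported by the regression
r_squared = 0.4   # R-squared reported by the regression
sigma_m = 0.25    # market volatility per year

systematic_var = beta_b**2 * sigma_m**2            # systematic variance
total_var = systematic_var / r_squared             # from equation (5)
idiosyncratic_var = (1 - r_squared) * total_var    # firm-specific variance

print(f"total variance         = {total_var:.4f}")
print(f"total volatility       = {math.sqrt(total_var):.2%}")
print(f"idiosyncratic variance = {idiosyncratic_var:.4f}")
print(f"idiosyncratic vol.     = {math.sqrt(idiosyncratic_var):.2%}")
```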

R-Squared is the Squared Correlation

Equation (5) can also be expressed in terms of the correlation between R_{i} and R_{M}. Indeed, \beta_{i} = \frac{\operatorname{Cov}(R_{i}, R_{M})}{\operatorname{V}(R_{M})} = \frac{\sigma_{i} \sigma_{M} \rho_{i, M}}{\sigma_{M}^{2}} = \frac{\sigma_{i} \rho_{i, M}}{\sigma_{M}}, \tag{6} implies that \text{R-squared} = \frac{\beta_{i}^{2} \sigma_{M}^{2}}{\sigma_{i}^{2}} = \frac{\frac{\sigma_{i}^{2} \rho_{i, M}^{2}}{\sigma_{M}^{2}} \sigma_{M}^{2}}{\sigma_{i}^{2}} = \rho_{i, M}^{2}. \tag{7}
The R-squared of a regression of R_{i} on R_{M} is just the square of the correlation between R_{i} and R_{M}.
- We typically use the Greek letter rho (\rho) for correlation, which corresponds to the Latin letter r.
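The identity in equation (7) also holds exactly for sample moments, which can be verified on simulated data. The sketch below uses hypothetical parameters and normal shocks that are assumptions made only for this check.

```python
import math
import random
import statistics

random.seed(1)

# Hypothetical excess returns, for illustration only.
r_m = [random.gauss(0.008, 0.045) for _ in range(240)]
r_i = [0.005 + 0.9 * x + random.gauss(0.0, 0.04) for x in r_m]

n = len(r_m)
m_bar, i_bar = statistics.mean(r_m), statistics.mean(r_i)
cov_im = sum((a - i_bar) * (b - m_bar) for a, b in zip(r_i, r_m)) / (n - 1)
var_m = statistics.variance(r_m)
var_i = statistics.variance(r_i)

beta = cov_im / var_m                      # equation (3)
r_squared = beta**2 * var_m / var_i        # equation (5)
rho = cov_im / math.sqrt(var_i * var_m)    # sample correlation

print(f"R-squared = {r_squared:.6f}, rho squared = {rho**2:.6f}")
```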

An Example with Real Data

As an example, let’s analyze the monthly returns of Microsoft (Ticker: MSFT) from June 2014 until June 2024. All data comes from Yahoo Finance.
As a proxy for the risk-free rate, I use the 13-week Treasury Bill CBOE Index (Ticker: ^IRX).
The rate is expressed per year, so I convert it to a monthly rate by: r_{\text{monthly}} = (1 + r_{\text{annual}})^{1/12} - 1.
To proxy for the market, I use the SPDR S&P 500 ETF Trust (Ticker: SPY), which allows me to include the dividend distribution of the stocks forming the S&P 500.
The monthly returns are computed using the adjusted price series to obtain a holding period return that includes dividends.
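A minimal sketch of these two steps is shown below. It assumes the annual rate is already expressed as a decimal (a quote in percent would first need to be divided by 100) and that each monthly return is matched with one annual rate; the prices, rates and function names are hypothetical and are not actual market data.

```python
def annual_to_monthly(r_annual: float) -> float:
    """Convert an annual rate (as a decimal) to the equivalent monthly rate."""
    return (1.0 + r_annual) ** (1.0 / 12.0) - 1.0


def holding_period_returns(adj_prices):
    """Simple returns from a list of adjusted prices (oldest first)."""
    return [p1 / p0 - 1.0 for p0, p1 in zip(adj_prices, adj_prices[1:])]


# Hypothetical inputs, for illustration only.
adj_prices = [100.0, 103.0, 101.5, 106.0]
annual_rf = [0.05, 0.05, 0.048]   # annual rate matched to each monthly return

returns = holding_period_returns(adj_prices)
excess = [r - annual_to_monthly(rf) for r, rf in zip(returns, annual_rf)]

for r, x in zip(returns, excess):
    print(f"return = {r:.4f}, excess return = {x:.4f}")
```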

Descriptive Statistics

The table presents descriptive statistics of Microsoft and S&P 500 monthly returns using data from June 2014 until June 2024.

Ticker	MSFT	SPY
Mean (%)	2.196	0.976
St. Dev. (%)	6.239	4.405

The table shows that during the period, the monthly returns of Microsoft were more volatile than those of the S&P 500, but Microsoft's average monthly return was also higher.

Plotting the Data

The figure plots the excess monthly returns of MSFT against the excess monthly returns of the market portfolio, proxied by SPY, from June 2014 until June 2024.

Clearly, the points cluster around the SCL, and we can see that the range of returns of Microsoft is significantly wider than that of the S&P 500.

Estimating the Regression

We can use ordinary least squares (OLS) to estimate the slope coefficient and the intercept of the SCL.
Many statistical packages allow us to do this. The results below are computed using the Python library statsmodels.
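To see what such a package computes, the sketch below estimates the intercept, the slope, and their standard errors by hand using the textbook OLS formulas. It is not the statsmodels call used for the results in these notes, and it runs on simulated data with hypothetical parameters rather than on the MSFT sample.

```python
import math
import random
import statistics

random.seed(2)

# Hypothetical excess returns standing in for the market (x) and a stock (y).
x = [random.gauss(0.008, 0.045) for _ in range(120)]
y = [0.01 + 1.0 * xi + random.gauss(0.0, 0.045) for xi in x]

n = len(x)
x_bar, y_bar = statistics.mean(x), statistics.mean(y)
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

beta = s_xy / s_xx              # slope of the SCL
alpha = y_bar - beta * x_bar    # intercept of the SCL

residuals = [yi - alpha - beta * xi for xi, yi in zip(x, y)]
s2 = sum(e**2 for e in residuals) / (n - 2)   # residual variance, n - 2 d.o.f.

# Standard errors under the usual (non-robust) OLS assumptions.
se_beta = math.sqrt(s2 / s_xx)
se_alpha = math.sqrt(s2 * (1.0 / n + x_bar**2 / s_xx))

print(f"alpha = {alpha:.4f} (se {se_alpha:.4f}, t = {alpha / se_alpha:.2f})")
print(f"beta  = {beta:.4f} (se {se_beta:.4f}, t = {beta / se_beta:.2f})")
```

The t-statistics are simply the coefficient estimates divided by their standard errors, which is how the t column of a regression table is obtained.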

Regression Results

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                   MSFT   R-squared:                       0.488
Model:                            OLS   Adj. R-squared:                  0.484
Method:                 Least Squares   F-statistic:                     112.5
Date:                Mon, 14 Oct 2024   Prob (F-statistic):           7.30e-19
Time:                        21:50:46   Log-Likelihood:                 203.31
No. Observations:                 120   AIC:                            -402.6
Df Residuals:                     118   BIC:                            -397.1
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      0.0123      0.004      2.934      0.004       0.004       0.021
SPY            0.9894      0.093     10.605      0.000       0.805       1.174
==============================================================================
Omnibus:                       10.257   Durbin-Watson:                   2.269
Prob(Omnibus):                  0.006   Jarque-Bera (JB):               14.540
Skew:                           0.432   Prob(JB):                     0.000696
Kurtosis:                       4.470   Cond. No.                         22.8
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

Interpreting the Results

The table presents many numbers but for our purposes we can focus on just a few of them.
First, under the column coef we can see that the estimate for the intercept is 0.0123 whereas the beta estimate for MSFT is 0.9894.
Since the beta of Microsoft is close to one, we can conclude that Microsoft carries almost the same systematic risk as the market.
As a check, we can re-compute the R-squared of the regression using the beta and variance of MSFT, and the variance of the market: \text{R-squared} = \frac{0.9894^{2} \times 4.405^{2}}{6.239^{2}} = 0.488.
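The same check can be scripted, and equation (5) then also gives the implied firm-specific volatility of MSFT. As in the check above, this sketch uses the standard deviations from the descriptive statistics table together with the estimated beta; the idiosyncratic volatility is my own by-product of equation (5) and is not reported in the regression output.

```python
import math

beta_msft = 0.9894   # slope estimate from the regression
sd_spy = 4.405       # monthly standard deviation of SPY, in percent
sd_msft = 6.239      # monthly standard deviation of MSFT, in percent

# Equation (5): share of MSFT's variance explained by the market.
r_squared = beta_msft**2 * sd_spy**2 / sd_msft**2

# Remaining share is firm-specific: sigma(e)^2 = (1 - R-squared) * sigma_i^2.
sigma_e = math.sqrt(1 - r_squared) * sd_msft

print(f"R-squared = {r_squared:.3f}")
print(f"implied idiosyncratic volatility = {sigma_e:.2f}% per month")
```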

More Stocks

The figure plots the excess monthly returns of Citigroup (C), BlackRock (BLK), Nvidia (NVDA), and Tesla (TSLA), labeled RETRF, against the excess monthly returns of the market portfolio, labeled RMRF and proxied by SPY, from June 2014 until June 2024.

Regression Estimates

The table presents linear regression estimates for Citigroup (C), BlackRock (BLK), Nvidia (NVDA), and Tesla (TSLA).
- The independent variable is the market portfolio proxied by SPY.
- Data from June-2014 until June-2024.

Ticker	Alpha	P-value (Alpha)	Beta	R-Squared
BLK	-0.0025	0.511	1.390	0.697
C	-0.0075	0.206	1.543	0.542
NVDA	0.0380	0.000	1.807	0.348
TSLA	0.0160	0.282	1.830	0.207

Analysis

TSLA has the lowest R-squared, whereas almost 70% of BlackRock’s variance is explained by the market.
We see that both Nvidia and Tesla load on significant systematic risk.
- Their cash flows are highly exposed to the state of the economy.
Citigroup also has a high beta, which is typical of financial firms.
BlackRock has the lowest beta of the four stocks, although higher than the beta we estimated for Microsoft.
Nvidia is the only stock that has a positive alpha statistically different from zero at the 5% significance level.
- The alpha of Nvidia is indeed impressive.
- It has outperformed the market by 3.80% per month on a risk-adjusted basis during the last 10 years.

Correlations

Finally, the table below shows the pairwise correlations between each stock and the market.

	Correlation	Correlation Squared
BLK	0.835	0.697
C	0.736	0.542
NVDA	0.590	0.348
TSLA	0.455	0.207

As expected, the square of the correlation corresponds to the R-squared reported in the regression table.

Implications of the Model

When you run a regression of the excess returns of a security on the excess returns of the market, the residuals are automatically orthogonal to the regressor.
- In equation (1) we must have that \operatorname{Cov}(R_{m}, e) = 0 for all securities.
The crucial assumption of the single index model is that the market captures all the dependence between securities.
The implication of this assumption is that the covariance of the residuals between two securities is zero as long as their excess returns are not perfectly correlated.
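The difference between what the regression guarantees and what the model assumes can be illustrated with a small simulation. In the sketch below, two hypothetical stocks share a non-market shock (in the spirit of the earthquake example), so their residuals are uncorrelated with the market by construction but clearly correlated with each other. All parameters are assumptions made for illustration.

```python
import random
import statistics

random.seed(3)
n = 500


def cov(x, y):
    """Sample covariance with the usual n - 1 denominator."""
    mx, my = statistics.mean(x), statistics.mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def ols_residuals(y, x):
    """Residuals from regressing y on a constant and x."""
    beta = cov(y, x) / cov(x, x)
    alpha = statistics.mean(y) - beta * statistics.mean(x)
    return [yi - alpha - beta * xi for xi, yi in zip(x, y)]


# Hypothetical returns: both stocks load on the market and on a shared
# non-market shock, plus independent noise.
r_m = [random.gauss(0.008, 0.045) for _ in range(n)]
shock = [random.gauss(0.0, 0.05) for _ in range(n)]
r_a = [1.2 * m + s + random.gauss(0.0, 0.02) for m, s in zip(r_m, shock)]
r_b = [0.9 * m + s + random.gauss(0.0, 0.02) for m, s in zip(r_m, shock)]

e_a, e_b = ols_residuals(r_a, r_m), ols_residuals(r_b, r_m)

print(f"Cov(R_m, e_A) = {cov(r_m, e_a):.2e}")   # zero up to rounding error
print(f"Cov(e_A, e_B) = {cov(e_a, e_b):.5f}")   # clearly positive here
```

In this simulated world the single index model would understate the covariance between A and B, since it attributes all co-movement to the market.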

No Perfect Correlation

We saw before that if A and B are perfectly correlated, we must have that R_{B} = w R_{A} for some w \neq 0.
- If this were the case, the covariance between e_{A} and e_{B} would not be zero even though they are different assets.
- More precisely, since e_{B} = w e_{A}, we have that \operatorname{Cov}(e_{A}, e_{B}) = \operatorname{Cov}(e_{A}, w e_{A}) = w \sigma^{2}(e_{A}) \neq 0, provided that w \neq 0, i.e., you do not invest everything in the risk-free asset.
In the following, when we talk about two different assets it is implicitly assumed that their excess returns are not perfectly correlated, unless stated otherwise.

Covariance Structure

The covariance of the excess returns between securities i and j is \begin{aligned} \operatorname{Cov}(R_{i}, R_{j}) & = \operatorname{Cov}(\alpha_{i} + \beta_{i} R_{M} + e_{i}, \alpha_{j} + \beta_{j} R_{M} + e_{j}) \\ & = \beta_{i} \beta_{j} \operatorname{Cov}(R_{M}, R_{M}) \\ & = \beta_{i} \beta_{j} \sigma_{M}^{2}. \end{aligned} \tag{8}
The previous expression says that in the single index model, the covariance between any two different securities is determined by their exposures to the market and the variance of the market.

Example 3 (Computing a Covariance) Suppose that you have run regressions of excess returns of two securities A and B on the excess returns of the market. You find that \beta_{A} = 1.2 and \beta_{B} = 0.9. If the volatility of the market is 25% per year, then \operatorname{Cov}(R_{A}, R_{B}) = 1.2 \times 0.9 \times 0.25^{2} = 0.0675. If in addition we know that \sigma_{A} = 30\% and \sigma_{B} = 35\%, then we also have \rho_{A, B} = \frac{0.0675}{0.30 \times 0.35} = 0.643.
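The numbers in Example 3 can be reproduced as follows, treating 25% as the market volatility, as the computation in the example does. The variable names are my own.

```python
beta_a, beta_b = 1.2, 0.9
sigma_m = 0.25                  # market volatility per year
sigma_a, sigma_b = 0.30, 0.35   # total volatilities of A and B

cov_ab = beta_a * beta_b * sigma_m**2    # equation (8)
rho_ab = cov_ab / (sigma_a * sigma_b)    # correlation from the covariance

print(f"Cov(R_A, R_B) = {cov_ab:.4f}")
print(f"rho_AB        = {rho_ab:.3f}")
```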

Correlation Structure

We can express equation (8) in terms of correlations as \begin{aligned} \rho_{i, j} & = \frac{\operatorname{Cov}(R_{i}, R_{j})}{\sigma_{i} \sigma_{j}} = \frac{\beta_{i} \beta_{j} \sigma_{M}^{2}}{\sigma_{i} \sigma_{j}} \\ & = \frac{\frac{\sigma_{i} \rho_{i, M}}{\sigma_{M}} \frac{\sigma_{j} \rho_{j, M}}{\sigma_{M}} \sigma_{M}^{2}}{\sigma_{i} \sigma_{j}} \\ & = \rho_{i, M} \rho_{j, M}. \end{aligned} \tag{9}
Therefore, in the single-index model, the correlation between any two different assets can be computed as the product of their correlations with the market.
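As an illustration of equation (9), the sketch below computes the pairwise correlations implied by the index model for the four stocks analyzed earlier, using their reported correlations with the market. These are model-implied values, not sample correlations between the stocks, and they would differ from the data to the extent that residuals are correlated across firms.

```python
from itertools import combinations

# Correlations with the market, taken from the correlation table above.
rho_market = {"BLK": 0.835, "C": 0.736, "NVDA": 0.590, "TSLA": 0.455}

# Equation (9): rho_ij = rho_iM * rho_jM for every pair of different stocks.
implied = {
    (i, j): rho_market[i] * rho_market[j]
    for i, j in combinations(rho_market, 2)
}

for (i, j), rho in implied.items():
    print(f"{i}-{j}: {rho:.3f}")
```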