The Index Model

Introduction

One of the disadvantages of the Markowitz model is that it requires estimates of all expected returns, variances, and covariances. Estimating the covariance between two assets is not hard, and can be done with precision, but estimating every pairwise covariance among five hundred different assets requires a very large number of estimates. Indeed, for five hundred stocks we would have to estimate \(500 \times 501 / 2 = 125{,}250\) different variances and covariances, of which \(500 \times 499 / 2 = 124{,}750\) are pairwise covariances. One way to circumvent this problem is to realize that stock returns exhibit a factor structure driven by the market.

Indeed, for many stocks the market explains a large fraction of their variability. Whatever is not explained by the market is firm-specific risk. The index model formalizes this intuition by splitting the variance of each stock into systematic and idiosyncratic variance.
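To make the counting argument concrete, the following sketch compares the number of inputs needed under the two approaches. The tally for the index model (one alpha, one beta and one residual variance per stock, plus the mean and variance of the market) is the usual textbook count and is an assumption here, since the text above does not state it; the function names are illustrative.

```python
def markowitz_inputs(n: int) -> int:
    """Expected returns plus all variances and covariances for n assets."""
    expected_returns = n
    variances_and_covariances = n * (n + 1) // 2
    return expected_returns + variances_and_covariances


def index_model_inputs(n: int) -> int:
    """Alpha, beta and residual variance per asset, plus market mean and variance."""
    return 3 * n + 2


n = 500
print(n * (n + 1) // 2)       # 125250 variances and covariances
print(markowitz_inputs(n))    # 125750
print(index_model_inputs(n))  # 1502
```

Under these assumptions the index model needs roughly one hundredth of the inputs for five hundred stocks.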

The Model

The single index model is a linear regression between the excess returns of a stock and the excess returns of the market portfolio. Let’s denote by \(r_{i}\) the return of stock \(i\) over a given period, and define \(R_{i} = r_{i} - r_{f}\) as the excess return over the risk-free rate \(r_{f}.\) Similarly, denote by \(R_{m}\) the excess return of the market over the risk-free rate. The index model postulates \[ R_{i} = \alpha_{i} + \beta_{i} R_{m} + e_{i}, \tag{1}\] where \(\ev(e_{i}) = 0\) and \(\cov(R_{m}, e_{i}) = 0.\)

In the single index model, the beta with the market captures the exposure of all securities to systematic risk, which is the risk shared by all securities. Therefore, the error term is idiosyncratic. The single index model assumes that \(\cov(e_{i}, e_{j}) = 0\) for any two securities \(i\) and \(j\) whose returns are not perfectly correlated.

Variance Decomposition

The assumptions of the single index model imply that the variance of \(R_{i}\) can be split into two parts: \[ \begin{aligned} \sigma_{i}^{2} & = \var(\alpha_{i} + \beta_{i} R_{m} + e_{i}) \\ & = \beta_{i}^{2} \var(R_{m}) + \var(e_{i}) + 2 \beta_{i} \cov(R_{m}, e_{i}) \\ & = \beta_{i}^{2} \sigma_{M}^{2} + \sigma^{2}(e_{i}). \end{aligned} \tag{2}\] Here \(\sigma_{M}^{2} = \var(R_{m})\) denotes the variance of the market excess return. The constant \(\alpha_{i}\) does not contribute to the variance, and the cross term vanishes because \(\cov(R_{m}, e_{i}) = 0.\) The first component of \(\sigma_{i}^{2}\) is the systematic variance, which depends on the beta of the security and on the variance of the market. The second component, the idiosyncratic variance \(\sigma^{2}(e_{i}),\) is typically computed as the difference between the total variance of \(i\) and the systematic variance.

Example 1 Suppose that you have the following regression for stock \(A\): \[ R_{A} = \alpha_{A} + \beta_{A} R_{M} + e_{A}, \] where \(\alpha_{A} = 0.02,\) \(\beta_{A} = 1.2,\) \(\sigma(e_{A}) = 30\%\) and \(\sigma_{M} = 25\%.\) The variance of \(A\) can be computed as follows: \[ \sigma_{A}^{2} = 1.2^{2} \times 0.25^{2} + 0.30^{2} = 0.09 + 0.09 = 0.18. \] In the previous expression, the systematic and idiosyncratic variances are the same. The standard deviation of \(A\) is then \(\sigma_{A} = \sqrt{0.18} = 42.43\%.\)
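The arithmetic of Example 1 can be reproduced with a few lines of Python. This is only a sketch of the calculation in Eq. 2; the variable names are illustrative.

```python
import math

beta_a = 1.2      # beta of stock A
sigma_e_a = 0.30  # idiosyncratic volatility of A
sigma_m = 0.25    # volatility of the market

systematic_var = beta_a**2 * sigma_m**2   # beta^2 * sigma_M^2
idiosyncratic_var = sigma_e_a**2          # sigma^2(e_A)
total_var = systematic_var + idiosyncratic_var
sigma_a = math.sqrt(total_var)

print(round(systematic_var, 4), round(idiosyncratic_var, 4))  # 0.09 0.09
print(round(total_var, 4), round(sigma_a, 4))                 # 0.18 0.4243
```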

The Security Characteristic Line (SCL)

The beta of the model can be estimated from the covariance of \(R_{i}\) and \(R_{m},\) and the variance of \(R_{m}.\) Indeed, we have that \[ \begin{aligned} \cov(R_{i}, R_{m}) & = \cov(\beta_{i} R_{m}, R_{m}) + \cov(e_{i}, R_{m}) \\ & = \beta_{i} \var(R_{m}), \end{aligned} \] where we use the facts that \(\alpha_{i}\) is a constant, that \(\cov(e_{i}, R_{m}) = 0,\) and that \(\cov(R_{m}, R_{m}) = \var(R_{m}).\) Thus, in the index model we must have that \[ \beta_{i} = \frac{\cov(R_{i}, R_{m})}{\var(R_{m})}. \tag{3}\] The alpha of the security can then be computed as \[ \alpha_{i} = \ev(R_{i}) - \beta_{i} \ev(R_{m}). \tag{4}\] The line \[ y = \alpha_{i} + \beta_{i} x \] is called the security characteristic line (SCL) of security \(i.\) If we plot this line using a line chart, \(\alpha_{i}\) is the intercept and \(\beta_{i}\) is the slope coefficient of the line.
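As a quick numerical check of Eq. 3 and Eq. 4, the sketch below simulates excess returns from the index model and compares the moment-based estimates with a least-squares line fit. The parameter values and the random seed are arbitrary assumptions chosen for illustration, not real data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated monthly excess returns (illustrative parameters, not real data).
true_alpha, true_beta = 0.01, 1.2
r_m = rng.normal(0.008, 0.045, size=120)  # market excess returns
e = rng.normal(0.0, 0.05, size=120)       # firm-specific shocks
r_i = true_alpha + true_beta * r_m + e    # stock excess returns

# Eq. 3: covariance with the market divided by the variance of the market.
beta_hat = np.cov(r_i, r_m, ddof=1)[0, 1] / np.var(r_m, ddof=1)
# Eq. 4: mean excess return not explained by the market.
alpha_hat = r_i.mean() - beta_hat * r_m.mean()

# A degree-one least-squares fit returns the slope first, then the intercept.
slope, intercept = np.polyfit(r_m, r_i, deg=1)
print(np.isclose(beta_hat, slope), np.isclose(alpha_hat, intercept))
```

The sample versions of Eq. 3 and Eq. 4 coincide with the OLS slope and intercept, which is why a regression package can be used to estimate the SCL.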

R-Squared

Following the statistics literature, the proportion of systematic variance to total variance is called the R-squared of security \(i\) and can be expressed as \[ \text{R-squared} = \frac{\beta_{i}^{2} \sigma_{M}^{2}}{\sigma_{i}^{2}} = 1 - \frac{\sigma^{2}(e_{i})}{\sigma_{i}^{2}}. \tag{5}\] Therefore, the R-squared can also be expressed as one minus the proportion of idiosyncratic variance to total variance. Since the single index model aims to decompose the total variance of a security into two orthogonal components, the R-squared gives us the proportions of this decomposition.

Example 2 You regress the excess returns of stock \(B\) on the excess returns of the market: \[ R_{B} = \alpha_{B} + \beta_{B} R_{M} + e_{B}. \] Your regression package reports that \(\alpha_{B} = -0.01,\) \(\beta_{B} = 0.8\) and the R-squared is 0.4. If the volatility of the market is 25% per year, the systematic variance is \(0.8^{2} \times 0.25^{2} = 0.04.\) Since 40% of the variance is systematic, we have that \[ \sigma_{B}^{2} = \frac{0.04}{0.4} = 0.10, \] which implies that \(\sigma_{B} = \sqrt{0.10} = 31.62\%\) per year. We also know that 60% of the variance is firm-specific, which means that \[ \sigma^{2}(e_{B}) = 0.6 \times 0.10 = 0.06, \] so that \(\sigma(e_{B}) = \sqrt{0.06} = 24.49\%\) per year.
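The steps of Example 2 can be written as a short calculation. The sketch below simply applies Eq. 5 to the numbers of the example; the variable names are illustrative.

```python
import math

beta_b = 0.8     # beta of stock B
r_squared = 0.4  # R-squared reported by the regression
sigma_m = 0.25   # volatility of the market

systematic_var = beta_b**2 * sigma_m**2          # beta^2 * sigma_M^2
total_var = systematic_var / r_squared           # Eq. 5 solved for sigma_B^2
idiosyncratic_var = (1 - r_squared) * total_var  # firm-specific share

print(round(systematic_var, 4))                                             # 0.04
print(round(total_var, 4), round(math.sqrt(total_var), 4))                  # 0.1 0.3162
print(round(idiosyncratic_var, 4), round(math.sqrt(idiosyncratic_var), 4))  # 0.06 0.2449
```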

Eq. 5 can also be expressed in terms of the correlation between \(R_{i}\) and \(R_{M}.\) Indeed, since \[ \beta_{i} = \frac{\cov(R_{i}, R_{M})}{\var(R_{M})} = \frac{\sigma_{i} \sigma_{M} \rho_{i, M}}{\sigma_{M}^{2}} = \frac{\sigma_{i} \rho_{i, M}}{\sigma_{M}}, \tag{6}\] we have that \[ \text{R-squared} = \frac{\beta_{i}^{2} \sigma_{M}^{2}}{\sigma_{i}^{2}} = \frac{\frac{\sigma_{i}^{2} \rho_{i, M}^{2}}{\sigma_{M}^{2}} \sigma_{M}^{2}}{\sigma_{i}^{2}} = \rho_{i, M}^{2}. \tag{7}\] Thus, the R-squared of a regression of \(R_{i}\) on \(R_{M}\) is just the square of the correlation between \(R_{i}\) and \(R_{M}.\) The name R-squared comes from the fact that we typically use the Greek letter rho (\(\rho\)), which corresponds to the Latin letter r, for correlation.
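The equivalence in Eq. 7 can be verified numerically. The sketch below uses simulated excess returns with arbitrary, assumed parameters and computes the R-squared in three ways: from the beta (Eq. 5), from the residual variance (Eq. 5), and as the squared correlation (Eq. 7).

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated excess returns (illustrative parameters, not real data).
r_m = rng.normal(0.008, 0.045, size=240)
r_i = 0.005 + 0.9 * r_m + rng.normal(0.0, 0.06, size=240)

beta_hat = np.cov(r_i, r_m, ddof=1)[0, 1] / np.var(r_m, ddof=1)
alpha_hat = r_i.mean() - beta_hat * r_m.mean()
residuals = r_i - alpha_hat - beta_hat * r_m

r2_from_beta = beta_hat**2 * np.var(r_m, ddof=1) / np.var(r_i, ddof=1)
r2_from_residuals = 1 - np.var(residuals, ddof=1) / np.var(r_i, ddof=1)
rho = np.corrcoef(r_i, r_m)[0, 1]

print(np.isclose(r2_from_beta, rho**2), np.isclose(r2_from_residuals, rho**2))
```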

An Example with Real Data

As an example, let’s analyze the monthly returns of Microsoft (Ticker: MSFT) from June 2014 until June 2024. All data comes from Yahoo Finance. As a proxy for the risk-free rate, I use the 13-week Treasury Bill CBOE Index (Ticker ^IRX). The rate is expressed per year, so I convert it to a monthly rate by: \[ r_{\text{monthly}} = (1 + r_{\text{annual}})^{1/12} - 1. \] To proxy for the market, I use the SPDR S&P 500 ETF Trust (Ticker: SPY), which allows me to include the dividend distribution of the stocks forming the S&P 500. The monthly returns are computed using the adjusted price series to obtain a holding period return that includes dividends.
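The rate conversion above can be sketched as a small helper. The function name and the 5% input are illustrative, and the helper assumes the annual rate is given as a decimal; if the data source quotes the rate in percent, it would need to be divided by 100 first.

```python
def annual_to_monthly(r_annual: float) -> float:
    """Convert an annual rate (as a decimal) to the equivalent monthly rate."""
    return (1 + r_annual) ** (1 / 12) - 1


r_annual = 0.05                          # hypothetical 5% annual rate
r_monthly = annual_to_monthly(r_annual)
print(round(r_monthly, 6))               # 0.004074

# The monthly excess return is the monthly return minus the monthly risk-free rate.
stock_return = 0.02                      # hypothetical monthly holding period return
excess_return = stock_return - r_monthly
```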

The table below presents some descriptive statistics of both series of excess monthly returns.

Table 1: The table presents descriptive statistics of Microsoft and S&P 500 monthly returns using data from June 2014 until June 2024.

Ticker	MSFT	SPY
Mean (%)	2.196	0.976
St. Dev. (%)	6.239	4.405

The table shows that during the period the monthly excess returns of Microsoft are more volatile than those of the S&P 500, but their average is also higher.

The figure below presents a scatter plot of the data. Clearly, the points cluster around the SCL, and we can see that the range of returns of Microsoft is significantly wider than that of the S&P 500.

Figure 1: The figure plots the excess monthly returns of the market portfolio, proxied by SPY, vs. the excess monthly returns of MSFT from June-2014 until June-2024.

We can use ordinary least squares (OLS) to estimate the slope coefficient and the intercept of the SCL. Many statistical packages allow you to do this. The results below are computed using the Python library statsmodels.

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                   MSFT   R-squared:                       0.488
Model:                            OLS   Adj. R-squared:                  0.484
Method:                 Least Squares   F-statistic:                     112.5
Date:                Mon, 14 Oct 2024   Prob (F-statistic):           7.30e-19
Time:                        21:50:01   Log-Likelihood:                 203.31
No. Observations:                 120   AIC:                            -402.6
Df Residuals:                     118   BIC:                            -397.1
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      0.0123      0.004      2.934      0.004       0.004       0.021
SPY            0.9894      0.093     10.605      0.000       0.805       1.174
==============================================================================
Omnibus:                       10.257   Durbin-Watson:                   2.269
Prob(Omnibus):                  0.006   Jarque-Bera (JB):               14.540
Skew:                           0.432   Prob(JB):                     0.000696
Kurtosis:                       4.470   Cond. No.                         22.8
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

The table presents many numbers but for our purposes we can focus on just a few of them. First, under the column coef we can see that the estimate for the intercept is 0.0123 whereas the beta estimate for MSFT is 0.9894. Since the beta of Microsoft is close to one, we can conclude that Microsoft carries almost the same systematic risk as the market.
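For readers who want to produce a regression table of this kind, a minimal sketch with statsmodels follows. It uses simulated data in place of the Yahoo Finance series, so its numbers will not match the output above. The column names MSFT and SPY mirror the labels in the table, and the use of the formula interface is an assumption suggested by the "Intercept" label, not something the text states.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)

# Simulated monthly excess returns standing in for the real series.
spy = rng.normal(0.01, 0.044, size=120)
msft = 0.012 + 1.0 * spy + rng.normal(0.0, 0.045, size=120)
data = pd.DataFrame({"MSFT": msft, "SPY": spy})

# Regress MSFT on SPY; the formula interface adds the intercept automatically.
results = smf.ols("MSFT ~ SPY", data=data).fit()

alpha_hat = results.params["Intercept"]  # estimate of alpha
beta_hat = results.params["SPY"]         # estimate of beta
r_squared = results.rsquared
print(results.summary())
```

The fitted object also exposes the p-values of the coefficients through `results.pvalues`, which is what one would inspect to judge whether the alpha is statistically different from zero.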

We can compute the R-squared of the regression using the beta and variance of MSFT, and the variance of the market: \[ \text{R-squared} = \frac{0.9894^{2} \times 4.405^{2}}{6.239^{2}} = 0.488. \] The variances of MSFT and the S&P 500 are computed by squaring the standard deviations reported in Table 1. We can see that the computed R-squared corresponds to the R-squared reported by the regression package, implying that 48.8% of the variance is explained by the exposure of MSFT to the market. The remaining variance is firm-specific risk.
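The same computation in Python, using the beta from the regression output and the standard deviations from Table 1. Because both standard deviations are expressed in percent, the units cancel in the ratio.

```python
beta_msft = 0.9894  # slope coefficient from the regression output
sd_msft = 6.239     # percent per month, Table 1
sd_spy = 4.405      # percent per month, Table 1

r_squared = beta_msft**2 * sd_spy**2 / sd_msft**2
print(round(r_squared, 3))  # 0.488
```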

Differences in Beta and R-Squared

The figure below shows a scatter plot of market excess monthly returns vs. the monthly excess returns of two financial and two technology stocks:

Citigroup (Ticker: C)
BlackRock (Ticker: BLK)
Nvidia (Ticker: NVDA)
Tesla (Ticker: TSLA)

As before, we proxy the market portfolio using the SPDR S&P 500 ETF Trust (Ticker: SPY), which allows us to include the dividend distribution of the stocks forming the S&P 500. The risk-free rate is obtained from the 13-week Treasury Bill CBOE Index (Ticker ^IRX).

Figure 2: The figure plots the excess monthly returns of the market portfolio, labeled RMRF and proxied by SPY, vs. the excess monthly returns of Citigroup (C), BlackRock (BLK), Nvidia (NVDA), and Tesla (TSLA), labeled RETRF, from June-2014 until June-2024.

The picture shows that different stocks have different degrees of firm-specific risk. BlackRock excess returns align quite well with market returns, whereas Tesla excess returns are the most dispersed. In general, you would expect more dispersion in technology stocks since new technologies might or might not work, and whether a technology works is largely independent of what the market does.

The table below presents the relevant estimates from the regressions.

Table 2: The table presents the Alpha, Beta and R-Squared estimates obtained by running a linear regression of excess monthly returns for Citigroup (C), BlackRock (BLK), Nvidia (NVDA), and Tesla (TSLA) on the market portfolio proxied by SPY using data from June-2014 until June-2024.

	Alpha Estimate	Alpha P-value	Beta	R-Squared
BLK	-0.0025	0.511	1.390	0.697
C	-0.0075	0.206	1.543	0.542
NVDA	0.0380	0.000	1.807	0.348
TSLA	0.0160	0.282	1.830	0.207

First, we can see that the R-squared is the lowest for Tesla, which is apparent from the pictures. The market explains a small fraction of the variance for the stock. On the other hand, almost 70% of BlackRock’s variance is explained by the market.

Also, we see that both Nvidia and Tesla load on significant systematic risk. Certainly, their cash flows are exposed to how the economy does and this is reflected in their high betas. Citigroup also has a high beta, which is typical of financial firms that are also exposed to how the market performs. BlackRock has the lowest beta of the four stocks, although it is still higher than the beta we estimated for Microsoft.

During the period, the only stock that has a positive alpha statistically different from zero at the 5% significance level is Nvidia. For the other stocks, we cannot reject the null hypothesis that the alpha is equal to zero. The alpha of Nvidia is indeed impressive. After adjusting for its exposure to the market, it has outperformed the market by 3.80% per month during the last 10 years.

Finally, the table below shows the pairwise correlations between each stock and the market.

Table 3: The table shows the pairwise correlations and the square of their values between excess monthly returns for Citigroup (C), BlackRock (BLK), Nvidia (NVDA), and Tesla (TSLA) and the market portfolio proxied by SPY using data from June-2014 until June-2024.

	Correlation	Correlation Squared
BLK	0.835	0.697
C	0.736	0.542
NVDA	0.590	0.348
TSLA	0.455	0.207

As expected, the square of the correlation corresponds to the R-squared reported in Table 2.
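This correspondence can be checked directly from the reported figures. The sketch below copies the values from Table 2 and Table 3 and allows for the three-decimal rounding of the published numbers.

```python
# Values copied from Table 2 (R-squared) and Table 3 (correlation).
r_squared = {"BLK": 0.697, "C": 0.542, "NVDA": 0.348, "TSLA": 0.207}
correlation = {"BLK": 0.835, "C": 0.736, "NVDA": 0.590, "TSLA": 0.455}

matches = {}
for ticker, rho in correlation.items():
    implied = rho**2
    # Tolerance accounts for the rounding of the reported figures.
    matches[ticker] = abs(implied - r_squared[ticker]) < 1e-3
    print(ticker, round(implied, 3), matches[ticker])
```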

Implications of the Model

When you run a regression of the excess returns of a security on the excess returns of the market, the residuals are automatically orthogonal to the regressor. Therefore, in Eq. 1 we must have that \(\cov(R_{m}, e) = 0\) for all securities.

The crucial assumption of the single index model is that the only systematic source of risk is the exposure of each security to the market. The implication of this assumption is that the covariance of the residuals between two securities is zero as long as their excess returns are not perfectly correlated.

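We saw before that if \(A\) and \(B\) are perfectly correlated, we must have that \(R_{B} = w R_{A}\) for some \(w \neq 0.\) If this were the case, the residuals would satisfy \(e_{B} = w e_{A},\) so the covariance between \(e_{A}\) and \(e_{B}\) is not zero even though they are different assets. More precisely, we have that \[ \cov(e_{A}, e_{B}) = \cov(e_{A}, w e_{A}) = w \sigma^{2}(e_{A}) \neq 0, \] provided that \(A\) carries some firm-specific risk and that \(w \neq 0.\) If we think of \(B\) as a portfolio that invests a fraction \(w\) in \(A\) and the rest in the risk-free asset, the condition \(w \neq 0\) means that you do not invest everything in the risk-free asset.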

In the following, when we talk about two different assets it is implicitly assumed that their excess returns are not perfectly correlated, unless stated otherwise.

Covariance Structure

The covariance of the excess returns between securities \(i\) and \(j\) is \[ \begin{aligned} \cov(R_{i}, R_{j}) & = \cov(\alpha_{i} + \beta_{i} R_{M} + e_{i}, \alpha_{j} + \beta_{j} R_{M} + e_{j}) \\ & = \beta_{i} \beta_{j} \cov(R_{M}, R_{M}) \\ & = \beta_{i} \beta_{j} \sigma_{M}^{2}. \end{aligned} \tag{8}\] The second line uses the facts that the alphas are constants, that \(\cov(R_{M}, e_{i}) = \cov(R_{M}, e_{j}) = 0,\) and that \(\cov(e_{i}, e_{j}) = 0\) for two different securities. The previous expression says that in the single index model, the covariance between any two different securities is given by their exposures to the market and the variance of the market. Using Eq. 6 to substitute for the betas, we can express Eq. 8 in terms of correlations as \[ \begin{aligned} \rho_{i, j} & = \frac{\cov(R_{i}, R_{j})}{\sigma_{i} \sigma_{j}} = \frac{\beta_{i} \beta_{j} \sigma_{M}^{2}}{\sigma_{i} \sigma_{j}} \\ & = \frac{\frac{\sigma_{i} \rho_{i, M}}{\sigma_{M}} \frac{\sigma_{j} \rho_{j, M}}{\sigma_{M}} \sigma_{M}^{2}}{\sigma_{i} \sigma_{j}} \\ & = \rho_{i, M} \rho_{j, M}. \end{aligned} \tag{9}\] Therefore, in the single-index model, all pairwise correlations between two assets can be computed as the product of their correlations with the market.
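As an illustration, the sketch below builds the covariance matrix implied by the single index model for stocks \(A\) and \(B\) of Example 1 and Example 2, and checks Eq. 9 numerically. Treating the two examples as a two-stock universe is an assumption made only for this illustration.

```python
import numpy as np

sigma_m = 0.25                         # market volatility
betas = np.array([1.2, 0.8])           # betas of A and B
resid_var = np.array([0.30**2, 0.06])  # sigma^2(e_A) and sigma^2(e_B)

# Eq. 8 gives the off-diagonal terms; Eq. 2 gives the diagonal.
cov = sigma_m**2 * np.outer(betas, betas) + np.diag(resid_var)
vol = np.sqrt(np.diag(cov))
corr = cov / np.outer(vol, vol)

# Eq. 6 rearranged: correlation of each stock with the market.
rho_market = betas * sigma_m / vol

print(np.round(cov, 4))
print(round(corr[0, 1], 4), round(rho_market[0] * rho_market[1], 4))  # 0.4472 0.4472
```

Only the betas, the residual variances and the market variance are needed to fill the whole matrix, which is the practical payoff of the model relative to estimating each covariance separately.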