Estimating the Beta of an Asset

Getting Ready

Packages

In this notebook we will be using the following packages.

import yfinance as yf
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.formula.api as smf

sns.set_theme()

There are three packages that we have seen before: yfinance, matplotlib, and seaborn. The new package in the code cell is statsmodels, a Python library that provides a wide range of statistical tools and models for data analysis. It is built on top of the scientific computing library NumPy and the data manipulation library pandas, and it is designed to work seamlessly with both. The formula.api submodule lets us specify regressions with formulas, as in R.

The Capital Asset Pricing Model

The Capital Asset Pricing Model (CAPM) predicts that the expected excess return of any asset is proportional to its exposure to the market portfolio: \[ E[R_i] = \beta_i E[R_m], \] where \(R_i = r_i - r_f\) is the excess return of asset \(i\), \(R_m = r_m - r_f\) is the excess return on the market portfolio, and \(\beta_i\) measures how sensitive asset \(i\) is to market-wide movements. An asset with \(\beta_i = 1\) moves one-for-one with the market on average; \(\beta_i > 1\) implies amplified sensitivity (more volatile than the market); \(\beta_i < 1\) implies a dampened response.

In the CAPM, \(\beta_i\) is the only source of cross-sectional variation in expected returns — two assets with the same beta should earn the same expected return regardless of any other characteristic. We test this restriction by allowing the regression to include an intercept \(\alpha_i\), known as Jensen’s alpha, which measures how much the asset has earned above or below the CAPM prediction on average. If the model holds exactly, \(\alpha_i\) should be zero.

The model is estimated by regressing excess asset returns on excess market returns: \[ R_i = \alpha_i + \beta_i R_m + e_i. \]

Estimating a Linear Regression

Suppose that you want to estimate the coefficients of the following model: \[ y = \alpha + \beta x + e, \] where \(y\) is your dependent variable and \(x\) is your independent variable. The relationship is not exact since there is an error term \(e\). This model is called a linear regression of \(y\) on \(x\).

To estimate the parameters of the model we can use Ordinary Least Squares (OLS). The goal is to find the line of best fit that describes the relationship between a dependent variable and one or more independent variables.

OLS works by minimizing the sum of the squared differences between the actual values of the dependent variable and the predicted values from the linear regression model. The predicted values are calculated using the estimated coefficients of the independent variables in the regression equation.
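With a single regressor, the minimization has a closed-form solution: the slope is \(\hat{\beta} = \text{Cov}(x, y) / \text{Var}(x)\) and the intercept is \(\hat{\alpha} = \bar{y} - \hat{\beta} \bar{x}\). The following is a minimal sketch of that calculation using NumPy. The data are made up for illustration, and the variable names are my own.

```python
import numpy as np

# Hypothetical data, made up for illustration only.
x = np.array([-0.04, -0.01, 0.00, 0.02, 0.03, 0.06])
y = np.array([-0.05, -0.02, 0.01, 0.02, 0.05, 0.08])

# Closed-form OLS estimates for a regression with one regressor.
beta_hat = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
alpha_hat = y.mean() - beta_hat * x.mean()

# Residuals from the fitted line and the sum of squares that OLS minimizes.
resid = y - (alpha_hat + beta_hat * x)
ssr = np.sum(resid ** 2)

# np.polyfit with degree 1 solves the same least-squares problem.
slope, intercept = np.polyfit(x, y, 1)
print(beta_hat, alpha_hat, ssr)
```

Any other choice of slope or intercept produces a larger sum of squared residuals than ssr, which is what "line of best fit" means here.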

Suppose that you have a dataframe df which contains a column named Y and another column named X. If you want to regress Y on X, you can do smf.ols('Y ~ X', data = df).fit():

  • smf is the alias under which we imported statsmodels.formula.api.
  • ols('Y ~ X', data = df) creates the model. The data is the dataframe df and the model is Y ~ X. The intercept is included automatically.
  • fit() then fits the model.

That is, \(y = \alpha + \beta x + e \Longleftrightarrow\) smf.ols('y ~ x', data = df).fit().
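To see the mapping in action, here is a small sketch that simulates data with known coefficients and recovers them with smf.ols. The dataframe, the seed, and the true values \(\alpha = 1\) and \(\beta = 2\) are assumptions chosen for illustration only.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data with known coefficients (alpha = 1, beta = 2); purely illustrative.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
df = pd.DataFrame({'X': x, 'Y': 1.0 + 2.0 * x + rng.normal(scale=0.1, size=200)})

res = smf.ols('Y ~ X', data=df).fit()

# params is a pandas Series indexed by 'Intercept' and the regressor name.
alpha_hat = res.params['Intercept']
beta_hat = res.params['X']
print(f'alpha = {alpha_hat:.3f}, beta = {beta_hat:.3f}')
```

The estimates should land close to the true values, with the gap shrinking as the noise gets smaller or the sample gets larger.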

Data

To estimate the CAPM we need two inputs: a proxy for the market portfolio and a proxy for the risk-free rate. For the market we will use SPY, the ticker of the SPDR S&P 500 exchange-traded fund (ETF). It is one of the most popular ETFs in the world and is designed to track the performance of the S&P 500 stock market index. The S&P 500 is a market-capitalization-weighted index of about 500 of the largest publicly traded companies in the United States.

SPY lets investors easily buy and sell shares that represent a diversified portfolio of the underlying stocks in the S&P 500. For us it is useful because its adjusted closing price captures the total return of investing in the portfolio of companies that make up the S&P 500.

We will compute the beta of Microsoft (MSFT), Apple (AAPL) and Tesla (TSLA) using data from Dec 01, 2019 until Jan 01, 2025. The CAPM is estimated using excess returns — returns above the risk-free rate. We use ^IRX, the 13-week Treasury Bill yield, as our proxy for \(r_f\).

tickers = ['SPY', 'MSFT', 'AAPL', 'TSLA', '^IRX']
start = '2019-12-01'
end = '2025-01-01'
df = yf.download(tickers, start=start, end=end, auto_adjust=False, progress=False).loc[:, 'Adj Close']
ret = df[['SPY', 'MSFT', 'AAPL', 'TSLA']].resample('ME').last().pct_change().dropna()
rf = df['^IRX'].resample('ME').last() / 1200
rf = rf.reindex(ret.index)
exc = ret.subtract(rf, axis=0)

Let us go through the code step by step.

  • ret is computed only for the four stock tickers. We exclude ^IRX here because it is already an interest rate, not a price, so taking pct_change() on it would be meaningless.
  • rf takes the last ^IRX observation of each month and divides by 1200. The division by 100 converts the annualized percentage (e.g. 5.25) to a decimal (0.0525), and the further division by 12 converts it from an annual rate to a monthly one.
  • rf = rf.reindex(ret.index) aligns rf to exactly the same dates as ret. After resampling, both series are indexed by month-end dates, but ret has lost its first month to pct_change().dropna(). reindex keeps only the dates in ret.index and would insert NaN for any date that is missing from rf.
  • exc = ret.subtract(rf, axis=0) subtracts the monthly risk-free rate from every asset’s return row by row. The result is a dataframe of excess returns — the raw returns minus what could have been earned risk-free.
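The steps above can be checked by hand on a tiny example. The sketch below uses two hypothetical tickers (AAA and BBB) and made-up month-end prices and yields, so that every number is easy to verify.

```python
import pandas as pd

# Hypothetical month-end prices and T-bill yields (annualized, in percent).
idx = pd.to_datetime(['2024-01-31', '2024-02-29', '2024-03-31'])
prices = pd.DataFrame({'AAA': [100.0, 110.0, 99.0],
                       'BBB': [50.0, 50.0, 55.0]}, index=idx)
irx = pd.Series([5.25, 5.25, 6.00], index=idx)

ret = prices.pct_change().dropna()      # monthly simple returns; the first month is lost
rf = (irx / 1200).reindex(ret.index)    # percent per year -> decimal per month
exc = ret.subtract(rf, axis=0)          # subtract rf from every column, row by row
print(exc.round(6))
```

In February, AAA returns 10% and the monthly risk-free rate is 5.25 / 1200 = 0.004375, so its excess return is 0.095625. Three months of prices yield only two months of returns, which is why the download starts one month before the first return we want.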

Industry practitioners usually estimate the CAPM beta using between 36 and 60 months of monthly returns. Our choice of dates gives us 60 monthly returns, from January 2020 through December 2024.

Estimating the Beta of MSFT

Plotting the Data

We start by generating a scatter plot of monthly excess returns: MSFT minus \(r_f\) against SPY minus \(r_f\).

sns.scatterplot(x=exc['SPY'], y=exc['MSFT'], alpha=0.5)
plt.title('MSFT vs. SPY Monthly Excess Returns')
plt.xlabel('Excess SPY Return')
plt.ylabel('Excess MSFT Return')
plt.axhline(y=0, color='gray', linewidth=1)
plt.axvline(x=0, color='gray', linewidth=1)
plt.show()

The graph shows that a large fraction of the monthly variation in the excess returns on MSFT is explained by SPY. The relationship is not perfect, but there is a clear upward tendency. The slope of that tendency is the beta of the stock.

Estimating the CAPM Regression

We can now estimate the beta as the slope coefficient of the regression: \[ r_{\text{MSFT}} - r_f = \alpha + \beta (r_{\text{SPY}} - r_f) + e. \]

To regress excess MSFT returns on excess SPY returns I use MSFT ~ SPY as my model and exc as my data, i.e. res_msft = smf.ols('MSFT ~ SPY', data=exc).fit(). This generates a set of results that I store in the variable res_msft. The method summary(slim=True) then presents the results in a compact table. Notice that display(res_msft.summary(slim=True)) uses a different font and the table looks odd in some web browsers, so use print(res_msft.summary(slim=True)) instead. The intercept \(\alpha\) is Jensen’s alpha: it measures how much the asset has earned, on average, above or below the return predicted by the CAPM. If the model holds exactly, \(\alpha\) should be zero.

res_msft = smf.ols('MSFT ~ SPY', data=exc).fit()
print(res_msft.summary(slim=True))
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                   MSFT   R-squared:                       0.525
Model:                            OLS   Adj. R-squared:                  0.517
No. Observations:                  60   F-statistic:                     64.15
Covariance Type:            nonrobust   Prob (F-statistic):           5.90e-11
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      0.0077      0.006      1.305      0.197      -0.004       0.020
SPY            0.8945      0.112      8.010      0.000       0.671       1.118
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
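The numbers in the table can also be pulled out of the results object directly, which is handy when you want to use them in later calculations. The sketch below is run on simulated excess returns rather than the downloaded data, so its output will not match the table; the dataframe sim and the seed are assumptions for illustration. It also tests the hypothesis that beta equals one. In the MSFT table above, the 95% confidence interval for the slope, [0.671, 1.118], contains one, so that hypothesis cannot be rejected at the 5% level.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated monthly excess returns standing in for exc; these are not real data.
rng = np.random.default_rng(1)
spy = rng.normal(0.01, 0.05, size=60)
sim = pd.DataFrame({'SPY': spy,
                    'MSFT': 0.005 + 0.9 * spy + rng.normal(0, 0.04, size=60)})

res = smf.ols('MSFT ~ SPY', data=sim).fit()

alpha_hat, beta_hat = res.params['Intercept'], res.params['SPY']
ci = res.conf_int(alpha=0.05)          # DataFrame with columns 0 (lower) and 1 (upper)
test_beta_one = res.t_test('SPY = 1')  # null hypothesis: beta equals one

print(f'alpha = {alpha_hat:.4f}, beta = {beta_hat:.4f}, R2 = {res.rsquared:.3f}')
print(ci.loc['SPY'].tolist())
print(test_beta_one)
```

The t-statistic reported by t_test is \((\hat{\beta} - 1)\) divided by the standard error of the slope, which is available as res.bse['SPY'].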

Plotting the Security Characteristic Line (SCL)

We can visualize the regression by overlaying the fitted line on the scatter plot. Seaborn’s regplot function does both in a single call: it plots the observations as a scatter plot and automatically draws the OLS regression line through them. The argument scatter_kws={'alpha': 0.5} controls the transparency of the dots, and ci=None suppresses the confidence band around the line.

sns.regplot(x=exc['SPY'], y=exc['MSFT'], scatter_kws={'alpha': 0.5}, ci=None)
plt.title('MSFT vs. SPY Monthly Excess Returns')
plt.xlabel('Excess SPY Return')
plt.ylabel('Excess MSFT Return')
plt.axhline(y=0, color='gray', linewidth=1)
plt.axvline(x=0, color='gray', linewidth=1)
plt.show()

Estimating the Beta of TSLA

Tesla (TSLA) is known for large price swings. To the extent that those swings reflect sensitivity to the market rather than firm-specific news, we expect its beta to be considerably higher than MSFT’s. We follow the same steps as before.

Plotting the Data

sns.scatterplot(x=exc['SPY'], y=exc['TSLA'], alpha=0.5)
plt.title('TSLA vs. SPY Monthly Excess Returns')
plt.xlabel('Excess SPY Return')
plt.ylabel('Excess TSLA Return')
plt.axhline(y=0, color='gray', linewidth=1)
plt.axvline(x=0, color='gray', linewidth=1)
plt.show()

The scatter is noticeably wider than for MSFT, reflecting Tesla’s higher idiosyncratic volatility. A positive co-movement with SPY is still visible, but the fit is looser.
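One way to make "idiosyncratic volatility" concrete is to split the variance of the stock's excess return into a market part, \(\beta^2 \text{Var}(R_m)\), and a residual part, \(\text{Var}(e)\). In an OLS regression with an intercept the two parts add up exactly to the total variance, and the market share equals the R-squared. The sketch below illustrates this on simulated returns for a hypothetical high-beta stock; none of the numbers are real data.

```python
import numpy as np

# Simulated excess returns for a hypothetical high-beta stock; not real data.
rng = np.random.default_rng(2)
r_m = rng.normal(0.01, 0.05, size=60)
r_i = 0.01 + 2.0 * r_m + rng.normal(0, 0.15, size=60)

# OLS slope, intercept, and residuals.
beta_hat = np.cov(r_m, r_i, ddof=1)[0, 1] / np.var(r_m, ddof=1)
alpha_hat = r_i.mean() - beta_hat * r_m.mean()
resid = r_i - alpha_hat - beta_hat * r_m

total_var = np.var(r_i, ddof=1)
systematic_var = beta_hat ** 2 * np.var(r_m, ddof=1)
idiosyncratic_var = np.var(resid, ddof=1)

# The share of variance explained by the market equals the regression R-squared.
r_squared = systematic_var / total_var
print(f'R2 = {r_squared:.3f}, idiosyncratic share = {idiosyncratic_var / total_var:.3f}')
```

A stock can therefore have a high beta and a low R-squared at the same time: the beta measures the slope, while the R-squared measures how much of the scatter the line accounts for.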

Estimating the CAPM Regression

res_tsla = smf.ols('TSLA ~ SPY', data=exc).fit()
print(res_tsla.summary(slim=True))
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                   TSLA   R-squared:                       0.301
Model:                            OLS   Adj. R-squared:                  0.289
No. Observations:                  60   F-statistic:                     24.94
Covariance Type:            nonrobust   Prob (F-statistic):           5.75e-06
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      0.0403      0.025      1.633      0.108      -0.009       0.090
SPY            2.3197      0.465      4.994      0.000       1.390       3.250
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

Plotting the Security Characteristic Line (SCL)

sns.regplot(x=exc['SPY'], y=exc['TSLA'], scatter_kws={'alpha': 0.5}, ci=None)
plt.title('TSLA vs. SPY Monthly Excess Returns')
plt.xlabel('Excess SPY Return')
plt.ylabel('Excess TSLA Return')
plt.axhline(y=0, color='gray', linewidth=1)
plt.axvline(x=0, color='gray', linewidth=1)
plt.show()

Estimating the Beta of AAPL

Apple (AAPL) is a large, mature technology company. Its beta is generally expected to be close to one — correlated with the market but less volatile than TSLA.

Plotting the Data

sns.scatterplot(x=exc['SPY'], y=exc['AAPL'], alpha=0.5)
plt.title('AAPL vs. SPY Monthly Excess Returns')
plt.xlabel('Excess SPY Return')
plt.ylabel('Excess AAPL Return')
plt.axhline(y=0, color='gray', linewidth=1)
plt.axvline(x=0, color='gray', linewidth=1)
plt.show()

The points cluster more tightly around the diagonal than for TSLA, suggesting that Apple’s returns track the market more closely with less idiosyncratic noise.

Estimating the CAPM Regression

res_aapl = smf.ols('AAPL ~ SPY', data=exc).fit()
print(res_aapl.summary(slim=True))
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                   AAPL   R-squared:                       0.573
Model:                            OLS   Adj. R-squared:                  0.565
No. Observations:                  60   F-statistic:                     77.69
Covariance Type:            nonrobust   Prob (F-statistic):           2.69e-12
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      0.0097      0.007      1.329      0.189      -0.005       0.024
SPY            1.2077      0.137      8.814      0.000       0.933       1.482
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

Plotting the Security Characteristic Line (SCL)

sns.regplot(x=exc['SPY'], y=exc['AAPL'], scatter_kws={'alpha': 0.5}, ci=None)
plt.title('AAPL vs. SPY Monthly Excess Returns')
plt.xlabel('Excess SPY Return')
plt.ylabel('Excess AAPL Return')
plt.axhline(y=0, color='gray', linewidth=1)
plt.axvline(x=0, color='gray', linewidth=1)
plt.show()
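The three regressions share the same structure, so the estimates can be collected in one table with a loop. The sketch below shows one way to do it. It uses simulated excess returns with made-up betas in place of exc, so the printed numbers are illustrative and will not reproduce the tables above.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated excess returns with made-up betas, standing in for the exc dataframe.
rng = np.random.default_rng(3)
spy = rng.normal(0.01, 0.05, size=60)
true_betas = {'MSFT': 0.9, 'AAPL': 1.2, 'TSLA': 2.3}
sim = pd.DataFrame({'SPY': spy})
for ticker, b in true_betas.items():
    sim[ticker] = b * spy + rng.normal(0, 0.02, size=60)

# Run one CAPM regression per ticker and keep the key estimates.
rows = {}
for ticker in true_betas:
    res = smf.ols(f'{ticker} ~ SPY', data=sim).fit()
    rows[ticker] = {'alpha': res.params['Intercept'],
                    'beta': res.params['SPY'],
                    'r_squared': res.rsquared}

summary = pd.DataFrame.from_dict(rows, orient='index')
print(summary.round(3))
```

Replacing sim with exc should give a side-by-side comparison of the alpha, beta, and R-squared of MSFT, AAPL, and TSLA.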

Practice Problems

Problem 1

Estimate the beta of AMD stock with respect to the SPY using monthly excess returns (i.e., returns minus the risk-free rate) from January 2018 until January 2023. Use ^IRX as the proxy for the risk-free rate.

Solution

In order to have monthly returns starting January 2018, I will download price data from December 2017.

df = yf.download(tickers=['AMD', 'SPY', '^IRX'], start='2017-12-01', end='2023-01-01', auto_adjust=False, progress=False).loc[:, 'Adj Close']
ret = df[['AMD', 'SPY']].resample('ME').last().pct_change().dropna()
rf = df['^IRX'].resample('ME').last() / 1200
rf = rf.reindex(ret.index)
exc = ret.subtract(rf, axis=0)
res_capm = smf.ols('AMD ~ SPY', data=exc).fit()
print(res_capm.summary(slim=True))
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                    AMD   R-squared:                       0.398
Model:                            OLS   Adj. R-squared:                  0.388
No. Observations:                  60   F-statistic:                     38.38
Covariance Type:            nonrobust   Prob (F-statistic):           6.49e-08
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      0.0289      0.018      1.623      0.110      -0.007       0.065
SPY            2.0409      0.329      6.195      0.000       1.381       2.700
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

Using the data, the beta of AMD is estimated to be 2.0409.