from getfactormodels import FamaFrenchFactors
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_theme()
Fama-French Factors
Theoretical Background
The Fama-French Three-Factor Model
Modern financial theory recognizes that the CAPM alone is insufficient to explain the cross-section of asset prices. For many years, the dominant asset pricing model has been the one proposed by Fama and French (1993), in which the return of a security \(i\) is determined not only by its exposure to the market but also by other systematic risk factors that affect the investment opportunity set: \[ R_{i} = a_{i} + b_{i} R_{m} + s_{i} \mathit{SMB} + h_{i} \mathit{HML} + e_{i} \]
where \(R_{i} = r_{i} - r_{f}\) represents the monthly excess returns for security \(i\) over the risk-free rate, \(R_{m} = r_{m} - r_{f}\) represents the monthly excess returns of the market, \(\mathit{SMB}\) denotes the size factor, \(\mathit{HML}\) denotes the value factor, and \(e_i\) is the idiosyncratic (residual) return not explained by the factors. The intercept \(a_i\) measures the average return left unexplained by the model — commonly referred to as Jensen’s alpha — and the coefficients \(b_i\), \(s_i\), and \(h_i\) are the factor loadings that capture the sensitivity of security \(i\) to each systematic risk factor.
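To make the regression concrete, the following is a minimal sketch of how the loadings \(b_i\), \(s_i\), and \(h_i\) could be estimated by ordinary least squares. The data are simulated: the factor means, volatilities, and "true" loadings are made-up values chosen only for illustration, not estimates from real returns.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 240  # number of simulated months (hypothetical)

# Simulated monthly factor returns; columns are RM (excess market), SMB, HML.
# Means and volatilities are made-up illustration values.
factors = rng.normal(loc=[0.006, 0.002, 0.003],
                     scale=[0.045, 0.030, 0.030],
                     size=(T, 3))

# Hypothetical "true" parameters used to generate the security's excess return
true_a = 0.001
true_loadings = np.array([1.1, -0.3, 0.4])  # b_i, s_i, h_i
excess_ret = true_a + factors @ true_loadings + rng.normal(0.0, 0.02, size=T)

# OLS: regress excess returns on a constant and the three factors
X = np.column_stack([np.ones(T), factors])
coef, *_ = np.linalg.lstsq(X, excess_ret, rcond=None)
a_hat, b_hat, s_hat, h_hat = coef
resid = excess_ret - X @ coef  # estimate of the idiosyncratic return e_i

print(f"a = {a_hat:.4f}, b = {b_hat:.3f}, s = {s_hat:.3f}, h = {h_hat:.3f}")
```

With enough observations the estimated loadings should land close to the values used to generate the data, and the residuals are orthogonal to the factors by construction.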
The size factor (SMB) captures the empirical regularity, documented by Fama and French (1992), that small-cap stocks have historically earned higher average returns than large-cap stocks, even after accounting for their greater exposure to market risk. This size premium is often attributed to the higher business risk, lower liquidity, and greater financial distress risk of small firms, which investors demand compensation for bearing.
The value factor (HML) captures the equally well-documented pattern, also in Fama and French (1992), that stocks with a high book-to-market ratio (value stocks) have historically outperformed stocks with a low book-to-market ratio (growth stocks), again after controlling for market exposure. This value premium is interpreted either as compensation for distress risk — value stocks tend to be financially fragile firms — or as a result of investor overreaction that systematically misprices these stocks.
By including size and value alongside the market factor, the Fama-French three-factor model explains a substantially larger share of the cross-sectional variation in average returns than the CAPM. It has become a foundational benchmark in academic research and is widely applied in industry for performance evaluation and portfolio construction.
The Fama-French factors are constructed using a 2×3 double sort. Stocks are first divided into two size groups — Small and Big — based on whether their market capitalization is below or above the NYSE median. Within each size group, stocks are further sorted into three book-to-market groups — Value, Neutral, and Growth — using the 30th and 70th NYSE percentiles as breakpoints. This produces six value-weighted portfolios: Small Value, Small Neutral, Small Growth, Big Value, Big Neutral, and Big Growth.
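The double sort described above can be sketched on a toy cross-section. Everything in this example is hypothetical: the eight stocks, their market capitalizations, book-to-market ratios, and returns are invented, and ties at a breakpoint are assigned to the lower group, which is an assumption of this sketch rather than a documented rule.

```python
import numpy as np
import pandas as pd

# Hypothetical cross-section: market cap ($ millions), book-to-market, monthly return
stocks = pd.DataFrame(
    {
        "mktcap": [150, 900, 3000, 12000, 25000, 60000, 300, 500],
        "bm": [1.40, 0.25, 0.80, 0.20, 1.10, 0.50, 0.60, 0.10],
        "ret": [0.03, 0.01, 0.01, 0.005, 0.02, 0.008, 0.02, -0.01],
        "nyse": [True, True, True, True, True, True, False, False],
    },
    index=list("ABCDEFGH"),
)

# Breakpoints come from NYSE stocks only, but are applied to every stock
nyse = stocks[stocks["nyse"]]
size_median = nyse["mktcap"].median()
bm_lo, bm_hi = nyse["bm"].quantile([0.3, 0.7])

size_grp = pd.cut(stocks["mktcap"], bins=[-np.inf, size_median, np.inf],
                  labels=["Small", "Big"])
bm_grp = pd.cut(stocks["bm"], bins=[-np.inf, bm_lo, bm_hi, np.inf],
                labels=["Growth", "Neutral", "Value"])
stocks["portfolio"] = size_grp.astype(str) + " " + bm_grp.astype(str)

# Value-weighted return of each of the six portfolios
weighted_sum = (stocks["ret"] * stocks["mktcap"]).groupby(stocks["portfolio"]).sum()
total_cap = stocks["mktcap"].groupby(stocks["portfolio"]).sum()
vw_returns = weighted_sum / total_cap

print(stocks[["mktcap", "bm", "portfolio"]])
print(vw_returns)
```

Note that the two non-NYSE stocks are still assigned to portfolios; they simply do not influence where the breakpoints fall.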
\(\mathit{SMB}\) (Small Minus Big) is the average return on the three small portfolios minus the average return on the three big portfolios, averaging across all book-to-market groups to isolate the size effect: \[ \begin{aligned} \mathit{SMB} = {} & \frac{1}{3} \left( \text{Small Value} + \text{Small Neutral} + \text{Small Growth} \right) \\ & - \frac{1}{3} \left( \text{Big Value} + \text{Big Neutral} + \text{Big Growth} \right). \end{aligned} \]
\(\mathit{HML}\) (High Minus Low) is the average return on the two value portfolios minus the average return on the two growth portfolios, averaging across size groups to isolate the value effect. Note that the Neutral portfolios are excluded from this calculation: \[ \mathit{HML} = \frac{1}{2} \left( \text{Small Value} + \text{Big Value} \right) - \frac{1}{2} \left( \text{Small Growth} + \text{Big Growth} \right). \]
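As a quick numerical check of the two formulas, the sketch below computes \(\mathit{SMB}\) and \(\mathit{HML}\) for a single month from six portfolio returns. The returns are invented for illustration, and the helper name `smb_hml` is hypothetical.

```python
def smb_hml(p):
    """Return (SMB, HML) from a dict of the six 2x3 portfolio returns."""
    small = (p["Small Value"] + p["Small Neutral"] + p["Small Growth"]) / 3
    big = (p["Big Value"] + p["Big Neutral"] + p["Big Growth"]) / 3
    value = (p["Small Value"] + p["Big Value"]) / 2      # Neutral portfolios excluded
    growth = (p["Small Growth"] + p["Big Growth"]) / 2
    return small - big, value - growth


# Hypothetical returns for one month
returns = {
    "Small Value": 0.020, "Small Neutral": 0.015, "Small Growth": 0.010,
    "Big Value": 0.012, "Big Neutral": 0.009, "Big Growth": 0.006,
}

smb, hml = smb_hml(returns)
print(f"SMB = {smb:.4f}, HML = {hml:.4f}")
```

In this made-up month the small portfolios average 1.5% and the big portfolios 0.9%, so \(\mathit{SMB}\) is 0.6%; the value portfolios average 1.6% and the growth portfolios 0.8%, so \(\mathit{HML}\) is 0.8%.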
The return on the market \(R_{m}\) is the value-weighted return of all CRSP firms incorporated in the US and listed on the NYSE, AMEX, or NASDAQ that have a CRSP share code of 10 or 11 at the beginning of month \(t\), good shares and price data at the beginning of \(t\), and good return data for \(t\), minus the one-month Treasury bill rate (from Ibbotson Associates).
The Fama-French Five-Factor Model
Despite its success, the three-factor model leaves important patterns in average returns unexplained. Novy-Marx (2013) shows that more profitable firms earn higher average returns even after controlling for size and value, while Titman et al. (2004) document that firms that invest aggressively tend to earn lower subsequent returns. Neither of these patterns is captured by the original three factors. To address these shortcomings, Fama and French (2015) introduce a five-factor model that augments the three-factor model with profitability and investment factors, \[ R_{i} = a_{i} + b_{i} R_{m} + s_{i} \mathit{SMB} + h_{i} \mathit{HML} + r_{i} \mathit{RMW} + c_{i} \mathit{CMA} + e_{i} \]
where \(\mathit{RMW}\) (Robust Minus Weak) is the return difference between diversified portfolios of stocks with high and low operating profitability, and \(\mathit{CMA}\) (Conservative Minus Aggressive) is the return difference between portfolios of firms with low and high investment rates — referred to by Fama and French as conservative and aggressive firms, respectively.
The factors \(\mathit{RMW}\) and \(\mathit{CMA}\) are constructed using the same 2×3 double-sort procedure as \(\mathit{HML}\), but the second sort variable changes: for \(\mathit{RMW}\), stocks are sorted on operating profitability (annual revenues minus cost of goods sold, interest expense, and selling, general, and administrative expenses, scaled by book equity); for \(\mathit{CMA}\), stocks are sorted on investment, measured as the annual growth in total assets.
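The two sorting variables are simple accounting ratios. The sketch below computes them for one hypothetical firm; the function names and all balance-sheet figures are made up for illustration, and the timing conventions used when matching accounting data to returns are not shown.

```python
def operating_profitability(revenue, cogs, interest_expense, sga, book_equity):
    """Revenues minus COGS, interest expense, and SG&A, scaled by book equity."""
    return (revenue - cogs - interest_expense - sga) / book_equity


def investment(total_assets_now, total_assets_prev):
    """Annual growth rate of total assets."""
    return total_assets_now / total_assets_prev - 1


# Hypothetical firm, figures in $ millions
op = operating_profitability(revenue=1000, cogs=600, interest_expense=20,
                             sga=180, book_equity=500)
inv = investment(total_assets_now=1100, total_assets_prev=1000)

print(f"Operating profitability = {op:.2f}, investment = {inv:.2f}")
```

A firm with high operating profitability would fall in the Robust group for \(\mathit{RMW}\), and a firm with low asset growth would fall in the Conservative group for \(\mathit{CMA}\).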
A key testable implication of the model is that if the five factors span all priced sources of systematic risk, then the intercept \(a_{i}\) should be statistically indistinguishable from zero for any stock or portfolio. A significantly non-zero \(a_i\), whether positive or negative, would indicate that the model fails to account for some dimension of expected returns, a finding that would motivate the search for additional risk factors.
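For a single asset, the test of \(a_i = 0\) amounts to a t-test on the regression intercept. The following is a rough sketch under classical (homoskedastic) OLS assumptions, using simulated data; the helper `ols_alpha_tstat` and every number in it are hypothetical. Testing many assets jointly requires a different procedure that is not shown here.

```python
import numpy as np


def ols_alpha_tstat(y, factors):
    """Return (alpha_hat, t-statistic) from an OLS regression of y on factors."""
    T, k = factors.shape
    X = np.column_stack([np.ones(T), factors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (T - k - 1)      # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)     # classical coefficient covariance
    return coef[0], coef[0] / np.sqrt(cov[0, 0])


rng = np.random.default_rng(42)
T = 300
factors = rng.normal(0.003, 0.03, size=(T, 5))          # five simulated factors
loadings = np.array([1.0, -0.3, -0.3, 0.1, -0.5])       # made-up loadings
noise = rng.normal(0.0, 0.02, size=T)

y_zero_alpha = factors @ loadings + noise               # model holds: a_i = 0
y_with_alpha = y_zero_alpha + 0.02                      # 2% monthly alpha added

a0, t0 = ols_alpha_tstat(y_zero_alpha, factors)
a1, t1 = ols_alpha_tstat(y_with_alpha, factors)
print(f"zero-alpha asset:  a = {a0:.4f}, t = {t0:.2f}")
print(f"asset with alpha:  a = {a1:.4f}, t = {t1:.2f}")
```

Adding a constant to the dependent variable shifts only the intercept, so the second asset's estimated alpha is exactly 0.02 higher than the first while its loadings are unchanged.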
Connection to the Mean-Variance Frontier
It is useful to connect multifactor models back to the mean-variance framework. In the CAPM, the market portfolio is the tangency portfolio — the single efficient portfolio whose covariance with any asset fully determines that asset’s expected return. One portfolio suffices because the pricing kernel is one-dimensional.
Multifactor models arise when the true pricing kernel is multidimensional. Merton (1973) provides the theoretical foundation through the Intertemporal CAPM: if the investment opportunity set changes over time, so that expected returns, volatilities, or correlations fluctuate, then rational investors will hedge against adverse shifts in these conditions by holding portfolios correlated with the relevant state variables. This hedging demand generates additional priced risk factors beyond the market. Fama and French (1996) show explicitly how the Fama-French factors can be interpreted as mimicking portfolios for these ICAPM state variables, providing the formal theoretical link between the empirical model and the mean-variance frontier.
In mean-variance terms, the factors are factor-mimicking portfolios — long-short portfolios designed to isolate one dimension of systematic risk at a time. Their key property is that, taken together, they should span the mean-variance frontier. Any portfolio on the frontier spanned by the factors prices all assets correctly, in the same way that the market portfolio alone prices all assets in the CAPM. This is precisely what the \(a_i = 0\) condition tests: if any asset has a non-zero alpha, the factors do not fully span the pricing kernel, signaling that there is at least one missing dimension of systematic risk. This is why the empirical asset pricing literature continues to propose new factors — each one is a candidate for a dimension of the true tangency portfolio that existing models have failed to capture.
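One way to see what spanning means in practice is to compute the maximum Sharpe ratio combination of the factors themselves. Because the factors are excess returns, these weights are proportional to \(\Sigma^{-1} \mu\), where \(\mu\) is the vector of mean premiums and \(\Sigma\) their covariance matrix. The sketch below uses three factors with made-up means, volatilities, and correlations; none of these numbers are estimates from the data.

```python
import numpy as np

# Hypothetical monthly premiums, volatilities, and correlations for three factors
mu = np.array([0.006, 0.002, 0.003])
vol = np.array([0.045, 0.030, 0.030])
corr = np.array([[1.0, 0.3, -0.2],
                 [0.3, 1.0, 0.0],
                 [-0.2, 0.0, 1.0]])
cov = np.outer(vol, vol) * corr

# Tangency weights are proportional to inv(cov) @ mu; rescale to sum to one
raw = np.linalg.solve(cov, mu)
weights = raw / raw.sum()


def sharpe(w):
    """Monthly Sharpe ratio of a factor combination with weights w."""
    return (w @ mu) / np.sqrt(w @ cov @ w)


individual = mu / vol
print("Tangency weights:", weights.round(3))
print("Individual Sharpe ratios:", individual.round(3))
print("Tangency Sharpe ratio:", round(float(sharpe(weights)), 3))
```

The combined portfolio attains a higher Sharpe ratio than any single factor. If a test asset had a non-zero alpha against these factors, adding it to the mix would raise the attainable Sharpe ratio further, which is another way of saying that the factors alone do not span the frontier.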
Python Packages
The Package getfactormodels
The library getfactormodels allows users to retrieve data for various multi-factor asset pricing models, including the Fama-French three-factor and five-factor models. The data is sourced directly from Kenneth French’s website.
Installation
You need to install getfactormodels before you use it for the first time. In a terminal or Anaconda command prompt type:
pip install getfactormodels
or in a Jupyter cell you can type
!pip install getfactormodels
Loading the Required Packages
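In this example we use getfactormodels to download the factors, and matplotlib and seaborn for plotting. The corresponding import statements, together with the call to sns.set_theme(), appear at the top of this page.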
Getting the Fama-French Factors
We will get the Fama-French factors starting in 1963. For this, we define the starting date as:
start_date = '1963-01-01'
We download the Fama-French five-factor model from Kenneth French’s website:
ff = FamaFrenchFactors(model='5', frequency='m', start_date=start_date).load().to_pandas()
Let’s inspect the downloaded data:
display(ff)
| date | Mkt-RF | SMB | HML | RMW | CMA | RF |
|---|---|---|---|---|---|---|
| 1963-07-31 | -0.0039 | -0.0048 | -0.0081 | 0.0064 | -0.0115 | 0.0027 |
| 1963-08-31 | 0.0508 | -0.0080 | 0.0170 | 0.0040 | -0.0038 | 0.0025 |
| 1963-09-30 | -0.0157 | -0.0043 | 0.0000 | -0.0078 | 0.0015 | 0.0027 |
| 1963-10-31 | 0.0254 | -0.0134 | -0.0004 | 0.0279 | -0.0225 | 0.0029 |
| 1963-11-30 | -0.0086 | -0.0085 | 0.0173 | -0.0043 | 0.0227 | 0.0027 |
| ... | ... | ... | ... | ... | ... | ... |
| 2025-09-30 | 0.0339 | -0.0218 | -0.0105 | -0.0206 | -0.0222 | 0.0033 |
| 2025-10-31 | 0.0196 | -0.0130 | -0.0310 | -0.0521 | -0.0403 | 0.0037 |
| 2025-11-30 | -0.0013 | 0.0147 | 0.0376 | 0.0142 | 0.0068 | 0.0030 |
| 2025-12-31 | -0.0036 | -0.0022 | 0.0242 | 0.0040 | 0.0037 | 0.0034 |
| 2026-01-31 | 0.0102 | 0.0326 | 0.0370 | 0.0183 | 0.0181 | 0.0030 |
751 rows × 6 columns
Since we are working directly with excess returns, we do not need the risk-free rate separately — it has already been subtracted from the market return. We therefore drop the RF column. We also rename Mkt-RF to RMRF for cleaner notation consistent with the model equation above.
ff = ff.rename(columns={'Mkt-RF': 'RMRF'}).drop(columns=['RF'])
display(ff)
| date | RMRF | SMB | HML | RMW | CMA |
|---|---|---|---|---|---|
| 1963-07-31 | -0.0039 | -0.0048 | -0.0081 | 0.0064 | -0.0115 |
| 1963-08-31 | 0.0508 | -0.0080 | 0.0170 | 0.0040 | -0.0038 |
| 1963-09-30 | -0.0157 | -0.0043 | 0.0000 | -0.0078 | 0.0015 |
| 1963-10-31 | 0.0254 | -0.0134 | -0.0004 | 0.0279 | -0.0225 |
| 1963-11-30 | -0.0086 | -0.0085 | 0.0173 | -0.0043 | 0.0227 |
| ... | ... | ... | ... | ... | ... |
| 2025-09-30 | 0.0339 | -0.0218 | -0.0105 | -0.0206 | -0.0222 |
| 2025-10-31 | 0.0196 | -0.0130 | -0.0310 | -0.0521 | -0.0403 |
| 2025-11-30 | -0.0013 | 0.0147 | 0.0376 | 0.0142 | 0.0068 |
| 2025-12-31 | -0.0036 | -0.0022 | 0.0242 | 0.0040 | 0.0037 |
| 2026-01-31 | 0.0102 | 0.0326 | 0.0370 | 0.0183 | 0.0181 |
751 rows × 5 columns
Note that we could have done everything in a single chained expression:
ff = (FamaFrenchFactors(model='5', frequency='m', start_date=start_date)
    .load()
    .to_pandas()
    .rename(columns={'Mkt-RF': 'RMRF'})
    .drop(columns='RF')
)
Analyzing the Fama-French Factors
Let’s now plot the monthly factors.
axes = ff.plot(figsize=(12,8), subplots=True)
for c in axes:
    c.axhline(y=0, lw=1)
plt.show()
Setting subplots=True places each factor in its own subplot, making it easier to compare their individual time-series behavior, and figsize=(12,8) makes the figure large enough for the five stacked panels to remain legible. The horizontal line at \(y = 0\) serves as a visual reference to identify periods when a factor earned a positive or negative premium.
A key property of any valid asset pricing factor is that it must represent a dimension of systematic risk — one that cannot be eliminated through diversification. If a factor premium could be arbitraged away by holding a broad portfolio, rational investors would do so and the premium would disappear. In the following plot, we smooth out short-run noise by computing a 7-year (84-month) rolling average for each factor, which makes it easier to assess whether the factor premiums have been persistent over time or have weakened in recent decades.
axes = ff.rolling(84).mean().plot(figsize=(12,8), subplots=True, sharey=True)
for c in axes:
    c.axhline(y=0, lw=1)
plt.show()
The shared \(y\)-axis (sharey=True) allows for a direct visual comparison of the magnitude of the premiums across all five factors. The market premium \(\mathit{RMRF}\) is positive for most of the sample, reflecting the long-run equity risk premium, though it dips sharply around major downturns such as the dot-com crash and the 2008 financial crisis.
The remaining factors tell a more mixed story. The size premium \(\mathit{SMB}\) shows a pronounced decline after the early 1980s, raising questions about whether the small-cap premium has diminished as capital markets became more efficient and information about small firms more accessible. The value premium \(\mathit{HML}\) was strong through most of the 20th century but has been notably weak since the mid-2000s, a period during which growth stocks — particularly in the technology sector — delivered unusually high returns. By contrast, the profitability factor \(\mathit{RMW}\) has been relatively stable and consistently positive, suggesting that the tendency of profitable firms to outperform has been more durable. The investment factor \(\mathit{CMA}\) has also remained above zero for most of the sample, consistent with the view that conservative firms systematically earn higher returns than aggressive firms.
Overall, the graph illustrates that factor premiums are not static: they vary substantially over time, and their persistence — or lack thereof — is a central question in the empirical asset pricing literature.
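To see what the 84-month rolling average does mechanically, here is a small sketch on a single simulated factor. The series is random noise around a made-up mean premium, so the output says nothing about the actual factors; it only shows how the window behaves.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
idx = pd.period_range("1990-01", periods=120, freq="M")

# Simulated monthly factor returns with a hypothetical 0.4% mean premium
factor = pd.Series(0.004 + rng.normal(0.0, 0.03, size=120), index=idx, name="FACTOR")

window = 84
smoothed = factor.rolling(window).mean()

# The first window - 1 observations have no full window behind them
print("Missing values at the start:", int(smoothed.isna().sum()))
print("Std of raw series:     ", round(float(factor.std()), 4))
print("Std of smoothed series:", round(float(smoothed.std()), 4))
```

The first 83 values of the smoothed series are missing because a full 84-month window is not yet available, which is why rolling plots start seven years after the raw data. The smoothed series is also far less volatile than the raw one, which is what makes slow-moving changes in the premium visible.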
Estimating a Factor Model
The real payoff from having these factors is using them to explain the returns of individual assets. Let’s estimate the five-factor model for Microsoft (MSFT) — the same stock we analyzed under the CAPM in Module 2. We use yfinance to compute monthly excess returns and then merge with the Fama-French factors.
import pandas as pd
import yfinance as yf
import statsmodels.formula.api as smf
start_date_msft = '1999-12-01'
end_date = '2026-01-01'
stock = (yf
    .download('MSFT', start=start_date_msft, end=end_date, auto_adjust=True, progress=False, multi_level_index=False)
    .loc[:, ['Close']]
    .resample('ME').last()
    .pct_change()
    .rename(columns={'Close': 'RET'})
)
stock.index = stock.index.to_period('M')
# Unlike the factor visualization above, we retain RF here: it is needed to compute MSFT's excess return.
ff5 = (FamaFrenchFactors(model='5', frequency='m', start_date=start_date_msft, end_date=end_date)
    .load()
    .to_pandas()
    .rename(columns={'Mkt-RF': 'RMRF'})
)
ff5.index = pd.to_datetime(ff5.index).to_period('M')
merged = (pd.merge(stock, ff5, left_index=True, right_index=True)
    .assign(RETRF=lambda d: d['RET'] - d['RF'])
    .drop(columns=['RET', 'RF'])
    .dropna()
)
We now regress MSFT excess returns on all five factors.
res = smf.ols('RETRF ~ RMRF + SMB + HML + RMW + CMA', data=merged).fit()
print(res.summary(slim=True))
                            OLS Regression Results
==============================================================================
Dep. Variable:                  RETRF   R-squared:                       0.458
Model:                            OLS   Adj. R-squared:                  0.450
No. Observations:                 312   F-statistic:                     51.82
Covariance Type:            nonrobust   Prob (F-statistic):           7.88e-39
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      0.0055      0.003      1.567      0.118      -0.001       0.012
RMRF           1.0760      0.083     13.019      0.000       0.913       1.239
SMB           -0.3329      0.128     -2.596      0.010      -0.585      -0.081
HML           -0.3341      0.135     -2.466      0.014      -0.601      -0.067
RMW            0.0591      0.152      0.389      0.698      -0.240       0.358
CMA           -0.5269      0.201     -2.618      0.009      -0.923      -0.131
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
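A few things to note. The market beta of 1.08 is close to 1, which makes sense for a large-cap stock. The negative SMB loading (-0.33) is also what we would expect from a large firm, and the negative HML loading (-0.33) indicates a growth tilt: MSFT trades at a high price relative to book value, consistent with the market pricing in strong future earnings. The negative CMA loading (-0.53) means that MSFT's returns co-move with those of aggressively investing firms, while the RMW loading is statistically indistinguishable from zero. The alpha (\(a_i\)) tests whether the five factors fully explain MSFT’s average return: a value near zero and a large p-value suggest the model does a good job. Here the intercept is 0.0055, or about 0.55% per month, but with a p-value of 0.118 we cannot reject the hypothesis that it is zero at conventional significance levels. A statistically significant positive alpha would instead indicate that the stock has outperformed even after accounting for all five risk exposures.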
This is exactly the approach we will apply to mutual fund returns in the next notebook, where the goal is to determine whether a fund manager earns genuine alpha or is simply being compensated for factor exposures a passive strategy could replicate.
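One detail of the data preparation above deserves a closer look: both indexes are converted to monthly periods before merging. The sketch below illustrates why, using two tiny made-up tables in which the stock is labelled by its last trading day and the factors by the calendar month-end. All numbers are invented for illustration.

```python
import pandas as pd

# Hypothetical stock returns labelled by last trading day of the month
stock = pd.DataFrame(
    {"RET": [0.020, -0.010, 0.030]},
    index=pd.to_datetime(["2020-01-31", "2020-02-28", "2020-03-31"]),
)

# Hypothetical factor data labelled by calendar month-end
factors = pd.DataFrame(
    {"RMRF": [0.001, -0.080, -0.130], "RF": [0.001, 0.001, 0.001]},
    index=pd.to_datetime(["2020-01-31", "2020-02-29", "2020-03-31"]),
)

# Merging on raw dates silently drops February because the day labels differ
raw = pd.merge(stock, factors, left_index=True, right_index=True)

# Converting both indexes to monthly periods aligns the observations by month
stock_m, factors_m = stock.copy(), factors.copy()
stock_m.index = stock_m.index.to_period("M")
factors_m.index = factors_m.index.to_period("M")

merged = (pd.merge(stock_m, factors_m, left_index=True, right_index=True)
    .assign(RETRF=lambda d: d["RET"] - d["RF"])
    .drop(columns=["RET", "RF"])
)

print("Rows when merging on dates:  ", len(raw))
print("Rows when merging on periods:", len(merged))
print(merged)
```

An inner merge does not raise an error when labels fail to match; it simply returns fewer rows. Checking the number of observations after a merge is a cheap way to catch this kind of problem.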
Practice Problems
Problem 1 Compute the correlation matrix of the five Fama-French factors using monthly data from December 1999 until January 2023. Which pairs of factors exhibit the most positive and most negative correlations?
Solution
ff = (FamaFrenchFactors(model='5', frequency='m', start_date='1999-12-01', end_date='2023-01-01')
.load()
.to_pandas()
.rename(columns={'Mkt-RF': 'RMRF'})
.drop(columns='RF')
)
ff.corr().round(2)
| | RMRF | SMB | HML | RMW | CMA |
|---|---|---|---|---|---|
| RMRF | 1.00 | 0.27 | -0.03 | -0.34 | -0.24 |
| SMB | 0.27 | 1.00 | 0.02 | -0.50 | 0.00 |
| HML | -0.03 | 0.02 | 1.00 | 0.38 | 0.63 |
| RMW | -0.34 | -0.50 | 0.38 | 1.00 | 0.26 |
| CMA | -0.24 | 0.00 | 0.63 | 0.26 | 1.00 |
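Reading the table, the most positive correlation is between HML and CMA (0.63) and the most negative is between SMB and RMW (-0.50). With more factors it is easier to extract these pairs programmatically. The sketch below is one possible way to do so; it re-enters the rounded correlations from the table above so that it runs without downloading any data, and in practice you would use `ff.corr()` directly.

```python
import numpy as np
import pandas as pd

names = ["RMRF", "SMB", "HML", "RMW", "CMA"]

# Rounded correlations copied from the table above
corr = pd.DataFrame(
    [[1.00, 0.27, -0.03, -0.34, -0.24],
     [0.27, 1.00, 0.02, -0.50, 0.00],
     [-0.03, 0.02, 1.00, 0.38, 0.63],
     [-0.34, -0.50, 0.38, 1.00, 0.26],
     [-0.24, 0.00, 0.63, 0.26, 1.00]],
    index=names, columns=names,
)

# Keep only the upper triangle so each pair appears once and the diagonal is dropped
upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(upper).stack()

most_positive = pairs.idxmax()
most_negative = pairs.idxmin()
print("Most positive:", most_positive, pairs[most_positive])
print("Most negative:", most_negative, pairs[most_negative])
```

Masking the lower triangle and the diagonal matters: without it, the largest entry would always be the trivial correlation of 1.00 between a factor and itself.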