Predicting Short-Rate Changes

This notebook studies the dynamics of the short-term interest rate using monthly data from July 1926. We ask two questions: does the level of the risk-free rate predict its own future changes, and does the slope of the yield curve add predictive power beyond simple mean-reversion?

The first question is motivated by Vasicek (1977), one of the earliest equilibrium models of the term structure. The Vasicek model implies that the short rate follows a mean-reverting process, \[ \Delta r^f_{t + 1} = \alpha + \beta r^f_t + e_{t + 1}, \] where \(\Delta r^f_{t + 1} = r^f_{t + 1} - r^f_t\). Mean-reversion requires \(\beta < 0\): when the rate is above its long-run equilibrium \(r^* = -\alpha/\beta\), the expected change is negative, pulling it back down. We estimate this model on the full sample using Newey-West standard errors to correct for heteroskedasticity and autocorrelation.

Vasicek, Oldrich. 1977. “An Equilibrium Characterization of the Term Structure.” Journal of Financial Economics 5 (2): 177–88.
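To make the mean-reverting regression concrete, here is a minimal sketch that simulates the discrete-time equation above and recovers \(\beta\) by OLS. The parameter values (`alpha`, `beta`, `sigma`) and the sample length are hypothetical, chosen only for illustration, and the simulated series is not the data used in this notebook.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters, chosen only for illustration (not estimates from the data)
alpha, beta, sigma = 0.08, -0.025, 0.5
r_star = -alpha / beta            # long-run equilibrium implied by the model
n_months = 5000

# Simulate r_{t+1} = r_t + alpha + beta * r_t + e_{t+1}
r = np.empty(n_months)
r[0] = r_star
shocks = rng.normal(0.0, sigma, size=n_months)
for t in range(n_months - 1):
    r[t + 1] = r[t] + alpha + beta * r[t] + shocks[t + 1]

# Regress the change on the lagged level by OLS
delta_r = np.diff(r)
X = np.column_stack([np.ones(n_months - 1), r[:-1]])
alpha_hat, beta_hat = np.linalg.lstsq(X, delta_r, rcond=None)[0]

print(f"true beta = {beta:.4f}, estimated beta = {beta_hat:.4f}")
print(f"true r*   = {r_star:.2f}, implied r*     = {-alpha_hat / beta_hat:.2f}")
```

With a long simulated sample, the estimated slope should come out negative and close to the true value, and the implied \(r^*\) should sit near \(-\alpha/\beta\).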

The second question is motivated by the expectations hypothesis of the term structure. If long rates are averages of expected future short rates, then the spread \(s_t = r^{10y}_t - r^f_t\) encodes the market’s expectation of where short rates are heading: a steep upward-sloping curve signals anticipated increases, an inverted curve signals anticipated cuts. We use the 10-year Treasury yield from CBOE (available from 1962) and augment the Vasicek regression with the lagged spread and the lagged change in the short rate to assess whether these predictors improve on simple mean-reversion.

Getting the Data

from getfactormodels import FamaFrenchFactors
import yfinance as yf
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.formula.api as smf

sns.set_theme()

The short-term risk-free rate comes from the Fama-French data library via getfactormodels. The monthly RF series is reported as a decimal monthly return, so we multiply by 12 and by 100 to express it as an annualized percentage comparable to the 10-year yield. The 10-year Treasury yield (^TNX) is downloaded from Yahoo Finance at daily frequency and sampled at month-end. The two series are joined on a common month-end index, and the term spread is defined as SPREAD \(= \text{TNX} - \text{RF}\).

start_date = '1926-01-01'

ff3 = (FamaFrenchFactors(model='3', frequency='m', start_date=start_date)
       .load()
       .to_pandas()
       )

rf = 12 * ff3[['RF']] * 100
rf.index = pd.to_datetime(rf.index.astype(str)) + pd.offsets.MonthEnd(0)

tnx = (yf
       .download('^TNX', start=start_date, auto_adjust=False,
                 progress=False, multi_level_index=False)
       ['Close']
       .resample('ME').last()
       .rename('TNX')
       )
tnx.index = tnx.index + pd.offsets.MonthEnd(0)

rf = rf.join(tnx, how='left')
rf['SPREAD'] = rf['TNX'] - rf['RF']

First Look at the Data

We first plot the level of the 1-month Treasury rate from July 1926.

plt.plot(rf.index, rf['RF'])
plt.title('1-Month Treasury Rate')
plt.ylabel('Rate (%)')
plt.show()

The previous plot shows that the risk-free rate is a persistent time-series, i.e. its level changes slowly through time. In econometrics we say that the level of the risk-free rate is highly autocorrelated. One way to see this is to plot \(r_{t+1}^f\) vs. \(r_{t}^f\).

sns.scatterplot(x=rf['RF'].shift(1), y=rf['RF'], alpha=0.5)
plt.title('Scatter Plot of $r_{t+1}^f$ vs. $r_{t}^f$')
plt.xlabel('$r_{t}^f$')
plt.ylabel('$r_{t+1}^f$')
plt.axhline(y=0, color='gray', linewidth=1)
plt.axvline(x=0, color='gray', linewidth=1)
plt.show()

This persistence creates a well-known problem called spurious regression. If two unrelated but persistent variables are regressed on each other, OLS will tend to find a highly significant relationship even though none exists in reality. The reason is mechanical: because each series moves slowly and stays close to its own past values, any two persistent series will appear correlated over a long sample simply by chance. Standard t-statistics are severely inflated, and R² can be misleadingly high. The solution is to work with the change in the variable rather than its level. First-differencing removes the slow drift that drives spurious correlation, leaving a stationary series that is amenable to standard regression inference.
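The spurious-regression problem can be illustrated with a small Monte Carlo experiment. The sketch below generates pairs of independent random walks, which are unrelated by construction, and counts how often a classical OLS t-test on the slope rejects at the 5% level, first in levels and then in first differences. The number of simulations and the sample length are arbitrary choices made for illustration.

```python
import numpy as np

def slope_t_stat(y, x):
    """Classical OLS t-statistic for the slope in y = a + b*x + e."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (len(y) - 2)
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return coef[1] / np.sqrt(cov[1, 1])

rng = np.random.default_rng(42)
n_sims, n_obs = 500, 500
reject_levels = reject_diffs = 0

for _ in range(n_sims):
    # Two independent random walks: unrelated by construction
    x = np.cumsum(rng.normal(size=n_obs))
    y = np.cumsum(rng.normal(size=n_obs))
    reject_levels += abs(slope_t_stat(y, x)) > 1.96
    reject_diffs += abs(slope_t_stat(np.diff(y), np.diff(x))) > 1.96

rate_levels = reject_levels / n_sims
rate_diffs = reject_diffs / n_sims
print(f"Share of |t| > 1.96 in levels:      {rate_levels:.2f}")
print(f"Share of |t| > 1.96 in differences: {rate_diffs:.2f}")
```

In levels the test should reject far more often than the nominal 5%, while in differences the rejection rate should be close to 5%.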

rf['DELTA_RF'] = rf['RF'] - rf['RF'].shift(1)
sns.scatterplot(x=rf['RF'].shift(1), y=rf['DELTA_RF'], alpha=0.5)
plt.title('Scatter Plot of $\\Delta r_{t+1}^f$ vs. $r_{t}^f$')
plt.xlabel('$r_{t}^f$')
plt.ylabel('$\\Delta r_{t+1}^f$')
plt.axhline(y=0, color='gray', linewidth=1)
plt.axvline(x=0, color='gray', linewidth=1)
plt.show()

The scatter plot looks far less structured than the previous one. Whereas the level plot traced an almost perfect 45-degree line, a telltale sign of strong persistence, the change plot is a diffuse cloud with no immediately obvious pattern. This is the right starting point for regression: any predictability we find in the changes will reflect genuine dynamics rather than the mechanical correlation induced by persistence.

We can verify this formally using the augmented Dickey-Fuller (ADF) test. The test asks whether the series has a unit root — a stochastic trend that makes the series wander without bound and never settle around a fixed mean. The null hypothesis is that a unit root is present (non-stationarity); the alternative is that the series is stationary (mean-reverting). A large negative ADF statistic and a small p-value lead us to reject the null and conclude stationarity. We expect to fail to reject for the level \(r^f_t\) — consistent with the slow drift visible in the time-series plot — and to reject for the first difference \(\Delta r^f_t\), confirming that first-differencing is the right transformation before running regressions.

from statsmodels.tsa.stattools import adfuller

for label, series in [('RF level', rf['RF'].dropna()),
                      ('ΔRF change', rf['DELTA_RF'].dropna())]:
    stat, p, *_ = adfuller(series, autolag='AIC')
    print(f"{label}: ADF stat = {stat:.3f}, p-value = {p:.3f}")

RF level: ADF stat = -2.101, p-value = 0.244
ΔRF change: ADF stat = -8.181, p-value = 0.000
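To see what the test statistic measures, the sketch below computes a simplified Dickey-Fuller statistic by hand on simulated data: the t-statistic on \(\beta\) in \(\Delta y_t = \alpha + \beta y_{t-1} + e_t\). This is not what `adfuller` does internally, since it omits the lagged differences that `adfuller` adds with `autolag='AIC'`. The critical value of \(-2.86\) is the approximate large-sample 5% value for a regression with a constant and no trend. The AR coefficient of 0.9 and the sample sizes are assumptions made for illustration.

```python
import numpy as np

def df_t_stat(y):
    """t-statistic on beta in the regression dy_t = alpha + beta * y_{t-1} + e_t."""
    dy, y_lag = np.diff(y), y[:-1]
    X = np.column_stack([np.ones_like(y_lag), y_lag])
    coef, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ coef
    sigma2 = resid @ resid / (len(dy) - 2)
    se_beta = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return coef[1] / se_beta

def simulate_ar1(phi, n_obs, rng):
    """Simulate y_t = phi * y_{t-1} + e_t with standard normal shocks."""
    y = np.zeros(n_obs)
    shocks = rng.normal(size=n_obs)
    for t in range(1, n_obs):
        y[t] = phi * y[t - 1] + shocks[t]
    return y

rng = np.random.default_rng(1)
n_sims, n_obs = 400, 300
DF_CRIT_5PCT = -2.86   # approximate large-sample 5% critical value (constant, no trend)

t_unit_root = np.array([df_t_stat(simulate_ar1(1.0, n_obs, rng)) for _ in range(n_sims)])
t_stationary = np.array([df_t_stat(simulate_ar1(0.9, n_obs, rng)) for _ in range(n_sims)])

print(f"Unit root:  share below {DF_CRIT_5PCT}: {(t_unit_root < DF_CRIT_5PCT).mean():.2f}")
print(f"Unit root:  share below -1.645: {(t_unit_root < -1.645).mean():.2f}")
print(f"Stationary: share below {DF_CRIT_5PCT}: {(t_stationary < DF_CRIT_5PCT).mean():.2f}")
```

Under a unit root the statistic does not follow the usual normal distribution: it falls below the normal cutoff of \(-1.645\) far more often than 5% of the time, which is why the test relies on its own, more negative critical values.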

A Simple Mean-Reversion Model

Although no pattern is visible to the naked eye, the Vasicek model predicts a subtle negative slope: when the rate is high, changes should be slightly negative on average, and vice versa. Detecting this requires regression. We estimate \[ \Delta r^f_{t + 1} = \alpha + \beta r^f_t + e_{t + 1}. \]

For this, we regress the change in the rate on the lagged level of the risk-free rate. We account for heteroskedasticity and autocorrelation in the residuals using the Newey-West correction (Newey and West 1987). The number of lags is set to the integer part of \(T^{1/4}\), where \(T\) is the number of monthly observations; with roughly 1,200 months this gives 5 lags.

Newey, Whitney K., and Kenneth D. West. 1987. “A Simple, Positive Semi-Definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix.” Econometrica 55 (3): 703–8.
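For readers who want to see what the correction computes, here is a minimal NumPy sketch of a Newey-West covariance matrix with Bartlett weights and no small-sample correction. The helper `newey_west_cov` is a hypothetical name introduced for illustration, and the regression data are simulated with autocorrelated errors. It is meant to mirror the idea behind `cov_type='HAC'`, not to reproduce the statsmodels implementation line by line.

```python
import numpy as np

def newey_west_cov(X, resid, max_lags):
    """HAC covariance of OLS coefficients with Bartlett weights, no small-sample correction."""
    scores = X * resid[:, None]                 # row t is x_t * e_t
    S = scores.T @ scores                       # lag-0 term (White's estimator)
    for lag in range(1, max_lags + 1):
        weight = 1.0 - lag / (max_lags + 1.0)   # Bartlett kernel weight
        gamma = scores[lag:].T @ scores[:-lag]  # sum over t of s_t s_{t-lag}'
        S += weight * (gamma + gamma.T)
    bread = np.linalg.inv(X.T @ X)
    return bread @ S @ bread

# Simulated regression with autocorrelated regressor and errors (illustrative data only)
rng = np.random.default_rng(7)
n_obs = 1200
x = np.zeros(n_obs)
e = np.zeros(n_obs)
for t in range(1, n_obs):
    x[t] = 0.8 * x[t - 1] + rng.normal()
    e[t] = 0.8 * e[t - 1] + rng.normal()
y = 1.0 + 0.5 * x + e

X = np.column_stack([np.ones(n_obs), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ coef

max_lags = int(n_obs ** 0.25)                   # integer part of T^(1/4), as in the text
se_white = np.sqrt(np.diag(newey_west_cov(X, resid, 0)))
se_hac = np.sqrt(np.diag(newey_west_cov(X, resid, max_lags)))
print(f"lags = {max_lags}")
print(f"slope SE, lag 0 (heteroskedasticity only): {se_white[1]:.4f}")
print(f"slope SE, Newey-West:                      {se_hac[1]:.4f}")
```

With positively autocorrelated errors and regressors, the Newey-West standard error should be noticeably larger than the lag-0 version, which is the understatement the correction is designed to remove.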

nw_lags = int(len(rf)**0.25)

results = (smf
           .ols('DELTA_RF ~ RF.shift(1)', data=rf)
           .fit(cov_type='HAC', cov_kwds={'maxlags': nw_lags})
           )

print(results.summary(slim=True))

                            OLS Regression Results                            
==============================================================================
Dep. Variable:               DELTA_RF   R-squared:                       0.012
Model:                            OLS   Adj. R-squared:                  0.011
No. Observations:                1193   F-statistic:                     7.045
Covariance Type:                  HAC   Prob (F-statistic):            0.00805
===============================================================================
                  coef    std err          z      P>|z|      [0.025      0.975]
-------------------------------------------------------------------------------
Intercept       0.0784      0.023      3.415      0.001       0.033       0.123
RF.shift(1)    -0.0238      0.009     -2.654      0.008      -0.041      -0.006
===============================================================================

Notes:
[1] Standard Errors are heteroscedasticity and autocorrelation robust (HAC) using 5 lags and without small sample correction

The results show that both estimates for \(\alpha\) and \(\beta\) have p-values less than 1%, so we conclude that both \(\alpha\) and \(\beta\) are significant at the 1% level. The fact that \(\beta < 0\) implies that the risk-free rate is mean-reverting. To see this, note that the long-run equilibrium rate is \(r^* = -\alpha/\beta > 0\). When \(r^f_t > r^*\), the expected change \(\alpha + \beta r^f_t = \beta(r^f_t - r^*)\) is negative since \(\beta < 0\), pulling the rate back toward \(r^*\). The opposite holds when \(r^f_t < r^*\).

We can compute the implied equilibrium rate directly from the estimated coefficients:

alpha_hat = results.params['Intercept']
beta_hat  = results.params['RF.shift(1)']
r_star    = -alpha_hat / beta_hat
print(f"Implied long-run equilibrium: r* = {r_star:.2f}%")
print(f"Sample mean of RF:             {rf['RF'].mean():.2f}%")

Implied long-run equilibrium: r* = 3.29%
Sample mean of RF:             3.24%
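One way to read the size of \(\hat\beta\) is through the half-life of a deviation from \(r^*\). Under the fitted model, \(E[r^f_{t+h} - r^*] = (1 + \beta)^h (r^f_t - r^*)\), so the half-life \(h\) solves \((1 + \beta)^h = 0.5\). The sketch below plugs in the slope copied from the full-sample regression table. The half-life is not part of the original analysis and is shown only as an illustration of what the estimate implies.

```python
import math

beta_hat = -0.0238                     # slope copied from the full-sample regression output
phi = 1.0 + beta_hat                   # implied AR(1) coefficient of the level
half_life = math.log(0.5) / math.log(phi)

print(f"AR(1) coefficient: {phi:.4f}")
print(f"Half-life of a deviation from r*: {half_life:.1f} months")
```

The implied half-life is a little under two and a half years, which is consistent with the slow-moving level seen in the time-series plot.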

We can now add the line with the predicted values from the regression.

sns.regplot(x=rf['RF'].shift(1), y=rf['DELTA_RF'], scatter_kws={'alpha': 0.5})
plt.title('Scatter Plot of $\\Delta r_{t+1}^f$ vs. $r_{t}^f$')
plt.xlabel('$r_{t}^f$')
plt.ylabel('$\\Delta r_{t+1}^f$')
plt.axhline(y=0, color='gray', linewidth=1)
plt.axvline(x=0, color='gray', linewidth=1)
plt.show()

As expected, the slope coefficient of the line is negative.

Adding More Predictors

The simple model uses only the lagged level \(r^f_t\) to predict future rate changes. Two natural extensions suggest themselves. First, rate changes may be autocorrelated: if the Fed is in the middle of a tightening or easing cycle, last month’s move is likely to continue. Second, the slope of the yield curve — the difference between the 10-year yield and the short rate — encodes the market’s expectation of where short rates are heading. A steep curve signals anticipated rate increases; an inverted curve signals anticipated cuts. Both effects could improve predictive power beyond simple mean-reversion.

The term spread is constructed using ^TNX, the CBOE index that tracks the yield on the 10-year US Treasury note, downloaded via yfinance. Since ^TNX is only available from January 1962, the extended model uses a restricted sample.

We first plot the short rate, the 10-year yield, and the term spread over the restricted sample to give context for the quantity that will turn out to be the dominant predictor. We then re-estimate the simple model on this subsample, before adding the new regressors, to confirm that mean-reversion holds there too and is not driven by the pre-1962 period.

fig, axes = plt.subplots(2, 1, figsize=(10, 6), sharex=True)
rf_plot = rf.dropna(subset=['SPREAD'])
axes[0].plot(rf_plot.index, rf_plot['RF'],  label='Short rate (RF)')
axes[0].plot(rf_plot.index, rf_plot['TNX'], label='10-year yield (TNX)')
axes[0].set_ylabel('Rate (%)')
axes[0].legend()
axes[1].plot(rf_plot.index, rf_plot['SPREAD'], color='steelblue')
axes[1].axhline(0, color='gray', linewidth=1)
axes[1].set_ylabel('Spread (pp)')
plt.suptitle('Short Rate, 10-Year Yield, and Term Spread (1962–present)')
plt.tight_layout()
plt.show()

rf2 = rf.dropna(subset=['SPREAD'])
nw_lags_r = int(len(rf2)**0.25)

results_r = (smf
             .ols('DELTA_RF ~ RF.shift(1)', data=rf2)
             .fit(cov_type='HAC', cov_kwds={'maxlags': nw_lags_r})
             )

print(results_r.summary(slim=True))

                            OLS Regression Results                            
==============================================================================
Dep. Variable:               DELTA_RF   R-squared:                       0.013
Model:                            OLS   Adj. R-squared:                  0.012
No. Observations:                 767   F-statistic:                     5.462
Covariance Type:                  HAC   Prob (F-statistic):             0.0197
===============================================================================
                  coef    std err          z      P>|z|      [0.025      0.975]
-------------------------------------------------------------------------------
Intercept       0.1140      0.039      2.895      0.004       0.037       0.191
RF.shift(1)    -0.0260      0.011     -2.337      0.019      -0.048      -0.004
===============================================================================

Notes:
[1] Standard Errors are heteroscedasticity and autocorrelation robust (HAC) using 5 lags and without small sample correction

The coefficient on RF.shift(1) remains negative and is significant at the 5% level (p = 0.019) on the restricted sample, so mean-reversion is not a pre-1962 artifact. We now extend the model to \[ \Delta r^f_{t+1} = \alpha + \beta r^f_t + \gamma \Delta r^f_t + \delta s_t + e_{t+1}. \]

The first new predictor is the lagged change \(\Delta r^f_t = r^f_t - r^f_{t-1}\), coded as DELTA_RF.shift(1). Applying .shift(1) to DELTA_RF lags it by one month so that when predicting \(\Delta r^f_{t+1}\), we use \(\Delta r^f_t\) — the change that just occurred. This tests whether rate changes are autocorrelated. For example, if the Fed begins a cutting cycle, rate decreases may persist over several months, making a positive \(\gamma\) plausible.

The second new predictor is the lagged term spread \(s_t = r^{10y}_t - r^f_t\) (SPREAD.shift(1)). The spread measures the slope of the yield curve: when long rates are well above short rates (\(s_t > 0\)), the market anticipates rising short rates, consistent with a positive \(\delta\). Using the spread rather than TNX directly avoids the high collinearity between RF.shift(1) and the level of the 10-year yield, since both series move closely together over the long run.
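The timing of the lagged regressors can be checked on a toy example. The sketch below uses five made-up monthly observations (not the actual series) and explicit column names such as `RF_LAG`, which are hypothetical and stand in for the `.shift(1)` terms in the regression formula.

```python
import pandas as pd

# Toy monthly data with made-up values (not the actual series)
toy = pd.DataFrame(
    {'RF': [5.0, 5.2, 5.1, 4.8, 4.9], 'TNX': [6.0, 6.1, 6.3, 6.2, 6.0]},
    index=pd.to_datetime(['2000-01-31', '2000-02-29', '2000-03-31',
                          '2000-04-30', '2000-05-31']),
)
toy['SPREAD'] = toy['TNX'] - toy['RF']
toy['DELTA_RF'] = toy['RF'].diff()               # r_t - r_{t-1}, stored in row t

# Regressors dated one month earlier than the dependent variable in the same row
toy['RF_LAG'] = toy['RF'].shift(1)
toy['DELTA_RF_LAG'] = toy['DELTA_RF'].shift(1)
toy['SPREAD_LAG'] = toy['SPREAD'].shift(1)

print(toy.round(2))
```

In the March row, for example, the dependent variable is the February-to-March change, while all three regressors use information available at the end of February. Rows with missing lags are dropped by the regression.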

results2 = (smf
            .ols('DELTA_RF ~ RF.shift(1) + DELTA_RF.shift(1) + SPREAD.shift(1)', data=rf2)
            .fit(cov_type='HAC', cov_kwds={'maxlags': nw_lags_r})
            )

print(results2.summary(slim=True))

                            OLS Regression Results                            
==============================================================================
Dep. Variable:               DELTA_RF   R-squared:                       0.088
Model:                            OLS   Adj. R-squared:                  0.084
No. Observations:                 767   F-statistic:                     14.04
Covariance Type:                  HAC   Prob (F-statistic):           6.46e-09
=====================================================================================
                        coef    std err          z      P>|z|      [0.025      0.975]
-------------------------------------------------------------------------------------
Intercept            -0.1699      0.075     -2.252      0.024      -0.318      -0.022
RF.shift(1)          -0.0050      0.013     -0.374      0.708      -0.031       0.021
DELTA_RF.shift(1)    -0.0996      0.092     -1.078      0.281      -0.281       0.081
SPREAD.shift(1)       0.1295      0.035      3.714      0.000       0.061       0.198
=====================================================================================

Notes:
[1] Standard Errors are heteroscedasticity and autocorrelation robust (HAC) using 5 lags and without small sample correction

The coefficient on RF.shift(1) (\(\hat{\beta}\)) is no longer significant once SPREAD is controlled for. This happens because SPREAD = TNX \(-\) RF already contains RF with a negative sign, so including SPREAD alongside RF.shift(1) amounts to including both the 10-year yield and the short rate as separate regressors. OLS can no longer distinguish between “mean-reversion in the level” and “response to the slope,” and the level effect is absorbed entirely by the spread.
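The equivalence between the two parameterizations can be verified numerically. Writing \(\beta r^f_t + \delta s_t = (\beta - \delta) r^f_t + \delta r^{10y}_t\) shows that a regression on the level and the spread spans the same information as a regression on the level and the 10-year yield. The sketch below checks this on simulated stand-ins for the lagged series. The data-generating values and the helper `ols` are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n_obs = 600

# Simulated stand-ins for the lagged short rate and 10-year yield (illustrative only)
rf_lag = 4.0 + np.cumsum(rng.normal(0.0, 0.3, n_obs))
tnx_lag = rf_lag + 1.5 + rng.normal(0.0, 1.0, n_obs)
spread_lag = tnx_lag - rf_lag
delta_rf = -0.02 * rf_lag + 0.10 * spread_lag + rng.normal(0.0, 0.5, n_obs)

def ols(y, *regressors):
    """Return OLS coefficients (intercept first) and fitted values."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef, X @ coef

coef_spread, fit_spread = ols(delta_rf, rf_lag, spread_lag)   # level + spread
coef_tnx, fit_tnx = ols(delta_rf, rf_lag, tnx_lag)            # level + 10-year yield

print("Identical fitted values:", np.allclose(fit_spread, fit_tnx))
print(f"RF coef (spread spec):         {coef_spread[1]:.4f}")
print(f"RF coef (TNX spec) + TNX coef: {coef_tnx[1] + coef_tnx[2]:.4f}")
print(f"corr(RF, SPREAD) = {np.corrcoef(rf_lag, spread_lag)[0, 1]:.2f}, "
      f"corr(RF, TNX) = {np.corrcoef(rf_lag, tnx_lag)[0, 1]:.2f}")
```

The two specifications give the same fit and differ only in how the coefficients are labeled, so the choice between them affects interpretation and the correlation among regressors rather than predictive power.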

The coefficient on DELTA_RF.shift(1) (\(\hat{\gamma}\)) is not statistically significant either (p = 0.281), and its point estimate is negative rather than positive. One possible explanation is that any momentum in rate changes is already captured by the slope: a steep curve encodes the market’s expectation of a sequence of rate increases, leaving little additional predictive content in the most recent month’s change.

The coefficient on SPREAD.shift(1) (\(\hat{\delta}\)) is highly significant with a positive sign: a steep yield curve — long rates well above short rates — predicts rising short rates, consistent with the expectations hypothesis. An inverted curve (\(s_t < 0\)) predicts falling rates, consistent with the well-documented leading relationship between yield curve inversions and monetary easing. The spread is therefore the dominant predictor in the extended model.

The following table compares all three specifications. Newey-West standard errors are reported in parentheses below each coefficient. The table uses standard significance codes: *** \(p < 0.01\), ** \(p < 0.05\), * \(p < 0.10\).

def fmt_coef(res, var):
    if var not in res.params:
        return ('', '')
    p = res.pvalues[var]
    stars = '***' if p < 0.01 else '**' if p < 0.05 else '*' if p < 0.10 else ''
    coef = f"{res.params[var]:.3f}{stars}"
    se   = f"({res.bse[var]:.3f})"
    return (coef, se)

specs = {
    'Simple (full)':    results,
    'Simple (1962+)':   results_r,
    'Extended (1962+)': results2,
}
coef_vars = ['Intercept', 'RF.shift(1)', 'DELTA_RF.shift(1)', 'SPREAD.shift(1)']

# Build multi-row index: coefficient row then SE row for each variable
index_labels = []
for v in coef_vars:
    index_labels += [v, '']

table = {}
for name, res in specs.items():
    col = []
    for v in coef_vars:
        coef, se = fmt_coef(res, v)
        col += [coef, se]
    table[name] = col

df_table = pd.DataFrame(table, index=index_labels)
df_table.loc['N']  = {name: str(int(res.nobs))        for name, res in specs.items()}
df_table.loc['R²'] = {name: f"{res.rsquared:.3f}"     for name, res in specs.items()}

display(df_table)

	Simple (full)	Simple (1962+)	Extended (1962+)
Intercept	0.078***	0.114***	-0.170**
	(0.023)	(0.039)	(0.075)
RF.shift(1)	-0.024***	-0.026**	-0.005
	(0.009)	(0.011)	(0.013)
DELTA_RF.shift(1)			-0.100
			(0.092)
SPREAD.shift(1)			0.130***
			(0.035)
N	1193	767	767
R²	0.012	0.013	0.088

The table highlights two findings. First, the mean-reversion coefficient is stable across samples: \(\hat\beta\) is virtually unchanged between the full sample (\(-0.024\)) and the 1962+ subsample (\(-0.026\)), although its significance drops from the 1% to the 5% level in the shorter sample and \(\hat\alpha\) rises from 0.078 to 0.114. Mean-reversion in interest rates is therefore not an artifact of any particular period. Second, adding the term spread substantially improves fit: R² rises from 0.013 to 0.088 on the restricted sample, almost a sevenfold improvement. The predictive power of the spread for future short-rate changes is well established in the literature.

Campbell and Shiller (1991) show that the spread between long and short rates predicts future changes in short rates in the direction implied by the expectations hypothesis of the term structure: a steep upward-sloping yield curve signals that the market anticipates rising short rates, while an inverted curve signals anticipated cuts. Their evidence confirms the directional prediction, but they also document that the estimated coefficient on the spread is typically below one, so short rates move by less than the spread alone would imply, a finding often cited as evidence against the pure expectations hypothesis. Fama and Bliss (1987) reach a similar conclusion using forward rates: long-maturity forward rates contain information about future spot rates consistent with the expectations hypothesis at longer horizons. Mishkin (1988) also finds that the term spread has predictive content for future interest rate changes across a range of maturities. Taken together, the literature supports the significant and positive coefficient on SPREAD.shift(1) found here.

Campbell, John Y., and Robert J. Shiller. 1991. “Yield Spreads and Interest Rate Movements: A Bird’s Eye View.” Review of Economic Studies 58 (3): 495–514.

Fama, Eugene F., and Robert R. Bliss. 1987. “The Information in Long-Maturity Forward Rates.” American Economic Review 77 (4): 680–92.

Mishkin, Frederic S. 1988. “The Information in the Term Structure: Some Further Results.” Journal of Applied Econometrics 3 (4): 307–14.

Practice Problems

Problem 1 What does it mean that interest rates are persistent?

Solution

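Persistence means that the series changes slowly over time, so each observation is close to the one before it; in other words, the series is highly autocorrelated. A scatter plot of a persistent variable \(x_{t}\) vs. \(x_{t-1}\) clusters tightly around the 45-degree line. This apparent predictability reflects the slow movement of the level rather than any ability to forecast changes, and regressions involving persistent levels can produce spurious significance.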

Problem 2 Using the estimated \(\hat\alpha\) and \(\hat\beta\) from the simple mean-reversion model on the full sample, compute the implied long-run equilibrium rate \(r^* = -\hat\alpha/\hat\beta\). How does \(r^*\) compare to the sample mean of the short rate? What does any discrepancy tell you about the finite-sample behavior of the model?

Solution

From the regression \(\Delta r^f_{t+1} = \hat\alpha + \hat\beta r^f_t + e_{t+1}\), the equilibrium \(r^*\) is the level at which the expected change is zero: \(\hat\alpha + \hat\beta r^* = 0 \Rightarrow r^* = -\hat\alpha/\hat\beta\). In a Vasicek model this is the unconditional mean to which the process reverts. In finite samples, \(r^*\) will not exactly equal the sample mean. Because OLS residuals average to zero, \(\overline{\Delta r^f} = \hat\alpha + \hat\beta\, \bar r^f\), where \(\bar r^f\) is the mean of the lagged rate, so \(r^* = \bar r^f - \overline{\Delta r^f}/\hat\beta\): the gap is proportional to the average monthly change over the sample. Both estimates are also subject to sampling error. In the full sample the gap is small (3.29% versus 3.24%). A large gap between \(r^*\) and the sample mean suggests either that the sample period contains a persistent deviation from the long-run equilibrium (e.g. the prolonged low-rate era post-2008) or that the linear mean-reversion model is only an approximation of the true dynamics.
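The link between \(r^*\) and the sample mean can be checked numerically. The sketch below simulates a mean-reverting rate that starts well above its equilibrium, using hypothetical parameters rather than the actual data, and verifies that the gap between \(r^*\) and the mean of the lagged rate equals \(-\overline{\Delta r}/\hat\beta\).

```python
import numpy as np

rng = np.random.default_rng(11)
n_obs = 1200

# Simulated mean-reverting rate with hypothetical parameters (not the actual data)
r = np.empty(n_obs)
r[0] = 6.0                                  # start above the equilibrium of 3.2
for t in range(n_obs - 1):
    r[t + 1] = r[t] + 0.08 - 0.025 * r[t] + rng.normal(0.0, 0.5)

delta_r, r_lag = np.diff(r), r[:-1]
X = np.column_stack([np.ones(n_obs - 1), r_lag])
alpha_hat, beta_hat = np.linalg.lstsq(X, delta_r, rcond=None)[0]

r_star = -alpha_hat / beta_hat
# OLS residuals average to zero, so mean(delta_r) = alpha_hat + beta_hat * mean(r_lag)
gap_from_identity = -delta_r.mean() / beta_hat

print(f"r*                      = {r_star:.3f}")
print(f"mean of lagged rate     = {r_lag.mean():.3f}")
print(f"r* - mean               = {r_star - r_lag.mean():.3f}")
print(f"-mean(delta_r)/beta_hat = {gap_from_identity:.3f}")
```

The last two printed lines should agree, showing that the discrepancy is driven by the net change in the rate between the start and the end of the sample.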