import numpy as np
import pandas as pd
import yfinance as yf
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_theme()
The GARCH Model
Modeling Conditional Volatility
Even though the rolling window approach to estimating volatility is easy to understand and implement, it has some important drawbacks:
- it gives the same weight to all observations in the window
- it does not use all the information in the sample to estimate the volatility at any point in time
The idea of a GARCH model is to impose some additional structure on the evolution of volatility over time. Engle (1982) introduced the ARCH model to capture time-varying volatility in inflation, and Bollerslev (1986) extended it by allowing the conditional variance to depend on its own lags, giving rise to the GARCH model. One of the simplest GARCH models has the following structure: \[ \begin{align*} r_{t} & = \mu + \sigma_{t} \varepsilon_{t} \\ \sigma_{t}^{2} & = (1 - \alpha - \beta) \sigma_{L}^{2} + \alpha r_{t-1}^{2} + \beta \sigma_{t-1}^{2}, \\ \end{align*} \] where \(\{\varepsilon_{t}\}_{t \geq 0}\) is an iid sequence of random variables with \(\varepsilon_{t} \sim N(0, 1)\), and \(\sigma_{t}^{2}\) denotes the conditional variance. This model is called a GARCH(1,1) model because it uses one lagged squared return and one previous estimate of the variance.
The variance equation says that the conditional variance is a weighted average of three components:
- The long run variance denoted by \(\sigma_{L}^{2}\)
- The last squared return \(r_{t-1}^{2}\)
- The previous variance estimate \(\sigma_{t-1}^{2}\)
The model recursively estimates the variance by updating the previous estimate with the latest squared return and by giving a certain weight to the long-run variance of the model.
In the estimation, it is usual to make the following re-parametrization \(\omega = (1 - \alpha - \beta) \sigma_{L}^{2}\) so the model for the variance becomes \[ \sigma_{t}^{2} = \omega + \alpha r_{t-1}^{2} + \beta \sigma_{t-1}^{2}. \]
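To see the variance recursion in action, here is a minimal simulation sketch of a GARCH(1,1) path. The parameter values below are illustrative assumptions, not estimates from any data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (assumed, not estimated)
mu, omega, alpha, beta = 0.05, 0.02, 0.10, 0.87
sigma_L2 = omega / (1 - alpha - beta)  # implied long-run variance

n = 1000
r = np.empty(n)
sigma2 = np.empty(n)

# Start the recursion at the long-run variance
sigma2[0] = sigma_L2
r[0] = mu + np.sqrt(sigma2[0]) * rng.standard_normal()

for t in range(1, n):
    # sigma^2_t = omega + alpha * r^2_{t-1} + beta * sigma^2_{t-1}
    sigma2[t] = omega + alpha * r[t - 1] ** 2 + beta * sigma2[t - 1]
    r[t] = mu + np.sqrt(sigma2[t]) * rng.standard_normal()
```

Plotting `sigma2` over time shows the volatility clustering that the model is designed to generate: large squared returns feed back into higher conditional variance, which then decays toward \(\sigma_{L}^{2}\) at rate \(\alpha + \beta\).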
Parameter Estimation
The GARCH model is usually estimated by maximizing the likelihood of the model innovations given the observed data. The i.i.d. assumption on the residuals implies that the full likelihood function is given by \[ L = \prod_{t=1}^{n} \frac{1}{\sqrt{2 \pi \sigma_{t}^{2}}} \exp\left(-\frac{1}{2} \left(\frac{r_{t} - \mu}{\sigma_{t}}\right)^{2}\right). \] Maximizing the likelihood function with respect to the model parameters is equivalent to maximizing the log-likelihood given by \[ \ln(L) = -\frac{n}{2} \ln(2 \pi) - \frac{1}{2} \sum_{t=1}^{n} \left( \ln(\sigma_{t}^{2}) + \left(\frac{r_{t} - \mu}{\sigma_{t}}\right)^{2} \right). \] Note that maximizing the above quantity is equivalent to minimizing the following sum with respect to \(\omega,\) \(\alpha,\) and \(\beta,\) \[ S = \sum_{t=1}^{n} \ln(\sigma_{t}^{2}) + \left(\frac{r_{t} - \mu}{\sigma_{t}}\right)^{2}. \]
Therefore, estimating a GARCH model by maximum likelihood is similar to OLS except that here we minimize a different sum. Unlike OLS, though, there is no closed-form solution for this minimization and an optimization library is needed to obtain the parameter estimates.
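As a sketch of this procedure, the sum \(S\) above can be minimized directly with `scipy.optimize.minimize`. Two details here are assumptions of the illustration rather than part of the model: the variance recursion is initialized at the sample variance, and simple box bounds stand in for the stationarity constraint \(\alpha + \beta < 1\):

```python
import numpy as np
from scipy.optimize import minimize

def garch_obj(params, r):
    """The sum S from the text: sum over t of log(sigma2_t) + ((r_t - mu)/sigma_t)^2."""
    mu, omega, alpha, beta = params
    sigma2 = np.empty_like(r)
    sigma2[0] = np.var(r)  # initialization choice (an assumption)
    for t in range(1, len(r)):
        sigma2[t] = omega + alpha * r[t - 1] ** 2 + beta * sigma2[t - 1]
    return np.sum(np.log(sigma2) + (r - mu) ** 2 / sigma2)

# Simulated returns just to exercise the optimizer
rng = np.random.default_rng(1)
r = rng.standard_normal(500)

x0 = np.array([0.0, 0.05, 0.1, 0.8])  # starting values: mu, omega, alpha, beta
opt = minimize(garch_obj, x0=x0, args=(r,),
               bounds=[(None, None), (1e-8, None), (0.0, 1.0), (0.0, 1.0)])
```

In practice it is better to rely on a dedicated library, which handles initialization, constraints, and standard errors more carefully.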
Loading the Data
start_date = '1993-01-01'
sp500 = yf.download('^GSPC', start=start_date, auto_adjust=False, progress=False, multi_level_index=False)
returns = 100 * sp500['Close'].pct_change().dropna()
Estimating the GARCH Model
Luckily for us, there is a Python library called arch that estimates the GARCH model in just one simple step. The arch library can be installed by typing in a terminal
pip install arch --upgrade
or
conda install arch
depending on which platform you are working on. The GitHub repository of the arch library is https://github.com/bashtage/arch and documentation for the library can be found at https://bashtage.github.io/arch/.
The easiest way to specify a model is by using the model constructor arch_model. The model is then estimated using fit. Setting update_freq=0 avoids displaying all iterations and only shows the optimization results.
from arch import arch_model
am = arch_model(returns)
res = am.fit(update_freq=0)
Optimization terminated successfully (Exit mode 0)
Current function value: 11152.623589544473
Iterations: 14
Function evaluations: 84
Gradient evaluations: 14
Once the model is estimated we can display the estimation results as follows.
print(res.summary())
                     Constant Mean - GARCH Model Results
==============================================================================
Dep. Variable: Close R-squared: 0.000
Mean Model: Constant Mean Adj. R-squared: 0.000
Vol Model: GARCH Log-Likelihood: -11152.6
Distribution: Normal AIC: 22313.2
Method: Maximum Likelihood BIC: 22341.4
No. Observations: 8345
Date: Sun, Mar 01 2026 Df Residuals: 8344
Time: 11:42:48 Df Model: 1
Mean Model
============================================================================
coef std err t P>|t| 95.0% Conf. Int.
----------------------------------------------------------------------------
mu 0.0678 8.663e-03 7.831 4.837e-15 [5.086e-02,8.482e-02]
Volatility Model
============================================================================
coef std err t P>|t| 95.0% Conf. Int.
----------------------------------------------------------------------------
omega 0.0202 3.857e-03 5.232 1.677e-07 [1.262e-02,2.774e-02]
alpha[1] 0.1122 1.129e-02 9.938 2.850e-23 [9.008e-02, 0.134]
beta[1] 0.8721 1.199e-02 72.743 0.000 [ 0.849, 0.896]
============================================================================
Covariance estimator: robust
Model coefficients are stored in res.params and can be retrieved by their names. For example, res.params['omega'] contains the numerical estimate for \(\omega.\)
Note that the daily long-run standard deviation is given by: \[ \sigma_{L} = \sqrt{\frac{\omega}{1 - \alpha - \beta}} \] which can be annualized by multiplying by \(\sqrt{252}.\)
np.sqrt(252*res.params['omega']/(1-res.params['alpha[1]']-res.params['beta[1]'])).round(2)
np.float64(17.99)
It is interesting to see that this estimate is quite close to the constant annualized standard deviation of returns.
round(np.sqrt(252)*returns.std(), 2)
np.float64(18.38)
More information about the arch library and GARCH-type models can be found in the library's documentation at https://bashtage.github.io/arch/.
Plotting the Estimated Volatility of Returns
It is also possible to quickly plot the standardized residuals defined as: \[
\varepsilon_{t} = \frac{r_{t} - \mu}{\sigma_{t}},
\] and the annualized conditional volatility using plot.
res.plot(annualize="D")
plt.show()
The conditional volatility is stored in res.conditional_volatility. In the figure below I plot the annualized conditional volatility and compare it with the VIX.
The VIX is the CBOE Volatility Index, a measure of the market’s expectation of S&P 500 volatility over the next 30 calendar days (~21 trading days), derived from the prices of S&P 500 options and expressed as an annualized percentage. Because it is backed out from option prices, it reflects not just the market’s forecast of realized volatility but also the price investors are willing to pay to hedge against volatility — which is why it tends to exceed realized volatility on average.
vix = yf.download('^VIX', start=start_date, auto_adjust=False, progress=False, multi_level_index=False).loc[:, 'Close']
plt.plot(res.conditional_volatility*np.sqrt(252), label='GARCH Volatility')
plt.plot(vix, label='VIX')
plt.title('VIX vs. GARCH Conditional Volatility')
plt.legend()
plt.show()
It is interesting to see that the annualized conditional volatility tracks the VIX quite closely. The difference between the VIX and the expected realized volatility over the next 21 trading days is the volatility premium.
The VIX measures the market’s expectation of realized variance over the next 30 calendar days (~21 trading days), annualized. A proper comparison therefore requires the GARCH-implied expected realized variance over the same horizon, not just the spot conditional variance. For a GARCH(1,1) model with persistence \(p = \alpha + \beta\) and long-run variance \(\sigma_L^2 = \omega/(1-\alpha-\beta)\), the \(h\)-step ahead variance forecast is \[ E_t[\sigma_{t+h}^2] = \sigma_L^2 + p^{h-1}(\sigma_{t+1}^2 - \sigma_L^2), \] so the cumulated \(H\)-day expected variance is \[ \sum_{h=1}^{H} E_t[\sigma_{t+h}^2] = H \sigma_L^2 + (\sigma_{t+1}^2 - \sigma_L^2) \frac{1 - p^H}{1 - p}. \] Annualizing gives the expected volatility to compare against the VIX.
omega = res.params['omega']
alpha = res.params['alpha[1]']
beta = res.params['beta[1]']
sigma_L2 = omega / (1 - alpha - beta)
p = alpha + beta
H = 21
cv_sq = omega + alpha * returns**2 + beta * res.conditional_volatility**2  # one-step-ahead variance sigma^2_{t+1}
geo_sum = (1 - p**H) / (1 - p)
cum_var_forecast = H * sigma_L2 + (cv_sq - sigma_L2) * geo_sum
expected_vol = np.sqrt(252 / H * cum_var_forecast)
plt.plot(vix - expected_vol)
plt.title('Volatility Premium')
plt.show()
The volatility premium, also called the variance risk premium, is the compensation investors demand for bearing volatility risk. Because investors pay above the expected cost of hedging to insure against sharp volatility spikes, the premium is typically positive. Bollerslev et al. (2009) show that it predicts stock returns at quarterly horizons, though the result is sensitive to how expected realized variance is measured — it holds when using high-frequency realized variance but not with cruder proxies such as GARCH conditional variance. Drechsler and Yaron (2011) provide a theoretical foundation within a long-run risks framework.
Practice Problems
Problem 1 Generate a plot of the GARCH volatility for the NASDAQ Composite (^IXIC) over the same period.
Solution
start_date = '1993-01-01'
nasdaq = yf.download('^IXIC', start=start_date, auto_adjust=False, progress=False, multi_level_index=False)
returns = 100 * nasdaq['Close'].pct_change().dropna()
am = arch_model(returns)
res = am.fit(update_freq=0)
plt.plot(res.conditional_volatility*np.sqrt(252), label='GARCH Volatility')
plt.title('Nasdaq Volatility')
plt.legend()
plt.show()
Optimization terminated successfully (Exit mode 0)
Current function value: 13304.219737363164
Iterations: 12
Function evaluations: 72
Gradient evaluations: 12