import yfinance as yf
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import seaborn as sns
from scipy.optimize import minimize
sns.set_theme()

Mean-Variance Optimization with Multiple Assets
Extending the Analysis to N Risky Assets
In an earlier notebook we derived the investment opportunity set with two risky assets. We now extend this to any number of risky assets and add a risk-free asset.
In \((\mu, \sigma)\) space, the investment opportunity set spanned by the risk-free asset and any risky portfolio is a capital allocation line (CAL) with intercept \(r_f\) and slope equal to the portfolio’s Sharpe ratio. Any rational risk-averse investor therefore prefers the CAL with the maximum Sharpe ratio.
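A one-line derivation makes the slope claim concrete. Invest a fraction \(a \ge 0\) in risky portfolio \(P\) and \(1-a\) in the risk-free asset. The combined portfolio \(C\) then has \[ \mu_C = r_f + a(\mu_P - r_f), \qquad \sigma_C = a\,\sigma_P, \] and eliminating \(a\) gives \[ \mu_C = r_f + \frac{\mu_P - r_f}{\sigma_P}\,\sigma_C, \] a straight line in \((\mu, \sigma)\) space with intercept \(r_f\) and slope equal to the Sharpe ratio of \(P\).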
This is a classical problem in the investments literature, originating with Markowitz (1952). The typical solution requires estimating the full \(N \times N\) covariance matrix of asset returns and then inverting it — a step that involves matrix calculus and can be hard to follow if you are not fluent in linear algebra. We avoid the matrix inversion by pre-computing the covariance matrix and letting a numerical optimizer evaluate the Sharpe ratio at each candidate weight vector.
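For reference, the classical matrix solution can still be sketched in a few lines: the tangency-portfolio weights are proportional to \(\Sigma^{-1}(\boldsymbol{\mu} - r_f \mathbf{1})\), normalized to sum to one. Every number below is hypothetical, chosen only to illustrate the formula:

```python
import numpy as np

# Closed-form tangency portfolio: w* proportional to Sigma^{-1}(mu - r_f),
# normalized so the weights sum to one. All inputs here are made up.
mu_ex = np.array([0.08, 0.10, 0.12])       # hypothetical annual expected returns
Sigma_ex = np.array([[0.04, 0.01, 0.00],   # hypothetical annual covariance matrix
                     [0.01, 0.09, 0.02],
                     [0.00, 0.02, 0.16]])
rf_ex = 0.03

raw = np.linalg.solve(Sigma_ex, mu_ex - rf_ex)  # solve instead of inverting explicitly
w_star = raw / raw.sum()
print(w_star)
```

The numerical approach in this notebook reaches the same optimum without asking the reader to follow the linear algebra.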
Loading the Libraries
We load the same libraries as before, plus scipy. From scipy.optimize we import minimize, which we will use to search for the portfolio with the highest Sharpe ratio.
Loading the Data
We use seven popular sector ETFs as candidates for the portfolio.
| Ticker | Name |
|---|---|
| XLF | Financial Select Sector SPDR Fund |
| XLE | Energy Select Sector SPDR Fund |
| XLU | Utilities Select Sector SPDR Fund |
| XLI | Industrial Select Sector SPDR Fund |
| XLP | Consumer Staples Select Sector SPDR Fund |
| XLV | Health Care Select Sector SPDR Fund |
| XLK | Technology Select Sector SPDR Fund |
These funds were selected to provide a representative cross-section of sectors. You can change the list and the time frame as you like.
The training window spans twenty years (December 2004 – December 2024). The long horizon is deliberate: average returns are notoriously difficult to estimate precisely. A standard result in the empirical finance literature is that, even with several decades of data, the standard error on an annualized mean return is large enough to make the estimate nearly uninformative on its own. Using a short window would make the estimated \(\boldsymbol{\mu}\) so noisy that the optimizer would be fitting to luck rather than to any genuine signal in expected returns.
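A back-of-the-envelope calculation illustrates the point. Under i.i.d. annual returns, the standard error of the sample mean is the volatility divided by the square root of the number of years; the 16% volatility below is a stylized assumption (roughly the magnitude of broad equity-index volatility), not an estimate from our data:

```python
import numpy as np

# Standard error of an annualized mean return under i.i.d. annual returns:
# se = volatility / sqrt(number of years). 16% volatility is a stylized assumption.
ann_vol = 0.16
for years in (5, 20, 50):
    se = ann_vol / np.sqrt(years)
    print(f'{years:2d} years: std. error of the mean ~ {se:.1%}')
```

Even with 50 years of data the standard error is about 2.3%, the same order of magnitude as plausible differences in expected returns across sectors.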
start_date = '2004-12-01'
end_date = '2025-01-01'
tickers = ['XLF', 'XLE', 'XLU', 'XLI', 'XLP', 'XLV', 'XLK']
ret = (yf
.download(tickers, start=start_date, end=end_date, auto_adjust=False, progress=False)['Adj Close']
.resample('ME')
.last()
.pct_change()
.dropna()
)
ret = ret[tickers]  # yfinance sorts columns alphabetically; restore our ticker order

Computing the Sharpe Ratio
A portfolio of \(N\) risky assets with weights \(w_1, \ldots, w_N\) (summing to one) earns return \(r_P = w_1 r_1 + \cdots + w_N r_N.\) Its expected return and standard deviation are: \[ \mu_P = \mathbf{w}^\top \boldsymbol{\mu}, \qquad \sigma_P = \sqrt{12\,\mathbf{w}^\top \Sigma \mathbf{w}}, \] where \(\boldsymbol{\mu}\) collects annualized per-asset expected returns, \(\Sigma\) is the covariance matrix of monthly returns, and \(\sqrt{12}\) scales the standard deviation to an annual basis. Both \(\mu_P\) and \(\sigma_P\) depend on the data only through \(\boldsymbol{\mu}\) and \(\Sigma\), so we pre-compute those two objects once and let the optimizer search over \(\mathbf{w}\).
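As a sanity check on the matrix expressions, here is a two-asset example with made-up numbers; the quadratic form agrees with the hand expansion \(w_1^2\sigma_1^2 + 2w_1w_2\sigma_{12} + w_2^2\sigma_2^2\):

```python
import numpy as np

# Two hypothetical assets: annualized means and a monthly covariance matrix.
w_demo = np.array([0.6, 0.4])
mu_demo = np.array([0.08, 0.12])             # annualized expected returns
Sigma_demo = np.array([[0.0020, 0.0008],
                       [0.0008, 0.0045]])    # monthly covariance matrix

mu_P = w_demo @ mu_demo
sigma_P = np.sqrt(12 * w_demo @ Sigma_demo @ w_demo)  # sqrt(12) annualizes the std dev
print(f'mu_P = {mu_P:.4f}, sigma_P = {sigma_P:.4f}')  # prints mu_P = 0.0960, sigma_P = 0.1479
```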
We proxy \(r_f\) with the 13-week Treasury Bill yield (^IRX from Yahoo Finance), reported as an annualized percentage, averaged over the training window and divided by 100 to convert to a decimal. \(\boldsymbol{\mu}\) is the vector of sample monthly means scaled by 12 and \(\Sigma\) is the sample covariance matrix of monthly returns.
irx = (yf
.download(['^IRX'], start=start_date, end=end_date, auto_adjust=False,
progress=False, multi_level_index=False)['Adj Close']
.dropna())
r_f = irx.mean() / 100
cov = ret.cov()
mu = ret.mean() * 12

The Sharpe ratio is \(\mathit{SR} = (\mu_P - r_f)/\sigma_P.\) All three quantities are on an annual scale: mu is the vector of monthly means scaled by 12, r_f is an annualized decimal rate, and the factor of 12 inside the square root scales the monthly portfolio variance \(\mathbf{w}^\top \Sigma \mathbf{w}\) to an annual basis.
def Sharpe(w, cov, mu, r_f):
    mu_P = w @ mu
    sigma_P = np.sqrt(12 * w @ cov @ w)
    return (mu_P - r_f) / sigma_P

Finding the Optimal Portfolio
The portfolio with the maximum Sharpe ratio is found by minimizing -Sharpe. We consider two scenarios: one with no restrictions on individual weights and one that rules out short positions entirely. When bounds are imposed, bounds is a list of (min, max) pairs — one per asset — that tells the optimizer the allowable range for each weight.
Both scenarios share a common equality constraint: the weights must sum to one. We define it once here and reuse it in each call to minimize.
n = len(tickers)
x0 = np.ones(n) / n
cons = {'type': 'eq', 'fun': lambda w: w.sum() - 1}

Why We Restrict Weights
Unconstrained mean-variance optimization has a well-known practical weakness: extreme sensitivity to its inputs. Small errors in estimated expected returns get amplified into large, offsetting long and short positions — a phenomenon Michaud (1989) called error maximization. Because expected returns and covariances must be estimated from finite samples, they are inevitably noisy. The optimizer treats genuine signal and statistical artifacts identically, so the resulting weights can look extreme even when the underlying data are fairly ordinary.
Weight constraints are the standard remedy. By limiting how large any position can be, we prevent the optimizer from over-expressing its confidence in noisy signals. In this sense constraints act as a regularizer, much like ridge regression in statistics: they accept a small loss of in-sample optimality in exchange for a much more stable, diversified portfolio that is less likely to blow up when estimates turn out to be wrong. Jagannathan and Ma (2003) formalize this intuition: they show that imposing the short-sale constraint is mathematically equivalent to shrinking extreme elements of the sample covariance matrix toward zero, which explains why the constraint improves out-of-sample performance even in cases where the true optimal portfolio requires some shorting. There are also practical reasons to impose constraints: many institutional investors are legally prohibited from shorting, short positions carry borrowing costs and margin requirements, and unconstrained portfolios tend to generate high turnover because small changes in inputs can produce large swings in optimal weights.
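The sensitivity is easy to demonstrate with synthetic inputs (every number below is made up): perturbing the expected-return vector by only half a percentage point can move the unconstrained tangency weights materially when assets are highly correlated.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic universe: seven highly correlated assets make the problem
# ill-conditioned. All numbers here are invented for illustration.
n_syn = 7
corr = np.full((n_syn, n_syn), 0.8) + 0.2 * np.eye(n_syn)
vols = np.full(n_syn, 0.18)
Sigma_syn = corr * np.outer(vols, vols)
mu_syn = np.linspace(0.06, 0.12, n_syn)
rf_syn = 0.02

def tangency(m, S, rf):
    raw = np.linalg.solve(S, m - rf)   # Sigma^{-1}(mu - r_f), no explicit inverse
    return raw / raw.sum()

w_base = tangency(mu_syn, Sigma_syn, rf_syn)
w_pert = tangency(mu_syn + rng.normal(0, 0.005, n_syn), Sigma_syn, rf_syn)  # tiny noise

print(np.round(w_base, 2))
print(np.round(w_pert, 2))
print('largest weight change:', np.abs(w_pert - w_base).max())
```

This is exactly the "error maximization" at work: the noise is small relative to the spread in expected returns, yet the optimizer re-expresses it as large shifts in positions.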
Scenario 1 — unconstrained: individual weights are unrestricted, so the optimizer is free to build large long and short positions.
res_unc = minimize(lambda w: -Sharpe(w, cov, mu, r_f), x0,
constraints=cons)
weights_unc = 100*pd.DataFrame({'Weight (%)': res_unc.x}, index=tickers).round(4)
weights_unc.index.name = 'Ticker'
weights_unc

| Ticker | Weight (%) |
|---|---|
| XLF | 4.06 |
| XLE | -34.67 |
| XLU | -13.04 |
| XLI | 59.99 |
| XLP | 39.91 |
| XLV | 13.49 |
| XLK | 30.26 |
Scenario 2 — no short-selling: each weight is bounded to \([0,\, 1]\), ruling out short positions entirely.
bounds_long = [(0, 1)] * n
res_long = minimize(lambda w: -Sharpe(w, cov, mu, r_f), x0,
constraints=cons, bounds=bounds_long)
weights_long = 100*pd.DataFrame({'Weight (%)': res_long.x}, index=tickers).round(4)
weights_long.index.name = 'Ticker'
weights_long

| Ticker | Weight (%) |
|---|---|
| XLF | 0.00 |
| XLE | 0.00 |
| XLU | 0.00 |
| XLI | 43.74 |
| XLP | 21.59 |
| XLV | 22.28 |
| XLK | 12.39 |
Out-of-Sample Performance
We now compare both portfolios against SPY over a period entirely outside the training window. This lets us see whether imposing the short-sale constraint — moving from unconstrained to long-only — helps or hurts realized performance.
test_start = '2025-01-01'
test_end = '2026-02-26'
ret_test = (yf
.download(tickers, start=test_start, end=test_end, auto_adjust=False, progress=False)['Adj Close']
.pct_change()
.dropna()
)
spy_test = (yf
.download(['SPY'], start=test_start, end=test_end, auto_adjust=False, progress=False, multi_level_index=False)['Adj Close']
.pct_change()
.dropna()
)
irx_test = (yf
.download(['^IRX'], start=test_start, end=test_end, auto_adjust=False,
progress=False, multi_level_index=False)['Adj Close']
.dropna())
r_f_test = irx_test.mean() / 100

We use daily (not monthly) test returns to get finer resolution on the performance path. We apply each set of fixed weights to each day’s return and chain the results into cumulative growth of \(\$1.\)
w_ew = np.ones(n) / n
cum_unc = (1 + (ret_test[tickers] * res_unc.x).sum(axis=1)).cumprod()
cum_long = (1 + (ret_test[tickers] * res_long.x).sum(axis=1)).cumprod()
cum_ew = (1 + (ret_test[tickers] * w_ew).sum(axis=1)).cumprod()
cum_spy = (1 + spy_test).cumprod()
fig, ax = plt.subplots()
cum_unc.plot(ax=ax, label='Unconstrained')
cum_long.plot(ax=ax, label='Long-Only')
cum_ew.plot(ax=ax, label='Equal-Weight (1/N)')
cum_spy.plot(ax=ax, label='SPY')
ax.set_title('Out-of-Sample Performance: Jan 2025 – Feb 2026')
ax.set_ylabel('Growth of $1')
ax.legend()
plt.show()

Since the test returns are daily, we scale the mean by 252 and the standard deviation by \(\sqrt{252}\) to put both on an annual basis.
def oos_stats(daily_rets, rf):
    ann_ret = daily_rets.mean() * 252
    sigma = daily_rets.std() * np.sqrt(252)
    sr = (ann_ret - rf) / sigma
    tot = (1 + daily_rets).prod() - 1
    return {'Ann. Return': f'{ann_ret:.1%}', 'Ann. Vol': f'{sigma:.1%}',
            'Sharpe': f'{sr:.2f}', 'Total Return': f'{tot:.1%}'}
rets = {
    'Unconstrained': (ret_test[tickers] * res_unc.x).sum(axis=1),
    'Long-Only': (ret_test[tickers] * res_long.x).sum(axis=1),
    'Equal-Weight': (ret_test[tickers] * w_ew).sum(axis=1),
    'SPY': spy_test,
}
pd.DataFrame({k: oos_stats(v, r_f_test) for k, v in rets.items()}).T

| Portfolio | Ann. Return | Ann. Vol | Sharpe | Total Return |
|---|---|---|---|---|
| Unconstrained | 20.0% | 17.7% | 0.90 | 23.2% |
| Long-Only | 21.8% | 14.7% | 1.21 | 26.5% |
| Equal-Weight | 20.0% | 14.5% | 1.10 | 24.0% |
| SPY | 17.8% | 18.7% | 0.74 | 20.0% |
Interpreting the Results
The test window is the honest report card: weights are fixed at their in-sample values and applied to returns the optimizer never saw. Both portfolios were optimized on the training sample, so in-sample performance tells us little. The real question is whether the constraints imposed for stability actually help when markets do not cooperate with our estimates.
In our test window the long-only portfolio outperforms the unconstrained one, consistent with the weight-constraint intuition: the unconstrained optimizer entered the test period holding large leveraged positions built on estimated parameters, and when conditions shifted, those concentrated bets weighed on performance. Two caveats apply. First, fourteen months is a short window — standard errors on Sharpe ratios over such a horizon are large enough that the ranking could easily reverse in another period. Second, with a different sector universe or a different training cutoff the result can go the other way. DeMiguel et al. (2009) make this precise: they evaluate fourteen mean-variance strategies across multiple datasets and find that none consistently beats the naive \(1/N\) portfolio on a risk-adjusted basis. That finding holds on average across many settings, not in any single test window. We include \(1/N\) precisely because it is such a robust benchmark — though equal-weighting requires periodic rebalancing as prices drift, generating turnover and transaction costs that SPY, being market-cap weighted, largely avoids.
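The rebalancing point is concrete even in a toy setting. Start an equal-weight portfolio at 50/50 and let one asset return 20% while the other is flat (made-up numbers); the weights drift away from equal and must be traded back:

```python
import numpy as np

# Why equal-weighting needs rebalancing: begin 50/50, then apply one
# period of returns (made-up growth factors) and measure the drift.
w0 = np.array([0.5, 0.5])
gross = np.array([1.20, 1.00])                 # hypothetical growth factors

values = w0 * gross
w_drift = values / values.sum()                # weights after the price drift
turnover = np.abs(w0 - w_drift).sum() / 2      # one-way turnover to restore 50/50
print(np.round(w_drift, 4), round(float(turnover), 4))
```

The winner drifts to about 54.5% of the portfolio, and restoring equal weights requires one-way turnover of about 4.5% of portfolio value, which is why \(1/N\) is cheap but not free.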
SPY is a useful reality check. Passive index funds set a high bar: they require no estimation, incur minimal costs, and capture the long-run equity premium without the risk of misspecification. The difficulty of beating a passive index out of sample is one of the strongest empirical arguments for low-cost passive investing as the default recommendation for most investors.
Practice Problems
Problem 1 Compute the in-sample Sharpe ratio of the equal-weight portfolio and compare it to the unconstrained and long-only optimized portfolios. Which portfolio has the highest in-sample Sharpe ratio, and why is this ranking not surprising?
Solution
w_ew = np.ones(n) / n
pd.DataFrame(
    {'In-Sample Sharpe': {
        'Unconstrained': Sharpe(res_unc.x, cov, mu, r_f),
        'Long-Only': Sharpe(res_long.x, cov, mu, r_f),
        'Equal-Weight': Sharpe(w_ew, cov, mu, r_f),
    }}
).round(4)

| Portfolio | In-Sample Sharpe |
|---|---|
| Unconstrained | 0.9074 |
| Long-Only | 0.8111 |
| Equal-Weight | 0.6286 |
Problem 2 Re-run the long-only optimization with a stricter diversification constraint: no single ETF can receive more than 30% of the portfolio. Which ETFs hit the 30% ceiling, and how do the resulting weights compare to the unconstrained long-only portfolio?
Solution
bounds_cap = [(0, 0.30)] * n
res_cap = minimize(lambda w: -Sharpe(w, cov, mu, r_f), x0,
constraints=cons, bounds=bounds_cap)
weights_cap = 100 * pd.DataFrame({'Weight (%)': res_cap.x}, index=tickers).round(4)
weights_cap.index.name = 'Ticker'
weights_cap

| Ticker | Weight (%) |
|---|---|
| XLF | 0.00 |
| XLE | 0.00 |
| XLU | 0.00 |
| XLI | 30.00 |
| XLP | 29.00 |
| XLV | 21.96 |
| XLK | 19.04 |
Any ETF whose long-only weight (weights_long) exceeded 30% is now capped at 30%, with the freed-up weight redistributed to the other assets by the optimizer. Here only XLI, previously at 43.74%, hits the ceiling. The 30% cap forces a more diversified allocation and prevents any single sector from dominating the portfolio, at the cost of a somewhat lower in-sample Sharpe ratio compared to the uncapped long-only solution.
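That cost can be quantified in a self-contained way: because tightening the bounds shrinks the feasible set, the optimal in-sample Sharpe ratio can only fall (weakly). The sketch below uses hypothetical inputs with an annualized covariance matrix (so no factor of 12 is needed):

```python
import numpy as np
from scipy.optimize import minimize

# Compare the maximum Sharpe ratio with and without a 30% weight cap.
# All inputs are made up; the fourth asset is deliberately attractive
# so that the cap binds.
mu_s = np.array([0.06, 0.07, 0.08, 0.16])      # hypothetical annualized means
Sigma_s = np.diag([0.04, 0.04, 0.04, 0.05])    # hypothetical annualized covariance
rf_s = 0.02
n_s = len(mu_s)

def neg_sharpe(w):
    return -(w @ mu_s - rf_s) / np.sqrt(w @ Sigma_s @ w)

cons_s = {'type': 'eq', 'fun': lambda w: w.sum() - 1}
x0_s = np.ones(n_s) / n_s

sr = {}
for cap in (1.0, 0.30):
    res = minimize(neg_sharpe, x0_s, constraints=cons_s, bounds=[(0, cap)] * n_s)
    sr[cap] = -res.fun
print(sr)  # the capped optimum is strictly lower because the cap binds
```

In the notebook's own setting the analogous comparison is Sharpe(res_cap.x, cov, mu, r_f) versus Sharpe(res_long.x, cov, mu, r_f).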