import yfinance as yf
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import seaborn as sns
from scipy.optimize import minimize
sns.set_theme()

Mean-Variance Optimization with Multiple Assets
Extending the Analysis to N Risky Assets
In an earlier notebook we derived the investment opportunity set with two risky assets. We now extend this to any number of risky assets and add a risk-free asset.
In \((\mu, \sigma)\) space, the investment opportunity set spanned by the risk-free asset and any risky portfolio is a capital allocation line (CAL) with intercept \(r_f\) and slope equal to the portfolio’s Sharpe ratio. Any rational risk-averse investor therefore prefers the CAL with the maximum Sharpe ratio.
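A one-line derivation makes the slope claim concrete. Invest a fraction \(a \ge 0\) in risky portfolio \(P\) and \(1-a\) in the risk-free asset. The combined portfolio \(C\) then has \[ \mu_C = r_f + a(\mu_P - r_f), \qquad \sigma_C = a\,\sigma_P, \] and eliminating \(a\) gives \[ \mu_C = r_f + \frac{\mu_P - r_f}{\sigma_P}\,\sigma_C, \] a straight line in \((\mu, \sigma)\) space with intercept \(r_f\) and slope equal to the Sharpe ratio of \(P\).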
This is a classical problem in the investments literature, originating with Markowitz (1952). The typical solution requires estimating the full \(N \times N\) covariance matrix of asset returns and then inverting it — a step that involves matrix calculus and can be hard to follow if you are not fluent in linear algebra. We avoid the matrix inversion by pre-computing the covariance matrix and letting a numerical optimizer evaluate the Sharpe ratio at each candidate weight vector.
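For reference, the classical matrix solution can still be sketched in a few lines: the tangency-portfolio weights are proportional to \(\Sigma^{-1}(\boldsymbol{\mu} - r_f \mathbf{1})\), normalized to sum to one. Every number below is hypothetical, chosen only to illustrate the formula:

```python
import numpy as np

# Closed-form tangency portfolio: w* proportional to Sigma^{-1}(mu - r_f),
# normalized so the weights sum to one. All inputs here are made up.
mu_ex = np.array([0.08, 0.10, 0.12])       # hypothetical annual expected returns
Sigma_ex = np.array([[0.04, 0.01, 0.00],   # hypothetical annual covariance matrix
                     [0.01, 0.09, 0.02],
                     [0.00, 0.02, 0.16]])
rf_ex = 0.03

raw = np.linalg.solve(Sigma_ex, mu_ex - rf_ex)  # solve instead of inverting explicitly
w_star = raw / raw.sum()
print(w_star)
```

The numerical approach in this notebook reaches the same optimum without asking the reader to follow the linear algebra.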
Loading the Libraries
We load the same libraries as before, plus scipy. From scipy.optimize we import minimize, which we will use to search for the portfolio with the highest Sharpe ratio.
Loading the Data
We use seven popular sector ETFs as candidates for the portfolio.
| Ticker | Name |
|---|---|
| XLF | Financial Select Sector SPDR Fund |
| XLE | Energy Select Sector SPDR Fund |
| XLU | Utilities Select Sector SPDR Fund |
| XLI | Industrial Select Sector SPDR Fund |
| XLP | Consumer Staples Select Sector SPDR Fund |
| XLV | Health Care Select Sector SPDR Fund |
| XLK | Technology Select Sector SPDR Fund |
These funds were selected to provide a representative cross-section of sectors. You can change the list and the time frame as you like.
The training window spans twenty years (December 2004 – December 2024). The long horizon is deliberate: average returns are notoriously difficult to estimate precisely. A standard result in the empirical finance literature is that, even with several decades of data, the standard error on an annualized mean return is large enough to make the estimate nearly uninformative on its own. Using a short window would make the estimated \(\boldsymbol{\mu}\) so noisy that the optimizer would be fitting to luck rather than to any genuine signal in expected returns.
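A back-of-the-envelope calculation illustrates the point. Under i.i.d. annual returns, the standard error of the sample mean is the volatility divided by the square root of the number of years; the 16% volatility below is a stylized assumption (roughly the magnitude of broad equity-index volatility), not an estimate from our data:

```python
import numpy as np

# Standard error of an annualized mean return under i.i.d. annual returns:
# se = volatility / sqrt(number of years). 16% volatility is a stylized assumption.
ann_vol = 0.16
for years in (5, 20, 50):
    se = ann_vol / np.sqrt(years)
    print(f'{years:2d} years: std. error of the mean ~ {se:.1%}')
```

Even with 50 years of data the standard error is about 2.3%, the same order of magnitude as plausible differences in expected returns across sectors.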
start_date = '2004-12-01'
end_date = '2025-01-01'
tickers = ['XLF', 'XLE', 'XLU', 'XLI', 'XLP', 'XLV', 'XLK']
ret = (yf
.download(tickers, start=start_date, end=end_date, auto_adjust=False, progress=False)['Adj Close']
.resample('ME')
.last()
.pct_change()
.dropna()
)
ret = ret[tickers]  # yfinance sorts columns alphabetically; restore our ticker order

Computing the Sharpe Ratio
A portfolio of \(N\) risky assets with weights \(w_1, \ldots, w_N\) (summing to one) earns return \(r_P = w_1 r_1 + \cdots + w_N r_N.\) Its expected return and standard deviation are: \[ \mu_P = \mathbf{w}^\top \boldsymbol{\mu}, \qquad \sigma_P = \sqrt{12\,\mathbf{w}^\top \Sigma \mathbf{w}}, \] where \(\boldsymbol{\mu}\) collects annualized per-asset expected returns, \(\Sigma\) is the covariance matrix of monthly returns, and \(\sqrt{12}\) scales the standard deviation to an annual basis. Both \(\mu_P\) and \(\sigma_P\) depend on the data only through \(\boldsymbol{\mu}\) and \(\Sigma\), so we pre-compute those two objects once and let the optimizer search over \(\mathbf{w}\).
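As a sanity check on the matrix expressions, here is a two-asset example with made-up numbers; the quadratic form agrees with the hand expansion \(w_1^2\sigma_1^2 + 2w_1w_2\sigma_{12} + w_2^2\sigma_2^2\):

```python
import numpy as np

# Two hypothetical assets: annualized means and a monthly covariance matrix.
w_demo = np.array([0.6, 0.4])
mu_demo = np.array([0.08, 0.12])             # annualized expected returns
Sigma_demo = np.array([[0.0020, 0.0008],
                       [0.0008, 0.0045]])    # monthly covariance matrix

mu_P = w_demo @ mu_demo
sigma_P = np.sqrt(12 * w_demo @ Sigma_demo @ w_demo)  # sqrt(12) annualizes the std dev
print(f'mu_P = {mu_P:.4f}, sigma_P = {sigma_P:.4f}')  # prints mu_P = 0.0960, sigma_P = 0.1479
```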
We proxy \(r_f\) with the 13-week Treasury Bill yield (^IRX from Yahoo Finance), reported as an annualized percentage, averaged over the training window and divided by 100 to convert to a decimal. \(\boldsymbol{\mu}\) is the vector of sample monthly means scaled by 12 and \(\Sigma\) is the sample covariance matrix of monthly returns.
irx = (yf
.download(['^IRX'], start=start_date, end=end_date, auto_adjust=False,
progress=False, multi_level_index=False)['Adj Close']
.dropna())
r_f = irx.mean() / 100
cov = ret.cov()
mu = ret.mean() * 12

The Sharpe ratio is \(\mathit{SR} = (\mu_P - r_f)/\sigma_P.\) All three quantities are on an annual scale: mu is the vector of monthly means scaled by 12, r_f is an annualized decimal rate, and the factor of 12 inside the square root scales the monthly portfolio variance \(\mathbf{w}^\top \Sigma \mathbf{w}\) to an annual basis.
def Sharpe(w, cov, mu, r_f):
    mu_P = w @ mu
    sigma_P = np.sqrt(12 * w @ cov @ w)
    return (mu_P - r_f) / sigma_P

Finding the Optimal Portfolio
The portfolio with the maximum Sharpe ratio is found by minimizing -Sharpe. We consider two scenarios: one with no restrictions on individual weights and one that rules out short positions entirely. When bounds are imposed, bounds is a list of (min, max) pairs — one per asset — that tells the optimizer the allowable range for each weight.
Both scenarios share a common equality constraint: the weights must sum to one. We define it once here and reuse it in each call to minimize.
n = len(tickers)
x0 = np.ones(n) / n
cons = {'type': 'eq', 'fun': lambda w: w.sum() - 1}

Why We Restrict Weights
Unconstrained mean-variance optimization has a well-known practical weakness: extreme sensitivity to its inputs. Small errors in estimated expected returns get amplified into large, offsetting long and short positions — a phenomenon Michaud (1989) called error maximization. Because expected returns and covariances must be estimated from finite samples, they are inevitably noisy. The optimizer treats genuine signal and statistical artifacts identically, so the resulting weights can look extreme even when the underlying data are fairly ordinary.
Weight constraints are the standard remedy. By limiting how large any position can be, we prevent the optimizer from over-expressing its confidence in noisy signals. In this sense constraints act as a regularizer, much like ridge regression in statistics: they accept a small loss of in-sample optimality in exchange for a much more stable, diversified portfolio that is less likely to blow up when estimates turn out to be wrong. Jagannathan and Ma (2003) formalize this intuition: they show that imposing the short-sale constraint is mathematically equivalent to shrinking extreme elements of the sample covariance matrix toward zero, which explains why the constraint improves out-of-sample performance even in cases where the true optimal portfolio requires some shorting. There are also practical reasons to impose constraints: many institutional investors are legally prohibited from shorting, short positions carry borrowing costs and margin requirements, and unconstrained portfolios tend to generate high turnover because small changes in inputs can produce large swings in optimal weights.
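The sensitivity is easy to demonstrate with synthetic inputs (every number below is made up): perturbing the expected-return vector by only half a percentage point can move the unconstrained tangency weights materially when assets are highly correlated.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic universe: seven highly correlated assets make the problem
# ill-conditioned. All numbers here are invented for illustration.
n_syn = 7
corr = np.full((n_syn, n_syn), 0.8) + 0.2 * np.eye(n_syn)
vols = np.full(n_syn, 0.18)
Sigma_syn = corr * np.outer(vols, vols)
mu_syn = np.linspace(0.06, 0.12, n_syn)
rf_syn = 0.02

def tangency(m, S, rf):
    raw = np.linalg.solve(S, m - rf)   # Sigma^{-1}(mu - r_f), no explicit inverse
    return raw / raw.sum()

w_base = tangency(mu_syn, Sigma_syn, rf_syn)
w_pert = tangency(mu_syn + rng.normal(0, 0.005, n_syn), Sigma_syn, rf_syn)  # tiny noise

print(np.round(w_base, 2))
print(np.round(w_pert, 2))
print('largest weight change:', np.abs(w_pert - w_base).max())
```

This is exactly the "error maximization" at work: the noise is small relative to the spread in expected returns, yet the optimizer re-expresses it as large shifts in positions.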
Scenario 1 — unconstrained: individual weights are unrestricted, so the optimizer is free to build large long and short positions.
res_unc = minimize(lambda w: -Sharpe(w, cov, mu, r_f), x0,
constraints=cons)
weights_unc = 100*pd.DataFrame({'Weight (%)': res_unc.x}, index=tickers).round(4)
weights_unc.index.name = 'Ticker'
weights_unc

| Ticker | Weight (%) |
|---|---|
| XLF | 4.06 |
| XLE | -34.67 |
| XLU | -13.04 |
| XLI | 59.99 |
| XLP | 39.91 |
| XLV | 13.49 |
| XLK | 30.26 |
Scenario 2 — no short-selling: each weight is bounded to \([0,\, 1]\), ruling out short positions entirely.
bounds_long = [(0, 1)] * n
res_long = minimize(lambda w: -Sharpe(w, cov, mu, r_f), x0,
constraints=cons, bounds=bounds_long)
weights_long = 100*pd.DataFrame({'Weight (%)': res_long.x}, index=tickers).round(4)
weights_long.index.name = 'Ticker'
weights_long

| Ticker | Weight (%) |
|---|---|
| XLF | 0.00 |
| XLE | 0.00 |
| XLU | 0.00 |
| XLI | 43.74 |
| XLP | 21.59 |
| XLV | 22.28 |
| XLK | 12.39 |
Out-of-Sample Performance
We now compare both portfolios against SPY over a period entirely outside the training window. This lets us see whether imposing the short-sale constraint — moving from unconstrained to long-only — helps or hurts realized performance.
test_start = '2025-01-01'
test_end = '2026-02-26'
ret_test = (yf
.download(tickers, start=test_start, end=test_end, auto_adjust=False, progress=False)['Adj Close']
.pct_change()
.dropna()
)
spy_test = (yf
.download(['SPY'], start=test_start, end=test_end, auto_adjust=False, progress=False, multi_level_index=False)['Adj Close']
.pct_change()
.dropna()
)
irx_test = (yf
.download(['^IRX'], start=test_start, end=test_end, auto_adjust=False,
progress=False, multi_level_index=False)['Adj Close']
.dropna())
r_f_test = irx_test.mean() / 100

We use daily (not monthly) test returns to get finer resolution on the performance path. We apply each set of fixed weights to each day’s return and chain the results into cumulative growth of \(\$1.\)
w_ew = np.ones(n) / n
cum_unc = (1 + (ret_test[tickers] * res_unc.x).sum(axis=1)).cumprod()
cum_long = (1 + (ret_test[tickers] * res_long.x).sum(axis=1)).cumprod()
cum_ew = (1 + (ret_test[tickers] * w_ew).sum(axis=1)).cumprod()
cum_spy = (1 + spy_test).cumprod()
fig, ax = plt.subplots()
cum_unc.plot(ax=ax, label='Unconstrained')
cum_long.plot(ax=ax, label='Long-Only')
cum_ew.plot(ax=ax, label='Equal-Weight (1/N)')
cum_spy.plot(ax=ax, label='SPY')
ax.set_title('Out-of-Sample Performance: Jan 2025 – Feb 2026')
ax.set_ylabel('Growth of $1')
ax.legend()
plt.show()

Since the test returns are daily, we scale the mean by 252 and the standard deviation by \(\sqrt{252}\) to put both on an annual basis.
def oos_stats(daily_rets, rf):
    ann_ret = daily_rets.mean() * 252
    sigma = daily_rets.std() * np.sqrt(252)
    sr = (ann_ret - rf) / sigma
    tot = (1 + daily_rets).prod() - 1
    return {'Ann. Return': f'{ann_ret:.1%}', 'Ann. Vol': f'{sigma:.1%}',
            'Sharpe': f'{sr:.2f}', 'Total Return': f'{tot:.1%}'}
rets = {
    'Unconstrained': (ret_test[tickers] * res_unc.x).sum(axis=1),
    'Long-Only': (ret_test[tickers] * res_long.x).sum(axis=1),
    'Equal-Weight': (ret_test[tickers] * w_ew).sum(axis=1),
    'SPY': spy_test,
}
pd.DataFrame({k: oos_stats(v, r_f_test) for k, v in rets.items()}).T

| Portfolio | Ann. Return | Ann. Vol | Sharpe | Total Return |
|---|---|---|---|---|
| Unconstrained | 20.0% | 17.7% | 0.90 | 23.2% |
| Long-Only | 21.8% | 14.7% | 1.21 | 26.5% |
| Equal-Weight | 20.0% | 14.5% | 1.10 | 24.0% |
| SPY | 17.8% | 18.7% | 0.74 | 20.0% |
Interpreting the Results
The test window is the honest report card: weights are fixed at their in-sample values and applied to returns the optimizer never saw. Both portfolios were optimized on the training sample, so in-sample performance tells us little. The real question is whether the constraints imposed for stability actually help when markets do not cooperate with our estimates.
In our test window the long-only portfolio outperforms the unconstrained one, consistent with the weight-constraint intuition: the unconstrained optimizer entered the test period holding large leveraged positions built on estimated parameters, and when conditions shifted, those concentrated bets weighed on performance. Two caveats apply. First, fourteen months is a short window — standard errors on Sharpe ratios over such a horizon are large enough that the ranking could easily reverse in another period. Second, with a different sector universe or a different training cutoff the result can go the other way. DeMiguel et al. (2009) make this precise: they evaluate fourteen mean-variance strategies across multiple datasets and find that none consistently beats the naive \(1/N\) portfolio on a risk-adjusted basis. That finding holds on average across many settings, not in any single test window. We include \(1/N\) precisely because it is such a robust benchmark — though equal-weighting requires periodic rebalancing as prices drift, generating turnover and transaction costs that SPY, being market-cap weighted, largely avoids.
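The rebalancing point is concrete even in a toy setting. Start an equal-weight portfolio at 50/50 and let one asset return 20% while the other is flat (made-up numbers); the weights drift away from equal and must be traded back:

```python
import numpy as np

# Why equal-weighting needs rebalancing: begin 50/50, then apply one
# period of returns (made-up growth factors) and measure the drift.
w0 = np.array([0.5, 0.5])
gross = np.array([1.20, 1.00])                 # hypothetical growth factors

values = w0 * gross
w_drift = values / values.sum()                # weights after the price drift
turnover = np.abs(w0 - w_drift).sum() / 2      # one-way turnover to restore 50/50
print(np.round(w_drift, 4), round(float(turnover), 4))
```

The winner drifts to about 54.5% of the portfolio, and restoring equal weights requires one-way turnover of about 4.5% of portfolio value, which is why \(1/N\) is cheap but not free.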
SPY is a useful reality check. Passive index funds set a high bar: they require no estimation, incur minimal costs, and capture the long-run equity premium without the risk of misspecification. The difficulty of beating a passive index out of sample is one of the strongest empirical arguments for low-cost passive investing as the default recommendation for most investors.
Practice Problems
Problem 1 Compute the in-sample Sharpe ratio of the equal-weight portfolio and compare it to the unconstrained and long-only optimized portfolios. Which portfolio has the highest in-sample Sharpe ratio, and why is this ranking not surprising?
Solution
w_ew = np.ones(n) / n
pd.DataFrame(
    {'In-Sample Sharpe': {
        'Unconstrained': Sharpe(res_unc.x, cov, mu, r_f),
        'Long-Only': Sharpe(res_long.x, cov, mu, r_f),
        'Equal-Weight': Sharpe(w_ew, cov, mu, r_f),
    }}
).round(4)

| Portfolio | In-Sample Sharpe |
|---|---|
| Unconstrained | 0.9074 |
| Long-Only | 0.8111 |
| Equal-Weight | 0.6286 |
Problem 2 Re-run the long-only optimization with a stricter diversification constraint: no single ETF can receive more than 30% of the portfolio. Which ETFs hit the 30% ceiling, and how do the resulting weights compare to the unconstrained long-only portfolio?
Solution
bounds_cap = [(0, 0.30)] * n
res_cap = minimize(lambda w: -Sharpe(w, cov, mu, r_f), x0,
constraints=cons, bounds=bounds_cap)
weights_cap = 100 * pd.DataFrame({'Weight (%)': res_cap.x}, index=tickers).round(4)
weights_cap.index.name = 'Ticker'
weights_cap

| Ticker | Weight (%) |
|---|---|
| XLF | 0.00 |
| XLE | 0.00 |
| XLU | 0.00 |
| XLI | 30.00 |
| XLP | 29.00 |
| XLV | 21.96 |
| XLK | 19.04 |
Any ETF whose long-only weight (weights_long) exceeded 30% is now capped at 30%, with the freed-up weight redistributed to the other assets by the optimizer. Here only XLI, previously at 43.74%, hits the ceiling. The 30% cap forces a more diversified allocation and prevents any single sector from dominating the portfolio, at the cost of a somewhat lower in-sample Sharpe ratio compared to the uncapped long-only solution.
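That cost can be quantified in a self-contained way: because tightening the bounds shrinks the feasible set, the optimal in-sample Sharpe ratio can only fall (weakly). The sketch below uses hypothetical inputs with an annualized covariance matrix (so no factor of 12 is needed):

```python
import numpy as np
from scipy.optimize import minimize

# Compare the maximum Sharpe ratio with and without a 30% weight cap.
# All inputs are made up; the fourth asset is deliberately attractive
# so that the cap binds.
mu_s = np.array([0.06, 0.07, 0.08, 0.16])      # hypothetical annualized means
Sigma_s = np.diag([0.04, 0.04, 0.04, 0.05])    # hypothetical annualized covariance
rf_s = 0.02
n_s = len(mu_s)

def neg_sharpe(w):
    return -(w @ mu_s - rf_s) / np.sqrt(w @ Sigma_s @ w)

cons_s = {'type': 'eq', 'fun': lambda w: w.sum() - 1}
x0_s = np.ones(n_s) / n_s

sr = {}
for cap in (1.0, 0.30):
    res = minimize(neg_sharpe, x0_s, constraints=cons_s, bounds=[(0, cap)] * n_s)
    sr[cap] = -res.fun
print(sr)  # the capped optimum is strictly lower because the cap binds
```

In the notebook's own setting the analogous comparison is Sharpe(res_cap.x, cov, mu, r_f) versus Sharpe(res_long.x, cov, mu, r_f).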