Downloading Financial Data

The Package yfinance

We’ll use the yfinance package to download stock price data from Yahoo! Finance. This Python library provides a simple interface for retrieving historical market data for stocks, indices, and other financial instruments. You need to install it before using it for the first time.

Anaconda

If you are using Anaconda, you need to install yfinance before you use it for the first time. In an Anaconda command prompt, type:

pip install yfinance --upgrade

or in a Jupyter cell you can type:

!pip install yfinance --upgrade --quiet

Google Colab

In Google Colab, yfinance is pre-installed. If you ever need to upgrade to the latest version, run this in a code cell:

!pip install yfinance --upgrade --quiet
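
After installing or upgrading, it can be useful to confirm which version you ended up with, since some options used below (such as multi_level_index) may not exist in older releases. The following is a minimal sketch that uses only the Python standard library; the helper name installed_version is my own and is not part of yfinance.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints a version string if yfinance is installed, otherwise None.
print(installed_version("yfinance"))
```

If this prints None, the package is not installed in the environment that your notebook is using.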

Importing the Libraries

You can import the libraries as follows.

import matplotlib.pyplot as plt
import seaborn as sns
import yfinance as yf

sns.set_theme()

Data

To estimate asset pricing models, we need data on stock returns for both a particular security and the market. An easy way to do this is to download the data from the web. We start by choosing the security that we want to analyze and the date from which we want to start the analysis.

To get the data, we use the function download() in yfinance. For example, to get the data for Microsoft (ticker: MSFT) we type:

yf.download(tickers='MSFT', auto_adjust=False, progress=False, multi_level_index=False)

	Adj Close	Close	High	Low	Open	Volume
Date
2026-02-27	392.739990	392.739990	396.820007	389.880005	390.880005	51367200
2026-03-02	398.549988	398.549988	401.190002	390.630005	392.859985	35474900
2026-03-03	403.929993	403.929993	406.700012	392.670013	393.140015	38199200
2026-03-04	405.200012	405.200012	411.029999	400.309998	401.269989	35808000
2026-03-05	410.679993	410.679993	411.609985	404.399994	404.420013	39001300
2026-03-06	408.959991	408.959991	413.049988	408.510010	409.200012	31123900
2026-03-09	409.410004	409.410004	410.209991	403.500000	404.920013	30131900
2026-03-10	405.760010	405.760010	410.200012	402.929993	410.029999	31706400
2026-03-11	404.880005	404.880005	409.010010	401.589996	405.570007	25512100
2026-03-12	401.859985	401.859985	406.119995	401.709991	404.630005	27263900
2026-03-13	395.549988	395.549988	404.799988	394.250000	401.000000	26848000
2026-03-16	399.950012	399.950012	400.630005	394.790009	398.070007	27733700
2026-03-17	399.410004	399.410004	404.399994	397.750000	400.269989	26228300
2026-03-18	391.790009	391.790009	398.000000	391.000000	397.130005	25908500
2026-03-19	389.019989	389.019989	392.489990	387.059998	390.100006	25138800
2026-03-20	381.869995	381.869995	387.000000	380.119995	386.790009	50853200
2026-03-23	383.000000	383.000000	387.209991	381.679993	383.899994	29680100
2026-03-24	372.739990	372.739990	382.470001	371.850006	382.359985	42733600
2026-03-25	371.040009	371.040009	377.059998	369.630005	376.920013	31181200
2026-03-26	365.970001	365.970001	374.719910	365.190002	370.815002	36436874

The output of the function download() is a pandas DataFrame. Therefore, all the operations that pandas provides work on the data we just imported.
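
To make this concrete, here is a small sketch of a few standard pandas operations. So that it runs without an internet connection, I rebuild a tiny dataframe by hand from the first four rows of the output above instead of calling download(); the variable names are only for illustration.

```python
import pandas as pd

# Rows copied by hand from the MSFT output above, so the example runs offline.
data = pd.DataFrame(
    {
        "Close": [392.739990, 398.549988, 403.929993, 405.200012],
        "Volume": [51367200, 35474900, 38199200, 35808000],
    },
    index=pd.to_datetime(["2026-02-27", "2026-03-02", "2026-03-03", "2026-03-04"]),
)
data.index.name = "Date"

last_two = data.tail(2)                  # the last two trading days
max_close = data["Close"].max()          # highest closing price in the sample
daily_ret = data["Close"].pct_change()   # simple daily returns; the first value is NaN
march = data.loc["2026-03"]              # all rows dated March 2026

print(last_two)
print(max_close)
print(daily_ret)
print(march)
```

The same calls work unchanged on the full dataframe returned by download().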

Recent versions of yfinance adjust prices for splits and dividends by default, so the Close column already contains adjusted prices. To get the unadjusted price and the adjusted price as separate columns, we need to set auto_adjust=False when downloading data.

I use the option progress=False to suppress the status bar when downloading the data. The code runs just fine without it.

The option multi_level_index=False flattens the column names to a single level. Without it, yfinance returns columns with two levels (price type and ticker), so you would need to write df['Close']['MSFT'] instead of simply df['Close'].
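
The difference between the two layouts is easier to see on a small example. The sketch below builds a two-level dataframe by hand, using three rows of Close and Open prices from the output above, rather than downloading it. The level names Price and Ticker are an assumption about how yfinance labels the levels; the selection logic does not depend on them beyond the droplevel call.

```python
import pandas as pd

# Two column levels: price type on top, ticker underneath.
columns = pd.MultiIndex.from_product(
    [["Close", "Open"], ["MSFT"]], names=["Price", "Ticker"]
)
two_level = pd.DataFrame(
    [
        [372.739990, 382.359985],
        [371.040009, 376.920013],
        [365.970001, 370.815002],
    ],
    index=pd.to_datetime(["2026-03-24", "2026-03-25", "2026-03-26"]),
    columns=columns,
)

# With two levels, we have to name both the price type and the ticker.
close_nested = two_level["Close"]["MSFT"]

# Dropping the ticker level gives the flat layout, where df['Close'] is enough.
flat = two_level.droplevel("Ticker", axis=1)
close_flat = flat["Close"]

print(list(two_level.columns))
print(list(flat.columns))
```

Both selections return the same prices; the flat layout just saves typing when there is only one ticker.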

Also, note that by default, the dataset only includes data for the last month. We can specify an arbitrary starting date by adding, for example, start='2012-01-01'.

yf.download(tickers='MSFT', start='2012-01-01', auto_adjust=False, progress=False, multi_level_index=False)

	Adj Close	Close	High	Low	Open	Volume
Date
2012-01-03	20.917698	26.770000	26.959999	26.389999	26.549999	64731500
2012-01-04	21.409966	27.400000	27.469999	26.780001	26.820000	80516100
2012-01-05	21.628754	27.680000	27.730000	27.290001	27.379999	56081400
2012-01-06	21.964752	28.110001	28.190001	27.530001	27.530001	99455500
2012-01-09	21.675644	27.740000	28.100000	27.719999	28.049999	59706800
...	...	...	...	...	...	...
2026-03-20	381.869995	381.869995	387.000000	380.119995	386.790009	50853200
2026-03-23	383.000000	383.000000	387.209991	381.679993	383.899994	29680100
2026-03-24	372.739990	372.739990	382.470001	371.850006	382.359985	42733600
2026-03-25	371.040009	371.040009	377.059998	369.630005	376.920013	31181200
2026-03-26	365.970001	365.970001	374.719910	365.190002	370.815002	36436874

3578 rows × 6 columns

Close v/s Adjusted Close Price

The data query produces six columns. When estimating asset pricing models we will be interested in using the adjusted close (Adj Close) column since the price series is adjusted for stock splits and dividends. Let’s compare how the Adj Close differs from the Close price.

For this, let’s extract these two series and store them in a dataframe called df. The method df.loc[:, [column_label_1, column_label_2, ..., column_label_n]] selects multiple columns from the dataframe. In our example we want to select loc[:, ['Close', 'Adj Close']].

df = yf.download(tickers='MSFT', start='2012-01-01', auto_adjust=False, progress=False, multi_level_index=False).loc[:, ['Close', 'Adj Close']]

Lines of Python code can get very long. A common way to split a long line into several shorter ones is to wrap the expression in parentheses and start a new line at each dot.

df = (yf
      .download(tickers='MSFT', start='2012-01-01', auto_adjust=False, progress=False, multi_level_index=False)
      .loc[:, ['Close', 'Adj Close']]
      )

This code does exactly the same thing as the one-line version we wrote originally, but it might be easier to read and understand.

We can now compute the cumulative growth of both series (cum_pct) by applying to each column in df a function that divides every price by the first price, so that each series starts at 1. The apply() method applies a function along an axis of a DataFrame or a Series. The function can be a built-in Python function or a custom function that you define.

First, we define a function that takes a series x as input and divides the series by its initial value x.iloc[0].¹

¹ In the original code, I wrote x / x[0]. However, using x[0] to access a Series by position is being deprecated, and pandas recommends iloc for accessing a row by its position.

def normalize(x):
    return x / x.iloc[0]
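
To see what normalize does on concrete numbers, here is a small sketch that applies it to the first three rows of the 2012 output shown earlier, typed in by hand so that it runs offline. Subtracting 1 and multiplying by 100 turns the normalized series into a cumulative return in percent; the names growth and cum_return_pct are only for illustration.

```python
import pandas as pd


def normalize(x):
    # Divide every value by the first one, so the series starts at 1.
    return x / x.iloc[0]


# First three rows of the MSFT output from 2012, copied by hand.
prices = pd.DataFrame(
    {
        "Close": [26.770000, 27.400000, 27.680000],
        "Adj Close": [20.917698, 21.409966, 21.628754],
    },
    index=pd.to_datetime(["2012-01-03", "2012-01-04", "2012-01-05"]),
)

growth = prices.apply(normalize)      # each column now starts at 1.0
cum_return_pct = (growth - 1) * 100   # cumulative return in percent

print(growth)
print(cum_return_pct)
```

Because apply() works column by column, each price series is divided by its own first value.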

Then, we apply this function to our dataframe and plot it.

cum_pct = df.apply(normalize)
cum_pct.plot()
plt.title('Close v/s Adjusted Close of Microsoft')
plt.ylabel('Cumulative Return')
plt.show()

According to the graph, the impact of dividends is significant. Another way to see this is by actually computing the dividends from the two series. We will do this later.

Practice Problems

Problem 1 Plot the Close price for Nvidia Corporation (Ticker: NVDA) from July 1, 2015 until May 1, 2023.

Solution

df = yf.download(tickers='NVDA', start='2015-07-01', end='2023-05-01', auto_adjust=False, progress=False, multi_level_index=False).loc[:,['Close']]
df.plot()
plt.title('NVDA Stock Price')
plt.show()

Problem 2 Plot the S&P 500 (Ticker: ^GSPC) from January 1, 2018 until May 1, 2023. Since the S&P 500 is an index, you can use either the Close or the Adj Close; it does not make a difference.

Solution

df = yf.download(tickers='^GSPC', start='2018-01-01', end='2023-05-01', auto_adjust=False, progress=False, multi_level_index=False).loc[:,['Close']]
df.plot()
plt.title('S&P 500')
plt.show()

Problem 3 Imagine that you invest $100 in each of NVDA and AMD on January 1, 2015. Plot the evolution of each investment until May 1, 2023.

Solution

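To plot the evolution of each investment, all we need to do is normalize the Adj Close series of both stocks to 100 at the first available date on or after January 1, 2015. We can use the same function we built before and multiply the result by 100. We use Adj Close instead of Close to account for stock splits and dividends. Note that we do not set multi_level_index=False here, since with two tickers we want to keep the ticker level in the column names.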

df = (yf
      .download(tickers=['NVDA', 'AMD'], start='2015-01-01', end='2023-05-01', auto_adjust=False, progress=False)
      .loc[:, 'Adj Close']
      .apply(normalize)
      )*100
df.plot()
plt.title('Evolution of a $100 Investment in NVDA and AMD')
plt.ylabel('Portfolio Value ($)')
plt.show()
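
To see why .loc[:, 'Adj Close'] followed by normalize works with several tickers, here is an offline sketch of the same chain. The tickers AAA and BBB and all the prices are made up for illustration and are not real market data; the level names Price and Ticker are also an assumption.

```python
import pandas as pd


def normalize(x):
    # Divide every value by the first one, so the series starts at 1.
    return x / x.iloc[0]


# Hypothetical two-ticker data with the two-level column layout.
columns = pd.MultiIndex.from_product(
    [["Adj Close", "Close"], ["AAA", "BBB"]], names=["Price", "Ticker"]
)
raw = pd.DataFrame(
    [
        [10.0, 50.0, 10.5, 52.0],
        [11.0, 45.0, 11.5, 47.0],
        [12.5, 55.0, 13.0, 57.0],
    ],
    index=pd.to_datetime(["2015-01-02", "2015-01-05", "2015-01-06"]),
    columns=columns,
)

# Selecting 'Adj Close' leaves one column per ticker.
adj = raw.loc[:, "Adj Close"]

# Each column is normalized by its own first price and scaled to $100.
value = adj.apply(normalize) * 100

print(list(adj.columns))
print(value)
```

Each column of value starts at 100 and then tracks how the corresponding $100 investment would have evolved.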