Brownian Motion

Brownian motion is the central object of continuous-time stochastic calculus. It arises naturally as the limit of a symmetric random walk when the step size and time increment are both shrunk to zero in a coordinated way, and it serves as the building block for stochastic differential equations and derivative pricing models. These notes follow Shreve (2004) closely.

Shreve, Steven E. 2004. Stochastic Calculus for Finance II: Continuous-Time Models. Springer Finance. Springer.

These notes construct Brownian motion from first principles. We begin by revisiting the symmetric random walk and introducing a scaled version whose variance over any interval [s, t] equals t - s. A characteristic function argument shows that, as the number of steps grows, the terminal value of this scaled walk converges in distribution to a normal random variable. Taking this limit for all times simultaneously yields a continuous-time process—Brownian motion—whose formal definition we then state and explore.

A key theme is that Brownian paths are far rougher than the smooth functions encountered in ordinary calculus. We make this precise by studying total variation and quadratic variation. For smooth functions both are well-behaved: total variation is finite and quadratic variation is zero. Brownian motion has the opposite character: its quadratic variation over [0, t] equals t almost surely, its total variation is almost surely infinite, and its paths are nowhere differentiable. These properties are not pathologies—they are exactly what makes Brownian motion the right model for cumulative noise, and they are the source of the corrections that appear in Itô’s lemma.

The Simple Random Walk Again

In this section, we revisit the symmetric random walk discussed in previous lectures. This process models the outcome of repeatedly flipping a fair coin: you gain $1 for heads and lose $1 for tails. Each flip is independent and has expected value zero, so the cumulative gains form a martingale.

Let \{S_{n}\} denote the position after n steps, defined recursively by S_{n+1} = S_{n} + X_{n+1}, where each X_{n+1} is an independent random variable taking values +1 or -1 with probability 1/2 each. Thus, the sequence \{X_{n}\} is iid with \operatorname{E}(X_{n}) = 0.5 \times 1 + 0.5 \times (-1) = 0, and \operatorname{V}(X_{n}) = 0.5 \times (1 - 0)^2 + 0.5 \times (-1 - 0)^2 = 1.

For any 0 \le m < n, we have S_{n} = S_{m} + \sum_{i = m + 1}^{n} X_{i}, so the increment S_{n} - S_{m} depends only on the coin flips between times m+1 and n. Moreover, increments over disjoint intervals are independent.

The process \{S_{n}\} is a martingale: for m < n, \operatorname{E}(S_{n} \mid \mathcal{F}_{m}) = S_{m} + \operatorname{E}\left(\sum_{i = m + 1}^{n} X_{i} \mid \mathcal{F}_{m}\right) = S_{m}, since the future coin flips are independent of the past.

The variance of the increment is \operatorname{V}(S_{n} - S_{m}) = \operatorname{V}\left(\sum_{i = m + 1}^{n} X_{i}\right) = \sum_{i = m + 1}^{n} \operatorname{V}(X_{i}) = n - m.

Finally, the quadratic variation of the simple symmetric random walk up to time n is [S, S]_{n} = \sum_{i = 1}^{n} (S_{i} - S_{i -1})^2 = \sum_{i = 1}^{n} X_{i}^2 = n, since each X_{i}^2 = 1. Unlike variance, quadratic variation is computed path-by-path, not as an average over many realizations.
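The path-by-path nature of quadratic variation is easy to check numerically. The sketch below (function name `random_walk_qv` is ours, purely for illustration) simulates one realization of the walk and verifies that [S, S]_{n} = n on every path, no averaging required:

```python
import random

def random_walk_qv(n, seed=0):
    """Simulate a symmetric random walk for n steps and return (S_n, [S, S]_n).

    The quadratic variation is computed path-by-path: each squared
    step X_i^2 equals 1, so [S, S]_n equals n on every realization.
    """
    rng = random.Random(seed)
    s, qv = 0, 0
    for _ in range(n):
        x = rng.choice([1, -1])  # fair coin: +1 or -1
        s += x
        qv += x * x              # squared increment, always 1
    return s, qv

s, qv = random_walk_qv(1000)
# qv == 1000 regardless of the seed, while s varies from path to path
```

Changing the seed changes the terminal position `s` but never the quadratic variation `qv` — exactly the path-by-path determinism noted above.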

A Scaled Random Walk

We now embed the symmetric random walk into a finite time interval and scale it so that the variance over any interval [0, t] equals t.

Let \Delta t = T / n, and define discrete time points: t_{0} = 0, t_{1} = \Delta t, t_{2} = 2 \Delta t, …, so that t_{n} = T. At each step, instead of moving by +1 or -1, we move by +\sqrt{\Delta t} or -\sqrt{\Delta t}. Thus, we define the scaled random walk as B_{t_{m}}^{(n)} = \sum_{j = 1}^{m} \sqrt{\Delta t} \, X_{j}, with B_{t_{0}}^{(n)} = 0, where each X_{j} is independent and takes values +1 or -1 with probability 1 / 2.

The expected value and variance are: \begin{aligned} \operatorname{E}(B_{t_{m}}^{(n)}) &= \sum_{j = 1}^{m} \sqrt{\Delta t} \, \operatorname{E}(X_{j}) = 0, \\ \operatorname{V}(B_{t_{m}}^{(n)}) &= \sum_{j = 1}^{m} \Delta t \, \operatorname{V}(X_{j}) = m \Delta t = t_{m}. \end{aligned}

The quadratic variation up to time 0 \le t_{m} \le T is [B^{(n)}, B^{(n)}]_{t_{m}} = \sum_{j = 0}^{m - 1} (B_{t_{j + 1}}^{(n)} - B_{t_{j}}^{(n)})^2 = m \Delta t = t_{m}.
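A quick Monte Carlo check of these moments (the helper name `scaled_walk_terminal` and the particular parameter values are our own choices): simulate many independent realizations of B_{T}^{(n)} and confirm that the sample mean is near 0 and the sample variance near T.

```python
import random

def scaled_walk_terminal(T, n, rng):
    """One realization of B_T^(n): n steps of size ±sqrt(Δt), Δt = T/n."""
    dt = T / n
    step = dt ** 0.5
    return sum(step if rng.random() < 0.5 else -step for _ in range(n))

rng = random.Random(42)
T, n, n_paths = 1.0, 200, 20000
samples = [scaled_walk_terminal(T, n, rng) for _ in range(n_paths)]

mean = sum(samples) / n_paths
var = sum((x - mean) ** 2 for x in samples) / n_paths
# mean ≈ 0 and var ≈ T, matching E(B_T^(n)) = 0 and V(B_T^(n)) = T
```

Note that the quadratic variation of the scaled walk needs no simulation at all: each squared step is exactly \Delta t, so [B^{(n)}, B^{(n)}]_{t_{m}} = m \Delta t deterministically.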

This scaling ensures that as n increases, the process has variance proportional to elapsed time, matching the behavior of a Brownian motion.

Limiting Distribution of the Scaled Random Walk

Remember that T = t_{n} = n \Delta t. To analyze the limiting distribution of B_{T}^{(n)} as n \to \infty (i.e., as \Delta t \to 0), we use the characteristic function, which uniquely determines the distribution of a random variable. The characteristic function of X is \phi_{X}(u) = \operatorname{E}[e^{i u X}], where i is the imaginary unit (i^{2} = -1). For a normal random variable X \sim \mathcal{N}(\mu, \sigma^{2}), the characteristic function is \phi_{X}(u) = e^{i u \mu - \frac{1}{2} u^{2} \sigma^{2}}.

Now, consider the scaled random walk B_{T}^{(n)} = \sum_{j = 1}^{n} \sqrt{\Delta t} \, X_{j}, where each X_{j} is independent and takes values \pm 1 with probability 1 / 2. Using independence, expanding each exponential to second order in u \sqrt{\Delta t}, and substituting \Delta t = T / n, its characteristic function is \begin{aligned} \operatorname{E}(e^{i u B_{T}^{(n)}}) & = \operatorname{E}\left(e^{i u \sum_{j = 1}^{n} \sqrt{\Delta t}\, X_{j}}\right) = \prod_{j = 1}^{n} \operatorname{E}\left(e^{i u \sqrt{\Delta t}\, X_{j}}\right) = \left(\frac{e^{i u \sqrt{\Delta t}} + e^{-i u \sqrt{\Delta t}}}{2}\right)^{n} \\ & \approx \left(\frac{1 + i u \sqrt{\Delta t} - \frac{1}{2} u^{2} \Delta t + 1 - i u \sqrt{\Delta t} - \frac{1}{2} u^{2} \Delta t}{2}\right)^{n} \\ & = \left(1 - \frac{u^{2} T}{2 n}\right)^{n}. \end{aligned}

Therefore, \lim_{n \to \infty} \operatorname{E}[e^{i u B_{T}^{(n)}}] = \lim_{n \to \infty} \left(1 - \frac{u^{2} T}{2 n}\right)^{n} = e^{-\frac{1}{2} u^2 T}, which is the characteristic function of a normal distribution with mean 0 and variance T. Thus, as n \to \infty, the scaled random walk B_{T}^{(n)} converges in distribution to \mathcal{N}(0, T).
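Since the X_{j} take values \pm 1 with equal probability, the exact characteristic function of B_{T}^{(n)} is (\cos(u \sqrt{\Delta t}))^{n}, and its convergence to e^{-u^{2}T/2} can be checked directly. A small sketch (the function name `scaled_walk_cf` is ours):

```python
import math

def scaled_walk_cf(u, T, n):
    """Exact characteristic function of the scaled walk B_T^(n).

    E[exp(i u sqrt(dt) X)] = (exp(iu√dt) + exp(-iu√dt))/2 = cos(u√dt),
    so by independence the n-step walk has CF cos(u√dt)^n.
    """
    dt = T / n
    return math.cos(u * math.sqrt(dt)) ** n

u, T = 2.0, 1.5
limit = math.exp(-0.5 * u * u * T)  # N(0, T) characteristic function
for n in (10, 100, 10000):
    print(n, scaled_walk_cf(u, T, n), limit)
```

As n grows, the exact value approaches the Gaussian limit, in line with the characteristic-function argument above.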

Brownian Motion

We obtain Brownian motion as the limit of the scaled random walks as n \to \infty, and the limiting process inherits its defining properties from these walks. This leads to the following definition.

Definition: Brownian Motion

A continuous-time stochastic process \{B_{t} \colon t \geq 0\} is called a Brownian motion if it has the following four properties:

  1. B_{0} = 0.

  2. For any finite set of times 0 \le t_{0} < t_{1} < t_{2} < \ldots < t_{n}, the random variables B_{t_{1}} - B_{t_{0}}, B_{t_{2}} - B_{t_{1}}, B_{t_{3}} - B_{t_{2}}, \ldots, B_{t_{n}} - B_{t_{n-1}} are independent.

  3. For any 0 \le s \le t the increment B_{t} - B_{s} \sim \mathcal{N}(0, t - s).

  4. For all \omega in a set of probability one, B_{t}(\omega) is a continuous function of t.

For any t \geq 0 we have \operatorname{E}(B_{t}) = \operatorname{E}(B_{t} - B_{0}) = 0, and for 0 \le s \le t we have \begin{aligned} \operatorname{Cov}(B_{s}, B_{t}) & = \operatorname{Cov}(B_{s}, B_{t} - B_{s} + B_{s}) \\ & = \operatorname{Cov}(B_{s}, B_{t} - B_{s}) + \operatorname{Cov}(B_{s}, B_{s}) \\ & = 0 + s = s. \end{aligned} Thus, for any s, t \geq 0 we have \operatorname{Cov}(B_{s}, B_{t}) = \operatorname{E}(B_{s} B_{t}) = \min(s, t). Together with the Gaussian increments, this covariance structure characterizes Brownian motion.

It is also the basis of one rigorous construction of the process. For 0 \le s \le 1 and 0 \le t \le 1 we have \operatorname{E}(B_{s} B_{t}) = \int_{0}^{1} \mathbf{1}_{[0, s]}(u) \, \mathbf{1}_{[0, t]}(u) \, du = \min(s, t). This relationship establishes an isometry between the Hilbert space \mathcal{H} \subset \mathcal{L}^{2}(\operatorname{P}) of Gaussian random variables with inner product \langle X, Y\rangle = \operatorname{E}[XY], and the Hilbert space \mathcal{L}^{2}[0, 1] with inner product \langle f, g \rangle = \int_{0}^{1} f(u) g(u) \, du.

We can then show that for any 0 \le t \le 1 we have B_{t} = \sum_{k = 0}^{\infty} \langle \phi_{k}, \mathbf{1}_{[0, t]} \rangle Z_{k}, where \{Z_{k} \colon 0 \le k < \infty\} is a sequence of independent \mathcal{N}(0, 1) random variables and \{\phi_{k} \colon 0 \le k < \infty\} is an orthonormal basis of \mathcal{L}^{2}[0, 1]. Using the Haar basis in this expression, one can show that the series on the right converges to a process with continuous paths—that is, it generates a Brownian motion.
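The Parseval identity behind this construction can be checked numerically: for any orthonormal basis, \sum_{k} \langle \phi_{k}, \mathbf{1}_{[0, t]} \rangle^{2} = \|\mathbf{1}_{[0, t]}\|^{2} = t, which is exactly \operatorname{V}(B_{t}). The sketch below uses the cosine basis \phi_{0} = 1, \phi_{k}(u) = \sqrt{2}\cos(k\pi u) rather than the Haar basis (a choice of ours, made because its coefficients have a closed form); the function name `bm_coeff_sum` is illustrative.

```python
import math

def bm_coeff_sum(t, n_terms=200000):
    """Partial sum of <φ_k, 1_[0,t]>² for the cosine basis on L²[0, 1].

    φ_0 = 1 gives coefficient t; for k ≥ 1, φ_k(u) = sqrt(2) cos(kπu)
    gives coefficient sqrt(2) sin(kπt) / (kπ).  By Parseval the full
    sum equals t, the variance of B_t.
    """
    total = t * t  # k = 0 term: (∫_0^t 1 du)² = t²
    for k in range(1, n_terms + 1):
        c = math.sqrt(2) * math.sin(k * math.pi * t) / (k * math.pi)
        total += c * c
    return total

# bm_coeff_sum(t) ≈ t for any t in [0, 1]
```

Replacing each squared coefficient with coefficient × Z_k for independent standard normals Z_k turns this identity into the series representation of B_t itself.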

Total and Quadratic Variation

Total Variation of a Function

To study the variation of a function over an interval, we begin by dividing it into smaller pieces. Consider a partition of the time interval [0, T] given by \Pi = \{t_{0}, t_{1}, \ldots, t_{n}\}, where the partition points satisfy 0 = t_{0} < t_{1} < \ldots < t_{n} = T. The mesh (or norm) of the partition \Pi is defined as the length of the longest subinterval: \|\Pi\| = \max_{j = 0, 1, \ldots, n-1} (t_{j + 1} - t_{j}). As we refine the partition by adding more points, the mesh \|\Pi\| decreases and approaches zero.

The total variation of a function f over the interval [0, T] measures the total “distance traveled” by the function. It is defined as V_{T}(f) = \lim_{\|\Pi\| \to 0} \sum_{j = 0}^{n - 1} |f(t_{j + 1}) - f(t_{j})|. In words, we partition the interval into smaller pieces, sum the absolute changes in f across each piece, and then take the limit as the partition becomes arbitrarily fine.

For differentiable functions, the total variation has a simple integral representation. By the mean-value theorem, for each subinterval [t_{j}, t_{j+1}] there exists a point t_{j}^{*} \in [t_{j}, t_{j + 1}] where the derivative equals the average rate of change: \frac{f(t_{j + 1}) - f(t_{j})}{t_{j + 1} - t_{j}} = f'(t_{j}^{*}). Rearranging and taking absolute values gives \sum_{j = 0}^{n - 1} |f(t_{j + 1}) - f(t_{j})| = \sum_{j = 0}^{n - 1} |f'(t_{j}^{*})| (t_{j + 1} - t_{j}). The right-hand side is a Riemann sum for the integral of |f'(t)|. Taking the limit as the mesh goes to zero yields V_{T}(f) = \lim_{\|\Pi\| \to 0} \sum_{j = 0}^{n - 1} |f'(t_{j}^{*})| (t_{j + 1} - t_{j}) = \int_{0}^{T} |f'(t)| dt. Thus, for differentiable functions, total variation equals the integral of the absolute value of the derivative—a quantity that is always finite when f' is integrable.
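This integral representation is easy to verify numerically. A minimal sketch (the helper name `total_variation` is ours) approximates V_{T}(f) on a uniform partition for f(t) = \sin t on [0, 2\pi], where \int_{0}^{2\pi} |\cos t| \, dt = 4:

```python
import math

def total_variation(f, T, n):
    """Approximate V_T(f): sum |f(t_{j+1}) - f(t_j)| over a uniform partition."""
    ts = [j * T / n for j in range(n + 1)]
    return sum(abs(f(ts[j + 1]) - f(ts[j])) for j in range(n))

T = 2 * math.pi
for n in (10, 100, 10000):
    print(n, total_variation(math.sin, T, n))
# converges to ∫_0^{2π} |cos t| dt = 4 as the mesh shrinks
```

The sum stabilizes quickly because, between turning points of f, the absolute increments simply telescope.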

Quadratic Variation of a Function

While total variation measures the absolute distance traveled by a function, quadratic variation measures the sum of squared changes. For a function f over the interval [0, T], the quadratic variation is defined as [f, f]_{T} = \lim_{\|\Pi\| \to 0} \sum_{j = 0}^{n - 1} (f(t_{j + 1}) - f(t_{j}))^{2}.

For continuously differentiable functions, the quadratic variation vanishes. To see why, we again apply the mean-value theorem: for each subinterval there exists a point t_{j}^{*} \in [t_{j}, t_{j+1}] such that f(t_{j + 1}) - f(t_{j}) = f'(t_{j}^{*}) (t_{j + 1} - t_{j}). Squaring both sides and summing over all subintervals gives \sum_{j = 0}^{n - 1} (f(t_{j + 1}) - f(t_{j}))^{2} = \sum_{j = 0}^{n - 1} (f'(t_{j}^{*}))^{2} (t_{j + 1} - t_{j})^{2}. Since (t_{j + 1} - t_{j}) \le \|\Pi\| for all j, we can factor out one power of the mesh: \sum_{j = 0}^{n - 1} (f(t_{j + 1}) - f(t_{j}))^{2} \le \|\Pi\| \sum_{j = 0}^{n - 1} (f'(t_{j}^{*}))^{2} (t_{j + 1} - t_{j}). The sum on the right is a Riemann sum for \int_{0}^{T} (f'(t))^{2} dt, so it stays bounded as \|\Pi\| \to 0 while the factor \|\Pi\| vanishes. Therefore, [f, f]_{T} \le \lim_{\|\Pi\| \to 0} \left( \|\Pi\| \sum_{j = 0}^{n - 1} (f'(t_{j}^{*}))^{2} (t_{j + 1} - t_{j}) \right) = 0 \cdot \int_{0}^{T} (f'(t))^{2} dt = 0. In other words, for smooth functions the quadratic variation is always zero because the mesh shrinks faster than the Riemann sum can accumulate.
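The bound \|\Pi\| \int_{0}^{T} (f'(t))^{2} dt predicts that halving the mesh halves the quadratic-variation sum. A short check for f(t) = \sin t on [0, 2\pi] (the helper name `quad_variation` is ours):

```python
import math

def quad_variation(f, T, n):
    """Sum of squared increments of f over a uniform partition of [0, T]."""
    ts = [j * T / n for j in range(n + 1)]
    return sum((f(ts[j + 1]) - f(ts[j])) ** 2 for j in range(n))

for n in (10, 100, 1000):
    print(n, quad_variation(math.sin, 2 * math.pi, n))
# shrinks like ||Π|| · ∫ (f')² dt : a 10× finer mesh gives a sum 10× smaller
```

Here \int_{0}^{2\pi} \cos^{2} t \, dt = \pi, so the sum behaves like 2\pi^{2}/n, vanishing as n grows.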

Quadratic Variation of Brownian Motion

Unlike smooth functions whose quadratic variation is zero, Brownian motion has non-trivial quadratic variation that accumulates linearly over time. We now prove that the quadratic variation of Brownian motion over [0, T] converges to T in mean square.

Analysis

Consider a partition \Pi of [0, T] and define the quadratic variation sum: Q_{\Pi} = \sum_{j = 0}^{n - 1} (B_{t_{j + 1}} - B_{t_{j}})^{2}.

Our strategy is to show two things: first, that the expected value of Q_{\Pi} equals T for any partition; second, that the variance of Q_{\Pi} vanishes as the mesh size goes to zero. Together, these facts imply convergence in L^{2}(\operatorname{P}).

Step 1: Expected value of Q_{\Pi}.
Since each increment B_{t_{j+1}} - B_{t_j} is normally distributed with mean zero and variance t_{j+1} - t_j, we have \operatorname{E}(Q_{\Pi}) = \sum_{j = 0}^{n - 1} \operatorname{E}(B_{t_{j + 1}} - B_{t_{j}})^{2} = \sum_{j = 0}^{n - 1} (t_{j + 1} - t_{j}) = t_{n} - t_{0} = T. Thus, regardless of how we partition the interval, the expected quadratic variation is always T.

Step 2: Variance of Q_{\Pi}.
To show that Q_{\Pi} concentrates around its mean as \|\Pi\| \to 0, we compute its variance. For a standard normal random variable Z \sim \mathcal{N}(0,1), we have \operatorname{E}(Z^4) = 3. Since (B_{t_{j+1}} - B_{t_j})/\sqrt{t_{j+1} - t_j} \sim \mathcal{N}(0,1), it follows that \begin{aligned} \operatorname{V}(B_{t_{j + 1}} - B_{t_{j}})^{2} & = \operatorname{E}(B_{t_{j + 1}} - B_{t_{j}})^{4} - \left(\operatorname{E}(B_{t_{j + 1}} - B_{t_{j}})^{2}\right)^{2} \\ & = 3 (t_{j + 1} - t_{j})^{2} - (t_{j + 1} - t_{j})^{2} = 2 (t_{j + 1} - t_{j})^{2}. \end{aligned} Because Brownian increments over disjoint intervals are independent, the variance of their sum equals the sum of their variances: \operatorname{V}(Q_{\Pi}) = \sum_{j = 0}^{n - 1} \operatorname{V}(B_{t_{j + 1}} - B_{t_{j}})^{2} = 2 \sum_{j = 0}^{n - 1} (t_{j + 1} - t_{j})^{2}.

Each difference (t_{j+1} - t_j) is bounded by the mesh \|\Pi\|, so we can write \operatorname{V}(Q_{\Pi}) = 2 \sum_{j = 0}^{n - 1} (t_{j + 1} - t_{j})^{2} \le 2 \|\Pi\| \sum_{j = 0}^{n - 1} (t_{j + 1} - t_{j}) = 2 \|\Pi\| \, T. As the partition is refined and \|\Pi\| \to 0, the variance \operatorname{V}(Q_{\Pi}) also goes to zero: \lim_{\|\Pi\| \to 0} \operatorname{V}(Q_{\Pi}) \le 2 \lim_{\|\Pi\| \to 0} \|\Pi\| \, T = 0.
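Both facts — \operatorname{E}(Q_{\Pi}) = T for every partition and \operatorname{V}(Q_{\Pi}) \approx 2T^{2}/n shrinking with the mesh — can be seen in simulation. A sketch using uniform partitions and exact Gaussian increments (the helper name `brownian_qv` and the parameter choices are ours):

```python
import random, math

def brownian_qv(T, n, rng):
    """One sample of Q_Π: sum of squared Brownian increments over a
    uniform partition of [0, T] with n subintervals (Δt = T/n)."""
    dt = T / n
    return sum(rng.gauss(0, math.sqrt(dt)) ** 2 for _ in range(n))

rng = random.Random(7)
T, n_paths = 2.0, 2000
stats = {}
for n in (10, 100, 1000):
    qs = [brownian_qv(T, n, rng) for _ in range(n_paths)]
    mean = sum(qs) / n_paths
    var = sum((q - mean) ** 2 for q in qs) / n_paths
    stats[n] = (mean, var)
    print(n, mean, var)  # mean ≈ T for every mesh; var ≈ 2T²/n = 2·||Π||·T
```

Refining the partition leaves the sample mean pinned at T while the sample variance collapses — precisely the concentration that Steps 1 and 2 establish.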

Step 3: Convergence in L^{2}(\operatorname{P}) and in probability.
Since \operatorname{E}(Q_{\Pi}) = T and \operatorname{V}(Q_{\Pi}) \to 0, the mean-squared error converges to zero: \lim_{\|\Pi\| \to 0} \operatorname{E}(Q_{\Pi} - T)^{2} = \lim_{\|\Pi\| \to 0} \operatorname{V}(Q_{\Pi}) = 0. This is L^2 convergence: Q_{\Pi} \to T in L^{2}(\operatorname{P}).

Moreover, by Chebyshev’s inequality, for any \varepsilon > 0, \operatorname{P}(|Q_{\Pi} - T| > \varepsilon) \le \frac{\operatorname{V}(Q_{\Pi})}{\varepsilon^{2}} \to 0 \quad \text{as } \|\Pi\| \to 0. This shows that Q_{\Pi} converges to T in probability: Q_{\Pi} \xrightarrow{\operatorname{P}} T.

The Main Results

We see that the quadratic variation of Brownian motion over [0, t] for any 0 \le t \le T equals t: [B, B]_{t} = t.

The result [B,B]_t = t is summarized by the shorthand differential notation (dB_t)^2 = dt. This is the first entry of the Itô multiplication table: (dB)(dB) = dt, \qquad (dB)(dt) = 0, \qquad (dt)(dt) = 0. This notation is heuristic and will be made precise when we study Itô’s lemma. These equalities hold in the mean-square sense, not path by path: each squared increment (B_{t_{j+1}} - B_{t_j})^2 is a random variable with mean t_{j+1} - t_j, not a deterministic quantity equal to it.

To justify the remaining two entries, let M_\Pi = \max_{0 \le j \le n-1}|B_{t_{j+1}} - B_{t_j}| denote the maximum absolute increment over the partition. Since Brownian paths are almost surely continuous, and hence uniformly continuous on the compact interval [0, T], we have M_\Pi \to 0 as \|\Pi\| \to 0. It follows that \left|\sum_{j = 0}^{n - 1} (B_{t_{j + 1}} - B_{t_{j}})(t_{j + 1} - t_{j})\right| \le M_\Pi \sum_{j=0}^{n-1}(t_{j+1}-t_j) = M_\Pi\,T \xrightarrow{\|\Pi\| \to 0} 0, and \sum_{j = 0}^{n - 1} (t_{j + 1} - t_{j})^{2} \le \|\Pi\| \sum_{j=0}^{n-1}(t_{j+1}-t_j) = \|\Pi\|\,T \xrightarrow{\|\Pi\| \to 0} 0. These limits confirm (dB)(dt) = 0 and (dt)(dt) = 0.
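The two vanishing sums can also be checked by simulation. A minimal sketch on uniform partitions (the helper name `cross_sums` is ours):

```python
import random, math

def cross_sums(T, n, rng):
    """Return (Σ ΔB·Δt, Σ (Δt)²) over a uniform partition of [0, T].

    The first sum corresponds to (dB)(dt), the second to (dt)(dt);
    both should shrink to 0 as the mesh Δt = T/n shrinks.
    """
    dt = T / n
    db_dt = sum(rng.gauss(0, math.sqrt(dt)) * dt for _ in range(n))
    return db_dt, n * dt * dt

rng = random.Random(3)
for n in (10, 1000, 100000):
    print(n, cross_sums(1.0, n, rng))  # both entries shrink toward 0
```

On a uniform partition \sum (\Delta t)^{2} = T^{2}/n exactly, while \sum \Delta B \, \Delta t has standard deviation of order T^{3/2}/n, so both vanish as n grows.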

Brownian Motion is Nowhere Differentiable

Recall that M_\Pi = \max_{0 \le j \le n-1}|B_{t_{j+1}} - B_{t_j}| \to 0 as \|\Pi\| \to 0, and let Q_\Pi = \sum_{j=0}^{n-1}(B_{t_{j+1}} - B_{t_j})^2. For each increment we have the inequality (B_{t_{j+1}} - B_{t_j})^2 \le M_\Pi\,|B_{t_{j+1}} - B_{t_j}|. Summing over all subintervals gives Q_\Pi \le M_\Pi \sum_{j=0}^{n-1}|B_{t_{j+1}} - B_{t_j}|, or equivalently, \sum_{j = 0}^{n - 1} \big|B_{t_{j+1}} - B_{t_{j}}\big| \;\ge\; \frac{Q_{\Pi}}{M_{\Pi}}. As \|\Pi\| \to 0, the right-hand side diverges since Q_\Pi \to T > 0 while M_\Pi \to 0. The total variation of Brownian motion over [0, T] is therefore almost surely infinite.
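The contrast between the convergent quadratic-variation sum and the divergent total-variation sum is vivid in simulation. A sketch over finer and finer uniform partitions (the helper name `tv_qv_sums` is ours):

```python
import random, math

def tv_qv_sums(T, n, rng):
    """Total-variation and quadratic-variation sums of one simulated
    Brownian path over a uniform partition with n subintervals."""
    dt = T / n
    tv = qv = 0.0
    for _ in range(n):
        inc = rng.gauss(0, math.sqrt(dt))
        tv += abs(inc)
        qv += inc * inc
    return tv, qv

rng = random.Random(1)
results = {n: tv_qv_sums(1.0, n, rng) for n in (100, 10000, 1000000)}
for n, (tv, qv) in results.items():
    print(n, tv, qv)
# qv stays near T = 1 while tv grows like sqrt(2n/π), diverging as the mesh shrinks
```

Each refinement multiplies the total-variation sum by roughly \sqrt{10} while leaving the quadratic variation pinned at T — the numerical face of the inequality \sum |{\Delta B}| \ge Q_{\Pi} / M_{\Pi}.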

Since the total variation is infinite on every interval, Brownian paths cannot be continuously differentiable on any subinterval of [0, T]: if a path were C^1 on [a, b], its total variation would equal \int_a^b |B'(t)|\,dt, which is finite. In fact, a stronger result holds: Brownian paths are almost surely nowhere differentiable. Establishing this in full rigor requires a separate argument, but the infinite total variation already reveals how irregular these trajectories are.