Stochastic Calculus

Introduction

Let (\Omega, \mathcal{F}, \operatorname{P}) be a probability space. Recall that \Omega is the set of all possible outcomes and \mathcal{F} contains all events A \subset \Omega for which we can determine whether or not they occur.

In continuous time we define a stochastic process X_{t}(\omega) as a collection of random variables such that, given an outcome \omega \in \Omega, we can trace the path of the stochastic process over time. We can also view the process the other way around: for a fixed time t \leq T, how does the random variable X_{t}(\omega) behave?

A filtration \{\mathcal{F}_{t}\} describes how information is revealed as we observe a stochastic process. At the very least, we want the filtration to retain what has happened before, so that \mathcal{F}_{s} \subset \mathcal{F}_{t} whenever 0 \leq s < t.

Throughout these notes, \{B_{t}\}_{t \ge 0} denotes a standard Brownian motion on (\Omega, \mathcal{F}, \operatorname{P}), as constructed in the Brownian Motion notes. Recall in particular that B_{t} has independent and stationary increments, that B_{t} - B_{s} \sim \mathcal{N}(0, t - s) for s < t, and that its paths are continuous but nowhere differentiable.
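As a quick numerical illustration (a simulation sketch, not part of the formal development; the horizon, step count, and path count below are arbitrary choices), we can build Brownian paths by summing independent \mathcal{N}(0, \Delta t) increments and check that B_{t} - B_{s} has variance close to t - s:

```python
import numpy as np

rng = np.random.default_rng(0)
T, n, M = 1.0, 1000, 5000            # horizon, steps per path, number of paths
dt = T / n

# Each row is one path: cumulative sum of independent N(0, dt) increments.
dB = rng.normal(0.0, np.sqrt(dt), size=(M, n))
B = np.cumsum(dB, axis=1)

s_idx, t_idx = n // 2 - 1, n - 1     # times s = 0.5 and t = 1.0
incr = B[:, t_idx] - B[:, s_idx]
print(incr.var())                    # should be close to t - s = 0.5
```

The sample variance across paths approximates \operatorname{Var}(B_{t} - B_{s}) = t - s, consistent with stationary Gaussian increments.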

Itô Processes

An Itô process \{X_{t}\} is a continuous-time stochastic process that can be written as the sum of an ordinary (pathwise) Lebesgue time integral and an Itô stochastic integral: X_{t}(\omega) = X_{0} + \int_{0}^{t} a(s, \omega) \mathop{}\!\mathrm{d}s + \int_{0}^{t} b(s, \omega) \mathop{}\!\mathrm{d}B_{s}(\omega), \tag{1} where the coefficient functions a(t,\omega) and b(t,\omega) are \mathcal{F}_{t}-adapted processes such that \int_{0}^{t} |a(s,\omega)| \mathop{}\!\mathrm{d}s < \infty, and \int_{0}^{t} b(s,\omega)^{2} \mathop{}\!\mathrm{d}s < \infty almost surely. For compact notation we commonly write \mathop{}\!\mathrm{d}X = a \mathop{}\!\mathrm{d}t + b \mathop{}\!\mathrm{d}B, with the understanding that this notation represents the integral representation in (1).
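To make the shorthand \mathop{}\!\mathrm{d}X = a \mathop{}\!\mathrm{d}t + b \mathop{}\!\mathrm{d}B concrete, here is a minimal Euler–Maruyama sketch (a standard discretization, not developed in these notes; the coefficients a = \mu X and b = \sigma X and the parameter values are illustrative choices). For this choice \operatorname{E}(X_{t}) = X_{0}e^{\mu t}, which the sample mean should approximate:

```python
import numpy as np

rng = np.random.default_rng(1)
T, n, M = 1.0, 500, 20000
dt = T / n
mu, sigma, x0 = 0.05, 0.2, 1.0       # illustrative drift/diffusion parameters

X = np.full(M, x0)
for _ in range(n):
    dB = rng.normal(0.0, np.sqrt(dt), size=M)
    # Euler-Maruyama step: X += a dt + b dB with a = mu*X, b = sigma*X
    X = X + mu * X * dt + sigma * X * dB

print(X.mean())                      # close to x0 * exp(mu * T)
```

Each step adds the drift contribution a \Delta t and the diffusion contribution b \Delta B, mirroring the integral representation in (1) term by term.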

The stochastic integral is constructed by approximating the integrand with step processes on partitions. For a partition \Pi = \{t_{0}, t_{1}, \ldots, t_{n}\} of [0, t] with 0 = t_{0} < t_{1} < \dots < t_{n} = t and mesh \lVert\Pi\rVert \to 0 as n \to \infty, the Itô integral is defined as the mean-square (L^{2}) limit of Riemann-type sums: I_{t}(\omega) = \int_{0}^{t} b(s, \omega) \mathop{}\!\mathrm{d}B_{s}(\omega) = \lim_{n \to \infty} \sum_{j = 0}^{n - 1} b(t_{j}, \omega) \Delta B_{t_{j}}(\omega), \tag{2} where \Delta B_{t_{j}} = B_{t_{j+1}} - B_{t_{j}}, the limit is taken in L^{2}, and, in particular, \operatorname{E}(I_{t}^{2}) < \infty is required.
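As a sanity check of definition (2) (a one-path numerical sketch with arbitrarily chosen discretization), take b(s, \omega) = B_{s}. Itô's formula, stated later in these notes, gives \int_{0}^{T} B \mathop{}\!\mathrm{d}B = (B_{T}^{2} - T)/2, and the left-endpoint sums do converge to that value:

```python
import numpy as np

rng = np.random.default_rng(2)
T, n = 1.0, 100_000
dt = T / n
dB = rng.normal(0.0, np.sqrt(dt), size=n)
B = np.concatenate(([0.0], np.cumsum(dB)))   # B[j] = B_{t_j}, B[0] = 0

# Left-endpoint (Ito) Riemann sum: sum_j B_{t_j} (B_{t_{j+1}} - B_{t_j})
ito_sum = np.sum(B[:-1] * dB)
closed_form = (B[-1] ** 2 - T) / 2
print(ito_sum, closed_form)                  # the two agree closely
```

Evaluating the integrand at the left endpoint t_{j} (rather than, say, the midpoint) is exactly what makes this the Itô integral; a different evaluation point yields a different limit.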

The stochastic integral I_{t}(\omega) is a random process: its value at each time t depends on the sample point \omega. Under the usual measurability and square-integrability conditions on the integrand b(t, \omega), the integral admits a modification that is continuous in t for almost every \omega (i.e., there exists an indistinguishable version with continuous sample paths). Hence, without loss of generality, we may take I_{t} to have continuous sample paths.

Itô Isometry

The Itô isometry expresses the second moment of the stochastic integral: \operatorname{E}(I_{t}^{2}) = \operatorname{E}\left(\int_{0}^{t} b(s, \omega)^{2} \mathop{}\!\mathrm{d}s\right), i.e., the mean square of the integral equals the expectation of the time-integral of the squared integrand.

Consider a simple, adapted integrand b(s, \omega) = \sum_{j = 0}^{n - 1} b(t_{j}, \omega)\mathbf{1}_{(t_{j}, t_{j + 1}]}(s), for which I_{t} = \sum_{j = 0}^{n - 1} b(t_{j})\left(B_{t_{j + 1}} - B_{t_{j}}\right). Expanding the square and interchanging expectation with the finite double sum, \operatorname{E}(I_{t}^{2}) = \sum_{j=0}^{n-1}\sum_{k=0}^{n-1} \operatorname{E}\!\left[b(t_{j})b(t_{k})\left(B_{t_{j+1}}-B_{t_{j}}\right)\!\left(B_{t_{k+1}}-B_{t_{k}}\right)\right]. For j < k, condition on \mathcal{F}_{t_{k}}: the factor b(t_{j})b(t_{k})(B_{t_{j+1}}-B_{t_{j}}) is \mathcal{F}_{t_{k}}-measurable, while B_{t_{k+1}}-B_{t_{k}} is independent of \mathcal{F}_{t_{k}} with mean zero. By the tower property each cross-term vanishes: \operatorname{E}\!\left[b(t_{j})b(t_{k})(B_{t_{j+1}}-B_{t_{j}})(B_{t_{k+1}}-B_{t_{k}})\right] = \operatorname{E}\!\left[b(t_{j})b(t_{k})(B_{t_{j+1}}-B_{t_{j}})\underbrace{\operatorname{E}\!\left[B_{t_{k+1}}-B_{t_{k}}\mid\mathcal{F}_{t_{k}}\right]}_{=\,0}\right] = 0. Only the diagonal terms j = k survive. For each, condition on \mathcal{F}_{t_{j}} and use the independence of B_{t_{j+1}}-B_{t_{j}} from \mathcal{F}_{t_{j}}: \begin{aligned} \operatorname{E}(I_{t}^{2}) &= \sum_{j=0}^{n-1} \operatorname{E}\!\left[b(t_{j})^{2}\,\operatorname{E}\!\left[(B_{t_{j+1}}-B_{t_{j}})^{2}\mid\mathcal{F}_{t_{j}}\right]\right] \\ &= \sum_{j=0}^{n-1} \operatorname{E}\!\left[b(t_{j})^{2}(t_{j+1}-t_{j})\right] = \operatorname{E}\!\left(\int_{0}^{t} b(s, \omega)^{2} \mathop{}\!\mathrm{d}s\right). \end{aligned} The general result follows by approximating a square-integrable predictable integrand by such simple processes and passing to the limit in L^{2}. The isometry therefore gives an isometric linear map from the space of square-integrable predictable integrands (with norm given by \operatorname{E}\int_{0}^{t} b(s)^2 \mathop{}\!\mathrm{d}s) into L^{2} of the resulting stochastic integrals.
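A Monte Carlo sketch of the isometry (illustrative only; the integrand b(s, \omega) = B_{s} and all numeric parameters are my choices): both sides should approximate \int_{0}^{t} \operatorname{E}(B_{s}^{2}) \mathop{}\!\mathrm{d}s = \int_{0}^{t} s \mathop{}\!\mathrm{d}s = t^{2}/2.

```python
import numpy as np

rng = np.random.default_rng(3)
T, n, M = 1.0, 500, 20000
dt = T / n
dB = rng.normal(0.0, np.sqrt(dt), size=(M, n))
B = np.cumsum(dB, axis=1)
B_left = np.hstack([np.zeros((M, 1)), B[:, :-1]])   # B_{t_j} at left endpoints

I = np.sum(B_left * dB, axis=1)                     # I_T for each path
lhs = np.mean(I ** 2)                               # E[I_T^2]
rhs = np.mean(np.sum(B_left ** 2 * dt, axis=1))     # E[int_0^T B_s^2 ds]
print(lhs, rhs)                                     # both near T^2/2 = 0.5
```

The two sample averages agree up to discretization and Monte Carlo error, as the isometry predicts.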

Itô Integrals are Martingales

The Itô isometry also underpins the proof that stochastic integrals are martingales. Consider first a simple, adapted integrand of the form b(t,\omega) = \sum_{j = 0}^{n - 1} b(t_{j}, \omega) \mathbf{1}_{(t_{j}, t_{j + 1}]}(t), so that the Itô integral on this partition is I_{t} = \sum_{j = 0}^{n - 1} b(t_{j}, \omega)\left(B_{t_{j + 1}} - B_{t_{j}}\right). Take s = t_{k} to be a partition point, so the sum splits exactly as I_{t} = \underbrace{\sum_{j = 0}^{k - 1} b(t_{j})\left(B_{t_{j + 1}} - B_{t_{j}}\right)}_{=\,I_{s}} \;+\; \sum_{j = k}^{n - 1} b(t_{j})\left(B_{t_{j + 1}} - B_{t_{j}}\right). The first sum equals I_s and is \mathcal{F}_{s}-measurable. For each j \ge k, since s = t_{k} \le t_{j} we have \mathcal{F}_{s} \subset \mathcal{F}_{t_{j}}, so the tower property gives \operatorname{E}\!\left[b(t_{j})\!\left(B_{t_{j+1}}-B_{t_{j}}\right)\mid\mathcal{F}_{s}\right] = \operatorname{E}\!\left[b(t_{j})\underbrace{\operatorname{E}\!\left[B_{t_{j+1}}-B_{t_{j}}\mid\mathcal{F}_{t_{j}}\right]}_{=\,0}\,\middle|\,\mathcal{F}_{s}\right] = 0, since b(t_j) is \mathcal{F}_{t_j}-measurable and \operatorname{E}[B_{t_{j+1}}-B_{t_j}\mid\mathcal{F}_{t_j}] = 0. Taking the full conditional expectation of I_t yields \operatorname{E}(I_{t}\mid\mathcal{F}_{s}) = I_{s}, so I_{t} is a martingale. (For general s between partition points the result follows by L^{2} continuity of t \mapsto I_t.)

The result for general square-integrable adapted integrands follows by approximating arbitrary predictable integrands by simple ones and using the Itô isometry established above to pass to the limit in L^{2}.

Conversely, for the Brownian filtration there is the martingale representation theorem: any \mathcal{F}_{t}-adapted martingale \{M_{t}\} with \operatorname{E}(M_{t}^{2})<\infty for all t can be written as M_{t} = M_{0} + \int_{0}^{t}\varphi(s, \omega) \mathop{}\!\mathrm{d}B_{s}, for a predictable process \varphi satisfying \operatorname{E}\!\left(\int_{0}^{t}\varphi(s, \omega)^{2}\mathop{}\!\mathrm{d}s\right)<\infty. This gives the converse: every square-integrable martingale adapted to the Brownian filtration admits a stochastic integral representation.

Quadratic Variation

The quadratic variation of a continuous semimartingale X is the (pathwise) limit of squared increments along a refining sequence of partitions \Pi = \{0 = t_{0} < t_{1} < \dots < t_{n} = t\}: [X, X]_{t} = \lim_{n \to \infty} \sum_{j = 0}^{n - 1} \left(\Delta X_{t_{j}}\right)^{2}, whenever the limit exists in probability (or almost surely for continuous local martingales).
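For Brownian motion itself, [B, B]_{t} = t. A one-path numerical sketch (parameters arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)
T, n = 2.0, 100_000
dt = T / n
dB = rng.normal(0.0, np.sqrt(dt), size=n)

qv = np.sum(dB ** 2)       # sum of squared increments along the partition
print(qv)                  # close to [B, B]_T = T = 2.0
```

This is the key contrast with smooth paths, whose sum of squared increments along a refining partition tends to zero.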

Quadratic Variation of the Stochastic Integral

For the Itô integral I_{t} = \int_{0}^{t} b(s, \omega) \mathop{}\!\mathrm{d}B_{s}, one has the explicit expression for its quadratic variation: [I, I]_{t} = \int_{0}^{t} b^{2}(s, \omega) \mathop{}\!\mathrm{d}s. We usually summarize this infinitesimally as d[I, I]_{t} = (dI)^{2} = b^{2} \mathop{}\!\mathrm{d}t.

Sketch of proof. Take a simple adapted integrand b(s, \omega) = \sum_{j} b(t_{j}, \omega) \mathbf{1}_{(t_{j}, t_{j + 1}]}(s), so that I_{t} = \sum_{j = 0}^{n - 1} b(t_{j}) \Delta B_{t_{j}}. Since the integrand is constant on each sub-interval, the increment of I over [t_{j}, t_{j+1}] is \Delta I_{t_{j}} = b(t_{j})\Delta B_{t_{j}}, and therefore (\Delta I_{t_{j}})^{2} = b^{2}(t_{j})(\Delta B_{t_{j}})^{2}. Thus the quadratic variation along the partition is [I, I]_{t} = \lim_{n \to \infty} \sum_{j = 0}^{n - 1} b^{2}(t_{j}) (\Delta B_{t_{j}})^{2}.

To show this limit equals \int_{0}^{t} b^{2}(s,\omega)\,ds in mean square, consider the mean-square difference \begin{aligned} \operatorname{E}\left[\left(\sum_{j = 0}^{n - 1} b(t_{j})^{2}(\Delta B_{t_{j}})^{2} - \sum_{j = 0}^{n - 1} b(t_{j})^{2} \Delta t_{j}\right)^{2}\right] & = \operatorname{E}\left[\left(\sum_{j = 0}^{n - 1} b(t_{j})^{2}((\Delta B_{t_{j}})^{2} - \Delta t_{j})\right)^{2}\right] \\ & = \operatorname{E}\left[\sum_{j = 0}^{n - 1} \sum_{k = 0}^{n - 1} b(t_{j})^{2} b(t_{k})^{2} ((\Delta B_{t_{j}})^{2} - \Delta t_{j}) ((\Delta B_{t_{k}})^{2} - \Delta t_{k})\right] \\ & = \sum_{j = 0}^{n - 1} \sum_{k = 0}^{n - 1} \operatorname{E}[b(t_{j})^{2} b(t_{k})^{2} ((\Delta B_{t_{j}})^{2} - \Delta t_{j}) ((\Delta B_{t_{k}})^{2} - \Delta t_{k})]. \end{aligned} For j < k, condition on \mathcal{F}_{t_{k}}: the factor b(t_{j})^{2}b(t_{k})^{2}((\Delta B_{t_{j}})^{2}-\Delta t_{j}) is \mathcal{F}_{t_{k}}-measurable, while \operatorname{E}[(\Delta B_{t_{k}})^{2}-\Delta t_{k}\mid\mathcal{F}_{t_{k}}]=0 since \Delta B_{t_{k}} is independent of \mathcal{F}_{t_{k}} with variance \Delta t_{k}. By the tower property each cross-term vanishes: \operatorname{E}\left[b(t_{j})^{2} b(t_{k})^{2} ((\Delta B_{t_{j}})^{2} - \Delta t_{j}) ((\Delta B_{t_{k}})^{2} - \Delta t_{k})\right] = 0. Using \Delta B_{t_{j}} \sim \mathcal{N}(0, \Delta t_{j}) we have \operatorname{E}[(\Delta B_{t_{j}})^{2}] = \Delta t_{j} and \operatorname{V}[(\Delta B_{t_{j}})^{2}] = 2 (\Delta t_{j})^{2}. 
For each diagonal term, b(t_{j}) is \mathcal{F}_{t_{j}}-measurable and \Delta B_{t_{j}} is independent of \mathcal{F}_{t_{j}}, so b(t_{j})^{4} and (\Delta B_{t_{j}})^{2}-\Delta t_{j} are independent and their expectations factor: \begin{aligned} \operatorname{E}\left[\left(\sum_{j = 0}^{n - 1} b(t_{j})^{2}(\Delta B_{t_{j}})^{2} - \sum_{j = 0}^{n - 1} b(t_{j})^{2} \Delta t_{j}\right)^{2}\right] & = \sum_{j = 0}^{n - 1} \operatorname{E}[b(t_{j})^{4} ((\Delta B_{t_{j}})^{2} - \Delta t_{j})^{2}] \\ & = \sum_{j = 0}^{n - 1} \operatorname{E}[b(t_{j})^{4}] \operatorname{E}[((\Delta B_{t_{j}})^{2} - \Delta t_{j})^{2}] \\ & = \sum_{j = 0}^{n - 1} \operatorname{E}[b(t_{j})^{4}] 2 (\Delta t_{j})^{2} \\ & \le 2\lVert{\Pi}\rVert \sum_{j = 0}^{n - 1} \operatorname{E}[b^{4}(t_{j})] \Delta t_{j}. \end{aligned}

Under the integrability assumption \operatorname{E}\!\left(\int_{0}^{t} b^{4}(s) \mathop{}\!\mathrm{d}s\right) < \infty, the sum \sum_{j} \operatorname{E}[b^{4}(t_{j})] \Delta t_{j} is bounded, so the right-hand side tends to zero as \lVert{\Pi}\rVert \to 0. Therefore \sum_{j = 0}^{n - 1} b(t_{j})^{2}(\Delta B_{t_{j}})^{2} \xrightarrow[\,L^{2}\,]{} \int_{0}^{t} b^{2}(s, \omega) \mathop{}\!\mathrm{d}s, which yields the desired quadratic variation identity. \square
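Numerically (a one-path sketch, again with the illustrative choice b(s, \omega) = B_{s}), the sum \sum_{j} b(t_{j})^{2}(\Delta B_{t_{j}})^{2} and the Riemann sum for \int_{0}^{t} b^{2}(s) \mathop{}\!\mathrm{d}s computed along the same fine path agree closely:

```python
import numpy as np

rng = np.random.default_rng(6)
T, n = 1.0, 100_000
dt = T / n
dB = rng.normal(0.0, np.sqrt(dt), size=n)
B_left = np.concatenate(([0.0], np.cumsum(dB)[:-1]))   # B_{t_j}

qv_sum = np.sum(B_left ** 2 * dB ** 2)     # sum_j b(t_j)^2 (Delta B_{t_j})^2
time_int = np.sum(B_left ** 2 * dt)        # Riemann sum for int_0^T B_s^2 ds
print(qv_sum, time_int)                    # the two agree closely
```

Replacing (\Delta B_{t_{j}})^{2} by \Delta t_{j} changes each term only by a mean-zero fluctuation, which is exactly what the mean-square estimate above quantifies.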

Quadratic Variation of the Itô Process

For the general Itô process defined in (1), only the stochastic integral part contributes to the quadratic variation; the drift term, being of bounded variation, contributes nothing. Thus: [X, X]_{t} \;=\; \int_{0}^{t} b(s,\omega)^{2}\mathop{}\!\mathrm{d}s. To verify this, consider simple adapted integrands: a(s, \omega) = \sum_{j} a(t_{j}, \omega) \mathbf{1}_{(t_{j}, t_{j + 1}]}(s), \qquad b(s, \omega) = \sum_{j} b(t_j, \omega) \mathbf{1}_{(t_{j}, t_{j + 1}]}(s). The increment of X over the interval [t_{j}, t_{j+1}] is given by: \Delta X_{t_{j}}(\omega) = a(t_{j}, \omega) \Delta t_{j} + b(t_{j}, \omega) \Delta B_{t_{j}}(\omega). Consequently, we have: \left(\Delta X_{t_{j}}\right)^{2} = a(t_{j})^{2}(\Delta t_{j})^{2} + 2 a(t_{j}) b(t_{j})(\Delta t_{j})(\Delta B_{t_{j}}) + b(t_{j})^{2}(\Delta B_{t_{j}})^{2}.

In the limit as n \to \infty, we analyze the expression: \lim_{n \to \infty} \sum_{j = 0}^{n - 1} \left(\Delta X_{t_{j}}\right)^{2}. The first term vanishes because: \sum_{j = 0}^{n - 1} a(t_{j}, \omega)^{2}(\Delta t_{j})^{2} \leq \lVert{\Pi}\rVert \int_{0}^{t} a(s, \omega)^{2} \mathop{}\!\mathrm{d}s \to 0 \quad \text{as } n \to \infty, assuming that \int_{0}^{t} a(s, \omega)^{2} \mathop{}\!\mathrm{d}s < \infty almost surely.

Next, we analyze the second term by considering the sum: S_{n} = \sum_{j = 0}^{n - 1} 2 a(t_{j}, \omega) b(t_{j}, \omega) \Delta t_{j} \Delta B_{t_{j}}. We can express the expected value as: \operatorname{E}(S_{n}^{2}) = \operatorname{E}\left[\left( \sum_{j = 0}^{n - 1} 2 a(t_{j}) b(t_{j}) \Delta t_{j} \Delta B_{t_{j}} \right)^{2}\right] = \operatorname{E}\left[\sum_{j = 0}^{n - 1} \sum_{k = 0}^{n - 1} 4 a(t_{j}) b(t_{j}) a(t_{k}) b(t_{k}) \Delta t_{j} \Delta t_{k} \Delta B_{t_{j}} \Delta B_{t_{k}}\right]. For j < k, the factor a(t_{j})b(t_{j})a(t_{k})b(t_{k})\Delta t_{j}\Delta t_{k}\Delta B_{t_{j}} is \mathcal{F}_{t_{k}}-measurable, while \Delta B_{t_{k}} is independent of \mathcal{F}_{t_{k}} with mean zero. By the tower property all cross-terms vanish, leaving only diagonal terms: \begin{aligned} \operatorname{E}(S_{n}^{2}) & = \operatorname{E}\left[\sum_{j = 0}^{n - 1} 4 a^{2}(t_{j}) b^{2}(t_{j}) (\Delta t_{j})^{2} (\Delta B_{t_{j}})^{2} \right] \\ & = \sum_{j = 0}^{n - 1} \operatorname{E}\left[4 a^{2}(t_{j}) b^{2}(t_{j}) (\Delta t_{j})^{2}\right] \operatorname{E}\left[(\Delta B_{t_{j}})^{2}\right] && \text{($a(t_j),b(t_j)$ are $\mathcal{F}_{t_j}$-measurable; $\Delta B_{t_j} \mathrel\bot\mathcal{F}_{t_j}$)}\\ & = \sum_{j = 0}^{n - 1} \operatorname{E}\left[4 a^{2}(t_{j}) b^{2}(t_{j}) (\Delta t_{j})^{3}\right] \\ & \le \lVert{\Pi}\rVert^{2} \sum_{j = 0}^{n - 1} \operatorname{E}\left[4 a^{2}(t_{j}) b^{2}(t_{j}) \Delta t_{j}\right]. \end{aligned} As a result, we find: \lim_{n \to \infty} \operatorname{E}(S_{n}^{2}) \le \lim_{n \to \infty} \lVert{\Pi}\rVert^{2} \int_{0}^{t} \operatorname{E}\left[4 a^{2}(s) b^{2}(s)\right] ds = 0, which implies that \lim_{n \to \infty} S_{n} = 0 in L^{2} (and thus in probability).

Finally, we have already established that: \lim_{n \to \infty} \sum_{j = 0}^{n - 1} b^{2}(t_{j}, \omega) (\Delta B_{t_{j}})^{2} = \int_{0}^{t} b^{2}(s, \omega) \mathop{}\!\mathrm{d}s in L^{2}. Therefore, we conclude: \sum_{j} \left(\Delta X_{t_{j}}\right)^{2} \xrightarrow[\,L^{2}\,]{} \int_{0}^{t} b(s, \omega)^{2} \mathop{}\!\mathrm{d}s. This result extends to general square-integrable adapted coefficients by approximation with simple processes.
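To see numerically that the drift does not contribute (a sketch with illustrative constant coefficients a = 2 and b = 1, chosen for this example), note that \sum_{j}(\Delta X_{t_{j}})^{2} stays near \int_{0}^{t} b^{2} \mathop{}\!\mathrm{d}s = b^{2}t even though the drift moves X substantially:

```python
import numpy as np

rng = np.random.default_rng(7)
T, n = 1.0, 100_000
dt = T / n
a, b = 2.0, 1.0                      # illustrative constant coefficients
dB = rng.normal(0.0, np.sqrt(dt), size=n)
dX = a * dt + b * dB                 # increments of X over the partition

qv = np.sum(dX ** 2)
print(qv)        # close to b^2 * T = 1.0; the drift contributes O(dt) per step
```

The cross and drift-squared terms are of orders \Delta t^{3/2} and \Delta t^{2} per interval, so only the b^{2}(\Delta B)^{2} term survives the limit, matching the derivation above.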

Itô’s Formula

Itô’s formula generalizes the chain rule to stochastic processes. It provides the precise way to compute the differential of a smooth function of an Itô process.

Itô’s Formula (General Form)

Let X_{t} be an Itô process satisfying \mathop{}\!\mathrm{d}X_{t} = a(t, \omega) \mathop{}\!\mathrm{d}t + b(t, \omega) \mathop{}\!\mathrm{d}B_{t}, and let f(x, t) be a twice continuously differentiable function in x and once continuously differentiable in t. Then Y_{t} = f(X_{t}, t) is also an Itô process with \mathop{}\!\mathrm{d}Y_{t} = \left(a(t, \omega) \frac{\partial f}{\partial x} + \frac{1}{2}b^{2}(t, \omega) \frac{\partial^{2} f}{\partial x^{2}} + \frac{\partial f}{\partial t}\right) \mathop{}\!\mathrm{d}t + b(t, \omega) \frac{\partial f}{\partial x} \mathop{}\!\mathrm{d}B_{t}.

Proof. Fix a partition \Pi = \{0 = t_{0} < t_{1} < \cdots < t_{n} = t\} and write \Delta f_{j} = f(X_{t_{j+1}}, t_{j+1}) - f(X_{t_{j}}, t_{j}). Telescoping gives f(X_{t}, t) - f(X_{0}, 0) = \sum_{j=0}^{n-1} \Delta f_{j}. Apply a second-order Taylor expansion of each increment around (X_{t_{j}}, t_{j}), with all partial derivatives evaluated there: \Delta f_{j} = \frac{\partial f}{\partial t}\Delta t_{j} + \frac{\partial f}{\partial x}\Delta X_{t_{j}} + \frac{1}{2}\frac{\partial^{2} f}{\partial x^{2}}(\Delta X_{t_{j}})^{2} + R_{j}, where R_{j} collects terms of order (\Delta t_{j})^{2}, \Delta X_{t_{j}}\Delta t_{j}, and (\Delta X_{t_{j}})^{3}. Under the regularity conditions on f and X, one can verify that \sum_{j} R_{j} \to 0 in L^{2} as \|\Pi\| \to 0.

Substitute \Delta X_{t_{j}} = a(t_{j})\Delta t_{j} + b(t_{j})\Delta B_{t_{j}} to split the first-order term, and use the quadratic variation results established above—which give (\Delta X_{t_{j}})^{2} = b^{2}(t_{j})(\Delta B_{t_{j}})^{2} plus terms of order o(\Delta t_{j}) that vanish after summation—to handle the second-order term. Summing over j and passing to the limit as \|\Pi\|\to 0, the four resulting sums converge: \begin{aligned} \sum_{j} \frac{\partial f}{\partial t}(X_{t_{j}}, t_{j})\,\Delta t_{j} &\;\longrightarrow\; \int_{0}^{t}\frac{\partial f}{\partial s}(X_{s},s)\,\mathop{}\!\mathrm{d}s, \\[4pt] \sum_{j} \frac{\partial f}{\partial x}(X_{t_{j}}, t_{j})\,a(t_{j})\,\Delta t_{j} &\;\longrightarrow\; \int_{0}^{t}\frac{\partial f}{\partial x}\,a\,\mathop{}\!\mathrm{d}s, \\[4pt] \sum_{j} \frac{\partial f}{\partial x}(X_{t_{j}}, t_{j})\,b(t_{j})\,\Delta B_{t_{j}} &\;\xrightarrow{L^{2}}\; \int_{0}^{t}\frac{\partial f}{\partial x}\,b\,\mathop{}\!\mathrm{d}B_{s}, \\[4pt] \frac{1}{2}\sum_{j} \frac{\partial^{2} f}{\partial x^{2}}(X_{t_{j}}, t_{j})\,(\Delta X_{t_{j}})^{2} &\;\xrightarrow{L^{2}}\; \frac{1}{2}\int_{0}^{t}\frac{\partial^{2} f}{\partial x^{2}}\,b^{2}\,\mathop{}\!\mathrm{d}s. \end{aligned} The first two are Riemann integrals (using continuity of f and the paths of X); the third is the Itô integral by definition (2); the fourth uses the quadratic variation result. Combining yields the integral form of Itô’s formula, f(X_t, t) - f(X_0, 0) = \int_0^t \!\left(\frac{\partial f}{\partial s} + a\frac{\partial f}{\partial x} + \frac{1}{2}b^2\frac{\partial^2 f}{\partial x^2}\right)\mathop{}\!\mathrm{d}s + \int_0^t b\frac{\partial f}{\partial x}\mathop{}\!\mathrm{d}B_s, from which the differential form stated above follows immediately.

The additional term \frac{1}{2}b^{2}\frac{\partial^{2} f}{\partial x^{2}}\mathop{}\!\mathrm{d}t arises because of the non-zero quadratic variation (\mathop{}\!\mathrm{d}X_{t})^{2} = b^{2}\mathop{}\!\mathrm{d}t. This term has no classical analogue—in ordinary calculus, where paths are smooth, the chain rule contains no such correction. The presence of this second-order correction term is the defining feature of stochastic calculus and reflects the rough, nowhere-differentiable nature of Brownian paths. \square
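As a closing numerical sketch (one path, parameters arbitrary), apply Itô's formula with f(x) = x^{2} and X = B (so a = 0, b = 1): it gives B_{t}^{2} = t + 2\int_{0}^{t} B \mathop{}\!\mathrm{d}B, and the discretized right-hand side tracks the left:

```python
import numpy as np

rng = np.random.default_rng(8)
T, n = 1.0, 100_000
dt = T / n
dB = rng.normal(0.0, np.sqrt(dt), size=n)
B = np.concatenate(([0.0], np.cumsum(dB)))

lhs = B[-1] ** 2                     # f(B_T) = B_T^2
ito_integral = np.sum(B[:-1] * dB)   # Ito sum for int_0^T B dB
rhs = T + 2.0 * ito_integral         # Ito's formula: dt-term + dB-term
print(lhs, rhs)                      # the two sides agree closely
```

Dropping the T in the right-hand side (i.e., using the classical chain rule) leaves a discrepancy of about [B, B]_{T} = T, which is exactly the Itô correction term.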