10.6 Root-finding: bisection and Newton’s method

Many numerical tasks reduce to finding a root of a function: given $f$ , find $x^*$ such that $f(x^*) = 0$ . The eigenvalues of a matrix are the roots of its characteristic polynomial. The natural frequencies of a damped oscillator are the roots of $\lambda^2 + 2\gamma \lambda + \omega_0^2 = 0$ . The intersection of two curves is the root of their difference. The implicit equations that arise in numerical PDE-solvers (every step of every implicit-time-stepping scheme) need a root-finder.

This lesson covers the two basic algorithms — bisection (robust, slow) and Newton’s method (fast, fragile) — and points at the suite of safeguarded methods that combine both in production code.

Bisection: robust and slow

Bisection requires a function $f$ that is continuous on an interval $[a, b]$ with opposite signs at the endpoints: $f(a)\, f(b) < 0$ . The intermediate value theorem guarantees a root somewhere in $(a, b)$ . The algorithm:

Evaluate $f$ at the midpoint $m = (a + b)/2$ .
If $f(m)$ has the same sign as $f(a)$ , the root is in $[m, b]$ — set $a \to m$ .
Otherwise the root is in $[a, m]$ — set $b \to m$ .
Stop when $b - a < \text{tolerance}$ .

Each step halves the interval, so after $k$ steps the root is bracketed to within $(b - a) / 2^k$ . To reduce the bracket by a factor of $\varepsilon$ takes $\log_2(1/\varepsilon)$ iterations — about 50 to reach machine precision starting from any reasonable bracket.

Bisection is robust: it always converges, and the convergence is monotone — every step shrinks the bracket. The downside is that each step gains only one bit of precision. For high-precision work or for functions where evaluations are expensive, bisection alone is too slow.

Newton’s method: fast and fragile

Newton’s method uses the derivative to take much bigger steps. Start with a guess $x_0$ near the root. Linearise $f$ around $x_0$ — replace $f$ by its tangent line — and solve for where that tangent crosses zero. Use the result as the next guess. Repeat.

The tangent line at $x_n$ is $y = f(x_n) + f'(x_n)(x - x_n)$ . Setting $y = 0$ and solving for $x$ :

\boxed{\;x_{n+1} \;=\; x_n \;-\; \frac{f(x_n)}{f'(x_n)}.\;}

This is Newton’s iteration. Each step costs one $f$ evaluation and one $f'$ evaluation. If $f'$ is hard to compute analytically there’s a closely-related secant method that uses a finite-difference estimate of $f'$ .

function:

start: x₀ = 2.5000 iterations: 4 converged to x ≈ 1.414214

Click anywhere along the curve to set the starting point x₀. The red lines are the tangents at each iterate; Newton's method linearises f at x and jumps to where that tangent meets the x-axis. Where the iterates land on a true root they turn green. For x² − 2 the method converges quadratically from almost any start. For the cubic, a poorly-chosen start can oscillate or land on the wrong root. The logistic example has three roots and exhibits extreme sensitivity to x₀: tiny shifts in starting position send the iterates to different roots — the boundary between basins of attraction is fractal.

Click anywhere on the curve to set the starting point. The red lines are the tangents at each iterate; Newton’s method linearises $f$ at $x$ and jumps to where that tangent meets the $x$ -axis. With a good starting point, Newton finds the root in a handful of iterations — but with a bad starting point, the iterates can wander to a different root, oscillate, or diverge. The logistic preset shows the most dramatic version: three roots close together, and tiny shifts in $x_0$ send the iterates to completely different roots.

Quadratic convergence

The reason Newton is fast: each step squares the error. If $x_n = x^* + e_n$ , then

x_{n+1} - x^* \;=\; \underbrace{x_n - x^*}_{e_n} - \frac{f(x_n)}{f'(x_n)}.

Taylor-expand $f$ around $x^*$ : $f(x_n) = f'(x^*)\, e_n + \tfrac12 f''(x^*) e_n^2 + \mathcal{O}(e_n^3)$ , and $f'(x_n) = f'(x^*) + f''(x^*) e_n + \mathcal{O}(e_n^2)$ . Plugging into the iteration and simplifying:

e_{n+1} \;\approx\; \frac{f''(x^*)}{2\, f'(x^*)}\, e_n^2.

The error at step $n+1$ is proportional to the square of the error at step $n$ . This is quadratic convergence. The number of correct digits doubles with each iteration. Starting from an error of $10^{-1}$ , you get errors $10^{-2}, 10^{-4}, 10^{-8}, 10^{-16}$ — machine precision after four iterations.

By contrast, bisection’s convergence is linear: each step multiplies the error by $1/2$ , so errors go $10^{-1}, 5 \times 10^{-2}, 2.5 \times 10^{-2}, \ldots$ — needing $50$ iterations to reach $10^{-16}$ .

Worked example

▶ Worked example: Newton's method on f(x) = x² − 2 Derivation

The problem. Find $\sqrt{2}$ by solving $f(x) = x^2 - 2 = 0$ .

Newton’s iteration: $f'(x) = 2x$ , so

x_{n+1} \;=\; x_n - \frac{x_n^2 - 2}{2 x_n} \;=\; \frac{x_n}{2} + \frac{1}{x_n}.

Start with $x_0 = 1$ .

| $n$ | $x_n$ | $f(x_n)$ | error $|x_n - \sqrt{2}|$ | |---|---|---|---| | 0 | 1.000000000000 | $-1.000$ | 0.4142136 | | 1 | 1.500000000000 | $\phantom{-}0.250$ | 0.0857864 | | 2 | 1.416666666667 | $\phantom{-}0.00694$ | 0.0024531 | | 3 | 1.414215686275 | $\phantom{-}6.01 \times 10^{-6}$ | $2.124 \times 10^{-6}$ | | 4 | 1.414213562374 | $\phantom{-}4.5 \times 10^{-12}$ | $1.594 \times 10^{-12}$ | | 5 | 1.414213562373 | $\phantom{-}0$ | 0 |

Four iterations to 12-digit precision, five to machine precision. The error at each step is roughly the square of the previous error — classic quadratic convergence.

This iteration is the Babylonian method for square roots, known for ~4000 years. Newton’s method generalises it.

▶ Two Newton iterations on f(x) = x^3 - x - 1 Worked Example

Find the real root of $f(x) = x^3 - x - 1$ starting from $x_0 = 1.5$ .

The iteration is $x_{n+1} = x_n - f(x_n)/f'(x_n)$ with $f'(x) = 3x^2 - 1$ .

Iteration 1: $f(1.5) = 3.375 - 1.5 - 1 = 0.875$ , $f'(1.5) = 6.75 - 1 = 5.75$ .

x_1 = 1.5 - \frac{0.875}{5.75} = 1.5 - 0.15217 = 1.34783.

Iteration 2: $f(1.34783) = 2.4467 - 1.3478 - 1 = 0.0989$ , $f'(1.34783) = 5.450 - 1 = 4.450$ .

x_2 = 1.34783 - \frac{0.0989}{4.450} = 1.34783 - 0.02222 = 1.32561.

The true root is $x^* \approx 1.32472$ . After just two iterations the error has dropped from $0.175$ to $0.00089$ — the quadratic-convergence signature where each step roughly squares the relative error.

When Newton fails

Newton’s quadratic convergence is real but only kicks in near the root, with a smooth $f$ that has $f'(x^*) \neq 0$ . Failure modes:

Bad starting point. Newton finds a root, but maybe not the one you want. Or the iterates jump out of the domain. Or they oscillate around an inflection point.
$f'$ near zero. The denominator in the update vanishes. The iterate flies off to infinity. Happens near local minima/maxima of $f$ .
Multiple roots ( $f(x^*) = f'(x^*) = 0$ ). Convergence degrades from quadratic to linear, and accuracy suffers.
Chaotic basins of attraction. When $f$ has multiple roots, the set of starting points that converges to each root can be fractal (Newton fractals). The logistic preset in the interactive demonstrates this.

The standard production-grade root-finder is safeguarded Newton: take a Newton step if it lands inside a bracket where the function changes sign, fall back to bisection if it doesn’t. Brent’s method is the standard implementation; it’s what scipy.optimize.brentq runs. It has Newton’s quadratic convergence when conditions are good and bisection’s guaranteed convergence when they aren’t.

Higher-dimensional Newton

For systems of nonlinear equations $\mathbf{F}(\mathbf{x}) = \mathbf{0}$ with $\mathbf{F} : \mathbb{R}^n \to \mathbb{R}^n$ , Newton generalises:

\mathbf{x}_{n+1} \;=\; \mathbf{x}_n - J^{-1}(\mathbf{x}_n)\, \mathbf{F}(\mathbf{x}_n),

where $J$ is the Jacobian matrix of partial derivatives. Each Newton step now requires (1) computing the Jacobian and (2) solving a linear system $J \Delta \mathbf{x} = -\mathbf{F}$ — which uses Gaussian elimination from Foundations 4.3. For large systems the linear-solve cost dominates, and quasi-Newton methods (BFGS and friends) approximate the Jacobian to avoid recomputing it from scratch each step.

This is the algorithm behind every implicit time-stepping scheme for stiff ODEs and parabolic PDEs: at each time step the recurrence is implicit in $\mathbf{x}_{n+1}$ , so a Newton solve happens inside the time loop. It’s also the algorithm behind much of nonlinear optimisation, because finding a local minimum of $g(\mathbf{x})$ is equivalent to solving $\nabla g(\mathbf{x}) = \mathbf{0}$ .

What we use this for

The bookshelf’s interactives invoke root-finding more often than is obvious:

CharacteristicTrickCarousel (MATH-28) — finds eigenvalues of the characteristic polynomial. For 2×2 this is the quadratic formula; for larger it would be a numerical polynomial-root algorithm like Jenkins–Traub or QR iteration on the companion matrix.
FixedPointGallery (MATH-29) — eigenvalues of 2×2 matrices, again by the quadratic formula.
Implicit time-stepping for stiff systems (not currently used in any interactive but the standard tool for real simulations).

Closing the chapter

This lesson closes Foundations 10 and the Math Foundations book in its current form. The chapter started with the question “how do simulations actually work?” and ended with the algorithms that produce numerical answers from continuous equations. The arc, in one paragraph: every continuous quantity becomes a finite array of numbers (10.1); time-marching schemes integrate ODEs and PDEs (10.2, 10.3); iterative averaging solves elliptic problems (10.4); the FFT computes frequency-domain transforms in $\mathcal{O}(N \log N)$ (10.5); and Newton’s method finds roots and ties everything together when nonlinear systems show up.

Every interactive on this site is one of these algorithms running inside the browser. The point of the chapter is that none of them are black boxes — they are all techniques you can re-implement yourself in fifty lines of code.