10.6 Root-finding: bisection and Newton’s method

Many numerical tasks reduce to finding a root of a function: given ff, find xx^* such that f(x)=0f(x^*) = 0. The eigenvalues of a matrix are the roots of its characteristic polynomial. The natural frequencies of a damped oscillator are the roots of λ2+2γλ+ω02=0\lambda^2 + 2\gamma \lambda + \omega_0^2 = 0. The intersection of two curves is the root of their difference. The implicit equations that arise in numerical PDE-solvers (every step of every implicit-time-stepping scheme) need a root-finder.

This lesson covers the two basic algorithms — bisection (robust, slow) and Newton’s method (fast, fragile) — and points at the suite of safeguarded methods that combine both in production code.

Bisection: robust and slow

Bisection requires a function ff that is continuous on an interval [a,b][a, b] with opposite signs at the endpoints: f(a)f(b)<0f(a)\, f(b) < 0. The intermediate value theorem guarantees a root somewhere in (a,b)(a, b). The algorithm:

  1. Evaluate ff at the midpoint m=(a+b)/2m = (a + b)/2.
  2. If f(m)f(m) has the same sign as f(a)f(a), the root is in [m,b][m, b] — set ama \to m.
  3. Otherwise the root is in [a,m][a, m] — set bmb \to m.
  4. Stop when ba<toleranceb - a < \text{tolerance}.

Each step halves the interval, so after kk steps the root is bracketed to within (ba)/2k(b - a) / 2^k. To reduce the bracket by a factor of ε\varepsilon takes log2(1/ε)\log_2(1/\varepsilon) iterations — about 50 to reach machine precision starting from any reasonable bracket.

Bisection is robust: it always converges, and the convergence is monotone — every step shrinks the bracket. The downside is that each step gains only one bit of precision. For high-precision work or for functions where evaluations are expensive, bisection alone is too slow.

Newton’s method: fast and fragile

Newton’s method uses the derivative to take much bigger steps. Start with a guess x0x_0 near the root. Linearise ff around x0x_0 — replace ff by its tangent line — and solve for where that tangent crosses zero. Use the result as the next guess. Repeat.

The tangent line at xnx_n is y=f(xn)+f(xn)(xxn)y = f(x_n) + f'(x_n)(x - x_n). Setting y=0y = 0 and solving for xx:

  xn+1  =  xn    f(xn)f(xn).  \boxed{\;x_{n+1} \;=\; x_n \;-\; \frac{f(x_n)}{f'(x_n)}.\;}

This is Newton’s iteration. Each step costs one ff evaluation and one ff' evaluation. If ff' is hard to compute analytically there’s a closely-related secant method that uses a finite-difference estimate of ff'.

x_0x_4 = rootf(x) = x² − 2 (roots: ±√2)
function:
start: x₀ = 2.5000 iterations: 4 converged to x ≈ 1.414214

Click anywhere along the curve to set the starting point x₀. The red lines are the tangents at each iterate; Newton's method linearises f at x and jumps to where that tangent meets the x-axis. Where the iterates land on a true root they turn green. For x² − 2 the method converges quadratically from almost any start. For the cubic, a poorly-chosen start can oscillate or land on the wrong root. The logistic example has three roots and exhibits extreme sensitivity to x₀: tiny shifts in starting position send the iterates to different roots — the boundary between basins of attraction is fractal.

Click anywhere on the curve to set the starting point. The red lines are the tangents at each iterate; Newton’s method linearises ff at xx and jumps to where that tangent meets the xx-axis. With a good starting point, Newton finds the root in a handful of iterations — but with a bad starting point, the iterates can wander to a different root, oscillate, or diverge. The logistic preset shows the most dramatic version: three roots close together, and tiny shifts in x0x_0 send the iterates to completely different roots.

Quadratic convergence

The reason Newton is fast: each step squares the error. If xn=x+enx_n = x^* + e_n, then

xn+1x  =  xnxenf(xn)f(xn).x_{n+1} - x^* \;=\; \underbrace{x_n - x^*}_{e_n} - \frac{f(x_n)}{f'(x_n)}.

Taylor-expand ff around xx^*: f(xn)=f(x)en+12f(x)en2+O(en3)f(x_n) = f'(x^*)\, e_n + \tfrac12 f''(x^*) e_n^2 + \mathcal{O}(e_n^3), and f(xn)=f(x)+f(x)en+O(en2)f'(x_n) = f'(x^*) + f''(x^*) e_n + \mathcal{O}(e_n^2). Plugging into the iteration and simplifying:

en+1    f(x)2f(x)en2.e_{n+1} \;\approx\; \frac{f''(x^*)}{2\, f'(x^*)}\, e_n^2.

The error at step n+1n+1 is proportional to the square of the error at step nn. This is quadratic convergence. The number of correct digits doubles with each iteration. Starting from an error of 10110^{-1}, you get errors 102,104,108,101610^{-2}, 10^{-4}, 10^{-8}, 10^{-16} — machine precision after four iterations.

By contrast, bisection’s convergence is linear: each step multiplies the error by 1/21/2, so errors go 101,5×102,2.5×102,10^{-1}, 5 \times 10^{-2}, 2.5 \times 10^{-2}, \ldots — needing 5050 iterations to reach 101610^{-16}.

Worked example

Worked example: Newton's method on f(x) = x² − 2

The problem. Find 2\sqrt{2} by solving f(x)=x22=0f(x) = x^2 - 2 = 0.

Newton’s iteration: f(x)=2xf'(x) = 2x, so

xn+1  =  xnxn222xn  =  xn2+1xn.x_{n+1} \;=\; x_n - \frac{x_n^2 - 2}{2 x_n} \;=\; \frac{x_n}{2} + \frac{1}{x_n}.

Start with x0=1x_0 = 1.

| nn | xnx_n | f(xn)f(x_n) | error xn2|x_n - \sqrt{2}| | |---|---|---|---| | 0 | 1.000000000000 | 1.000-1.000 | 0.4142136 | | 1 | 1.500000000000 | 0.250\phantom{-}0.250 | 0.0857864 | | 2 | 1.416666666667 | 0.00694\phantom{-}0.00694 | 0.0024531 | | 3 | 1.414215686275 | 6.01×106\phantom{-}6.01 \times 10^{-6} | 2.124×1062.124 \times 10^{-6} | | 4 | 1.414213562374 | 4.5×1012\phantom{-}4.5 \times 10^{-12} | 1.594×10121.594 \times 10^{-12} | | 5 | 1.414213562373 | 0\phantom{-}0 | 0 |

Four iterations to 12-digit precision, five to machine precision. The error at each step is roughly the square of the previous error — classic quadratic convergence.

This iteration is the Babylonian method for square roots, known for ~4000 years. Newton’s method generalises it.

When Newton fails

Newton’s quadratic convergence is real but only kicks in near the root, with a smooth ff that has f(x)0f'(x^*) \neq 0. Failure modes:

The standard production-grade root-finder is safeguarded Newton: take a Newton step if it lands inside a bracket where the function changes sign, fall back to bisection if it doesn’t. Brent’s method is the standard implementation; it’s what scipy.optimize.brentq runs. It has Newton’s quadratic convergence when conditions are good and bisection’s guaranteed convergence when they aren’t.

Higher-dimensional Newton

For systems of nonlinear equations F(x)=0\mathbf{F}(\mathbf{x}) = \mathbf{0} with F:RnRn\mathbf{F} : \mathbb{R}^n \to \mathbb{R}^n, Newton generalises:

xn+1  =  xnJ1(xn)F(xn),\mathbf{x}_{n+1} \;=\; \mathbf{x}_n - J^{-1}(\mathbf{x}_n)\, \mathbf{F}(\mathbf{x}_n),

where JJ is the Jacobian matrix of partial derivatives. Each Newton step now requires (1) computing the Jacobian and (2) solving a linear system JΔx=FJ \Delta \mathbf{x} = -\mathbf{F} — which uses Gaussian elimination from Foundations 4.3. For large systems the linear-solve cost dominates, and quasi-Newton methods (BFGS and friends) approximate the Jacobian to avoid recomputing it from scratch each step.

This is the algorithm behind every implicit time-stepping scheme for stiff ODEs and parabolic PDEs: at each time step the recurrence is implicit in xn+1\mathbf{x}_{n+1}, so a Newton solve happens inside the time loop. It’s also the algorithm behind much of nonlinear optimisation, because finding a local minimum of g(x)g(\mathbf{x}) is equivalent to solving g(x)=0\nabla g(\mathbf{x}) = \mathbf{0}.

What we use this for

The bookshelf’s interactives invoke root-finding more often than is obvious:

Closing the chapter

This lesson closes Foundations 10 and the Math Foundations book in its current form. The chapter started with the question “how do simulations actually work?” and ended with the algorithms that produce numerical answers from continuous equations. The arc, in one paragraph: every continuous quantity becomes a finite array of numbers (10.1); time-marching schemes integrate ODEs and PDEs (10.2, 10.3); iterative averaging solves elliptic problems (10.4); the FFT computes frequency-domain transforms in O(NlogN)\mathcal{O}(N \log N) (10.5); and Newton’s method finds roots and ties everything together when nonlinear systems show up.

Every interactive on this site is one of these algorithms running inside the browser. The point of the chapter is that none of them are black boxes — they are all techniques you can re-implement yourself in fifty lines of code.