4.3 The wave on the membrane

In 4.2 I confessed that treating each place on the basilar membrane as an independent oscillator was a lie. The places are coupled: by the perilymph above the membrane (in scala vestibuli) and below it (in scala tympani). When the stapes pushes at the oval window, the fluid carries a pressure disturbance to every place along the BM at once, and the BM responds along its whole length together. The result is one of the most beautiful objects in classical mechanics: a wave that travels along a structure whose properties change as it goes, slows down as it approaches its preferred place, builds to a peak there, and dies out beyond.

Georg von Békésy was the first to see it directly, by carefully sectioning cadaveric cochleas and watching how the basilar membrane moved under controlled acoustic stimulation. He won the Nobel Prize in Physiology or Medicine for the work in 1961. What he saw — and what we are about to model — is the cochlear traveling wave.

The basilar-membrane impedance

To get a handle on the math, we begin by re-expressing the local oscillator equation of 4.2 in frequency-domain form. The relationship between the local pressure $p(x, \omega)$ across the membrane and the local transverse velocity $v(x, \omega) = i\omega\,\eta$ is

$$p(x, \omega) = Z_{\text{BM}}(x, \omega)\, v(x, \omega), \qquad Z_{\text{BM}}(x, \omega) = b(x) + i\!\left(\omega\, m(x) - \frac{k(x)}{\omega}\right).$$

This $Z_{\text{BM}}$ is the specific acoustic impedance of the basilar membrane: pressure per unit velocity.

Derivation: from the SHM equation to the BM impedance

Start with the time-domain equation from 4.2:

$$m\,\ddot\eta + b\,\dot\eta + k\,\eta = p.$$

Go to the frequency domain by assuming sinusoidal time dependence $\eta(t) = \tilde\eta\, e^{-i\omega t}$ (we use the $e^{-i\omega t}$ convention so that the velocity $v = \dot\eta = -i\omega \tilde\eta\, e^{-i\omega t}$ leads to clean expressions; the choice doesn’t matter for the final magnitude). Then $\dot\eta = -i\omega\eta$ and $\ddot\eta = -\omega^2\eta$.

Substituting:

$$(-\omega^2 m - i\omega b + k)\,\tilde\eta = \tilde p.$$

Now express in terms of velocity instead of displacement, using $\tilde v = -i\omega\tilde\eta$, so $\tilde\eta = \tilde v/(-i\omega) = i\tilde v/\omega$:

$$(-\omega^2 m - i\omega b + k)\,\frac{i\tilde v}{\omega} = \tilde p.$$

Multiply out the $i/\omega$:

$$\tilde p = \left(-i\omega m + b + \frac{ik}{\omega}\right)\tilde v = \left[b + i\!\left(\frac{k}{\omega} - \omega m\right)\right]\tilde v.$$

There is a sign convention to pin down. The bracket is the impedance in the $e^{-i\omega t}$ convention. The two time conventions differ by complex conjugation throughout, so in the alternative $e^{+i\omega t}$ convention the imaginary part flips sign and the impedance is

$$Z_\text{BM}(x, \omega) = b + i\!\left(\omega m - \frac{k}{\omega}\right),$$

which is the form I’ll use throughout. (The two conventions are equally valid; pick one and stick with it. Acoustics texts commonly use $e^{+i\omega t}$, so I’ll match that.)
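As a quick numerical sanity check on the algebra, the sketch below builds the impedance directly from the time-domain equation for both time conventions. The parameter values are arbitrary placeholders rather than cochlear measurements, and the helper name `impedance_from_shm` is made up for this illustration.

```python
def impedance_from_shm(omega, m, b, k, sign):
    """Return p~/v~ for eta(t) = eta~ * exp(sign * i * omega * t).

    sign=+1 is the e^{+i omega t} convention, sign=-1 the e^{-i omega t} one.
    """
    s = sign * 1j * omega                # d/dt acts as multiplication by s
    eta = 1.0 + 0.0j                     # arbitrary displacement amplitude
    p = (m * s**2 + b * s + k) * eta     # m*eta'' + b*eta' + k*eta
    v = s * eta                          # velocity amplitude
    return p / v

# Placeholder values (assumed; SI units per unit membrane area)
omega, m, b, k = 2000.0, 0.5, 300.0, 4.0e6

z_plus = impedance_from_shm(omega, m, b, k, +1)
z_minus = impedance_from_shm(omega, m, b, k, -1)
z_formula = b + 1j * (omega * m - k / omega)   # closed form, e^{+i omega t}

print("e^{+i omega t}:", z_plus)
print("e^{-i omega t}:", z_minus)
print("closed form   :", z_formula)
```

Both conventions give the same resistance and the same $|Z_\text{BM}|$; only the sign of the reactance differs.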

The imaginary part of $Z_\text{BM}$ is the reactance. At $\omega < \omega_0 = \sqrt{k/m}$, the term $k/\omega$ dominates and the reactance is negative: the membrane is stiffness-controlled (it resists like a spring). At $\omega = \omega_0$, the reactance vanishes and the membrane is purely resistive, with $Z = b$. At $\omega > \omega_0$, $\omega m$ dominates and the reactance is positive: the membrane is mass-controlled (it resists like a heavy load).

The imaginary part of $Z_\text{BM}$, the reactance, sorts the membrane into three regions for a stimulus of fixed frequency $\omega$. Writing $\omega_0(x) = \sqrt{k(x)/m(x)}$ for the local resonant frequency, the membrane is stiffness-dominated where $\omega < \omega_0(x)$ (basal to the characteristic place), purely resistive where $\omega = \omega_0(x)$, and mass-dominated where $\omega > \omega_0(x)$ (apical to the characteristic place). This is the same impedance you derived from the SHM equation in 4.2, just written in pressure-and-velocity form.
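To make the three regimes concrete, here is a minimal sketch that evaluates the impedance at one place and classifies it by the sign of the reactance. The values of `m`, `b` and `k` are assumed placeholders, and `z_bm` and `regime` are hypothetical helper names, not anything defined in 4.2.

```python
import math

def z_bm(omega, m, b, k):
    """Specific acoustic impedance in the e^{+i omega t} convention."""
    return complex(b, omega * m - k / omega)

def regime(omega, m, b, k, tol=1e-9):
    """Classify one place on the membrane by the sign of its reactance."""
    reactance = z_bm(omega, m, b, k).imag
    scale = omega * m + k / omega        # size of the two competing terms
    if abs(reactance) <= tol * scale:
        return "resistive"
    return "stiffness-controlled" if reactance < 0 else "mass-controlled"

# Assumed placeholder values (SI units per unit membrane area)
m, b, k = 0.5, 300.0, 2.0e6
omega0 = math.sqrt(k / m)                # local resonant frequency, rad/s

for omega in (0.5 * omega0, omega0, 2.0 * omega0):
    print(f"omega/omega0 = {omega / omega0:.1f}: {regime(omega, m, b, k)}")
```

The tolerance is relative to the size of the two reactive terms, so the "resistive" label does not depend on the units chosen.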

The long-wave cochlear wave equation

Now we couple the local impedance to the fluid. Take the perilymph in scala vestibuli and scala tympani to be incompressible with density $\rho$, and treat each scala as having a roughly constant cross-sectional area $A$. The two scalae are connected to each other across the basilar membrane (and at the helicotrema at the apex, but we’ll handle that as a boundary condition). The cochlear long-wave equation for the pressure difference $p(x, t)$ across the membrane is

$$\frac{\partial^2 p}{\partial x^2} = -\frac{2\rho}{A}\, \frac{\partial v_{\text{BM}}}{\partial t}.$$

Substituting $v_\text{BM} = p/Z_\text{BM}$ and going to the frequency domain by writing $p(x, t) = P(x)\, e^{i\omega t}$ gives

$$\frac{d^2 P}{dx^2} + \kappa^2(x, \omega)\, P = 0, \qquad \kappa^2(x, \omega) = \frac{2 i \omega\, \rho}{A\; Z_{\text{BM}}(x, \omega)}.$$

(We use $\kappa$ for the wavenumber to avoid clashing with the stiffness $k$.)

Derivation: the cochlear long-wave equation from mass conservation

The cochlea is a tube. Two parallel chambers (scala vestibuli above, scala tympani below) run side by side along the cochlea, separated by the basilar membrane. The fluid in each is incompressible.

Let $u(x, t)$ be the longitudinal volume velocity in scala vestibuli (m³/s), counted positive for fluid flowing rightward, toward the apex. Mass conservation in an incompressible fluid says that any fluid entering an element of the chamber must leave; if the chamber’s cross-section $A$ is constant, the longitudinal flow can only change if fluid is leaving the chamber sideways, that is, if the basilar membrane is moving.

Consider an element of scala vestibuli between $x$ and $x + dx$, of width $w(x)$ (the basilar-membrane width at this point). The rate at which fluid crosses the BM out of scala vestibuli, per unit length, is $w(x)\, v_\text{BM}(x, t)$ (BM transverse velocity times its width). Conservation of mass in the chamber gives

$$\frac{\partial u}{\partial x} = -w(x)\, v_\text{BM}(x, t).$$

By symmetry, the volume velocity in scala tympani has the opposite sign. The pressure gradient drives the volume velocity through Newton’s second law applied to fluid (the Euler equation for inviscid fluid):

$$\rho_0\, \frac{\partial (u/A)}{\partial t} = -\frac{\partial p_V}{\partial x},$$

where $p_V$ is the pressure in scala vestibuli and $\rho_0$ is the perilymph density. Doing the same for scala tympani (with $p_T$) and subtracting gives an equation for the pressure difference $p = p_V - p_T$:

$$\rho_0\, \frac{2}{A}\, \frac{\partial u}{\partial t} = -\frac{\partial p}{\partial x},$$

where the factor of 2 comes from the two scalae both contributing.

Differentiating with respect to $x$ and substituting $\partial u/\partial x = -w\, v_\text{BM}$:

$$\frac{\partial^2 p}{\partial x^2} = -\frac{2\rho_0}{A}\, \frac{\partial}{\partial t}\left(-w\, v_\text{BM}\right) = \frac{2 \rho_0\, w}{A}\, \frac{\partial v_\text{BM}}{\partial t}.$$

If $w$ is roughly constant or absorbed into the definition of impedance, we can drop the explicit width and write $\rho$ for $\rho_0$. Up to the sign convention for $v_\text{BM}$, we recover

$$\frac{\partial^2 p}{\partial x^2} = -\frac{2\rho}{A}\, \frac{\partial v_\text{BM}}{\partial t}$$

(the sign is set by which direction of $v_\text{BM}$ counts as positive relative to the pressure increase). Substituting $v_\text{BM} = p/Z_\text{BM}$ and going to the frequency domain ($\partial/\partial t \to i\omega$ with the $e^{+i\omega t}$ convention):

$$\frac{d^2 P}{dx^2} = -\frac{2\rho}{A}\, \frac{i\omega P}{Z_\text{BM}}.$$

This is the wave equation $d^2 P/dx^2 + \kappa^2 P = 0$ with

$$\kappa^2(x, \omega) = \frac{2 i \omega\, \rho}{A\; Z_\text{BM}(x, \omega)}.$$

This is the cochlear dispersion relation. ∎

This derivation is simplified — it elides the detailed treatment of the basilar-membrane width w(x)w(x), the geometry of the scalae, and the precise boundary conditions at the stapes and helicotrema. For a careful treatment, see Lighthill, Waves in Fluids (1978), chapters 4–5; or Steele & Taber (1979) for the original modern long-wave derivation; or chapters 6–8 of Pickles’ Introduction to the Physiology of Hearing.

This is the cochlear dispersion relation, and reading it carefully tells the entire story of the traveling wave.

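Look at what $\kappa^2$ does in three regions of $x$. Basal to the characteristic place, $Z_\text{BM}$ is stiffness-dominated and large in magnitude, so $|\kappa|$ is small: the wavelength is long and the wave travels fast. Near the characteristic place, $|Z_\text{BM}|$ falls toward its minimum value $b$, so $|\kappa|$ grows: the wavelength shrinks, the wave slows, and the damping absorbs its energy. Apical to the characteristic place, the reactance has changed sign and the membrane is mass-dominated, so the wave no longer propagates and dies out.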

WKB and the traveling wave

In the WKB approximation (valid because the cochlea’s properties vary slowly on the wavelength scale, at least in the long-wave region), the solution to the wave equation looks like

$$P(x) \approx \frac{P_0}{\sqrt{\kappa(x, \omega)}}\, \exp\!\left( i \int_0^x \kappa(x', \omega)\, dx' \right).$$

Derivation: the WKB approximation

The exact wave equation $d^2 P/dx^2 + \kappa^2(x) P = 0$ has no general closed-form solution when $\kappa$ varies with $x$. But when $\kappa$ varies slowly, meaning $|d\kappa/dx| \ll \kappa^2$, we can find an approximate solution.

Guess $P(x) = A(x)\, e^{i\Phi(x)}$ where $A(x)$ is a slowly varying amplitude (not to be confused with the scala area $A$) and $\Phi(x)$ is the phase. Take derivatives:

$$\frac{dP}{dx} = (A' + i A \Phi')\, e^{i\Phi},$$

$$\frac{d^2 P}{dx^2} = (A'' + 2i A' \Phi' + i A \Phi'' - A (\Phi')^2)\, e^{i\Phi}.$$

Substitute into the wave equation:

$$A'' + 2i A' \Phi' + i A \Phi'' - A (\Phi')^2 + \kappa^2 A = 0.$$

Separate real and imaginary parts:

Real: $A'' - A (\Phi')^2 + \kappa^2 A = 0$

Imag: $2 A' \Phi' + A \Phi'' = 0$

For slowly-varying $A$, drop $A''$ compared to $A\kappa^2$. The real part gives $(\Phi')^2 = \kappa^2$, so $\Phi' = \kappa$ (choosing the right-going branch), and $\Phi(x) = \int_0^x \kappa(x')\, dx'$.

The imaginary part, multiplied by $A$, can be rewritten as $(A^2 \Phi')' = 0$, so $A^2 \Phi'$ is constant. Since $\Phi' = \kappa$, $A^2 \kappa$ is constant, so $A \propto 1/\sqrt{\kappa}$.

Combining, the WKB approximation is

$$P(x) \approx \frac{P_0}{\sqrt{\kappa(x)}}\, \exp\!\left(i \int_0^x \kappa(x')\, dx'\right).$$

The phase $\int \kappa\, dx$ tracks the cumulative phase accumulation; the prefactor $1/\sqrt{\kappa}$ comes from energy conservation (where the wave slows, the pressure amplitude changes in just the right way to keep the energy flux constant). ∎
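One way to build trust in the WKB formula is to compare it with a direct numerical solution of the exact equation. The sketch below does that for a toy profile, a real wavenumber that grows linearly with $x$, which is an assumption made purely for illustration and not the complex cochlear $\kappa$. The names `K0`, `SLOPE`, `wkb` and `rk4` are hypothetical, and the units are arbitrary.

```python
import cmath
import math

K0, SLOPE = 20.0, 10.0   # assumed toy profile: kappa(x) = K0 + SLOPE * x

def kappa(x):
    return K0 + SLOPE * x

def wkb(x):
    """WKB solution, normalised so that P(0) = 1."""
    phase = K0 * x + 0.5 * SLOPE * x * x          # integral of kappa from 0 to x
    return math.sqrt(K0 / kappa(x)) * cmath.exp(1j * phase)

def rhs(x, p, q):
    """First-order form of P'' + kappa^2 P = 0, with q = P'."""
    return q, -kappa(x) ** 2 * p

def rk4(x_end, n=4000):
    """Integrate the exact equation from 0 to x_end with classical RK4."""
    h = x_end / n
    p = 1.0 + 0.0j
    # Launch on the WKB wave: P'(0) = A'(0) + i*kappa(0)*A(0), A = sqrt(K0/kappa)
    q = complex(-0.5 * SLOPE / K0, K0)
    for j in range(n):
        x = j * h
        k1p, k1q = rhs(x, p, q)
        k2p, k2q = rhs(x + h / 2, p + h / 2 * k1p, q + h / 2 * k1q)
        k3p, k3q = rhs(x + h / 2, p + h / 2 * k2p, q + h / 2 * k2q)
        k4p, k4q = rhs(x + h, p + h * k3p, q + h * k3q)
        p += h / 6 * (k1p + 2 * k2p + 2 * k3p + k4p)
        q += h / 6 * (k1q + 2 * k2q + 2 * k3q + k4q)
    return p

p_exact = rk4(1.0)
p_wkb = wkb(1.0)
rel_err = abs(p_exact - p_wkb) / abs(p_wkb)
print("numerical |P(1)|:", abs(p_exact))
print("WKB       |P(1)|:", abs(p_wkb))
print("relative error  :", rel_err)
```

For this profile $|d\kappa/dx|/\kappa^2$ is at most 0.025, so the slow-variation condition holds comfortably. Raising `SLOPE` relative to `K0**2` is a simple way to watch the approximation degrade.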

The exponential carries a phase that accumulates along the wave’s journey from the stapes; the $1/\sqrt{\kappa}$ prefactor is the slowly-varying envelope. The basilar-membrane displacement is then $\eta = v_{\text{BM}}/(i\omega) = P/(i\omega\, Z_{\text{BM}})$, and the displacement amplitude $|\eta(x)|$ peaks sharply where $|Z_{\text{BM}}|$ is smallest, at $x_{\text{CF}}$. The peak’s height scales with $Q$. Its width scales with $1/Q$. We are back to the resonance curve of 4.2, but now it has been painted spatially along the cochlea.
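The claim that $|Z_\text{BM}|$ bottoms out at the characteristic place can be checked with a few lines of code. This sketch assumes a stiffness that falls exponentially from base to apex with constant mass and damping. Every number in it is a made-up placeholder except the 35 mm length, so the place it reports for 1 kHz belongs to this toy membrane and not to the model behind the interactive. It only locates the impedance minimum; it does not compute the wave itself.

```python
import math

M = 0.5            # assumed mass per unit area, kg/m^2
B = 400.0          # assumed damping per unit area, N*s/m^3
K_BASE = 1.0e10    # assumed stiffness at the base, N/m^3
ALPHA = 300.0      # assumed exponential decay rate of stiffness, 1/m
LENGTH = 0.035     # membrane length, m (35 mm)

def stiffness(x):
    return K_BASE * math.exp(-ALPHA * x)

def z_abs(x, omega):
    """|Z_BM| at place x, e^{+i omega t} convention."""
    return abs(complex(B, omega * M - stiffness(x) / omega))

def x_cf_analytic(omega):
    """Place where omega equals the local resonance sqrt(k(x)/m)."""
    return math.log(K_BASE / (omega ** 2 * M)) / ALPHA

def x_cf_grid(omega, n=3501):
    """Grid point (0.01 mm spacing by default) where |Z_BM| is smallest."""
    xs = [LENGTH * j / (n - 1) for j in range(n)]
    return min(xs, key=lambda x: z_abs(x, omega))

omega = 2.0 * math.pi * 1000.0           # 1 kHz stimulus
x_formula = x_cf_analytic(omega)
x_search = x_cf_grid(omega)
print(f"resonance condition: {x_formula * 1e3:.2f} mm from the base")
print(f"minimum of |Z_BM|  : {x_search * 1e3:.2f} mm from the base")
```

With constant damping the two answers agree to within the grid spacing. If the damping were tied to a fixed $Q$ instead, the minimum would shift slightly, by an amount of order $1/Q^2$.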

The interactive below renders the traveling wave for one input frequency at a time, using a simplified version of the cochlear long-wave model. Drag the slider. Watch the peak migrate along the basilar membrane; watch the wavelength shrink as the wave approaches the characteristic place; watch the wave die past it.

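[Interactive: traveling-wave displacement $\eta(x, t)$ along the basilar membrane, from base (0 mm, stapes) to apex (35 mm). Shown for a 1 kHz input: characteristic place $x_{\text{CF}}$ = 15.2 mm from the stapes, $Q$ fixed at 8.]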

A few things worth flagging.

First, the model in the interactive is the passive cochlear wave — what you would observe in a dead cochlea, or in any cochlea at very high stimulus levels where the active feedback I will introduce in 4.5 has saturated. Real living cochleas show a much sharper, taller, more asymmetric peak at much lower stimulus levels, because of the outer hair cells. The passive wave tells us where the peak will be and roughly what shape it has. 4.5 will tell us how that peak gets sharpened.

Second, the wave’s phase accumulates as it travels. At the characteristic place, the phase has lagged the input by several full cycles — perhaps five to ten, depending on frequency. This will matter in movement 6: when the auditory nerve fires in phase-locked synchrony to the BM motion, the spikes inherit this phase lag, and the brain can in principle use the inter-channel timing for fine localization and pitch.

Third, and most importantly, this is the engine of place coding for real sounds. Send in a complex stimulus like “Hey Dr. Miles!” and the cochlea performs a near-instantaneous Fourier analysis: each frequency component sets up its own traveling wave with its own characteristic place. The /h/ aspiration’s broadband energy lights up the basal half of the BM. The /m/ nasal resonance lights up a band around 1 kHz. The diphthong /aɪ/ in Miles has formants in the range 600 to 2200 Hz that paint stripes further along the membrane. By the time the phrase has played, the entire basilar membrane has been mapped — in space, in time, and in amplitude — into a 35-mm-long picture of what was said. We will look at that picture, in full, at the end of movement 5.

Section 4.4 will write down the precise map from frequency to characteristic place — the Greenwood function. With that, we will know exactly what the cochlea sends downstream.