7.2 Wide-dynamic-range compression

The acoustic world has a dynamic range of about 80 dB: a whispered consonant arriving at the listener’s ear is ~30 dB SPL; a speaker projecting from across a room is ~70 dB SPL; a loud sound in a restaurant is ~85 dB SPL; a power tool a few metres away can hit 100 dB SPL. A normal listener accommodates this 80 dB range comfortably: the cochlear amplifier (see Hearing 4.5) is a compressive nonlinear element that maps 80 dB of input intensity onto roughly 30 dB of basilar-membrane displacement, with soft sounds receiving the most mechanical gain.

A patient with cochlear hearing loss has lost this compressive amplifier. Their audiometric threshold is elevated (they can no longer detect the softest sounds), but their upper comfort limit (UCL) typically rises by less — sometimes not at all (full recruitment) — so the residual dynamic range between threshold and UCL is narrower than normal. A patient with 50 dB HL of cochlear loss may have:

- Threshold: 50 dB HL ≈ 60 dB SPL
- UCL: 95 dB SPL
- Residual dynamic range: 35 dB

The hearing aid’s first job is to compress the 80 dB acoustic range of the world into the patient’s 35 dB residual range — applying lots of gain to soft sounds (which are below threshold) and little or no gain to loud sounds (which already approach the UCL). This is wide-dynamic-range compression (WDRC).
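As a rough back-of-envelope check (an illustrative average, not a prescription), squeezing the whole 80 dB acoustic range into this patient's 35 dB residual range implies an average input-to-output slope of:

$$
\frac{\Delta L_\text{in}}{\Delta L_\text{out}} = \frac{80\ \text{dB}}{35\ \text{dB}} \approx 2.3
$$

That is roughly 2.3 dB of input change per 1 dB of output change. The symbols $\Delta L_\text{in}$ and $\Delta L_\text{out}$ are introduced here only for this estimate; a real fitting does not apply one slope uniformly across all input levels.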

[Figure: WDRC parameters, labelled compression threshold CT = 45 dB SPL, compression ratio CR = 2.5:1, low-level gain = 30 dB, MPO = 105 dB SPL.]

Figure caption: The audiogram patient has a *narrowed dynamic range*: thresholds elevated by the hearing loss, but the upper limit of comfortable loudness (UCL) elevated by less, or even unchanged (recruitment). The hearing aid's job is to fit the 80-dB acoustic range of typical speech (20–100 dB SPL) into the patient's residual dynamic range. WDRC achieves this with a kneepoint (CT) below which gain is constant (linear region), and above which the compression ratio limits gain growth. Soft speech gets the most help (high gain); loud sounds get the least (compression). Above the MPO (peak clipping ceiling) the output is hard-limited to prevent loudness discomfort and acoustic trauma. NAL-NL2 and DSL v5 are the two standard prescription algorithms that compute (CT, CR, gain) for each audiometric channel from the patient's audiogram.

## The CT / CR / MPO triplet

The defining parameters of a WDRC channel are:

- **CT**, compression threshold (or “kneepoint”): the input level above which compression begins. Below CT the channel is linear (constant gain).
- **CR**, compression ratio: the ratio of input level change to output level change above CT. A 3:1 ratio means a 3 dB increase in input produces only a 1 dB increase in output. CR = 1:1 is linear; CR = ∞ is hard limiting.
- **MPO**, maximum power output: the absolute ceiling on output level. Above MPO the channel hard-clips (or aggressively limits) to prevent loudness discomfort.

The input/output curve is piecewise linear:

$$
L_\text{out} = \begin{cases}
L_\text{in} + G_0 & L_\text{in} < \text{CT} \\
\text{CT} + G_0 + \dfrac{L_\text{in} - \text{CT}}{\text{CR}} & \text{CT} \le L_\text{in} \le L_\text{MPO} \\
\text{MPO} & L_\text{in} > L_\text{MPO}
\end{cases}
$$

where $G_0$ is the low-level (linear-region) gain and $L_\text{MPO}$ is the input level at which the compressed output reaches MPO. A typical mild-moderate sensorineural fit might have $\text{CT} = 45$ dB SPL, $\text{CR} = 2.5$, $G_0 = 25$ dB, $\text{MPO} = 105$ dB SPL, which gives:

- 30 dB SPL input → 55 dB SPL output (+25 dB, linear)
- 60 dB SPL input → 70 + (15/2.5) = 76 dB SPL output (+16 dB, compressed)
- 80 dB SPL input → 70 + (35/2.5) = 84 dB SPL output (+4 dB, compressed)
- 100 dB SPL input → 70 + (55/2.5) = 92 dB SPL output (−8 dB)
- Any output that would exceed 105 dB SPL is hard-limited.

The result: 70 dB of input dynamic range (30 to 100 dB SPL) is compressed to about 37 dB of output dynamic range (55 to 92 dB SPL), sliding into the patient's residual hearing.

## Multichannel WDRC

Single-channel WDRC is rare in modern devices; almost all contemporary hearing aids use *multichannel* WDRC, with 12 to 24 frequency channels each having its own (CT, CR, $G_0$, MPO) tuple. This matters because:

- The patient's audiogram is not flat. A typical aging-loss patient has near-normal thresholds at 250–1000 Hz and 50–70 dB HL at 4000 Hz. The high-frequency channels need more low-level gain than the low-frequency channels.
- The dynamic range varies across frequency. The patient's UCL at 250 Hz may be normal (~100 dB SPL) but at 4000 Hz it may be reduced by recruitment to perhaps 90 dB SPL. The high-frequency MPO is lower.
- Speech information density varies across frequency. The articulation index (Ch 3), standardised in its modern form as the speech intelligibility index (SII), tells us which frequency bands carry the most speech-intelligibility information per dB of audibility; modern prescription algorithms (NAL-NL2 in particular) prescribe gain that *maximises* the SII for typical speech at typical SNRs, given the patient's audiogram.
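The piecewise input/output rule given earlier in this section is applied independently in each channel. A minimal Python sketch of that static curve for a single channel follows; the function name `wdrc_output_level` and its argument names are illustrative choices, not part of any vendor or prescription API, and the sketch ignores time constants entirely.

```python
def wdrc_output_level(level_in_db, ct_db, cr, g0_db, mpo_db):
    """Static WDRC input/output curve for one channel (levels in dB SPL)."""
    if level_in_db < ct_db:
        # Linear region: constant gain below the kneepoint.
        level_out_db = level_in_db + g0_db
    else:
        # Compressed region: CR dB of input above CT adds 1 dB of output.
        level_out_db = ct_db + g0_db + (level_in_db - ct_db) / cr
    # Hard limit: the output never exceeds the maximum power output.
    return min(level_out_db, mpo_db)


# Parameters of the worked mild-moderate example in the text.
fit = dict(ct_db=45.0, cr=2.5, g0_db=25.0, mpo_db=105.0)

for level_in in (30, 60, 80, 100):
    level_out = wdrc_output_level(level_in, **fit)
    gain = level_out - level_in
    print(f"{level_in:>3} dB SPL in -> {level_out:5.1f} dB SPL out (gain {gain:+.1f} dB)")
```

With the example parameters this reproduces the 55, 76, 84 and 92 dB SPL outputs listed above. A multichannel device would evaluate the same rule once per channel, each with its own (CT, CR, $G_0$, MPO) tuple.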
## Attack and release

The level estimator that drives the compressor is not instantaneous: it tracks a *smoothed* envelope of the input signal with a fast **attack time** (the time constant when the envelope rises) and a slower **release time** (the time constant when it falls). Typical values:

| Mode | Attack (ms) | Release (ms) | Effect |
|---|---|---|---|
| **Fast (syllabic)** | 5–10 | 30–100 | Compresses syllable-by-syllable; reduces level contrast within words (raising soft consonants relative to vowels); can introduce pumping/artefacts |
| **Slow (automatic volume control)** | 100–1000 | 1000–5000 | Compresses across utterances/scenes; preserves natural loudness contour within sentences; behaves more like a manual volume knob |
| **Dual** | varies | varies | Many vendors use dual time constants: fast for sudden loud events, slow for sustained changes |

The clinical evidence on syllabic vs slow compression is mixed and largely vendor-driven. Fast compression maximises soft-speech audibility at the cost of some natural-loudness distortion; slow compression preserves natural dynamics at the cost of slower adaptation to changing acoustic environments. Most current devices ship with a moderate default (attack ~10 ms, release ~150 ms) and offer adjustment for specific clinical issues.

<Derivation title="Why compression ratios are limited above ~3:1" defaultOpen={false}>

The compression ratio is bounded above by an artefact tradeoff. Consider a soft consonant (say, $/s/$ at 40 dB SPL) immediately followed by a loud vowel ($/aa/$ at 75 dB SPL), as is typical of conversational speech. With a fast-attack syllabic compressor with $\text{CT} = 45$ dB SPL, $\text{CR} = 5{:}1$:

- The $/s/$ is below the kneepoint and gets linear gain of $G_0 = 30$ dB → 70 dB SPL output.
- The $/aa/$ excursion above the kneepoint is $75 - 45 = 30$ dB. Compressed: $30/5 = 6$ dB. Output: $45 + 30 + 6 = 81$ dB SPL.

The original 35 dB intensity difference between consonant and vowel has been compressed to 11 dB.
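The same arithmetic can be repeated for other ratios. The sketch below assumes, as the calculation above implicitly does, that the compressor tracks the input level instantly (no attack or release lag); the function names are hypothetical and exist only for this illustration.

```python
def output_level_db(level_in_db, ct_db=45.0, g0_db=30.0, cr=5.0):
    """Static compressor output, assuming the level estimate tracks instantly."""
    if level_in_db < ct_db:
        # Below the kneepoint: linear gain only.
        return level_in_db + g0_db
    # Above the kneepoint: the excursion over CT is divided by CR.
    return ct_db + g0_db + (level_in_db - ct_db) / cr


def consonant_vowel_contrast_db(cr, consonant_db=40.0, vowel_db=75.0):
    """Output level difference (vowel minus consonant) for a given ratio."""
    return output_level_db(vowel_db, cr=cr) - output_level_db(consonant_db, cr=cr)


for cr in (1.0, 2.0, 3.0, 5.0):
    contrast = consonant_vowel_contrast_db(cr)
    print(f"CR {cr:.0f}:1 -> consonant-vowel contrast {contrast:.0f} dB")
```

Under these assumptions the 35 dB contrast of the linear case (1:1) shrinks to 20 dB at 2:1, 15 dB at 3:1 and 11 dB at 5:1.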
This sounds *unnatural*: speech with abnormally low contrast between consonants and vowels has the characteristic "pumping" or "all-loud" quality that listeners describe as fatiguing. Empirically, listener preference studies (Souza *et al.* 2005, Moore *et al.* 2008) consistently find compression ratios above about 3:1 reduce both subjective preference and (in some studies) speech intelligibility for the consonant-rich portions of speech. Above 5:1 the device essentially becomes a slow compressor with output level pinned near MPO.

Modern practice: compression ratios of 1.5:1 to 3:1 in audiometric bands with mild-moderate loss; up to 4:1 in bands with severe loss; hard limiting (effectively CR = ∞) only at the MPO ceiling.

</Derivation>

## NAL-NL2 and DSL v5

The two dominant prescriptive algorithms are NAL-NL2 (National Acoustic Laboratories, Australia) and DSL v5 (Desired Sensation Level, University of Western Ontario). Both take the patient's audiogram as input and output the (CT, CR, $G_0$, MPO) tuple for each channel.

- **NAL-NL2 (2010 update)** is designed to *maximise speech intelligibility* given typical speech-in-noise listening conditions, while keeping the overall loudness of the amplified signal equal to that perceived by a normal-hearing listener. It tends to give *less* gain than DSL v5 for the same audiogram, especially for low frequencies, on the principle that low-frequency gain disproportionately raises perceived loudness without improving intelligibility.
- **DSL v5 (2005)** is designed to make the long-term average speech spectrum (LTASS) *audible* across the full audiometric range, while keeping peaks at comfortable listening level and avoiding excessive amplification at low input levels. DSL v5 generally gives *more* low-frequency gain than NAL-NL2.
The DSL pediatric extension prescribes more aggressive low-frequency gain than the adult version, reflecting the evidence that audibility-driven prescriptions are more important during language acquisition.

In US clinical practice, NAL-NL2 dominates for adult fittings; DSL v5 dominates for pediatric fittings. Both are computed in the hearing-aid fitting software from the audiogram, then verified by real-ear measurement (Ch 8, next).

## What WDRC cannot do

WDRC restores *audibility*: it brings soft sounds back into the patient's residual dynamic range and prevents loud sounds from exceeding comfort. It does not restore the *frequency selectivity* of a damaged cochlea (broadened cochlear filters remain broadened); it does not restore the *temporal fine-structure* coding that supports speech-in-noise perception; it does not restore the cochlear amplifier's ability to *enhance* the signal at the expense of background noise.

This is why patients with audiograms that look identical can have wildly different outcomes from hearing aids: the aid restores audibility but cannot restore the cochlea's other functions. A "well-fit" hearing aid that achieves the prescribed gain at every audiometric frequency, verified by real-ear measurement, is the *necessary* but not *sufficient* condition for good outcomes. Speech-in-noise benefit, in particular, depends heavily on factors WDRC cannot address, which pushes audiologists toward the additional technologies of the next lesson.

Next lesson: the two-microphone directional beamformer and the algorithms that improve SNR in noisy listening environments.