7.3 Directional microphones and noise reduction

WDRC restores audibility but does nothing for the signal-to-noise ratio the cochlea receives. A hearing-impaired listener in a noisy restaurant, listening to a conversation across the table, has the same SNR through their well-fit hearing aid as without it — and that SNR is the dominant limiter of speech intelligibility for sensorineural-loss patients. The two algorithmic interventions that do improve SNR are directional microphones (a beamforming technique that exploits spatial separation of signal and noise) and single-channel noise reduction (a statistical technique that exploits temporal differences between speech and noise within a channel). This lesson covers both.

The two-microphone first-order beamformer

A modern hearing aid contains two omnidirectional microphones, separated by roughly 10–15 mm and aligned along the front-back axis of the body. The microphones individually have no spatial selectivity. Spatial selectivity arises from combining the two outputs with an electronic delay and subtraction, producing a first-order differential pattern whose general form is

$p(\theta) = A + B \cos \theta,$

where $\theta$ is the angle from the front of the head and $(A, B)$ are coefficients set by the combination weights, normalised here so that $A + B = 1$ (unit gain toward the front). The pattern shape is determined by the ratio $A/B$.

Figure (pattern selector; noise azimuth = 180°): A hearing aid's two omnidirectional microphones, separated by 10–15 mm and combined with an electronic delay-and-subtract, form a first-order directional pattern $p(\theta) = A + B \cos \theta$. The cardioid ($A = B = 0.5$) nulls directly behind. The hypercardioid ($A = 0.25$, $B = 0.75$) and supercardioid ($A = 0.37$, $B = 0.63$) move the null off the rear axis in exchange for higher directivity. The directivity index (DI) is the SNR improvement over omni for diffuse (everywhere-equal) noise: about 4.8 dB for the cardioid and 6.0 dB for the hypercardioid. Modern adaptive directional hearing aids continuously steer the null toward the dominant noise source, so that when the noise moves, the null follows it. This gives 3–6 dB of SNR improvement in restaurant-noise conditions.

The standard families:

| Pattern | $A$ | $B$ | $A/B$ | Null at | DI (dB) |
|---|---|---|---|---|---|
| Omni | 1.0 | 0.0 | ∞ | none | 0.0 |
| Cardioid | 0.5 | 0.5 | 1.0 | 180° | 4.8 |
| Supercardioid | 0.37 | 0.63 | 0.59 | 125° | 5.7 |
| Hypercardioid | 0.25 | 0.75 | 0.33 | 109° | 6.0 |
| Bidirectional (figure-8) | 0.0 | 1.0 | 0.0 | 90° | 4.8 |

The directivity index (DI) is the SNR improvement for a diffuse (everywhere-equal) noise field over an omnidirectional reference, given a front-arriving signal. The hypercardioid achieves the highest theoretical DI of any first-order pattern (6.0 dB), but at the cost of having its nulls off the rear axis at ±109°. The cardioid places its null directly behind the wearer, which is intuitive and simple to implement, at a DI cost of about 1.2 dB relative to the hypercardioid.
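The null angles and DI values in the table can be checked from the coefficients alone. The sketch below assumes a spherically diffuse noise field, for which the average of $p^2$ over all directions works out to $A^2 + B^2/3$; the function names are illustrative and not taken from any library.

```python
import math

def null_angle_deg(A, B):
    """Angle (degrees from the front) where A + B*cos(theta) = 0, or None if no null exists."""
    if B == 0 or abs(A / B) > 1:
        return None
    return math.degrees(math.acos(-A / B))

def directivity_index_db(A, B):
    """DI of p(theta) = A + B*cos(theta) for a front target in spherically diffuse noise."""
    on_axis_power = (A + B) ** 2
    # (1/2) * integral over 0..pi of p(theta)^2 * sin(theta) d(theta)
    diffuse_power = A ** 2 + B ** 2 / 3.0
    return 10.0 * math.log10(on_axis_power / diffuse_power)

patterns = {
    "omni": (1.0, 0.0),
    "cardioid": (0.5, 0.5),
    "supercardioid": (0.37, 0.63),
    "hypercardioid": (0.25, 0.75),
    "bidirectional": (0.0, 1.0),
}

for name, (A, B) in patterns.items():
    print(name, null_angle_deg(A, B), round(directivity_index_db(A, B), 1))
```

With the rounded supercardioid coefficients the null computes to about 126°; the exact supercardioid ($A \approx 0.366$) puts it nearer 125°, the value quoted in the table.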

How the beamformer is implemented

The first-order pattern is built from two omni mic signals $m_1(t)$ (front) and $m_2(t)$ (rear) by:

Delaying $m_2$ by a time $\tau$, where $\tau$ corresponds to the inter-microphone acoustic delay for the target null direction. For a null at the rear of a 12 mm front-back-separated array, $\tau = 12 \text{ mm} / 340 \text{ m/s} \approx 35\ \mu\text{s}$, or about 1.7 samples at 48 kHz.
Subtracting: $y(t) = m_1(t) - m_2(t - \tau)$.

A signal arriving from the null direction reaches mic 1 at $t = 0$ and mic 2 at $t = -\tau$ (mic 2 is closer to the source), so $m_2(t - \tau) = m_1(t)$ and the difference is zero. For the cardioid setting, a signal from the front reaches mic 1 at $t = 0$ and mic 2 at $t = +\tau$, so $m_2(t - \tau) = m_1(t - 2\tau)$ and the output is $m_1(t) - m_1(t - 2\tau) \approx 2\tau \cdot \dot m_1(t)$: a differentiated, and therefore high-pass-filtered, version of the source, rising at roughly 6 dB per octave. This intrinsic high-pass response is corrected by an equaliser downstream so that the directional output has a flat frequency response.
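A minimal numerical check of the delay-and-subtract behaviour, assuming an idealised plane-wave tone, perfectly matched microphones, and no head or shell effects. The function name `beamformer_gain` and the constants are illustrative. It evaluates the magnitude of $y(t) = m_1(t) - m_2(t - \tau)$ for a tone from angle $\theta$, using the fact that mic 2 hears the wave $T \cos\theta$ later than mic 1, where $T$ is the acoustic travel time between the microphones.

```python
import numpy as np

C = 340.0   # speed of sound in m/s, as used in the text
D = 0.012   # microphone spacing in m (12 mm)
T = D / C   # acoustic travel time between the microphones (~35 microseconds)

def beamformer_gain(theta_deg, freq_hz, tau=T):
    """Magnitude response of y(t) = m1(t) - m2(t - tau) to a plane-wave tone.

    theta_deg is measured from the front. Mic 2 lags mic 1 by T*cos(theta),
    so y(t) = m1(t) - m1(t - tau - T*cos(theta)).
    """
    total_delay = tau + T * np.cos(np.radians(theta_deg))
    return np.abs(1.0 - np.exp(-2j * np.pi * freq_hz * total_delay))

for f in (500.0, 1000.0, 2000.0):
    front_db = 20.0 * np.log10(beamformer_gain(0.0, f))
    rear = beamformer_gain(180.0, f)
    print(f"{f:6.0f} Hz: front {front_db:6.1f} dB, rear (linear) {rear:.2e}")
```

The rear gain vanishes at every frequency, while the front gain roughly doubles per octave at low frequencies, which is the high-pass tilt the downstream equaliser has to undo.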

The null direction (and therefore the pattern shape) is controlled simply by changing $\tau$. Setting $\tau = 0$ gives a bidirectional pattern (signals from front and back both pass, sides cancel); setting $\tau$ equal to the acoustic travel time between the microphones (the delay for a rear source) gives the cardioid; intermediate values give the supercardioid and hypercardioid.
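The link between $\tau$ and the coefficients $(A, B)$ can be made explicit with a short derivation. This is a sketch under the low-frequency assumption $\omega(\tau + T) \ll 1$, writing $T = d/c$ for the acoustic travel time between the microphones (a symbol introduced here for convenience). A plane wave from angle $\theta$ reaches mic 2 a time $T \cos\theta$ after mic 1, so

$y(t) = m_1(t) - m_1(t - \tau - T\cos\theta) \approx (\tau + T\cos\theta)\, \dot m_1(t).$

Normalising to unit gain at the front ($\theta = 0$) gives

$p(\theta) = \frac{\tau}{\tau + T} + \frac{T}{\tau + T} \cos\theta, \qquad \frac{A}{B} = \frac{\tau}{T}.$

For the 12 mm array ($T \approx 35\ \mu\text{s}$), this puts the cardioid at $\tau = T$, the supercardioid at $\tau \approx 0.59\,T \approx 21\ \mu\text{s}$, the hypercardioid at $\tau = T/3 \approx 12\ \mu\text{s}$, and the bidirectional pattern at $\tau = 0$.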

Adaptive directionality

Static directional patterns are useful but coarse: a fixed cardioid nulls the rear, but in a real noisy environment the dominant noise may be coming from the side, or from multiple sources at once. Adaptive directional processing continuously estimates the dominant noise direction and steers the null toward it.

The algorithm: in short windows (10–100 ms), compute the cross-correlation between $m_1$ and $m_2$ as a function of the lag $\tau$. The lag at which the cross-correlation peaks corresponds to the direction of arrival of the dominant signal. The fastest method classifies this direction as either “target” (within ±30° of the front, presumed to be a person the user is looking at) or “noise” (everything else), and steers the null toward the strongest non-target direction.
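The estimation step can be sketched as follows. This is an illustration under simplifying assumptions (a single noiseless broadband source, matched microphones, free field), not a description of how any particular device does it. Because the maximum inter-microphone delay is under two samples at 48 kHz, the sketch evaluates the cross-correlation on a fine grid of fractional lags in the frequency domain. All function names, the grid size, and the test signal are hypothetical.

```python
import numpy as np

FS = 48_000         # sample rate in Hz
T = 0.012 / 340.0   # maximum inter-microphone delay in s (12 mm spacing)

def delay_signal(x, delay_s, fs=FS):
    """Circularly delay x by a possibly fractional delay using an FFT phase shift."""
    f = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spectrum = np.fft.rfft(x) * np.exp(-2j * np.pi * f * delay_s)
    return np.fft.irfft(spectrum, n=len(x))

def estimate_arrival_angle(m1, m2, fs=FS, n_grid=81):
    """Angle from the front (degrees) whose lag maximises the m1/m2 cross-correlation."""
    f = np.fft.rfftfreq(len(m1), d=1.0 / fs)
    cross = np.conj(np.fft.rfft(m1)) * np.fft.rfft(m2)
    lags = np.linspace(-T, T, n_grid)  # physically possible lags only
    corr = [np.real(np.sum(cross * np.exp(2j * np.pi * f * lag))) for lag in lags]
    best_lag = lags[int(np.argmax(corr))]
    # mic 2 lags mic 1 by T*cos(theta), so invert that relation
    return np.degrees(np.arccos(np.clip(best_lag / T, -1.0, 1.0)))

def classify(angle_deg, target_half_width_deg=30.0):
    """Label a direction as 'target' (near the front) or 'noise' (everything else)."""
    return "target" if angle_deg <= target_half_width_deg else "noise"

rng = np.random.default_rng(0)
m1 = rng.standard_normal(2400)                         # one 50 ms window
m2 = delay_signal(m1, T * np.cos(np.radians(120.0)))   # source at 120 degrees
angle = estimate_arrival_angle(m1, m2)
print(round(angle), classify(angle))
```

A two-microphone array on the front-back axis cannot tell left from right, so the estimate is an angle between 0° and 180°; an adaptive system would then choose the internal delay that places the null at the estimated noise angle.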

Performance: in fixed noise environments (a TV in the corner), adaptive directionality yields 4–6 dB of SNR improvement over omni at the listener’s ear. In rapidly changing noise (a busy restaurant with multiple talkers), the adaptive system updates fast enough to track the dominant interferer in real time, providing roughly 3–4 dB of SNR improvement on average.

Limitations:

Single null only. A first-order pattern has at most one null direction. Multi-talker babble in a restaurant arrives from all directions at once; nulling one talker doesn’t help against the diffuse field.
Speech-source assumption. The system assumes the target is at the front. A patient looking directly at the talker benefits; a patient looking away (e.g., to read a menu) loses the benefit. Some modern systems offer “follow-the-gaze” via head-tracking sensors.
Low-frequency limitation. The 12 mm microphone separation is small compared with low-frequency wavelengths (about 68 cm at 500 Hz), so the difference between the two microphone signals is small there; below ~500 Hz the array becomes essentially omnidirectional. This is usually acceptable because low-frequency noise tends to be diffuse rather than dominated by a single direction, but it limits the SNR improvement against low-frequency interferers.

Multi-microphone arrays and binaural beamforming

Two contemporary developments push the array beyond the first-order limit:

Bilateral binaural beamforming treats the four microphones across the two hearing aids (two per ear, with one side's signals streamed to the other over the inter-aural radio link) as a single 4-element array. The effective array spans the entire head width (~15 cm), letting the beamformer exploit the head shadow and low-frequency time-of-arrival cues that single-aid arrays cannot. In noisy restaurants, the practical SNR improvement is a further 1–3 dB on top of single-aid adaptive directionality.
DNN-based beamforming and source separation, increasingly common in 2024–2026 devices, replaces the linear-combination beamformer with a learned non-linear separation network. The DNN takes the multi-microphone input and outputs a target-speech-only signal, having learned to suppress noise sources based on training data rather than direction-of-arrival. Published evaluations of recent commercial DNN-based products (Phonak Sphere with its on-device deep learning, Starkey Edge AI, Widex Sheer with its “Sound Class 360”) show 4–8 dB SNR improvements in restaurant scenarios — substantially better than first-order linear beamformers, though the cost is increased computational load and battery drain.

Single-channel noise reduction

Distinct from directional microphones is single-channel statistical noise reduction, which operates per-band on the temporal characteristics of the signal rather than its spatial origin. The principle: speech has rapid amplitude modulations (~5 Hz syllable rate, 50 Hz envelope fluctuations); background noise (HVAC hum, traffic, distant babble) is much more steady-state. In each filterbank channel:

Estimate the long-term envelope statistics (mean, variance, modulation depth).
Classify the channel as “speech-dominant” (high modulation depth) or “noise-dominant” (low modulation depth, steady amplitude).
Attenuate noise-dominant channels by 6–18 dB; leave speech-dominant channels at full gain.
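The three steps above can be sketched as a per-channel gain rule. This is a simplified illustration, not a production algorithm: it uses the coefficient of variation of the band envelope (standard deviation divided by mean) as a stand-in for modulation depth, and the 0.3 threshold and 12 dB attenuation are assumed values chosen for the example (the attenuation lies inside the 6–18 dB range given above).

```python
import numpy as np

def channel_gains_db(envelopes, depth_threshold=0.3, attenuation_db=12.0):
    """Return one gain (dB) per filterbank channel from its envelope statistics.

    envelopes: array of shape (n_channels, n_frames) with non-negative band envelopes.
    Channels with low modulation depth are treated as noise-dominant and attenuated.
    """
    env = np.asarray(envelopes, dtype=float)
    mean = env.mean(axis=1)
    # Coefficient of variation as a simple proxy for modulation depth
    depth = env.std(axis=1) / np.maximum(mean, 1e-12)
    return np.where(depth >= depth_threshold, 0.0, -attenuation_db)

frame_rate = 100.0                       # envelope frames per second (assumed)
t = np.arange(200) / frame_rate          # 2 s of envelope frames
speech_like = 1.0 + 0.9 * np.sin(2 * np.pi * 5.0 * t)    # deep ~5 Hz modulation
steady_noise = 1.0 + 0.05 * np.sin(2 * np.pi * 5.0 * t)  # nearly constant level

gains = channel_gains_db(np.vstack([speech_like, steady_noise]))
print(gains)
```

A real implementation would update these statistics continuously and smooth the gain changes over time to avoid audible pumping; the hard threshold here is only for clarity.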

Modulation-based noise reduction does not actually improve speech intelligibility in most controlled tests — the attenuation is applied to bands where speech information is sparse anyway, so the audible speech is unchanged. What it does improve is listening comfort. Patients report less fatigue and more pleasant listening in steady-noise environments. Comfort effects are clinically real and significantly affect long-term hearing-aid use rates.

Newer single-channel noise reduction is increasingly DNN-based, using spectrogram-domain masks predicted by a small recurrent or convolutional network. DNN noise reduction can apply more aggressive attenuation than statistical methods without introducing audible artefacts, and recent evaluations suggest small but real intelligibility benefits (1–2 dB SNR equivalent) on top of the comfort benefit.

Why hearing aids still struggle in restaurants

Even with adaptive directional microphones, bilateral beamforming, and aggressive noise reduction, hearing-aid SNR improvements in restaurant-noise scenarios cap at about 6–10 dB over omni. A typical patient with moderate cochlear loss has 5–10 dB of SNR loss relative to normal hearing (Ch 3); the hearing aid’s 6–10 dB of SNR improvement brings them roughly to normal-hearing SNR — not to better than normal. Restaurants and group conversations remain hard.

This is the gap that motivates the technologies covered next: real-ear verification (Ch 8), accessory remote microphones, and cochlear implants when the cochlear damage exceeds what amplification can compensate.

Next lesson: feedback cancellation (without which RIC open-fit aids would not exist) and frequency lowering (the strategy for dealing with audiometric “dead regions” where amplification fails).