3.2 The speech banana and the audibility map

The audiogram shows what the patient can detect; the speech banana shows what speech is. Overlaying the two lets you answer a question that pure tones alone cannot: which phonemes is this listener missing? A patient whose threshold curve dips into the banana — even by 10 dB — loses portions of the speech signal entirely. Those portions are systematic: low-frequency loss costs vowels; high-frequency loss costs fricatives; mid-frequency loss costs voiced consonants and formant transitions.

This lesson develops the long-term average speech spectrum (LTASS), the banana shape it produces on the audiogram, and the phoneme map that lives inside it.

The long-term average speech spectrum

A long recording of conversational speech, analysed for its average spectral content over many seconds, produces a characteristic shape. The long-term average speech spectrum (LTASS) at conversational level (about 65 dB SPL overall) has:

The spectrum is not a single line; speech is dynamic, so any given moment of speech has energy spanning a range above and below the long-term average. The standard audiometric convention is to plot the LTASS as a band — typically the 1st-to-99th percentile envelope of the moment-to-moment spectrum — rather than a single curve. That band, replotted in audiometric coordinates (frequency on log-x, dB HL on inverted-y), is the speech banana.
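The band construction described above can be sketched in a few lines. This is a minimal illustration, not a standardised procedure: it assumes you already have short-term levels (in dB SPL) for one frequency band, and the function names `level_band` and `spl_to_hl` are hypothetical. The reference threshold passed to `spl_to_hl` must come from the audiometric standard for the transducer in use; no such values are supplied here.

```python
import math
import statistics


def level_band(short_term_levels_db):
    """Summarise short-term levels (dB SPL) measured in one frequency band.

    Returns (1st percentile, long-term average, 99th percentile) in dB SPL.
    """
    # Long-term average: average the power, then convert back to dB.
    powers = [10 ** (level / 10) for level in short_term_levels_db]
    long_term_average = 10 * math.log10(statistics.fmean(powers))

    # quantiles(n=100) returns 99 cut points; the first and last are the
    # 1st and 99th percentiles of the moment-to-moment levels.
    cuts = statistics.quantiles(short_term_levels_db, n=100)
    return cuts[0], long_term_average, cuts[-1]


def spl_to_hl(level_db_spl, reference_threshold_db_spl):
    """Convert dB SPL to dB HL using the reference threshold at that frequency."""
    return level_db_spl - reference_threshold_db_spl


# Synthetic (made-up) short-term levels for one band, spread over 40..70 dB SPL.
levels = [40 + (i % 31) for i in range(310)]
low, average, high = level_band(levels)
print(f"band: {low:.1f} to {high:.1f} dB SPL, long-term average {average:.1f} dB SPL")
```

Repeating this per frequency band and converting each edge with `spl_to_hl` gives the upper and lower outline of the banana in audiometric coordinates.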

What lives in the banana

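Different phonemes occupy different regions of the banana: vowels concentrate at low frequencies with high energy, voiced consonants sit in the mid frequencies, and fricatives such as /s/, /f/, /ʃ/, and /θ/ are high-frequency and low-energy.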

The high-frequency, low-intensity nature of fricatives is the central audiological problem of presbycusis and noise-induced hearing loss. Both losses preferentially affect the high frequencies (typically 2–8 kHz), exactly where the softest and most informative phonemes live. The classic patient complaint, “I can hear, I just can’t understand”, is the audiological signature of this mismatch.

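[Interactive figure: audiogram with frequency from 125 Hz to 8 kHz on the x-axis and hearing level from -10 to 100 dB HL on the inverted y-axis. The shaded speech banana contains phoneme markers (u, o, a, i, r, m, n, ʃ, s, f, θ); a counter reports how many are inaudible (e.g. 3 of 11). Legend: vowels, voiced consonants, fricatives, inaudible. A preset selector switches between sample threshold curves.]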

The shaded "speech banana" is the long-term average spectrum of conversational speech at about 65 dB SPL, replotted in audiometric coordinates. Phonemes are scattered inside it: vowels concentrate at low frequencies with high energy; voiced consonants are mid-frequency; fricatives like /s/, /f/, /ʃ/, /θ/ are high-frequency and low-energy. A listener's threshold curve is overlaid in blue. Any phoneme whose level is lower than the threshold at its frequency (on the inverted audiogram axis, it plots above the threshold curve) is grayed out as inaudible. High-frequency sloping losses cut fricatives first; the resulting "I can hear, I just can't understand" complaint is what brings most adults to the clinic.

The interactive overlays a sample listener’s threshold curve (blue X markers) on the speech banana. Phonemes whose level exceeds the listener’s threshold at that frequency remain audible (coloured by group); phonemes whose level falls short of it are grayed out as inaudible. The counter on the side tracks how many phonemes are inaudible. Switch presets to see how different loss patterns selectively erase different parts of the speech signal.
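The overlay logic can be sketched as follows. This is an illustration under stated assumptions, not the interactive's actual implementation: the phoneme positions in `PHONEMES` are hypothetical placeholders (not measured data), the threshold curve is interpolated linearly on a log-frequency axis between audiometric test frequencies, and a phoneme whose level exactly equals the threshold is treated as audible. The names `threshold_at`, `inaudible_phonemes`, and `SLOPING_LOSS` are made up for this example.

```python
import math

AUDIOGRAM_FREQS_HZ = [125, 250, 500, 1000, 2000, 4000, 8000]

# HYPOTHETICAL marker positions: name -> (group, frequency in Hz, level in dB HL).
# Placeholders chosen only to follow the low/mid/high pattern in the text.
PHONEMES = {
    "u": ("vowel", 300, 50),
    "o": ("vowel", 500, 50),
    "a": ("vowel", 900, 50),
    "i": ("vowel", 350, 45),
    "m": ("voiced consonant", 800, 35),
    "n": ("voiced consonant", 1000, 35),
    "r": ("voiced consonant", 1500, 40),
    "ʃ": ("fricative", 3000, 35),
    "s": ("fricative", 5000, 25),
    "f": ("fricative", 4500, 20),
    "θ": ("fricative", 6000, 20),
}


def threshold_at(freq_hz, thresholds_db_hl):
    """Threshold (dB HL) at freq_hz, interpolated on a log-frequency axis."""
    points = list(zip(AUDIOGRAM_FREQS_HZ, thresholds_db_hl))
    if freq_hz <= points[0][0]:
        return points[0][1]
    if freq_hz >= points[-1][0]:
        return points[-1][1]
    for (f0, t0), (f1, t1) in zip(points, points[1:]):
        if f0 <= freq_hz <= f1:
            weight = math.log(freq_hz / f0) / math.log(f1 / f0)
            return t0 + weight * (t1 - t0)


def inaudible_phonemes(thresholds_db_hl, phonemes=PHONEMES):
    """Names of phonemes whose level is lower than the listener's threshold."""
    return {
        name
        for name, (_group, freq_hz, level_db_hl) in phonemes.items()
        if level_db_hl < threshold_at(freq_hz, thresholds_db_hl)
    }


# Example: a made-up high-frequency sloping loss, one threshold per test frequency.
SLOPING_LOSS = [10, 10, 15, 25, 45, 60, 70]
missing = inaudible_phonemes(SLOPING_LOSS)
print(f"phonemes inaudible: {len(missing)} of {len(PHONEMES)}")
```

With these placeholder positions the sloping curve removes only the four fricatives, which is the pattern the text describes; a different set of marker positions would give a different count.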

The “count the dots” heuristic

A clinical shorthand: for any given threshold curve, count how many phoneme markers the listener can still hear. The word recognition score (WRS) in quiet correlates roughly with the count: losing 1–2 phonemes drops WRS by about 10%; losing 4–5 drops it by 30–40%. The map is imprecise because it does not capture contextual and linguistic redundancy, but it makes the configuration of the loss vivid in a way the audiogram alone does not.

This is also the most useful patient-counselling tool. Showing a patient exactly which phonemes their high-frequency loss removes makes the abstract audiogram concrete in a way that “you have a sloping sensorineural loss” never will.

When the banana fails: audibility ≠ intelligibility

The banana picture has limits. It treats speech as a spectrum — but speech is also a time-varying signal with rapid formant transitions, voice-onset-time cues, and prosodic patterns. A listener whose audibility is perfect according to the banana may still have:

This is why the speech banana is one of several tools, not a substitute for the SRT/WRS measurements of 3.1 or the speech-in-noise tests of 3.3. Audibility is necessary for intelligibility but is not sufficient — and intelligibility itself depends on signal, brain, and context together.

What’s next

The next lesson, 3.3 — Speech in noise, extends speech audiometry to the listening conditions patients actually live in. Real-world environments contain noise, reverberation, and competing talkers. The audiogram and the quiet WRS poorly predict performance in those conditions; modern speech-in-noise tests (HINT, QuickSIN) and the articulation index / speech intelligibility index address the gap.