7.4 The DFT, sampling, and aliasing

Real-world signals live in computers. An audio buffer is a finite array of numbers; a sampled microphone signal is a sequence of measurements at evenly-spaced times. The continuous Fourier transform of 7.2 is the right mathematical object for thinking about signals abstractly, but the computational object is the discrete Fourier transform (DFT), which acts on a finite sequence of samples.

This lesson develops the bridge from continuous to discrete. Three connected ideas: sampling theory (when do samples suffice to represent a signal?), aliasing (what goes wrong when they don’t?), and the DFT itself (how to compute the spectrum of a sampled signal).

Sampling a continuous signal

To represent a continuous signal $f(t)$ in a computer, we sample it at evenly spaced times $t_n = n \Delta t$:

f_n \;\equiv\; f(n \Delta t), \qquad n = 0, 1, 2, \ldots, N - 1.

The sample rate is $f_s = 1/\Delta t$ (in samples per second, often denoted Hz). For CD-quality audio, $f_s = 44100$ Hz; for telephony, 8000 Hz; for high-end pro audio, 96000 Hz or 192000 Hz.
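The sampling step can be sketched in a few lines. This is a minimal illustration assuming NumPy; the helper name `sample_signal` and the 440 Hz test tone are hypothetical choices, not part of the lesson.

```python
import numpy as np

def sample_signal(f, fs, N):
    """Return sample times t_n = n * dt and samples f_n = f(t_n), n = 0..N-1."""
    dt = 1.0 / fs              # sample spacing, Delta t
    t = np.arange(N) * dt      # t_n = n * Delta t
    return t, f(t)

def tone(t):
    """Hypothetical test signal: a 440 Hz sine tone."""
    return np.sin(2 * np.pi * 440.0 * t)

# 1024 samples at the CD-quality rate.
t, samples = sample_signal(tone, fs=44100.0, N=1024)
```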

The fundamental question: under what conditions can we recover the continuous signal $f(t)$ from the discrete samples $\{f_n\}$? The answer is Shannon’s sampling theorem.

The Shannon sampling theorem

Theorem. If a signal $f(t)$ contains no frequencies above $\omega_{\max}$ (i.e. $\tilde f(\omega) = 0$ for $|\omega| > \omega_{\max}$), then $f$ can be uniquely reconstructed from samples taken at any rate

f_s \;>\; \frac{\omega_{\max}}{\pi} \;=\; 2\, f_{\max},

where $f_{\max} = \omega_{\max} / (2\pi)$ is the maximum frequency in Hz.

The critical sampling rate $2 f_{\max}$ is the Nyquist rate, and half the sample rate, $f_s / 2$, is the Nyquist frequency. The inequality is strict because sampling at exactly $2 f_{\max}$ can fail: a sinusoid at $f_{\max}$ sampled at its zero crossings yields all-zero samples. Sample any signal containing only frequencies below the Nyquist frequency, and the samples carry all the information needed to reconstruct the continuous signal.

Reconstruction is by the Whittaker–Shannon interpolation formula:

f(t) \;=\; \sum_{n} f_n\, \mathrm{sinc}\!\left( \frac{t - n \Delta t}{\Delta t} \right),

where $\mathrm{sinc}(x) = \sin(\pi x) / (\pi x)$ is the normalised sinc function. The reconstruction is exact when the signal is band-limited and the sampling theorem is satisfied. In practice the sum is truncated or the sinc kernel is windowed, since the ideal sinc has infinite support.
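A truncated version of the interpolation formula can be sketched as follows, assuming NumPy (whose `np.sinc` is the normalised sinc). The helper name `sinc_interpolate` and the test values (a 3 Hz cosine sampled at 10 Hz) are illustrative assumptions. Because only a finite block of samples enters the sum, the result is a close approximation rather than the exact reconstruction.

```python
import numpy as np

def sinc_interpolate(samples, dt, t):
    """Evaluate the Whittaker-Shannon sum at times t from a finite block of samples."""
    n = np.arange(len(samples))
    # One row per evaluation time, one column per sample index.
    kernel = np.sinc((t[:, None] - n[None, :] * dt) / dt)
    return kernel @ samples

fs = 10.0                     # sample rate in Hz (Nyquist frequency 5 Hz)
dt = 1.0 / fs
n = np.arange(200)
f0 = 3.0                      # test tone below the Nyquist frequency
samples = np.cos(2 * np.pi * f0 * n * dt)

# Evaluate between the sample points, in the middle of the block where
# truncating the infinite sum matters least.
t = np.linspace(9.0, 11.0, 101)
reconstructed = sinc_interpolate(samples, dt, t)
error = np.max(np.abs(reconstructed - np.cos(2 * np.pi * f0 * t)))
```

The residual `error` is small but nonzero; it comes from the missing terms of the infinite sum and grows towards the ends of the block.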

Aliasing

What happens when the sampling rate is below the Nyquist rate, i.e. when the signal contains frequencies above $f_s / 2$? The high-frequency content aliases down to a lower apparent frequency that the samples can represent.

Concretely: a sinusoid at frequency $f_0 > f_s / 2$ produces samples that are indistinguishable from samples of a sinusoid at the aliased frequency

f_\text{alias} \;=\; |f_0 - n f_s|

for whichever integer $n$ makes the result fall in $[0, f_s/2]$. The two signals (high-frequency original, low-frequency alias) produce identical sample sequences. Without additional information, the system has no way to distinguish them, and any reconstruction from the samples will produce the alias.
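The folding rule is easy to check numerically. The sketch below uses only the standard library; the function name `alias_frequency` and the test values (a 7 Hz cosine sampled at 10 Hz) are assumptions made for illustration. It picks the nearest multiple of $f_s$ and then compares the two sample sequences.

```python
import math

def alias_frequency(f0, fs):
    """Fold f0 into [0, fs/2] using |f0 - n*fs| with n the nearest integer to f0/fs."""
    n = round(f0 / fs)
    return abs(f0 - n * fs)

fs = 10.0                        # sample rate in Hz (Nyquist frequency 5 Hz)
f0 = 7.0                         # tone above the Nyquist frequency
fa = alias_frequency(f0, fs)     # 3.0 Hz

# Samples of the 7 Hz cosine coincide with samples of the 3 Hz cosine.
original = [math.cos(2 * math.pi * f0 * k / fs) for k in range(50)]
aliased = [math.cos(2 * math.pi * fa * k / fs) for k in range(50)]
max_diff = max(abs(a - b) for a, b in zip(original, aliased))
```

Here `max_diff` is at the level of floating-point roundoff. For a sine instead of a cosine the alias appears with its sign flipped, which is still a sinusoid at the aliased frequency.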

[Interactive figure: a sinusoid at signal frequency $f_0$ sampled at rate $f_s$ (initial settings $f_0 = 3$ Hz, $f_s = 10$ Hz, Nyquist frequency 5 Hz). While $f_0$ is below $f_s / 2$, the original (light) and the reconstruction from the samples (heavy) coincide, as the sampling theorem guarantees. Once $f_0$ passes $f_s / 2$ the two curves diverge, while the dots (samples) sit on both equally. The bottom strip shows the frequency-domain picture: the spectral content at $f_0$ “folds” around the Nyquist line and reappears at $f_\text{alias}$.]

Aliasing cannot be undone by any algorithm working from the samples alone: once content above the Nyquist frequency has folded down, it is indistinguishable from genuine low-frequency content. The standard defence is an anti-aliasing filter: an analog low-pass filter applied before sampling that removes content above the Nyquist frequency. Practical analog-to-digital conversion chains include one.

Worked example: Nyquist frequency and aliasing for a speech recording

A microphone digitises at $f_s = 16{,}000\,\text{Hz}$ (wideband speech quality, twice the 8000 Hz telephony rate). The Nyquist frequency is $f_N = f_s/2 = 8{,}000\,\text{Hz}$. Frequencies below 8 kHz are faithfully captured.

Suppose a metallic clink at $f_0 = 11{,}000\,\text{Hz}$ is present. Without an anti-aliasing filter, it aliases to:

f_\text{alias} = |f_0 - f_s| = |11{,}000 - 16{,}000| = 5{,}000\,\text{Hz}.

The 11 kHz tone appears in the recording as a spurious 5 kHz whine, well inside the speech band and indistinguishable from a genuine 5 kHz signal. This is why an ADC front end low-pass filters the input, with a cutoff at or below the Nyquist frequency, before sampling.
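The worked example can be reproduced numerically. The sketch below assumes NumPy and a hypothetical 0.1 s buffer of 1600 samples: it samples an 11 kHz sine at 16 kHz with no anti-aliasing filter and reads off where the spectral peak lands.

```python
import numpy as np

fs = 16000.0                   # sample rate in Hz
f0 = 11000.0                   # tone above the 8 kHz Nyquist frequency
N = 1600                       # 0.1 s of samples; bin spacing fs / N = 10 Hz

n = np.arange(N)
x = np.sin(2 * np.pi * f0 * n / fs)       # sampled directly, no filter

spectrum = np.abs(np.fft.rfft(x))         # magnitudes for 0 .. fs/2
freqs = np.fft.rfftfreq(N, d=1.0 / fs)    # bin frequencies in Hz
peak_hz = freqs[np.argmax(spectrum)]      # lands at the alias, about 5000 Hz
```

Nothing in the sampled data marks the peak as an alias; it looks exactly like a genuine 5 kHz tone.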

The discrete Fourier transform (DFT)

The DFT acts on a finite sequence of $N$ samples $\{x_n\}_{n=0}^{N-1}$ and produces $N$ complex numbers $\{X_k\}_{k=0}^{N-1}$:

\boxed{\;X_k \;=\; \sum_{n=0}^{N-1} x_n\, e^{-i\, 2\pi k n / N}, \qquad k = 0, 1, \ldots, N-1.\;}

The inverse:

x_n \;=\; \frac{1}{N} \sum_{k=0}^{N-1} X_k\, e^{i\, 2\pi k n / N}.
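Both formulas translate directly into code. The following is a minimal sketch assuming NumPy; the names `dft` and `idft` and the test vector are illustrative. It evaluates the sums as matrix products, which costs $\mathcal{O}(N^2)$ operations, and is meant for checking the definition rather than for production use.

```python
import numpy as np

def dft(x):
    """Direct evaluation of X_k = sum_n x_n exp(-2 pi i k n / N)."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    n = np.arange(N)
    k = n.reshape(-1, 1)
    W = np.exp(-2j * np.pi * k * n / N)   # N x N matrix of complex exponentials
    return W @ x

def idft(X):
    """Direct evaluation of x_n = (1/N) sum_k X_k exp(+2 pi i k n / N)."""
    X = np.asarray(X, dtype=complex)
    N = len(X)
    k = np.arange(N)
    n = k.reshape(-1, 1)
    W = np.exp(2j * np.pi * n * k / N)
    return (W @ X) / N

x = np.array([1.0, 2.0, 0.0, -1.0, 0.5, 3.0, -2.0, 1.5])
X = dft(x)
x_back = idft(X)   # recovers x up to floating-point roundoff
```

NumPy's `np.fft.fft` uses the same sign and normalisation convention, so `dft(x)` and `np.fft.fft(x)` agree to roundoff.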

The DFT is the Fourier transform restricted to a finite, sampled, periodic context. Notice the parallels:

The continuous transform integrates over all time; the DFT sums over $N$ samples.
The continuous transform produces a continuous function of $\omega$; the DFT produces $N$ values at discrete frequencies $\omega_k = 2\pi k / (N \Delta t)$.
The continuous transform applies to signals that go to zero at infinity; the DFT implicitly assumes the signal is periodic with period $N$.

That last point is important. The DFT treats the $N$-sample buffer as one period of a periodic signal. If you apply the DFT to a signal that isn’t really periodic in the buffer (e.g. a sound clip with non-matching endpoints), the implicit periodisation produces a jump at the seam, which smears each spectral component across many frequency bins. This is spectral leakage.
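A small experiment makes the seam effect visible. The sketch assumes NumPy; the buffer length 64, the cycle counts 8 and 8.5, and the helper `count_significant_bins` are hypothetical choices for illustration.

```python
import numpy as np

N = 64
n = np.arange(N)

# 8 cycles fit the buffer exactly; 8.5 cycles leave a mismatch at the seam.
periodic = np.cos(2 * np.pi * 8.0 * n / N)
mismatched = np.cos(2 * np.pi * 8.5 * n / N)

def count_significant_bins(x, threshold=1e-6):
    """Count DFT bins whose magnitude exceeds threshold times the largest bin."""
    mag = np.abs(np.fft.fft(x))
    return int(np.sum(mag > threshold * mag.max()))

bins_periodic = count_significant_bins(periodic)       # only k = 8 and k = N - 8
bins_mismatched = count_significant_bins(mismatched)   # every bin carries energy
```

The tone that fits the buffer occupies two bins (one positive-frequency, one mirrored), while the half-cycle mismatch spreads energy over the whole spectrum.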

Windowing and spectral leakage

The standard fix for spectral leakage is to multiply the time-domain signal by a window function $w(n)$ that smoothly tapers towards zero at the buffer endpoints before applying the DFT. Common windows:

Rectangular: the unwindowed case. Maximum spectral leakage.
Hann $w(n) = \tfrac{1}{2}[1 - \cos(2\pi n / N)]$: smooth cosine taper. A common default for spectral analysis.
Hamming $w(n) = 0.54 - 0.46 \cos(2\pi n / N)$: minor variant of Hann that lowers the nearest side lobes at the cost of slower side-lobe decay far from the main lobe.
Blackman $w(n) = 0.42 - 0.5 \cos(2\pi n / N) + 0.08 \cos(4\pi n / N)$: more aggressive taper for very-low-leakage spectra.
Tukey, Kaiser, Gaussian, Bartlett: various trade-offs of main-lobe width against side-lobe level.
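The effect of a window can be measured on a deliberately awkward signal. The sketch below assumes NumPy and builds the Hann window directly from the formula in the list above; the test tone (8.5 cycles in a 64-sample buffer), the helper `far_leakage_db`, and the choice of bins 20 to 32 as "far from the peak" are assumptions made for illustration.

```python
import numpy as np

N = 64
n = np.arange(N)
x = np.cos(2 * np.pi * 8.5 * n / N)    # 8.5 cycles: not periodic in the buffer

hann = 0.5 * (1.0 - np.cos(2 * np.pi * n / N))

def far_leakage_db(signal, lo=20, hi=32):
    """Largest magnitude in bins lo..hi, in dB relative to the spectral peak."""
    mag = np.abs(np.fft.fft(signal))
    return 20 * np.log10(mag[lo:hi + 1].max() / mag.max())

rect_db = far_leakage_db(x)            # rectangular (no window)
hann_db = far_leakage_db(x * hann)     # Hann-windowed
```

With these settings `hann_db` comes out far below `rect_db`: the taper removes the jump at the seam, at the price of a wider main lobe around the peak.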

Choosing a window, and especially its length, is the central design decision of spectrogram analysis. A short window gives sharp time resolution but blurry frequency resolution; a long window gives sharp frequency resolution but blurry time resolution. This is the uncertainty principle from 7.2, expressed in the design choice of $w$.
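The trade-off shows up directly in the shape of a short-time DFT. This is a minimal sketch assuming NumPy; the function `stft_magnitude`, the 50% overlap, and the 1 kHz test tone at an 8 kHz sample rate are hypothetical choices, not a reference spectrogram implementation.

```python
import numpy as np

def stft_magnitude(x, frame_len, hop):
    """Magnitudes of Hann-windowed DFTs of overlapping frames (one row per frame)."""
    n = np.arange(frame_len)
    window = 0.5 * (1.0 - np.cos(2 * np.pi * n / frame_len))
    starts = range(0, len(x) - frame_len + 1, hop)
    frames = np.array([x[s:s + frame_len] * window for s in starts])
    return np.abs(np.fft.rfft(frames, axis=1))

fs = 8000.0
t = np.arange(8000) / fs                 # one second of signal
x = np.sin(2 * np.pi * 1000.0 * t)       # hypothetical 1 kHz test tone

# Short frames: many time steps, coarse frequency bins (fs / 64 = 125 Hz).
coarse = stft_magnitude(x, frame_len=64, hop=32)
# Long frames: few time steps, fine frequency bins (fs / 1024, about 7.8 Hz).
fine = stft_magnitude(x, frame_len=1024, hop=512)
```

The short-frame result has many rows and few columns, the long-frame result the reverse; no choice of frame length gives both fine time steps and fine frequency bins.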

The FFT

A naive DFT computation costs $\mathcal{O}(N^2)$ operations: for $N = 10^6$, about $10^{12}$ complex multiplications. The Fast Fourier Transform (FFT) algorithm computes the same DFT in $\mathcal{O}(N \log N)$ operations, a speedup by a factor of order $N / \log N$. For $N = 10^6$ that’s about $2 \times 10^7$ operations, roughly $5 \times 10^4$ times fewer.
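The operation counts can be checked with a few lines of arithmetic. The sketch uses only the standard library and makes two simplifying assumptions: constant factors are ignored, and the logarithm is taken base 2, as is natural for a radix-2 algorithm.

```python
import math

N = 10**6
naive_ops = N**2                   # about 1e12
fft_ops = N * math.log2(N)         # about 2e7
speedup = naive_ops / fft_ops      # N / log2(N), about 5e4
```

Under these assumptions the ratio is $N / \log_2 N$, which for $N = 10^6$ is a little under five orders of magnitude.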

The FFT is the algorithm behind spectrograms, audio plug-ins, and MRI reconstruction, and fast algorithms of the same family compute the discrete cosine transform used in JPEG compression. We develop the Cooley–Tukey radix-2 FFT in detail in Numerical Methods 10.5, where the history of its much earlier discovery by Gauss, around 1805, is also told. Here the relevant fact is that the FFT computes the DFT exactly (modulo floating-point roundoff), just faster.

What we use this for

Discrete-time Fourier analysis powers nearly all the signal processing on the bookshelf:

Spectrograms: short-time DFTs with overlapping windows. Sound 8.2.
Sound spectrum analysis: pitch, timbre, formants, every spectral plot in Sound 8 and Hearing 4. The display is always the magnitude of a windowed DFT.
Cochleagrams: the auditory analogue of a spectrogram, using a gammatone filterbank instead of a sliding DFT but with the same time-frequency-uncertainty trade-off. Hearing 4.7.
Audio compression (MP3, AAC, Opus): uses the modified discrete cosine transform (MDCT), a close cousin of the DFT.
Digital filter design: design the filter $\tilde h(\omega)$ in the frequency domain, inverse-DFT to get $h(n)$, convolve with the input. This is the convolution theorem of 7.3 implemented via FFT.
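The last item, filtering by multiplication in the frequency domain, can be sketched as follows. The example assumes NumPy; the function name `fft_convolve` and the 3-tap smoothing filter are hypothetical. Zero-padding both sequences to the full output length is what turns the DFT's circular convolution into the ordinary linear one.

```python
import numpy as np

def fft_convolve(x, h):
    """Linear convolution of x and h via zero-padded FFTs (convolution theorem)."""
    L = len(x) + len(h) - 1        # length of the full linear convolution
    X = np.fft.rfft(x, n=L)        # n=L zero-pads before transforming
    H = np.fft.rfft(h, n=L)
    return np.fft.irfft(X * H, n=L)

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])    # input signal
h = np.array([0.25, 0.5, 0.25])            # hypothetical 3-tap smoothing filter
y = fft_convolve(x, h)
```

The result agrees with direct convolution (`np.convolve(x, h)`) to roundoff. For short filters the direct sum is cheaper; the FFT route pays off as the sequences get long.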

Closing the chapter

That closes Foundations 7. The Fourier transform is one mathematical idea applied at four different scales: as a series for periodic signals, as a transform for transients, as a convolution-theorem identity for linear-systems theory, and as a DFT for digital implementation. The four pictures are different views of the same underlying object — the eigenbasis of the differentiation operator, lifted to a function space — and the practical signal-processing world runs on all four simultaneously.