Initiatives
Learning note for STFT.
Scope
Part I
- Intro to STFT
- Limitation, time-frequency resolution
- Optimization, why Gaussian ?
Part II
- Other windows
- Accelarated implementation
STFT
The short time Fourier transform is an advanced technique aims to solve the problem that traditional Fourier transform is unable to capture temporal information of frequency components. With window sliding over the signal, the signal is represented in 2D time-frequency domain.
This process allows for the detection of frequency components and their variations over time, making it particularly useful in signal processing applications such as music analysis, speech recognition, and biomedical signal processing. By using a sliding window to partition the signal, the STFT provides a more localized view of the signal’s frequency content, enabling the extraction of meaningful patterns and features that may not be apparent through other analysis techniques.
Math Expression
Continous
$$\mathbf{STFT}\left\{ x(t)\right\} (t,f)\equiv X(t,f)=\int_{-\infty}^{\infty}x(t)w(t-\tau)e^{-j2\pi f \tau}\ d\tau$$
- The window $w(t-\tau)$ centered at $\tau$ comes in many forms such as hanning, bessel… The commonly used one is gaussian type with total energy = 1
- The spectrogram is given by $|\mathbf{STFT}(t,f)|^2$, meaining energy distribution on t-f domain.
Discrete
$$X[m,k]=\sum_{n=0}^{N-1}{x[n+mH]\cdot w[n]\cdot e^{-j2\pi kn/N}}$$
- $N$ is the points for FFT, the total output points for each window.
- $H$ is STFT window hop size, the $\tau$ in continuous form.
- If infinite signal length (ST-DTFT)
$\rightarrow X[m,k]=\sum_{-\infty}^{\infty}{x[n]\cdot w[n-mH]\cdot e^{-j2\pi kn/N}}$
Implementation
Function
|
|
Note that X_win = np.fft.fft(x_win) / Fs is to normalize the output.
Plot
- In the implement bellow
coor_has data in time domain inaxis=0
|
|
Note that
- Time Index. - Window stride in unit
H, for each hop it takesH / Fsseconds, and total hopsX.shape[1]times, counted bynp.arange() - F. Index. - Frequency bin from 0 to
Fs / 2represented byN / 2points (=X.shape[0]). Each bin equalsFs / NHz, counted bynp.arange()
Time-frequency Properties
Localization
Some facts for Fourier Transform
$$\hat{x}(\omega)=\int_{-\infty}^{\infty}x(t)e^{-j\omega t}\ dt$$
- Inner product between signal $x(t)$ and kernel function $e^{-j\omega t}$
- Kernel vector spins around complex unit circle at angular velocity $\omega$.
- The transform evaluates Relation between signal $x(t)$ and kernel based on orthogonality. (fourier series coef != 0)
- Orthogonal means they are irrelative. (fourier series coef = 0)
Consider FT of a signal measured on a duration
$$\hat{x}(\omega)=\frac{1}{T}\int_{-\frac{T}{2}}^{\frac{T}{2}}x(t)e^{\frac{-j\omega t}{T}}\ dt$$
- For measuring duration $T$, it’s uncertain if a signal with period $T^{\prime}>T$ exists. In this case, given measuring time $T$, it’s not possible to distinguish 2 signals whose frequencies differ less than $\frac{1}{T}$ . ( signals with frequency smaller than $\frac{1}{T}$ are regarded the same )
- Furthermore, for duration $T$, in order to distinguish 2 signal, a minimal $\Delta f$ between 2 signals exists.
- It is to say, given $T$, the desired frequency $\hat{x}(f)$ is in fact $\hat{x}(f\pm\Delta f)$, where $f$ has a uncertainty spread over $f\pm\Delta f$
- $\hat{x}(f)$ is de facto Averaged Instananeous Frequencies over the uncertainty spread $f\pm\Delta f$. ( IFs could vary widely)
- Similar for inverse transform, for measuring bandwidth $F$, there is an uncertainty spread in time domain such that $\hat{x}(t)$ is in fact $\hat{x}(t\pm\Delta t)$
Given the facts, consider a nonnegtive signal $x(t)$ whose FT is $X(f)$, $x(t)$ has a nominal duration $\Delta t$ while $X(f)$ has a nominal bandwidth $\Delta f$ and both can be defined as the duration/bandwidth of a rectangular pulse having the same area as the continuous signal/spectrum and height equals to its amplitude at origin. (most cases fourier coefficient has max at n=0)
This gives the equation for nonnegative $x(t)$ $$\Delta t\cdot x(0)=\int_{-\infty}^{\infty}x(t)\ dt$$ Left side gives the area of regtangle, righthand side is the area of the signal over time. Now we can say nominal duration $\Delta t$ describe the “Spread” of signal on time domain.
Similarly in frequency domain $$\Delta f\cdot X(0)=\int_{-\infty}^{\infty}X(f)\ df$$ while $$X(0)=\int_{-\infty}^{\infty}x(t)e^{-j2\pi ft}\ dt\biggm\vert_{f=0}=\int_{-\infty}^{\infty}x(t)\ dt=\Delta t\cdot x(0)$$ And get $$\Delta t=\frac{X(0)}{x(0)},\ \Delta f=\frac{x(0)}{X(0)} \rightarrow \Delta f\cdot \Delta t=1$$
For general cases, consider signals with absolute value $$\Delta t\cdot |x(t)|=\int_{-\infty}^{\infty}|x(t)|\ dt\ge\left|\int_{-\infty}^{\infty}x(t)\ dt\right|=\left|X(0)\right|$$ And finally get the inequality $$\Delta t\ge\frac{|X(0)|}{|x(0)|},\ \Delta f\ge\frac{|x(0)|}{|X(0)|} \rightarrow \Delta f\cdot \Delta t\ge1$$
Physical meanings
- Whenever we want simultaneous measurement of a function and its Fourier tranform, the uncertainty principle applies.
- To further put it on time-frequency domain, for a measured signal, the principle gives that we have best localization of an area equals to 1
Time-frequency domain The resolution has a quantum cell or say basic information cell
- A simple example. For a noiseless channel $\Delta t\cdot \Delta f = 1$, if two signals differ in frequency by
0.05, then measuring time must greater than1 / 0.05 = 20sfor STFT to resolve the signals on time-frequency domain.
Limitation on Spectrogram
Following STFT, the common practice is look into energy distribution on time-frequency domain.
- The STFT output is complex, it turns real in terms of energy (self inner product).
- Better efficiency on further computation & visualization.
- Convenience for analysis.
To Migrate, Starting with
-
Consider a signal $x(t)$ whose continuous fourier transform $X(f)$ and Total Signal Energy=1 $\rightarrow \int_\mathbb{R} |x(t)|^2dt=1$
-
Expected energy in time and frequency
$\begin{cases} E[|x(t)|^2]=\mu_{t}=\int_\mathbb{R} t\cdot|x(t)|^2dt \\ E[|X(f)|^2]=\mu_{f}=\int_\mathbb{R} f\cdot|X(f)|^2df \end{cases}$
-
Uncertainty of energy distribution is then evaluated in terms of Variance.
$\begin{cases} \sigma_t^2=\int_\mathbb{R} (t-\mu_t)^2|x(t)|^2dt \\ \sigma_f^2=\int_\mathbb{R} (f-\mu_f)^2|X(f)|^2df \end{cases}$
-
Begin with playing property Total Energy=1
$$1=\int_\mathbb{R}|x(t)|^2\ dt$$ with integrate by part $$\begin{align}&=-\int_\mathbb{R}x\frac{d|x(t)|^2}{dt}\ dt\\&=-\int_\mathbb{R}x\frac{d}{dx}\left\{ x(t)\cdot \overline{x(t)}\right\}\ dt\\ &=-\int_\mathbb{R}x\left\{ x^{\prime}(t)\cdot \overline{x(t)}+x(t)\cdot \overline{x^{\prime}(t)}\right\}\ dt \\ & = -2\int_\mathbb{R}t\cdot \Re[x(t)\cdot x^{\prime}(t)]\ dt\end{align}$$ By Cauchy-Schwaz Inequality $$\begin{split} 1 & \leq \left(2\int_\mathbb{R}|(t\cdot\Re[x(t)\cdot x^{\prime}(t)])|\right)^2 \\ & \leq 4\left(\int_\mathbb{R}|t\cdot x(t)|^2\ dt\right)\cdot \left(\int_\mathbb{R}|x^{\prime}(t)|^2\ dt\right) \end{split}$$
with Parseval’s Theorem $X^{\prime}(f)=j2\pi fX(f)$
and Parseval Identity
$\rightarrow \int_\mathbb{R}|x^{\prime}(t)|^2\ dt=\int_\mathbb{R}|X^{\prime}(f)|^2\ df$
$=\int_\mathbb{R}|j2\pi fX(f)|^2\ df=4\pi ^2\int_\mathbb{R}f^2|X(f)|^2\ df$
Right hand side of inequality gives $$=4\left(\int_\mathbb{R}t^2|x(t)|^2\ dt\right)\cdot \left(4\pi^2\int_\mathbb{R}f^2|X(f)|^2\ df\right)$$ The inequality then gives $$1 \leq 16\pi^2\left(\int_\mathbb{R}t^2|x(t)|^2\ dt\right)\cdot \left(\int_\mathbb{R}f^2|X(f)|^2\ df\right)$$
This shows the product of 2nd order raw moment on instantaneous power (instantaneous energy density) $|x(t)|^2$ and PSD (power spectrum density) $|X(f)|^2$ has minimum of $\frac{1}{16\pi^2}$. To show in terms of variance, input below signal:
$$x(t+t_0)e^{-j2\pi f_0t}=\mathcal{F}^{-1}\left\{ X(f+f_0)e^{j2\pi(f+f_0)t_0} \right\}$$ Substitude into inequality gives $$\begin{split}\frac{1}{16\pi^2}&\leq\left( \int_\mathbb{R}t^2|x(t+t_0)e^{-j2\pi f_0t}|^2\ dt\right)\cdot \left(\int_\mathbb{R}f^2|X(f+f_0)e^{j2\pi (f+f_0)t_0}|^2\ df \right)\\&=\left(\int_\mathbb{R}t^2|x(t+t_0)|^2\ dt\right)\cdot \left(\int_\mathbb{R}f^2|X(f+f_0)|^2\ df \right) \end{split}$$
Note that
$|x(t)|^2=x(t)\cdot x^{*}(t) \rightarrow |x(t)e^{-j2\pi f_0t}|^2=x(t)e^{j2\pi f_0t}x(t)e^{-j2\pi f_0t}$
And finally with change of variable gives $$\frac{1}{16\pi^2}\leq\left(\int_\mathbb{R}(t-t_0)|x(t)|^2\ dt\right)\cdot \left(\int_\mathbb{R}(f-f_0)|X(f)|^2\ df\right)$$ Now let $t_0=E[|x(t)|^2],\ f_0=E[|X(f)|^2]$ , we have 2nd order central moment a.k.a. Variance $$\begin{split}\frac{1}{16\pi^2}&\leq\sigma_{energy,t}^2\cdot \sigma_{energy,f}^2\\ \frac{1}{4\pi} &\leq \sigma_{energy,t}\cdot \sigma_{energy,f}\end{split}$$
Conclusions
- Given any energy-bounded signal, the spread of both signal and signal energy distribution are inversely proportional against the spread in its Fourier transform.
- To further manipulate the resolution on spectrogram, refers to the imbalanced form and window functions.
Optimization for Equality to Hold
Above process assumed window sliding preserves signal’s original probability distribution, meaning rectangular window applied.
Besides,
- A window could modulate signals in its probability distribution.
- A signal segment sampled by proper window could reach uncertainty low bound $\frac{1}{4\pi}$
Optimized Window
An optimized window need
- Minimal self information so it samples without bias.
- In other word, its power density reaches maximal entropy.
Now let a window function $w(t)$ has bounded energy $\int_{-\infty}^{\infty}|w(t)|^2dt=1$
The entropy
$\begin{cases} H_{t}=-\int_{-\infty}^{\infty}|w(t)|^2\ln|w(t)|^2dt \\ H_{f}=-\int_{-\infty}^{\infty}|W(f)|^2\ln|W(f)|^2df \end{cases}$
Maximize ($H_t$ & $H_f$) with constrains
- Bounded energy to 1
- Arbitrary finite length window with uncertainty (excluding $\delta(t)$)
$$ \mathcal{L_t} = H_t + \lambda_1 \left( \int_{-\infty}^{\infty}|w(t)|^2dt-1\right)+\lambda_2\cdot \sigma_{energy,t} $$
Find extreme
$$ \delta\mathcal{L_t}=0 $$ gives $$ \frac{\delta H_t}{\delta w(t)} + \frac{\delta}{\delta w(t)}\lambda_1 \int_{-\infty}^{\infty}|w(t)|^2dt + \frac{\delta}{\delta w(t)}\lambda_2\cdot \sigma_{energy,t}=0 $$
p.s.
Variational derivative is given by $ \frac{\delta J}{\delta y(t)}=\frac{\partial F}{\partial y(t)},\ J(y)=\int_{a}^{b}F(y(t))dt $
Walk through will be added.
Expand each terms
- Entropy term
$$\begin{align}
\frac{\delta H_t}{\delta w(t)} & = -\int_{-\infty}^{\infty}\frac{\delta}{\delta w(t)} \left[w(t)w^{*}(t) \ln{\left(w(t)w^* (t)\right)}\right]dt \\
& = -\int_{\infty}^{\infty}w(t)^*\ln|w(t)|^2+w^*(t)dt = -\int_{\infty}^{\infty}F(w(t))dt\\
& =\frac{\partial F}{\partial w(t)}= -w^*(t)\ln|w(t)|^2-w^*(t)
\end{align}$$
- Bounded energy $$ \frac{\delta}{\delta w(t)}\lambda_1 \int_{\infty}^{\infty}|w(t)|^2dt = \lambda_1 w^*(t) $$
p.s.
With similar technique in entropy term
- Uncertainty term
$let\ \ \sigma_{energy,t}^2 = \int_{-\infty}^{\infty}t^2|w(t)|^2dt=I \to \sigma_{energy,t} = \sqrt{I}$
then $$ \frac{\delta}{\delta w(t)}\lambda_2\cdot\sigma_{energy,t} = \lambda_2\left(\frac{1}{2\sqrt{I}}\cdot\frac{\delta I}{\delta w(t)}\right) = \frac{\lambda_2}{2\sigma_{energy,t}}\cdot t^2w^*(t) $$
Put Togather
$$
-w^*(t)\ln{|w(t)|^2}-w^*(t)+\lambda_1 w^*(t) + \frac{\lambda_2}{2\sigma_{energy,t}}\cdot t^2 w^*(t)=0 \\
$$
with
$w^*(t) \neq 0$
$$
\ln{|w(t)|^2}=-1+\lambda_1+ \frac{\lambda_2 \cdot t^2}{2\sigma_{energy,t}}
$$
$$
\rightarrow |w(t)|^2=\exp{(-1+\lambda_1+\frac{\lambda_2\cdot t^2}{2\sigma_{energy,t}})} = C_t^2 \cdot \exp{(-\alpha t^2)} \tag{9}
$$
Conclusions
- Equation $(9)$ shows the window power density is in Gaussian form.
- Restricting entropy with normalization and STD will naturally yield Gaussian.
Now to further verify the equality holds.
Verify Equality
The same applies to $|W(f)|^2$, and we have $$ |W(f)|^2 = \exp{(-1+\lambda_3 + \frac{\lambda_4\cdot f^2}{2\sigma_{energy,f}})} = C_f^2\cdot \exp{(-\beta f^2)} \tag{10} $$
The Fourier Tranform of $w(t)$
For $w(t) \in \mathbb{R}$ $$ w(t) = C_t\cdot \exp{(-\frac{\alpha}{2}t^2)} $$ $$ \mathcal{F}\{w(t)\} = \int_{-\infty}^{\infty}w(t)e^{-j\omega t}dt = C_t \int_{-\infty}^{\infty} \exp{(-\frac{\alpha}{2}t^2-j2\pi ft)}dt $$ Parse the exponential terms $$ -\frac{\alpha}{2}\left(t^2+\frac{2j2\pi ft}{\alpha}\right)=-\frac{\alpha}{2}\left[\left(t+j\frac{2\pi f}{\alpha}\right)^2 + \left(\frac{2\pi f}{\alpha}\right)^2\right] $$ Put back $$ \rightarrow W(f)=C_t\cdot\exp{(-\frac{2\pi^2 f^2}{\alpha})}\cdot \int_{-\infty}^{\infty} \exp{\left(-\frac{\alpha}{2}\left(t+j\frac{2\pi f}{\alpha}\right)^2\right)}dt $$ Integrate by Substitution $$ let\ \ u=t+j\frac{2\pi f}{\alpha},\ du=dt \\ \rightarrow \int_{-\infty}^{\infty}\exp{(-\frac{\alpha}{2}u^2)du} = \sqrt{\frac{2\pi}{\alpha}} $$
p.s.
for hard functions, apply integrate by Pythonimport sympy as sp # Variable and params u = sp.symbols('u', real=True) a = sp.symbols('alpha', positive=True) # Integral function f = sp.exp(-a/2 * u**2) r = sp.integrate(f, (u, -sp.oo, sp.oo)) # Result Prints: sqrt(2)*sqrt(pi)/sqrt(alpha)
Lastly put back gives $$ W(f)=C_t\cdot \exp{\left(-\frac{2\pi^2f^2}{\alpha}\right)}\cdot \sqrt{\frac{2\pi}{\alpha}} \tag{11} $$
Compare
For $W(f)\in \mathbb{R}$, by equation $(10)$ and $(11)$ yields $$ W(f) = C_f \cdot \exp{(-\frac{\beta}{2}f^2)} = C_t \cdot \sqrt{\frac{2\pi}{\alpha}} \cdot \exp{\left(-\frac{2\pi^2 f^2}{\alpha}\right)} $$ $$ \rightarrow \begin{cases} C_f =C_t \cdot \sqrt{\frac{2\pi}{\alpha}} \tag{12} \\ \alpha \beta = 4\pi^2 \end{cases} $$
Expand Constrains
- Variance $$ \sigma_{energy,t}^2 = \int_{-\infty}^{\infty}t^2|w(t)|^2dt = C_t^2\int_{-\infty}^{\infty}t^2\exp{\left(-\alpha t^2\right)}dt = \frac{\sqrt{\pi}}{2\alpha^{\frac{3}{2}}}C_t^2 $$ $$ \sigma_{energy,f}^2 = … = \frac{\sqrt{\pi}}{2\beta^{\frac{3}{2}}}C_f^2 $$
- Total energy is bounded 1 $$ \int_{-\infty}^{\infty}|w(t)|^2dt = C_t^2 \int_{-\infty}^{\infty} \exp{(-\alpha t^2)}dt = C_t^2 \sqrt{\frac{\pi}{\alpha}}=1 $$ $$ \rightarrow C_t^2 = \sqrt{\frac{\alpha}{\pi}} \tag{13} $$
Solve for Equality
$$ \sigma_{energy,t}^2\cdot \sigma_{energy,f}^2=\frac{\pi}{4\cdot(\alpha\beta)^{\frac{3}{2}}}\cdot C_t^2 \cdot C_f^2 $$ $$\rightarrow \begin{cases} \alpha \beta = 4\pi^2 \\ C_t^2 \cdot C_f^2 = C_t^2 \cdot C_t^2 \cdot \frac{2\pi}{\alpha} = 2 \tag{by\ 12,\ 13} \end{cases}$$ Bring back $$ \sigma_{energy,t}^2 \cdot \sigma_{energy,f}^2 = \frac{2\pi}{4\cdot (2\pi)^3}=\frac{1}{16\pi^2} $$ $$ \rightarrow \sigma_{energy,t}\cdot\sigma_{energy,f}=\frac{1}{4\pi} \quad \text{Q.E.D.} $$
Resolve multipliers for completeness
$$ \rightarrow \begin{cases} \lambda_1 & = 1 - \frac{1}{2}\left(\ln{\sigma_t}+\ln{\pi}+\ln{2}\right) \\ \lambda_2 & = -\frac{1}{\sigma_t} \\ \lambda_3 & = 1 - \frac{1}{2}\left(\ln{\sigma_f}+\ln{\pi}+\ln{2}\right) \\ \lambda_4 & = -\frac{1}{\sigma_f} \end{cases}$$
p.s.
solving withimport sympy as sp # Define variables and config nonzero lambda1, lambda2, lambda3, lambda4 = \ sp.symbols('lambda1 lambda2 lambda3 lambda4', real=True) sigma_t = sp.Symbol('sigma_t', real=True, positive=True) sigma_f = sp.Symbol('sigma_f', real=True, positive=True) pi = sp.Symbol('pi', real=True, positive=True) # Config placeholder alpha = -lambda2 / (2 * sigma_t) beta = -lambda4 / (2 * sigma_f) Ct_square = sp.sqrt(alpha / pi) Cf_square = sp.sqrt(beta / pi) conditions = [ sigma_t**2 - (sp.sqrt(pi) / (2 * (alpha)**(3/2))) * Ct_square, sigma_f**2 - (sp.sqrt(pi) / (2 * (beta)**(3/2))) * Cf_square, sp.exp(lambda1 - 1) - sp.sqrt(1 / (2 * pi * sigma_t)), sp.exp(lambda3 - 1) - sp.sqrt(1 / (2 * pi * sigma_f)) ] # Solve for lambda2, lambda4 sol_l2l4 = sp.solve(conditions[:2], [lambda2, lambda4], dict=True) if sol_l2l4: sub_cond = [cond.subs(sol_l2l4) for cond in conditions[2:]] sol_l1l3 = sp.solve(sub_cond, [lambda1, lambda3], dict=True) else: sol_l1l3 = [] print("lambda2 & 4:", sol_l2l4) print("lambda1 & 3:", sol_l1l3)
Substitude back multipliers
with $ \lambda_2 = -\frac{1}{\sigma_{t}},\ \lambda_4 = -\frac{1}{\sigma_{f}} $
$$ \lambda_2 \rightarrow w(t) = C_t\cdot \exp{\left(\frac{\lambda_2\cdot t^2}{2\cdot \sigma_{energy,t}}\right)} = C_t\cdot \exp{\left(-\frac{t^2}{2\sigma_{energy,t}^2}\right)} $$ $$ \lambda_4 \rightarrow W(f) = C_f\cdot \exp{\left(\frac{\lambda_4\cdot f^2}{2\cdot \sigma_{energy,f}}\right)} = C_f\cdot \exp{\left(-\frac{f^2}{2 \sigma_{energy,f}^2}\right)} $$
- This verifies the typical Gaussian kernel $e^{\frac{-x^2}{2\sigma^2}}$
- Also verifies that fourier transform of Gaussian is still Gaussian
Part I Conclusions
- A window with maximal entropy is naturally in Gaussian shape.
- Maximal entropy $\to$ hold equality for uncertainty principle.
- With maximal entropy (minimal bias) window, it provides best resolution at theory limit on spectrogram.