What is a Wiener Filter?

From Wikipedia: “The filter, used to produce an estimate of a desired or target random process by linear time-invariant (LTI) filtering of an observed noisy process, assuming known stationary signal and noise spectra, and additive noise, was proposed by Norbert Wiener during the 1940s and published in 1949. The discrete-time equivalent of Wiener’s work was derived independently by Andrey Kolmogorov and published in 1941. Hence the theory is often called the Wiener–Kolmogorov filtering theory (cf. Kriging).”

The goal of the Wiener filter is to compute a statistical estimate of an unknown signal, using a related, known signal as input and filtering it to produce the estimate as an output. For example, the known signal might consist of an unknown signal of interest that has been corrupted by additive noise. The Wiener filter can then be used to filter out the noise from the corrupted signal and provide an estimate of the underlying signal of interest. The Wiener filter is based on a statistical approach; a more detailed account of the theory is given in the article “Wiener–Hopf Theory” by Lloyd Welch.

This filter is commonly used to denoise audio signals, especially voice recordings, as a preprocessor before speech recognition.

Let’s consider a filter system with Wiener Filter \(h_W(n)\):

\[x(n)=y(n)*h_W(n)\]

meaning we filter our distorted signal \(y(n)\) with our still unknown filter \(h_W(n)\).

Written out, the convolution of \(h_W(n)\) (with filter length \(L\)) with \(y(n)\) is the sum:

\[x(n)=\sum_{m=0}^{L-1}y(n-m)\cdot h_W(m)\]
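As a quick sanity check, this sum is exactly what NumPy’s convolution computes. Here is a minimal sketch; the values of `L`, `y`, and `h_W` are arbitrary stand-ins, not values from the text:

```python
import numpy as np

# Minimal sketch: check the convolution sum against np.convolve.
L = 4                                  # filter length (illustrative)
rng = np.random.default_rng(0)
y = rng.standard_normal(16)            # observed (noisy) signal
h_W = rng.standard_normal(L)           # some stand-in filter

# x(n) = sum_{m=0}^{L-1} y(n-m) * h_W(m), for n where all indices exist
x = np.array([sum(y[n - m] * h_W[m] for m in range(L))
              for n in range(L - 1, len(y))])

# 'valid' mode keeps exactly these fully overlapping output samples
assert np.allclose(x, np.convolve(y, h_W, mode='valid'))
```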

Now let’s define two vectors. The first is a vector of the past \(L\) samples of our noisy signal \(y\), up to the present sample at time \(n\) (boldface font indicates that it is a vector):

\[\boldsymbol y(n)=[y(n-L+1),...,y(n-1),y(n)]\]

The next vector contains the time-reversed impulse response,

\[\boldsymbol h_W=[h_W(L-1),...,h_W(1),h_W(0)]\]

Using these two vectors, we can rewrite our convolution equation above as a vector product,

\[x(n)= \boldsymbol y(n) \cdot \boldsymbol {h_W}^T\]

Observe that \(\boldsymbol h_W\) has no time index because it already contains all the samples of the time-reversed impulse response, and is constant.
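In code, this one-step dot product looks as follows (a small sketch; the signal and filter values are arbitrary stand-ins):

```python
import numpy as np

# Minimal sketch: one output sample as a dot product of the signal
# window with the time-reversed impulse response.
L = 4
rng = np.random.default_rng(0)
y = rng.standard_normal(16)
h_W = rng.standard_normal(L)

n = 10                                 # any time index with n >= L-1
y_vec = y[n - L + 1 : n + 1]           # [y(n-L+1), ..., y(n)]
h_vec = h_W[::-1]                      # [h_W(L-1), ..., h_W(0)]

x_n = y_vec @ h_vec                    # x(n) = y(n) . h_W^T
assert np.isclose(x_n, sum(y[n - m] * h_W[m] for m in range(L)))
```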

We can now also collect the output samples \(x(n)\) in a row vector,

\[\boldsymbol x =[x(0),x(1),...]\]

To compute this output vector via a matrix multiplication, we simply stack the row vectors \(\boldsymbol y(n)\) of our noisy signal into a matrix \(\boldsymbol A\),

\[\begin{split}\boldsymbol A =\begin{bmatrix}\boldsymbol y(0) \\ \boldsymbol y(1) \\ \vdots \end{bmatrix}\end{split}\]

With this matrix, we obtain the result of our convolution at all time steps n:

\[\boldsymbol A \cdot \boldsymbol {h_W}^T = \boldsymbol x^T\]

For the example of a filter \(h_W\) with length \(L=2\) we have,

\[\begin{split}\begin{bmatrix}y(0) & y(1) \\ y(1) & y(2) \\ y(2) & y(3) \\ \vdots & \vdots \end{bmatrix} \cdot \begin{bmatrix}h_W(1) \\ h_W(0) \end{bmatrix} = \begin{bmatrix}x(0) \\ x(1) \\ x(2) \\ \vdots \end{bmatrix}\end{split}\]

Observe again that the vector \(\boldsymbol h_W\) in this equation is the time-reversed impulse response of our filter. This is the matrix multiplication formulation of our convolution.
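Here is a minimal NumPy sketch of this matrix formulation, following the indexing of the example above, where row \(n\) of \(\boldsymbol A\) starts at sample \(y(n)\) (all names and values are illustrative):

```python
import numpy as np

# Minimal sketch: the convolution as one matrix-vector product.
# Row n of A holds the window [y(n), ..., y(n+L-1)], as in the
# L=2 example above.
L = 4
rng = np.random.default_rng(0)
y = rng.standard_normal(16)
h_W = rng.standard_normal(L)

A = np.array([y[n : n + L] for n in range(len(y) - L + 1)])
h_rev = h_W[::-1]                      # time-reversed impulse response

x = A @ h_rev                          # all output samples at once
assert np.allclose(x, np.convolve(y, h_W, mode='valid'))
```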

We can now obtain the minimum mean squared error solution of this matrix equation using the so-called Moore–Penrose pseudoinverse. Multiplying both sides from the left by \(\boldsymbol A^T\) yields the normal equations:

\[\boldsymbol A^T \cdot \boldsymbol A \cdot \boldsymbol {h_W}^T = \boldsymbol A ^T \cdot \boldsymbol x^T\]

Here, \(\boldsymbol A ^T \cdot \boldsymbol A\) is a square matrix, which is usually invertible, so that we obtain our solution:

\[\boldsymbol {h_W}^T = (\boldsymbol A ^T \cdot \boldsymbol A) ^{-1} \boldsymbol A ^T \cdot \boldsymbol x^T\]
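A small sketch of this least-squares solution in NumPy, assuming we have a training pair of signals: a noisy observation and the known clean target (the signals here are made up; `np.linalg.lstsq` is used as the numerically stable equivalent of forming \((\boldsymbol A ^T \cdot \boldsymbol A) ^{-1} \boldsymbol A ^T \cdot \boldsymbol x^T\) explicitly):

```python
import numpy as np

# Minimal sketch: estimate h_W from a training pair of signals, a
# noisy observation y and a known clean target. All signals here are
# made up; np.linalg.lstsq solves the least-squares system stably
# instead of forming (A^T A)^(-1) A^T explicitly.
L = 8
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 0.05 * np.arange(512))   # clean target
y = clean + 0.3 * rng.standard_normal(512)          # noisy observation

# One row of A per output sample, as in the matrix formulation above
A = np.array([y[n : n + L] for n in range(len(y) - L + 1)])
target = clean[L - 1:]                 # target aligned with the rows of A

h_rev, *_ = np.linalg.lstsq(A, target, rcond=None)  # time-reversed h_W

x_est = A @ h_rev                      # the denoised estimate
print("MSE before filtering:", np.mean((y[L - 1:] - target) ** 2))
print("MSE after filtering: ", np.mean((x_est - target) ** 2))
```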

This pseudoinverse finds the column vector \(\boldsymbol {h_W}^T\) that minimizes the distance of \(\boldsymbol A \cdot \boldsymbol {h_W}^T\) to a given \(\boldsymbol x\), where the matrix \(\boldsymbol A\) contains our signal to be filtered. This \(\boldsymbol h_W\) is the solution we were looking for: among all filters of length \(L\), its output has the minimum mean squared distance to the clean, noise-free signal.