diff --git a/papers/deepfir/audio/3db_LSTW_442c020q.wav b/papers/deepfir/audio/3db_LSTW_442c020q.wav
new file mode 100644
index 0000000..02e6c2e
Binary files /dev/null and b/papers/deepfir/audio/3db_LSTW_442c020q.wav differ
diff --git a/papers/deepfir/audio/3db_MP_442c020q.wav b/papers/deepfir/audio/3db_MP_442c020q.wav
new file mode 100644
index 0000000..7331301
Binary files /dev/null and b/papers/deepfir/audio/3db_MP_442c020q.wav differ
diff --git a/papers/deepfir/audio/3db_mix_442c020q.wav b/papers/deepfir/audio/3db_mix_442c020q.wav
new file mode 100644
index 0000000..1aaff6f
Binary files /dev/null and b/papers/deepfir/audio/3db_mix_442c020q.wav differ
diff --git a/papers/deepfir/audio/3db_original_442c020q.wav b/papers/deepfir/audio/3db_original_442c020q.wav
new file mode 100644
index 0000000..2ffa049
Binary files /dev/null and b/papers/deepfir/audio/3db_original_442c020q.wav differ
diff --git a/papers/deepfir/audio/9db_LSTW_446c020t.wav b/papers/deepfir/audio/9db_LSTW_446c020t.wav
new file mode 100644
index 0000000..ac08cd8
Binary files /dev/null and b/papers/deepfir/audio/9db_LSTW_446c020t.wav differ
diff --git a/papers/deepfir/audio/9db_MP_446c020t.wav b/papers/deepfir/audio/9db_MP_446c020t.wav
new file mode 100644
index 0000000..d11892a
Binary files /dev/null and b/papers/deepfir/audio/9db_MP_446c020t.wav differ
diff --git a/papers/deepfir/audio/9db_mix_446c020t.wav b/papers/deepfir/audio/9db_mix_446c020t.wav
new file mode 100644
index 0000000..df535cf
Binary files /dev/null and b/papers/deepfir/audio/9db_mix_446c020t.wav differ
diff --git a/papers/deepfir/audio/9db_original_446c020t.wav b/papers/deepfir/audio/9db_original_446c020t.wav
new file mode 100644
index 0000000..47376ad
Binary files /dev/null and b/papers/deepfir/audio/9db_original_446c020t.wav differ
diff --git a/papers/deepfir/audio/m3db_LSTW_440c020b.wav b/papers/deepfir/audio/m3db_LSTW_440c020b.wav
new file mode 100644
index 0000000..a397241
Binary files /dev/null and b/papers/deepfir/audio/m3db_LSTW_440c020b.wav differ
diff --git a/papers/deepfir/audio/m3db_MP_440c020b.wav b/papers/deepfir/audio/m3db_MP_440c020b.wav
new file mode 100644
index 0000000..9fc2bb7
Binary files /dev/null and b/papers/deepfir/audio/m3db_MP_440c020b.wav differ
diff --git a/papers/deepfir/audio/m3db_mix_440c020b.wav b/papers/deepfir/audio/m3db_mix_440c020b.wav
new file mode 100644
index 0000000..8b879f5
Binary files /dev/null and b/papers/deepfir/audio/m3db_mix_440c020b.wav differ
diff --git a/papers/deepfir/audio/m3db_original_440c020b.wav b/papers/deepfir/audio/m3db_original_440c020b.wav
new file mode 100644
index 0000000..41ee05f
Binary files /dev/null and b/papers/deepfir/audio/m3db_original_440c020b.wav differ
diff --git a/papers/deepfir/figs/graph_latency_v3.png b/papers/deepfir/figs/graph_latency_v3.png
new file mode 100644
index 0000000..5fefb7f
Binary files /dev/null and b/papers/deepfir/figs/graph_latency_v3.png differ
diff --git a/papers/deepfir/index.html b/papers/deepfir/index.html
new file mode 100644
index 0000000..1cc3219
--- /dev/null
+++ b/papers/deepfir/index.html
@@ -0,0 +1,143 @@
+
+
+ |Paper|
+
+
+ Artem Dementyev, Chandan K. A. Reddy, Scott Wisdom, Navin Chatlani, John R. Hershey, Richard F. Lyon +
+
+ Google
+
+
+
+
+Low-latency models are critical for real-time speech enhancement applications, such as hearing aids and hearables. However, the sub-millisecond latency space for resource-constrained hearables remains underexplored. We demonstrate speech enhancement using a computationally efficient minimum-phase FIR filter, enabling sample-by-sample processing with a mean algorithmic latency of 0.32 ms to 1.25 ms. With a single microphone, we observe a mean SI-SDRi of 4.1 dB. The approach generalizes to unseen audio recordings, with a DNSMOS increase of 0.2. We use a lightweight LSTM-based model with 644k parameters to generate the FIR taps. Benchmarks show that our system can run on a low-power DSP at 388 MIPS with a mean end-to-end latency of 3.35 ms. We also compare against baseline low-latency spectral masking techniques. We hope this work improves the understanding of latency and helps improve the comfort and usability of hearables.
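Among all FIR filters with the same magnitude response, the minimum-phase one concentrates its energy as early as possible, which is why it suits sub-millisecond latency. The abstract does not say how Deep FIR obtains minimum-phase taps (the network may produce them directly), so the sketch below is background only: it uses the standard real-cepstrum (homomorphic) folding method, an assumption not taken from the paper, to convert an FIR filter into its minimum-phase counterpart. The helper names `minimum_phase` and `energy_centroid` are hypothetical, and the energy centroid is only a rough proxy for delay.

```python
import numpy as np


def minimum_phase(h, nfft=4096):
    """Return a minimum-phase FIR with (approximately) the same magnitude
    response as h, using real-cepstrum folding (homomorphic method).
    nfft must be even and much longer than h to limit cepstral aliasing."""
    h = np.asarray(h, dtype=float)
    H = np.fft.fft(h, nfft)
    # Floor the magnitude so log() stays finite at spectral zeros.
    log_mag = np.log(np.maximum(np.abs(H), 1e-12))
    c = np.fft.ifft(log_mag).real  # real cepstrum (an even sequence)
    # Fold the anticausal half onto the causal half:
    # weight 1 at n = 0 and n = nfft/2, weight 2 for 0 < n < nfft/2, 0 elsewhere.
    fold = np.zeros(nfft)
    fold[0] = 1.0
    fold[1:nfft // 2] = 2.0
    fold[nfft // 2] = 1.0
    H_min = np.exp(np.fft.fft(c * fold))
    # Keep as many taps as the input filter had.
    return np.fft.ifft(H_min).real[: len(h)]


def energy_centroid(h):
    """Energy-weighted mean tap index: a rough proxy for delay in samples."""
    e = np.asarray(h, dtype=float) ** 2
    return float(np.sum(np.arange(len(e)) * e) / np.sum(e))


if __name__ == "__main__":
    # Zero at z = 2 lies outside the unit circle, so h is not minimum phase.
    h = np.convolve([1.0, -0.5], [1.0, -2.0])  # [1, -2.5, 1]
    h_min = minimum_phase(h)                   # ≈ [2.0, -2.0, 0.5]
    print(h_min)
    print(energy_centroid(h), energy_centroid(h_min))
```

In this toy case both filters have the same magnitude response and total energy, but the minimum-phase version moves its energy centroid from 1.0 sample to about 0.55 samples.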
++Below is the Deep FIR signal-processing diagram, divided into analysis and synthesis stages. A new FIR filter is estimated every hop.
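The diagram separates estimating the taps (once per hop) from applying them (every sample). The sketch below covers only the application side. It assumes the taps for each hop are already available; in Deep FIR they would come from the LSTM, which is not modeled here. It also assumes the taps switch abruptly at hop boundaries, whereas the actual system may interpolate or cross-fade. The function name `filter_sample_by_sample` is hypothetical.

```python
import numpy as np


def filter_sample_by_sample(x, taps_per_hop, hop):
    """Filter x one sample at a time with a causal FIR whose taps change every hop.

    taps_per_hop[k] holds the taps used for samples k*hop ... (k+1)*hop - 1.
    (Hypothetical interface: in Deep FIR the taps would be predicted by a network.)
    """
    x = np.asarray(x, dtype=float)
    n_taps = len(taps_per_hop[0])
    history = np.zeros(n_taps)  # history[i] holds x[n - i], newest sample first
    y = np.zeros_like(x)
    for n, sample in enumerate(x):
        # Shift the delay line by one and insert the newest input sample.
        history = np.roll(history, 1)
        history[0] = sample
        taps = taps_per_hop[n // hop]  # taps for the hop containing sample n
        # y[n] = sum_i taps[i] * x[n - i]: only current and past inputs are used.
        y[n] = np.dot(taps, history)
    return y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal(32)
    # Two hops of 16 samples: pass-through, then a 3-tap smoothing filter.
    taps = [np.array([1.0, 0.0, 0.0]), np.array([0.5, 0.3, 0.2])]
    y = filter_sample_by_sample(x, taps, hop=16)
    print(y[:4])
```

Each output sample needs only one dot product over the current taps and is available as soon as its input sample arrives, so no frame buffering is added. The remaining algorithmic delay comes from the filter's own impulse response.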
+ ++The table below provides audio samples from the CHiME-2 WSJ0 test set. The minimum-phase Deep FIR filter has a mean algorithmic latency of 0.38 ms, which could potentially allow an end-to-end latency of 1.6 ms on hardware. +Note that in the very-low-latency range, the proposed Deep FIR approach achieves audio quality comparable to the long short time window (LSTW) approach with more efficient computation.
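Converting latencies from milliseconds to samples makes the budget concrete. This sketch assumes a 16 kHz sampling rate, which this page does not state. It also computes an FIR filter's group delay as τ(ω) = Re{DFT(n·h[n]) / DFT(h[n])}. The page does not say whether the paper's "mean algorithmic latency" averages this quantity or uses another definition, so treat the helpers `ms_to_samples` and `group_delay` as hypothetical illustrations. At 16 kHz, 0.38 ms is about 6 samples. A linear-phase FIR with N taps always delays by (N - 1)/2 samples, so even a 31-tap linear-phase filter would cost 15 samples (about 0.94 ms).

```python
import numpy as np

FS_HZ = 16_000  # assumed sampling rate, for illustration only


def ms_to_samples(latency_ms, fs=FS_HZ):
    """Convert a latency in milliseconds to (fractional) samples."""
    return latency_ms * 1e-3 * fs


def group_delay(h, nfft=512):
    """Group delay in samples of FIR h at nfft//2 + 1 frequencies from 0 to pi.

    Uses tau(w) = Re{ DFT(n * h[n]) / DFT(h[n]) }; undefined where H(w) = 0.
    """
    h = np.asarray(h, dtype=float)
    n = np.arange(len(h))
    H = np.fft.rfft(h, nfft)
    nH = np.fft.rfft(n * h, nfft)
    return np.real(nH / H)


if __name__ == "__main__":
    print(ms_to_samples(0.38))  # about 6.08 samples at the assumed 16 kHz
    # A symmetric (linear-phase) 5-tap filter with no zeros on the unit circle.
    linear_phase = np.array([1.0, 2.0, 5.0, 2.0, 1.0])
    print(group_delay(linear_phase)[:4])  # (5 - 1) / 2 = 2 samples at every frequency
```

For a minimum-phase filter the group delay varies with frequency and can even be negative in some bands, so any single "mean latency" figure depends on how the average is taken.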
+ | Example -3 dB SNR | +Example +3 dB SNR | +Example +9 dB SNR |
Noisy Mixture | ++ | + | + |
LSTW (0.5 ms alg. latency) | ++ | + | + |
Proposed: Deep FIR minimum phase (0.38 ms alg. latency) |
+ + | + | + |
Ground-Truth Reference | ++ | + | + |
+ |