diff --git a/papers/deepfir/audio/3db_LSTW_442c020q.wav b/papers/deepfir/audio/3db_LSTW_442c020q.wav new file mode 100644 index 0000000..02e6c2e Binary files /dev/null and b/papers/deepfir/audio/3db_LSTW_442c020q.wav differ diff --git a/papers/deepfir/audio/3db_MP_442c020q.wav b/papers/deepfir/audio/3db_MP_442c020q.wav new file mode 100644 index 0000000..7331301 Binary files /dev/null and b/papers/deepfir/audio/3db_MP_442c020q.wav differ diff --git a/papers/deepfir/audio/3db_mix_442c020q.wav b/papers/deepfir/audio/3db_mix_442c020q.wav new file mode 100644 index 0000000..1aaff6f Binary files /dev/null and b/papers/deepfir/audio/3db_mix_442c020q.wav differ diff --git a/papers/deepfir/audio/3db_original_442c020q.wav b/papers/deepfir/audio/3db_original_442c020q.wav new file mode 100644 index 0000000..2ffa049 Binary files /dev/null and b/papers/deepfir/audio/3db_original_442c020q.wav differ diff --git a/papers/deepfir/audio/9db_LSTW_446c020t.wav b/papers/deepfir/audio/9db_LSTW_446c020t.wav new file mode 100644 index 0000000..ac08cd8 Binary files /dev/null and b/papers/deepfir/audio/9db_LSTW_446c020t.wav differ diff --git a/papers/deepfir/audio/9db_MP_446c020t.wav b/papers/deepfir/audio/9db_MP_446c020t.wav new file mode 100644 index 0000000..d11892a Binary files /dev/null and b/papers/deepfir/audio/9db_MP_446c020t.wav differ diff --git a/papers/deepfir/audio/9db_mix_446c020t.wav b/papers/deepfir/audio/9db_mix_446c020t.wav new file mode 100644 index 0000000..df535cf Binary files /dev/null and b/papers/deepfir/audio/9db_mix_446c020t.wav differ diff --git a/papers/deepfir/audio/9db_original_446c020t.wav b/papers/deepfir/audio/9db_original_446c020t.wav new file mode 100644 index 0000000..47376ad Binary files /dev/null and b/papers/deepfir/audio/9db_original_446c020t.wav differ diff --git a/papers/deepfir/audio/m3db_LSTW_440c020b.wav b/papers/deepfir/audio/m3db_LSTW_440c020b.wav new file mode 100644 index 0000000..a397241 Binary files /dev/null and 
b/papers/deepfir/audio/m3db_LSTW_440c020b.wav differ diff --git a/papers/deepfir/audio/m3db_MP_440c020b.wav b/papers/deepfir/audio/m3db_MP_440c020b.wav new file mode 100644 index 0000000..9fc2bb7 Binary files /dev/null and b/papers/deepfir/audio/m3db_MP_440c020b.wav differ diff --git a/papers/deepfir/audio/m3db_mix_440c020b.wav b/papers/deepfir/audio/m3db_mix_440c020b.wav new file mode 100644 index 0000000..8b879f5 Binary files /dev/null and b/papers/deepfir/audio/m3db_mix_440c020b.wav differ diff --git a/papers/deepfir/audio/m3db_original_440c020b.wav b/papers/deepfir/audio/m3db_original_440c020b.wav new file mode 100644 index 0000000..41ee05f Binary files /dev/null and b/papers/deepfir/audio/m3db_original_440c020b.wav differ diff --git a/papers/deepfir/figs/graph_latency_v3.png b/papers/deepfir/figs/graph_latency_v3.png new file mode 100644 index 0000000..5fefb7f Binary files /dev/null and b/papers/deepfir/figs/graph_latency_v3.png differ diff --git a/papers/deepfir/index.html b/papers/deepfir/index.html new file mode 100644 index 0000000..1cc3219 --- /dev/null +++ b/papers/deepfir/index.html @@ -0,0 +1,143 @@ + + + Towards sub-millisecond latency real-time speech enhancement models on hearables (Online Supplement) + + + + + + +
+
+

Towards sub-millisecond latency real-time speech enhancement models on hearables

+ +

+ |Paper| +
+

+ Artem Dementyev, Chandan K. A. Reddy, Scott Wisdom, Navin Chatlani, John R. Hershey, Richard F. Lyon +

+
+

+ Google +
+ +

+ +
+

+

Overview

+

+Low-latency models are critical for real-time speech enhancement applications such as hearing aids and hearables. However, the sub-millisecond latency space for resource-constrained hearables remains underexplored. We demonstrate speech enhancement using a computationally efficient minimum-phase FIR filter, which enables sample-by-sample processing and achieves a mean algorithmic latency of 0.32 ms to 1.25 ms. With a single microphone, we observe a mean SI-SDRi of 4.1 dB. The approach generalizes to unseen audio recordings, with a DNSMOS increase of 0.2. A lightweight LSTM-based model with 644k parameters generates the FIR taps. Our benchmarks show that the system can run on a low-power DSP with 388 MIPS and a mean end-to-end latency of 3.35 ms. We also provide a comparison with baseline low-latency spectral masking techniques. We hope this work will enable a better understanding of latency and help improve the comfort and usability of hearables. +
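For readers unfamiliar with the SI-SDRi figure quoted in the overview, the following is a minimal sketch of the standard scale-invariant SDR definition and its improvement over the noisy mixture. The function names `si_sdr` and `si_sdr_improvement` and the `eps` stabilizer are illustrative assumptions, not code from the paper.

```python
import numpy as np


def si_sdr(estimate, reference, eps=1e-12):
    """Scale-invariant signal-to-distortion ratio in dB (standard definition)."""
    estimate = np.asarray(estimate, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)
    # Remove the mean so that a DC offset does not affect the score.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference to get the scaled target.
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    residual = estimate - target
    return 10.0 * np.log10(
        (np.dot(target, target) + eps) / (np.dot(residual, residual) + eps)
    )


def si_sdr_improvement(enhanced, mixture, reference):
    """SI-SDRi: gain of the enhanced signal over the unprocessed mixture, in dB."""
    return si_sdr(enhanced, reference) - si_sdr(mixture, reference)
```

Under this definition, an SI-SDRi of 4.1 dB means the enhanced output scores 4.1 dB higher against the clean reference than the noisy input does.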

+

Deep FIR architecture

+

+Below is the Deep FIR signal processing diagram, divided into synthesis and analysis. A new FIR filter is estimated every hop. +
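The idea of applying a filter sample by sample while swapping in new taps every hop can be sketched as follows. This is only an illustration under stated assumptions: in the paper the taps come from an LSTM-based model and are minimum-phase, whereas here they are simply passed in as an array, and the names `filter_with_hop_updates`, `taps_per_hop`, and `hop` are hypothetical.

```python
import numpy as np


def filter_with_hop_updates(x, taps_per_hop, hop):
    """Apply a time-varying FIR filter sample by sample.

    taps_per_hop[k] holds the FIR taps used for samples k*hop .. (k+1)*hop - 1.
    If the signal is longer than the supplied taps cover, the last set is reused.
    """
    x = np.asarray(x, dtype=np.float64)
    taps_per_hop = np.asarray(taps_per_hop, dtype=np.float64)
    num_taps = taps_per_hop.shape[1]
    history = np.zeros(num_taps)  # history[0] is the newest input sample
    y = np.empty_like(x)
    for n, sample in enumerate(x):
        # Shift the delay line by one sample and insert the new input.
        history = np.roll(history, 1)
        history[0] = sample
        # Pick the taps that belong to the current hop (assumed precomputed).
        h = taps_per_hop[min(n // hop, len(taps_per_hop) - 1)]
        # Each output sample depends only on current and past inputs.
        y[n] = np.dot(h, history)
    return y
```

Because each output sample is produced as soon as its input sample arrives, the filtering step itself adds no block buffering delay; in this sketch any remaining delay would come from the filter's own response and from how the taps are estimated.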

+ +
+ + + +
+
+ + + +

Results for the Deep FIR model with a 0.125 ms synthesis window

+

+The table below provides audio samples from the CHiME-2 WSJ0 test set. The minimum-phase Deep FIR filter has a mean algorithmic latency of 0.38 ms, which could allow for an end-to-end latency of 1.6 ms on hardware. +Note that, in the very low latency range, the proposed Deep FIR approach achieves comparable audio quality with more efficient computation than the long short time window (LSTW) approach. +

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Example -3 dB SNR | Example +3 dB SNR | Example +9 dB SNR
Noisy Mixture
LSTW (0.5 ms alg. latency)
Proposed: Deep FIR Minimum phase (0.38 ms alg. latency)
Ground-Truth Reference

+ + + +
+
+ + + + + +