EchoFilter: End-to-End Neural Network for Acoustic Echo Cancellation
Main Authors: | , , , , |
---|---|
Format: | Journal Article |
Language: | English |
Published: | 30-05-2021 |
Subjects: | |
Summary: | Acoustic echo cancellation (AEC), which aims to suppress the echo originating
from acoustic coupling between loudspeakers and microphones, plays a key role
in voice interaction. A linear adaptive filter (AF) is commonly used to handle
this problem. However, real scenarios involve severe effects such as nonlinear
distortion, background noise, and microphone clipping, which leave considerable
residual echo and give poor performance in practice. In this paper, we propose
an end-to-end network structure for echo cancellation that operates directly on
the time-domain audio waveform. The waveform is transformed into a deep
representation by temporal convolution and modelled by Long Short-Term Memory
(LSTM) to capture its temporal properties. Since time delay and severe
reverberation may exist at the near end with respect to the far end, local
attention is employed for alignment. The network is trained using multitask
learning with an auxiliary classification network for double-talk detection.
Experiments show the superiority of the proposed method in terms of echo return
loss enhancement (ERLE) for single-talk periods and perceptual evaluation of
speech quality (PESQ) score for double-talk periods in background-noise and
nonlinear-distortion scenarios. |
DOI: | 10.48550/arxiv.2105.14666 |
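
The summary reports results in terms of ERLE for single-talk periods. As a rough illustration of that metric, the sketch below uses the commonly cited definition, ten times the base-10 logarithm of the ratio between microphone-signal power and residual-signal power, measured while only the far end is talking. The function name `erle_db`, the `eps` guard, and the toy signals are assumptions made for this example; the record does not describe the paper's actual evaluation code.

```python
import math


def erle_db(mic, residual, eps=1e-12):
    """Echo return loss enhancement in dB (hypothetical helper).

    mic      -- microphone samples from a far-end single-talk segment
    residual -- the canceller's output for the same samples
    eps      -- small constant (an assumption) that avoids division by zero
    """
    if not mic or len(mic) != len(residual):
        raise ValueError("signals must be non-empty and of equal length")
    # Mean power of each signal over the segment.
    mic_power = sum(x * x for x in mic) / len(mic)
    res_power = sum(x * x for x in residual) / len(residual)
    return 10.0 * math.log10((mic_power + eps) / (res_power + eps))


# Toy data: a sine-wave "echo" and a residual attenuated to 10% amplitude.
mic = [math.sin(0.1 * n) for n in range(1000)]
residual = [0.1 * x for x in mic]
print(f"ERLE: {erle_db(mic, residual):.2f} dB")
```

Attenuating the amplitude by a factor of 10 reduces power by a factor of 100, so the toy case lands at roughly 20 dB. Higher ERLE means more echo was removed.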