EchoFilter: End-to-End Neural Network for Acoustic Echo Cancellation

Acoustic Echo Cancellation (AEC) whose aim is to suppress the echo originated from acoustic coupling between loudspeakers and microphones, plays a key role in voice interaction. Linear adaptive filter (AF) is always used for handling this problem. However, since there would be some severe effects in...

Full description

Saved in:
Bibliographic Details
Main Authors: Ma, Lu, Yang, Song, Gong, Yaguang, Wang, Xintian, Wu, Zhongqin
Format: Journal Article
Language:English
Published: 30-05-2021
Subjects:
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Acoustic Echo Cancellation (AEC) whose aim is to suppress the echo originated from acoustic coupling between loudspeakers and microphones, plays a key role in voice interaction. Linear adaptive filter (AF) is always used for handling this problem. However, since there would be some severe effects in real scenarios, such nonlinear distortions, background noises, and microphone clipping, it would lead to considerable residual echo, giving poor performance in practice. In this paper, we propose an end-to-end network structure for echo cancellation, which is directly done on time-domain audio waveform. It is transformed to deep representation by temporal convolution, and modelled by Long Short-Term Memory (LSTM) for considering temporal property. Since time delay and severe reverberation may exist at the near-end with respect to the far-end, a local attention is employed for alignment. The network is trained using multitask learning by employing an auxiliary classification network for double-talk detection. Experiments show the superiority of our proposed method in terms of the echo return loss enhancement (ERLE) for single-talk periods and the perceptual evaluation of speech quality (PESQ) score for double-talk periods in background noise and nonlinear distortion scenarios.
DOI:10.48550/arxiv.2105.14666