Improving extractive document summarization with sentence centrality

Extractive document summarization (EDS) is usually seen as a sequence labeling task, which extracts sentences from a document one by one to form a summary. However, extracting sentences separately ignores the relationship between the sentences and documents. One solution is to use sentence position...

Full description

Saved in:
Bibliographic Details
Published in:PloS one Vol. 17; no. 7; p. e0268278
Main Authors: Gong, Shuai, Zhu, Zhenfang, Qi, Jiangtao, Tong, Chunling, Lu, Qiang, Wu, Wenqing
Format: Journal Article
Language:English
Published: San Francisco Public Library of Science 22-07-2022
Public Library of Science (PLoS)
Subjects:
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Extractive document summarization (EDS) is usually seen as a sequence labeling task, which extracts sentences from a document one by one to form a summary. However, extracting sentences separately ignores the relationship between the sentences and documents. One solution is to use sentence position information to enhance sentence representation, but this will cause the sentence-leading bias problem, especially in news datasets. In this paper, we propose a novel sentence centrality for the EDS task to address these two problems. The sentence centrality is based on directed graphs, while reflecting the sentence-document relationship, it also reflects the sentence position information in the document. We implicitly strengthen the relevance of sentences and documents by using sentence centrality to enhance sentence representation. Notably, we replaced the sentence position information with sentence centrality to reduce sentence-leading bias without causing model performance degradation. Experiments on the CNN/Daily Mail dataset showed that EDS models with sentence centrality significantly improved compared with baseline models.
Bibliography:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 23
Competing Interests: The authors have declared that no competing interests exist.
ISSN:1932-6203
1932-6203
DOI:10.1371/journal.pone.0268278