FarsNewsQA: a deep learning-based question answering system for the Persian news articles
Published in: | Information retrieval (Boston) Vol. 26; no. 1-2; p. 3 |
---|---|
Main Authors: | , , , , , , , |
Format: | Journal Article |
Language: | English |
Published: | Dordrecht; Springer Netherlands; 01-06-2023; Springer Nature B.V |
Subjects: | |
Summary: | News agencies worldwide produce a considerable volume of news articles every day. Because so much news is available on the web, finding exact answers to users’ questions is not a straightforward task. Developing Question Answering (QA) systems for news articles can tackle this challenge. Given the lack of studies on Persian QA systems and the importance and wide applications of QA systems in the news domain, this research aims to design and implement a QA system for Persian news articles. To the best of our knowledge, this is the first attempt to develop a Persian QA system in the news domain. We first create FarsQuAD, a Persian QA dataset for the news domain. We analyze the type and complexity of users’ questions about Persian news. The results show that What and Who questions occur most often, and Why and Which questions least often, in the Persian news domain. The results also indicate that users usually raise complex questions about Persian news. We then develop FarsNewsQA, a QA system for answering questions about Persian news. We build three versions of FarsNewsQA using BERT, ParsBERT, and ALBERT. The best version of FarsNewsQA achieves an F1 score of 75.61%, which is comparable to that of QA systems on the English SQuAD dataset created by Stanford University, and shows that the new BERT-based technologies work well for Persian news QA systems. |
---|---|
ISSN: | 1386-4564 1573-7659 |
DOI: | 10.1007/s10791-023-09417-2 |
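
Note on the F1 score in the summary: the record does not say how F1 was computed. In SQuAD-style evaluation it is usually a token-overlap measure between the predicted and the reference answer span, so the following is only a minimal sketch under that assumption. The function name `token_f1` is hypothetical, and whitespace tokenization stands in for whatever Persian tokenization and answer normalization the authors may have used.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a predicted and a reference answer.

    Assumption: tokens are obtained by whitespace splitting, with no
    normalization (no punctuation stripping or case folding).
    """
    pred_tokens = prediction.split()
    ref_tokens = reference.split()
    if not pred_tokens or not ref_tokens:
        # Two empty answers count as a match; one empty answer is a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection: a shared token counts only as often as it
    # appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

A dataset-level score under this scheme would be the mean of `token_f1` over all questions, typically taking the maximum over the reference answers when a question has several.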