RARD: The Related-Article Recommendation Dataset
Main Authors: | , , , |
---|---|
Format: | Journal Article |
Language: | English |
Published: | 11-06-2017 |
Summary: | D-Lib Magazine, Vol. 23, No. 7/8. Publication date: July 2017. Recommender-system datasets are used for recommender-system evaluations,
training machine-learning algorithms, and exploring user behavior. While there
are many datasets for recommender systems in the domains of movies, books, and
music, there are rather few datasets from research-paper recommender systems.
In this paper, we introduce RARD, the Related-Article Recommendation Dataset,
from the digital library Sowiport and the recommendation-as-a-service provider
Mr. DLib. The dataset contains information about 57.4 million recommendations
that were displayed to the users of Sowiport. This information includes details on
which recommendation approaches were used (e.g. content-based filtering,
stereotype, most popular), what types of features were used in content-based
filtering (simple terms vs. keyphrases), where the features were extracted from
(title or abstract), and the time when recommendations were delivered and
clicked. In addition, the dataset contains an implicit item-item rating matrix
that was created based on the recommendation click logs. RARD enables
researchers to train machine-learning algorithms for research-paper
recommendations, perform offline evaluations, and do research on data from Mr.
DLib's recommender system without implementing a recommender system
themselves. In the field of scientific recommender systems, our dataset is
unique: to the best of our knowledge, no other available dataset offers more
(implicit) ratings or as many variations of recommendation algorithms. The
dataset is available at http://data.mr-dlib.org and is published under the
Creative Commons Attribution 3.0 Unported (CC-BY) license. |
DOI: | 10.48550/arxiv.1706.03428 |
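The summary mentions an implicit item-item rating matrix derived from the recommendation click logs. The following is a minimal sketch of how such a matrix could be built. It assumes a hypothetical log record with a source document (the one the user was viewing), a recommended document, and a clicked flag. These field names are illustrative and are not RARD's actual column names. The summary also does not say whether RARD's ratings are raw click counts, binary values, or some other weighting, so counting clicks per (source, recommended) pair is an assumption here.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class Delivery:
    """One displayed recommendation (hypothetical layout, not RARD's schema)."""
    source_id: str       # document the user was viewing
    recommended_id: str  # document shown as a related-article recommendation
    clicked: bool        # whether the user clicked this recommendation


def implicit_item_item_matrix(log: Iterable[Delivery]) -> Dict[Tuple[str, str], int]:
    """Aggregate clicks into a sparse (source, recommended) -> click-count map.

    Assumption: a click counts as one unit of implicit rating. Pairs that were
    displayed but never clicked get no entry in the sparse matrix.
    """
    matrix: Dict[Tuple[str, str], int] = defaultdict(int)
    for d in log:
        if d.clicked:
            matrix[(d.source_id, d.recommended_id)] += 1
    return dict(matrix)


# Usage with a tiny, made-up log.
log = [
    Delivery("A", "B", True),
    Delivery("A", "B", False),
    Delivery("A", "C", True),
    Delivery("A", "B", True),
    Delivery("D", "A", False),
]
print(implicit_item_item_matrix(log))
# {('A', 'B'): 2, ('A', 'C'): 1}
```

Storing only clicked pairs keeps the matrix sparse, which matters at the scale of tens of millions of displayed recommendations. If a normalized rating such as a per-pair click-through rate is preferred, the same loop could also count impressions for each pair.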