Less is Less: When Are Snippets Insufficient for Human vs Machine Relevance Estimation?
Main Authors: | , , , , |
---|---|
Format: | Journal Article |
Language: | English |
Published: | 21-01-2022 |
Subjects: | |
Summary: | Traditional information retrieval (IR) ranking models process the full text
of documents. Newer models based on Transformers, however, would incur a high
computational cost when processing long texts, so they typically use only snippets
from the document instead. The model's input based on a document's URL, title,
and snippet (UTS) is akin to the summaries that appear on a search engine
results page (SERP) to help searchers decide which result to click. This raises
questions about when such summaries are sufficient for relevance estimation by
the ranking model or the human assessor, and whether humans and machines
benefit from the document's full text in similar ways. To answer these
questions, we study human and neural-model-based relevance assessments on 12k
query-document pairs sampled from Bing's search logs. We compare changes in the
relevance assessments when only the document summaries are exposed to assessors
and when the full text is also exposed, studying a range of query and document
properties, e.g., query type and snippet length. Our findings show that the full
text is beneficial for humans and a BERT model for similar query and document
types, e.g., tail, long queries. A closer look, however, reveals that humans
and machines respond to the additional input in very different ways. Adding the
full text can also hurt the ranker's performance, e.g., for navigational
queries. |
---|---|
DOI: | 10.48550/arxiv.2201.08721 |