Database selection in distributed information retrieval: A study of multi-collection information retrieval

The proliferation of online information resources increases the importance of effective and efficient information retrieval in a multi-collection environment. Multi-collection searching includes distributed searching as a special case but is more broadly defined here to incorporate searching partiti...

Full description

Saved in:
Bibliographic Details
Main Author: Powell, Allison Lane
Format: Dissertation
Language:English
Subjects:
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:The proliferation of online information resources increases the importance of effective and efficient information retrieval in a multi-collection environment. Multi-collection searching includes distributed searching as a special case but is more broadly defined here to incorporate searching partitioned content independently from its physical storage. It is cast in three parts: collection selection (also referred to as database selection)-decide where should a query be sent; query processing-execute the query at each selected collection; and results merging-combine the results from individual collections into a single coherent list for the searcher. We focus our attention on collection selection. We compare a number of different collection selection approaches and examine the effect of collection selection on document retrieval performance. We consider multi-collection retrieval in six different test environments utilizing three document testbeds. Considering collection selection in isolation, we find that effective collection selection can be achieved using limited information about each collection. We then turn our attention from selection alone to data item retrieval in a multi-collection environment, considering retrieval performance in the same six test environments. First we find that good collection selection has the potential to result in better retrieval effectiveness than can be achieved in an equivalent single collection. Second we find that good performance can be achieved when only a few collections are selected and that the performance generally increases as more collections are selected. Finally we find that when collection selection is employed, it may not be necessary to maintain collection wide information (CWI), e.g., global idf. Local information can be used to achieve equivalent performance. This means that multi-collection systems can be engineered with more autonomy and less cooperation. This work demonstrates that improvements in collection selection can lead to broader improvements in document retrieval performance.
Bibliography:Adviser: James C. French.
Source: Dissertation Abstracts International, Volume: 62-01, Section: B, page: 0343.
ISBN:0493085343
9780493085340