Models and Algorithms for Automatic Detection of Language Evolution

Bibliographic Details
Main Author:	Tahmasebi, Nina N
Format:	Dissertation
Language:	English
Published:	ProQuest Dissertations & Theses 01-01-2013
Subjects:	Algorithms; Anthologies; Computer science; Digital archives; Digital libraries; Historic documents; Language; Multimedia; National libraries; Search engines; Web archiving; Word sense disambiguation

Description
Summary:	With advances in technology and culture, and through high-impact events, our language changes. We invent new words, add or change meanings of existing words, and rename existing things. This results in a dynamic language that progresses with our needs and gives us the means to express ourselves and describe the world around us. This phenomenon is called language evolution. Unfortunately, our language does not carry a memory; words, expressions and meanings used in the past are forgotten over time. Language evolution therefore limits us when we want to find and interpret information about the past in historical documents. The primary goals of this thesis are the following: (1) to provide deeper insight into the problems of language evolution; (2) to take the first steps towards fully automated methods of detecting language evolution; and (3) to discuss future directions to fully utilize language evolution.
We begin by analyzing the problems language evolution causes for two high-level objectives: finding and interpreting content in long-term archives. We present a classification of language evolution and a model, called term concept graphs, to describe different types of evolution. We continue with an in-depth analysis of two specific types of evolution, namely word sense evolution and named entity evolution. The first step in finding word sense evolution is to discover the word senses present in a collection of text. We do this using word sense discrimination and start by evaluating the applicability of such algorithms to historical data. We then formally define word sense evolution and present models for finding evolution that build on iteratively merging term concept graphs. We evaluate using a set of terms with known sense changes, and find that the corresponding evolution can successfully be found for most of these terms. We can track evolution within specific senses, including narrowing and broadening, and group senses into concepts. In addition, the evolution is detected at the time of the actual change, or with a slight delay of 2–10 years.
We then consider named entity evolution and go beyond existing methods for finding different names used for the same entity over time. Our methodology builds on the use of change periods, which have a high likelihood of name changes, and searches for evolution only in these periods. Our method avoids comparing arbitrary term contexts and recurrent computations, and shows promising results.
Because our problem deals with large datasets, long time spans and diverse domains, we opt for automatic methods that do not require human input or existing resources such as dictionaries. For our experiments, we make use of The Times Archive (1785–1985) and the New York Times Annotated Corpus (1987–2007). The former provides us with a large sample of modern English in a realistic setting with noisy, unstructured text. The latter is a modern, error-free collection and serves as a comparison corpus. It is also used to extend the time span for the word sense evolution experiments, resulting in 222 years of text. For each of the two classes of evolution, we provide example applications for search and browsing.
ISBN:	9798382213859
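
The summary describes searching for named entity evolution only within change periods. The record does not say how the thesis identifies those periods, so the following is only a minimal sketch under stated assumptions: a change period is taken to be a year in which a term's frequency bursts above its recent history, and candidate alternative names are taken to be capitalized tokens that co-occur with the term in those years. The function names, the burst rule, the parameters `window` and `threshold`, and the toy data are all hypothetical and are not taken from the dissertation.

```python
from collections import Counter
from statistics import mean, pstdev


def find_change_periods(yearly_counts, window=3, threshold=2.0):
    """Return years whose count exceeds the mean of the preceding
    `window` years by more than `threshold` standard deviations.
    This burst rule is an assumption, not the thesis's definition."""
    years = sorted(yearly_counts)
    periods = []
    for i in range(window, len(years)):
        history = [yearly_counts[y] for y in years[i - window:i]]
        baseline = mean(history)
        spread = pstdev(history) or 1.0  # avoid a zero spread on flat history
        if yearly_counts[years[i]] > baseline + threshold * spread:
            periods.append(years[i])
    return periods


def candidate_names(docs_by_year, term, periods):
    """Count capitalized tokens that co-occur with `term`, looking only
    at documents from the given change periods."""
    counts = Counter()
    for year in periods:
        for doc in docs_by_year.get(year, []):
            tokens = doc.split()
            if term in tokens:
                counts.update(
                    t for t in tokens if t != term and t[:1].isupper()
                )
    return counts


# Toy data, invented purely for illustration.
yearly_counts = {2000: 2, 2001: 3, 2002: 2, 2003: 2, 2004: 20, 2005: 18}
docs_by_year = {
    2001: ["Oldname met Smith"],
    2004: ["Newname formerly Oldname", "Oldname announced"],
}

periods = find_change_periods(yearly_counts)
names = candidate_names(docs_by_year, "Oldname", periods)
```

Restricting the co-occurrence search to the detected years is what keeps the sketch from comparing arbitrary term contexts across the whole archive, which is the property the summary attributes to the method. A realistic version would need proper tokenization and named entity recognition rather than a capitalization check.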