Automated content assessment of text using Latent Semantic Analysis to simulate human cognition
Format: Dissertation
Language: English
Summary: Latent Semantic Analysis (LSA) is both a theory of human knowledge representation and a method for extracting and representing the contextual-usage meaning of words by statistical computations applied to a large corpus of text. The underlying idea is that the aggregate of all the word contexts in which a given word does and does not appear provides a set of mutual constraints that largely determines the similarity of meaning of words and sets of words to each other. Simulations of psycholinguistic phenomena show that LSA reflects similarities of human meaning effectively. The adequacy of LSA's reflection of human knowledge has been established in a variety of ways. For example, its scores overlap those of humans on standard vocabulary and subject-matter tests; it mimics human word sorting and category judgments; it simulates word-word and passage-word lexical priming data; and it accurately estimates the learnability of passages by individual students and the quality and quantity of knowledge contained in an essay.
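The statistical computation behind LSA is a truncated singular value decomposition of a weighted term-by-passage matrix. The sketch below illustrates that standard pipeline; the log-entropy weighting and the number of retained dimensions `k` are conventional choices from the LSA literature, not specifics taken from this dissertation.

```python
import numpy as np

def lsa_space(counts, k=100):
    """Derive an LSA semantic space from a term-by-passage count matrix.

    counts: (n_terms, n_passages) array of raw word counts.
    k: retained dimensions (illustrative; large corpora in the LSA
       literature typically use ~300, and k <= min(counts.shape)).
    """
    # Log-entropy weighting, the local/global transform classically
    # paired with LSA: damp raw counts, then down-weight words that
    # spread evenly over passages (high entropy -> low weight).
    totals = np.maximum(counts.sum(axis=1, keepdims=True), 1e-12)
    p = counts / totals
    safe_p = np.where(p > 0, p, 1.0)                  # avoids log(0)
    entropy = (p * np.log(safe_p)).sum(axis=1)        # <= 0
    global_weight = 1.0 + entropy / np.log(counts.shape[1])
    weighted = np.log1p(counts) * global_weight[:, None]

    # Truncated SVD: keeping only the k largest singular dimensions is
    # what induces the latent similarity structure among words.
    U, S, Vt = np.linalg.svd(weighted, full_matrices=False)
    return U[:, :k] * S[:k], Vt[:k].T * S[:k]         # word, passage vectors

def cosine(a, b):
    """Similarity of meaning is measured as the cosine between vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
```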
To assess essay quality, LSA is first trained on domain-representative text. Student essays are then characterized by LSA representations of the meaning of the words they contain and compared with essays of known quality on degree of conceptual relevance and amount of relevant content. Across many diverse topics, LSA scores agreed with human experts as accurately as the experts' scores agreed with each other.
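One way to make that comparison concrete, assuming the semantic space and `cosine` helper from the sketch above, is to fold a new essay into the space as the sum of its word vectors and score it against pre-graded essays. The `essay_vector` and `score_essay` functions here are hypothetical, and the similarity-weighted neighbor rule is one plausible scoring scheme rather than the dissertation's exact method.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def essay_vector(word_indices, term_vecs):
    # Fold a new essay into the trained space as the sum of the LSA
    # vectors of the words it contains.
    return term_vecs[word_indices].sum(axis=0)

def score_essay(essay_vec, graded_vecs, grades, n_neighbors=10):
    # Hypothetical rule: similarity-weighted average of the grades of
    # the most cosine-similar pre-scored essays.
    sims = np.array([cosine(essay_vec, g) for g in graded_vecs])
    top = np.argsort(sims)[-n_neighbors:]
    weights = np.clip(sims[top], 0.0, None)
    return float((weights * np.asarray(grades)[top]).sum()
                 / (weights.sum() + 1e-12))
```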
LSA has also been used to characterize tasks, occupations, and personnel, and to measure the content overlap between instructional courses covering the full range of tasks performed in many different occupations. It extracts semantic information about people, occupations, and task experience contained in natural-text databases. Because these various kinds of information are all represented in the same way in a common semantic space, the system can match or compare any of these objects with any of the others. LSA-based agent software can help identify required job knowledge, determine which members of the workforce have that knowledge, pinpoint needed retraining content, and maximize training and retraining efficiency.
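Since every object lives in the same space, any-to-any matching reduces to ranking by cosine similarity. The `rank_matches` helper below is a hypothetical illustration of this, not software described by the record.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def rank_matches(query_vec, candidates):
    # candidates: {label: vector} for any mix of people, occupations,
    # tasks, or course descriptions, all embedded in the same space.
    scored = [(label, cosine(query_vec, vec))
              for label, vec in candidates.items()]
    return sorted(scored, key=lambda pair: -pair[1])

# Example: rank workforce members against a job's knowledge vector, or
# rank retraining courses against the vector of an identified gap.
```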
Computational models of concept relations using LSA representations demonstrate that categories can be emergent and self-organizing, based exclusively on the way language is used in the corpus, without explicit hand-coding of category membership or semantic features. LSA modeling also shows that the categories most often impaired in category-specific semantic dysnomias are those with the most internal coherence in LSA representational structure. If brain structure corresponds to LSA structure, identification of concepts belonging to strongly clustered categories should suffer more than identification of concepts in weakly clustered categories when their representations are partially damaged.
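A minimal sketch of how these two claims could be operationalised, assuming categories are given as sets of member word vectors: coherence as mean pairwise cosine, and "partial damage" modeled, purely as an illustrative assumption, by zeroing a random fraction of each vector's dimensions.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def category_coherence(vecs):
    # Internal coherence as the mean pairwise cosine among the
    # category's member vectors.
    n = len(vecs)
    pairs = [cosine(vecs[i], vecs[j])
             for i in range(n) for j in range(i + 1, n)]
    return float(np.mean(pairs))

def identification_accuracy(vecs, damage=0.3, seed=0):
    # Zero a random fraction of each vector's dimensions ("partial
    # damage"), then check whether the damaged vector is still nearest
    # to its own intact representation; tightly clustered categories
    # should lose identifications to close neighbors more readily.
    rng = np.random.default_rng(seed)
    correct = 0
    for i, v in enumerate(vecs):
        damaged = v * (rng.random(v.shape[0]) >= damage)
        sims = [cosine(damaged, u) for u in vecs]
        correct += int(np.argmax(sims) == i)
    return correct / len(vecs)
```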
Bibliography: Director: Thomas K. Landauer. Source: Dissertation Abstracts International, Volume 61-07, Section B, page 3873.
ISBN: 0599854324; 9780599854321