Quantifying Domain Knowledge in Large Language Models

Bibliographic Details
Published in: 2023 IEEE Conference on Artificial Intelligence (CAI), pp. 193-194
Main Authors: Sayenju, Sudhashree, Aygun, Ramazan, Franks, Bill, Johnston, Sereres, Lee, George, Choi, Hansook, Modgil, Girish
Format: Conference Proceeding
Language: English
Published: IEEE, 01-06-2023
Description
Summary: Transformer-based large language models such as BERT have demonstrated the ability to derive contextual information from surrounding words. However, when these models are applied in specific domains such as medicine, insurance, or scientific disciplines, publicly available models trained on general knowledge sources such as Wikipedia may not be as effective at inferring the appropriate context as domain-specific models trained on specialized corpora. Given the limited availability of training data for specific domains, pre-trained models can be fine-tuned via transfer learning on relatively small domain-specific corpora. However, there is currently no standardized method for quantifying how well these domain-specific models acquire the necessary domain knowledge. To address this issue, we explore hidden-layer embeddings and introduce domain_gain, a measure of a model's ability to infer the correct context. In this paper, we show how our measure can be used to determine whether words with multiple meanings are more likely to be associated with their domain-related meanings than with their colloquial meanings.
DOI:10.1109/CAI54212.2023.00091
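The abstract does not spell out how domain_gain is computed, but the underlying idea of probing hidden-layer embeddings to see whether an ambiguous word leans toward its domain-related sense can be sketched roughly as follows. The model checkpoint, the example sentences, and the similarity-gap heuristic are illustrative assumptions for this sketch, not the authors' actual formulation.

Illustrative sketch (Python, Hugging Face transformers):

import torch
from transformers import AutoTokenizer, AutoModel

MODEL_NAME = "bert-base-uncased"  # hypothetical choice; swap in a domain-adapted checkpoint to compare
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME, output_hidden_states=True)
model.eval()

def word_embedding(sentence, word, layer=-1):
    # Mean hidden-state vector of the sub-tokens of `word` in `sentence`.
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc)
    hidden = out.hidden_states[layer][0]           # (seq_len, hidden_dim)
    word_ids = tokenizer(word, add_special_tokens=False)["input_ids"]
    ids = enc["input_ids"][0].tolist()
    for i in range(len(ids) - len(word_ids) + 1):  # first occurrence of the word's sub-token span
        if ids[i:i + len(word_ids)] == word_ids:
            return hidden[i:i + len(word_ids)].mean(dim=0)
    raise ValueError(f"'{word}' not found in '{sentence}'")

def cosine(a, b):
    return torch.nn.functional.cosine_similarity(a, b, dim=0).item()

# "premium" is ambiguous: insurance sense vs. colloquial "high quality" sense.
target = word_embedding("The policyholder paid the annual premium on time.", "premium")
domain_ref = word_embedding("The insurer adjusted the premium after the claim.", "premium")
general_ref = word_embedding("They only serve premium coffee at that cafe.", "premium")

# A positive gap suggests the contextual embedding leans toward the domain-related meaning.
gap = cosine(target, domain_ref) - cosine(target, general_ref)
print(f"domain-vs-colloquial similarity gap: {gap:.3f}")

Running the same comparison with a general-purpose checkpoint and a domain-adapted one gives an intuition for the kind of contrast the paper's measure is intended to quantify.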