Assessing LLMs for High Stakes Applications

Bibliographic Details
Published in: 2024 IEEE/ACM 46th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP), pp. 103-105
Main Authors: Gallagher, Shannon K., Ratchford, Jasmine, Brooks, Tyler, Brown, Bryan, Heim, Eric, McMillan, Scott, Nichols, William R., Rallapalli, Swati, Smith, Carol, VanHoudnos, Nathan, Winski, Nick, Mellinger, Andrew O.
Format: Conference Proceeding
Language: English
Published: ACM 14-04-2024
Subjects:
Description
Summary: Large Language Models (LLMs) promise strategic benefit for numerous application domains. The current state of the art in LLMs, however, lacks the trust, security, and reliability required for use in high stakes applications. To address this, our work investigated the challenges of developing, deploying, and assessing LLMs within a specific high stakes application: intelligence reporting workflows. We identified the following challenges that must be addressed before LLMs can be used in high stakes applications: (1) challenges with unverified data and data leakage, (2) challenges with fine-tuning and inference at scale, and (3) challenges in reproducibility and assessment of LLMs. We argue that researchers should prioritize test and assessment metrics, as better metrics will provide the insight needed to further improve these LLMs.
ISSN: 2832-7659
DOI: 10.1145/3639477.3639720