Assessing LLMs for High Stakes Applications
Published in: | 2024 IEEE/ACM 46th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP), pp. 103-105 |
---|---|
Main Authors: | |
Format: | Conference Proceeding |
Language: | English |
Published: | ACM, 14-04-2024 |
Subjects: | |
Online Access: | Get full text |
Summary: | Large Language Models (LLMs) promise strategic benefit for numerous application domains. Current state-of-the-art LLMs, however, lack the trust, security, and reliability that high stakes applications require, which prohibits their use in such settings. To address this, our work investigated the challenges of developing, deploying, and assessing LLMs within a specific high stakes application: intelligence reporting workflows. We identified the following challenges that need to be addressed before LLMs can be used in high stakes applications: (1) challenges with unverified data and data leakage, (2) challenges with fine-tuning and inference at scale, and (3) challenges in reproducibility and assessment of LLMs. We argue that researchers should prioritize test and assessment metrics, as better metrics will yield the insight needed to further improve these LLMs. |
ISSN: | 2832-7659 |
DOI: | 10.1145/3639477.3639720 |