Ingest-And-Ground: Dispelling Hallucinations from Continually-Pretrained LLMs with RAG

Bibliographic Details
Main Authors: Fang, Chenhao; Larson, Derek; Zhu, Shitong; Zeng, Sophie; Summer, Wendy; Peng, Yanqing; Hulovatyy, Yuriy; Rao, Rajeev; Forgues, Gabriel; Pudota, Arya; Goncalves, Alex; Robert, Hervé
Format: Journal Article (arXiv preprint)
Language: English
Published: 30-09-2024
Subjects: Computer Science - Computation and Language; Computer Science - Cryptography and Security
DOI: 10.48550/arxiv.2410.02825
Copyright: http://arxiv.org/licenses/nonexclusive-distrib/1.0
Online Access: Get full text at https://arxiv.org/abs/2410.02825
Abstract: This paper presents new methods with the potential to improve privacy-process efficiency using an LLM combined with RAG. To reduce hallucination, we continually pre-train the base LLM on a privacy-specific knowledge base and then augment it with a semantic RAG layer. Our evaluations demonstrate that this approach improves model performance on privacy-related queries (with some metrics as much as doubling relative to the out-of-the-box LLM) by grounding responses in factual information, which reduces inaccuracies.
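For intuition, the sketch below shows how a semantic RAG layer of the kind described in the abstract can ground an LLM's answer in a privacy knowledge base: passages are embedded, the passages closest to a query are retrieved, and the prompt restricts the model to that context. This is a minimal illustration only, assuming the sentence-transformers library and a toy in-memory knowledge base; the embedding model, the passages, and the prompt wording are placeholders, not the authors' implementation, and the continual pre-training of the base LLM is not shown here.

import numpy as np
from sentence_transformers import SentenceTransformer  # assumed embedding backend

# Toy stand-in for the privacy-specific knowledge base described in the abstract.
KNOWLEDGE_BASE = [
    "Personal data may be retained only as long as needed for the stated purpose.",
    "Users must be able to request deletion of their personal data.",
    "Transfers of personal data across regions require an approved legal mechanism.",
]

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder model choice
kb_vectors = encoder.encode(KNOWLEDGE_BASE, normalize_embeddings=True)

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k knowledge-base passages most semantically similar to the query."""
    q = encoder.encode([query], normalize_embeddings=True)[0]
    scores = kb_vectors @ q          # cosine similarity (embeddings are unit-normalized)
    top = np.argsort(-scores)[:k]
    return [KNOWLEDGE_BASE[i] for i in top]

def grounded_prompt(query: str) -> str:
    """Build a prompt that constrains the LLM to answer from retrieved context only."""
    context = "\n".join(f"- {p}" for p in retrieve(query))
    return (
        "Answer the privacy question using ONLY the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )

print(grounded_prompt("How long can we keep user data?"))

The grounding step is what the abstract credits with reducing inaccuracies: the generator answers from retrieved facts rather than from its parametric memory alone.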