Computational Efficient Approximations of the Concordance Probability in a Big Data Setting

Bibliographic Details
Published in: Big Data, Vol. 12, no. 3, p. 243
Main Authors: Oirbeek, Robin Van; Ponnet, Jolien; Baesens, Bart; Verdonck, Tim
Format: Journal Article
Language: English
Published: United States, 2024-06-01
Abstract: Performance measurement is an essential task once a statistical model has been built. The area under the receiver operating characteristic curve (AUC) is the most popular measure for evaluating the quality of a binary classifier. For a binary response, the AUC equals the concordance probability, a frequently used measure of the discriminatory power of a model. Unlike the AUC, the concordance probability can also be extended to a continuous response variable. Given the size of today's data sets, determining this discriminatory measure requires a very large number of costly computations and is therefore very time consuming, especially for a continuous response variable, where almost every pair of observations is comparable. We therefore propose two estimation methods that calculate the concordance probability quickly and accurately and that apply to both the discrete and the continuous setting. Extensive simulation studies show the excellent performance and fast computing times of both estimators. Finally, experiments on two real-life data sets confirm the conclusions of the simulation studies.
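To make the definition concrete, the sketch below computes the concordance probability by brute force, as the fraction of comparable pairs (i, j) with y_i > y_j for which the model also predicts ŷ_i > ŷ_j, and checks that for a 0/1 response it coincides with the AUC obtained from the Mann-Whitney rank statistic. It is a minimal illustration under stated assumptions, not either of the two estimators proposed in the article: the function names are hypothetical, and counting pairs tied on the prediction as 1/2 is an assumed (Harrell-style) convention.

```python
from itertools import combinations


def concordance_probability(y, y_hat):
    """Brute-force O(n^2) estimate of P(y_hat_i > y_hat_j | y_i > y_j).

    Hypothetical helper for illustration. Pairs with equal responses are
    not comparable; pairs tied on the prediction count as 1/2 (assumed
    convention). Raises ZeroDivisionError if no pair is comparable.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(y)), 2):
        if y[i] == y[j]:
            continue  # tied responses carry no ordering information
        comparable += 1
        # Orient the pair so that `hi` is the observation with the larger response.
        hi, lo = (i, j) if y[i] > y[j] else (j, i)
        if y_hat[hi] > y_hat[lo]:
            concordant += 1.0
        elif y_hat[hi] == y_hat[lo]:
            concordant += 0.5
    return concordant / comparable


def auc_mann_whitney(y, y_hat):
    """AUC for a 0/1 response via average ranks (Mann-Whitney U), O(n log n).

    Hypothetical helper; assumes both classes are present.
    """
    order = sorted(range(len(y_hat)), key=lambda k: y_hat[k])
    ranks = [0.0] * len(y_hat)
    start = 0
    while start < len(order):
        # Find the run of tied predictions and give each the average 1-based rank.
        end = start
        while end + 1 < len(order) and y_hat[order[end + 1]] == y_hat[order[start]]:
            end += 1
        avg_rank = (start + end) / 2 + 1
        for k in order[start:end + 1]:
            ranks[k] = avg_rank
        start = end + 1
    n_pos = sum(1 for v in y if v == 1)
    n_neg = len(y) - n_pos
    rank_sum_pos = sum(r for r, v in zip(ranks, y) if v == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2  # Mann-Whitney U of the positives
    return u / (n_pos * n_neg)


# Binary response: concordance probability and AUC agree (one tied pair counts 1/2).
y_bin = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.4, 0.8, 0.35, 0.9]
print(round(concordance_probability(y_bin, scores), 4))  # 0.9444
print(round(auc_mann_whitney(y_bin, scores), 4))  # 0.9444

# Continuous response: the concordance probability is still defined.
y_cont = [1.2, 3.4, 2.2, 5.0]
y_hat_cont = [1.0, 2.5, 2.7, 4.1]
print(round(concordance_probability(y_cont, y_hat_cont), 4))  # 0.8333
```

The pairwise loop visits n(n-1)/2 pairs, roughly 5 × 10^11 for n = 10^6, which is the cost that motivates fast approximations. For a binary response the rank-based AUC avoids the loop; for a continuous response the brute-force version is shown only for clarity.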
Author affiliations:
– Robin Van Oirbeek: Data Office, Allianz Benelux, Brussels, Belgium
– Jolien Ponnet (ORCID 0000-0003-2036-6213): Department of Mathematics, Faculty of Science, KU Leuven, Leuven, Belgium
– Bart Baesens (ORCID 0000-0002-5831-5668): School of Management, University of Southampton, Southampton, United Kingdom
– Tim Verdonck (ORCID 0000-0003-1105-2028): Department of Mathematics, Faculty of Science, UAntwerp-imec, Antwerp, Belgium
View this record in MEDLINE/PubMed: https://www.ncbi.nlm.nih.gov/pubmed/37289184
Cited by: DOI 10.3390/risks12050079
DOI: 10.1089/big.2022.0107
Indexed in: MEDLINE, PubMed
Discipline: Computer Science
EISSN: 2167-647X
Peer reviewed: yes
Keywords: clustering; efficient algorithm; performance measure; C-index; AUC
PMID: 37289184
Subject terms: Area Under Curve; Big Data; Computer Simulation; Models, Statistical; Probability; ROC Curve