Computational Efficient Approximations of the Concordance Probability in a Big Data Setting
Performance measurement is an essential task once a statistical model is created. The area under the receiver operating characteristic curve (AUC) is the most popular measure for evaluating the quality of a binary classifier. In this case, the AUC is equal to the concordance probability, a frequen...
Published in: | Big data Vol. 12; no. 3; p. 243 |
---|---|
Main Authors: | Oirbeek, Robin Van; Ponnet, Jolien; Baesens, Bart; Verdonck, Tim |
Format: | Journal Article |
Language: | English |
Published: | United States, 01-06-2024 |
Abstract | Performance measurement is an essential task once a statistical model is created. The area under the receiver operating characteristic curve (AUC) is the most popular measure for evaluating the quality of a binary classifier. In this case, the AUC is equal to the concordance probability, a frequently used measure to evaluate the discriminatory power of the model. Contrary to AUC, the concordance probability can also be extended to the situation with a continuous response variable. Due to the staggering size of data sets nowadays, determining this discriminatory measure requires a tremendous amount of costly computations and is hence immensely time consuming, certainly in case of a continuous response variable. Therefore, we propose two estimation methods that calculate the concordance probability in a fast and accurate way and that can be applied to both the discrete and continuous setting. Extensive simulation studies show the excellent performance and fast computing times of both estimators. Finally, experiments on two real-life data sets confirm the conclusions of the artificial simulations. |
---|---|
Author | Ponnet, Jolien; Oirbeek, Robin Van; Verdonck, Tim; Baesens, Bart |
Author_xml | – sequence: 1 givenname: Robin Van surname: Oirbeek fullname: Oirbeek, Robin Van organization: Data Office, Allianz Benelux, Brussels, Belgium – sequence: 2 givenname: Jolien orcidid: 0000-0003-2036-6213 surname: Ponnet fullname: Ponnet, Jolien organization: Department of Mathematics, Faculty of Science, KU Leuven, Leuven, Belgium – sequence: 3 givenname: Bart orcidid: 0000-0002-5831-5668 surname: Baesens fullname: Baesens, Bart organization: School of Management, University of Southampton, Southampton, United Kingdom – sequence: 4 givenname: Tim orcidid: 0000-0003-1105-2028 surname: Verdonck fullname: Verdonck, Tim organization: Department of Mathematics, Faculty of Science, UAntwerp-imec, Antwerp, Belgium |
BackLink | https://www.ncbi.nlm.nih.gov/pubmed/37289184$$D View this record in MEDLINE/PubMed |
CitedBy_id | crossref_primary_10_3390_risks12050079 |
ContentType | Journal Article |
DBID | CGR CUY CVF ECM EIF NPM |
DOI | 10.1089/big.2022.0107 |
DatabaseName | Medline MEDLINE MEDLINE (Ovid) MEDLINE MEDLINE PubMed |
DatabaseTitle | MEDLINE Medline Complete MEDLINE with Full Text PubMed MEDLINE (Ovid) |
DatabaseTitleList | MEDLINE |
Database_xml | – sequence: 1 dbid: ECM name: MEDLINE url: https://search.ebscohost.com/login.aspx?direct=true&db=cmedm&site=ehost-live sourceTypes: Index Database |
DeliveryMethod | no_fulltext_linktorsrc |
Discipline | Computer Science |
EISSN | 2167-647X |
ExternalDocumentID | 37289184 |
Genre | Journal Article |
GroupedDBID | 0R~ 1-M ABBKN ABJNI ACGFS ADBBV ALMA_UNASSIGNED_HOLDINGS BNQNF CGR CUY CVF EBS ECM EIF EJD NPM O9- RML |
ID | FETCH-LOGICAL-c267t-b79e1b4badf74c97e94a8afcf8172331a519ef8f084e8de58f41c54ecde6b7de2 |
IngestDate | Sat Nov 02 12:24:15 EDT 2024 |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 3 |
Keywords | clustering; efficient algorithm; performance measure; C-index; AUC |
Language | English |
LinkModel | OpenURL |
MergedId | FETCHMERGED-LOGICAL-c267t-b79e1b4badf74c97e94a8afcf8172331a519ef8f084e8de58f41c54ecde6b7de2 |
ORCID | 0000-0003-1105-2028 0000-0003-2036-6213 0000-0002-5831-5668 |
PMID | 37289184 |
ParticipantIDs | pubmed_primary_37289184 |
PublicationCentury | 2000 |
PublicationDate | 2024-06-01 |
PublicationDateYYYYMMDD | 2024-06-01 |
PublicationDate_xml | – month: 06 year: 2024 text: 2024-06-01 day: 01 |
PublicationDecade | 2020 |
PublicationPlace | United States |
PublicationPlace_xml | – name: United States |
PublicationTitle | Big data |
PublicationTitleAlternate | Big Data |
PublicationYear | 2024 |
SSID | ssj0000779481 |
Score | 2.3321812 |
SourceID | pubmed |
SourceType | Index Database |
StartPage | 243 |
SubjectTerms | Area Under Curve; Big Data; Computer Simulation; Models, Statistical; Probability; ROC Curve |
Title | Computational Efficient Approximations of the Concordance Probability in a Big Data Setting |
URI | https://www.ncbi.nlm.nih.gov/pubmed/37289184 |
Volume | 12 |
link | http://sdu.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwtV1LSwMxEA6tgnjx_X6QgzdZ3d2mze7RR8WLIqil4KEkm4ksxVZqBX--k2z2VVH04GVZEnYJmS_DzGTmG0KOkg7TsVDCU5L5HmuHykM3gqOrAjEeEBFIy1Nwfc9v-9Fll3UbjTxGVo79q6RxDGVtKmf_IO3ipziA7yhzfKLU8fkruWdtGvIQX9cyRJj7_jNDHv6RvpSpb1Pbss7wWCpbN3A3wbNtc2VtLaA4Pk-fERVTgQrFZkfXLoBxztW1ZUHadCIBhi5ZGz_vlbC7M8k02Z3HGE3eYvxcmNKnN3ftUeTf9GCicFlDB6VqXCJkZf7UCVj9FRpK9Q7j_ZqyDSugalU1Z8bW9EWj-5EhRJXpM_ryYXjiuya5FUm-vlhRtjj6jkHWb-7n2RmC7XyqSZpoLhmL-uKmCNP5nBs-G8fOios5rS1lkSzkn8_4JdY-eVghS86xoGcZIlZJA0ZrZDlv2kGdDl8nTzWA0AIgtA4QOtYUAUIrAKEVgNB0RAVFEFADEOoAskEer7oPF9ee67DhJWGHTz3JYwgkk0JpzpKYQ8xEJHSiI7RrW61AoH0POtJ-xCBS0I40C5I2g0RBR3IF4SaZG41HsE0o05ILLlEltDWLZYJ-sKnwVMoHLXxf7pCtbHsGrxmNyiDfuN1vZ_bIYomsfTKv8YzCAWm-qfdDK6ZPtXRf4w |
link.rule.ids | 782 |
linkProvider | EBSCOhost |
openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Computational+Efficient+Approximations+of+the+Concordance+Probability+in+a+Big+Data+Setting&rft.jtitle=Big+data&rft.au=Oirbeek%2C+Robin+Van&rft.au=Ponnet%2C+Jolien&rft.au=Baesens%2C+Bart&rft.au=Verdonck%2C+Tim&rft.date=2024-06-01&rft.eissn=2167-647X&rft.volume=12&rft.issue=3&rft.spage=243&rft_id=info:doi/10.1089%2Fbig.2022.0107&rft_id=info%3Apmid%2F37289184&rft_id=info%3Apmid%2F37289184&rft.externalDocID=37289184 |
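
The abstract describes the concordance probability as the chance that, for a pair of observations with different responses, the model ranks the one with the larger response higher, and notes that computing it over all pairs is costly for large data sets. The Python sketch below illustrates that quantity only. It is not the authors' method: `concordance_exact` is the naive all-pairs computation, and `concordance_sampled` is a generic Monte Carlo estimate from randomly drawn pairs, included purely to show why an approximation is attractive. The function names, the `n_pairs` and `seed` parameters, and the convention that tied predictions count as half concordant are all assumptions made for this illustration.

```python
import numpy as np


def concordance_exact(y, pred):
    """Naive concordance probability over all comparable pairs.

    Uses O(n^2) time and memory, which is what makes it impractical
    for very large data sets.
    """
    y = np.asarray(y, dtype=float)
    pred = np.asarray(pred, dtype=float)
    # comparable[i, j] is True when observation i has the larger response.
    comparable = y[:, None] > y[None, :]
    n_comparable = comparable.sum()
    if n_comparable == 0:
        raise ValueError("no comparable pairs: all responses are equal")
    diff = pred[:, None] - pred[None, :]
    # Assumed convention: a tie in predictions counts as half concordant.
    score = (diff > 0) + 0.5 * (diff == 0)
    return float(score[comparable].sum() / n_comparable)


def concordance_sampled(y, pred, n_pairs=100_000, seed=0):
    """Generic Monte Carlo estimate from random pairs (illustrative only,
    not one of the estimators proposed in the article)."""
    y = np.asarray(y, dtype=float)
    pred = np.asarray(pred, dtype=float)
    rng = np.random.default_rng(seed)
    i = rng.integers(0, len(y), size=n_pairs)
    j = rng.integers(0, len(y), size=n_pairs)
    # Keep only pairs whose responses differ (the comparable pairs).
    keep = y[i] != y[j]
    if not keep.any():
        raise ValueError("no comparable pairs were sampled")
    i, j = i[keep], j[keep]
    # Orient each pair so that `hi` indexes the larger response.
    swap = y[i] < y[j]
    hi = np.where(swap, j, i)
    lo = np.where(swap, i, j)
    diff = pred[hi] - pred[lo]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))
```

For a binary response the exact function coincides with the AUC: with responses `[0, 0, 1, 1]` and predictions `[0.1, 0.4, 0.35, 0.8]`, three of the four comparable pairs are ranked correctly, so it returns 0.75. The same function accepts a continuous response unchanged, in which case nearly every pair is comparable and the quadratic cost grows accordingly.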