Practical Ensemble Classification Error Bounds for Different Operating Points
Classification algorithms used to support the decisions of human analysts are often used in settings in which zero-one loss is not the appropriate indication of performance. The zero-one loss corresponds to the operating point with equal costs for false alarms and missed detections, and no option fo...
Published in: | IEEE transactions on knowledge and data engineering Vol. 25; no. 11; pp. 2590 - 2601 |
---|---|
Main Authors: | Kush R. Varshney, Ryan J. Prenger, Tracy L. Marlatt, Barry Y. Chen, William G. Hanley |
Format: | Journal Article |
Language: | English |
Published: | IEEE, 2013-11-01 |
Abstract | Classification algorithms used to support the decisions of human analysts are often used in settings in which zero-one loss is not the appropriate indication of performance. The zero-one loss corresponds to the operating point with equal costs for false alarms and missed detections, and no option for the classifier to leave uncertain test samples unlabeled. A generalization bound for ensemble classification at the standard operating point has been developed based on two interpretable properties of the ensemble: strength and correlation, using the Chebyshev inequality. Such generalization bounds for other operating points have not been developed previously and are developed in this paper. Significantly, the bounds are empirically shown to have much practical utility in determining optimal parameters for classification with a reject option, classification for ultralow probability of false alarm, and classification for ultralow probability of missed detection. Counter to the usual guideline of large strength and small correlation in the ensemble, different guidelines are recommended by the derived bounds in the ultralow false alarm and missed detection probability regimes. |
---|---|
Author | Prenger, Ryan J.; Chen, Barry Y.; Varshney, Kush R.; Marlatt, Tracy L.; Hanley, William G. |
Author_xml | – sequence: 1 givenname: Kush R. surname: Varshney fullname: Varshney, Kush R. email: krvarshn@us.ibm.com organization: Bus. Analytics & Math. Sci. Dept., IBM Thomas J. Watson Res. Center, Yorktown Heights, NY, USA – sequence: 2 givenname: Ryan J. surname: Prenger fullname: Prenger, Ryan J. email: prenger1@llnl.gov organization: Nat. Security Eng. Div., Lawrence Livermore Nat. Lab., Livermore, CA, USA – sequence: 3 givenname: Tracy L. surname: Marlatt fullname: Marlatt, Tracy L. email: marlatt1@llnl.gov organization: Nat. Security Eng. Div., Lawrence Livermore Nat. Lab., Livermore, CA, USA – sequence: 4 givenname: Barry Y. surname: Chen fullname: Chen, Barry Y. email: chen52@llnl.gov organization: Nat. Security Eng. Div., Lawrence Livermore Nat. Lab., Livermore, CA, USA – sequence: 5 givenname: William G. surname: Hanley fullname: Hanley, William G. email: wghanley3@hotmail.com organization: Exponent, Menlo Park, CA, USA |
CODEN | ITKEEH |
CitedBy_id | crossref_primary_10_1089_big_2016_0051 crossref_primary_10_1142_S1469026813400038 crossref_primary_10_1049_iet_syb_2015_0064 crossref_primary_10_1016_j_ins_2017_09_064 crossref_primary_10_1109_ACCESS_2016_2628102 |
Cites_doi | 10.1214/09-EJS363 10.1002/cjs.5550340410 10.1109/TIT.1970.1054406 10.1109/TKDE.2011.58 10.1023/A:1022859003006 10.1109/TKDE.2011.28 10.1214/009053604000000058 10.1109/TKDE.2007.190724 10.1109/TKDE.2009.156 10.1007/s10994-007-5013-y 10.1109/TKDE.2004.29 10.1007/s10994-006-9449-2 10.1007/BF02530506 10.1214/aos/1024691079 10.1109/TKDE.2006.127 10.1006/jcss.1997.1504 10.1145/1835804.1835911 10.1007/BF00058655 10.1214/aos/1024691352 10.1109/SSP.2011.5967817 10.1109/TKDE.2011.207 10.1137/S1052623401399903 10.1023/A:1010933404324 |
ContentType | Journal Article |
DBID | 97E RIA RIE AAYXX CITATION |
DOI | 10.1109/TKDE.2012.219 |
DatabaseName | IEEE All-Society Periodicals Package (ASPP) 2005-present IEEE All-Society Periodicals Package (ASPP) 1998-Present IEEE Electronic Library Online CrossRef |
DatabaseTitle | CrossRef |
DatabaseTitleList | |
Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library Online url: http://ieeexplore.ieee.org/Xplore/DynWel.jsp sourceTypes: Publisher |
DeliveryMethod | fulltext_linktorsrc |
Discipline | Engineering Computer Science |
EISSN | 1558-2191 |
EndPage | 2601 |
ExternalDocumentID | 10_1109_TKDE_2012_219 6378369 |
Genre | orig-research |
IEDL.DBID | RIE |
ISSN | 1041-4347 |
IngestDate | Fri Aug 23 01:04:22 EDT 2024 Wed Jun 26 19:28:22 EDT 2024 |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 11 |
Language | English |
LinkModel | DirectLink |
PageCount | 12 |
ParticipantIDs | crossref_primary_10_1109_TKDE_2012_219 ieee_primary_6378369 |
PublicationCentury | 2000 |
PublicationDate | 2013-11-01 |
PublicationDateYYYYMMDD | 2013-11-01 |
PublicationDate_xml | – month: 11 year: 2013 text: 2013-11-01 day: 01 |
PublicationDecade | 2010 |
PublicationTitle | IEEE transactions on knowledge and data engineering |
PublicationTitleAbbrev | TKDE |
PublicationYear | 2013 |
Publisher | IEEE |
Publisher_xml | – name: IEEE |
SSID | ssj0008781 |
Score | 2.2030098 |
SourceID | crossref ieee |
SourceType | Aggregation Database Publisher |
StartPage | 2590 |
SubjectTerms | Cantelli inequality; Chebyshev approximation; Correlation; Guidelines; Humans; random forests; receiver operating characteristic; Receivers; reject option; Terrorism |
Title | Practical Ensemble Classification Error Bounds for Different Operating Points |
URI | https://ieeexplore.ieee.org/document/6378369 |
Volume | 25 |
hasFullText | 1 |
inHoldings | 1 |
isFullTextHit | |
isPrint | |
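
The abstract refers to a bound at the standard operating point built from ensemble strength and correlation via the Chebyshev inequality. This appears to be Breiman's random forests bound, PE* <= rho * (1 - s^2) / s^2, where s is the strength and rho the mean correlation. The Python sketch below evaluates that bound. It also evaluates a one-sided Cantelli counterpart built from the same variance estimate, which is an assumption made for illustration only. The function names are hypothetical, and neither function reproduces the operating-point-specific bounds derived in the paper.

```python
def _check_inputs(strength: float, correlation: float) -> None:
    # Both bounds need a positive strength and a correlation in [0, 1].
    if not 0.0 < strength <= 1.0:
        raise ValueError("strength must lie in (0, 1]")
    if not 0.0 <= correlation <= 1.0:
        raise ValueError("correlation must lie in [0, 1]")


def chebyshev_error_bound(strength: float, correlation: float) -> float:
    """Breiman-style bound: PE* <= rho * (1 - s^2) / s^2, capped at 1."""
    _check_inputs(strength, correlation)
    # rho * (1 - s^2) upper-bounds the variance of the margin function.
    variance_upper = correlation * (1.0 - strength ** 2)
    return min(1.0, variance_upper / strength ** 2)


def cantelli_error_bound(strength: float, correlation: float) -> float:
    """Illustrative one-sided variant: var / (var + s^2).

    Assumption: the same variance upper bound is plugged into the
    Cantelli inequality; this is not taken from the paper itself.
    """
    _check_inputs(strength, correlation)
    variance_upper = correlation * (1.0 - strength ** 2)
    return variance_upper / (variance_upper + strength ** 2)


if __name__ == "__main__":
    # Hypothetical ensemble: strength 0.5, mean correlation 0.2.
    print(chebyshev_error_bound(0.5, 0.2))
    print(cantelli_error_bound(0.5, 0.2))
```

Both functions fall as strength grows and correlation shrinks, which is the usual guideline the abstract mentions. The abstract states that the paper's bounds recommend different guidelines in the ultralow false alarm and missed detection regimes, so this sketch should not be read as covering those cases.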