Practical Ensemble Classification Error Bounds for Different Operating Points

Bibliographic Details
Published in: IEEE Transactions on Knowledge and Data Engineering, Vol. 25, no. 11, pp. 2590-2601
Main Authors: Varshney, Kush R.; Prenger, Ryan J.; Marlatt, Tracy L.; Chen, Barry Y.; Hanley, William G.
Format: Journal Article
Language: English
Published: IEEE, 2013-11-01
Abstract: Classification algorithms used to support the decisions of human analysts are often used in settings in which zero-one loss is not the appropriate indication of performance. The zero-one loss corresponds to the operating point with equal costs for false alarms and missed detections, and no option for the classifier to leave uncertain test samples unlabeled. A generalization bound for ensemble classification at the standard operating point has been developed based on two interpretable properties of the ensemble: strength and correlation, using the Chebyshev inequality. Such generalization bounds for other operating points have not been developed previously and are developed in this paper. Significantly, the bounds are empirically shown to have much practical utility in determining optimal parameters for classification with a reject option, classification for ultralow probability of false alarm, and classification for ultralow probability of missed detection. Counter to the usual guideline of large strength and small correlation in the ensemble, different guidelines are recommended by the derived bounds in the ultralow false alarm and missed detection probability regimes.
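Illustrative note (not part of the catalog record): the abstract refers to an existing strength-and-correlation bound at the standard operating point but does not state it. The sketch below assumes the form usually given for that bound in the random forests literature, error <= rho * (1 - s^2) / s^2, together with the plain Chebyshev step var(margin) / mean(margin)^2 that it is derived from. The function names and the sample margins are hypothetical, and the paper's new bounds for the reject-option and ultralow false alarm or missed detection regimes are not reproduced here.

```python
def strength_correlation_bound(strength, correlation):
    """Assumed standard-operating-point bound: rho * (1 - s^2) / s^2.

    strength (s) is the expected ensemble margin, correlation (rho) is the
    mean correlation between base classifiers' raw margins.
    """
    if strength <= 0:
        raise ValueError("the bound is only meaningful for positive strength")
    # Clip at 1.0 because an error probability can never exceed 1.
    return min(1.0, correlation * (1.0 - strength ** 2) / strength ** 2)


def chebyshev_margin_bound(margins):
    """Chebyshev step on its own: P(margin < 0) <= var(margin) / mean(margin)^2.

    margins holds one ensemble margin per sample (fraction of votes for the
    true class minus the largest fraction for any other class).
    """
    n = len(margins)
    mean = sum(margins) / n
    if mean <= 0:
        raise ValueError("the bound is only meaningful for a positive mean margin")
    var = sum((m - mean) ** 2 for m in margins) / n
    return min(1.0, var / mean ** 2)


if __name__ == "__main__":
    # Hypothetical values chosen only to exercise the two functions.
    print(strength_correlation_bound(0.5, 0.2))
    print(chebyshev_margin_bound([0.8, 0.6, 1.0, 0.6]))
```

Under this assumed form, raising strength or lowering correlation tightens the bound, which is the "usual guideline" that the abstract says no longer holds in the ultralow false alarm and missed detection regimes.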
Authors and affiliations:
1. Varshney, Kush R. (krvarshn@us.ibm.com): Bus. Analytics & Math. Sci. Dept., IBM Thomas J. Watson Res. Center, Yorktown Heights, NY, USA
2. Prenger, Ryan J. (prenger1@llnl.gov): Nat. Security Eng. Div., Lawrence Livermore Nat. Lab., Livermore, CA, USA
3. Marlatt, Tracy L. (marlatt1@llnl.gov): Nat. Security Eng. Div., Lawrence Livermore Nat. Lab., Livermore, CA, USA
4. Chen, Barry Y. (chen52@llnl.gov): Nat. Security Eng. Div., Lawrence Livermore Nat. Lab., Livermore, CA, USA
5. Hanley, William G. (wghanley3@hotmail.com): Exponent, Menlo Park, CA, USA
DOI: 10.1109/TKDE.2012.219
Discipline: Engineering; Computer Science
EISSN: 1558-2191
IngestDate Fri Aug 23 01:04:22 EDT 2024
Wed Jun 26 19:28:22 EDT 2024
IsPeerReviewed true
IsScholarly true
Issue 11
Language English
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-c257t-feb4d70631f619e176d2ba52313f759413b8be73922a7fc650bf8e2f4fa4b6f83
PageCount 12
ParticipantIDs crossref_primary_10_1109_TKDE_2012_219
ieee_primary_6378369
PublicationCentury 2000
PublicationDate 2013-11-01
PublicationDateYYYYMMDD 2013-11-01
PublicationDate_xml – month: 11
  year: 2013
  text: 2013-11-01
  day: 01
PublicationDecade 2010
PublicationTitle IEEE transactions on knowledge and data engineering
PublicationTitleAbbrev TKDE
PublicationYear 2013
Publisher IEEE
Publisher_xml – name: IEEE
Subjects: Cantelli inequality; Chebyshev approximation; Correlation; Guidelines; Humans; random forests; receiver operating characteristic; Receivers; reject option; Terrorism
URI: https://ieeexplore.ieee.org/document/6378369