A genetic programming framework to schedule webpage updates

Bibliographic Details
Published in: Information Retrieval (Boston), Vol. 18, no. 1, pp. 73-94
Main Authors: Santos, Aécio S. R., de Carvalho, Cristiano R., Almeida, Jussara M., de Moura, Edleno S., da Silva, Altigran S., Ziviani, Nivio
Format: Journal Article
Language: English
Published: Dordrecht: Springer Netherlands; Springer Nature B.V., 2015-02-01
Abstract: The quality of a Web search engine is influenced by several factors, including the coverage and freshness of the content gathered by the web crawler. Focusing on freshness, one key challenge is to estimate the likelihood that a previously crawled webpage has been modified. Such estimates are used to define the order in which those pages should be visited, and can thus be exploited to reduce the cost of monitoring crawled webpages to keep their stored versions up to date. We present a Genetic Programming framework, called GP4C (Genetic Programming for Crawling), that generates score functions producing accurate rankings of pages according to their probabilities of having been modified. We compare GP4C with state-of-the-art methods using a large dataset of webpages crawled from the Brazilian Web. Our evaluation includes multiple performance metrics and several variations of our framework, built by exploring different sets of terminals and fitness functions. In particular, we evaluate GP4C using the ChangeRate and Normalized Discounted Cumulative Gain (NDCG) metrics as both objective function and evaluation metric. We show that, compared with ChangeRate, NDCG evaluates the effectiveness of scheduling strategies better, since it takes the ranking produced by the scheduling into account.
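Illustrative note: the abstract's point about ChangeRate versus NDCG can be sketched with a small Python example. The definitions below are assumptions made for illustration, not the formulas from the article: ChangeRate is taken as the fraction of the first k scheduled pages that had actually changed, and NDCG uses binary gains (1 = changed, 0 = unchanged) with the common log2 position discount. The function names and the toy data are hypothetical.

```python
import math
from typing import Sequence


def change_rate(changed: Sequence[int], k: int) -> float:
    """Fraction of the first k scheduled pages that had changed (assumed definition)."""
    top = changed[:k]
    return sum(top) / len(top) if top else 0.0


def dcg(gains: Sequence[int], k: int) -> float:
    """Discounted cumulative gain over the first k positions (log2 discount)."""
    return sum(g / math.log2(pos + 2) for pos, g in enumerate(gains[:k]))


def ndcg(gains: Sequence[int], k: int) -> float:
    """DCG divided by the DCG of the best possible ordering of the same pages."""
    ideal = dcg(sorted(gains, reverse=True), k)
    return dcg(gains, k) / ideal if ideal > 0 else 0.0


# Two hypothetical schedules over the same four pages, listed in visit order.
# Each entry is 1 if the page had changed since the last crawl, else 0.
schedule_a = [1, 1, 0, 0]  # changed pages visited first
schedule_b = [0, 1, 1, 0]  # an unchanged page visited first

k = 3  # crawl budget: only the first k pages are downloaded
print(change_rate(schedule_a, k), change_rate(schedule_b, k))
print(ndcg(schedule_a, k), ndcg(schedule_b, k))
```

Under these assumptions both schedules download two changed pages out of three, so their ChangeRate is identical, while NDCG is lower for the schedule that visits an unchanged page first. This is one way to read the claim that NDCG takes the ranking into account.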
Author affiliations:
1. Santos, Aécio S. R.: Department of Computer Science, Federal University of Minas Gerais, Zunnit Technologies
2. de Carvalho, Cristiano R.: Department of Computer Science, Federal University of Minas Gerais
3. Almeida, Jussara M. (jussara@dcc.ufmg.br): Department of Computer Science, Federal University of Minas Gerais
4. de Moura, Edleno S.: Institute of Computing, Federal University of Amazonas
5. da Silva, Altigran S.: Institute of Computing, Federal University of Amazonas
6. Ziviani, Nivio: Department of Computer Science, Federal University of Minas Gerais, Zunnit Technologies
Copyright: Springer Science+Business Media New York 2014
DOI: 10.1007/s10791-014-9248-5
Discipline: Engineering; Library & Information Science; Computer Science
EISSN: 1573-7659
ISSN: 1386-4564
Open access: yes
Peer reviewed: yes
Keywords: Scheduling functions; Genetic Programming; Web crawling
Subject terms:
Computer Science
Data Mining and Knowledge Discovery
Data Structures and Information Theory
Freshness
Genetic algorithms
Image quality
Information retrieval
Information Storage and Retrieval
Internet
Machine learning
Mathematical programming
Natural Language Processing (NLP)
Pattern Recognition
Performance evaluation
Performance measurement
Programming
Scheduling
Search engines
Studies
URI: https://link.springer.com/article/10.1007/s10791-014-9248-5
https://www.proquest.com/docview/1856370195