Big Data: from collection to visualization

Bibliographic Details
Published in: Machine Learning, Vol. 106, no. 6, pp. 837-862
Main Authors: Ghesmoune, Mohammed, Azzag, Hanene, Benbernou, Salima, Lebbah, Mustapha, Duong, Tarn, Ouziri, Mourad
Format: Journal Article
Language: English
Published: New York: Springer US; Springer Nature B.V., 2017-06-01
Abstract: Organisations increasingly rely on Big Data to discover correlations and patterns that would previously have remained hidden, and to use this new information to improve the quality of their business activities. In this paper we present a 'story' of Big Data from initial data collection to final visualization, passing through data fusion, analysis, and clustering. To this end, we present a complete workflow: (a) we show how to represent the heterogeneous collected data using the high-performance RDF language, how to fuse the Big Data in RDF by resolving the issue of entity disambiguation, and how to query those data to obtain more relevant and complete knowledge; and (b) because the data arrive as data streams, we propose batchStream, a micro-batching version of the growing neural gas approach that clusters data streams with a single pass over the data. The batchStream algorithm discovers clusters of arbitrary shape without any assumption on the number of clusters. This Big Data workflow is implemented on the Spark platform and demonstrated on synthetic and real data.
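
Illustrative sketch (not part of the catalogued record, and not the authors' batchStream implementation): the abstract describes clustering a stream in micro-batches with a growing-neural-gas-style graph, in a single pass and without fixing the number of clusters. The Python below is a minimal sketch of that general idea under stated assumptions. The function names, the learning rate `lr`, and the distance threshold `new_node_dist` used to decide when to add a node are all hypothetical choices made for illustration; the paper's actual update rules, edge ageing, and Spark implementation are not reproduced here.

```python
import math


def nearest_two(nodes, x):
    """Return the indices of the closest and second-closest nodes to x."""
    order = sorted(range(len(nodes)), key=lambda i: math.dist(nodes[i], x))
    return order[0], order[1]


def process_batch(nodes, edges, batch, lr=0.5, new_node_dist=2.0):
    """Fold one micro-batch into the graph, reading each point exactly once.

    nodes: list of prototype vectors (at least two), modified in place.
    edges: set of (i, j) index pairs with i < j, modified in place.
    lr and new_node_dist are illustrative parameters, not values from the paper.
    """
    sums, counts = {}, {}
    for x in batch:
        best, second = nearest_two(nodes, x)
        if math.dist(nodes[best], x) > new_node_dist:
            # Point is far from every prototype: grow the graph with a new node.
            nodes.append(list(x))
            edges.add((best, len(nodes) - 1))
            continue
        # Topology step: link the two prototypes closest to the point.
        edges.add((min(best, second), max(best, second)))
        acc = sums.setdefault(best, [0.0] * len(x))
        for d, value in enumerate(x):
            acc[d] += value
        counts[best] = counts.get(best, 0) + 1
    # Adaptation step: move each winning prototype toward the mean of its points.
    for i, acc in sums.items():
        mean = [v / counts[i] for v in acc]
        nodes[i] = [w + lr * (m - w) for w, m in zip(nodes[i], mean)]


# Usage: two seed prototypes, then one micro-batch from the stream.
nodes = [[0.0, 0.0], [1.0, 0.0]]
edges = set()
process_batch(nodes, edges, [(0.2, 0.0), (10.0, 10.0), (10.2, 10.0)])
```

Because prototypes are only moved once per batch, the per-point work within a batch is independent of the other points, which is the property that would make such a step a candidate for distribution across workers; how the paper actually partitions the work on Spark is not stated in this record.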
Author_xml – sequence: 1
  givenname: Mohammed
  surname: Ghesmoune
  fullname: Ghesmoune, Mohammed
  email: mohammed.ghesmoune@lipn.univ-paris13.fr
  organization: LIPN-UMR 7030 - CNRS, University of Paris 13, Sorbonne Paris City
– sequence: 2
  givenname: Hanene
  surname: Azzag
  fullname: Azzag, Hanene
  organization: LIPN-UMR 7030 - CNRS, University of Paris 13, Sorbonne Paris City
– sequence: 3
  givenname: Salima
  surname: Benbernou
  fullname: Benbernou, Salima
  organization: LIPADE, University of Paris Descartes, Sorbonne Paris City
– sequence: 4
  givenname: Mustapha
  surname: Lebbah
  fullname: Lebbah, Mustapha
  organization: LIPN-UMR 7030 - CNRS, University of Paris 13, Sorbonne Paris City
– sequence: 5
  givenname: Tarn
  surname: Duong
  fullname: Duong, Tarn
  organization: LIPN-UMR 7030 - CNRS, University of Paris 13, Sorbonne Paris City
– sequence: 6
  givenname: Mourad
  surname: Ouziri
  fullname: Ouziri, Mourad
  organization: LIPADE, University of Paris Descartes, Sorbonne Paris City
ContentType Journal Article
Copyright The Author(s) 2017
Machine Learning is a copyright of Springer, 2017.
DOI 10.1007/s10994-016-5622-4
Discipline Computer Science
EISSN 1573-0565
EndPage 862
ISSN 0885-6125
IsDoiOpenAccess false
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 6
Keywords Topological structure
GNG
Visualization
RDF
Semantic
Data fusion
Big data
Map-Reduce
Spark
Data stream clustering
Entity resolution
Micro-Batch streaming
OpenAccessLink https://link.springer.com/content/pdf/10.1007/s10994-016-5622-4.pdf
PublicationCentury 2000
PublicationDate 2017-06-01
PublicationDecade 2010
PublicationPlace New York
PublicationPlace_xml – name: New York
– name: Dordrecht
PublicationTitle Machine learning
PublicationTitleAbbrev Mach Learn
PublicationYear 2017
Publisher Springer US
Springer Nature B.V
StartPage 837
SubjectTerms Artificial Intelligence
Big Data
Clustering
Clusters
Computer Science
Control
Data acquisition
Data collection
Data integration
Data management
Data transmission
Mechatronics
Multisensor fusion
Natural Language Processing (NLP)
Robotics
Simulation and Modeling
Visualization
Title Big Data: from collection to visualization
URI https://link.springer.com/article/10.1007/s10994-016-5622-4
https://www.proquest.com/docview/1899614233
Volume 106