Big Data: from collection to visualization

Organisations are increasingly relying on Big Data to provide the opportunities to discover correlations and patterns in data that would have previously remained hidden, and to subsequently use this new information to increase the quality of their business activities. In this paper we present a ‘sto...

Bibliographic Details
Published in:	Machine learning Vol. 106; no. 6; pp. 837 - 862
Main Authors:	Ghesmoune, Mohammed, Azzag, Hanene, Benbernou, Salima, Lebbah, Mustapha, Duong, Tarn, Ouziri, Mourad
Format:	Journal Article
Language:	English
Published:	New York Springer US 01-06-2017 Springer Nature B.V
Subjects:	Artificial Intelligence Big Data Clustering Clusters Computer Science Control Data acquisition Data collection Data integration Data management Data transmission Mechatronics Multisensor fusion Natural Language Processing (NLP) Robotics Simulation and Modeling Visualization Topological structure GNG Visualization RDF Semantic Data fusion Big data Map-Reduce Spark Data stream clustering Entity resolution Micro-Batch streaming

Abstract	Organisations increasingly rely on Big Data to discover correlations and patterns that would previously have remained hidden, and to use this new information to improve the quality of their business activities. In this paper we present a 'story' of Big Data, from the initial data collection to the final visualization, passing through data fusion, analysis and clustering. To this end, we present a complete workflow covering (a) how to represent the heterogeneous collected data using the high-performance RDF language, how to fuse the Big Data in RDF by resolving the issue of entity disambiguation, and how to query those data to provide more relevant and complete knowledge; and (b) because the data are received as data streams, batchStream, a micro-batching version of the growing neural gas approach, which is capable of clustering data streams with a single pass over the data. The batchStream algorithm discovers clusters of arbitrary shape without any assumption about the number of clusters. This Big Data workflow is implemented on the Spark platform, and we demonstrate it on synthetic and real data.
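The clustering idea in part (b) of the abstract can be illustrated with a small, single-machine sketch. It is not the authors' batchStream implementation and does not use Spark: it is a simplified growing neural gas that reads each point once, in micro-batches, and lets the node graph grow after every batch. All names here (`MicroBatchGNG`, `micro_batches`, `demo_stream`) and all parameter values are assumptions made for illustration, and removal of isolated nodes is left out for brevity.

```python
import random


def dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def micro_batches(stream, size):
    """Group an iterable of points into lists of at most `size` points."""
    batch = []
    for point in stream:
        batch.append(point)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


class MicroBatchGNG:
    """Simplified growing neural gas; parameter names are illustrative."""

    def __init__(self, max_nodes=20, max_age=25, eps_winner=0.1,
                 eps_neighbor=0.01, error_decay=0.9):
        self.nodes = {}   # node id -> prototype vector
        self.error = {}   # node id -> accumulated squared error
        self.edges = {}   # frozenset({i, j}) -> edge age
        self.max_nodes, self.max_age = max_nodes, max_age
        self.eps_winner, self.eps_neighbor = eps_winner, eps_neighbor
        self.error_decay = error_decay
        self._next_id = 0

    def _add_node(self, vector):
        node_id = self._next_id
        self._next_id += 1
        self.nodes[node_id] = list(vector)
        self.error[node_id] = 0.0
        return node_id

    def _neighbors(self, node_id):
        return [next(iter(e - {node_id})) for e in self.edges if node_id in e]

    def _move(self, node_id, x, rate):
        self.nodes[node_id] = [v + rate * (xi - v)
                               for v, xi in zip(self.nodes[node_id], x)]

    def _observe(self, x):
        if len(self.nodes) < 2:           # seed the graph with the first points
            self._add_node(x)
            return
        first, second = sorted(
            self.nodes, key=lambda n: dist2(self.nodes[n], x))[:2]
        self.error[first] += dist2(self.nodes[first], x)
        for n in self._neighbors(first):  # neighbours follow more slowly
            self._move(n, x, self.eps_neighbor)
        self._move(first, x, self.eps_winner)
        for e in self.edges:              # age every edge touching the winner
            if first in e:
                self.edges[e] += 1
        self.edges[frozenset((first, second))] = 0
        self.edges = {e: a for e, a in self.edges.items() if a <= self.max_age}

    def _grow(self):
        """Insert one node between the worst node and its worst neighbour."""
        if len(self.nodes) >= self.max_nodes:
            return
        q = max(self.error, key=self.error.get)
        nbrs = self._neighbors(q)
        if not nbrs:
            return
        f = max(nbrs, key=self.error.get)
        new = self._add_node(
            [(a + b) / 2 for a, b in zip(self.nodes[q], self.nodes[f])])
        del self.edges[frozenset((q, f))]
        self.edges[frozenset((q, new))] = 0
        self.edges[frozenset((f, new))] = 0
        self.error[q] *= 0.5
        self.error[f] *= 0.5
        self.error[new] = self.error[q]

    def partial_fit(self, batch):
        """Consume one micro-batch: each point is seen exactly once."""
        for x in batch:
            self._observe(x)
        self._grow()
        for n in self.error:
            self.error[n] *= self.error_decay

    def clusters(self):
        """Connected components of the node graph, as sets of node ids."""
        seen, comps = set(), []
        for start in self.nodes:
            if start in seen:
                continue
            stack, comp = [start], set()
            while stack:
                n = stack.pop()
                if n not in comp:
                    comp.add(n)
                    stack.extend(self._neighbors(n))
            seen |= comp
            comps.append(comp)
        return comps


def demo_stream(n=400, seed=0):
    """Synthetic 2-D stream alternating between two Gaussian blobs."""
    rng = random.Random(seed)
    for i in range(n):
        centre = 0.0 if i % 2 == 0 else 10.0
        yield (rng.gauss(centre, 0.5), rng.gauss(centre, 0.5))
```

A typical use would be `model = MicroBatchGNG()` followed by `model.partial_fit(batch)` for each `batch` in `micro_batches(demo_stream(), 50)`. The connected components returned by `model.clusters()` then play the role of clusters, so the number of clusters is never fixed in advance. In a distributed setting, each micro-batch would presumably be processed in parallel and the partial results merged, which this sketch does not attempt.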
Author	Lebbah, Mustapha Ouziri, Mourad Azzag, Hanene Duong, Tarn Benbernou, Salima Ghesmoune, Mohammed
Author_xml	1. Ghesmoune, Mohammed (mohammed.ghesmoune@lipn.univ-paris13.fr), LIPN-UMR 7030 - CNRS, University of Paris 13, Sorbonne Paris City; 2. Azzag, Hanene, LIPN-UMR 7030 - CNRS, University of Paris 13, Sorbonne Paris City; 3. Benbernou, Salima, LIPADE, University of Paris Descartes, Sorbonne Paris City; 4. Lebbah, Mustapha, LIPN-UMR 7030 - CNRS, University of Paris 13, Sorbonne Paris City; 5. Duong, Tarn, LIPN-UMR 7030 - CNRS, University of Paris 13, Sorbonne Paris City; 6. Ouziri, Mourad, LIPADE, University of Paris Descartes, Sorbonne Paris City
CitedBy_id	crossref_primary_10_1080_13658816_2021_1885675
Cites_doi	10.1145/872757.872817 10.1137/1.9781611973082.3 10.1007/978-3-642-31537-4_21 10.1016/B978-012722442-8/50016-1 10.1137/1.9781611972764.29 10.1016/S0893-6080(02)00078-3 10.1007/s10618-011-0242-x 10.1007/s10115-010-0342-8 10.14778/2367502.2367550 10.1007/978-0-387-84858-7 10.21236/ADA575859 10.1016/S0168-1699(99)00046-0 10.1080/01621459.1971.10482356 10.1145/2588555.2610511 10.1109/ICPR.2008.4761768 10.1145/235968.233324 10.1007/978-3-319-26187-4_27 10.1109/CTS.2013.6567203 10.1007/978-3-319-18032-8_11 10.14778/2824032.2824083 10.2200/S00578ED1V01Y201404DTM040 10.1109/ICDE.2015.7113332 10.14778/2904121.2904123 10.1007/978-3-642-17746-0_20 10.1007/978-3-642-30284-8_32 10.1007/978-3-319-12637-1_26 10.1145/502512.502568
ContentType	Journal Article
Copyright	The Author(s) 2017 Machine Learning is a copyright of Springer, 2017.
Copyright_xml	– notice: The Author(s) 2017 – notice: Machine Learning is a copyright of Springer, 2017.
DOI	10.1007/s10994-016-5622-4
Discipline	Computer Science
EISSN	1573-0565
EndPage	862
ExternalDocumentID	10_1007_s10994_016_5622_4
ISSN	0885-6125
IngestDate	Tue Nov 19 05:07:33 EST 2024 Thu Nov 21 21:10:18 EST 2024 Sat Dec 16 11:59:38 EST 2023
IsDoiOpenAccess	false
IsOpenAccess	true
IsPeerReviewed	true
IsScholarly	true
Issue	6
Keywords	Topological structure GNG Visualization RDF Semantic Data fusion Big data Map-Reduce Spark Data stream clustering Entity resolution Micro-Batch streaming
Language	English
OpenAccessLink	https://link.springer.com/content/pdf/10.1007/s10994-016-5622-4.pdf
PQID	1899614233
PQPubID	54194
PageCount	26
ParticipantIDs	proquest_journals_1899614233 crossref_primary_10_1007_s10994_016_5622_4 springer_journals_10_1007_s10994_016_5622_4
PublicationCentury	2000
PublicationDate	2017-06-01
PublicationDateYYYYMMDD	2017-06-01
PublicationDate_xml	– month: 06 year: 2017 text: 2017-06-01 day: 01
PublicationDecade	2010
PublicationPlace	New York
PublicationPlace_xml	– name: New York – name: Dordrecht
PublicationTitle	Machine learning
PublicationTitleAbbrev	Mach Learn
PublicationYear	2017
Publisher	Springer US Springer Nature B.V
Publisher_xml	– name: Springer US – name: Springer Nature B.V
StartPage	837
SubjectTerms	Artificial Intelligence Big Data Clustering Clusters Computer Science Control Data acquisition Data collection Data integration Data management Data transmission Mechatronics Multisensor fusion Natural Language Processing (NLP) Robotics Simulation and Modeling Visualization
Title	Big Data: from collection to visualization
URI	https://link.springer.com/article/10.1007/s10994-016-5622-4 https://www.proquest.com/docview/1899614233
Volume	106