Machine learning approaches for anomaly detection of water quality on a real-world data set

Bibliographic Details
Published in:	Journal of Information and Telecommunication (Print), Vol. 3, no. 3, pp. 294-307
Main Authors:	Muharemi, Fitore; Logofătu, Doina; Leon, Florin
Format:	Journal Article
Language:	English
Published:	Abingdon: Taylor & Francis (Taylor & Francis Ltd; Taylor & Francis Group), 03-07-2019
Subjects:	Algorithms; Anomalies; Artificial neural networks; Classification; Computer simulation; Discriminant analysis; Drinking water; event; F1 score; imputation; Learning theory; Machine learning; Neural networks; Performance evaluation; Recurrent neural networks; Regression analysis; Sensors; Support vector machines; Time series; Water quality; Water supply

Description
Summary:	Accurate detection of water quality changes is a crucial task for water companies, which must provide safe drinking water. Sensors are now deployed in many areas to monitor data over time. The data recorded by the sensors normally carry meaning, for instance indicating that an event has occurred. Sometimes, however, the data are poorly understood, and determining whether an event has occurred is difficult. This work describes several approaches to identifying changes or anomalies in water quality time series data. It also discusses some challenges that arise when dealing with time series data and proposes a solution to them. The following models are applied to water quality data: logistic regression, linear discriminant analysis, support vector machines (SVM), artificial neural network (ANN), deep neural network (DNN), recurrent neural network (RNN) and long short-term memory (LSTM). A simulation study is conducted to check the performance of each algorithm, with the F-score as the evaluation metric. Solving the problem of imbalanced data essentially means intentionally biasing the data to obtain interesting results instead of accurate results. The results show that all algorithms are vulnerable, although SVM, ANN and logistic regression tend to be a little less vulnerable, while DNN, RNN and LSTM are very vulnerable.
ISSN:	2475-1839; 2475-1847
DOI:	10.1080/24751839.2019.1565653
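
The summary names the F-score as the evaluation metric and points to imbalanced data as a difficulty. The following is a minimal sketch, not the authors' code, of why the F-score is more informative than accuracy when events are rare. The labels are a hypothetical toy example (1 = event, 0 = normal), and the helper names `f1_score` and `accuracy` are assumptions introduced here for illustration.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined),
        # so the score is reported as 0.0 by convention.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Hypothetical toy data: 100 time steps, only the last 2 are events.
y_true = [0] * 98 + [1, 1]

# Detector A never raises an alarm.
always_normal = [0] * 100

# Detector B raises one false alarm, catches one event, misses the other.
one_hit = [0] * 97 + [1] + [1, 0]

acc_a, f1_a = accuracy(y_true, always_normal), f1_score(y_true, always_normal)
acc_b, f1_b = accuracy(y_true, one_hit), f1_score(y_true, one_hit)

print(f"always_normal: accuracy={acc_a:.2f}, F1={f1_a:.2f}")
print(f"one_hit:       accuracy={acc_b:.2f}, F1={f1_b:.2f}")
```

In this toy case both detectors reach the same accuracy, yet only the F-score separates the detector that never finds an event from the one that finds half of them.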