Machine learning approaches for anomaly detection of water quality on a real-world data set
Published in: | Journal of information and telecommunication (Print) Vol. 3; no. 3; pp. 294 - 307 |
---|---|
Main Authors: | , , |
Format: | Journal Article |
Language: | English |
Published: | Abingdon: Taylor & Francis, 03-07-2019. Taylor & Francis Ltd; Taylor & Francis Group |
Subjects: | |
Summary: | Accurate detection of changes in water quality is a crucial task for water supply companies, which must provide safe drinking water. Sensitive sensors are now deployed in many areas to monitor data over time. The data registered by the sensors normally carry meaning, for example indicating that an event has occurred. Sometimes, however, the data are poorly understood, and determining whether an event has occurred is difficult. This work describes several approaches to identifying changes or anomalies in water quality time series data. It also discusses and proposes a solution to some of the challenges of working with time series data. The following models are applied to water quality data: logistic regression, linear discriminant analysis, support vector machines (SVM), artificial neural network (ANN), deep neural network (DNN), recurrent neural network (RNN) and long short-term memory (LSTM). A simulation study is conducted to check the performance of each algorithm, with the F-score as the evaluation metric. Addressing imbalanced data essentially means intentionally biasing the data to obtain interesting results instead of accurate results. The results show that all algorithms are vulnerable, although SVM, ANN and logistic regression tend to be a little less vulnerable, while DNN, RNN and LSTM are very vulnerable. |
---|---|
ISSN: | 2475-1839 2475-1847 |
DOI: | 10.1080/24751839.2019.1565653 |
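
The Summary names the F-score as the evaluation metric and points to class imbalance as a difficulty. The following is a minimal sketch of why the two are connected, not the authors' code: it assumes the balanced F1 variant (`beta=1`), which the record does not specify, and the toy labels and the helper names `f_score` and `accuracy` are hypothetical.

```python
def f_score(y_true, y_pred, beta=1.0):
    """F-beta score for the positive (event) class; labels are 0 or 1.

    beta=1 gives the balanced F1 score. Which variant the article uses
    is not stated in this record, so beta=1 is an assumption.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # events found
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed events
    if tp == 0:
        return 0.0  # no event detected: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def accuracy(y_true, y_pred):
    """Fraction of samples labelled correctly."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Hypothetical imbalanced toy data: 95 normal samples, 5 events.
y_true = [0] * 95 + [1] * 5

# Baseline that never reports an event.
always_normal = [0] * 100

# Detector with 2 false alarms, 4 detected events and 1 missed event.
detector = [0] * 93 + [1] * 2 + [1] * 4 + [0]

for name, y_pred in [("always_normal", always_normal), ("detector", detector)]:
    print(name, round(accuracy(y_true, y_pred), 4), round(f_score(y_true, y_pred), 4))
```

On this toy data the baseline that never flags an event still reaches 95% accuracy but scores an F-score of 0, while the detector scores 8/11 (about 0.73). This illustrates why a metric focused on the rare event class is a natural choice when events are scarce.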