Validity as a Measure of Data Quality in Internet of Things Systems
Published in: | Wireless personal communications Vol. 126; no. 1; pp. 933 - 948 |
---|---|
Main Authors: | , |
Format: | Journal Article |
Language: | English |
Published: | New York, Springer US, 2022, Springer Nature B.V |
Subjects: | |
Summary: | Data quality became significant with the emergence of data warehouse systems. While accuracy is an intrinsic aspect of data quality, validity presents a wider perspective that is more representational and contextual in nature. In this article we present a different perspective on data collection and collation. We focus on faults experienced in data sets and present validity as a function of allied parameters such as completeness, usability, availability and timeliness for determining data quality. We also analyze the applicability of these metrics and modify them to conform to IoT applications. Another major focus of this article is to verify these metrics on aggregated data sets rather than on separate data values. This work focuses on using the different validation parameters to determine the quality of data generated in a pervasive environment. The analysis approach presented is simple and can be employed to test the validity of collected data, isolate faults in the data set and measure the suitability of data before applying algorithms for analysis. Analyzing the data quality of two data sets on the basis of the above-mentioned parameters, we show that validity was 75% for data set 1 but only 67% for data set 2. The availability and data freshness metrics were analyzed graphically: data freshness was better for data set 1, while availability was better for data set 2. Usability for data set 2 was 86%, higher than the 69% obtained for data set 1. Thus, this work presents methods for estimating data quality that can benefit various IoT-based industries, which are essentially data centric and whose decisions depend on the validity of data. |
---|---|
ISSN: | 0929-6212 1572-834X |
DOI: | 10.1007/s11277-022-09777-w |
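
The summary names completeness and timeliness among the parameters evaluated on an aggregated data set, but this record does not give the article's formulas. The sketch below is only an illustration under assumed definitions: completeness as the share of readings that carry a value, and timeliness as the share of readings no older than a chosen age limit. The `Reading` type, the function names, the `max_age` parameter and the sample values are all hypothetical and are not taken from the article.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Reading:
    """One sensor sample; value is None when the sample is missing or faulty."""
    timestamp: float              # seconds since an arbitrary epoch
    value: Optional[float]


def completeness(readings: Sequence[Reading]) -> float:
    """Share of readings that carry a value (assumed definition, not the article's)."""
    if not readings:
        return 0.0
    present = sum(1 for r in readings if r.value is not None)
    return present / len(readings)


def timeliness(readings: Sequence[Reading], now: float, max_age: float) -> float:
    """Share of readings at most max_age seconds old at time `now` (assumed definition)."""
    if not readings:
        return 0.0
    fresh = sum(1 for r in readings if now - r.timestamp <= max_age)
    return fresh / len(readings)


# Hypothetical aggregated data set: four samples, one of them missing.
readings = [
    Reading(0.0, 21.5),
    Reading(10.0, None),
    Reading(20.0, 21.7),
    Reading(30.0, 21.6),
]

print(completeness(readings))              # 3 of the 4 readings carry a value
print(timeliness(readings, 35.0, 20.0))    # 2 of the 4 readings are at most 20 s old
```

Both functions score the data set as a whole rather than individual values, which mirrors the aggregated view the summary describes. How such scores would be combined into a single validity figure is left open here, since the record does not state it.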