Data Quality
Data quality can be defined in several ways, and there is currently no commonly agreed definition. The following table gives an overview of popular data quality definitions.
Authors | Data Quality Definition |
---|---|
Wang and Strong (1996) | “[…] data that are fit for use by data consumers.”[1] |
Kahn, Strong, and Wang (2002) | “conformance to specifications” and “meeting or exceeding consumer expectations”[2] |
Redman (2001) | “Data are of high quality if they are fit for their intended uses in operations, decision making, and planning. Data are fit for use if they are free of defects and possess desired features.”[3] |
Olson (2003) | “[…] data has quality if it satisfies the requirements of its intended use.”[4] |
ISO 8000 | Quality is the "degree to which a set of inherent characteristics fulfils requirements"[5] |
All of these definitions share one idea: data quality compares the actual state of data with a desired state. The desired state goes by several names, such as "fitness for use", "specification", "consumer expectations", "defect-free", "desired features", or simply "requirements". Based on this analysis, we can derive the following definition of data quality [6]:
Data quality is the degree to which data fulfills requirements.
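One way to make this definition operational, offered as a minimal sketch and not as a method prescribed by the cited sources, is to express each requirement as a predicate over a record and to report the share of records that satisfy it. The record fields, the two example requirements, and the function name `fulfillment_degree` are hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Requirement = Callable[[Record], bool]


def fulfillment_degree(records: List[Record], requirement: Requirement) -> float:
    """Return the fraction of records that satisfy the requirement (0.0 to 1.0)."""
    if not records:
        # Assumption: an empty data set violates nothing, so it counts as fully compliant.
        return 1.0
    satisfied = sum(1 for record in records if requirement(record))
    return satisfied / len(records)


# Hypothetical sample data.
customers = [
    {"name": "Alice", "email": "alice@example.org", "age": 34},
    {"name": "Bob", "email": None, "age": 41},
    {"name": "Carol", "email": "carol@example.org", "age": -5},
    {"name": "Dave", "email": "dave@example.org", "age": 29},
]

# Hypothetical requirements, as two different stakeholders might state them.
requirements = {
    "email is present": lambda r: bool(r.get("email")),
    "age is plausible": lambda r: isinstance(r.get("age"), int) and 0 <= r["age"] <= 120,
}

# One score per requirement: the degree to which the data fulfills it.
scores = {name: fulfillment_degree(customers, rule) for name, rule in requirements.items()}

for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

Each requirement receives its own score here, so the same data set can rate well against one requirement and poorly against another.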
Data requirements may be stated not only by data consumers, but also by data providers, administrators, legal authorities, and many other stakeholders. These perspectives on requirements can be manifold and even contradictory, because the stakeholders' tastes and needs differ. Data quality is also multi-dimensional and covers several different aspects. In an empirical study, Wang and Strong (1996) identified the following 15 dimensions as the most important in the eyes of data consumers [7]:
Category | Dimension | Definition |
---|---|---|
Intrinsic | Believability | “The extent to which data are accepted or regarded as true, real and credible.” |
Intrinsic | Accuracy | “The extent to which data are correct, reliable and certified free of error.” |
Intrinsic | Objectivity | “The extent to which data are unbiased (unprejudiced) and impartial.” |
Intrinsic | Reputation | “The extent to which data are trusted or highly regarded in terms of their source or content.” |
Contextual | Value-added | “The extent to which data are beneficial and provide advantages from their use.” |
Contextual | Relevancy | “The extent to which data are applicable and helpful for the task at hand.” |
Contextual | Timeliness | “The extent to which the age of the data is appropriate for the task at hand.” |
Contextual | Completeness | “The extent to which data are of sufficient depth, breadth, and scope for the task at hand.” |
Contextual | Appropriate amount of data | “The extent to which the quantity and volume of available data is appropriate.” |
Representational | Interpretability | “The extent to which data are in appropriate language and units and the data definitions are clear.” |
Representational | Ease of understanding | “The extent to which data are clear without ambiguity and easily comprehended.” |
Representational | Representational consistency | “The extent to which data are always presented in the same format and are compatible with previous data.” |
Representational | Concise representation | “The extent to which data are compactly represented without being overwhelming (i.e., brief in presentation, yet complete and to the point).” |
Accessibility | Accessibility | “The extent to which data are available or easily and quickly retrievable.” |
Accessibility | Access security | “The extent to which access to data can be restricted and hence kept secure.” |
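Some of these dimensions can be approximated with simple measurements. The following is a rough sketch under stated assumptions, not a metric defined by Wang and Strong: completeness is reduced to the share of required values that are present, and timeliness to the share of records updated within a task-specific maximum age. The field names, the 30-day threshold, and the functions `completeness` and `timeliness` are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional

Record = Dict[str, Optional[object]]


def completeness(records: List[Record], fields: List[str]) -> float:
    """Share of required field values that are not missing (None)."""
    total = len(records) * len(fields)
    if total == 0:
        return 1.0
    filled = sum(1 for r in records for f in fields if r.get(f) is not None)
    return filled / total


def timeliness(records: List[Record], now: datetime, max_age: timedelta) -> float:
    """Share of records whose 'updated' timestamp is at most max_age before now."""
    if not records:
        return 1.0
    fresh = sum(
        1
        for r in records
        if isinstance(r.get("updated"), datetime) and now - r["updated"] <= max_age
    )
    return fresh / len(records)


# Hypothetical sample data with a fixed reference date so the result is reproducible.
now = datetime(2011, 10, 9)
products = [
    {"sku": "A1", "price": 9.99, "updated": datetime(2011, 10, 1)},
    {"sku": "B2", "price": None, "updated": datetime(2011, 6, 15)},
    {"sku": "C3", "price": 4.50, "updated": datetime(2011, 9, 20)},
]

completeness_score = completeness(products, ["sku", "price"])
timeliness_score = timeliness(products, now, timedelta(days=30))

print(f"completeness: {completeness_score:.2f}")  # 5 of 6 required values present
print(f"timeliness:   {timeliness_score:.2f}")    # 2 of 3 records updated within 30 days
```

Both definitions refer to "the task at hand", so the required fields and the maximum age are parameters that each task would set for itself.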
See Also
Quality Perception in Information Systems
Data Requirements
Quality Criteria for Linked Data Sources (Findings by Annika Flemming & Olaf Hartig)
References
- [1] Wang, R. Y., & Strong, D. M. (1996). Beyond accuracy: what data quality means to data consumers. Journal of Management Information Systems, 12(4), 5-33.
- [2] Kahn, B. K., Strong, D. M., & Wang, R. Y. (2002). Information quality benchmarks: product and service performance. Communications of the ACM, 45(4), 184-192.
- [3] Redman, T. C. (2001). Data quality: the field guide. Boston: Digital Press.
- [4] Olson, J. (2003). Data quality: the accuracy dimension. San Francisco, USA: Morgan Kaufmann; Elsevier Science.
- [5] ISO 8000-102:2009, Data quality, Part 102: Master data: Exchange of characteristic data: Vocabulary.
- [6] Fürber, C., & Hepp, M. (2011). SWIQA – A Semantic Web Information Quality Assessment Framework. In Proceedings of the 19th European Conference on Information Systems (ECIS 2011), June 9-11, 2011, Helsinki, Finland.
- [7] Kahn, B. K., Strong, D. M., & Wang, R. Y. (2002). Information quality benchmarks: product and service performance. Communications of the ACM, 45(4), 184-192.
(This article was created by Christian Fürber on October 9, 2011)