Data Quality

From SemWebQuality.org
Latest revision as of 16:56, 17 October 2011

Data quality can be defined in several ways, and there is currently no commonly agreed definition. The following table gives an overview of popular data quality definitions.

{| class="wikitable"
|+ Popular Data Quality Definitions
! Authors !! Data Quality Definition
|-
| Wang and Strong (1996) || “[…] data that are fit for use by data consumers.”[1]
|-
| Kahn, Strong, and Wang (2002) || “conformance to specifications” and “meeting or exceeding consumer expectations”[2]
|-
| Redman (2001) || “Data are of high quality if they are fit for their intended uses in operations, decision making, and planning. Data are fit for use if they are free of defects and possess desired features.”[3]
|-
| Olson (2003) || “[…] data has quality if it satisfies the requirements of its intended use.”[4]
|-
| ISO 8000 || Quality is the "degree to which a set of inherent characteristics fulfils requirements"[5]
|}

All of the above definitions have something in common: data quality involves comparing the status quo of data with its desired state. The desired state goes by several names, such as "fitness for use", "specification", "consumer expectations", "defect-free", "desired features", or simply "requirements". Based on this analysis, we can derive the following definition of data quality [6]:

Data quality is the degree to which data fulfills requirements. 
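
As a minimal sketch of this definition, assume that each requirement can be written as a yes/no check on a single record. The degree of fulfillment for a requirement is then the share of records that pass its check. The record fields, the requirement names, and the function `quality_degree` are hypothetical and only serve as an illustration; they are not taken from the cited framework.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Requirement = Callable[[Record], bool]


def quality_degree(
    records: List[Record], requirements: Dict[str, Requirement]
) -> Dict[str, float]:
    """Return, per requirement, the fraction of records that fulfill it."""
    if not records:
        # With no data there is no status quo to compare against.
        raise ValueError("cannot assess quality of an empty data set")
    return {
        name: sum(1 for record in records if check(record)) / len(records)
        for name, check in requirements.items()
    }


# Hypothetical example data (an assumption for illustration).
records = [
    {"name": "Alice", "age": 34, "country": "DE"},
    {"name": "", "age": 29, "country": "FI"},
    {"name": "Carol", "age": -5, "country": "US"},
    {"name": "Dave", "age": 41, "country": None},
]

# Hypothetical requirements describing the desired state of the data.
requirements = {
    "name is present": lambda r: bool(r.get("name")),
    "age is plausible": lambda r: isinstance(r.get("age"), int)
    and 0 <= r["age"] <= 120,
    "country is present": lambda r: r.get("country") is not None,
}

scores = quality_degree(records, requirements)
for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

Reporting one score per requirement, rather than a single aggregate, keeps it visible which part of the desired state the data misses.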

Data requirements may be stated not only by data consumers, but also by data providers, administrators, legal authorities, and many other stakeholders. Requirements therefore reflect multiple perspectives, tastes, and needs, and they can be manifold and even contradictory. Data quality is also multi-dimensional and covers several different aspects. In an empirical study published in 1996, Wang and Strong identified the following 15 dimensions as the most important ones in the eyes of data consumers [7]:

{| class="wikitable"
|+ The 15 most important data quality dimensions from the consumer perspective[8]
! Category !! Dimension !! Definition
|-
| rowspan="4" | Intrinsic || Believability || “The extent to which data are accepted or regarded as true, real and credible.”
|-
| Accuracy || “The extent to which data are correct, reliable and certified free of error.”
|-
| Objectivity || “The extent to which data are unbiased (unprejudiced) and impartial.”
|-
| Reputation || “The extent to which data are trusted or highly regarded in terms of their source or content.”
|-
| rowspan="5" | Contextual || Value-added || “The extent to which data are beneficial and provide advantages from their use.”
|-
| Relevancy || “The extent to which data are applicable and helpful for the task at hand.”
|-
| Timeliness || “The extent to which the age of the data is appropriate for the task at hand.”
|-
| Completeness || “The extent to which data are of sufficient depth, breadth, and scope for the task at hand.”
|-
| Appropriate amount of data || “The extent to which the quantity and volume of available data is appropriate.”
|-
| rowspan="4" | Representational || Interpretability || “The extent to which data are in appropriate language and units and the data definitions are clear.”
|-
| Ease of understanding || “The extent to which data are clear without ambiguity and easily comprehended.”
|-
| Representational consistency || “The extent to which data are always presented in the same format and are compatible with previous data.”
|-
| Concise representation || “The extent to which data are compactly represented without being overwhelming (i.e., brief in presentation, yet complete and to the point).”
|-
| rowspan="2" | Accessibility || Accessibility || “The extent to which data are available or easily and quickly retrievable.”
|-
| Access security || “The extent to which access to data can be restricted and hence kept secure.”
|}
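
Some of the dimensions above can be approximated with simple metrics. The sketch below assumes that completeness is measured as the share of required fields that are filled, and that timeliness is measured as the share of records updated within a task-specific maximum age. Both operationalizations, the field names, and the sample data are assumptions made for illustration; the cited authors do not prescribe these formulas.

```python
from datetime import datetime, timedelta
from typing import Dict, List

Record = Dict[str, object]


def completeness(records: List[Record], required_fields: List[str]) -> float:
    """Share of required field values that are neither None nor empty."""
    total = len(records) * len(required_fields)
    if total == 0:
        raise ValueError("need at least one record and one required field")
    filled = sum(
        1
        for record in records
        for field in required_fields
        if record.get(field) not in (None, "")
    )
    return filled / total


def timeliness(
    records: List[Record],
    timestamp_field: str,
    now: datetime,
    max_age: timedelta,
) -> float:
    """Share of records whose last update is at most max_age old."""
    if not records:
        raise ValueError("need at least one record")
    fresh = sum(
        1
        for record in records
        if record.get(timestamp_field) is not None
        and now - record[timestamp_field] <= max_age
    )
    return fresh / len(records)


# Hypothetical sample data and thresholds (assumptions for illustration).
now = datetime(2011, 10, 17)
records = [
    {"name": "Alice", "email": "alice@example.org", "updated": datetime(2011, 10, 1)},
    {"name": "Bob", "email": None, "updated": datetime(2011, 3, 1)},
    {"name": "", "email": "carol@example.org", "updated": datetime(2011, 9, 20)},
    {"name": "Dave", "email": "dave@example.org", "updated": None},
]

completeness_score = completeness(records, ["name", "email"])
timeliness_score = timeliness(records, "updated", now, timedelta(days=30))
print(f"completeness: {completeness_score:.2f}")
print(f"timeliness:   {timeliness_score:.2f}")
```

The maximum age is a parameter because the definition of timeliness ties the appropriate age to "the task at hand"; a different task may call for a different threshold on the same data.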

See Also

Quality Perception in Information Systems
Data Requirements
Quality Criteria for Linked Data Sources (Findings by Annika Flemming & Olaf Hartig)


  1. Wang, R. Y., & Strong, D. M. (1996). Beyond accuracy: what data quality means to data consumers. Journal of Management Information Systems, 12(4), 5-33.
  2. Kahn, B. K., Strong, D. M., & Wang, R. Y. (2002). Information quality benchmarks: product and service performance. Commun. ACM, 45(4), 184-192.
  3. Redman, T. C. (2001). Data quality: the field guide. Boston: Digital Press.
  4. Olson, J. (2003). Data quality: the accuracy dimension. San Francisco, USA: Morgan Kaufmann; Elsevier Science.
  5. ISO 8000-102:2009, Data quality — Part 102: Master data: Exchange of characteristic data: Vocabulary
  6. Fürber, C., & Hepp, M. (2011). SWIQA – A Semantic Web Information Quality Assessment Framework. In Proceedings of the 19th European Conference on Information Systems (ECIS 2011), June 9–11, 2011, Helsinki, Finland.
  7. Kahn, B. K., Strong, D. M., & Wang, R. Y. (2002). Information quality benchmarks: product and service performance. Commun. ACM, 45(4), 184-192.
  8. Kahn, B. K., Strong, D. M., & Wang, R. Y. (2002). Information quality benchmarks: product and service performance. Commun. ACM, 45(4), 184-192.

(This article was created by Christian Fürber on October 9th, 2011)
