Data Quality

From SemWebQuality.org
Revision as of 12:49, 9 October 2011

There are many ways to define data quality, and no single definition is commonly agreed upon. The following table gives an overview of popular data quality definitions.

Popular Data Quality Definitions

| Authors | Data Quality Definition |
|---|---|
| Wang and Strong (1996) | “[…] data that are fit for use by data consumers.”[1] |
| Kahn, Strong, and Wang (2002) | “conformance to specifications” and “meeting or exceeding consumer expectations”[2] |
| Redman (2001) | “Data are of high quality if they are fit for their intended uses in operations, decision making, and planning. Data are fit for use if they are free of defects and possess desired features.”[3] |
| Olson (2003) | “[…] data has quality if it satisfies the requirements of its intended use.”[4] |
| ISO 8000 | Quality is the "degree to which a set of inherent characteristics fulfils requirements"[5] |

All of the above definitions have one thing in common: data quality compares the status quo to a desired state. The desired state is variously called "fitness for use", "specification", "consumer expectations", "defect-free", "desired features", or simply "requirements". It may be stated not only by data consumers but also by data providers, administrators, legal authorities, and many other stakeholders. There are thus multiple perspectives on requirements, but all of the definitions basically agree that data quality is the degree to which requirements are fulfilled.
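The idea of "degree to which requirements are fulfilled" can be made concrete with a small sketch. The following Python example is only an illustration under stated assumptions: the function `quality_score`, the sample records, and the two requirements are hypothetical and are not taken from any of the cited definitions. It treats each requirement as a yes/no check and reports the share of checks that pass.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Requirement = Callable[[Record], bool]


def quality_score(records: List[Record], requirements: Dict[str, Requirement]) -> float:
    """Return the fraction of (record, requirement) checks that pass.

    Hypothetical helper: one possible way to turn "degree to which
    requirements are fulfilled" into a number between 0 and 1.
    """
    total = len(records) * len(requirements)
    if total == 0:
        return 1.0  # nothing can be violated; a convention chosen for this sketch
    passed = sum(
        1
        for record in records
        for check in requirements.values()
        if check(record)
    )
    return passed / total


# Status quo: the data as it currently is (made-up sample records).
records = [
    {"name": "Alice", "age": 34},
    {"name": "", "age": 29},      # violates "name_present"
    {"name": "Carol", "age": -5},  # violates "age_plausible"
]

# Desired state: requirements as stated by some stakeholder (assumed here).
requirements = {
    "name_present": lambda r: bool(r.get("name")),
    "age_plausible": lambda r: isinstance(r.get("age"), int) and 0 <= r["age"] <= 130,
}

score = quality_score(records, requirements)
print(f"{score:.2f}")
```

Because different stakeholders state different requirements, the same records can receive different scores depending on which set of checks is passed in.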

Requirements can be manifold because tastes, needs, and perspectives differ. Hence, data quality is also multi-dimensional. In an empirical study in 1996, Wang and Strong identified the following 15 dimensions as the most important in the eyes of data consumers [6]:

The 15 most important data quality dimensions from consumer perspective[7]

| Category | Dimension | Definition |
|---|---|---|
| Intrinsic | Believability | “The extent to which data are accepted or regarded as true, real and credible.” |
| Intrinsic | Accuracy | “The extent to which data are correct, reliable and certified free of error.” |
| Intrinsic | Objectivity | “The extent to which data are unbiased (unprejudiced) and impartial.” |
| Intrinsic | Reputation | “The extent to which data are trusted or highly regarded in terms of their source or content.” |
| Contextual | Value-added | “The extent to which data are beneficial and provide advantages from their use.” |
| Contextual | Relevancy | “The extent to which data are applicable and helpful for the task at hand.” |
| Contextual | Timeliness | “The extent to which the age of the data is appropriate for the task at hand.” |
| Contextual | Completeness | “The extent to which data are of sufficient depth, breadth, and scope for the task at hand.” |
| Contextual | Appropriate amount of data | “The extent to which the quantity and volume of available data is appropriate.” |
| Representational | Interpretability | “The extent to which data are in appropriate language and units and the data definitions are clear.” |
| Representational | Ease of understanding | “The extent to which data are clear without ambiguity and easily comprehended.” |
| Representational | Representational consistency | “The extent to which data are always presented in the same format and are compatible with previous data.” |
| Representational | Concise representation | “The extent to which data are compactly represented without being overwhelming (i.e., brief in presentation, yet complete and to the point).” |
| Accessibility | Accessibility | “The extent to which data are available or easily and quickly retrievable.” |
| Accessibility | Access security | “The extent to which access to data can be restricted and hence kept secure.” |
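Some of these dimensions lend themselves to simple measurements. The sketch below shows one possible way to approximate two contextual dimensions, completeness and timeliness, in Python. It is an assumption-laden illustration, not a metric defined by the cited authors: the function names, the `updated` field, and the 30-day age limit are all hypothetical, and the definitions above tie both dimensions to "the task at hand", which is why the required fields and the maximum age are parameters.

```python
from datetime import datetime, timedelta
from typing import Dict, List

Record = Dict[str, object]


def completeness(records: List[Record], required_fields: List[str]) -> float:
    """Share of required field values that are present (not None or empty string)."""
    total = len(records) * len(required_fields)
    if total == 0:
        return 1.0  # convention for this sketch: nothing is missing
    filled = sum(
        1
        for record in records
        for field in required_fields
        if record.get(field) not in (None, "")
    )
    return filled / total


def timeliness(
    records: List[Record],
    now: datetime,
    max_age: timedelta,
    timestamp_field: str = "updated",  # hypothetical field name
) -> float:
    """Share of records whose age does not exceed max_age for the task at hand."""
    if not records:
        return 1.0
    fresh = sum(
        1
        for record in records
        if isinstance(record.get(timestamp_field), datetime)
        and now - record[timestamp_field] <= max_age
    )
    return fresh / len(records)


# Made-up sample data for illustration only.
now = datetime(2011, 10, 9)
records = [
    {"id": 1, "label": "A", "updated": datetime(2011, 10, 1)},
    {"id": 2, "label": None, "updated": datetime(2011, 1, 1)},
    {"id": 3, "label": "C", "updated": datetime(2011, 9, 20)},
]

completeness_score = completeness(records, ["id", "label"])
timeliness_score = timeliness(records, now, max_age=timedelta(days=30))
print(f"completeness={completeness_score:.2f} timeliness={timeliness_score:.2f}")
```

Dimensions such as believability or reputation depend on the judgment of data consumers and are harder to capture with a simple ratio like this.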



  1. Wang, R. Y., & Strong, D. M. (1996). Beyond accuracy: what data quality means to data consumers. Journal of Management Information Systems, 12(4), 5-33.
  2. Kahn, B. K., Strong, D. M., & Wang, R. Y. (2002). Information quality benchmarks: product and service performance. Commun. ACM, 45(4), 184-192.
  3. Redman, T. C. (2001). Data quality: the field guide. Boston: Digital Press.
  4. Olson, J. (2003). Data quality: the accuracy dimension. San Francisco, USA: Morgan Kaufmann; Elsevier Science.
  5. ISO (2009). ISO 8000-102:2009, Data quality - Part 102: Master data: Exchange of characteristic data: Vocabulary.
  6. Kahn, B. K., Strong, D. M., & Wang, R. Y. (2002). Information quality benchmarks: product and service performance. Commun. ACM, 45(4), 184-192.
  7. Kahn, B. K., Strong, D. M., & Wang, R. Y. (2002). Information quality benchmarks: product and service performance. Commun. ACM, 45(4), 184-192.