Create Data Requirements
[[File:Terminology.png|500px|Table to illustrate used terminology]]
Revision as of 21:07, 22 September 2011
What are Data Requirements?
Data requirements are prescribed directives or consensual agreements that define the content and/or structure that high-quality data instances and values must have. They can be stated by different individuals or groups of individuals, and they may also be based on laws, standards, or other directives. Data requirements may agree with or contradict each other.
Data requirements are a prerequisite for measuring data quality: they serve as a benchmark that defines the desired state of the data. In the following, we describe how you can express your data requirements with the DQM-Vocabulary.
Types of Data Requirements
Data requirements usually refer to different data items. When we look at a table we usually have at least four types of data items, (1) columns, (2) rows, (3) schemata, and (4) the table/spreadsheet itself.
In Semantic Web environments, we can compare columns to properties, rows to instances, schemata to ontologies, and tables to classes. Data requirements can usually be related to one of these elements. In particular, there are:
- data requirements related to the values of a single property (column)
- data requirements related to the values of multiple properties within an instance (multiple columns in a row)
- data requirements related to the instances of a whole class (table)
- data requirements related to the ontology elements (schema)
With the DQM-Vocabulary, you can model the first three types of requirements. Schema/ontology requirements are currently not part of the vocabulary, but may be added in future releases. In the following, we explain how Property-, Multi-Property-, Class-, and Custom-Requirements can be modelled with the current version of the DQM-Vocabulary.
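To make the first three requirement types more concrete, the following minimal sketch checks one requirement of each type against a small table. It uses plain Python rather than the DQM-Vocabulary; the table, its column names, and the three example requirements are assumptions made purely for illustration.

```python
# Hypothetical example table: each dict is a row (instance),
# each key is a column (property).
products = [
    {"id": 1, "price": 10.0, "net_weight": 2.0, "gross_weight": 2.5},
    {"id": 2, "price": -5.0, "net_weight": 3.0, "gross_weight": 2.0},
]


def property_violations(rows):
    """Single property (column): the price must not be negative."""
    return [i for i, row in enumerate(rows) if row["price"] < 0]


def multi_property_violations(rows):
    """Multiple properties in one instance (row):
    the net weight must not exceed the gross weight."""
    return [i for i, row in enumerate(rows)
            if row["net_weight"] > row["gross_weight"]]


def class_requirement_met(rows, min_rows):
    """Whole class (table): at least min_rows instances must exist."""
    return len(rows) >= min_rows


print(property_violations(products))        # [1]
print(multi_property_violations(products))  # [1]
print(class_requirement_met(products, 3))   # False
```

The first two checks return the indices of the offending rows, while the class-level check can only be answered for the table as a whole.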
Define Tested Elements
Before you can use your data with the DQM-Vocabulary, you have to declare the elements of your ontology that shall be used in the data requirements. You have two options to do this:
OWL Full Definition
You make the classes and properties that shall be tested for data quality problems direct instances of the classes dqm:TestedClass or dqm:TestedProperty.
foo:MyClass a dqm:TestedClass .
foo:MyProperty a dqm:TestedProperty .
Attention: This will make your knowledge base OWL Full, which can be a problem if you plan to use reasoning.
OWL DL Definition
You map the classes and properties that shall be tested for data quality problems to new instances of the classes dqm:TestedClass and dqm:TestedProperty.
foo:Class_1 a dqm:TestedClass ; dqm:hasURI "http://www.example.org/MyClass"^^xsd:anyURI .
foo:Property_1 a dqm:TestedProperty ; dqm:hasURI "http://www.example.org/MyProperty"^^xsd:anyURI .
Type 1: Property Requirements
Property requirements are data requirements that relate to the values of a single property. The DQM-Vocabulary provides the following rule classes to model them:
- dqm:IllegalValueRangeRule
- dqm:IllegalValueRule
- dqm:LegalValueRangeRule
- dqm:LegalValueRule
- dqm:PropertyCompletenessRule
- dqm:MissingPropertyRule (to be renamed)
- dqm:MissingValueRule (to be renamed)
- dqm:SyntaxRule
- dqm:UniqueValueRule
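The list above gives only the rule names. As a rough intuition for the kinds of checks these names suggest, the following plain-Python sketch tests the values of a single column for legal values, a legal value range, syntax, missing values, and uniqueness. It is not the DQM-Vocabulary's implementation: all function names and sample data are hypothetical, and the exact semantics of each dqm: rule may differ.

```python
import re
from collections import Counter


def illegal_values(values, legal):
    """Values that are not in the set of legal values."""
    return [v for v in values if v is not None and v not in legal]


def out_of_range(values, lower, upper):
    """Values outside the legal range [lower, upper]."""
    return [v for v in values
            if v is not None and not (lower <= v <= upper)]


def syntax_violations(values, pattern):
    """Values that do not fully match the regular expression."""
    regex = re.compile(pattern)
    return [v for v in values
            if v is not None and regex.fullmatch(v) is None]


def missing_count(values):
    """Number of missing (None or empty) values."""
    return sum(1 for v in values if v is None or v == "")


def duplicates(values):
    """Values that occur more than once."""
    counts = Counter(v for v in values if v is not None)
    return sorted(v for v, n in counts.items() if n > 1)


# Hypothetical sample columns
countries = ["DE", "FR", "XX", None]
prices = [9.99, 120.0, -1.0]
zip_codes = ["12345", "1234a", "12345"]

print(illegal_values(countries, {"DE", "FR"}))  # ['XX']
print(out_of_range(prices, 0.0, 100.0))         # [120.0, -1.0]
print(syntax_violations(zip_codes, r"\d{5}"))   # ['1234a']
print(missing_count(countries))                 # 1
print(duplicates(zip_codes))                    # ['12345']
```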