Create Data Requirements

From SemWebQuality.org
===Example 4: Conditional Syntax Rule (1 Condition, OWL DL)===
{|class="wikitable"
|'''Task:'''||Specify that values of a specific property must follow a specific syntax if another property has a specific value.
|-
|'''Notional Example:'''||In a location data set, the property foo:STATE must contain a value with two letters if the property foo:COUNTRY has the value "USA".
|-
|'''DQ-Problem:'''|| [[dqm:SyntaxViolation]]
|-
|'''Dimension:'''||[[dqm:SyntacticAccuracy]]
|-
|'''Requirement Type:'''||[[dqm:MultiPropertyRequirement]]
|}

In order to specify a conditional syntax rule, you must perform the following steps:

'''1. Define Condition'''
<syntaxhighlight lang="n3">
foo:Condition_USA
      a      dqm:Condition ;
      rdfs:label "Condition USA"^^xsd:string ;
      dqm:conditionalProperty foo:Prop_Location_Country ;
      dqm:equals "USA"^^xsd:string .
</syntaxhighlight>

'''2. Define Conditional Syntax Rule'''
<syntaxhighlight lang="n3">
foo:ConditionalSyntaxRule_State
      a      dqm:ConditionalSyntaxRule ;
      # ...
      dqm:testedClass foo:Class_Location ;
      dqm:testedProperty1 foo:Prop_Location_State .
</syntaxhighlight>

[[Data Quality Monitoring Reports#Example 4: Conditional Syntax Violations (1 Condition, OWL DL Design)|Click here to learn how to generate a monitoring report from this]]<br />

===Example 5: Legal Value Range Rule (OWL DL)===
{|class="wikitable"
|'''Task:'''||Specify valid value ranges for properties that hold numeric values.
|-
|'''Notional Example:'''||In a product data set, the property foo:PRICE can never contain negative values.
|-
|'''DQ-Problem:'''|| [[dqm:OutOfRangeValue]]
|-
|'''Dimension:'''||[[dqm:SyntacticAccuracy]]
|-
|'''Requirement Type:'''||[[dqm:PropertyRequirement]]
|}

You can specify a legal value range for a property by creating an instance of the class [[dqm:LegalValueRangeRule]], e.g. as follows:

<syntaxhighlight lang="n3">
foo:LegalValueRangeRule_Price
      a      dqm:LegalValueRangeRule ;
      # ...
      dqm:testedClass foo:Class_Product ;
      dqm:testedProperty1 foo:Prop_Product_Price .
</syntaxhighlight>

The class [[dqm:LegalValueRangeRule]] has special properties to specify the lowest and/or highest allowed value.

[[Data Quality Monitoring Reports#Example 5: Out of Range Values (OWL DL Design)|Click here to learn how to generate a monitoring report from this]]<br />

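To see what such a rule would flag, the following sketch shows two product instances. It assumes the rule sets the lowest allowed value to 0, that foo: stands for http://www.example.org/, and that foo:Product and foo:PRICE are the original class and property behind foo:Class_Product and foo:Prop_Product_Price; the instance names and prices are hypothetical.

<syntaxhighlight lang="n3">
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix foo: <http://www.example.org/> .          # assumed namespace

# Within the legal range: the price is not negative.
foo:Product_1
      a      foo:Product ;
      foo:PRICE "19.99"^^xsd:decimal .

# Out of range: a negative price would be reported as dqm:OutOfRangeValue.
foo:Product_2
      a      foo:Product ;
      foo:PRICE "-5.00"^^xsd:decimal .
</syntaxhighlight>
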
===Example 6: Legal Value Rule (OWL DL)===

{|class="wikitable"
|'''Task:'''||Specify a reference property that holds the allowed values.
|-
|'''Notional Example:'''||In a location data set, the property foo:COUNTRY can only contain values of the trusted property foo:legalValue in the trusted class foo:LegalValueCountry.
|-
|'''DQ-Problem:'''|| [[dqm:IllegalValue]]
|-
|'''Dimension:'''||[[dqm:SyntacticAccuracy]]
|-
|'''Requirement Type:'''||[[dqm:PropertyRequirement]]
|}

# Create an instance of [[dqm:LegalValueRule]], e.g. as follows:

<syntaxhighlight lang="n3">
foo:LegalValueRule_Country
      a      dqm:LegalValueRule ;
      # ...
      dqm:testedClass foo:Class_Location ;
      dqm:testedProperty1 foo:Prop_Location_Country .
</syntaxhighlight>

[[Data Quality Monitoring Reports#Example 6: Illegal Values (OWL DL Design)|Click here to learn how to generate a monitoring report from this]]<br />

===Example 7: Unique Value Rule (OWL DL)===
{|class="wikitable"
|'''Task:'''||Specify that values of a property must be unique.
|-
|'''Notional Example:'''||In a location data set, the property foo:LOCID of class foo:Location must only contain unique values.
|-
|'''DQ-Problem:'''|| [[dqm:UniquenessViolation]]
|-
|'''Dimension:'''||[[dqm:PropertyUniqueness]]
|-
|'''Requirement Type:'''||[[dqm:PropertyRequirement]]
|}

You can specify that values of a property must be unique by creating an instance of the class [[dqm:UniqueValueRule]]:

<syntaxhighlight lang="n3">
foo:UniqueValueRule_LOCID
      a      dqm:UniqueValueRule ;
      rdfs:label "Unique value rule LOCID"^^xsd:string ;
      dqm:testedClass foo:Class_Location ;
      dqm:testedProperty1 foo:Prop_Location_ID .
</syntaxhighlight>

[[Data Quality Monitoring Reports#Example 7: Uniqueness Violations (OWL DL Design)|Click here to learn how to generate a monitoring report from this]]<br />

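As an illustration of the data this rule targets, the sketch below shows two location instances that share one foo:LOCID value and a third with its own value. The foo: namespace IRI, the instance names, and the literal values are assumptions made for the example.

<syntaxhighlight lang="n3">
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix foo: <http://www.example.org/> .          # assumed namespace

# Location_1 and Location_2 share the value "1001",
# so both would count as uniqueness violations.
foo:Location_1
      a      foo:Location ;
      foo:LOCID "1001"^^xsd:string .

foo:Location_2
      a      foo:Location ;
      foo:LOCID "1001"^^xsd:string .

# Location_3 has a value that no other instance uses.
foo:Location_3
      a      foo:Location ;
      foo:LOCID "1002"^^xsd:string .
</syntaxhighlight>
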
===Example 8: Functional Dependency Value Rule (1 Condition, OWL DL)===

{|class="wikitable"
|'''Task:'''||Specify that one property must have a specific value if a second property has a certain value.
|-
|'''Notional Example:'''||In an address data set, the city name "New York" must always have the value "USA" for the property foo:COUNTRY.
|-
|'''DQ-Problem:'''|| [[dqm:FunctionalDependencyViolation]]
|-
|'''Dimension:'''||[[dqm:SemanticAccuracy]]
|-
|'''Requirement Type:'''||[[dqm:MultiPropertyRequirement]]
|}

In order to specify the dependency between two property values, you must perform the following steps:

# Define a condition under which a specific value is always required.
# Create an instance of the class [[dqm:FuncDepValueRule]], e.g. as follows:

<syntaxhighlight lang="n3">
foo:FuncDepValueRule_1
      a      dqm:FuncDepValueRule ;
      rdfs:label "Func dep value rule 1"^^xsd:string ;
      dqm:equals "USA"^^xsd:string ;
      dqm:hasCondition1 foo:Condition_New_York ;
      dqm:reqDescription "If the city value is \"New York\" then the country must be \"USA\"."^^xsd:string ;
      dqm:testedClass foo:Class_Location ;
      dqm:testedProperty1 foo:Prop_Location_Country .
</syntaxhighlight>

[[Data Quality Monitoring Reports#Example 8: Functional Dependency Violations (1 Condition, OWL DL Design)|Click here to learn how to generate a monitoring report from this]]<br />

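The rule refers to foo:Condition_New_York, which is not defined on this page. A minimal sketch of such a condition, assuming it follows the same pattern as foo:Condition_USA from Example 2 and that the city name is held in the mapped property foo:Prop_Location_City from the sample dataset; the label text and both non-W3C namespace IRIs are assumptions.

<syntaxhighlight lang="n3">
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix dqm:  <http://purl.org/dqm-vocabulary/v1/dqm#> .   # assumed, verify against the vocabulary
@prefix foo:  <http://www.example.org/> .                   # assumed namespace

# Condition: the city property has the value "New York".
foo:Condition_New_York
      a      dqm:Condition ;
      rdfs:label "Condition New York"^^xsd:string ;
      dqm:conditionalProperty foo:Prop_Location_City ;
      dqm:equals "New York"^^xsd:string .
</syntaxhighlight>
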
===Example 9: Functional Dependency Value Rule (2 Conditions, OWL DL)===

{|class="wikitable"
|'''Task:'''||Specify that one property must have a specific value if a second and a third property have specific values.
|-
|'''Notional Example:'''||In an address data set, the city "New York" in the country "USA" must always have the value "NY" for the property foo:STATE.
|-
|'''DQ-Problem:'''|| [[dqm:FunctionalDependencyViolation]]
|-
|'''Dimension:'''||[[dqm:SemanticAccuracy]]
|-
|'''Requirement Type:'''||[[dqm:MultiPropertyRequirement]]
|}

In order to specify the dependency between three property values, you must perform the following steps:

# Define both conditions under which a specific value is always required.
# Create an instance of the class [[dqm:FuncDepValueRule]], e.g. as follows:

<syntaxhighlight lang="n3">
foo:FuncDepValueRule_2
      a      dqm:FuncDepValueRule ;
      rdfs:label "Func dep value rule 2"^^xsd:string ;
      dqm:equals "NY"^^xsd:string ;
      dqm:hasCondition1 foo:Condition_USA ;
      dqm:hasCondition2 foo:Condition_New_York ;
      dqm:reqDescription "If the city value is \"New York\" and the country value is \"USA\" then the state must be \"NY\"."^^xsd:string ;
      dqm:testedClass foo:Class_Location ;
      dqm:testedProperty1 foo:Prop_Location_State .
</syntaxhighlight>

[[Data Quality Monitoring Reports#Example 9: Functional Dependency Violations (2 Conditions, OWL DL Design)|Click here to learn how to generate a monitoring report from this]]<br />

===Example 10: Expiry Rule (OWL DL)===

{|class="wikitable"
|'''Task:'''||Specify that instances of a specific class expire.
|-
|'''Notional Example:'''||In a product data set, the class foo:Product has instances with product offerings that expire on a certain date which is specified via the property foo:validThrough.
|-
|'''DQ-Problem:'''|| [[dqm:OutdatedInstance]]
|-
|'''Dimension:'''||[[dqm:Timeliness]]
|-
|'''Requirement Type:'''||[[dqm:ClassRequirement]]
|}

You can specify that instances of a class have an expiry date by creating an instance of class [[dqm:ExpiryRule]], e.g. as follows:

<syntaxhighlight lang="n3">
foo:ExpiryRule_1
      a      dqm:ExpiryRule ;
      rdfs:label "Expiry rule 1"^^xsd:string ;
      dqm:testedClass foo:Class_Product ;
      dqm:testedProperty1 foo:Prop_Product_validThrough .
</syntaxhighlight>

[[Data Quality Monitoring Reports#Example 10: Outdated / Expired Values (OWL DL Design)|Click here to learn how to generate a monitoring report from this]]<br />

===Example 11: Update Rule (OWL DL)===

{|class="wikitable"
|'''Task:'''||Specify that instances of a specific class must be updated within a specified interval.
|-
|'''Notional Example:'''||In a location data set, the class foo:Location has instances with address data that carry the timestamp of their last update. The instances must not be older than 1 year, 2 months, 3 days, 5 hours, 20 minutes, and 30.123 seconds.
|-
|'''DQ-Problem:'''|| [[dqm:OutdatedInstance]]
|-
|'''Dimension:'''||[[dqm:Timeliness]]
|-
|'''Requirement Type:'''||[[dqm:ClassRequirement]]
|}

You can specify a required update interval for instances of a specific class by creating an instance of class [[dqm:UpdateRule]], e.g. as follows:

<syntaxhighlight lang="n3">
foo:UpdateRule_Location
      a      dqm:UpdateRule ;
      rdfs:label "Update rule Location"^^xsd:string ;
      dqm:expectedUpdateInterval "P1Y2M3DT5H20M30.123S"^^xsd:duration ;
      dqm:testedClass foo:Class_Location ;
      dqm:testedProperty1 foo:Prop_Location_timestamp .
</syntaxhighlight>

<span style="color:red">'''NOTE:'''</span> To specify this requirement, the tested class must have a property that holds the time of the last update.

[[Data Quality Monitoring Reports#Example 11: Outdated / Not Updated Values (OWL DL Design)|Click here to learn how to generate a monitoring report from this]]<br />
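
The interval is an xsd:duration literal of the form PnYnMnDTnHnMnS, where "T" separates the date part from the time part. The sketch below lists a few simpler intervals as comments and shows an instance with a last-update timestamp. The property name foo:timestamp (assumed to be the original behind foo:Prop_Location_timestamp), the foo: namespace IRI, and the timestamp value are hypothetical.

<syntaxhighlight lang="n3">
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix foo: <http://www.example.org/> .          # assumed namespace

# Other valid xsd:duration values:
#   "P1Y"^^xsd:duration     one year
#   "P6M"^^xsd:duration     six months
#   "P30D"^^xsd:duration    thirty days
#   "PT12H"^^xsd:duration   twelve hours

# An instance carrying the time of its last update.
foo:Location_1
      a      foo:Location ;
      foo:timestamp "2012-03-28T12:28:00"^^xsd:dateTime .
</syntaxhighlight>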

Latest revision as of 12:28, 28 March 2012

What are Data Requirements?

Data requirements are prescribed directives or consensual agreements that define the content and/or structure of high-quality data instances and values. They can be stated by different individuals or groups, and they may also be based on laws, standards, or other directives. Different requirements may agree with or contradict each other.

Data requirements are a prerequisite for measuring data quality: they serve as the benchmark that defines the desired state of the data. In the following, we describe how you can express your data requirements with the DQM-Vocabulary.

Types of Data Requirements

Data requirements usually refer to different data items. When we look at a table we usually have at least four types of data items, (1) columns, (2) rows, (3) schemata, and (4) the table/spreadsheet itself.

Table to illustrate used terminology

In Semantic Web environments, we can compare columns to properties, rows to instances, schemata to ontologies, and tables to classes. Data requirements can usually be related to one of these elements. In particular, there are

  1. data requirements related to the values of a single property (column)
  2. data requirements related to the values of multiple properties within an instance (multiple columns in a row)
  3. data requirements related to the instances of a whole class (table)
  4. data requirements related to the ontology elements (schema)

With the DQM-Vocabulary, you can model the first three types of requirements. Schema/ontology requirements are currently not part of the vocabulary, but may be added in future releases. In the following, we explain how Property-, Multi-Property-, Class-, and Custom-Requirements can be modelled with the current version of the DQM-Vocabulary.

Define Tested Elements

Before you can use your data with the DQM-Vocabulary, you have to declare the elements of your ontology that shall be used in the DQM-Vocabulary. You have two options for doing this, and the choice affects the decidability of reasoning over your knowledge base:

Design Option 1: Classes and Properties as Instances (OWL Full)

Classes and properties that shall be tested for data requirement violations are defined as direct instances of the classes dqm:TestedClass or dqm:TestedProperty.

foo:MyClass a dqm:TestedClass .
foo:MyProperty a dqm:TestedProperty .

Attention: This makes your knowledge base OWL Full, which may be a problem if you plan to use reasoning, because reasoning over OWL Full is undecidable in general.

Design Option 2: Mapping of Classes and Properties to new URIs (OWL DL)

Classes and properties that shall be tested for data requirement violations are mapped to new instances of the classes dqm:TestedClass and dqm:TestedProperty.

foo:Class_1 a dqm:TestedClass ;
                dqm:hasURI "http://www.example.org/MyClass"^^xsd:anyURI .
foo:Property_1 a dqm:TestedProperty ;
                dqm:hasURI "http://www.example.org/MyProperty"^^xsd:anyURI .

Sample Dataset used in Examples

The following examples use classes and properties from our sample dataset as instances of dqm:TestedClass and dqm:TestedProperty. Requirements specified in OWL DL use the mapped instances, while requirements specified in OWL Full use the original classes and properties. The data set contains the following classes:

Original Class       Mapped Instance
foo:Location         foo:Class_Location

...with the following datatype properties:

Original Property    Mapped Instance
foo:LOCID            foo:Prop_Location_ID
foo:STREET           foo:Prop_Location_Street
foo:STREETNO         foo:Prop_Location_Streetno
foo:ZIP              foo:Prop_Location_ZIP
foo:CITY             foo:Prop_Location_City
foo:COUNTRY          foo:Prop_Location_Country
foo:STATE            foo:Prop_Location_State
foo:validThrough     foo:Prop_validThrough

To apply the examples to your own data, replace the sample classes and properties used in the data requirements with your own.
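
For the OWL DL design, the mapped instances in the tables above have to be declared first. A minimal sketch for the class and two of the properties, following the pattern of Design Option 2 and assuming that foo: stands for http://www.example.org/; the dqm: namespace IRI is also an assumption and should be checked against the published vocabulary.

<syntaxhighlight lang="n3">
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix dqm: <http://purl.org/dqm-vocabulary/v1/dqm#> .   # assumed, verify against the vocabulary
@prefix foo: <http://www.example.org/> .                   # assumed namespace

# Mapped instance for the original class foo:Location.
foo:Class_Location
      a      dqm:TestedClass ;
      dqm:hasURI "http://www.example.org/Location"^^xsd:anyURI .

# Mapped instances for the original properties foo:COUNTRY and foo:STATE.
foo:Prop_Location_Country
      a      dqm:TestedProperty ;
      dqm:hasURI "http://www.example.org/COUNTRY"^^xsd:anyURI .

foo:Prop_Location_State
      a      dqm:TestedProperty ;
      dqm:hasURI "http://www.example.org/STATE"^^xsd:anyURI .
</syntaxhighlight>

The remaining properties in the table would be mapped the same way.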

Syntax Of Examples

The following examples show instance data in Turtle/Notation 3 syntax.
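
The snippets on this page omit their prefix declarations. A minimal sketch of a header that would let them parse as Turtle, followed by one instance of the sample class: the rdfs: and xsd: IRIs are the standard W3C namespaces, while the foo: and dqm: IRIs and all literal values are assumptions made for illustration.

<syntaxhighlight lang="n3">
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix dqm:  <http://purl.org/dqm-vocabulary/v1/dqm#> .   # assumed, verify against the vocabulary
@prefix foo:  <http://www.example.org/> .                   # assumed namespace

# A hypothetical instance of the sample class with all sample properties set.
foo:Location_1
      a      foo:Location ;
      foo:LOCID "1"^^xsd:string ;
      foo:STREET "Main Street"^^xsd:string ;
      foo:STREETNO "12"^^xsd:string ;
      foo:ZIP "10001"^^xsd:string ;
      foo:CITY "New York"^^xsd:string ;
      foo:STATE "NY"^^xsd:string ;
      foo:COUNTRY "USA"^^xsd:string .
</syntaxhighlight>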

Examples of Data Requirements

Example 1: Property Completeness Rule

Task: Specify that a specific property and/or its values must exist for all instances of a specific class.
Notional Example: In a location data set, the property foo:COUNTRY must exist and have a value in all instances of the class foo:Location.
DQ-Problem: dqm:MissingPropertyAndValue dqm:MissingProperty dqm:MissingValue
Dimension: dqm:PropertyCompleteness
Requirement Type: dqm:MultiPropertyRequirement

If you defined your data elements in OWL Full (Option 1), then you can simply use the URIs of your ontology in the definition of the Property Completeness Rule as follows:


Definition in OWL Full

foo:PropertyCompletenessRule_1
      a       dqm:PropertyCompletenessRule ;
      dqm:testedClass <http://www.example.org/MyClass> ;
      dqm:testedProperty1 <http://www.example.org/MyProperty> ;
      dqm:requiredProperty "true"^^xsd:boolean ;
      dqm:requiredValue "true"^^xsd:boolean .

Click here to learn how to generate a monitoring report from this

The property dqm:requiredProperty specifies that the property "MyProperty" must exist in each instance. The property dqm:requiredValue specifies that a value must exist for property "MyProperty".

If you mapped your own ontology elements to new URIs (Option 2, OWL DL), then the following example will help you to define a Property Completeness Rule:


Definition in OWL-DL

foo:PropertyCompletenessRule_1
      a       dqm:PropertyCompletenessRule ;
      dqm:testedClass foo:Class_1 ;
      dqm:testedProperty1 foo:Property_1 ;
      dqm:requiredProperty "true"^^xsd:boolean ;
      dqm:requiredValue "true"^^xsd:boolean .

Click here to learn how to generate a monitoring report from this

The property dqm:requiredProperty specifies that the property "MyProperty", which is mapped to "foo:Property_1", must exist in each instance of the class "MyClass", which is mapped to "foo:Class_1". The property dqm:requiredValue specifies that a value must also exist for the property "foo:Property_1".
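
To make the two flags concrete, the sketch below applies the notional example (foo:COUNTRY in foo:Location) to three hypothetical instances. It assumes that an empty string counts as a missing value; how the monitoring queries actually treat empty literals is not stated on this page. The foo: namespace IRI and the instance names are likewise assumptions.

<syntaxhighlight lang="n3">
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix foo: <http://www.example.org/> .          # assumed namespace

# Complete: the property exists and has a value.
foo:Location_1
      a      foo:Location ;
      foo:COUNTRY "USA"^^xsd:string .

# The property is absent: violates dqm:requiredProperty.
foo:Location_2
      a      foo:Location ;
      foo:CITY "Berlin"^^xsd:string .

# The property exists but is empty: assumed to violate dqm:requiredValue.
foo:Location_3
      a      foo:Location ;
      foo:COUNTRY ""^^xsd:string .
</syntaxhighlight>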

Example 2: Conditional Property Completeness Rule (1 Condition, OWL DL)

Task: Specify that a specific property and/or its values must exist if another property has a specific value.
Notional Example: In a location data set, the property foo:STATE must exist and have a value in all instances of the class foo:Location that have value "USA" for the property foo:COUNTRY.
DQ-Problem: dqm:MissingPropertyAndValue
Dimension: dqm:PropertyCompleteness
Requirement Type: dqm:MultiPropertyRequirement

To define this data requirement, you must perform the following two steps:

1. Define the Condition

foo:Condition_USA
      a       dqm:Condition ;
      rdfs:label "Condition USA"^^xsd:string ;
      dqm:conditionalProperty foo:Prop_Location_Country ;
      dqm:equals "USA"^^xsd:string .

2. Define the Conditional Property Completeness Rule

foo:ConditionalPropertyCompletenessRule_State
      a       dqm:ConditionalPropertyCompletenessRule ;
      rdfs:label "Conditional property completeness rule State"^^xsd:string ;
      dqm:hasCondition1 foo:Condition_USA ;
      dqm:requiredProperty "true"^^xsd:boolean ;
      dqm:requiredValue "true"^^xsd:boolean ;
      dqm:testedClass foo:Class_Location ;
      dqm:testedProperty1 foo:Prop_Location_State .

Click here to learn how to generate a monitoring report from this
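
The following sketch shows instance data that the condition and rule above would be expected to distinguish. It assumes that foo:Class_Location, foo:Prop_Location_Country and foo:Prop_Location_State are mapped to foo:Location, foo:COUNTRY and foo:STATE from the notional example; the instance names are hypothetical.

# Hypothetical instances, names assumed for illustration
foo:Location_1
      a       foo:Location ;
      foo:COUNTRY "USA"^^xsd:string ;
      foo:STATE "NY"^^xsd:string .          # condition holds, foo:STATE exists

foo:Location_2
      a       foo:Location ;
      foo:COUNTRY "USA"^^xsd:string .       # condition holds, foo:STATE is missing

foo:Location_3
      a       foo:Location ;
      foo:COUNTRY "Germany"^^xsd:string .   # condition does not hold

Only foo:Location_2 meets the condition without providing the required property, so it is the candidate for a violation.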

Example 3: Syntax Rule (OWL DL)

Task: Specify that values of a specific property must follow a specific syntax.
Notional Example: In a location data set, the property foo:ZIP must contain values with exactly five digits.
DQ-Problem: dqm:SyntaxViolation
Dimension: dqm:SyntacticAccuracy
Requirement Type: dqm:PropertyRequirement

You can specify syntax requirements by creating an instance of the class dqm:SyntaxRule, e.g. as follows:

foo:SyntaxRule_ZIP
      a       dqm:SyntaxRule ;
      rdfs:label "Syntax rule ZIP"^^xsd:string ;
      dqm:regex "^[0-9]{5}$"^^xsd:string ;
      dqm:testedClass foo:Class_Location ;
      dqm:testedProperty1 foo:Property_ZIP .

Click here to learn how to generate a monitoring report from this
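
As a rough illustration of the regular expression "^[0-9]{5}$", the sketch below lists values that do and do not consist of exactly five digits. It assumes that foo:Property_ZIP is mapped to foo:ZIP from the notional example; the instance names are hypothetical.

# Hypothetical instances, names assumed for illustration
foo:Location_1
      a       foo:Location ;
      foo:ZIP "12345"^^xsd:string .        # matches: exactly five digits

foo:Location_2
      a       foo:Location ;
      foo:ZIP "1234"^^xsd:string .         # no match: only four digits

foo:Location_3
      a       foo:Location ;
      foo:ZIP "12345-6789"^^xsd:string .   # no match: extra characters after the digits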

Example 4: Conditional Syntax Rule (1 Condition, OWL DL)

Task: Specify that values of a specific property must follow a specific syntax if another property has a specific value.
Notional Example: In a location data set, the property foo:STATE must contain a value with two letters if the property foo:COUNTRY has the value "USA".
DQ-Problem: dqm:SyntaxViolation
Dimension: dqm:SyntacticAccuracy
Requirement Type: dqm:MultiPropertyRequirement

In order to specify a conditional syntax rule, you must perform the following steps:

1. Define the Condition

foo:Condition_USA
      a       dqm:Condition ;
      rdfs:label "Condition USA"^^xsd:string ;
      dqm:conditionalProperty foo:Prop_Location_Country ;
      dqm:equals "USA"^^xsd:string .

2. Define the Conditional Syntax Rule

foo:ConditionalSyntaxRule_State
      a       dqm:ConditionalSyntaxRule ;
      rdfs:label "Conditional syntax rule State"^^xsd:string ;
      dqm:hasCondition1 foo:Condition_USA ;
      dqm:regex "^[A-Z]{2}$"^^xsd:string ;
      dqm:testedClass foo:Class_Location ;
      dqm:testedProperty1 foo:Prop_Location_State .

Click here to learn how to generate a monitoring report from this

Example 5: Legal Value Range Rule (OWL DL)

Task: Specify valid value ranges for properties that hold numeric values.
Notional Example: In a product data set, the property foo:PRICE can never contain negative values.
DQ-Problem: dqm:OutOfRangeValue
Dimension: dqm:SyntacticAccuracy
Requirement Type: dqm:PropertyRequirement

You can specify a legal value range for a property by creating an instance of the class dqm:LegalValueRangeRule, e.g. as follows:

foo:LegalValueRangeRule_Price
      a       dqm:LegalValueRangeRule ;
      rdfs:label "Legal value range rule Price"^^xsd:string ;
      dqm:lowerLimit "0.00"^^xsd:float ;
      dqm:testedClass foo:Class_Product ;
      dqm:testedProperty1 foo:Prop_Product_Price .

The class dqm:LegalValueRangeRule provides special properties, such as dqm:lowerLimit in the example above, to specify the lowest and/or highest allowed value.

Click here to learn how to generate a monitoring report from this
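
A small sketch of product data may help to see the effect of the lower limit. It assumes that foo:Class_Product and foo:Prop_Product_Price are mapped to foo:Product and foo:PRICE from the notional example, and that prices are stored as xsd:float; the instance names are hypothetical.

# Hypothetical instances, names and datatype assumed for illustration
foo:Product_1
      a       foo:Product ;
      foo:PRICE "19.99"^^xsd:float .   # above the lower limit of 0.00

foo:Product_2
      a       foo:Product ;
      foo:PRICE "-5.00"^^xsd:float .   # below the lower limit, out of range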

Example 6: Legal Value Rule (OWL DL)

Task: Specify a reference property that holds the allowed values.
Notional Example: In a location data set, the property foo:COUNTRY can only contain values of the trusted property foo:legalValue in the trusted class foo:LegalValueCountry.
DQ-Problem: dqm:IllegalValue
Dimension: dqm:SyntacticAccuracy
Requirement Type: dqm:PropertyRequirement

In order to specify legal values for a specific property, you need to perform the following steps:

  1. Create a reference data set or use an existing data set that holds the legal values
  2. Specify the class and property which hold the legal values as dqm:TrustedClass and dqm:TrustedProperty
  3. Create an instance of dqm:LegalValueRule, e.g. as follows:
foo:LegalValueRule_Country
      a       dqm:LegalValueRule ;
      rdfs:label "Legal value rule Country"^^xsd:string ;
      dqm:referenceClass foo:TrustedClass_LegalValueCountry ;
      dqm:referenceProperty1 foo:TrustedProperty_LegalValue ;
      dqm:testedClass foo:Class_Location ;
      dqm:testedProperty1 foo:Prop_Location_Country .

Click here to learn how to generate a monitoring report from this
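
Step 1 above can be sketched as follows, using the trusted class foo:LegalValueCountry and the trusted property foo:legalValue from the notional example. The instance names and the listed countries are assumptions made for this illustration.

# Hypothetical reference data holding the legal country values
foo:LegalValueCountry_1
      a       foo:LegalValueCountry ;
      foo:legalValue "USA"^^xsd:string .

foo:LegalValueCountry_2
      a       foo:LegalValueCountry ;
      foo:legalValue "Germany"^^xsd:string .

# Hypothetical tested instance whose value is not in the reference data
foo:Location_1
      a       foo:Location ;
      foo:COUNTRY "Untied States"^^xsd:string .

Because "Untied States" does not occur as a value of foo:legalValue, it is the kind of value the rule is intended to report as illegal.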

Example 7: Unique Value Rule (OWL DL)

Task: Specify that values of a property must be unique.
Notional Example: In a location data set, the property foo:LOCID of class foo:Location must only contain unique values.
DQ-Problem: dqm:UniquenessViolation
Dimension: dqm:PropertyUniqueness
Requirement Type: dqm:PropertyRequirement

You can specify that values of a property must be unique by creating an instance of the class dqm:UniqueValueRule:

foo:UniqueValueRule_LOCID
      a       dqm:UniqueValueRule ;
      rdfs:label "Unique value rule LOCID"^^xsd:string ;
      dqm:testedClass foo:Class_Location ;
      dqm:testedProperty1 foo:Prop_Location_ID .

Click here to learn how to generate a monitoring report from this
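
For illustration, the two hypothetical instances below share the same identifier. The sketch assumes that foo:Prop_Location_ID is mapped to foo:LOCID from the notional example; the instance names and values are made up.

# Hypothetical instances, names and values assumed for illustration
foo:Location_1
      a       foo:Location ;
      foo:LOCID "L-001"^^xsd:string .

foo:Location_2
      a       foo:Location ;
      foo:LOCID "L-001"^^xsd:string .   # same value as foo:Location_1, not unique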

Example 8: Functional Dependency Value Rule (1 Condition, OWL DL)

Task: Specify that one property must have a specific value if a second property has a certain value.
Notional Example: In an address data set, the city name "New York" must always have the value "USA" for the property foo:COUNTRY.
DQ-Problem: dqm:FunctionalDependencyViolation
Dimension: dqm:SemanticAccuracy
Requirement Type: dqm:MultiPropertyRequirement

In order to specify the dependency between two property values, you must perform the following steps:

  1. Define a condition under which a specific value is always required.
  2. Create an instance of the class dqm:FuncDepValueRule, e.g. as follows:
foo:FuncDepValueRule_1
      a       dqm:FuncDepValueRule ;
      rdfs:label "Func dep value rule 1"^^xsd:string ;
      dqm:equals "USA"^^xsd:string ;
      dqm:hasCondition1 foo:Condition_New_York ;
      dqm:reqDescription "If the city value is \"New York\" then the country must be \"USA\"."^^xsd:string ;
      dqm:testedClass foo:Class_Location ;
      dqm:testedProperty1 foo:Prop_Location_Country .

Click here to learn how to generate a monitoring report from this
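
The condition foo:Condition_New_York referenced by the rule (step 1) could be defined along the lines of foo:Condition_USA from Example 2. This is only a sketch: foo:Prop_Location_City is a hypothetical mapped property for the city name and is not defined elsewhere on this page.

# foo:Prop_Location_City is assumed for illustration
foo:Condition_New_York
      a       dqm:Condition ;
      rdfs:label "Condition New York"^^xsd:string ;
      dqm:conditionalProperty foo:Prop_Location_City ;
      dqm:equals "New York"^^xsd:string .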

Example 9: Functional Dependency Value Rule (2 Conditions, OWL DL)

Task: Specify that one property must have a specific value if a second and a third property have specific values.
Notional Example: In an address data set, the city "New York" in the country "USA" must always have the value "NY" for the property foo:STATE.
DQ-Problem: dqm:FunctionalDependencyViolation
Dimension: dqm:SemanticAccuracy
Requirement Type: dqm:MultiPropertyRequirement

In order to specify the dependency between three property values, you must perform the following steps:

  1. Define both conditions under which a specific value is always required.
  2. Create an instance of the class dqm:FuncDepValueRule, e.g. as follows:
foo:FuncDepValueRule_2
      a       dqm:FuncDepValueRule ;
      rdfs:label "Func dep value rule 2"^^xsd:string ;
      dqm:equals "NY"^^xsd:string ;
      dqm:hasCondition1 foo:Condition_USA ;
      dqm:hasCondition2 foo:Condition_New_York ;
      dqm:reqDescription "If the city value is \"New York\" and the country value is \"USA\" then the state must be \"NY\"."^^xsd:string ;
      dqm:testedClass foo:Class_Location ;
      dqm:testedProperty1 foo:Prop_Location_State .

Click here to learn how to generate a monitoring report from this

Example 10: Expiry Rule (OWL DL)

Task: Specify that instances of a specific class expire.
Notional Example: In a product data set, the class foo:Product has instances with product offerings that expire on a certain date which is specified via the property foo:validThrough.
DQ-Problem: dqm:OutdatedInstance
Dimension: dqm:Timeliness
Requirement Type: dqm:ClassRequirement

You can specify that instances of a class have an expiry date by creating an instance of class dqm:ExpiryRule, e.g. as follows:

foo:ExpiryRule_1
      a       dqm:ExpiryRule ;
      rdfs:label "Expiry rule 1"^^xsd:string ;
      dqm:testedClass foo:Class_Product ;
      dqm:testedProperty1 foo:Prop_Product_validThrough .

Click here to learn how to generate a monitoring report from this
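
As a sketch of the data this rule targets, the hypothetical product offering below carries an expiry date in foo:validThrough. It assumes that foo:Class_Product and foo:Prop_Product_validThrough are mapped to foo:Product and foo:validThrough from the notional example, and that the expiry date is stored as xsd:dateTime; the instance name and date are made up.

# Hypothetical instance, name, date and datatype assumed for illustration
foo:Product_1
      a       foo:Product ;
      foo:validThrough "2010-12-31T23:59:59"^^xsd:dateTime .

Once the date in foo:validThrough has passed, the instance is a candidate for dqm:OutdatedInstance.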

Example 11: Update Rule (OWL DL)

Task: Specify that instances of a specific class must be updated within a specified interval.
Notional Example: In a location data set, the class foo:Location has instances with address data that carry a timestamp of the last update. The instances must not be older than 1 year, 2 months, 3 days, 5 hours, 20 minutes and 30.123 seconds.
DQ-Problem: dqm:OutdatedInstance
Dimension: dqm:Timeliness
Requirement Type: dqm:ClassRequirement

You can specify a required update interval for instances of a specific class by creating an instance of class dqm:UpdateRule, e.g. as follows:

foo:UpdateRule_Location
      a       dqm:UpdateRule ;
      rdfs:label "Update rule Location"^^xsd:string ;
      dqm:expectedUpdateInterval "P1Y2M3DT5H20M30.123S"^^xsd:duration ;
      dqm:testedClass foo:Class_Location ;
      dqm:testedProperty1 foo:Prop_Location_timestamp .

NOTE: To specify this requirement, the tested class must have a property that holds the time of the last update.

Click here to learn how to generate a monitoring report from this
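
The value of dqm:expectedUpdateInterval uses the xsd:duration format. The annotated sketch below breaks down the value from the example and shows a simpler variant; the rule name foo:UpdateRule_Location_Monthly and the 30-day interval are assumptions made for this illustration.

# "P1Y2M3DT5H20M30.123S" reads as:
#   P         start of the duration
#   1Y 2M 3D  1 year, 2 months, 3 days
#   T         start of the time part
#   5H 20M    5 hours, 20 minutes
#   30.123S   30.123 seconds

# Hypothetical variant: instances must be updated at least every 30 days
foo:UpdateRule_Location_Monthly
      a       dqm:UpdateRule ;
      rdfs:label "Update rule Location monthly"^^xsd:string ;
      dqm:expectedUpdateInterval "P30D"^^xsd:duration ;
      dqm:testedClass foo:Class_Location ;
      dqm:testedProperty1 foo:Prop_Location_timestamp .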
