Create Data Requirements

From SemWebQuality.org

Revision as of 19:09, 20 October 2011

This site is currently under construction!


What are Data Requirements?

Data requirements are prescribed directives or consensual agreements that define the content and/or structure of high-quality data instances and values. They can be stated by different individuals or groups of individuals, and may also be based on laws, standards, or other directives. Requirements from different sources may agree with or contradict each other.

Data requirements are a prerequisite for measuring data quality: they serve as a benchmark that defines the desired state of the data. In the following, we describe how you can express your data requirements with the DQM-Vocabulary.

Types of Data Requirements

Data requirements usually refer to different data items. In a table, there are usually at least four types of data items: (1) columns, (2) rows, (3) schemata, and (4) the table/spreadsheet itself.

Table to illustrate used terminology

In Semantic Web environments, we can compare columns to properties, rows to instances, schemata to ontologies, and tables to classes. Data requirements can usually be related to one of these elements. In particular, there are

  1. data requirements related to the values of a single property (column)
  2. data requirements related to the values of multiple properties within an instance (multiple columns in a row)
  3. data requirements related to the instances of a whole class (table)
  4. data requirements related to the ontology elements (schema)

With the DQM-Vocabulary, you can model the first three types of requirements. Schema/ontology requirements are currently not part of the vocabulary, but may be added in future releases. In the following, we explain how Property-, Multi-Property-, Class-, and Custom-Requirements can be modelled with the current version of the DQM-Vocabulary.

Define Tested Elements

Before you can use your data with the DQM-Vocabulary, you have to declare which elements of your ontology shall be tested. There are two options, which differ in their impact on the decidability of reasoning over your knowledge base:

Option 1: Classes and Properties as Instances (OWL Full)

Classes and properties that shall be tested for data requirement violations are defined as direct instances of the classes dqm:TestedClass or dqm:TestedProperty.

foo:MyClass a dqm:TestedClass .
foo:MyProperty a dqm:TestedProperty .

Attention: This will make your knowledge base OWL Full, which may be unsuitable if you plan to use reasoning, because reasoning over OWL Full is not decidable.

Option 2: Mapping of Classes and Properties to new URIs (OWL DL)

Classes and properties that shall be tested for data requirement violations are mapped to new instances of the classes dqm:TestedClass and dqm:TestedProperty.

foo:Class_1 a dqm:TestedClass ;
                dqm:hasURI "http://www.example.org/MyClass"^^xsd:anyURI .
foo:Property_1 a dqm:TestedProperty ;
                dqm:hasURI "http://www.example.org/MyProperty"^^xsd:anyURI .

Examples of Data Requirements

Sample Dataset used in Examples

The following examples use classes and properties from our sample dataset as instances of dqm:TestedClass and dqm:TestedProperty. Requirements specified in OWL DL use the mapped instances, while requirements specified in OWL Full use the original classes and properties. The dataset contains the following classes:

Original Class    Mapped Instance
foo:Location      foo:Class_Location

...with the following datatype properties:

Original Property    Mapped Instance
foo:LOCID            foo:Prop_Location_ID
foo:STREET           foo:Prop_Location_Street
foo:STREETNO         foo:Prop_Location_Streetno
foo:ZIP              foo:Prop_Location_ZIP
foo:CITY             foo:Prop_Location_City
foo:COUNTRY          foo:Prop_Location_Country
foo:STATE            foo:Prop_Location_State

To apply the examples to your own data, replace the sample classes and properties used in the data requirements with your own.

Type 1: Property Requirements

Property requirements are data requirements that are related to values of a single property. The DQM-Vocabulary provides the following property requirements:


Example 1: PropertyCompletenessRule (Minimal Input)

A property completeness rule is a data requirement that specifies that a certain property and/or its value must exist in all instances of a certain class.

If you defined your data elements in OWL Full (Option 1), then you can simply use the URIs of your ontology in the definition of the Property Completeness Rule as follows:


Definition in OWL Full

foo:PropertyCompletenessRule_1
      a       dqm:PropertyCompletenessRule ;
      dqm:testedClass <http://www.example.org/MyClass> ;
      dqm:testedProperty1 <http://www.example.org/MyProperty> ;
      dqm:requiredProperty "true"^^xsd:boolean ;
      dqm:requiredValue "true"^^xsd:boolean .

Click here to learn how to generate a problem report from this

The property dqm:requiredProperty specifies that the property "MyProperty" must exist in each instance. The property dqm:requiredValue specifies that a value must exist for property "MyProperty".

If you mapped your own ontology elements to new URIs (Option 2, OWL DL), then the following example will help you to define a Property Completeness Rule:


Definition in OWL-DL

foo:PropertyCompletenessRule_1
      a       dqm:PropertyCompletenessRule ;
      dqm:testedClass foo:Class_1 ;
      dqm:testedProperty1 foo:Property_1 ;
      dqm:requiredProperty "true"^^xsd:boolean ;
      dqm:requiredValue "true"^^xsd:boolean .

Click here to learn how to generate a problem report from this


The property dqm:requiredProperty specifies that the property "MyProperty", which is mapped to "foo:Property_1", must exist in each instance of the class "MyClass", which is mapped to "foo:Class_1". The property dqm:requiredValue specifies that a value must also exist for the property "foo:Property_1".

Congratulations, you can now use generic SPARQL queries to test the completeness of "MyProperty" / "foo:Property_1" in instances of "MyClass" / "foo:Class_1".
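
To illustrate what a completeness rule asks a checker to verify, here is a minimal Python sketch. It is not the SPARQL-based mechanism of the DQM tooling: instances are modelled as plain dictionaries, and the function name completeness_violations and the sample data are assumptions made for this illustration only.

```python
def completeness_violations(instances, prop,
                            required_property=True, required_value=True):
    """Return the IDs of instances that violate a property completeness rule.

    instances maps an instance ID to a dict of property -> value
    (a hypothetical stand-in for RDF instances of the tested class).
    """
    violations = []
    for instance_id, props in instances.items():
        if prop not in props:
            # A missing property only counts when required_property is set.
            if required_property:
                violations.append(instance_id)
            continue
        value = props[prop]
        # An existing property with an empty value violates required_value.
        if required_value and (value is None or str(value).strip() == ""):
            violations.append(instance_id)
    return violations


locations = {
    "loc1": {"COUNTRY": "USA"},
    "loc2": {"COUNTRY": ""},      # property exists, value missing
    "loc3": {"CITY": "Berlin"},   # property missing
}
print(completeness_violations(locations, "COUNTRY"))  # ['loc2', 'loc3']
```

The two flags mirror dqm:requiredProperty and dqm:requiredValue; how the actual queries treat the combination of both flags is not specified here, so the handling above is one possible reading.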

Example 2: Conditional Property Completeness Rule

Requirement Description

In this example, we define that the property foo:STATE must exist and have a value in all instances of the class foo:Location that have the value "USA" for the property foo:COUNTRY.

Procedure

To define this data requirement, you must perform the following two steps:

1. Define the Condition

foo:Condition_USA
      a       dqm:Condition ;
      rdfs:label "Condition USA"^^xsd:string ;
      dqm:conditionalProperty foo:Prop_Location_Country ;
      dqm:equals "USA"^^xsd:string .

2. Define the Conditional Property Completeness Rule

foo:ConditionalPropertyCompletenessRule_State
      a       dqm:ConditionalPropertyCompletenessRule ;
      rdfs:label "Conditional property completeness rule State"^^xsd:string ;
      dqm:hasCondition1 foo:Condition_USA ;
      dqm:requiredProperty "true"^^xsd:boolean ;
      dqm:requiredValue "true"^^xsd:boolean ;
      dqm:testedClass foo:Class_Location ;
      dqm:testedProperty1 foo:Prop_Location_State .

Click here to learn how to generate a problem report from this
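
The interplay of condition and rule can be sketched in Python as follows. This is an illustration of the intended semantics under simplifying assumptions, not the DQM implementation: instances are plain dictionaries, and the names matches and conditional_completeness_violations are hypothetical.

```python
def matches(condition, props):
    """A condition holds when the conditional property equals the given value."""
    return props.get(condition["property"]) == condition["equals"]


def conditional_completeness_violations(instances, condition, prop):
    """IDs of instances that satisfy the condition but lack a value for prop."""
    return [
        instance_id
        for instance_id, props in instances.items()
        if matches(condition, props) and not props.get(prop)
    ]


# Plays the role of foo:Condition_USA in this sketch.
condition_usa = {"property": "COUNTRY", "equals": "USA"}

locations = {
    "loc1": {"COUNTRY": "USA", "STATE": "NY"},
    "loc2": {"COUNTRY": "USA"},        # violation: state missing
    "loc3": {"COUNTRY": "Germany"},    # rule does not apply
}
print(conditional_completeness_violations(locations, condition_usa, "STATE"))
# ['loc2']
```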

Example 2: PropertyCompletenessRule with Requirement-Metadata

You can annotate your data requirements with additional metadata, such as information about their provenance, their task dependency, a natural-language description, and how the requirement shall be used. The example below makes extensive use of the requirement metadata offered by the DQM-Vocabulary.

foo:PropertyCompletenessRule_1
      a       dqm:PropertyCompletenessRule ;
      dqm:testedClass foo:Class_Location ;
      dqm:testedProperty1 foo:Prop_Location_Country ;
      dqm:requiredProperty "true"^^xsd:boolean ;
      dqm:requiredValue "true"^^xsd:boolean ;
      dqm:reqName "Country Completeness in Class Location"^^xsd:string ;
      dqm:reqDescription "Each instance of the class \"Location\" must have a property value for the property \"Country\""^^xsd:string ;
      dqm:reqSource "Christian Fürber"^^xsd:string ;
      dqm:taskDependent "false"^^xsd:boolean ;
      dqm:assessment "true"^^xsd:boolean ;
      dqm:confidence "80"^^rdfs:Literal ;
      dqm:filtering "true"^^xsd:boolean ;
      dqm:validation "true"^^xsd:boolean ;
      dqm:importance "3" ;
      dqm:lastModified "2011-10-10T18:20:55.106+01:00"^^xsd:dateTime ;
      dqm:validFrom "2011-10-10T18:19:32.917+01:00"^^xsd:dateTime ;
      dqm:validUntil "2012-10-10T18:19:57.191+01:00"^^xsd:dateTime .

Example 3: Syntax Rule (OWL DL)

Defines that a value of the property mapped to "foo:Prop_Location_ZIP" must consist of exactly five digits.

foo:SyntaxRule_ZIP
      a       dqm:SyntaxRule ;
      rdfs:label "Syntax rule ZIP"^^xsd:string ;
      dqm:regex "^[0-9]{5}$"^^xsd:string ;
      dqm:testedClass foo:Class_Location ;
      dqm:testedProperty1 foo:Prop_Location_ZIP .

Click here to learn how to generate a problem report from this
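
To see which values such a syntax rule would flag, the regular expression can be tried out in isolation. The following Python sketch uses the same expression as dqm:regex above; the function name syntax_violations and the sample values are assumptions for illustration, and regex dialects may differ slightly between Python and a SPARQL engine.

```python
import re

# Same expression as in the dqm:regex value of the rule above.
ZIP_PATTERN = re.compile(r"^[0-9]{5}$")


def syntax_violations(values, pattern):
    """Return the values that do not match the pattern as a whole."""
    return [v for v in values if pattern.fullmatch(v) is None]


zips = ["10001", "1000", "1000A", "100012"]
print(syntax_violations(zips, ZIP_PATTERN))  # ['1000', '1000A', '100012']
```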

Example 4: Conditional Syntax Rule (OWL DL)

Scenario: Identify values of the property foo:STATE that do not consist of exactly two uppercase letters. The rule applies only to instances with the value "USA" for the property foo:COUNTRY.

1. Define the Condition

foo:Condition_USA
      a       dqm:Condition ;
      rdfs:label "Condition USA"^^xsd:string ;
      dqm:conditionalProperty foo:Prop_Location_Country ;
      dqm:equals "USA"^^xsd:string .

2. Define the Conditional Syntax Rule

foo:ConditionalSyntaxRule_State
      a       dqm:ConditionalSyntaxRule ;
      rdfs:label "Conditional syntax rule State"^^xsd:string ;
      dqm:hasCondition1 foo:Condition_USA ;
      dqm:regex "^[A-Z]{2}$"^^xsd:string ;
      dqm:testedClass foo:Class_Location ;
      dqm:testedProperty1 foo:Prop_Location_State .
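
A conditional syntax rule combines both ideas: the pattern is only checked for instances that meet the condition. The Python sketch below illustrates this under simplifying assumptions (instances as dictionaries, hypothetical function name conditional_syntax_violations); missing values are skipped on the assumption that they are the concern of a completeness rule.

```python
import re


def conditional_syntax_violations(instances, cond_prop, cond_value, prop, regex):
    """IDs of instances that meet the condition but whose value breaks the regex."""
    pattern = re.compile(regex)
    violations = []
    for instance_id, props in instances.items():
        if props.get(cond_prop) != cond_value:
            continue  # condition not met, rule does not apply
        value = props.get(prop)
        # Missing values are left to a completeness rule in this sketch.
        if value is not None and pattern.fullmatch(value) is None:
            violations.append(instance_id)
    return violations


locations = {
    "loc1": {"COUNTRY": "USA", "STATE": "NY"},
    "loc2": {"COUNTRY": "USA", "STATE": "New York"},    # violation
    "loc3": {"COUNTRY": "Germany", "STATE": "Bavaria"}, # rule does not apply
    "loc4": {"COUNTRY": "USA", "STATE": "ny"},          # violation: lowercase
}
print(conditional_syntax_violations(
    locations, "COUNTRY", "USA", "STATE", r"^[A-Z]{2}$"))  # ['loc2', 'loc4']
```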

Example 5: Legal Value Range Rule (OWL DL)

Task: Specify valid value ranges for properties that hold numeric values.
DQ-Problem: dqm:OutOfRangeValue
Dimension: dqm:SyntacticAccuracy

You can specify a legal value range for a property by creating an instance of the class dqm:LegalValueRangeRule, e.g. as follows:

foo:LegalValueRangeRule_Price
      a       dqm:LegalValueRangeRule ;
      rdfs:label "Legal value range rule Price"^^xsd:string ;
      dqm:lowerLimit "0.00"^^xsd:float ;
      dqm:testedClass foo:Class_Product ;
      dqm:testedProperty1 foo:Prop_Product_Price .

The class dqm:LegalValueRangeRule has special properties, such as dqm:lowerLimit used above, to specify the lowest and/or highest allowed value.
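
The effect of a lower and/or upper limit can be sketched in Python as follows. The function name range_violations, the parameter name upper_limit, and the choice to treat both limits as inclusive are assumptions of this sketch, not statements about the DQM-Vocabulary.

```python
def range_violations(values, lower_limit=None, upper_limit=None):
    """Return the values outside the allowed range.

    Either limit may be omitted; both are treated as inclusive here,
    which is an assumption of this sketch.
    """
    violations = []
    for v in values:
        below = lower_limit is not None and v < lower_limit
        above = upper_limit is not None and v > upper_limit
        if below or above:
            violations.append(v)
    return violations


prices = [19.99, -5.0, 0.0, 1200.0]
print(range_violations(prices, lower_limit=0.0))                      # [-5.0]
print(range_violations(prices, lower_limit=0.0, upper_limit=1000.0))  # [-5.0, 1200.0]
```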

Example 6: Legal Value Rule (OWL DL)

Task: Specify a reference property that holds the allowed values.
DQ-Problem: dqm:IllegalValue
Dimension: dqm:SyntacticAccuracy

In order to specify legal values for a specific property, you need to perform the following steps:

  1. Create a reference data set or use an existing data set that holds the legal values
  2. Specify the class and property which hold the legal values as dqm:TrustedClass and dqm:TrustedProperty
  3. Create an instance of dqm:LegalValueRule, e.g. as follows:

foo:LegalValueRule_Country
      a       dqm:LegalValueRule ;
      rdfs:label "Legal value rule Country"^^xsd:string ;
      dqm:referenceClass foo:TrustedClass_LegalValueCountry ;
      dqm:referenceProperty1 foo:TrustedProperty_LegalValue ;
      dqm:testedClass foo:Class_Location ;
      dqm:testedProperty1 foo:Prop_Location_Country .
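
Conceptually, a legal value rule compares the tested values against the reference values. A minimal Python sketch, assuming both have already been read into plain lists (the function name illegal_values and the sample data are hypothetical):

```python
def illegal_values(tested_values, legal_values):
    """Return the distinct tested values that do not occur in the reference set."""
    legal = set(legal_values)
    return sorted({v for v in tested_values if v not in legal})


# Stand-in for the values of the trusted reference property.
legal_countries = ["USA", "Germany", "France"]
# Stand-in for the values of the tested property.
countries = ["USA", "Germny", "France", "U.S.A.", "Germny"]

print(illegal_values(countries, legal_countries))  # ['Germny', 'U.S.A.']
```

The comparison here is an exact, case-sensitive string match; whether the actual queries normalise values before comparing is not covered by this sketch.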

Example 7: Unique Value Rule (OWL DL)

Task: Specify that values of a property must be unique.
DQ-Problem: dqm:UniquenessViolation
Dimension: dqm:PropertyUniqueness

You can specify that values of a property must be unique by creating an instance of the class dqm:UniqueValueRule:

foo:UniqueValueRule_LOCID
      a       dqm:UniqueValueRule ;
      rdfs:label "Unique value rule LOCID"^^xsd:string ;
      dqm:testedClass foo:Class_Location ;
      dqm:testedProperty1 foo:Prop_Location_ID .
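
What counts as a uniqueness violation can be illustrated with a short Python sketch that counts how often each value occurs. The function name duplicate_values and the sample IDs are assumptions for illustration.

```python
from collections import Counter


def duplicate_values(values):
    """Return each value that occurs more than once, with its number of occurrences."""
    counts = Counter(values)
    return {value: n for value, n in counts.items() if n > 1}


location_ids = ["L1", "L2", "L3", "L2", "L1", "L2"]
print(duplicate_values(location_ids))  # {'L1': 2, 'L2': 3}
```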

Example 8: Functional Dependency Value Rule (1 Condition, OWL DL)

Task: Specify that one property must take a specific value if a second property has a certain value.
Notional Example: In an address database, instances with the city name "New York" must always have the value "USA" for the property foo:COUNTRY.
DQ-Problem: dqm:FunctionalDependencyViolation
Dimension: dqm:SemanticAccuracy

In order to specify the dependency between two property values, you must perform the following steps:

  1. Define a condition under which a specific value is always required (here foo:Condition_New_York, defined analogously to foo:Condition_USA above).
  2. Create an instance of the class dqm:FuncDepValueRule, e.g. as follows:

foo:FuncDepValueRule_1
      a       dqm:FuncDepValueRule ;
      rdfs:label "Func dep value rule 1"^^xsd:string ;
      dqm:equals "USA"^^xsd:string ;
      dqm:hasCondition1 foo:Condition_New_York ;
      dqm:reqDescription "If the city value is \"New York\" then the country must be \"USA\"."^^xsd:string ;
      dqm:testedClass foo:Class_Location ;
      dqm:testedProperty1 foo:Prop_Location_Country .


Example 9: Functional Dependency Value Rule (2 Conditions, OWL DL)

Task: Specify that one property must take a specific value if a second and a third property have specific values.
Notional Example: In an address database, instances with the city "New York" in the country "USA" must always have the value "NY" for the property foo:STATE.
DQ-Problem: dqm:FunctionalDependencyViolation
Dimension: dqm:SemanticAccuracy

In order to specify the dependency between three property values, you must perform the following steps:

  1. Define both conditions under which a specific value is always required (here foo:Condition_USA and foo:Condition_New_York).
  2. Create an instance of the class dqm:FuncDepValueRule, e.g. as follows:

foo:FuncDepValueRule_2
      a       dqm:FuncDepValueRule ;
      rdfs:label "Func dep value rule 2"^^xsd:string ;
      dqm:equals "NY"^^xsd:string ;
      dqm:hasCondition1 foo:Condition_USA ;
      dqm:hasCondition2 foo:Condition_New_York ;
      dqm:reqDescription "If the city value is \"New York\" and the country value is \"USA\" then the state must be \"NY\"."^^xsd:string ;
      dqm:testedClass foo:Class_Location ;
      dqm:testedProperty1 foo:Prop_Location_State .
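
The semantics of a functional dependency value rule with several conditions can be sketched in Python as follows. This is an illustration only: instances are plain dictionaries, conditions are (property, value) pairs, and the function name func_dep_violations is hypothetical. With a single condition, the same function covers the case of Example 8.

```python
def func_dep_violations(instances, conditions, prop, expected):
    """IDs of instances that meet all conditions but lack the expected value.

    conditions is a list of (property, value) pairs that must all hold.
    A missing value for prop also counts as a violation in this sketch.
    """
    violations = []
    for instance_id, props in instances.items():
        applies = all(props.get(p) == v for p, v in conditions)
        if applies and props.get(prop) != expected:
            violations.append(instance_id)
    return violations


locations = {
    "loc1": {"CITY": "New York", "COUNTRY": "USA", "STATE": "NY"},
    "loc2": {"CITY": "New York", "COUNTRY": "USA", "STATE": "NJ"},          # violation
    "loc3": {"CITY": "New York", "COUNTRY": "UK", "STATE": "Lincolnshire"}, # rule does not apply
}
conditions = [("COUNTRY", "USA"), ("CITY", "New York")]
print(func_dep_violations(locations, conditions, "STATE", "NY"))  # ['loc2']
```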

Type 2: Class Requirements

Type 3: Multi-Property Requirements

Type 4: Custom Requirements
