Create Data Requirements

From SemWebQuality.org
Revision as of 12:28, 28 March 2012 by Cfuerber

What are Data Requirements?

Data requirements are prescribed directives or consensual agreements that define the content and/or structure that constitute high-quality data instances and values. They can be stated by different individuals or groups of individuals. They may also be based on laws, standards, or other directives, and they may agree with or contradict each other.

Data requirements are a prerequisite for measuring data quality: they serve as the benchmark that defines the desired state of the data. In the following, we describe how you can express your data requirements with the DQM-Vocabulary.

Types of Data Requirements

Data requirements usually refer to different data items. When we look at a table, we usually find at least four types of data items: (1) columns, (2) rows, (3) schemata, and (4) the table/spreadsheet itself.

[Figure: table illustrating the terminology used]

In Semantic Web environments, we can compare columns to properties, rows to instances, schemata to ontologies, and tables to classes. Data requirements can usually be related to one of these elements. In particular, there are

  1. data requirements related to the values of a single property (column)
  2. data requirements related to the values of multiple properties within an instance (multiple columns in a row)
  3. data requirements related to the instances of a whole class (table)
  4. data requirements related to the ontology elements (schema)

With the DQM-Vocabulary, you can model the first three types of requirements. Schema/ontology requirements are currently not part of the vocabulary, but may be added in future releases. In the following, we explain how Property-, Multi-Property-, Class-, and Custom-Requirements can be modelled with the current version of the DQM-Vocabulary.

Define Tested Elements

Before you can use your data with the DQM-Vocabulary, you have to declare which elements of your ontology shall be used with it. There are two options, which differ in their impact on the decidability of reasoning over your knowledge base:

Design Option 1: Classes and Properties as Instances (OWL Full)

Classes and properties that shall be tested for data requirement violations are defined as direct instances of the classes dqm:TestedClass or dqm:TestedProperty.

foo:MyClass a dqm:TestedClass .
foo:MyProperty a dqm:TestedProperty .

Attention: This makes your knowledge base OWL Full, for which reasoning is undecidable in general, so this option may be unsuitable if you plan to use reasoning.

Design Option 2: Mapping of Classes and Properties to new URIs (OWL DL)

Classes and properties that shall be tested for data requirement violations are mapped to new instances of the classes dqm:TestedClass and dqm:TestedProperty.

foo:Class_1 a dqm:TestedClass ;
                dqm:hasURI "http://www.example.org/MyClass"^^xsd:anyURI .
foo:Property_1 a dqm:TestedProperty ;
                dqm:hasURI "http://www.example.org/MyProperty"^^xsd:anyURI .

Sample Dataset used in Examples

The following examples use classes and properties from our sample dataset as instances of dqm:TestedClass and dqm:TestedProperty. Requirements specified in OWL DL use the mapped instances, while requirements specified in OWL Full use the original classes and properties. The dataset contains the following classes:

Original Class      Mapped Instance
foo:Location        foo:Class_Location

...with the following datatype properties:

Original Property   Mapped Instance
foo:LOCID           foo:Prop_Location_ID
foo:STREET          foo:Prop_Location_Street
foo:STREETNO        foo:Prop_Location_Streetno
foo:ZIP             foo:Prop_Location_ZIP
foo:CITY            foo:Prop_Location_City
foo:COUNTRY         foo:Prop_Location_Country
foo:STATE           foo:Prop_Location_State
foo:validThrough    foo:Prop_validThrough

To apply the examples to your own data, replace the sample classes and properties used in the data requirements with your own.

Syntax Of Examples

The following examples show instance data in Turtle/Notation 3 syntax.

Examples of Data Requirements

Example 1: Property Completeness Rule

Task: Specify that a specific property and/or its values must exist for all instances of a specific class.
Notional Example: In a location data set, the property foo:COUNTRY must exist and have a value in all instances of the class foo:Location.
DQ-Problem: dqm:MissingPropertyAndValue, dqm:MissingProperty, dqm:MissingValue
Dimension: dqm:PropertyCompleteness
Requirement Type: dqm:MultiPropertyRequirement

If you defined your data elements in OWL Full (Option 1), then you can simply use the URIs of your ontology in the definition of the Property Completeness Rule as follows:


Definition in OWL Full

foo:PropertyCompletenessRule_1
      a       dqm:PropertyCompletenessRule ;
      dqm:testedClass <http://www.example.org/MyClass> ;
      dqm:testedProperty1 <http://www.example.org/MyProperty> ;
      dqm:requiredProperty "true"^^xsd:boolean ;
      dqm:requiredValue "true"^^xsd:boolean .

Click here to learn how to generate a monitoring report from this

The property dqm:requiredProperty specifies that the property "MyProperty" must exist in each instance. The property dqm:requiredValue specifies that a value must exist for property "MyProperty".

If you mapped your own ontology elements to new URIs (Option 2, OWL DL), then the following example will help you to define a Property Completeness Rule:


Definition in OWL-DL

foo:PropertyCompletenessRule_1
      a       dqm:PropertyCompletenessRule ;
      dqm:testedClass foo:Class_1 ;
      dqm:testedProperty1 foo:Property_1 ;
      dqm:requiredProperty "true"^^xsd:boolean ;
      dqm:requiredValue "true"^^xsd:boolean .

The property dqm:requiredProperty specifies that the property "MyProperty", which is mapped to "foo:Property_1", must exist in each instance of the class "MyClass", which is mapped to "foo:Class_1". The property dqm:requiredValue specifies that a value must also exist for the property "foo:Property_1".
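
To make the intended semantics of the two flags concrete, here is a minimal Python sketch. It is only an illustration and not how the DQM-Vocabulary evaluates rules: instances are modelled as plain dictionaries, and the function name completeness_violations and its parameters are hypothetical.

```python
# Illustrative only: instances are plain dicts, not RDF resources.
locations = [
    {"LOCID": "1", "COUNTRY": "USA"},
    {"LOCID": "2", "COUNTRY": ""},   # property present, value missing
    {"LOCID": "3"},                  # property missing entirely
]

def completeness_violations(instances, prop,
                            required_property=True, required_value=True):
    """Return (index, problem) pairs for instances that break the rule."""
    violations = []
    for i, inst in enumerate(instances):
        if prop not in inst:
            if required_property:
                violations.append((i, "MissingProperty"))
        elif required_value and inst[prop] in ("", None):
            violations.append((i, "MissingValue"))
    return violations

print(completeness_violations(locations, "COUNTRY"))
# prints [(1, 'MissingValue'), (2, 'MissingProperty')]
```

The two keyword arguments mirror dqm:requiredProperty and dqm:requiredValue, so setting one of them to False relaxes the corresponding part of the check.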

Example 2: Conditional Property Completeness Rule (1 Condition, OWL DL)

Task: Specify that a specific property and/or its values must exist if another property has a specific value.
Notional Example: In a location data set, the property foo:STATE must exist and have a value in all instances of the class foo:Location that have value "USA" for the property foo:COUNTRY.
DQ-Problem: dqm:MissingPropertyAndValue
Dimension: dqm:PropertyCompleteness
Requirement Type: dqm:MultiPropertyRequirement

To define this data requirement, you must perform the following two steps:

1. Define the Condition

foo:Condition_USA
      a       dqm:Condition ;
      rdfs:label "Condition USA"^^xsd:string ;
      dqm:conditionalProperty foo:Prop_Location_Country ;
      dqm:equals "USA"^^xsd:string .

2. Define the Conditional Property Completeness Rule

foo:ConditionalPropertyCompletenessRule_State
      a       dqm:ConditionalPropertyCompletenessRule ;
      rdfs:label "Conditional property completeness rule State"^^xsd:string ;
      dqm:hasCondition1 foo:Condition_USA ;
      dqm:requiredProperty "true"^^xsd:boolean ;
      dqm:requiredValue "true"^^xsd:boolean ;
      dqm:testedClass foo:Class_Location ;
      dqm:testedProperty1 foo:Prop_Location_State .
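
The following Python sketch illustrates what such a conditional rule is meant to express. It assumes instances are plain dictionaries; the function conditional_completeness_violations is a hypothetical helper, not part of the DQM-Vocabulary.

```python
# Illustrative only: the condition (COUNTRY == "USA") gates the check.
locations = [
    {"COUNTRY": "USA", "STATE": "NY"},
    {"COUNTRY": "USA"},        # violation: STATE is missing for a US location
    {"COUNTRY": "Germany"},    # condition not met, so STATE is not required
]

def conditional_completeness_violations(instances, cond_prop, cond_value,
                                        tested_prop):
    """Return indexes of instances that meet the condition but lack a value."""
    return [
        i for i, inst in enumerate(instances)
        if inst.get(cond_prop) == cond_value and not inst.get(tested_prop)
    ]

print(conditional_completeness_violations(locations, "COUNTRY", "USA", "STATE"))
# prints [1]
```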

Example 3: Syntax Rule (OWL DL)

Task: Specify that values of a specific property must follow a specific syntax.
Notional Example: In a location data set, the property foo:ZIP must contain values with exactly five digits.
DQ-Problem: dqm:SyntaxViolation
Dimension: dqm:SyntacticAccuracy
Requirement Type: dqm:PropertyRequirement

You can specify syntax requirements by creating an instance of the class dqm:SyntaxRule, e.g. as follows:

foo:SyntaxRule_ZIP
      a       dqm:SyntaxRule ;
      rdfs:label "Syntax rule ZIP"^^xsd:string ;
      dqm:regex "^[0-9]{5}$"^^xsd:string ;
      dqm:testedClass foo:Class_Location ;
      dqm:testedProperty1 foo:Prop_Location_ZIP .
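
Before you store a pattern in dqm:regex, it can help to try it against a few sample values. The sketch below does this with Python's re module; the helper name syntax_violations is hypothetical, and Python's regular expression dialect may differ in details from the one used by your rule engine.

```python
import re

# Same pattern as in the dqm:regex value of the rule.
ZIP_PATTERN = re.compile(r"^[0-9]{5}$")

def syntax_violations(values, pattern):
    """Return the values that do not match the pattern as a whole."""
    return [v for v in values if not pattern.fullmatch(v)]

print(syntax_violations(["12345", "1234", "12345-6789", "ABCDE"], ZIP_PATTERN))
# prints ['1234', '12345-6789', 'ABCDE']
```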

Example 4: Conditional Syntax Rule (1 Condition, OWL DL)

Task: Specify that values of a specific property must follow a specific syntax if another property has a specific value.
Notional Example: In a location data set, the property foo:STATE must contain a value with two letters if the property foo:COUNTRY has the value "USA".
DQ-Problem: dqm:SyntaxViolation
Dimension: dqm:SyntacticAccuracy
Requirement Type: dqm:MultiPropertyRequirement

In order to specify a conditional syntax rule, you must perform the following steps:

1. Define the Condition

foo:Condition_USA
      a       dqm:Condition ;
      rdfs:label "Condition USA"^^xsd:string ;
      dqm:conditionalProperty foo:Prop_Location_Country ;
      dqm:equals "USA"^^xsd:string .

2. Define the Conditional Syntax Rule

foo:ConditionalSyntaxRule_State
      a       dqm:ConditionalSyntaxRule ;
      rdfs:label "Conditional syntax rule State"^^xsd:string ;
      dqm:hasCondition1 foo:Condition_USA ;
      dqm:regex "^[A-Z]{2}$"^^xsd:string ;
      dqm:testedClass foo:Class_Location ;
      dqm:testedProperty1 foo:Prop_Location_State .
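
As a rough illustration of how the condition and the pattern interact, the following Python sketch applies the pattern only to instances that satisfy the condition. The dictionary layout and the function conditional_syntax_violations are assumptions made for this example.

```python
import re

STATE_PATTERN = re.compile(r"^[A-Z]{2}$")

locations = [
    {"COUNTRY": "USA", "STATE": "NY"},
    {"COUNTRY": "USA", "STATE": "New York"},    # violation: not two letters
    {"COUNTRY": "Germany", "STATE": "Bayern"},  # condition not met, not checked
]

def conditional_syntax_violations(instances, cond_prop, cond_value,
                                  tested_prop, pattern):
    """Return indexes of instances that meet the condition but break the syntax."""
    return [
        i for i, inst in enumerate(instances)
        if inst.get(cond_prop) == cond_value
        and tested_prop in inst
        and not pattern.fullmatch(inst[tested_prop])
    ]

print(conditional_syntax_violations(
    locations, "COUNTRY", "USA", "STATE", STATE_PATTERN))
# prints [1]
```

Instances without the tested property are skipped here on purpose; a missing value is a completeness problem (see Example 2) rather than a syntax problem.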

Example 5: Legal Value Range Rule (OWL DL)

Task: Specify valid value ranges for properties that hold numeric values.
Notional Example: In a product data set, the property foo:PRICE can never contain negative values.
DQ-Problem: dqm:OutOfRangeValue
Dimension: dqm:SyntacticAccuracy
Requirement Type: dqm:PropertyRequirement

You can specify a legal value range for a property by creating an instance of the class dqm:LegalValueRangeRule, e.g. as follows:

foo:LegalValueRangeRule_Price
      a       dqm:LegalValueRangeRule ;
      rdfs:label "Legal value range rule Price"^^xsd:string ;
      dqm:lowerLimit "0.00"^^xsd:float ;
      dqm:testedClass foo:Class_Product ;
      dqm:testedProperty1 foo:Prop_Product_Price .

The class dqm:LegalValueRangeRule has special properties, such as dqm:lowerLimit shown above, to specify the lowest and/or highest allowed value.
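
The effect of optional lower and upper bounds can be sketched in Python as follows. The function out_of_range and its parameter names lower_limit and upper_limit are hypothetical and chosen for this illustration only; either bound may be left unset.

```python
def out_of_range(values, lower_limit=None, upper_limit=None):
    """Return the values that fall outside the (optional) bounds."""
    return [
        v for v in values
        if (lower_limit is not None and v < lower_limit)
        or (upper_limit is not None and v > upper_limit)
    ]

prices = [19.99, 0.0, -5.0]
print(out_of_range(prices, lower_limit=0.0))
# prints [-5.0]
```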

Example 6: Legal Value Rule (OWL DL)

Task: Specify a reference property that holds the allowed values.
Notional Example: In a location data set, the property foo:COUNTRY can only contain values of the trusted property foo:legalValue in the trusted class foo:LegalValueCountry.
DQ-Problem: dqm:IllegalValue
Dimension: dqm:SyntacticAccuracy
Requirement Type: dqm:PropertyRequirement

In order to specify legal values for a specific property, you need to perform the following steps:

  1. Create a reference data set or use an existing data set that holds the legal values
  2. Specify the class and property which hold the legal values as dqm:TrustedClass and dqm:TrustedProperty
  3. Create an instance of dqm:LegalValueRule, e.g. as follows:

foo:LegalValueRule_Country
      a       dqm:LegalValueRule ;
      rdfs:label "Legal value rule Country"^^xsd:string ;
      dqm:referenceClass foo:TrustedClass_LegalValueCountry ;
      dqm:referenceProperty1 foo:TrustedProperty_LegalValue ;
      dqm:testedClass foo:Class_Location ;
      dqm:testedProperty1 foo:Prop_Location_Country .
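
Conceptually, a legal value rule is a membership test against the trusted reference values. The Python sketch below shows the idea with a hard-coded set; in practice the legal values would come from the trusted class and property. The names legal_countries and illegal_values are assumptions made for this example.

```python
# Stand-in for the values of the trusted reference property.
legal_countries = {"USA", "Germany", "France"}

def illegal_values(values, legal):
    """Return the values that do not occur in the reference set."""
    return [v for v in values if v not in legal]

print(illegal_values(["USA", "Germny", "France"], legal_countries))
# prints ['Germny']
```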

Example 7: Unique Value Rule (OWL DL)

Task: Specify that values of a property must be unique.
Notional Example: In a location data set, the property foo:LOCID of class foo:Location must only contain unique values.
DQ-Problem: dqm:UniquenessViolation
Dimension: dqm:PropertyUniqueness
Requirement Type: dqm:PropertyRequirement

You can specify that values of a property must be unique by creating an instance of the class dqm:UniqueValueRule:

foo:UniqueValueRule_LOCID
      a       dqm:UniqueValueRule ;
      rdfs:label "Unique value rule LOCID"^^xsd:string ;
      dqm:testedClass foo:Class_Location ;
      dqm:testedProperty1 foo:Prop_Location_ID .
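
A uniqueness check amounts to counting how often each value occurs. The following Python sketch illustrates this; duplicate_values is a hypothetical helper and not part of the DQM-Vocabulary.

```python
from collections import Counter

def duplicate_values(values):
    """Return the values that occur more than once, in sorted order."""
    counts = Counter(values)
    return sorted(v for v, n in counts.items() if n > 1)

print(duplicate_values(["L1", "L2", "L1", "L3"]))
# prints ['L1']
```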

Example 8: Functional Dependency Value Rule (1 Condition, OWL DL)

Task: Specify that one property must have a specific value if a second property has a certain value.
Notional Example: In an address data set, instances with the city name "New York" must always have the value "USA" for the property foo:COUNTRY.
DQ-Problem: dqm:FunctionalDependencyViolation
Dimension: dqm:SemanticAccuracy
Requirement Type: dqm:MultiPropertyRequirement

In order to specify the dependency between two property values, you must perform the following steps:

  1. Define a condition under which a specific value is always required (foo:Condition_New_York in the example below).
  2. Create an instance of the class dqm:FuncDepValueRule, e.g. as follows:

foo:FuncDepValueRule_1
      a       dqm:FuncDepValueRule ;
      rdfs:label "Func dep value rule 1"^^xsd:string ;
      dqm:equals "USA"^^xsd:string ;
      dqm:hasCondition1 foo:Condition_New_York ;
      dqm:reqDescription "If the city value is \"New York\" then the country must be \"USA\"."^^xsd:string ;
      dqm:testedClass foo:Class_Location ;
      dqm:testedProperty1 foo:Prop_Location_Country .

Example 9: Functional Dependency Value Rule (2 Conditions, OWL DL)

Task: Specify that one property must have a specific value if a second and a third property have specific values.
Notional Example: In an address data set, instances with the city "New York" in the country "USA" must always have the value "NY" for the property foo:STATE.
DQ-Problem: dqm:FunctionalDependencyViolation
Dimension: dqm:SemanticAccuracy
Requirement Type: dqm:MultiPropertyRequirement

In order to specify the dependency between three property values, you must perform the following steps:

  1. Define both conditions under which a specific value is always required (foo:Condition_USA and foo:Condition_New_York in the example below).
  2. Create an instance of the class dqm:FuncDepValueRule, e.g. as follows:

foo:FuncDepValueRule_2
      a       dqm:FuncDepValueRule ;
      rdfs:label "Func dep value rule 2"^^xsd:string ;
      dqm:equals "NY"^^xsd:string ;
      dqm:hasCondition1 foo:Condition_USA ;
      dqm:hasCondition2 foo:Condition_New_York ;
      dqm:reqDescription "If the city value is \"New York\" and the country value is \"USA\" then the state must be \"NY\"."^^xsd:string ;
      dqm:testedClass foo:Class_Location ;
      dqm:testedProperty1 foo:Prop_Location_State .
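
The logic of a functional dependency value rule with one or two conditions can be sketched in Python as shown below. All conditions must hold before the expected value is enforced. The dictionary layout and the function func_dep_violations are assumptions made for this illustration.

```python
locations = [
    {"CITY": "New York", "COUNTRY": "USA", "STATE": "NY"},
    {"CITY": "New York", "COUNTRY": "USA", "STATE": "NJ"},           # violation
    {"CITY": "New York", "COUNTRY": "UK", "STATE": "Lincolnshire"},  # not checked
]

def func_dep_violations(instances, conditions, tested_prop, expected):
    """Return indexes of instances that meet all conditions
    but do not hold the expected value."""
    return [
        i for i, inst in enumerate(instances)
        if all(inst.get(p) == v for p, v in conditions.items())
        and inst.get(tested_prop) != expected
    ]

two_conditions = {"COUNTRY": "USA", "CITY": "New York"}
print(func_dep_violations(locations, two_conditions, "STATE", "NY"))
# prints [1]
```

Passing a dictionary with a single entry, for example {"CITY": "New York"} with the tested property "COUNTRY" and the expected value "USA", corresponds to the one-condition case of Example 8.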

Example 10: Expiry Rule (OWL DL)

Task: Specify that instances of a specific class expire.
Notional Example: In a product data set, the class foo:Product has instances with product offerings that expire on a certain date which is specified via the property foo:validThrough.
DQ-Problem: dqm:OutdatedInstance
Dimension: dqm:Timeliness
Requirement Type: dqm:ClassRequirement

You can specify that instances of a class have an expiry date by creating an instance of class dqm:ExpiryRule, e.g. as follows:

foo:ExpiryRule_1
      a       dqm:ExpiryRule ;
      rdfs:label "Expiry rule 1"^^xsd:string ;
      dqm:testedClass foo:Class_Product ;
      dqm:testedProperty1 foo:Prop_Product_validThrough .
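
The following Python sketch illustrates one plausible reading of an expiry check: an instance counts as outdated once the reference date lies after its validThrough date. This interpretation, the dictionary layout, and the function expired_instances are assumptions made for this example.

```python
from datetime import date

offers = [
    {"validThrough": date(2012, 12, 31)},
    {"validThrough": date(2011, 6, 30)},   # expired on the reference date below
]

def expired_instances(instances, valid_through_prop, today):
    """Return indexes of instances whose expiry date lies before 'today'."""
    return [
        i for i, inst in enumerate(instances)
        if inst[valid_through_prop] < today
    ]

print(expired_instances(offers, "validThrough", date(2012, 3, 28)))
# prints [1]
```

The reference date is passed in explicitly so that the check gives the same result every time it is run.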

Example 11: Update Rule (OWL DL)

Task: Specify that instances of a specific class must be updated within a specified interval.
Notional Example: In a location data set, the class foo:Location has instances with address data that carry a timestamp of the last update. The instances must not be older than 1 year, 2 months, 3 days, 5 hours, 20 minutes, and 30.123 seconds.
DQ-Problem: dqm:OutdatedInstance
Dimension: dqm:Timeliness
Requirement Type: dqm:ClassRequirement

You can specify a required update interval for instances of a specific class by creating an instance of class dqm:UpdateRule, e.g. as follows:

foo:UpdateRule_Location
      a       dqm:UpdateRule ;
      rdfs:label "Update rule Location"^^xsd:string ;
      dqm:expectedUpdateInterval "P1Y2M3DT5H20M30.123S"^^xsd:duration ;
      dqm:testedClass foo:Class_Location ;
      dqm:testedProperty1 foo:Prop_Location_timestamp .

NOTE: The tested class must have a property that holds the time of the last update in order to be able to specify this requirement.
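
To see how an xsd:duration value such as the one above translates into an age check, consider the following Python sketch. It is a rough approximation only: it treats a year as 365 days and a month as 30 days, which is an assumption of this example and not part of xsd:duration, and it does not handle negative durations. The names approx_timedelta and is_outdated are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Matches non-negative durations such as P1Y2M3DT5H20M30.123S.
DURATION = re.compile(
    r"^P(?:(\d+)Y)?(?:(\d+)M)?(?:(\d+)D)?"
    r"(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+(?:\.\d+)?)S)?)?$"
)

def approx_timedelta(duration):
    """Convert an xsd:duration string to a timedelta (approximate)."""
    match = DURATION.match(duration)
    if match is None:
        raise ValueError(f"unsupported duration: {duration!r}")
    years, months, days, hours, minutes = (
        int(g or 0) for g in match.groups()[:5]
    )
    seconds = float(match.group(6) or 0)
    # Assumption: 1 year = 365 days, 1 month = 30 days.
    return timedelta(days=365 * years + 30 * months + days,
                     hours=hours, minutes=minutes, seconds=seconds)

def is_outdated(last_update, now, interval):
    """True if the last update lies further back than the allowed interval."""
    return now - last_update > interval

interval = approx_timedelta("P1Y2M3DT5H20M30.123S")
print(is_outdated(datetime(2011, 1, 1), datetime(2012, 3, 28), interval))
# prints True
```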
