Create Data Requirements
What are Data Requirements?
Data requirements are prescribed directives or consensual agreements that define the content and/or structure that constitute high-quality data instances and values. They can be stated by different individuals or groups of individuals, and they may also be based on laws, standards, or other directives. Different requirements may agree with or contradict each other.
Data requirements are a prerequisite for measuring data quality: they serve as a benchmark that defines the desired state of data. In the following, we describe how you can express your data requirements with the DQM-Vocabulary.
Types of Data Requirements
Data requirements usually refer to different data items. When we look at a table, we usually have at least four types of data items: (1) columns, (2) rows, (3) schemata, and (4) the table/spreadsheet itself.
In Semantic Web environments, we can compare columns to properties, rows to instances, schemata to ontologies, and tables to classes. Data requirements can usually be related to one of these elements. In particular, there are:
- data requirements related to the values of a single property (column)
- data requirements related to the values of multiple properties within an instance (multiple columns in a row)
- data requirements related to the instances of a whole class (table)
- data requirements related to the ontology elements (schema)
With the DQM-Vocabulary, you can model the first three types of requirements. Schema/ontology requirements are currently not part of the vocabulary, but may be added in future releases. In the following, we explain how Property-, Multi-Property-, Class-, and Custom-Requirements can be modelled with the current version of the DQM-Vocabulary.
Define Tested Elements
Before you can use your data with the DQM-Vocabulary, you have to declare the elements of your ontology that shall be used in the DQM-Vocabulary. You have two options to do this, which differ in their impact on the decidability of reasoning over your knowledge base:
Design Option 1: Classes and Properties as Instances (OWL Full)
Classes and properties that shall be tested for data requirement violations are defined as direct instances of the classes dqm:TestedClass or dqm:TestedProperty.
foo:MyClass a dqm:TestedClass .
foo:MyProperty a dqm:TestedProperty .
Attention: This makes your knowledge base OWL Full, which may be unsuitable if you plan to use reasoning, because reasoning over OWL Full is not decidable.
Design Option 2: Mapping of Classes and Properties to new URIs (OWL DL)
Classes and properties that shall be tested for data requirement violations are mapped to new instances of the classes dqm:TestedClass and dqm:TestedProperty.
foo:Class_1 a dqm:TestedClass ;
    dqm:hasURI "http://www.example.org/MyClass"^^xsd:anyURI .
foo:Property_1 a dqm:TestedProperty ;
    dqm:hasURI "http://www.example.org/MyProperty"^^xsd:anyURI .
Sample Dataset used in Examples
The following examples use classes and properties from our sample dataset as instances of dqm:TestedClass and dqm:TestedProperty. Requirements specified in OWL DL use the mapped instances, while requirements specified in OWL Full use the original classes and properties. The data set contains the following classes:
Original Class | Mapped Instance |
---|---|
foo:Location | foo:Class_Location |
...with the following datatype properties:
Original Property | Mapped Instance |
---|---|
foo:LOCID | foo:Prop_Location_ID |
foo:STREET | foo:Prop_Location_Street |
foo:STREETNO | foo:Prop_Location_Streetno |
foo:ZIP | foo:Prop_Location_ZIP |
foo:CITY | foo:Prop_Location_City |
foo:COUNTRY | foo:Prop_Location_Country |
foo:STATE | foo:Prop_Location_State |
foo:validThrough | foo:Prop_validThrough |
To apply the examples to your own data, replace the sample classes and properties used in the data requirements with your own.
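As a sketch of how the mapped instances in the tables above could be declared, the following snippet applies the Design Option 2 pattern to the sample class and two of its properties. It assumes that foo: resolves to http://www.example.org/, as suggested by the dqm:hasURI values in the mapping examples; the dqm: namespace IRI is an assumption and should be checked against the published vocabulary.

```turtle
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
# Assumption: verify the dqm namespace IRI against the vocabulary you load.
@prefix dqm: <http://purl.org/dqm-vocabulary/v1/dqm#> .
@prefix foo: <http://www.example.org/> .

# Map the sample class to a new instance of dqm:TestedClass.
foo:Class_Location a dqm:TestedClass ;
    dqm:hasURI "http://www.example.org/Location"^^xsd:anyURI .

# Map two of the sample properties to new instances of dqm:TestedProperty.
foo:Prop_Location_Country a dqm:TestedProperty ;
    dqm:hasURI "http://www.example.org/COUNTRY"^^xsd:anyURI .

foo:Prop_Location_State a dqm:TestedProperty ;
    dqm:hasURI "http://www.example.org/STATE"^^xsd:anyURI .
```

The remaining properties of the sample dataset would be mapped in the same way.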
Syntax Of Examples
The following examples show instance data in Turtle/Notation 3 syntax.
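The snippets on this page omit namespace declarations. A minimal sketch of the prefix block they presuppose could look as follows. The rdfs: and xsd: IRIs are the standard W3C namespaces; the foo: namespace follows the http://www.example.org/ IRIs used with dqm:hasURI above; the dqm: namespace IRI is an assumption and should be verified against the published DQM-Vocabulary.

```turtle
# Standard W3C namespaces used for labels and datatypes.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
# Assumption: check this IRI against the vocabulary release you use.
@prefix dqm:  <http://purl.org/dqm-vocabulary/v1/dqm#> .
# Sample namespace, matching the example.org IRIs used in the mappings.
@prefix foo:  <http://www.example.org/> .
```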
Examples of Data Requirements
Example 1: Property Completeness Rule
Task: | Specify that a specific property and/or its values must exist for all instances of a specific class. |
Notional Example: | In a location data set, the property foo:COUNTRY must exist and have a value in all instances of the class foo:Location. |
DQ-Problem: | dqm:MissingPropertyAndValue, dqm:MissingProperty, dqm:MissingValue |
Dimension: | dqm:PropertyCompleteness |
Requirement Type: | dqm:MultiPropertyRequirement |
If you defined your data elements in OWL Full (Option 1), then you can simply use the URIs of your ontology in the definition of the Property Completeness Rule as follows:
Definition in OWL Full
foo:PropertyCompletenessRule_1 a dqm:PropertyCompletenessRule ;
    dqm:testedClass <http://www.example.org/MyClass> ;
    dqm:testedProperty1 <http://www.example.org/MyProperty> ;
    dqm:requiredProperty "true"^^xsd:boolean ;
    dqm:requiredValue "true"^^xsd:boolean .
The property dqm:requiredProperty specifies that the property "MyProperty" must exist in each instance. The property dqm:requiredValue specifies that a value must exist for property "MyProperty".
If you mapped your own ontology elements to new URIs (Option 2, OWL DL), then the following example will help you to define a Property Completeness Rule:
Definition in OWL-DL
foo:PropertyCompletenessRule_1 a dqm:PropertyCompletenessRule ;
    dqm:testedClass foo:Class_1 ;
    dqm:testedProperty1 foo:Property_1 ;
    dqm:requiredProperty "true"^^xsd:boolean ;
    dqm:requiredValue "true"^^xsd:boolean .
The property dqm:requiredProperty specifies that the property "MyProperty", which is mapped to "foo:Property_1", must exist in each instance of the class "MyClass", which is mapped to "foo:Class_1". The property dqm:requiredValue specifies that a value must also exist for the property "foo:Property_1".
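To make the rule concrete, the following sketch shows instance data for the notional example (foo:COUNTRY in foo:Location) that such a rule would be checked against. The instance names and values are invented for illustration. Whether an empty literal is reported as dqm:MissingValue is an assumption about the evaluating tool, not something this page confirms.

```turtle
@prefix foo: <http://www.example.org/> .

# Hypothetical instance that satisfies the rule: foo:COUNTRY exists and has a value.
foo:Location_1 a foo:Location ;
    foo:LOCID "1" ;
    foo:COUNTRY "USA" .

# Hypothetical violation: the property foo:COUNTRY is missing entirely.
foo:Location_2 a foo:Location ;
    foo:LOCID "2" .

# Hypothetical violation (assumed): the property exists, but its value is empty.
foo:Location_3 a foo:Location ;
    foo:LOCID "3" ;
    foo:COUNTRY "" .
```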
Example 2: Conditional Property Completeness Rule (1 Condition, OWL DL)
Task: | Specify that a specific property and/or its values must exist if another property has a specific value. |
Notional Example: | In a location data set, the property foo:STATE must exist and have a value in all instances of the class foo:Location that have value "USA" for the property foo:COUNTRY. |
DQ-Problem: | dqm:MissingPropertyAndValue |
Dimension: | dqm:PropertyCompleteness |
Requirement Type: | dqm:MultiPropertyRequirement |
To define this data requirement, you must perform the following two steps:
1. Define the Condition
foo:Condition_USA a dqm:Condition ;
    rdfs:label "Condition USA"^^xsd:string ;
    dqm:conditionalProperty foo:Prop_Location_Country ;
    dqm:equals "USA"^^xsd:string .
2. Define the Conditional Property Completeness Rule
foo:ConditionalPropertyCompletenessRule_State a dqm:ConditionalPropertyCompletenessRule ;
    rdfs:label "Conditional property completeness rule State"^^xsd:string ;
    dqm:hasCondition1 foo:Condition_USA ;
    dqm:requiredProperty "true"^^xsd:boolean ;
    dqm:requiredValue "true"^^xsd:boolean ;
    dqm:testedClass foo:Class_Location ;
    dqm:testedProperty1 foo:Prop_Location_State .
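The effect of the condition can be illustrated with a few invented instances. The instance names and values below are hypothetical, and the comments describe the outcome one would expect from the notional example rather than the verified output of a specific tool.

```turtle
@prefix foo: <http://www.example.org/> .

# Condition holds (COUNTRY is "USA") and foo:STATE has a value: no violation expected.
foo:Location_10 a foo:Location ;
    foo:COUNTRY "USA" ;
    foo:STATE "NY" .

# Condition holds, but foo:STATE is missing: expected to violate the rule.
foo:Location_11 a foo:Location ;
    foo:COUNTRY "USA" .

# Condition does not hold, so the missing foo:STATE is outside the scope of this rule.
foo:Location_12 a foo:Location ;
    foo:COUNTRY "Germany" .
```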
Example 3: Syntax Rule (OWL DL)
Task: | Specify that values of a specific property must follow a specific syntax. |
Notional Example: | In a location data set, the property foo:ZIP must contain values with exactly five digits. |
DQ-Problem: | dqm:SyntaxViolation |
Dimension: | dqm:SyntacticAccuracy |
Requirement Type: | dqm:PropertyRequirement |
You can specify syntax requirements by creating an instance of the class dqm:SyntaxRule, e.g. as follows:
foo:SyntaxRule_ZIP a dqm:SyntaxRule ;
    rdfs:label "Syntax rule ZIP"^^xsd:string ;
    dqm:regex "^[0-9]{5}$"^^xsd:string ;
    dqm:testedClass foo:Class_Location ;
    dqm:testedProperty1 foo:Prop_Location_ZIP .
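The regular expression ^[0-9]{5}$ matches only strings that consist of exactly five digits. The following invented instances sketch which values would and would not match it; the instance names and ZIP values are hypothetical.

```turtle
@prefix foo: <http://www.example.org/> .

# Matches the pattern: exactly five digits.
foo:Location_20 a foo:Location ;
    foo:ZIP "12345" .

# Does not match: only four digits.
foo:Location_21 a foo:Location ;
    foo:ZIP "1234" .

# Does not match: contains a hyphen and additional digits.
foo:Location_22 a foo:Location ;
    foo:ZIP "12345-6789" .
```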
Example 4: Conditional Syntax Rule (1 Condition, OWL DL)
Task: | Specify that values of a specific property must follow a specific syntax if another property has a specific value. |
Notional Example: | In a location data set, the property foo:STATE must contain a value with two letters if the property foo:COUNTRY has the value "USA". |
DQ-Problem: | dqm:SyntaxViolation |
Dimension: | dqm:SyntacticAccuracy |
Requirement Type: | dqm:MultiPropertyRequirement |
In order to specify a conditional syntax rule, you must perform the following steps:
1. Define the Condition
foo:Condition_USA a dqm:Condition ;
    rdfs:label "Condition USA"^^xsd:string ;
    dqm:conditionalProperty foo:Prop_Location_Country ;
    dqm:equals "USA"^^xsd:string .
2. Define the Conditional Syntax Rule
foo:ConditionalSyntaxRule_State a dqm:ConditionalSyntaxRule ;
    rdfs:label "Conditional syntax rule State"^^xsd:string ;
    dqm:hasCondition1 foo:Condition_USA ;
    dqm:regex "^[A-Z]{2}$"^^xsd:string ;
    dqm:testedClass foo:Class_Location ;
    dqm:testedProperty1 foo:Prop_Location_State .
Example 5: Legal Value Range Rule (OWL DL)
Task: | Specify valid value ranges for properties that hold numeric values. |
Notional Example: | In a product data set, the property foo:PRICE can never contain negative values. |
DQ-Problem: | dqm:OutOfRangeValue |
Dimension: | dqm:SyntacticAccuracy |
Requirement Type: | dqm:PropertyRequirement |
You can specify a legal value range for a property by creating an instance of the class dqm:LegalValueRangeRule, e.g. as follows:
foo:LegalValueRangeRule_Price a dqm:LegalValueRangeRule ;
    rdfs:label "Legal value range rule Price"^^xsd:string ;
    dqm:lowerLimit "0.00"^^xsd:float ;
    dqm:testedClass foo:Class_Product ;
    dqm:testedProperty1 foo:Prop_Product_Price .
The class dqm:LegalValueRangeRule has special properties, such as the dqm:lowerLimit used above, to specify the lowest and/or highest allowed value.
Example 6: Legal Value Rule (OWL DL)
Task: | Specify a reference property that holds the allowed values. |
Notional Example: | In a location data set, the property foo:COUNTRY can only contain values of the trusted property foo:legalValue in the trusted class foo:LegalValueCountry. |
DQ-Problem: | dqm:IllegalValue |
Dimension: | dqm:SyntacticAccuracy |
Requirement Type: | dqm:PropertyRequirement |
In order to specify legal values for a specific property, you need to perform the following steps:
- Create a reference data set or use an existing data set that holds the legal values
- Specify the class and property which hold the legal values as dqm:TrustedClass and dqm:TrustedProperty
- Create an instance of dqm:LegalValueRule, e.g. as follows:
foo:LegalValueRule_Country a dqm:LegalValueRule ;
    rdfs:label "Legal value rule Country"^^xsd:string ;
    dqm:referenceClass foo:TrustedClass_LegalValueCountry ;
    dqm:referenceProperty1 foo:TrustedProperty_LegalValue ;
    dqm:testedClass foo:Class_Location ;
    dqm:testedProperty1 foo:Prop_Location_Country .
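The reference data set mentioned in the first step could look like the following sketch. The class foo:LegalValueCountry and the property foo:legalValue come from the notional example; the instance names and the listed country values are invented for illustration.

```turtle
@prefix foo: <http://www.example.org/> .

# Hypothetical reference data: each instance holds one legal country value.
foo:LegalCountry_1 a foo:LegalValueCountry ;
    foo:legalValue "USA" .

foo:LegalCountry_2 a foo:LegalValueCountry ;
    foo:legalValue "Germany" .

# Hypothetical tested instance whose value appears in the reference data.
foo:Location_30 a foo:Location ;
    foo:COUNTRY "USA" .

# Hypothetical tested instance whose value does not appear in the reference data.
foo:Location_31 a foo:Location ;
    foo:COUNTRY "U.S.A." .
```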
Example 7: Unique Value Rule (OWL DL)
Task: | Specify that values of a property must be unique. |
Notional Example: | In a location data set, the property foo:LOCID of class foo:Location must only contain unique values. |
DQ-Problem: | dqm:UniquenessViolation |
Dimension: | dqm:PropertyUniqueness |
Requirement Type: | dqm:PropertyRequirement |
You can specify that values of a property must be unique by creating an instance of the class dqm:UniqueValueRule:
foo:UniqueValueRule_LOCID a dqm:UniqueValueRule ;
    rdfs:label "Unique value rule LOCID"^^xsd:string ;
    dqm:testedClass foo:Class_Location ;
    dqm:testedProperty1 foo:Prop_Location_ID .
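As an illustration of what this rule is meant to detect, the following invented instances share the same foo:LOCID value. The instance names and values are hypothetical.

```turtle
@prefix foo: <http://www.example.org/> .

# Two hypothetical instances with the same identifier: a uniqueness violation is expected.
foo:Location_40 a foo:Location ;
    foo:LOCID "1001" ;
    foo:CITY "New York" .

foo:Location_41 a foo:Location ;
    foo:LOCID "1001" ;
    foo:CITY "Boston" .

# A hypothetical instance with a distinct identifier.
foo:Location_42 a foo:Location ;
    foo:LOCID "1002" ;
    foo:CITY "Chicago" .
```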
Example 8: Functional Dependency Value Rule (1 Condition, OWL DL)
Task: | Specify that one property must have a specific value if a second property has a certain value. |
Notional Example: | In an address data set, instances with the city "New York" must always have the value "USA" for the property foo:COUNTRY. |
DQ-Problem: | dqm:FunctionalDependencyViolation |
Dimension: | dqm:SemanticAccuracy |
Requirement Type: | dqm:MultiPropertyRequirement |
In order to specify the dependency between two property values, you must perform the following steps:
- Define a condition under which a specific value is always required.
- Create an instance of the class dqm:FuncDepValueRule, e.g. as follows:
foo:FuncDepValueRule_1 a dqm:FuncDepValueRule ;
    rdfs:label "Func dep value rule 1"^^xsd:string ;
    dqm:equals "USA"^^xsd:string ;
    dqm:hasCondition1 foo:Condition_New_York ;
    dqm:reqDescription "If the city value is \"New York\" then the country must be \"USA\"."^^xsd:string ;
    dqm:testedClass foo:Class_Location ;
    dqm:testedProperty1 foo:Prop_Location_Country .
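The rule refers to foo:Condition_New_York, which is not defined on this page. A plausible definition, modelled on foo:Condition_USA from the earlier examples and on the mapped property foo:Prop_Location_City from the sample dataset, might look as follows. This is an inferred sketch, not a definition taken from the vocabulary documentation, and the dqm: namespace IRI is an assumption.

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
# Assumption: verify the dqm namespace IRI against the vocabulary you load.
@prefix dqm:  <http://purl.org/dqm-vocabulary/v1/dqm#> .
@prefix foo:  <http://www.example.org/> .

# Inferred condition: the city property has the value "New York".
foo:Condition_New_York a dqm:Condition ;
    rdfs:label "Condition New York"^^xsd:string ;
    dqm:conditionalProperty foo:Prop_Location_City ;
    dqm:equals "New York"^^xsd:string .
```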
Example 9: Functional Dependency Value Rule (2 Conditions, OWL DL)
Task: | Specify that one property must have a specific value if a second and a third property have specific values. |
Notional Example: | In an address data set, instances with the city "New York" and the country "USA" must always have the value "NY" for the property foo:STATE. |
DQ-Problem: | dqm:FunctionalDependencyViolation |
Dimension: | dqm:SemanticAccuracy |
Requirement Type: | dqm:MultiPropertyRequirement |
In order to specify the dependency between three property values, you must perform the following steps:
- Define both conditions under which a specific value is always required.
- Create an instance of the class dqm:FuncDepValueRule, e.g. as follows:
foo:FuncDepValueRule_2 a dqm:FuncDepValueRule ;
    rdfs:label "Func dep value rule 2"^^xsd:string ;
    dqm:equals "NY"^^xsd:string ;
    dqm:hasCondition1 foo:Condition_USA ;
    dqm:hasCondition2 foo:Condition_New_York ;
    dqm:reqDescription "If the city value is \"New York\" and the country value is \"USA\" then the state must be \"NY\"."^^xsd:string ;
    dqm:testedClass foo:Class_Location ;
    dqm:testedProperty1 foo:Prop_Location_State .
Example 10: Expiry Rule (OWL DL)
Task: | Specify that instances of a specific class expire. |
Notional Example: | In a product data set, the class foo:Product has instances with product offerings that expire on a certain date which is specified via the property foo:validThrough. |
DQ-Problem: | dqm:OutdatedInstance |
Dimension: | dqm:Timeliness |
Requirement Type: | dqm:ClassRequirement |
You can specify that instances of a class have an expiry date by creating an instance of class dqm:ExpiryRule, e.g. as follows:
foo:ExpiryRule_1 a dqm:ExpiryRule ;
    rdfs:label "Expiry rule 1"^^xsd:string ;
    dqm:testedClass foo:Class_Product ;
    dqm:testedProperty1 foo:Prop_Product_validThrough .
Example 11: Update Rule (OWL DL)
Task: | Specify that instances of a specific class must be updated within a specified interval. |
Notional Example: | In a location data set, the class foo:Location has instances with address data that carry a timestamp of the last update. The instances must not be older than 1 year, 2 months, 3 days, 5 hours, 20 minutes, and 30.123 seconds. |
DQ-Problem: | dqm:OutdatedInstance |
Dimension: | dqm:Timeliness |
Requirement Type: | dqm:ClassRequirement |
You can specify a required update interval for instances of a specific class by creating an instance of class dqm:UpdateRule, e.g. as follows:
foo:UpdateRule_Location a dqm:UpdateRule ;
    rdfs:label "Update rule Location"^^xsd:string ;
    dqm:expectedUpdateInterval "P1Y2M3DT5H20M30.123S"^^xsd:duration ;
    dqm:testedClass foo:Class_Location ;
    dqm:testedProperty1 foo:Prop_Location_timestamp .
NOTE: The tested class must have a property that holds the time of the last update in order to be able to specify this requirement.
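The sample dataset tables do not list a timestamp property, so the following sketch introduces a hypothetical original property foo:TIMESTAMP and maps it to foo:Prop_Location_timestamp using the Design Option 2 pattern. The property name, the xsd:dateTime datatype, and the instance are assumptions made for illustration, as is the dqm: namespace IRI.

```turtle
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
# Assumption: verify the dqm namespace IRI against the vocabulary you load.
@prefix dqm: <http://purl.org/dqm-vocabulary/v1/dqm#> .
@prefix foo: <http://www.example.org/> .

# Hypothetical original property foo:TIMESTAMP, mapped as in Design Option 2.
foo:Prop_Location_timestamp a dqm:TestedProperty ;
    dqm:hasURI "http://www.example.org/TIMESTAMP"^^xsd:anyURI .

# Hypothetical instance carrying the time of its last update.
foo:Location_50 a foo:Location ;
    foo:LOCID "50" ;
    foo:TIMESTAMP "2011-03-15T09:30:00Z"^^xsd:dateTime .
```

The duration literal "P1Y2M3DT5H20M30.123S" in the rule follows the xsd:duration lexical form: P introduces the date part (1 year, 2 months, 3 days) and T introduces the time part (5 hours, 20 minutes, 30.123 seconds).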