Quality Rules Concept
An important part of key data management is data quality control. The Unidata system has several quality control tools, each designed to solve specific problems.
The main tool is the data quality rule. Quality rules check the attribute values of records for compliance with specified conditions. If an attribute value does not meet the conditions, a quality error is created, which should then be corrected.
Quality rules are configured separately for each entity/lookup entity attribute. When creating a rule, you specify its name (unique for each rule), a description, the conditions under which it is triggered, and the source systems to which it applies.
Data Processing Functions
Each quality rule is based on a data processing function. A function takes data as input, processes it in a certain way, and outputs parameters. Functions are classified as follows:
Standard: preset functions that come with the product.
Custom: created by a customer for personal purposes.
Simple: performs an action consisting of one step (e.g., convert text to uppercase).
Composite: consists of two or more steps, usually built from several simple functions (e.g., search through text for one of several keywords and add a special prefix to a particular word).
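The difference between simple and composite functions can be illustrated with a minimal Python sketch. It is not the Unidata API: the names `TextFunction`, `to_upper`, `make_prefixer`, and `compose`, as well as the keywords and the `#` prefix, are hypothetical and chosen only for illustration.

```python
from typing import Callable, Iterable

# Hypothetical type: a data processing function maps one text value to another.
TextFunction = Callable[[str], str]


def to_upper(value: str) -> str:
    """Simple function: a single step that converts text to uppercase."""
    return value.upper()


def make_prefixer(keywords: Iterable[str], prefix: str) -> TextFunction:
    """Build a simple function that prefixes every word matching a keyword."""
    wanted = {k.upper() for k in keywords}

    def add_prefix(value: str) -> str:
        words = value.split()
        return " ".join(prefix + w if w.upper() in wanted else w for w in words)

    return add_prefix


def compose(*steps: TextFunction) -> TextFunction:
    """Composite function: run several simple functions one after another."""

    def run(value: str) -> str:
        for step in steps:
            value = step(value)
        return value

    return run


# Illustrative composite: tag keywords first, then convert to uppercase.
tag_and_upper = compose(make_prefixer(["urgent", "vip"], "#"), to_upper)
print(tag_and_upper("call vip client"))  # CALL #VIP CLIENT
```

Building the composite from small single-step functions keeps each step testable on its own, which matches the idea that composite functions are usually assembled from simple ones.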
Quality Rule Modes
Each quality rule works in one of three modes. When developing a data processing function, check which mode the quality rule that will use the function works in.
“Validation”. Checks the attribute’s value for compliance with the specified rules. The purpose of validation is to make sure that the attribute’s value is correct; if the check fails, an error is generated (see the note below). Example of validation: checking whether the “Phone number” attribute is filled in. If it is not, an error is generated explaining the problem.
“Enrichment”. Changes data while the function is running. The purpose of enrichment is to bring the data into a unified form or to complete it with new values. When enrichment is activated, saving a record creates a new or updated original record. Examples of enrichment: 1) automatic conversion of the first character from lowercase to uppercase for the “Name” attribute; 2) filling the “Phone number” attribute from the contents of the “Comment” attribute.
“Validation + enrichment”. A combination of the two modes. If there is a validation error, it may also be impossible to save the record (see the note below).
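The three modes can be sketched in Python as follows. This is an illustration only, not Unidata code: the `RuleResult` class, the function names, and the attribute keys `name` and `phone_number` are hypothetical, and the order "enrich first, then validate" in the combined mode is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class RuleResult:
    record: dict                                  # the record after the rule ran
    errors: list = field(default_factory=list)    # quality errors found


def validate_phone_filled(record: dict) -> RuleResult:
    """Validation: only checks the record, never changes it."""
    errors = []
    if not str(record.get("phone_number") or "").strip():
        errors.append("Phone number is not filled in")
    return RuleResult(record, errors)


def enrich_capitalize_name(record: dict) -> RuleResult:
    """Enrichment: returns a changed copy and reports no errors."""
    name = record.get("name") or ""
    enriched = {**record, "name": name[:1].upper() + name[1:]}
    return RuleResult(enriched)


def validate_and_enrich(record: dict) -> RuleResult:
    """Validation + enrichment (assumed order: enrich, then validate)."""
    enriched = enrich_capitalize_name(record)
    checked = validate_phone_filled(enriched.record)
    return RuleResult(checked.record, checked.errors)


result = validate_and_enrich({"name": "anna", "phone_number": ""})
print(result.record["name"], result.errors)
```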
Each quality rule has a criticality category. Criticality indicates how serious an error is; it affects the entity statistics and helps the data steward prioritize errors.
The order of rules in an entity/lookup entity is important. The first quality rule is executed first, then the second, and so on. The order of the rules can be changed in the user interface.
Saved quality rules can be tested on existing or custom (abstract) records.
Important
Whether a record with validation errors is published depends on the criticality category of the error. At the RED criticality level in the Origin execution phase, a record with a validation error cannot be saved. In all other cases, for example, at the RED criticality level in the basic execution phase, a record with a validation error will be published.
Recommended rules for data enrichment:
Remove extra spaces, leaving only one space between words;
Convert text to uppercase;
Convert double hyphens into single;
Remove space between a hyphen and a word.
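The recommended enrichment steps above can be sketched with Python's standard `re` module. The function name `normalize_text` is hypothetical; the sketch also assumes that leading and trailing spaces count as "extra" and that runs of more than two hyphens should be collapsed as well.

```python
import re


def normalize_text(value: str) -> str:
    """Apply the four recommended enrichment steps to a text value."""
    value = re.sub(r"\s+", " ", value).strip()   # one space between words, trim the ends
    value = re.sub(r"\s*-\s*", "-", value)       # no space between a hyphen and a word
    value = re.sub(r"-{2,}", "-", value)         # double (or longer) hyphens become single
    return value.upper()                         # convert text to uppercase


print(normalize_text("  saint --  petersburg   branch "))  # SAINT-PETERSBURG BRANCH
```

Spaces around hyphens are removed before hyphens are collapsed so that inputs such as `a - - b` also end up with a single hyphen.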
How Quality Rules Work
After quality rules are created, they are activated by changes in entities/lookup entities.
The algorithm for activating quality rules depends on the pipeline settings. The quality rule pipeline describes at which point the quality rules are checked and at which point errors are shown to the user.
Example of the algorithm:
When you try to publish a record’s draft, these conditions will be checked:
If there are no quality rule errors: publication is successful.
If an error is found for a validation rule: publication may be rejected (depending on the criticality level and execution phase).
If an error is found for a rule with enrichment: the data in the record will be changed according to the validation rules.
If an error is found for a validation and enrichment rule: the validation actions may prevent the record from being published (depending on the criticality level and execution phase). The enrichment actions will change the data according to the validation rules.
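The publication checks listed above can be sketched as a single loop over the rules. This is a simplified illustration, not the actual pipeline: the `Rule` class, the `publish_draft` function, and the phase strings `"ORIGIN"` and `"BASIC"` are hypothetical names, and running enrichment before validation inside one rule is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Rule:
    name: str
    criticality: str                                     # e.g. "RED"
    validate: Optional[Callable[[dict], bool]] = None    # True when the record passes
    enrich: Optional[Callable[[dict], dict]] = None      # returns a changed copy


def publish_draft(draft: dict, rules: list, phase: str):
    """Return (published, record, errors) for a draft checked in the given phase."""
    record, errors, blocked = dict(draft), [], False
    for rule in rules:                                   # rules run in their configured order
        if rule.enrich is not None:
            record = rule.enrich(record)
        if rule.validate is not None and not rule.validate(record):
            errors.append(rule.name)
            # Only RED criticality in the Origin phase prevents saving the record.
            if rule.criticality == "RED" and phase == "ORIGIN":
                blocked = True
    return (not blocked), record, errors


rules = [
    Rule("Trim name", "RED", enrich=lambda r: {**r, "name": r["name"].strip()}),
    Rule("Phone filled", "RED", validate=lambda r: bool(r.get("phone"))),
]
print(publish_draft({"name": " Anna ", "phone": ""}, rules, "ORIGIN"))
print(publish_draft({"name": " Anna ", "phone": ""}, rules, "BASIC"))
```

In the first call the record is enriched but not published; in the second call the same validation error is reported, yet the record is published because the phase is not Origin.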
When you create a rule, you specify its name (unique for each rule), a description, the conditions under which it is started, and for which source systems it will be applied.
The algorithm is also influenced by:
Execution context. Limits the data processing function used by the quality rule to certain attributes. Possible options: global or local.
Rule execution mode. Determines what is done with the quality errors found. Possible options: validation (checking the attribute’s value for compliance with the specified rules), enrichment (changing data while the function is running), validation + enrichment.
Example of Usage
After loading data from several sources, it became clear that records from each source had their own rules for filling in the attributes.
In the first source, all data were entered in upper case.
In the second source, the “Comment” attribute was mandatory and was often filled in with empty data.
In the third source, the “Contact person” attribute was represented by two attributes: “Contact person’s name” and “Contact person’s phone number”.
To bring the data to a single standard, the data administrator must do the following:
Create a data quality rule that contains information about the rule mode, source system, data processing function used, etc.
Place the rule in an existing or a new rule set.
Assign the set to an entity/lookup entity.
You may also need to create your own data processing function.
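The relationship between rules, rule sets, and entities in this procedure can be sketched as a small data model. It is only an illustration of the steps above: the class names, field names, rule names, and the source system identifiers `source_1` to `source_3` are hypothetical and do not reflect the actual Unidata configuration format.

```python
from dataclasses import dataclass, field


@dataclass
class QualityRule:
    name: str
    mode: str               # "validation", "enrichment" or "validation + enrichment"
    source_systems: set     # source systems the rule applies to
    function: str           # name of the data processing function used


@dataclass
class RuleSet:
    name: str
    rules: list = field(default_factory=list)       # order matters: rules run top to bottom


@dataclass
class Entity:
    name: str
    rule_sets: list = field(default_factory=list)

    def rules_for(self, source_system: str) -> list:
        """Names of the rules that apply to records from one source system."""
        return [
            rule.name
            for rule_set in self.rule_sets
            for rule in rule_set.rules
            if source_system in rule.source_systems
        ]


# Step 1: create the rules (illustrative values only).
to_upper = QualityRule("Text to uppercase", "enrichment", {"source_2", "source_3"}, "to_upper")
comment = QualityRule("Comment not empty", "validation", {"source_2"}, "is_filled")
# Step 2: place the rules in a rule set.  Step 3: assign the set to an entity.
customer = Entity("Customer", [RuleSet("Unification", [to_upper, comment])])
print(customer.rules_for("source_2"))
```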
The system administrator should also configure pipelines to specify at what point of data processing the rules should be triggered. For example, enrichment rules can be triggered after a record is saved, and validation rules can be triggered before an attempt to save.
The data operator will get data quality errors while performing actions that have been foreseen by the system administrator and the data administrator.