Quality Rules Concept

Data quality control is an important part of key data management. The Unidata platform provides several quality control tools, each intended for a particular kind of problem.

The main tool is the data quality rule. A quality rule checks the attribute values of records for compliance with specified conditions. If an attribute value does not meet the conditions, a quality error is created, which should then be corrected.

Quality rules are configured separately for each entity/lookup entity. When you create a rule, you specify its name (unique for each rule), a description, the conditions under which it is triggered, and the data sources to which it applies.

Each quality rule is based on a data processing function. A function takes data as input, processes it in a certain way, and outputs parameters.

  • Functions are divided into standard and custom functions, and into simple and composite functions. Standard: preset functions that come with the product. Custom: created by a customer for their own purposes. Simple: performs an action consisting of one step. Composite: consists of two or more steps and is usually built from several simple functions.
  • Examples of functions. Simple: convert text to uppercase. Composite: search a text for one of several keywords and add a special prefix to the word that was found.
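The difference between simple and composite functions can be illustrated with a short Python sketch. It is not the platform's function API: the names `to_upper`, `find_keyword`, `add_prefix`, and `mark_keyword` are hypothetical and only mirror the two examples given above.

```python
import re
from typing import Optional, Sequence


def to_upper(text: str) -> str:
    """Simple function: one step, convert text to uppercase."""
    return text.upper()


def find_keyword(text: str, keywords: Sequence[str]) -> Optional[str]:
    """Simple function: return the first keyword (in list order)
    that occurs in the text as a whole word, or None."""
    for keyword in keywords:
        if re.search(rf"\b{re.escape(keyword)}\b", text):
            return keyword
    return None


def add_prefix(text: str, word: str, prefix: str) -> str:
    """Simple function: put the prefix before every whole-word
    occurrence of the given word."""
    return re.sub(rf"\b{re.escape(word)}\b",
                  lambda match: prefix + match.group(0), text)


def mark_keyword(text: str, keywords: Sequence[str], prefix: str) -> str:
    """Composite function (hypothetical): built from two simple steps,
    find a keyword and then prefix it."""
    found = find_keyword(text, keywords)
    return text if found is None else add_prefix(text, found, prefix)
```

For instance, `mark_keyword("call the office today", ["home", "office"], "#")` would mark the word "office", while a text without any of the keywords is returned unchanged.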

Each quality rule works in one of three modes. When developing a data processing function, check the mode of the quality rule that will use it.

  • “Validation”. Checks the attribute's value for compliance with the specified rules. The purpose of validation is to make sure that the attribute's value is correct; if the check fails, an error is generated. It is not possible to save a record with a validation error. Example of validation: checking whether the “Phone number” attribute is filled in. If it is not, an error is generated explaining the problem.
  • “Enrichment”. Changes data while the function is running. The purpose of enrichment is to bring the data to a unified form or to complete it with new values. When enrichment is enabled, saving a record creates a new original record or updates the existing one. Examples of enrichment: 1) automatic conversion of the first character from lowercase to uppercase for the “Name” attribute; 2) filling the “Phone number” attribute from the contents of the “Comment” attribute.
  • “Validation + enrichment”. A combination of the two modes. If there is a validation error, the record cannot be saved in this mode either.
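The three modes can be modeled with a minimal Python sketch. Everything in it is an assumption made for illustration: `Mode`, `Rule`, `apply_rule`, and `can_save` are hypothetical names, not the platform's API, and the choice to validate first and skip enrichment when validation fails is one possible reading of the combined mode, not documented behavior.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List, Optional, Tuple

Record = Dict[str, str]


class Mode(Enum):
    VALIDATION = "validation"
    ENRICHMENT = "enrichment"
    VALIDATION_AND_ENRICHMENT = "validation + enrichment"


@dataclass
class Rule:
    name: str
    mode: Mode
    # Returns an error message, or None when the record is valid.
    validate: Optional[Callable[[Record], Optional[str]]] = None
    # Returns a changed copy of the record.
    enrich: Optional[Callable[[Record], Record]] = None


def apply_rule(rule: Rule, record: Record) -> Tuple[Record, List[str]]:
    """Apply one rule and return the resulting record plus any errors."""
    errors: List[str] = []
    result = dict(record)
    if rule.mode is not Mode.ENRICHMENT and rule.validate is not None:
        message = rule.validate(result)
        if message is not None:
            errors.append(f"{rule.name}: {message}")
    # Assumption: enrichment is skipped when validation has failed.
    if rule.mode is not Mode.VALIDATION and rule.enrich is not None and not errors:
        result = rule.enrich(result)
    return result, errors


def can_save(errors: List[str]) -> bool:
    """A record with validation errors cannot be saved."""
    return not errors


def phone_filled(record: Record) -> Optional[str]:
    """Validation example: the "Phone number" attribute must be filled in."""
    if record.get("Phone number", "").strip():
        return None
    return "Phone number is not filled in"


def capitalize_name(record: Record) -> Record:
    """Enrichment example: uppercase the first character of "Name"."""
    name = record.get("Name", "")
    return {**record, "Name": name[:1].upper() + name[1:]}
```

A rule such as `Rule("phone_required", Mode.VALIDATION, validate=phone_filled)` applied to a record without a phone number yields one error, so `can_save` returns `False`; an enrichment rule built on `capitalize_name` returns a changed copy of the record and no errors.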

Each quality rule has a criticality category. Criticality indicates how serious an error is; it affects the entity statistics and helps the data steward prioritize errors.

The order of rules in an entity/lookup entity is important. The first quality rule is executed first, then the second, and so on. The order of the rules can be changed in the user interface.
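A small Python sketch shows why the order matters. The two steps and the helper `run_in_order` are hypothetical and stand in for two enrichment rules applied to the same attribute value.

```python
from typing import Callable, Sequence

Step = Callable[[str], str]


def strip_spaces(value: str) -> str:
    """Remove leading and trailing whitespace."""
    return value.strip()


def fill_if_empty(value: str) -> str:
    """Replace an empty value with a placeholder (the placeholder is an assumption)."""
    return value if value else "N/A"


def run_in_order(value: str, steps: Sequence[Step]) -> str:
    """Apply the steps one after another, the first step first."""
    for step in steps:
        value = step(value)
    return value
```

With a value that consists only of spaces, running `strip_spaces` before `fill_if_empty` produces the placeholder, while the reverse order leaves an empty string, because the value is not yet empty when `fill_if_empty` runs.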

Saved quality rules can be tested on existing or custom (abstract) records.

Recommended rules for data enrichment:

  • Remove extra spaces, leaving only one space between words;
  • Convert text to uppercase;
  • Convert double hyphens into single;
  • Remove space between a hyphen and a word.
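The four recommendations above can be sketched as plain Python functions. This is an illustration only, not the platform's standard functions: all names are hypothetical, any whitespace (not only spaces) is collapsed, and runs of more than two hyphens are also reduced to one.

```python
import re


def collapse_spaces(text: str) -> str:
    """Remove extra whitespace, leaving one space between words."""
    return " ".join(text.split())


def to_upper(text: str) -> str:
    """Convert text to uppercase."""
    return text.upper()


def collapse_hyphens(text: str) -> str:
    """Convert double (or longer) runs of hyphens into a single hyphen."""
    return re.sub(r"-{2,}", "-", text)


def trim_hyphen_spaces(text: str) -> str:
    """Remove whitespace on either side of a hyphen."""
    return re.sub(r"\s*-\s*", "-", text)


def normalize(text: str) -> str:
    """Apply the steps in a fixed order. Spaces around hyphens are removed
    before hyphens are collapsed, so that "- -" also becomes "-"."""
    for step in (collapse_spaces, trim_hyphen_spaces, collapse_hyphens, to_upper):
        text = step(text)
    return text
```

For example, `normalize("  rostov -  on -- don ")` returns `"ROSTOV-ON-DON"`.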

Example of Usage

After loading data from several sources, it became clear that records from each source had their own rules for filling in the attributes.

  • In the first source, all data were entered in upper case.
  • In the second source, the “Comment” attribute was mandatory and was often filled in with empty values.
  • In the third source, the “Contact person” attribute was represented by two attributes: “Contact person's name” and “Contact person's phone number”.

To bring the data to a single standard, the data administrator must do the following:

  • Create a data quality rule that specifies the rule mode, the source system, the data processing function used, etc.
  • Place the rule in an existing or a new rule set.
  • Assign the set to an entity/lookup entity.
  • If necessary, create a custom data processing function.

The system administrator should also configure pipelines to specify at what point of data processing the rules are triggered. For example, enrichment rules can be triggered after a record is saved, and validation rules before an attempt to save.

The data operator then receives data quality errors when performing the actions that the system administrator and the data administrator have configured.