Work on Projects
This page describes the general logic of working on projects in the Unidata system, from the project solution to data loading. The work is performed in several stages. The stages described below are recommended and reflect a balanced approach to implementing the Unidata system. In addition to the activities mentioned, other work (for example, administrative tasks) may also be performed.
General Steps
Step 1. Work on project solution
This stage prepares all the data needed to implement a project in the Unidata system. It can be omitted, but completing it is recommended.
Project analysis. Project goals are formulated, and the key points, key parameters, scope of work, etc. are described.
Data profiling. All information systems that will act as source systems for the Unidata system are identified and analyzed. Data profiling is performed (for example, using Ataccama) to determine the data formats, which data is unique, which data is suitable for lookup entities/enumerations, whether relations exist, etc.
Analytics: basic system objects. Describes what the data model will look like, including entities/lookup entities with attributes, relations, quality rules, etc. The data model description also specifies which source systems supply data to each entity/lookup entity and which type each attribute of each entity/lookup entity will have. Roles, role access rights, etc. are also described.
Integration schema (optional). Describes how the information systems interact, the data exchange paths, pipelines, etc.
Step 2. Project implementation
This stage is focused on working with the Unidata system.
User roles and access rights. The newly installed system contains nothing but a superuser account (an administrator with the fullest possible access rights). As a first step, it is recommended to create roles and assign access rights to them. This task can be performed in several steps.
The first step may be assigning access rights to the system administrator roles. The second step is assigning access rights to the data steward roles.
It is recommended to perform the second step after the final version of the data model is published, because access rights to entities/lookup entities can be assigned only once they are available.
User accounts. Create user accounts and assign the appropriate roles to them. As with roles, it is recommended to do this in several steps: first create the administrator accounts, then the data steward accounts.
Data model: entities/lookup entities, their attributes and relations. General task: populate the data model with entities/lookup entities. This can be done either on the basis of the analytics performed in step 1 or without prior preparation. The basic steps for building the model:
Create entity/lookup entity. Configure the main properties of the new object, such as name, type, method of external key generation, etc. It is recommended to create lookup entities first and then entities, starting with the entities that other entities refer to.
Create attribute. When adding a new attribute, specify the attribute type, the value type, the function of the attribute, and its availability to users. For a lookup entity, the required code attributes must be allocated and the type of the code attribute must be defined.
Create relations. If the entities/lookup entities were created in the recommended sequence, creating relations will not require returning to other entities/lookup entities or creating entities/lookup entities specifically for the required relation. You can also create all entities/lookup entities first, without filling in the “Relations” tab, and then set all relations in the next iteration.
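The recommended creation sequence (referenced objects first) amounts to a topological ordering of the data model. A minimal sketch, assuming a hypothetical model in which each entity/lookup entity is listed with the objects it refers to; the names are illustrative only and do not come from the Unidata system:

```python
from graphlib import TopologicalSorter

# Hypothetical model: each entity/lookup entity maps to the objects it refers to.
references = {
    "Country": set(),                     # lookup entity, refers to nothing
    "Currency": set(),                    # lookup entity, refers to nothing
    "Supplier": {"Country"},              # entity referring to a lookup entity
    "Product": {"Supplier", "Currency"},  # entity referring to an entity and a lookup entity
}

def creation_order(refs):
    """Return names so that every referenced object precedes the objects that refer to it."""
    # TopologicalSorter treats the mapped values as predecessors of the key,
    # so static_order() yields referenced objects first.
    return list(TopologicalSorter(refs).static_order())

order = creation_order(references)
print(order)
```

If two objects refer to each other, `static_order()` raises `graphlib.CycleError`. That is the situation in which creating all objects first and setting relations in a later iteration is the more practical route.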
Consolidation setup. Set a weight for each source system that supplies data to the entity/lookup entity. The weight indicates the level of confidence in the source: the higher the number, the higher the confidence. The “unidata” source system (data created in the system itself) has a weight of 100. A weight must be non-zero. Weights can be configured both for the entity/lookup entity as a whole and for each attribute individually.
Record view. Define how the record card will be displayed: group the attributes and arrange all the blocks with attributes/relations. This can be done once all entities/lookup entities are created and data is loaded.
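To illustrate how source weights could influence a consolidated record, here is a minimal sketch. It assumes a simple "highest weight wins per attribute" policy, which is an assumption made for illustration and not a description of the actual Unidata consolidation algorithm. The source names `crm` and `erp`, their weights, and the attribute override are hypothetical; only the `unidata` weight of 100 comes from the text above.

```python
# Hypothetical entity-level weights; only "unidata" = 100 is taken from the documentation.
ENTITY_WEIGHTS = {"unidata": 100, "crm": 80, "erp": 60}

# Hypothetical attribute-level overrides: for "phone", the ERP is trusted more.
ATTRIBUTE_WEIGHTS = {"phone": {"erp": 90}}

def weight_for(source, attribute):
    """Attribute-level weight if configured, otherwise the entity-level weight."""
    weight = ATTRIBUTE_WEIGHTS.get(attribute, {}).get(source, ENTITY_WEIGHTS[source])
    if weight == 0:
        raise ValueError(f"weight for source {source!r} must be non-zero")
    return weight

def consolidate(records):
    """records: {source: {attribute: value}} -> {attribute: value from the heaviest source}."""
    result, best = {}, {}
    for source, attributes in records.items():
        for attribute, value in attributes.items():
            weight = weight_for(source, attribute)
            # Assumed policy: keep the value from the source with the highest weight.
            if attribute not in best or weight > best[attribute]:
                best[attribute] = weight
                result[attribute] = value
    return result

golden = consolidate({
    "crm": {"name": "ACME Ltd", "phone": "111"},
    "erp": {"name": "Acme", "phone": "222"},
})
print(golden)
```

In this sketch the name comes from `crm` (80 against 60), while the phone comes from `erp` because the attribute-level weight (90) overrides the entity-level one.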
Data Model: Quality Rules. Rules can be created either in the process of creating an entity/lookup entity or separately. Read Quality Rules for details. For Community Edition: validation rules are processed, but not displayed in the user interface.
Duplicate search rules. It is recommended to start working with rules after publishing the final version of the data model. Rules are executed sequentially, so it is necessary to determine the correct sequence of applying rules within each entity/lookup entity. The recommended order is from less precise to more precise. Important: To speed up the initial loading, it is recommended to turn off the duplicate search rules.
Workflows (for Standard & Enterprise Editions). It is recommended to start working with workflows after publishing the data model. Read more about Workflows below. For Standard Edition: There is no assignment to entities/lookup entities; tasks are created only manually.
Units of measurement. It is recommended to start setting up units of measurement after publishing the data model. Information about all units of measurement in use is collected during profiling and analytics (step 1). Then, for the completed data model, create lists of measurement units and assign a basic unit in each. For example, goods from different customers can be purchased in different currencies, and the units of measurement then reduce all currencies to the basic one (for example, dollars). This makes it possible to search for records by the basic unit of measurement. Units of measurement are configured outside the user interface, in an .xml file that is imported into the system after configuration.
Loading data into the system (first import). The data is loaded into the information structure created during step 2.
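The idea of reducing values to a basic unit can be sketched as follows. This is only an illustration of the concept: the conversion factors are made-up values rather than real exchange rates, the names are hypothetical, and the sketch does not reproduce the format of the .xml file used by the system.

```python
from decimal import Decimal

# Hypothetical list of units: each factor converts a value into the basic unit (USD).
# The factors are invented for illustration only.
CURRENCY_UNITS = {
    "USD": Decimal("1"),
    "EUR": Decimal("1.10"),
    "GBP": Decimal("1.25"),
}

def to_basic(value, unit, units=CURRENCY_UNITS):
    """Convert a value expressed in `unit` into the basic unit."""
    return Decimal(str(value)) * units[unit]

# Hypothetical records: (name, price, currency).
purchases = [("pump", 100, "EUR"), ("valve", 110, "USD"), ("pipe", 50, "GBP")]

def search_by_basic(items, min_value):
    """Find records whose price, expressed in the basic unit, is at least min_value."""
    return [name for name, value, unit in items if to_basic(value, unit) >= min_value]

found = search_by_basic(purchases, Decimal("110"))
print(found)
```

Because every price is compared in the basic unit, a single search condition covers records that were entered in different currencies.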
Quality Rules
It is recommended to set up quality rules during the first stages of the Unidata system implementation, simultaneously with the data model setup.
Before you start:
The transfer of data to the Unidata system has been analyzed: the description of the data model, data processing functions, quality rules, etc. is ready.
The data processing functions to be used in quality rules have been created. The project may require both standard functions and custom functions (simple or composite). It is important that each function is created with the future operation mode of its quality rule in mind:
The function to be used in a quality rule running in validation mode must have at least 1 logical outgoing port (the port can be empty).
The function to be used in a rule running in enrichment mode can have outgoing ports of any type.
The function to be used in a rule combining validation + enrichment modes must have: at least 1 logical outgoing port and at least 1 port of any type.
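The port requirements above can be expressed as a small compatibility check. This is a minimal sketch, not the system's API: the `Function` class, the mode names, and the `"Boolean"` label for a logical port are hypothetical. It also makes two assumptions: the "port of any type" in the combined mode is read as a port in addition to the logical one, and enrichment is assumed to need at least one outgoing port.

```python
from dataclasses import dataclass

@dataclass
class Function:
    name: str
    output_port_types: list  # e.g. ["Boolean", "String"]; type names are hypothetical

def supports_mode(function, mode):
    """Check whether a function's outgoing ports fit the given quality rule mode."""
    ports = function.output_port_types
    logical_ports = sum(1 for port in ports if port == "Boolean")
    if mode == "validation":
        # At least one logical outgoing port.
        return logical_ports >= 1
    if mode == "enrichment":
        # Ports of any type; assumption: at least one port carries the enriched value.
        return len(ports) >= 1
    if mode == "validation+enrichment":
        # Assumption: one logical port plus at least one further port of any type.
        return logical_ports >= 1 and len(ports) >= 2
    raise ValueError(f"unknown mode: {mode!r}")

is_not_empty = Function("isNotEmpty", ["Boolean"])
trim = Function("trim", ["String"])
check_and_fix = Function("checkAndFix", ["Boolean", "String"])

print(supports_mode(is_not_empty, "validation"))
print(supports_mode(trim, "validation"))
print(supports_mode(check_and_fix, "validation+enrichment"))
```

Running a check of this kind over the planned functions before creating the rules helps catch a function that was designed for the wrong mode.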
Quality rules can be created in one of two ways.
Way 1:
Create entity/lookup entity. Configure its basic properties.
Create attributes of the entity/lookup entity.
Create quality rules for the entity/lookup entity.
Save entity/lookup entity. Go to the next entity/lookup entity.
Way 2:
Create a set of entities/lookup entities, properties and attributes.
Optional: a data model can be published for testing.
Create a set of quality rules for each entity/lookup entity.
The first way is more suitable if you know in advance what the data model will look like and what quality rules it will contain. The second way is more suitable if the quality rules are configured and adjusted as the data model changes or is refined, for example, if the original analytics results change while the Unidata system is being implemented.
Recommendations before loading data. One of the following methods should be used:
All quality rules should be turned off before the first data load from the different source systems. Otherwise, some of the data will not be loaded due to validation errors.
Alternatively, the quality rules that should be applied during the loading phase (such as normalization and cleanup) should be turned on, so that the data is loaded correctly and does not need to be reloaded.
Workflows
Workflows can be set up in several steps, spread over time.
Before you start:
User roles are created (or the logical names of future roles are defined).
First step. Description and connection. At this stage two files are created:
An .xml file that describes the workflow step by step in BPMN 2.0 notation (including the user roles responsible for each step). The steps of the workflow later become individual tasks. The description is created with Activiti.
An auxiliary Java class.
The files are placed in the Unidata system directories and are also registered in the configuration.
The first step is not tied to other system configuration work, and can be performed immediately after system deployment.
Second step. Assignment. Workflows can be assigned to record change events. Accordingly, it is recommended to assign workflows:
When the data model is ready and published.
Whether data should already be present when workflows start should be determined by the tasks being solved within the project. For example, you can load data from multiple information systems, process it using quality rules, and only then assign workflows to track new data changes.