Working on projects

This page describes the general logic of working on projects in the Unidata platform, from the project solution to data loading. The work is performed in several stages. The stages described below are recommended and reflect a balanced method of implementing the Unidata platform. In addition to the activities mentioned below, other work may also be required, such as administrative tasks.

General steps

Step 1. Work on the project solution

This stage allows you to prepare all the data necessary to implement a project in the Unidata platform. The stage can be omitted, but it is recommended to complete it.

  • Project analysis. Project goals are formulated; key points, key parameters, the scope of work, etc. are described.
  • Data profiling. All of the information systems that will act as data sources for the platform are identified and analyzed. Data profiling is performed (for example, using Ataccama), which results in determining data formats, which data is unique, which data is suitable for lookup entities/classifiers/enumerations, the availability of relations, etc. (see the sketch after this list).
  • Analytics: basic platform objects. Describes what the data model will look like, including entities/lookup entities with their attributes, relations, quality rules, etc. The data model description also specifies which data sources feed each entity/lookup entity and which value type is assigned to each attribute. Classifiers, roles, role access rights, etc. are also described.
  • Integration scheme (optional). Description of the information systems interaction, data exchange paths, pipelines, etc.
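
For illustration, the minimal sketch below (not part of the Unidata platform) profiles one column over JDBC by computing the total row count, the number of distinct values and the number of empty values; a low distinct-to-total ratio suggests that the column is a candidate for a lookup entity, classifier or enumeration. The connection string, table name ("customer") and column name ("country") are assumptions made for the example.

  import java.sql.Connection;
  import java.sql.DriverManager;
  import java.sql.ResultSet;
  import java.sql.Statement;

  public class ColumnProfile {
      public static void main(String[] args) throws Exception {
          // Connection details are placeholders for a real source system.
          try (Connection c = DriverManager.getConnection("jdbc:postgresql://localhost/source", "user", "pass");
               Statement st = c.createStatement();
               ResultSet rs = st.executeQuery(
                       "SELECT COUNT(*) AS total, COUNT(DISTINCT country) AS distinct_values, "
                       + "COUNT(*) - COUNT(country) AS nulls FROM customer")) {
              rs.next();
              long total = rs.getLong("total");
              long distinct = rs.getLong("distinct_values");
              long nulls = rs.getLong("nulls");
              // Few distinct values relative to the total row count usually means
              // the column is better modelled as a lookup entity or enumeration.
              System.out.printf("total=%d, distinct=%d, empty=%d, distinct ratio=%.3f%n",
                      total, distinct, nulls, (double) distinct / total);
          }
      }
  }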

Step 2. Project implementation

This stage is focused on working with the Unidata platform.

  • User roles and access rights. The newly installed platform contains nothing but a superuser account (an administrator with full access rights). As a first step, it is recommended to create roles and assign access rights to them. This task can be performed in several stages: first assign access rights for the system administrator roles, then for the data steward roles. It is recommended to assign the data steward rights after the final version of the data model and the classifiers is published, because access rights to entities/lookup entities and classifiers can be assigned only once those objects exist.
  • User accounts. Create user accounts and assign the appropriate roles to them. As with roles, it is recommended to do this in several steps: first create administrator accounts, then data steward accounts.
  • Data model: entities/lookup entities, their attributes and relations. General task: populate the data model with entities/lookup entities. This can be done either on the basis of the analytics performed in step 1 or without prior preparation. The basic steps of completing the model are:
    • Create an entity/lookup entity. Configure the main properties of the new object, such as its name, type, method of external key generation, etc. A classifier is selected and its properties are set once the classifiers are ready. It is recommended to create lookup entities first and then entities, starting with the objects that other entities refer to.
    • Create attributes. When adding a new attribute, specify the attribute type, the value type, the function of the attribute, and its availability to users. For a lookup entity, the required code attribute must be allocated and its type defined.
    • Create relations. If the entities/lookup entities were created in the recommended sequence, creating relations will not require going back and creating additional entities/lookup entities just for a required relation. Alternatively, you can create all entities/lookup entities first, without filling in the “Relations” tab, and then set all relations in the next iteration.
    • Consolidation setup. Assign a weight to each data source from which data comes into the entity/lookup entity. The weight indicates the level of confidence in the source: the higher the number, the higher the confidence. The system data source (data created in the platform itself) has a weight of 100. The weight must be non-zero. For example, if source A has weight 80 and source B has weight 60, the value from source A wins during consolidation. The configuration is available both for the entity/lookup entity as a whole and for each attribute individually.
    • Record view. Define how the record card will be displayed: group the attributes and arrange the blocks with attributes/relations/classifiers. This can be done once all entities/lookup entities and classifiers are created and the data is loaded.
  • Data model: quality rules (only for Unidata EE). The rules can be created either in the process of creating an entity/lookup entity or separately. Read below for details.
  • Duplicate search rules. It is recommended to start working with these rules after publishing the final version of the data model. Rules are executed sequentially, so it is necessary to determine the correct order of applying the rules within each entity/lookup entity. The recommended order is from less precise to more precise. Important: to speed up the initial loading, it is recommended to turn off the duplicate search rules.
  • Workflows (only for Unidata EE). It is recommended to start working with workflows after publishing the data model and creating the classifiers. Read more about workflows below.
  • Units of measurement. It is recommended to start setting up units of measurement after publishing the data model. Information about all the units of measurement in use is collected during profiling and analytics (step 1). Then, for the completed data model, create lists of measurement units and assign a base unit in each list. For example, goods from different customers can be purchased in different currencies, and the units will then convert all currencies to the base one (for example, dollars). This makes it possible to search for records by the base unit of measure. Units of measurement are configured outside the user interface, in an .xml file that is imported into the platform after configuration.
  • Load data (first import) into the system. The data is placed into the information structure created during step 2.

Quality rules

It is recommended to set up quality rules during the first stages of the Unidata platform implementation, simultaneously with the data model setup.

Pre-conditions:

  • Analysis of the data to be transferred to the Unidata platform has been performed: descriptions of the data model, data processing functions, quality rules, classifiers, etc. are ready.
  • The data processing functions to be used in quality rules have been created. The project may require both standard functions and custom functions (simple or composite). It is important to create each function with the future operation mode of the quality rule in which it will be used in mind (see the sketch below):
    • A function used in a quality rule running in validation mode must have at least 1 logical outgoing port (the port can be empty).
    • A function used in a rule running in enrichment mode can have outgoing ports of any type.
    • A function used in a rule combining validation and enrichment modes must have at least 1 logical outgoing port and at least 1 port of any type.

You can use one of the following ways to create quality rules.

Way 1:

  • Create an entity/lookup entity and its basic properties.
  • Create the attributes of the entity/lookup entity.
  • Create quality rules for the entity/lookup entity.
  • Save the entity/lookup entity and go to the next one.

Way 2:

  • Create a set of entities/lookup entities with their properties and attributes.
  • Optional: the data model can be published for testing.
  • Create a set of quality rules for each entity/lookup entity.

The first way is more suitable if you know in advance what the data model will look like and what quality rules it will contain. The second way is more suitable if the quality rules are configured and adjusted as the data model changes or is refined, for example, if the original analytics results change in the course of implementing the Unidata platform.
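
The platform's actual programming interface for data processing functions is not shown on this page, so the following sketch is only a schematic illustration of the port requirements listed above: a validation-mode function exposing a single logical (boolean) outgoing port. The class, method and port names are hypothetical.

  import java.util.Map;
  import java.util.regex.Pattern;

  // Hypothetical sketch of a validation-mode function: the result is exposed
  // through a single logical (boolean) outgoing port named "valid".
  public class PhoneFormatValidation {

      private static final Pattern PHONE = Pattern.compile("\\+?\\d{10,15}");

      public Map<String, Object> execute(Map<String, Object> incomingPorts) {
          String phone = (String) incomingPorts.get("phone");   // incoming port
          boolean valid = phone != null && PHONE.matcher(phone).matches();
          return Map.of("valid", valid);                        // logical outgoing port
      }
  }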

Recommendations before loading data

One of the following methods should be used:

  • Turn off all quality rules before the first load of data from different systems. Otherwise, some of the data will not be loaded due to validation errors.
  • Or turn on only those quality rules that should be applied during the loading phase (such as normalization and cleanup), so that the data is loaded correctly and does not need to be reloaded.

Workflows

Only for Unidata EE.

Workflows can be set up in several steps, distributed over time.

Pre-conditions:

  • User roles are created (or the logical names of the future roles are defined).

First step. Description and connection. At this stage two files are created:

  • An .xml file that, based on BPMN 2.0 notation, describes the course of the workflow step by step (including the user roles responsible for each step). The steps of the workflow will later become individual tasks. The description is created with Activiti.
  • An auxiliary Java class (an illustrative sketch is given at the end of this section).

The files are placed in the Unidata platform catalogs and are also specified in the configuration. The first step is not tied to other platform configuration work and can be performed immediately after system deployment.

Second step. Assignment. Workflows can be assigned to record change events and to classifier change events. Accordingly, it is recommended to assign workflows:
  • When the data model is ready and published.
  • When the classifiers involved in the workflows are created (if the project includes them).

Whether data should already be present when workflows are started is determined by the tasks being solved within the project. For example, you can load data from multiple information systems, process it using quality rules, classify it, and only then assign workflows to track new data changes.
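
The auxiliary Java class mentioned in the first step of the workflow setup usually carries the project-specific logic of individual workflow steps. As a minimal sketch, assuming the workflow is executed by the Activiti engine referenced above, such a class can be implemented as a BPMN 2.0 service-task delegate; the class name, process variables and the logic inside execute() are illustrative assumptions rather than part of a concrete Unidata project.

  import org.activiti.engine.delegate.DelegateExecution;
  import org.activiti.engine.delegate.JavaDelegate;

  // Illustrative Activiti service-task delegate for one workflow step.
  public class ApproveRecordDelegate implements JavaDelegate {

      @Override
      public void execute(DelegateExecution execution) {
          // Read a process variable set by an earlier workflow step (the name is illustrative).
          Object recordId = execution.getVariable("recordEtalonId");

          // Project-specific logic would go here, e.g. notifying the responsible data steward.
          System.out.println("Record sent for approval: " + recordId);

          // Pass a decision flag on to the next step of the workflow.
          execution.setVariable("readyForApproval", Boolean.TRUE);
      }
  }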