markerMaatwerkoplossingen

markerBetrokken & direct

markerResultaatgericht

markerSamen voor de beste oplossing


Author:           Mesa Jajan

Building the Data Vault with DDM Studio 3

Introduction
The Entity Relation Diagram of Adventureworks has been converted into a Data Vault, as described in the previous article about modeling the Data Vault. The next step after modeling is the creation of the Data Vault. DDM Studio 3 will be used as a tool to set up the Data Vault. There are two different Data Vault modules that one can choose. There is the ‘Classic’ module and the ‘Surrogate key’. I chose the classic version  and René, my colleague, chose the Surrogate key module. This article describes the detailed process of creating HUBs, LINKs and Satellites in DDM Studio 3. Starting with an introduction about DDM Studio 3 and explaining the functionality of the creation of the Data Vault.

DDM Studio 3
DDM Studio 3 is a multi-enterprise metadata driven technology used for setting up and maintaining multiple Data Warehouses in a Microsoft SQL Server 2008R2/2012 environment. DDM Studio 3 is suited for Data Vault modeling and Dimensional Modeling. For a further general explanation about modeling a Data Warehouse with DDM Studio see this whitepaper. Dimensional Modeling uses star schemas to build a Data Warehouse, which is optimized for reporting tools like Business Objects, Crystal Reports, Microsoft Reporting Services and many other reporting tools and OLAP tools like Microsoft Analysis Services and Business Objects OLAP Intelligence. DDM Studio 3 is a Server side tool. It contains the engine to perform the E(xtract)T(ransform)L(oad) process and a Web-Interface to design the multi-Dimensional data models and Data Vault. DDM Studio 3 consists of two components:

  • A Web-Interface to maintain the metadata of Dimensions, Facts, Hubs, Links and Satellites.
  • A Back-End engine to load the data into the Data Warehouse.

This article will mainly focus on the Web-Interface to set up the HUBs and LINKs for a Data Vault. For more information about DDM Studio 3 please visit our website.

DDM Studio 3 structure
The following databases are created on the Microsoft SQL Server 2008R2/2012 environment prior to start using the Web-Interface.

AnySourceDB
Data from the OLTP databases and other data sources is transferred in rough format into the AnySourceDB which is part of the ODS. The data is not manipulated and multiple sources do not require merging in an intermediate database prior to loading. The name of the database has not been set to a certain name and can be changed during installation. The name SourceDB was chosen.

StagingDB
After transferring the data from the OLTP databases and other data sources to the database, the data is processed in the StagingArea database: StagingDB, where it is possible to perform cleansing of data, data enrichment and data transformations. According to the Data Vault method one transfers the data 1:1 from the source to the database. DDM Studio makes it possible perform transformations before loading the actual Data Vault. The name StagingDB is not a mandatory database name and can be changed during installation.

DataVaultDB
DDM Studio delivers an exact copy of the Data Vault to the Data Vault database: DatavaultDB. The name of the DataVaultDB database can be changed during installation. The DataVaultDB is the actual Data Vault.

DDM_Metadata
The whole process is controlled and monitored by a metadata layer. This is the layer in which all the metadata of the sources, HUBs, LINKs, Satellites and indexes is maintained. The name of the DDM_MetaData database is a fixed name.

DDM_LicenseKey
The purpose of the DDM Studio database is to distribute the license key in a flexible manner. The encrypted licensekey for the installed version of DDM Studio 3 is stored in this database and is distributed by Ocean BI B.V. The name of the DDM_LicenseKey database is a fixed name.

DDM Studio 3 overview

Step 1: Load data into the Any SourceDB
Step 2: Build your Staging Area Delta Objects
Step 3: Establish the HUBs
Step 4: Establish the LINKs
Step 5: Establish the Satellites
Step 6: Load HUBs and LINKs and the surrounding Satellites

Step 1: Load data into the SourceDB
One can choose from any database to load into the SourceDB. In most of the cases this will be a simple copy of the tables from the OLTP databases and other data sources into a database. This is done because the sources will not lose any performance during the several stages of loading and transforming the data into the Data Warehouse. The Adventureworks database provided by Microsoft is selected. All the tables are loaded because it is not yet clear about which content to use for the Data Vault. After creating the Data Vault, it is possible to remove the tables from the SourceDB that are not necessary. Because the Adventureworks database resides on the same instance as the DDM Studio Repository the metadata of the tables and views are imported in the Adventureworks database into the DDM Studio Repository.

Data Vault DDM Studio

After loading the source database in the DDM Studio Repository, it is possible to see the available Source Objects in the database.

DDM Studio

Step 2: Build the Staging Area Delta Objects
Data from the source tables can be cleansed, enriched and transformed before it is suitable to load into the Data Vault. This is not common used in the Data Vault method. But it is a possibility that DDM Studio 3 offers.

DDM Studio

Creating homogenous data sets is done by the Staging Area Delta (SAD) objects. Creating these objects is the first step. It is quite logical to select the SAD’s out of the source tables that will be used in the HUBs and LINKs. In this example one can see the ‘Address’ SAD which will be used later on as input the HUB with the equal name.

DDM Studio

Especially for the LINKs, the SAD’s turn out to be convenient. The composite Natural Keys for the LINKs is transformed in the definition of the SAD’s. In this example below one can see the ‘ContactCreditCard’ SAD. By selecting different mapping rules, a SQL script gets automatically generated. The SADs are physically created in the StagingDB. The physical creation one executes from the UI.

DDM Studio

Step 3: Establish the HUBs

Creating HUBs with DDM Studio is very straightforward. One selects the add button and the details pane appears.

DDM studio HUB

In the details pane one starts customizing the HUB.

DDM studio HUB

The following is mandatory for the creation of a HUB: Name, Surrogate Key and Natural Key. The HUB is automatically created as a table because it becomes a physical object in our Data Vault. After inserting the details, the ‘fields’ pane appears. In this overview, the details of the metadata of the HUBs are set. Three fields are automatically generated by DDM Studio 3. The Surrogate Key, the Natural Key and the DataSource (Record Source). As described in the article about the Data Vault, these are mandatory fields in HUBs. This is the example of the ‘Contact’ HUB. ‘ContactID’ is chosen as the name for the Natural Key. The information that is provided at the Field Details pane is shown in de next image.

DDM studio HUB

The final step to complete the HUB is filling in the Transformation View. Here one selects the fields from the SAD’s that are created previously and are saved in the StagingDB. Through a simple menu with mapping rules, a SQL script gets generated. One could choose previously (see Details) from creating the Transformation View as a table or a view. For performance reasons it is wise to create Views which are now also chosen.

DDM studio HUB

For the classic version, defining the Surrogate Key is not necessary. The desired field as described above is chosen as the Natural Key. The DataSource is in this case ‘Adventureworks’. The information about the DataSource has already been defined when creating the SADs. An extra field gets automatically generated when one opens the Transformation View. This is the ‘ExampleSatelliteAttribute’. This proves that a Satellite has automatically been generated.

Step 4: Establish the Links

Creating LINKs with DDM Studio is very straightforward. One selects the add button and the details pane appears.

DDM studio Link

In the details pane one starts customizing the LINK.

DDM studio LINK

Filling in the name is mandatory for the creation of a LINK: Name, Surrogate Key and Natural Key. The LINK is automatically created as a table because it becomes a physical object in the Data Vault. After inserting the details, the ‘fields’ pane appears. In this overview, one can set the details of the metadata of the LINKs. Three fields are automatically generated by DDM Studio 3. The Surrogate Key, the Natural Key and the DataSource (Record Source). As described in the article about the Data Vault, these are mandatory fields in LINKs. Here the example of the ‘Address’ LINK is shown. ‘AddressID’ is chosen as the name for the Natural Key. The information that is provided at the Field Details pane is shown in de next image.

DDM studio LINK

The final step to complete the LINK is filling in the Transformation View. Here the fields from the SAD’s are selected that were created previously and are saved in the StagingDB. Through a simple menu consisting of mapping rules, a SQL script gets generated. One could choose previously from creating the Transformation View as a table or a view. For performance reasons it is wise to create Views which is also chosen for the LINKs just as the HUBs.

DDM studio Link

For the classic version, defining the Surrogate Key is not necessary. So the desired field was chose for the Natural Key as described above, the DataSource is in this case ‘Adventureworks’.  This LINK has one Foreign Key; FK_StateProvinceID.  See the ERD model in the previous article for this relation. When creating the Transformation View the field ‘ExampleSatelliteAttribute’ gets automatically generated. This field depicts a fictitious relation between this LINK and its Satellite.

Step 5: Establish the Satellites
Creating Satellites is done in the Satellite section. At this section an automatically generated overview is shown. The HUBs and LINKs that were created previously are visible.

DDM SAT

Here the ‘Contact’ HUB is selected as described in step 3.

DDM Studio SAT

The name given to this Satellite is; Contact_Name because attributes will be added to this Satellite that contain attributes related to the name of the contacts listed in the database.  The first attribute to add to this Satellite is ‘Title’.

DDM Studio SAT

This Satellite gets completed with the attributes ‘Firstname’, ‘MiddleName’, ‘LastName’. And a second Satellite gets created; Contact_Email.

DDM studio SAT

This is the overview of the Contact HUB with its Satellites:

DDM Satellite

Step 6: Load HUBs and LINKs and the surrounding Satellites
Loading the HUBs and LINKs are executed through the UI of DDM Studio. One simply selects the desired HUBs or LINKs and select ‘Hub Load’ or ‘Link Load’

DDM studio HUB

DDM studio LINK

Finally one checks the Data Vault in SQL Server to see whether the objects are build and filled. While loading the HUBs and LINKs, the Satellites also get loaded automatically.

DDM studio

Conclusion

Building a Data Vault starts with modeling a Data Vault. After modeling one can choose for different tooling to build the physical Building a Data Vault with DDM Studio is straightforward.