Schema is a logical description of the whole database. It includes the name and description of records of all record types, including all associated data items and aggregates. Much like a database, a DW also needs to maintain a schema. A database uses the relational model, while a DW uses the Star, Snowflake, and Fact Constellation (Galaxy) schemas.
Star Schema
In a Star Schema, there are multiple dimension tables in de-normalized form that are joined to only one fact table. These tables are joined in a logical manner to fulfill some business requirement for analysis purposes. These schemas are multidimensional structures that are used to create reports with BI reporting tools.
Dimensions in Star schemas contain a set of attributes, and Fact tables contain foreign keys to all dimensions along with the measure values.
In the above Star Schema, there is a fact table “Sales Fact” at the center, joined to four dimension tables through their primary keys. The dimension tables are not further normalized, and this arrangement of tables is known as a Star Schema in DW.
The fact table also contains the measure values − dollar_sold and units_sold.
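The star layout can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration, not the schema from the figure. Only the fact table and the measures `dollar_sold` and `units_sold` come from the text. The four dimension tables (`dim_time`, `dim_product`, `dim_customer`, `dim_store`), their columns and the sample rows are hypothetical. The sketch shows the fact table holding one foreign key per dimension, and a report query joining the fact table directly to a single flat dimension.

```python
import sqlite3


def build_star_schema():
    """Create a toy star schema in an in-memory SQLite database.

    Only sales_fact, dollar_sold and units_sold come from the text; the four
    dimension tables, their columns and the sample rows are hypothetical.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Denormalized dimensions: every attribute lives in one flat table.
        CREATE TABLE dim_time     (time_key INTEGER PRIMARY KEY, day TEXT, month TEXT, year INTEGER);
        CREATE TABLE dim_product  (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
        CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT, city TEXT);
        CREATE TABLE dim_store    (store_key INTEGER PRIMARY KEY, store_name TEXT, region TEXT);

        -- The fact table: one foreign key per dimension plus the measures.
        CREATE TABLE sales_fact (
            time_key     INTEGER REFERENCES dim_time(time_key),
            product_key  INTEGER REFERENCES dim_product(product_key),
            customer_key INTEGER REFERENCES dim_customer(customer_key),
            store_key    INTEGER REFERENCES dim_store(store_key),
            dollar_sold  REAL,
            units_sold   INTEGER
        );

        INSERT INTO dim_time     VALUES (1, '2015-07-01', '2015-07', 2015), (2, '2015-08-01', '2015-08', 2015);
        INSERT INTO dim_product  VALUES (1, 'Laptop', 'Electronics'), (2, 'Desk', 'Furniture');
        INSERT INTO dim_customer VALUES (1, 'Andy', 'New York');
        INSERT INTO dim_store    VALUES (1, 'Store A', 'East');
        INSERT INTO sales_fact   VALUES (1, 1, 1, 1, 1000.0, 1), (2, 1, 1, 1, 2000.0, 2), (1, 2, 1, 1, 150.0, 1);
    """)
    return conn


def sales_by_category(conn):
    # Star query: the fact table joins directly to one denormalized dimension.
    return conn.execute("""
        SELECT p.category, SUM(f.dollar_sold), SUM(f.units_sold)
        FROM sales_fact AS f
        JOIN dim_product AS p ON f.product_key = p.product_key
        GROUP BY p.category
        ORDER BY p.category
    """).fetchall()


conn = build_star_schema()
print(sales_by_category(conn))
# [('Electronics', 3000.0, 3), ('Furniture', 150.0, 1)]
```

Every dimension is reachable in one join from the fact table. This single-join layout is what makes star schemas convenient for BI reporting tools.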
Snowflake Schema
In a Snowflake Schema, there are multiple dimension tables in normalized form that are joined to only one fact table. These tables are joined in a logical manner to fulfill some business requirement for analysis purposes.
The only difference between a Star and a Snowflake schema is that the dimension tables are further normalized. Normalization splits the data into additional tables. Due to normalization in the Snowflake schema, data redundancy is reduced without losing any information. As a result, the schema is easier to maintain and saves storage space.
In the above Snowflake Schema example, the Product and Customer tables are further normalized to save storage space. Sometimes this also provides a performance optimization. When you execute a query that only needs to process rows in a normalized table, it does not process rows in the primary dimension table and goes directly to the normalized table in the schema.
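A minimal `sqlite3` sketch of snowflaking the Product dimension follows. It assumes that the normalization moves category attributes into a separate table. The table names `dim_product` and `dim_category` and all sample rows are hypothetical.

The sketch shows three things:
- Each category name is now stored once.
- A fact-level query needs one extra join.
- A query that only needs category data can read the small normalized table without scanning the product dimension.

```python
import sqlite3


def build_snowflake_schema():
    """Toy snowflake: the Product dimension is split into product and category tables.

    All table names, column names and rows are hypothetical.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Category attributes are stored once instead of on every product row.
        CREATE TABLE dim_category (category_key INTEGER PRIMARY KEY, category_name TEXT);
        CREATE TABLE dim_product  (product_key INTEGER PRIMARY KEY, name TEXT,
                                   category_key INTEGER REFERENCES dim_category(category_key));
        CREATE TABLE sales_fact   (product_key INTEGER REFERENCES dim_product(product_key),
                                   dollar_sold REAL, units_sold INTEGER);

        INSERT INTO dim_category VALUES (1, 'Electronics'), (2, 'Furniture');
        INSERT INTO dim_product  VALUES (1, 'Laptop', 1), (2, 'Phone', 1), (3, 'Desk', 2);
        INSERT INTO sales_fact   VALUES (1, 1000.0, 1), (2, 500.0, 2), (3, 150.0, 1);
    """)
    return conn


def sales_by_category(conn):
    # Fact -> product -> category: one more join than the equivalent star query.
    return conn.execute("""
        SELECT c.category_name, SUM(f.dollar_sold), SUM(f.units_sold)
        FROM sales_fact AS f
        JOIN dim_product  AS p ON f.product_key  = p.product_key
        JOIN dim_category AS c ON p.category_key = c.category_key
        GROUP BY c.category_name
        ORDER BY c.category_name
    """).fetchall()


def list_categories(conn):
    # Reads only the normalized category table; dim_product is never scanned.
    return [row[0] for row in conn.execute(
        "SELECT category_name FROM dim_category ORDER BY category_name")]


conn = build_snowflake_schema()
print(sales_by_category(conn))  # [('Electronics', 1500.0, 3), ('Furniture', 150.0, 1)]
print(list_categories(conn))    # ['Electronics', 'Furniture']
```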
Granularity
Granularity in a table represents the level of detail of the data stored in the table. High granularity means that the data is at or near the transaction level, which has more detail. Low granularity means that the data is summarized and holds less detail.
A fact table is usually designed at the finest grain, that is, the lowest level of detail. This means that we need to find the lowest level of information that can be stored in a fact table. In the date dimension, the granularity level can be year, month, quarter, period, week, and day.
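Storing facts at the day level lets coarser levels of the date hierarchy be derived by aggregation. The reverse is not possible: daily values cannot be recovered from a monthly total. The sketch below is pure Python and assumes hypothetical day-grain rows of `(sale_date, units_sold)`. The helper `roll_up` is illustrative. "Period" is omitted because its definition (for example, a fiscal period) varies by organization.

```python
from collections import defaultdict
from datetime import date

# Hypothetical day-grain fact rows: (sale_date, units_sold).
daily_sales = [
    (date(2015, 7, 1), 3),
    (date(2015, 7, 15), 2),
    (date(2015, 8, 3), 4),
    (date(2015, 10, 20), 1),
]


def roll_up(rows, level):
    """Aggregate day-level rows to a coarser level of the date hierarchy."""
    keys = {
        "day": lambda d: d.isoformat(),
        # ISO weeks can span two months, so week does not nest under month.
        "week": lambda d: f"{d.isocalendar()[0]}-W{d.isocalendar()[1]:02d}",
        "month": lambda d: f"{d.year}-{d.month:02d}",
        "quarter": lambda d: f"{d.year}-Q{(d.month - 1) // 3 + 1}",
        "year": lambda d: str(d.year),
    }
    totals = defaultdict(int)
    for sale_date, units in rows:
        totals[keys[level](sale_date)] += units
    return dict(totals)


print(roll_up(daily_sales, "month"))    # {'2015-07': 5, '2015-08': 4, '2015-10': 1}
print(roll_up(daily_sales, "quarter"))  # {'2015-Q3': 9, '2015-Q4': 1}
```

The comment on `week` shows why placing each dimension's hierarchy is a separate design decision. Week and month are both coarser than day, but one does not roll up cleanly into the other.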
The process of defining granularity involves two steps −
- Determining the dimensions that are to be included.
- Determining where to place the hierarchy of each dimension of information.
Slowly Changing Dimensions
Slowly changing dimensions refer to the change in the value of an attribute over time. They are one of the common concepts in DW.
Example
Andy is an employee of XYZ Inc. He was first located in New York City in July 2015. The original entry in the employee lookup table has the following record −
Employee ID | 10001 |
---|---|
Name | Andy |
Location | New York |
At a later date, he relocated to LA, California. How should XYZ Inc. now modify its employee table to reflect this change?
This is known as the "Slowly Changing Dimension" concept.
There are three ways to solve this type of problem −
Solution 1
The new record replaces the original record. No trace of the old record exists.
In this type of Slowly Changing Dimension, the new information simply overwrites the original information. In other words, no history is kept.
Employee ID | 10001 |
---|---|
Name | Andy |
Location | LA, California |
Benefit − This is the easiest way to handle the Slowly Changing Dimension problem, as there is no need to keep track of the old information.
Disadvantage − All historical information is lost.
Use − Solution 1 should be used when the DW is not required to keep track of historical information.
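Solution 1 comes down to a single in-place `UPDATE`. The `sqlite3` sketch below mirrors the employee table above. The table name `employee`, its column names and the helper `relocate_overwrite` are hypothetical.

```python
import sqlite3


def make_employee_table():
    # Hypothetical employee dimension matching the original record above.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employee (employee_id INTEGER PRIMARY KEY, name TEXT, location TEXT)")
    conn.execute("INSERT INTO employee VALUES (10001, 'Andy', 'New York')")
    return conn


def relocate_overwrite(conn, employee_id, new_location):
    # Solution 1: overwrite the attribute in place; the old value is not kept anywhere.
    conn.execute(
        "UPDATE employee SET location = ? WHERE employee_id = ?",
        (new_location, employee_id),
    )


conn = make_employee_table()
relocate_overwrite(conn, 10001, "LA, California")
print(conn.execute("SELECT * FROM employee").fetchall())
# [(10001, 'Andy', 'LA, California')]
```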
Solution 2
A new record is entered into the Employee dimension table. As a result, the employee Andy is treated as two people.
The new record represents the new information, and both the original and the new record remain in the table. The new record gets its own primary key as follows −
Employee ID | 10001 | 10002 |
---|---|---|
Name | Andy | Andy |
Location | New York | LA, California |
Benefit − This approach allows us to keep all of the historical information.
Disadvantage − The table grows faster. When the number of rows in the table is very high, space and performance of the table can become a concern.
Use − Solution 2 should be used when it is necessary for the DW to keep track of historical data.
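Solution 2 leaves the original row alone and inserts a new one. In this `sqlite3` sketch, the table, its columns and the helper `relocate_new_row` are hypothetical. Assigning the next key as `MAX(employee_id) + 1` is a simplifying assumption chosen to reproduce the 10002 key in the table above. A production load would normally rely on a sequence or surrogate-key generator instead.

```python
import sqlite3


def make_employee_table():
    # Hypothetical employee dimension matching the original record above.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employee (employee_id INTEGER PRIMARY KEY, name TEXT, location TEXT)")
    conn.execute("INSERT INTO employee VALUES (10001, 'Andy', 'New York')")
    return conn


def relocate_new_row(conn, employee_id, new_location):
    # Solution 2: keep the original row and insert a new row with its own primary key.
    (name,) = conn.execute(
        "SELECT name FROM employee WHERE employee_id = ?", (employee_id,)
    ).fetchone()
    # Simplification for the sketch: the next key is MAX + 1.
    (new_id,) = conn.execute("SELECT MAX(employee_id) + 1 FROM employee").fetchone()
    conn.execute("INSERT INTO employee VALUES (?, ?, ?)", (new_id, name, new_location))
    return new_id


conn = make_employee_table()
relocate_new_row(conn, 10001, "LA, California")
print(conn.execute("SELECT * FROM employee ORDER BY employee_id").fetchall())
# [(10001, 'Andy', 'New York'), (10002, 'Andy', 'LA, California')]
```

Nothing links the two rows except the repeated name. This is why the text says Andy is "treated as two people".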
Solution 3
The original record in the Employee dimension is modified to reflect the change.
There will be two columns for the particular attribute: one indicates the original value and the other indicates the new value. There will also be a column that indicates when the current value becomes active.
Employee ID | Name | Original Location | New Location | Date Moved |
---|---|---|---|---|
10001 | Andy | New York | LA, California | July 2015 |
Benefits − This does not increase the size of the table, since the new information is updated in place. This allows us to keep some part of the history.
Disadvantage − This method does not keep all history when an attribute value is changed more than once.
Use − Solution 3 should only be used when it is required for the DW to keep track of historical changes.
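Solution 3 updates the same row but keeps the original value in its own column. The `sqlite3` sketch below follows the column layout of the table above, with the values taken from that table. The snake_case column names and the helper `relocate_extra_column` are hypothetical.

```python
import sqlite3


def make_employee_table():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employee (employee_id INTEGER PRIMARY KEY, name TEXT, "
        "original_location TEXT, new_location TEXT, date_moved TEXT)"
    )
    # Before any move, only the original location is known.
    conn.execute("INSERT INTO employee VALUES (10001, 'Andy', 'New York', NULL, NULL)")
    return conn


def relocate_extra_column(conn, employee_id, new_location, date_moved):
    # Solution 3: update the same row. original_location is never touched, so a
    # second move overwrites new_location and the intermediate value is lost.
    conn.execute(
        "UPDATE employee SET new_location = ?, date_moved = ? WHERE employee_id = ?",
        (new_location, date_moved, employee_id),
    )


conn = make_employee_table()
relocate_extra_column(conn, 10001, "LA, California", "July 2015")
print(conn.execute("SELECT * FROM employee").fetchall())
# [(10001, 'Andy', 'New York', 'LA, California', 'July 2015')]
```

The row count never changes, which matches the stated benefit. The overwrite on a second move is the disadvantage described above.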
Normalization
Normalization is the process of decomposing a table into smaller, less redundant tables without losing any information. Database normalization is therefore the process of organizing the attributes and tables of a database to minimize data redundancy (duplicate data).
Purpose of Normalization
- It is used to remove certain kinds of duplicate data (redundancy/replication) to improve consistency.
- It provides maximum flexibility to meet future information needs by keeping tables corresponding to object types in their simplified forms.
- It produces a clearer and more readable data model.
Advantages
- Data integrity.
- Enhances data consistency.
- Reduces data redundancy and the space required.
- Reduces update cost.
- Maximum flexibility in responding to ad-hoc queries.
- Reduces the total number of rows per block.
Disadvantages
- Slow query performance, because joins must be performed to retrieve relevant data from several normalized tables.
- You must understand the data model in order to perform correct joins among several tables.
Example
In the above example, the table in the green block is a normalized version of the table in the red block. The table in the green block is less redundant and also has fewer rows, without losing any information.
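The figure is not reproduced here, so the sketch below uses its own hypothetical data to show the same idea. A table that repeats customer details on every order is split into an orders table and a customers table. Joining them back on `customer_id` reproduces the original rows exactly, so no information is lost. The column layout and the helpers `normalize` and `denormalize` are illustrative assumptions.

```python
# Hypothetical denormalized rows: customer details are repeated on every order.
# Columns: (order_id, customer_id, customer_name, customer_city, amount)
denormalized = [
    (1, 101, "Andy", "New York", 250.0),
    (2, 101, "Andy", "New York", 75.0),
    (3, 102, "Beth", "Boston", 120.0),
]


def normalize(rows):
    """Split the rows into an orders table and a customers table keyed by customer_id."""
    customers = {}
    orders = []
    for order_id, customer_id, name, city, amount in rows:
        customers[customer_id] = (name, city)  # each customer is stored once
        orders.append((order_id, customer_id, amount))
    return orders, customers


def denormalize(orders, customers):
    """Join the two tables back together on customer_id (the cost noted under Disadvantages)."""
    return [(order_id, customer_id, *customers[customer_id], amount)
            for order_id, customer_id, amount in orders]


orders, customers = normalize(denormalized)
print(len(customers))                                   # 2
print(denormalize(orders, customers) == denormalized)   # True
```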