Data Warehousing and BI Interview Questions and Answers
Q1. What is a data warehouse?
Ans: A data warehouse is an electronic store of an organization's historical data for the purpose of data analytics, which includes reporting, analysis and other knowledge discovery activities.
Other than data analytics, a data warehouse can also be used for data integration, master data management and so on.
According to Bill Inmon, a data warehouse should be subject-oriented, non-volatile, integrated and time-variant.
Non-volatile means that data, once loaded into the warehouse, will not be deleted later. Time-variant means that the data changes with respect to time.
Q2. What is meant by Data Analytics?
Ans: Data analytics (DA) is the science of examining raw data with the purpose of drawing conclusions about that information. A data warehouse is often built to enable data analytics.
Q3. What are the benefits of a data warehouse?
Ans: A data warehouse helps to integrate data and store it historically, so that we can analyze different aspects of the business, such as performance, trends and predictions, over a given time frame and use the results of that analysis to improve the efficiency of business processes.
Q4. Why is a data warehouse used?
Ans: For a long time, and even today, data warehouses have been built to facilitate reporting on the key business processes of an organization, tracked through KPIs (key performance indicators). Today we often call this whole process of reporting data from data warehouses "data analytics". Data warehouses also help to integrate data from different sources and present a single point of truth for business measures (e.g. enabling master data management). A data warehouse can further be used for data mining, which helps with trend prediction, forecasting, pattern recognition etc.
Q5. What is the difference between OLTP and OLAP?
Ans: OLTP is the transaction system that collects business data, whereas OLAP is the reporting and analysis system built on that data.
OLTP systems are optimized for INSERT and UPDATE operations and are therefore highly normalized. OLAP systems, on the other hand, are deliberately denormalized for fast data retrieval through SELECT operations.
In a department store, when we pay at the check-out counter, the salesperson at the counter keys all the data into a "Point-Of-Sales" machine. That data is transaction data, and the related system is an OLTP system.
On the other hand, the manager of the store might want to view a report on out-of-stock materials so that he can place purchase orders for them. Such a report will come out of an OLAP system.
Q6. What is a data mart?
Ans: Data marts are generally designed for a single subject area. An organization may have data pertaining to different departments like Finance, HR, Marketing etc. stored in the data warehouse, and each department may have a separate data mart. These data marts can be built on top of the data warehouse.
Q7. What is an ER model?
Ans: An ER model, or entity-relationship model, is a particular methodology of data modeling in which the goal is to normalize the data by reducing redundancy. This is different from dimensional modeling, where the main goal is to improve data retrieval.
Q8. What is dimensional modeling?
Ans: A dimensional model consists of dimension and fact tables. Fact tables store different transactional measurements and the foreign keys from the dimension tables that qualify the data. The goal of a dimensional model is not to achieve a high degree of normalization but to facilitate easy and faster data retrieval.
Ralph Kimball is one of the strongest proponents of this very popular data modeling technique, which is used in many enterprise-level data warehouses.
If you want to read a quick and simple guide on dimensional modeling, please check our Guide to dimensional modeling.
Q9. What is a dimension?
Ans: A dimension is something that qualifies a quantity (measure).
For example, if I just say "20kg", it does not mean anything. But if I say "20kg of Rice (product) is sold to Ramesh (customer) on 5th April (date)", then that makes sense. Product, customer and date are dimensions that qualify the measure, 20kg.
Dimensions are mutually independent. Technically speaking, a dimension is a data element that categorizes each item in a data set into non-overlapping regions.
Q10. What is a fact?
Ans: A fact is something that is quantifiable (or measurable). Facts are typically (but not always) numerical values that can be aggregated.
Q11. What are additive, semi-additive and non-additive measures?
Ans: Non-additive Measures:
Non-additive measures are those that cannot be used inside any numeric aggregation function (e.g. SUM(), AVG() etc.). One example of a non-additive fact is any kind of ratio or percentage, for example a 5% profit margin or a revenue-to-asset ratio. Non-numerical data can also be a non-additive measure when it is stored in fact tables, e.g. some kind of varchar flag in the fact table.
Semi-additive Measures:
Semi-additive measures are those to which only a subset of aggregation functions can be applied. Take account balance: a SUM() of the balance over time does not give a useful result, but the MAX() or MIN() balance might be useful. Or consider a price rate or currency rate: a sum is meaningless on a rate, but an average might be useful.
Additive Measures:
Additive measures can be used with any aggregation function like SUM(), AVG() etc. An example is sales quantity.
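The three categories can be illustrated with a small sketch. The rows, column names and figures below are made up purely for illustration; the point is only which aggregations make sense for which kind of measure.

```python
# Hypothetical daily rows for one account and one product (illustrative values only).
rows = [
    {"day": 1, "sales_qty": 10, "balance": 100.0, "revenue": 200.0, "cost": 150.0},
    {"day": 2, "sales_qty": 5,  "balance": 120.0, "revenue": 100.0, "cost": 90.0},
    {"day": 3, "sales_qty": 8,  "balance": 90.0,  "revenue": 160.0, "cost": 100.0},
]

# Additive: summing sales quantity across days is meaningful.
total_qty = sum(r["sales_qty"] for r in rows)

# Semi-additive: summing a balance across days is not meaningful,
# but the closing (last) or maximum balance is.
closing_balance = rows[-1]["balance"]
max_balance = max(r["balance"] for r in rows)

# Non-additive: a margin percentage cannot be summed row by row.
# Recompute it from its additive components instead.
total_revenue = sum(r["revenue"] for r in rows)
total_cost = sum(r["cost"] for r in rows)
overall_margin = (total_revenue - total_cost) / total_revenue
```

A common design choice that follows from this is to store the additive components (revenue, cost) in the fact table and derive ratios at query time.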
At this point, I would ask you to pause and take some time to read the article "Classifying data for successful modeling". It helps you understand the differences between dimensional data, factual data etc. from a fundamental perspective.
Q12. What is a star schema?
Ans: This schema is used in data warehouse models where one centralized fact table references a number of dimension tables, so that the keys (primary keys) from all the dimension tables flow into the fact table (as foreign keys) where the measures are stored. The entity-relationship diagram looks like a star, hence the name.
Consider a fact table that stores sales quantity for each product and customer at a certain time. Sales quantity is the measure here, and keys from the customer, product and time dimension tables flow into the fact table.
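A minimal sketch of such a star schema, using Python's built-in sqlite3 module with an in-memory database. The table names, column names and sample rows (including the second customer) are assumptions chosen for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, customer_name TEXT);
CREATE TABLE dim_product  (product_key  INTEGER PRIMARY KEY, product_name  TEXT);
CREATE TABLE dim_date     (date_key     INTEGER PRIMARY KEY, calendar_date TEXT);
-- The fact table holds the measure plus one foreign key per dimension.
CREATE TABLE fact_sales (
    customer_key   INTEGER REFERENCES dim_customer(customer_key),
    product_key    INTEGER REFERENCES dim_product(product_key),
    date_key       INTEGER REFERENCES dim_date(date_key),
    sales_quantity INTEGER
);
INSERT INTO dim_customer VALUES (1, 'Ramesh'), (2, 'Anita');
INSERT INTO dim_product  VALUES (1, 'Rice'), (2, 'Brown Bread');
INSERT INTO dim_date     VALUES (20240405, '2024-04-05');
INSERT INTO fact_sales VALUES
    (1, 1, 20240405, 40), (2, 1, 20240405, 25), (2, 2, 20240405, 3);
""")

# Total sales quantity per product: join the fact table to one dimension.
sales_by_product = conn.execute("""
    SELECT p.product_name, SUM(f.sales_quantity)
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    GROUP BY p.product_name
    ORDER BY p.product_name
""").fetchall()
```

Every query against this layout needs at most one join per dimension, which is what keeps retrieval simple.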
Q13. What is a snowflake schema?
Ans: This is another logical arrangement of tables in dimensional modeling, in which a centralized fact table references a number of dimension tables; however, those dimension tables are further normalized into multiple related tables.
Consider a fact table that stores sales quantity for each product and customer at a certain time. Sales quantity is the measure here, and keys from the customer, product and time dimension tables flow into the fact table. Additionally, all the products can be further grouped under different product families stored in a separate table, so that the primary key of the product family table also goes into the product table as a foreign key. Such a construct is called a snowflake schema, as the product table is further snowflaked into product family.
Snowflaking increases the degree of normalization in the design.
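The product family example can be sketched as follows, again with sqlite3 and hypothetical table names and sample data. Only the product branch of the schema is shown; the customer and time dimensions are left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product_family (family_key INTEGER PRIMARY KEY, family_name TEXT);
-- The product dimension is normalized: it points to its family by key.
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT,
    family_key   INTEGER REFERENCES dim_product_family(family_key)
);
CREATE TABLE fact_sales (
    product_key    INTEGER REFERENCES dim_product(product_key),
    sales_quantity INTEGER
);
INSERT INTO dim_product_family VALUES (1, 'Grains'), (2, 'Bakery');
INSERT INTO dim_product VALUES (1, 'Rice', 1), (2, 'Wheat', 1), (3, 'Brown Bread', 2);
INSERT INTO fact_sales VALUES (1, 40), (2, 10), (3, 3);
""")

# Reaching the family attribute now takes two joins instead of one.
sales_by_family = conn.execute("""
    SELECT pf.family_name, SUM(f.sales_quantity)
    FROM fact_sales f
    JOIN dim_product p         ON p.product_key = f.product_key
    JOIN dim_product_family pf ON pf.family_key = p.family_key
    GROUP BY pf.family_name
    ORDER BY pf.family_name
""").fetchall()
```

The extra join is the price paid for the higher degree of normalization.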
Q14. What are the different types of dimension?
Ans: In a data warehouse model, dimensions can be of the following types:
Conformed Dimension
Degenerated Dimension
Junk Dimension
Role Playing Dimension
Based on how frequently the data inside a dimension changes, we can further classify dimensions as:
Unchanging or static dimension (UCD)
Slowly changing dimension (SCD)
Rapidly changing dimension (RCD)
Q15. What is a 'Conformed Dimension'?
Ans: A conformed dimension is a dimension that is shared across multiple subject areas. Consider the 'Customer' dimension: both the marketing and sales departments may use the same customer dimension table in their reports. Similarly, a 'Time' or 'Date' dimension can be shared by different subject areas. These dimensions are conformed dimensions.
Theoretically, two dimensions that are either identical or strict mathematical subsets of one another are said to be conformed.
Q16. What is a degenerated dimension?
Ans: A degenerated dimension is a dimension that is derived from the fact table and does not have its own dimension table.
A dimension key such as a transaction number, receipt number or invoice number does not have any further associated attributes and hence cannot be designed as a dimension table.
Q17. What is a junk dimension?
Ans: A junk dimension is a grouping of typically low-cardinality attributes (flags, indicators etc.) so that they can be removed from other tables and "junked" into an abstract dimension table.
These junk dimension attributes might not be related to one another. The only purpose of this table is to store all the combinations of the dimensional attributes that you could not otherwise fit into the other dimension tables. Junk dimensions are often used to implement rapidly changing dimensions in a data warehouse.
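As a rough sketch, a junk dimension can be generated as the cross product of its low-cardinality attributes. The three attributes and their values below are hypothetical.

```python
from itertools import product

# Hypothetical low-cardinality attributes that would otherwise clutter the fact table.
payment_types = ["CASH", "CARD"]
gift_wrap_flags = ["Y", "N"]
order_channels = ["STORE", "WEB", "PHONE"]

# One junk-dimension row per combination, each with its own surrogate key.
junk_dimension = [
    {"junk_key": key, "payment_type": p, "gift_wrap": g, "order_channel": c}
    for key, (p, g, c) in enumerate(
        product(payment_types, gift_wrap_flags, order_channels), start=1
    )
]

# Lookup used while loading facts: attribute combination -> surrogate key.
junk_key_for = {
    (r["payment_type"], r["gift_wrap"], r["order_channel"]): r["junk_key"]
    for r in junk_dimension
}
```

The fact table then carries a single junk_key column instead of three separate flag columns.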
Q18. What is a role-playing dimension?
Ans: Dimensions are often reused for multiple purposes within the same database, with different contextual meanings. For instance, a "Date" dimension can be used for "Date of Sale" as well as "Date of Delivery" or "Date of Hire". This is often referred to as a 'role-playing dimension'.
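In SQL terms, role playing usually shows up as the same dimension table joined more than once under different aliases. A small sketch with sqlite3 follows; the table and column names are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, calendar_date TEXT);
-- Two foreign keys in the fact table point at the same date dimension.
CREATE TABLE fact_order (
    order_id          INTEGER,
    sale_date_key     INTEGER REFERENCES dim_date(date_key),
    delivery_date_key INTEGER REFERENCES dim_date(date_key)
);
INSERT INTO dim_date VALUES (1, '2024-04-05'), (2, '2024-04-09');
INSERT INTO fact_order VALUES (100, 1, 2);
""")

# dim_date is joined twice, once per role it plays.
orders = conn.execute("""
    SELECT o.order_id, sale_date.calendar_date, delivery_date.calendar_date
    FROM fact_order o
    JOIN dim_date AS sale_date     ON sale_date.date_key = o.sale_date_key
    JOIN dim_date AS delivery_date ON delivery_date.date_key = o.delivery_date_key
""").fetchall()
```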
Q19. What is SCD?
Ans: SCD stands for slowly changing dimension, i.e. a dimension in which the data changes slowly. These can be of many types, e.g. Type 0, Type 1, Type 2, Type 3 and Type 6, although Types 1, 2 and 3 are the most common.
Q20. What is a rapidly changing dimension?
Ans: This is a dimension where data changes rapidly.
The SCD types listed under Q19 work as follows. A Type 0 dimension is one where dimensional changes are not considered. This does not mean that the attributes of the dimension do not change in the actual business situation. It just means that even if the values of the attributes change, the changes are not recorded and the table keeps the original data.
A Type 1 dimension is one where history is not maintained and the table always shows the latest data. This effectively means that such a dimension table is always updated with the current data whenever there is a change, and because of this update we lose the previous values.
A Type 2 dimension table tracks historical changes by creating separate rows in the table with different surrogate keys. Suppose a customer C1 is first under group G1 and is later moved to group G2. Then there will be two separate records in the dimension table, as below:
Key | Customer | Group | Start Date   | End Date
1   | C1       | G1    | 1st Jan 2000 | 31st Dec 2005
2   | C1       | G2    | 1st Jan 2006 | NULL
Note that separate surrogate keys are generated for the two records. The NULL end date in the second row denotes that the record is the current record. Also note that, instead of start and end dates, one could keep a version number column (1, 2, ... etc.) to denote different versions of the record.
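The Type 2 behaviour described above can be sketched in a few lines of Python. The function name and the dictionary layout are hypothetical, and closing the old row on the day before the change is an assumption that matches the example dates.

```python
from datetime import date, timedelta

def apply_type2_change(dimension, customer, new_group, change_date):
    """Close the current row for `customer` and append a new row with a new surrogate key."""
    for row in dimension:
        if row["customer"] == customer and row["end_date"] is None:
            # End the old version on the day before the change takes effect.
            row["end_date"] = change_date - timedelta(days=1)
    next_key = max((row["key"] for row in dimension), default=0) + 1
    dimension.append({
        "key": next_key, "customer": customer, "group": new_group,
        "start_date": change_date, "end_date": None,  # None plays the role of NULL
    })

dim_customer = [
    {"key": 1, "customer": "C1", "group": "G1",
     "start_date": date(2000, 1, 1), "end_date": None},
]
apply_type2_change(dim_customer, "C1", "G2", date(2006, 1, 1))
```

After the call, dim_customer holds one closed row for G1 and one open row for G2, each with its own surrogate key.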
A Type 3 dimension stores the history in a separate column instead of separate rows. So unlike a Type 2 dimension, which grows vertically, a Type 3 dimension grows horizontally. See the example below:
Key | Customer | Previous Group | Current Group
1   | C1       | G1             | G2
This is only good when you do not need to store many consecutive histories and when the date of change does not need to be stored.
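A minimal sketch of a Type 3 update, assuming a single "previous" column as in the example. The function and field names are hypothetical. It also shows why only one step of history survives.

```python
def apply_type3_change(row, new_group):
    """Shift the current value into the 'previous' column, then overwrite the current value."""
    row["previous_group"] = row["current_group"]
    row["current_group"] = new_group

customer_row = {"key": 1, "customer": "C1", "previous_group": None, "current_group": "G1"}
apply_type3_change(customer_row, "G2")

# A second change overwrites the earlier history: G1 is no longer recorded anywhere.
later_row = dict(customer_row)
apply_type3_change(later_row, "G3")
```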
A Type 6 dimension is a hybrid of Types 1, 2 and 3 (1+2+3). It acts very much like Type 2, except that you add one extra column to indicate which record is the current record.
Key | Customer | Group | Start Date   | End Date      | Current Flag
1   | C1       | G1    | 1st Jan 2000 | 31st Dec 2005 | N
2   | C1       | G2    | 1st Jan 2006 | NULL          | Y
Q21. What is a mini dimension?
Ans: Mini dimensions can be used to handle the rapidly changing dimension scenario. If a dimension has a large number of rapidly changing attributes, it is better to separate those attributes into a different table called a mini dimension. This is done because if the main dimension table is designed as SCD Type 2, the table will quickly grow in size and create performance issues. It is better to segregate the rapidly changing members into a different table, thereby keeping the main dimension table small and performant.
As an example of a fact-less fact table (see Q27), consider a school, where a single student may be taught by many teachers and a single teacher may have many students. To model this situation in a dimensional model, one might introduce a fact-less fact table joining teacher and student keys. Such a fact table will then be able to answer queries like:
Who are the students taught by a specific teacher?
Which teacher teaches the most students?
Which student has the highest number of teachers? And so on.
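The three queries above can be sketched against a tiny fact-less fact table held as a list of key pairs. The teacher and student keys are made up for illustration.

```python
from collections import Counter

# Each row holds only dimension keys (teacher, student); there is no measure column.
teaches = [
    ("T1", "S1"), ("T1", "S2"), ("T1", "S3"),
    ("T2", "S1"), ("T2", "S3"),
    ("T3", "S3"),
]

def students_of(teacher):
    """Students taught by a specific teacher."""
    return sorted(s for t, s in teaches if t == teacher)

# Counting rows per key answers the 'most students' and 'most teachers' questions.
students_per_teacher = Counter(t for t, _ in teaches)
teachers_per_student = Counter(s for _, s in teaches)

busiest_teacher = students_per_teacher.most_common(1)[0][0]
most_taught_student = teachers_per_student.most_common(1)[0][0]
```

Note that every answer comes from counting or filtering rows, since the table has no measure to sum.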
Q22. What is a coverage fact?
Ans: A fact-less fact table can only answer 'optimistic' (positive) queries; it cannot answer a negative query. Consider again the school example above. A fact-less fact table containing the keys of teachers and students cannot answer queries like the ones below:
Which teacher did not teach any student?
Which student was not taught by any teacher?
Why not? Because a fact-less fact table only stores the positive scenarios (like a student being taught by a teacher). If there is a student who is not being taught by any teacher, that student's key does not appear in this table, which reduces the coverage of the table.
A coverage fact table attempts to answer this, often by adding an extra flag column. Flag = 0 indicates a negative condition and flag = 1 indicates a positive condition. To understand this better, consider a class with 100 students and 5 teachers. The coverage fact table will ideally store 100 x 5 = 500 records (all combinations), and if a certain teacher is not teaching a certain student, the corresponding flag for that record will be 0.
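The 100 x 5 example can be sketched as follows. The keys and the set of positive events are hypothetical; here it is assumed that every teacher except T5 teaches student S1 and nobody else is taught.

```python
from itertools import product

students = [f"S{i}" for i in range(1, 101)]   # 100 students
teachers = [f"T{i}" for i in range(1, 6)]     # 5 teachers

# Hypothetical positive events: every teacher except T5 teaches student S1.
teaches = {(t, "S1") for t in teachers if t != "T5"}

# Coverage fact: one row per (teacher, student) combination, flag 1 if taught, else 0.
coverage = [
    {"teacher": t, "student": s, "flag": 1 if (t, s) in teaches else 0}
    for t, s in product(teachers, students)
]

# A negative question can now be answered from the table itself.
teachers_with_no_students = [
    t for t in teachers
    if all(r["flag"] == 0 for r in coverage if r["teacher"] == t)
]
```

The trade-off is size: the table grows with the product of the dimension cardinalities rather than with the number of actual events.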
Q23. What is aggregation and what is the benefit of aggregation?
Ans: A data warehouse usually captures data at the same level of detail as is available in the source. This "degree of detail" is called granularity. But not all reporting requirements on that data warehouse need the same degree of detail.
To understand this, consider an example from the retail business. A certain retail chain has 500 shops across Europe. All the shops record detail-level transactions for the products they sell, and that data is captured in a data warehouse.
Each shop manager can access the data warehouse and see which products are sold by whom and in what quantity on any given date. Thus the data warehouse provides the shop managers with detail-level data that can be used for inventory management, trend prediction etc.
Now think about the CEO of that retail chain. He does not really care which particular salesperson in London sold the highest number of chopsticks, or which shop is the best seller of brown bread. All he is interested in is, perhaps, the percentage increase of his revenue margin across Europe, or maybe the year-on-year sales growth in Eastern Europe. Such data is aggregated in nature, because the sales of goods in Eastern Europe are derived by summing up the individual sales data from each shop in Eastern Europe.
Therefore, to support different levels of data warehouse users, data aggregation is needed.
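A small sketch of aggregating detail-level rows to a coarser grain. The shop names, regions and amounts are invented for illustration, and the helper function is hypothetical.

```python
from collections import defaultdict

# Hypothetical shop-level sales rows at the grain captured from the source.
detail_sales = [
    {"shop": "London-01",   "region": "West Europe", "product": "Brown Bread", "amount": 120.0},
    {"shop": "London-01",   "region": "West Europe", "product": "Chopsticks",  "amount": 30.0},
    {"shop": "Warsaw-01",   "region": "East Europe", "product": "Brown Bread", "amount": 80.0},
    {"shop": "Istanbul-01", "region": "East Europe", "product": "Chopsticks",  "amount": 50.0},
]

def aggregate(rows, level):
    """Sum the amount measure at the requested level (any column name in the rows)."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[level]] += row["amount"]
    return dict(totals)

sales_by_shop = aggregate(detail_sales, "shop")      # what a shop manager looks at
sales_by_region = aggregate(detail_sales, "region")  # what the CEO looks at
```

In practice such regional totals are often precomputed and stored, so that summary reports do not have to scan the detail rows each time.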
Q24. What is slicing-dicing?
Ans: Slicing means showing a slice of the data, given a certain dimension (e.g. Product), a value (e.g. Brown Bread) and measures (e.g. sales).
Dicing means viewing the slice with respect to different dimensions and at different levels of aggregation.
Slicing and dicing operations are part of pivoting.
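Following the definitions given in this answer, slicing and dicing can be sketched on a handful of made-up sales facts. The tuple layout and values are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical sales facts: (product, region, year, sales amount).
facts = [
    ("Brown Bread", "East Europe", 2023, 80.0),
    ("Brown Bread", "West Europe", 2023, 120.0),
    ("Brown Bread", "East Europe", 2024, 95.0),
    ("Chopsticks",  "East Europe", 2024, 50.0),
]

# Slice: fix one dimension (product) to a single value (Brown Bread).
brown_bread_slice = [f for f in facts if f[0] == "Brown Bread"]

# Dice: view that slice by another dimension (year), aggregated over region.
sales_by_year = defaultdict(float)
for _product, _region, year, amount in brown_bread_slice:
    sales_by_year[year] += amount
```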
Q25. What is drill-through?
Ans: Drill-through is the process of going from summary data to detail-level data.
Consider the above example of retail shops. If the CEO finds out that sales in Eastern Europe have declined this year compared to last year, he might want to know the root cause of the decrease. For this, he may start drilling through his report to a more detailed level and eventually find out that, even though individual shop sales have actually increased, the overall sales figure has decreased because a certain shop in Turkey has stopped operating. The detail-level data, which the CEO was not much interested in earlier, has this time helped him pinpoint the root cause of the declining sales. The method he followed to obtain the details from the aggregated data is called drill-through.
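The CEO's investigation can be sketched as two steps over the same data: first the regional summary, then the shop-level detail behind it. The shop names and figures are invented to mirror the story above.

```python
# Hypothetical shop-level sales for East Europe in two consecutive years.
shop_sales = {
    "Warsaw-01":   {2023: 80.0, 2024: 95.0},
    "Prague-01":   {2023: 70.0, 2024: 85.0},
    "Istanbul-01": {2023: 60.0, 2024: 0.0},   # shop stopped operating in 2024
}

# Summary level: the regional total has declined year on year.
total_2023 = sum(s[2023] for s in shop_sales.values())
total_2024 = sum(s[2024] for s in shop_sales.values())

# Drill through to shop level: list only the shops whose sales declined.
declining_shops = [name for name, s in shop_sales.items() if s[2024] < s[2023]]
```

The summary alone shows that sales fell; only the detail rows show that a single closed shop explains the drop.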
Q26. What are incident and snapshot facts?
Ans: A fact table stores some kind of measurements. Usually these measurements are stored (or captured) against a specific time, and they vary with respect to time. Now it may happen that the business is not able to capture all of its measures for every point in time. Those unavailable measurements can then either be kept empty (NULL) or be filled with the last available measurements. The first case is an example of an incident fact and the second is an example of a snapshot fact.
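The difference between the two can be sketched on a short series of daily measurements with gaps. The values are made up, and None stands in for NULL.

```python
# Hypothetical measurements by day; None marks a day with no captured value.
captured = {1: 100.0, 2: None, 3: None, 4: 130.0, 5: None}

# Incident fact: keep only what was actually captured, leaving gaps empty (NULL).
incident_fact = dict(captured)

# Snapshot fact: fill each gap with the last available measurement.
snapshot_fact = {}
last_value = None
for day in sorted(captured):
    if captured[day] is not None:
        last_value = captured[day]
    snapshot_fact[day] = last_value
```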
Q27. What is a fact-less fact?
Ans: A fact table that does not contain any measure is called a fact-less fact. This table contains only keys from different dimension tables. It is often used to resolve a many-to-many cardinality issue.