Top 50 ETL Testing Interview Questions
Q1. How To Determine What Records To Extract?
When addressing a table, some column or key should indicate whether a record needs to be extracted. Most often it will be a time dimension (e.g. date >= 1st of the current month) or a transaction flag (e.g. Order Invoiced Status). A foolproof approach is to add an archive/extract flag to each record, which gets reset whenever the record changes.
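As a minimal sketch (the record fields, dates, and flag names below are invented for illustration), the two extraction criteria look like this:

```python
from datetime import date

# Hypothetical order records; field names are illustrative, not from any real schema.
orders = [
    {"id": 1, "order_date": date(2024, 5, 3), "invoiced": True, "extracted": False},
    {"id": 2, "order_date": date(2024, 4, 28), "invoiced": True, "extracted": True},
    {"id": 3, "order_date": date(2024, 5, 10), "invoiced": False, "extracted": False},
]

month_start = date(2024, 5, 1)

# Criterion 1: time dimension -- extract records dated on/after the 1st of the current month.
by_date = [o for o in orders if o["order_date"] >= month_start]

# Criterion 2 (the "foolproof" flag): an extract flag that is reset when a record changes,
# so anything not yet extracted (or changed since) is picked up.
by_flag = [o for o in orders if not o["extracted"]]
```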
Q2. Define Non-additive Facts?
Non-additive facts are facts that cannot be summed up across any of the dimensions present in the fact table. These columns cannot be added to produce meaningful results.
Q3. What Is The Difference Between Power Center & Power Mart?
PowerCenter - ability to organize repositories into a data mart domain and share metadata across repositories.
PowerMart - only a local repository can be created.
Q4. What Is The Difference Between Joiner And Lookup?
A Joiner is used to join two or more tables to retrieve data (similar to joins in SQL).
A Lookup is used to check and compare a source table against a target table (similar to a correlated sub-query in SQL).
Q5. Compare ETL & Manual Development?
These are some differences between manual and ETL development.
ETL
Extracting data from multiple sources (e.g. flat files, XML, COBOL, SAP, etc.) is much easier with the help of tools.
High and clear visibility of logic.
Contains metadata, and changes can be made easily.
Error handling, log summaries, and load progress indicators make life easier for developers and maintainers.
Can handle historic data very well.
Manual
Loading data from anything other than flat files and Oracle tables needs more effort.
Complicated and not so user-friendly visibility of logic.
No metadata concept; changes need more effort.
Needs the most effort from a maintenance point of view.
As data grows, processing time degrades.
Q6. How Do We Call Shell Scripts From Informatica?
Specify the full path of the shell script in the "Post-session properties" of the session/workflow.
Q7. What Is Data Cleaning?
Data cleaning is also known as data scrubbing.
Data cleaning is a process which ensures that a set of data is correct and accurate. Data accuracy, consistency, and integration are checked during data cleaning. Data cleaning may be applied to a single set of records or to multiple sets of records which need to be merged.
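A minimal cleansing sketch (the field names and rules below are illustrative): trim whitespace, normalize case, reject rows missing required fields, and drop duplicates after normalization.

```python
# Raw rows with typical quality problems: stray whitespace, inconsistent case,
# a missing required field, and a duplicate.
raw = [
    {"name": "  Alice ", "city": "nyc"},
    {"name": "Bob", "city": None},
    {"name": "ALICE", "city": "NYC"},
]

def clean(rows):
    seen, out = set(), []
    for r in rows:
        if not r.get("name") or not r.get("city"):
            continue  # reject rows missing required fields
        key = (r["name"].strip().lower(), r["city"].strip().lower())
        if key in seen:
            continue  # drop duplicates after normalization
        seen.add(key)
        out.append({"name": r["name"].strip().title(), "city": r["city"].strip().upper()})
    return out

cleaned = clean(raw)
```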
Q8. What Are Parameter Files? Where Do We Use Them?
A parameter file defines the values for parameters and variables used in a workflow, worklet, or session.
Q9. What Is Data Purging?
Deleting data from the data warehouse is known as data purging. Usually junk data, like rows with null values or blanks, is cleaned up.
Data purging is the process of cleaning out this kind of junk data.
Q10. What Is Full Load & Incremental Or Refresh Load?
Full Load: completely erasing the contents of one or more tables and reloading them with fresh data.
Incremental Load: applying ongoing changes to one or more tables based on a predefined schedule.
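The two load strategies can be sketched over in-memory stand-ins for real tables (the data and change-tracking set below are made up for illustration):

```python
# Source system rows keyed by primary key.
source = {1: ("Alice", 10), 2: ("Bob", 20), 3: ("Cara", 30)}
# Keys updated in the source since the last load (e.g. tracked by timestamp or flag).
changed_since_last_run = {2, 3}

def full_load(target):
    target.clear()          # truncate the target
    target.update(source)   # reload everything with fresh data

def incremental_load(target):
    for key in changed_since_last_run:
        target[key] = source[key]  # upsert only the changed rows

t1 = {}
t2 = {1: ("Alice", 10), 2: ("Bob", 99)}  # stale target before the incremental run
full_load(t1)
incremental_load(t2)
```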
Q11. Can We Use Procedural Logic Inside Informatica? If Yes, How? If No, How Can We Use External Procedural Logic In Informatica?
Yes, you can use an advanced external transformation. You can use the C++ language on UNIX, and C++, VB, or VC++ on Windows servers.
Q12. What Is Latest Version Of Power Center / Power Mart?
The Latest Version is 7.2
Q13. What Is A Mapping, Session, Worklet, Workflow, Mapplet?
A mapping represents the dataflow from sources to targets.
A mapplet creates or configures a set of transformations.
A workflow is a set of instructions that tell the Informatica server how to execute the tasks.
A worklet is an object that represents a set of tasks.
A session is a set of instructions that describe how and when to move data from sources to targets.
Q14. What Are The Different Versions Of Informatica?
Here are some popular versions of Informatica:
Informatica PowerCenter 4.1,
Informatica PowerCenter 5.1,
Informatica PowerCenter 6.1.2,
Informatica PowerCenter 7.1.2,
Informatica PowerCenter 8.1,
Informatica PowerCenter 8.5
Q15. What Is Data Warehousing?
A data warehouse can be considered as a storage area where relevant data is stored regardless of the source.
Data warehousing merges data from multiple sources into an easy and complete form.
Q16. If A Flat File Contains 1000 Records, How Can I Get The First And Last Records Only?
By using an Aggregator transformation with the FIRST and LAST functions, we can get the first and last records.
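Outside the tool, the same effect is a single pass that keeps only the first and last lines. A sketch (the file contents here are a stand-in for a real 1000-record flat file):

```python
import io

# Stand-in for an on-disk flat file; in practice this would be open("data.txt").
flat_file = io.StringIO("rec1\nrec2\nrec3\nrec1000\n")

first = last = None
for line in flat_file:
    rec = line.rstrip("\n")
    if first is None:
        first = rec   # captured once, on the first line only
    last = rec        # overwritten each iteration; holds the final line at the end
```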
Q17. What Are Non-additive Facts In Detail?
A fact can be a measure, a metric, or a dollar value. Measures and metrics are non-additive facts.
A dollar value is an additive fact. If we want to find the amount for a particular place for a particular period of time, we can add the dollar amounts and come up with the total amount.
A non-additive fact: for example, the measure height(s) for 'residents by geographical location'. When we roll up 'city' data to 'state' level data, we should not add the heights of the residents; rather, we may need to use the measure to derive a 'count'.
Q18. Can Informatica Load Heterogeneous Targets From Heterogeneous Sources?
No in Informatica 5.2, and
yes in Informatica 6.1 and later.
Q19. What Is Data Cube Technology Used For?
Data cubes are commonly used for easy interpretation of data. They are used to represent data along dimensions as measures of business needs. Each dimension of the cube represents some attribute of the database, e.g. sales per day, month, or year.
Q20. What Are Active Transformations / Passive Transformations?
An active transformation can change the number of rows that pass through it (decrease or increase rows).
A passive transformation cannot change the number of rows that pass through it.
Q21. Define Slowly Changing Dimensions (SCD)?
SCDs are dimensions whose data changes very slowly.
E.g.: the city of an employee.
This dimension will change very slowly.
The row for this data in the dimension can either be replaced completely without any track of the old record, OR a new row can be inserted, OR the change can be tracked.
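These three handling options are commonly called SCD Types 1, 2, and 3. A sketch of the first two over a list-of-dicts dimension table (the employee data is invented for illustration):

```python
# A tiny dimension table: one current row per employee.
dim = [{"emp_id": 7, "city": "Austin", "current": True}]

def scd_type1(rows, emp_id, new_city):
    """Overwrite in place -- no track of the old record is kept."""
    for r in rows:
        if r["emp_id"] == emp_id:
            r["city"] = new_city

def scd_type2(rows, emp_id, new_city):
    """Insert a new row and expire the old one -- full history is kept."""
    for r in rows:
        if r["emp_id"] == emp_id and r["current"]:
            r["current"] = False  # expire the previous version
    rows.append({"emp_id": emp_id, "city": new_city, "current": True})

scd_type2(dim, 7, "Denver")  # the employee moved; both rows now exist
```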
Q22. What Are Snapshots? What Are Materialized Views & Where Do We Use Them? What Is A Materialized View Log?
Snapshots are read-only copies of a master table located on a remote node, periodically refreshed to reflect changes made to the master table. Snapshots are mirrors or replicas of tables.
Views are built using the columns from one or more tables. A single-table view can be updated, but a view over multiple tables cannot be updated.
A view can be updated/deleted/inserted into if it has only one base table; if the view is based on columns from more than one table, then insert, update, and delete are not possible.
Materialized view
A pre-computed table comprising aggregated or joined data from fact and possibly dimension tables. Also known as a summary or aggregate table.
Q23. What Is A Three Tier Data Warehouse?
A data warehouse can be thought of as a three-tier system in which a middle system provides usable data in a secure way to end users. On either side of this middle system are the end users and the back-end data stores.
Q24. What Is Cube Grouping?
A transformer-built set of similar cubes is known as cube grouping. They are typically used for creating smaller cubes that are based on the data in a level of a dimension.
Q25. What Are The Modules In Power Mart?
PowerMart Designer
Server
Server Manager
Repository
Repository Manager
Q26. What Is ODS (Operational Data Store)?
ODS - Operational Data Store.
The ODS comes between the staging area and the data warehouse. The data in the ODS is at a low level of granularity.
Once data is populated in the ODS, aggregated data is loaded into the EDW through the ODS.
Q27. What Is The Difference Between Informatica 7.0 & 8.0?
The main difference between Informatica 7 and 8 is that 8 is an SOA (Service Oriented Architecture) whereas 7 is not. SOA in Informatica is handled through a grid designed in the server.
Q28. What Are The Various Test Procedures Used To Check Whether The Data Is Loaded In The Backend, Performance Of The Mapping, And Quality Of The Data Loaded In Informatica?
The best procedure is to take the help of the debugger, where we can monitor each and every process of the mappings and see how data is loading based on conditions and breakpoints.
Q29. What Is Active Data Warehousing?
An active data warehouse represents a single state of the business. It considers the analytic views of customers and suppliers. It helps to deliver up-to-date data through reports.
Q30. Where Do We Use Semi And Non Additive Facts?
Additive: a measure can participate in arithmetic calculations using any or all dimensions.
Ex: sales profit
Semi-additive: a measure can participate in arithmetic calculations using only some dimensions.
Ex: sales amount
Non-additive: a measure cannot participate in arithmetic calculations using dimensions.
Ex: temperature
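The distinction in practice (the rows below are invented for illustration): an additive fact like profit can be summed across dimensions, while a non-additive fact like temperature must be averaged or otherwise derived, never summed.

```python
# Fact rows with one additive measure (profit) and one non-additive measure (temperature).
facts = [
    {"day": "Mon", "store": "A", "profit": 100, "temperature": 20},
    {"day": "Mon", "store": "B", "profit": 150, "temperature": 24},
    {"day": "Tue", "store": "A", "profit": 120, "temperature": 22},
]

# Valid: profit is additive across both the day and store dimensions.
total_profit = sum(f["profit"] for f in facts)

# Summing temperatures would be meaningless; a derived value like the average is used instead.
avg_temperature = sum(f["temperature"] for f in facts) / len(facts)
```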
Q31. What Is Real Time Data Warehousing?
In real-time data warehousing, the warehouse is updated every time the system performs a transaction.
It reflects the real-time business data.
This means that when a query is fired at the warehouse, the state of the business at that moment will be returned.
Q32. Can We Lookup A Table From A Source Qualifier Transformation, i.e. An Unconnected Lookup?
You cannot perform a lookup from a source qualifier directly. However, you can override the SQL in the source qualifier to join with the lookup table to perform the lookup.
Q33. What Is A Staging Area? Do We Need It? What Is The Purpose Of A Staging Area?
Data staging is actually a collection of processes used to prepare source system data for loading into a data warehouse. Staging consists of the following steps:
source data extraction, data transformation (restructuring),
data transformation (data cleansing, value transformations),
surrogate key assignments.
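The steps above can be sketched end to end (the pipe-delimited rows, field names, and key scheme are all invented for illustration):

```python
import itertools

# Extraction: raw pipe-delimited rows pulled from a source system.
extracted = [" C01|Alice ", "C02|bob", "C01|ALICE"]

# Restructuring: split the delimited strings into named fields.
rows = [dict(zip(("cust_code", "name"), line.strip().split("|"))) for line in extracted]

# Cleansing / value transformation: normalize values, drop duplicate natural keys.
seen, cleansed = set(), []
for r in rows:
    code = r["cust_code"].strip().upper()
    if code in seen:
        continue  # duplicate natural key after normalization
    seen.add(code)
    cleansed.append({"cust_code": code, "name": r["name"].strip().title()})

# Surrogate key assignment: a warehouse-generated integer key, independent of the source key.
keygen = itertools.count(1)
staged = [{"cust_sk": next(keygen), **r} for r in cleansed]
```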
Q34. Give Some Popular Tools?
Popular Tools:
IBM WebSphere Information Integration (Ascential DataStage)
Ab Initio
Informatica
Talend
Q35. What Is The Difference Between An ETL Tool And OLAP Tools?
An ETL tool is meant for extracting data from legacy systems and loading it into a target database, with some process of cleansing the data.
Ex: Informatica, DataStage, etc.
OLAP is meant for reporting purposes; in OLAP, data is available in a multidimensional model, so you can write simple queries to extract data from the database.
Ex: Business Objects, Cognos, etc.
Q36. Give Some ETL Tool Functionalities?
While the selection of a database and a hardware platform is a must, the selection of an ETL tool is highly recommended, but it is not a must. When you evaluate ETL tools, it pays to look for the following characteristics:
Functional capability: this includes both the 'transformation' piece and the 'cleansing' piece. In general, typical ETL tools are either geared toward having strong transformation capabilities or strong cleansing capabilities, but they are seldom very strong in both. As a result, if you know your data is going to be dirty coming in, make sure your ETL tool has strong cleansing capabilities. If you know there are going to be a lot of different data transformations, it then makes sense to pick a tool that is strong in transformation.
Ability to read directly from your data source: every organization has a different set of data sources. Make sure the ETL tool you select can connect directly to your source data.
Metadata support: the ETL tool plays a key role in your metadata because it maps the source data to the destination, which is an important piece of the metadata. In fact, some organizations have come to rely on the documentation of their ETL tool as their metadata source. As a result, it is very important to select an ETL tool that works with your overall metadata strategy.
Q37. What Is Informatica Metadata And Where Is It Stored?
Informatica metadata is data about data, which is stored in Informatica repositories.
Q38. What Are The Different Problems That Data Mining Can Solve?
Data mining can be used in a variety of fields/industries, like marketing of products and services, AI, and government intelligence.
The US FBI uses data mining for screening security and intelligence, to identify illegal and incriminating e-data distributed over the internet.
Q39. Do We Need An ETL Tool? When Do We Go For The Tools In The Market?
ETL Tool:
It is used to Extract (E) data from multiple source systems (like RDBMS, flat files, mainframes, SAP, XML, etc.), Transform (T) it based on business requirements, and Load (L) it into target locations (like tables, files, etc.).
Need for an ETL tool:
An ETL tool is typically required when data is scattered across different systems (like RDBMS, flat files, mainframes, SAP, XML, etc.).
Q40. Where Do We Use Connected And Unconnected Lookups?
If only one return port is needed, we can go for an unconnected lookup; more than one return port is not possible with an unconnected lookup. If more than one return port is needed, go for a connected lookup.
Q41. When Do We Analyze The Tables? How Do We Do It?
The ANALYZE statement allows you to validate and compute statistics for an index, table, or cluster. These statistics are used by the cost-based optimizer when it calculates the most efficient plan for retrieval. In addition to its role in statement optimization, ANALYZE also helps in validating object structures and in managing space in your system. You can choose among the following operations: COMPUTE, ESTIMATE, and DELETE. Early versions of Oracle7 produced unpredictable results when the ESTIMATE operation was used. It is best to compute your statistics.
EX:
select OWNER,
       sum(decode(nvl(NUM_ROWS, 9999), 9999, 0, 1)) analyzed,
       sum(decode(nvl(NUM_ROWS, 9999), 9999, 1, 0)) not_analyzed,
       count(TABLE_NAME) total
from dba_tables
where OWNER not in ('SYS', 'SYSTEM')
group by OWNER
Q42. What Are Critical Success Factors?
Key areas of activity in which favorable results are necessary for a company to reach its goal.
There are four basic types of CSFs:
Industry CSFs
Strategy CSFs
Environmental CSFs
Temporal CSFs
Q43. What Are The Different Lookup Methods Used In Informatica?
Connected lookup:
A connected lookup receives input from the pipeline and sends output to the pipeline, and can return any number of values; it does not contain a return port.
Unconnected lookup:
An unconnected lookup can return only one column; it contains a return port.
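As a loose analogy (the lookup table and field names below are made up), an unconnected lookup behaves like a function call that returns a single value, while a connected lookup sits in the pipeline and can add any number of columns to the row:

```python
# Stand-in for a lookup table keyed by customer code.
lookup_table = {"C01": {"name": "Alice", "tier": "gold"}}

def unconnected_lookup(code):
    # Like a return port: exactly one value comes back to the caller.
    return lookup_table.get(code, {}).get("tier")

def connected_lookup(row):
    # In the pipeline: all matched columns can be merged into the passing row.
    match = lookup_table.get(row["cust_code"], {})
    return {**row, **match}

tier = unconnected_lookup("C01")
enriched = connected_lookup({"cust_code": "C01", "amount": 50})
```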
Q44. What Is ETL?
ETL stands for extraction, transformation, and loading.
ETL tools provide developers with an interface for designing source-to-target mappings, transformations, and job control parameters.
Extraction:
Take data from an external source and move it to the warehouse pre-processor database.
Transformation:
The transform data task allows point-to-point generating, modifying, and transforming of data.
Loading:
The load data task adds data to a database table in a warehouse.
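The three steps can be sketched as separate functions over in-memory stand-ins for the source and the warehouse (the row format and business rule here are invented for illustration):

```python
# Stand-ins for an external source and a warehouse table.
source_rows = ["1,apple,2", "2,banana,3"]
warehouse = []

def extract():
    """Extraction: pull raw records from the external source."""
    return list(source_rows)

def transform(rows):
    """Transformation: parse each record and shape it for the target."""
    out = []
    for line in rows:
        oid, item, qty = line.split(",")
        out.append({"order_id": int(oid), "item": item, "qty": int(qty)})
    return out

def load(rows):
    """Loading: append the transformed rows to the warehouse table."""
    warehouse.extend(rows)

load(transform(extract()))
```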
Q45. What Is The Metadata Extension?
Informatica allows end users and partners to extend the metadata stored in the repository by associating information with individual objects in the repository. For example, when you create a mapping, you can store your contact information with the mapping. You associate information with repository metadata using metadata extensions.
Informatica client applications can contain the following types of metadata extensions:
Vendor-defined. Third-party application vendors create vendor-defined metadata extensions. You can view and change the values of vendor-defined metadata extensions, but you cannot create, delete, or redefine them.
User-defined. You create user-defined metadata extensions using PowerCenter/PowerMart. You can create, edit, delete, and view user-defined metadata extensions. You can also change the values of user-defined extensions.
Q46. What Is Bus Schema?
A BUS schema is used to identify the common dimensions across business processes, i.e. to identify conformed dimensions. It has conformed dimensions and standardized definitions of facts.
Q47. What Is The ETL Process? How Many Steps Does ETL Contain? Explain With An Example?
ETL is the extraction, transformation, and loading process: you extract data from the source, apply the business rules to it, and then load it into the target. The steps are:
define the source (create the ODBC connection to the source DB)
define the target (create the ODBC connection to the target DB)
create the mapping (you apply the business rules here by adding transformations, and define how the data will flow from the source to the target)
create the session (a set of instructions that runs the mapping)
create the workflow (instructions that run the session)
Q48. What Is Virtual Data Warehousing?
A virtual data warehouse provides a collective view of the completed data. It can be considered as a logical data model of the underlying metadata.
Q49. What Is Conformed Fact? What Are Conformed Dimensions Used For?
A conformed fact in a warehouse has the same name in separate tables. Conformed facts can be compared and combined mathematically. Conformed dimensions can be used across multiple data marts. They have a static structure. Any dimension table that is used by multiple fact tables can be a conformed dimension.
Q50. How Do You Calculate Fact Table Granularity?
Granularity is the level of detail that the fact table describes. For example, if we are doing time analysis, the granularity may be day-based, month-based, or year-based.
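Rolling the same fact rows up to two different grains shows the idea (the dates and amounts below are made up for illustration):

```python
from collections import defaultdict

facts = [
    {"date": "2024-05-01", "amount": 10},
    {"date": "2024-05-01", "amount": 5},
    {"date": "2024-05-02", "amount": 7},
    {"date": "2024-06-01", "amount": 3},
]

def rollup(key_fn):
    """Sum the amounts grouped by the chosen grain."""
    totals = defaultdict(int)
    for f in facts:
        totals[key_fn(f)] += f["amount"]
    return dict(totals)

daily = rollup(lambda f: f["date"])        # day grain: one row per day
monthly = rollup(lambda f: f["date"][:7])  # month grain: one row per month
```

The finer the grain (day vs. month), the more rows the fact table holds and the more detail each query can answer.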
