Interview Questions.

Pentaho BI Interview Question and Answers



Q1. Compare Pentaho and Tableau.


Criteria    Pentaho    Tableau

Functionality    ETL, OLAP, & static Reports    Data analytics

Availability    Open source    Proprietary

Strengths    Data Integration    Interactive visualizations

Q2. Define Pentaho and its utilization.

Ans: Regarded as one of the most efficient and innovative data integration (DI) tools, Pentaho supports all available data sources and permits scalable data clustering and data mining. It is a lightweight Business Intelligence suite providing Online Analytical Processing (OLAP) services, ETL capabilities, report and dashboard creation, and other data-analysis and visualization operations.

Q3. Explain the vital capabilities of Pentaho.


Pentaho can create advanced reporting algorithms regardless of the input and output data format.

It supports various report formats, whether Excel spreadsheets, XML, PDF documents, or CSV files.

It is a professionally certified DI software product from the renowned Pentaho Corporation, based in Florida, United States.

Offers advanced functionality and in-Hadoop capability.

Allows dynamic drill-down into larger and more detailed data.

Rapid interactive response optimization.

Lets users explore and analyze multidimensional data.

Q4. Name the fundamental applications comprising the Pentaho BI Project.


Business Intelligence Platform.

Dashboards and Visualizations.

Data Mining.

Data Analysis.

Data Integration and ETL (also called Kettle).

Data Discovery and Analysis (OLAP).

Q5. What is the importance of metadata in Pentaho?

Ans: A metadata model in Pentaho maps the physical structure of your database into a logical business model. These mappings are stored in a central repository and allow developers and administrators to build business-logical DB tables that are cost-effective and optimized. It further simplifies the work of business users, letting them create formatted reports and dashboards while ensuring the security of data access.

All in all, a metadata model provides an encapsulation around the physical definitions of your database and the logical representation, and describes the relationships between them.

Q6. Define Pentaho Reporting Evaluation.

Ans: Pentaho Reporting Evaluation is a particular package of a subset of the Pentaho Reporting capabilities, designed for typical first-phase evaluation activities such as accessing sample data, creating and editing reports, and viewing and interacting with reports.


Q7. Explain the benefits of Data Integration.


The biggest benefit is that integrating data improves consistency and reduces conflicting and erratic data in the database. Data integration lets users fetch exactly what they search for, allowing them to utilize and work with what they gather.

Accurate data extraction, which in turn enables flexible reporting and tracking of the available volumes of data.

Helps meet deadlines for effective business management.

Tracks customer information and buying behavior to improve traffic and conversions in the future, thus advancing business performance.

Q8. What is MDX and its utilization?

Ans: MDX is an acronym for 'Multi-Dimensional Expressions,' the standard query language introduced by Microsoft SQL Server OLAP Services. MDX is an integral part of the XML for Analysis (XMLA) API and has a different structure than SQL. A basic MDX query is:

SELECT {[Measures].[Unit Sales], [Measures].[Store Sales]} ON COLUMNS,

       {[Product].Members} ON ROWS

FROM [Sales]

WHERE [Time].[1999].[Q2]

Q9. Define three primary forms of Data Integration Jobs.


Transformation jobs: Used for preparing data; used only when no change to the data is allowed until the transformation process is completed.

Provisioning jobs: Used for the transmission/transfer of large volumes of data. Used only when no change to the data is permitted until the job transformation completes, and when there is a large provisioning requirement.

Hybrid jobs: Execute both transformation and provisioning jobs. There are no restrictions on data changes; data may be updated regardless of success/failure. The transformation and provisioning requirements are not large in this case.

Q10. Illustrate the distinction between transformations and jobs.

Ans: While transformations refer to moving and transforming rows from the source system to the target system, jobs perform high-level operations like running transformations, transferring files via FTP, sending mails, and so on.

Another considerable difference is that a transformation allows parallel execution, while a job executes its steps in order.

Q11. How to perform database join with PDI (Pentaho Data Integration)?

Ans: PDI supports joining two tables from the same database using a 'Table Input' step, performing the join in SQL itself.

On the other hand, for joining tables in different databases, users implement the 'Database Join' step. However, in a database join, a query executes on the target system for every input row from the main stream, resulting in lower performance as the number of queries grows.

To avoid the above scenario, there is yet another option: join rows from two different Table Input steps using the 'Merge Join' step, with each SQL query having an 'ORDER BY' clause. Remember, the rows must be perfectly sorted before implementing the merge join.
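The sorted merge-join logic that the 'Merge Join' step relies on can be sketched in plain Python (a simplified inner-join illustration, not Pentaho's actual implementation; the field names are hypothetical):

```python
def merge_join(left, right, key):
    """Inner-join two row streams that are already sorted on `key`,
    advancing one pointer at a time, as PDI's Merge Join step does."""
    result, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i][key] < right[j][key]:
            i += 1
        elif left[i][key] > right[j][key]:
            j += 1
        else:
            # Emit every right-side row that shares this key value.
            k = j
            while k < len(right) and right[k][key] == left[i][key]:
                result.append({**left[i], **right[k]})
                k += 1
            i += 1
    return result

# Both inputs must already be sorted on the join key (ORDER BY id in SQL).
customers = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
orders = [{"id": 1, "total": 10}, {"id": 2, "total": 20}]
print(merge_join(customers, orders, "id"))
```

If either input is unsorted, the pointers skip past matching keys, which is exactly why the ORDER BY clause is mandatory before a Merge Join.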

Q12. Explain how to sequentialize transformations?

Ans: Since PDI transformations support parallel execution of all the steps/operations, it is not possible to sequentialize transformations in Pentaho. Moreover, making this happen would require changing the core architecture, which would actually result in slow processing.


Q13. Explain Pentaho Reporting Evaluation.

Ans: Pentaho Reporting Evaluation is a complete package of Pentaho's reporting capabilities, activities, and tools, specifically designed for first-phase evaluation such as accessing sample data, generating and updating reports, viewing them, and performing various interactions. This evaluation includes the Pentaho platform components, Report Designer, and the ad hoc reporting interface for local installation.

Q14. Can fieldnames in a row be duplicated in Pentaho?

Ans: No, Pentaho doesn't permit field duplication.

Q15. Does a transformation allow field duplication?

Ans: 'Select Values' will rename a field when you select the original field as well. The original field will then have a duplicate name of the other field.

Q16. How to use database connections from the repository?

Ans: You can either create a new transformation/job or close and reopen the ones already loaded in Spoon.

Q17. Explain in short the idea of a Pentaho Dashboard.

Ans: Dashboards are collections of various data objects on a single page, such as diagrams, tables, and textual information. The Pentaho AJAX API is used to extract BI information, while the Pentaho Solution Repository contains the content definitions.

Q18. The steps involved in dashboard creation include:


Adding the dashboard to the solution.

Defining the dashboard content.

Implementing filters.

Editing the dashboards.

Q19. How to reuse logic from one transformation/job in a different process?

Ans: Transformation logic can be shared using subtransformations, which provide seamless loading and transformation of variables, enhancing the efficiency and productivity of the system. Subtransformations can be called and reconfigured when required.

Q20. Explain the usage of Pentaho reporting.

Ans: Pentaho reporting permits organizations to create structured and informative reports to easily access, format, and deliver meaningful and essential information to customers and users. It also helps business users analyze and track customer behavior over a particular period and functionality, thereby directing them toward the right path to success.

Q21. What is Pentaho Data Mining?

Ans: Pentaho Data Mining refers to the Weka project, which includes an extensive tool set for machine learning and data mining. Weka is open-source software for extracting large sets of information about users, customers, and businesses. It is built on Java.

Q22. Are Data Integration and ETL programming the same?

Ans: No. Data Integration refers to passing data from one type of system to another within the same application. ETL, on the contrary, is used to extract and access data from different sources and transform it into other objects and tables.

Q22. Explain Hierarchy Flattening.

Ans: It is simply the construction of parent-child relationships in a database. Hierarchy flattening uses both horizontal and vertical formats, which enables easy and trouble-free identification of sub-elements. It further allows users to understand and read the main hierarchy of BI, and consists of a Parent column, a Child column, Parent attributes, and Child attributes.

Q23. Explain Pentaho Report Designer (PRD).

Ans: PRD is a graphical tool for executing report-editing functions and creating simple and advanced reports, and it helps users export them as PDF, Excel, HTML, and CSV documents. PRD contains a Java-based reporting engine offering data integration, portability, and scalability. Thus, it can be embedded in Java web applications and also in other application servers like the Pentaho BA Server.

Q24. Define Pentaho report types.

Ans: There are several classes of Pentaho reports:

Transactional reports: Data comes from transactions. The objective is to publish detailed and comprehensive data for daily business activities like purchase orders and sales reporting.

Tactical reports: Data comes from daily or weekly transactional data summaries. The objective is to offer short-term information for immediate decision making, like replacing merchandise.

Strategic reports: Data comes from stable and reliable sources to create long-term business information reports, like seasonal sales analysis.

Helper reports: Data comes from diverse sources and includes images and videos to present a variety of activities.

Q25. What are variables and arguments in transformations?

Ans: The transformation dialog box contains two different tables: one for arguments and the other for variables. Arguments refer to command-line values specified during batch processing, while PDI variables refer to objects that are set in a previous transformation/job or in the operating system.

Q26. How to configure JNDI for the Pentaho DI Server?

Ans: Pentaho offers JNDI connection configuration for local DI to avoid continuously running an application server during the development and testing of transformations. Edit the properties in the jdbc.properties file located at …/data-integration-server/pentaho-solutions/system/simple-jndi.
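A typical entry in that simple-jndi jdbc.properties file looks roughly like this (the datasource name, driver, and connection details below are illustrative placeholders, not shipped defaults):

```properties
# One datasource = several lines sharing the same name prefix.
SampleData/type=javax.sql.DataSource
SampleData/driver=org.h2.Driver
SampleData/url=jdbc:h2:file:/opt/pentaho/data/sampledata
SampleData/user=pentaho_user
SampleData/password=password
```

The name before the slash ("SampleData" here) is the JNDI name you then select in Spoon's connection dialog.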

Q27. Is Pentaho a Trademark?

Ans: Yes, Pentaho is a trademark.

Q28. Explain MDX.

Ans: Multidimensional Expressions (MDX) is a query language for OLAP databases, much as SQL is a query language for relational databases. It is also a calculation language, with syntax similar to spreadsheet formulas.

Q29. Define tuple.

Ans: A finite ordered list of elements is known as a tuple.

Q30. What sort of data does a cube contain?

Ans: The cube will contain the following data:

Fact fields – Sales, Costs, and Discounts

Time dimension – with the following hierarchy: Year, Quarter, and Month

Customer dimensions – one with location (Region, Country) and the other with Customer Group and Customer Name

Product dimension – containing a Product Name
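In Pentaho Analysis (Mondrian), a cube of this shape would be declared in a schema file roughly as follows; the table and column names here are hypothetical and only sketch the structure:

```xml
<Schema name="SalesSchema">
  <Cube name="Sales">
    <Table name="sales_fact"/>
    <!-- Time dimension: Year > Quarter > Month -->
    <Dimension name="Time" foreignKey="time_id">
      <Hierarchy hasAll="true" primaryKey="time_id">
        <Table name="time_by_day"/>
        <Level name="Year" column="the_year" type="Numeric"/>
        <Level name="Quarter" column="quarter"/>
        <Level name="Month" column="month_of_year" type="Numeric"/>
      </Hierarchy>
    </Dimension>
    <!-- Fact fields become measures -->
    <Measure name="Sales" column="sales" aggregator="sum"/>
    <Measure name="Costs" column="costs" aggregator="sum"/>
    <Measure name="Discounts" column="discounts" aggregator="sum"/>
  </Cube>
</Schema>
```

The Customer and Product dimensions from the answer above would be added as further Dimension elements in the same pattern.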

Q31. Differentiate between transformations and jobs?

Ans: Transformations are about moving and transforming rows from source to target.

Jobs are more about high-level flow control.

Q32. How to do a database join with PDI?

Ans: If we want to join two tables from the same database, we can use a 'Table Input' step and do the join in SQL itself.

If we want to join two tables that are not in the same database, we can use the 'Database Join' step.

Q33. How to sequentialize transformations?

Ans: It is not possible, as in PDI transformations all the steps run in parallel, so we can't sequentialize them.

Q34. How can we use database connections from the repository?

Ans: We can create a new transformation/job or close and re-open the ones we have loaded in Spoon.

Q34. How do you insert booleans into a MySQL database? PDI encodes a boolean as 'Y' or 'N', and this can't be inserted into a BIT(1) column in MySQL.

Ans: BIT is not a standard SQL data type. It's not even standard in MySQL, as its meaning (core definition) changed from MySQL version 4 to 5.

Also, a BIT uses 2 bytes in MySQL. That's why in PDI we made the safe choice and went for a char(1) to store a boolean. There is a simple workaround available: change the data type to 'Integer' with a Select Values step, in the Metadata tab. This converts it to 1 for 'true' and 0 for 'false', just like MySQL expects.
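The conversion that the Select Values metadata change performs can be mimicked in a few lines of Python (a simplified illustration of the mapping, not PDI's actual code; the field name is hypothetical):

```python
def boolean_to_mysql_int(value):
    """Map PDI's 'Y'/'N' boolean encoding to the 1/0 integers MySQL expects."""
    return 1 if value == "Y" else 0

# A couple of sample rows with a PDI-style boolean field.
rows = [{"active": "Y"}, {"active": "N"}]
converted = [{**row, "active": boolean_to_mysql_int(row["active"])} for row in rows]
print(converted)  # [{'active': 1}, {'active': 0}]
```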

Q35. By default, all steps in a transformation run in parallel. How can we make it so that one row gets processed completely to the end before the next row is processed?

Ans: This is not possible, as in PDI transformations all the steps run in parallel, so we can't sequentialize them. This would require architectural changes to PDI, and sequential processing would also result in very slow processing.

Q36. Why can't we duplicate fieldnames in a single row?

Ans: We can't have duplicate fieldnames. Before PDI v2.5.0 it was possible to force duplicate fields, but even then only the first value of the duplicate fields would ever be used.

Q37. What are the benefits of Pentaho?


Open source.

Has a community that supports its users.

Runs well on multiple platforms (Windows, Linux, Macintosh, Solaris, Unix, etc.).

Has a complete package covering reporting, ETL for warehousing data management, an OLAP server, data mining, and dashboards.

Q38. Differentiate between arguments and variables?


Arguments are command-line arguments that we would normally specify during batch processing.

Variables are environment or PDI variables that we would normally set in a previous transformation in a job.

Q39. What are the applications of Pentaho?


i) The Pentaho Suite:

BI Platform (JBoss Portal)

Pentaho Dashboard

ii) All built on the Java platform.

Q40. Define Pentaho Schema Workbench?

Ans: Pentaho Schema Workbench provides a graphical interface for designing OLAP cubes for Pentaho Analysis.

Q41. Brief about the Pentaho Report Designer?

Ans: It is a visual, banded report writer. It has various features, like the use of subreports, charts, and graphs.

Q42. What do you understand by the term ETL?

Ans: It is an entry-level tool for data manipulation.

What are the steps to decrypt a folder or file?

Right-click the folder or file we need to decrypt, and then click the Properties option.

Click the General tab, and then click Advanced.

Clear the 'Encrypt contents to secure data' check box, click OK, and then click OK again.

Q43. Explain the Encrypting File System?

Ans: It is the technology that enables files to be transparently encrypted to secure personal data from attackers who have physical access to the computer.

Q44. What is the ETL process? Write the steps also.

Ans: ETL is the extract, transform, load process. The steps are:

define the source

define the target

create the mapping

create the session

create the workflow
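The steps above can be sketched end to end in a few lines of Python (a toy illustration of extract-transform-load, not any specific tool's API; the source rows and field names are made up):

```python
# Extract: read rows from the (hypothetical) source.
source_rows = [
    {"name": "ann", "amount": "10.5"},
    {"name": "bob", "amount": "20.0"},
]

# Transform: apply the mapping - normalize names, cast amounts to numbers.
def transform(row):
    return {"name": row["name"].title(), "amount": float(row["amount"])}

# Load: write the transformed rows to the (hypothetical) target.
target = [transform(r) for r in source_rows]
print(target)  # [{'name': 'Ann', 'amount': 10.5}, {'name': 'Bob', 'amount': 20.0}]
```

In a real tool, the session and workflow steps would schedule and orchestrate this pipeline rather than run it inline.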

Q45. What is metadata?

Ans: Metadata is data stored in the repository by associating information with individual objects in the repository.

Q46. What are snapshots?

Ans: Snapshots are read-only copies of a master table located on a remote node, which can be periodically refreshed to reflect changes made to the master table.

Q47. What is records staging?

Ans: Data staging is simply a collection of processes used to prepare source system data for loading into a data warehouse.

Q48. What is the difference between a full load and an incremental load?

Ans: Full load means completely erasing the contents of one or more tables and refilling them with fresh data.

Incremental load means applying ongoing changes to one or more tables based on a predefined schedule.
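The difference can be illustrated in Python (a toy sketch where a "table" is just a dict keyed by primary key; nothing here is a real warehouse API):

```python
def full_load(table, fresh_rows):
    """Full load: erase the table and refill it with fresh data."""
    table.clear()
    table.update({row["id"]: row for row in fresh_rows})

def incremental_load(table, changed_rows):
    """Incremental load: apply only the ongoing changes (upserts)."""
    for row in changed_rows:
        table[row["id"]] = row

table = {}
full_load(table, [{"id": 1, "qty": 5}, {"id": 2, "qty": 7}])
# Later, only the changed/new rows arrive:
incremental_load(table, [{"id": 2, "qty": 9}, {"id": 3, "qty": 1}])
print(sorted(table))  # [1, 2, 3]
```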

Q49. Define mapping?

Ans: The data flow from source to target is known as a mapping.

Q50. Explain session?

Ans: It is a set of instructions that tells when and how to move data from the respective source to the target.

Q51. What is Workflow?

Ans: It is a set of instructions that tells the Informatica server how to execute the task.

Q52. Define mapplet?

Ans: A mapplet creates and configures a set of transformations.

Q53. What do you understand by a three-tier data warehouse?

Ans: A data warehouse is said to be a three-tier system, where a middle system provides usable data in a secure manner to end users. On either side of this middle system are the end users and the back-end data stores.

Q54. What is ODS?

Ans: ODS is an Operational Data Store, which sits between the data warehouse and the staging area.

Q55. Differentiate between an ETL tool and an OLAP tool?

Ans: An ETL tool is used for extracting data from the legacy system and loading it into a specified database, with some processing to cleanse the data.

An OLAP tool is used for the reporting process. Here data is available in a multidimensional model, so we can write simple queries to extract data from the database.

Q56. What is XML?

Ans: XML is the Extensible Markup Language, which defines a set of rules for encoding documents in a format that is both human-readable and machine-readable.

Q57. What are the different versions of Informatica?

Ans: Informatica PowerCenter 4.1, Informatica PowerCenter 5.1, Informatica PowerCenter 6.1.2, Informatica PowerCenter 7.1.2, and so on.

Q58. What are various gear in ETL?

Ans: Ab Initio, DataStage, Informatica, Cognos DecisionStream, and so on.

Q59. Define MDX?

Ans: MDX is the multidimensional expression language, the main query language implemented by Mondrian.

Q60. Define a multidimensional cube?

Ans: It is a cube for viewing data in which we can slice and dice the data. It has time dimensions, locations, and figures.

Q61. How do you duplicate a field in a row in a transformation?

Ans: Several solutions exist:

Use a 'Select Values' step, renaming a field while also selecting the original one. The result will be that the original field is duplicated under another name; for example, fieldA is copied to fieldB and fieldC.

Use a Calculator step with, e.g., the NVL(A, B) operation. This has the same effect as the first answer: three fields in the output that are copies of each other: fieldA, fieldB, and fieldC.

Use a JavaScript step to duplicate the field, for example: var fieldB = fieldA; var fieldC = fieldA;. This has the same effect as the previous answers: three fields in the output that are copies of each other: fieldA, fieldB, and fieldC.

Q62. We will be using PDI integrated in a web application deployed on an application server. We've created a JNDI datasource in our application server. Of course, Spoon doesn't run in the context of the application server, so how can we use the JNDI data source in PDI?

Ans: If you look in the PDI main directory you will see a sub-directory 'simple-jndi', which contains a file called 'jdbc.properties'. You should change this file so that the JNDI information matches what you use in your application server.

After that, in the connection tab of Spoon, set the 'Method of access' to JNDI, the 'Connection type' to the type of database you're using, and the 'Connection name' to the name of the JNDI datasource (as used in 'jdbc.properties').

Q63. The Text File Input step has a Compression option that allows you to select Zip or Gzip, but it will only read the first file in a Zip. How can I use Apache VFS support to handle tarballs or multi-file zips?

Ans: The catch is to specifically restrict the file list to the files inside the compressed collection. Some examples:

You have a file with the following structure:

access.logs.tar.gz
    access.log.1
    access.log.2
    access.log.3

To read each of these files in a File Input step:

File/Directory    Wildcard
tar:gz:/path/to/access.logs.tar.gz!/access.logs.tar!    .+

Note: If you only wanted certain files in the tarball, you could use a wildcard like access.log..* or similar. .+ is the magic if you don't want to specify the child filenames. .* will not work, as it will include the folder (i.e. tar:gz:/path/to/access.logs.tar.gz!/access.logs.tar!/).

You have a simpler file, fat-access.log.gz. You could use the Compression option of the File Input step to handle this simple case, but if you wanted to use VFS instead, you would use the following specification:

File/Directory    Wildcard
gz:file://c:/path/to/fat-access.log.gz!    .+

Finally, if you have a zip file with the following structure:

access.logs.zip/
    a-root-access.log
    subdirectory1/
        subdirectory-access.log.1
        subdirectory-access.log.2
    subdirectory2/
        subdirectory-access.log.1
        subdirectory-access.log.2

you may want to access all the files, in which case you'd use:

File/Directory    Wildcard
zip:file://c:/path/to/access.logs.zip!    a-root-access.log
zip:file://c:/path/to/access.logs.zip!/subdirectory1    subdirectory-access.log.*
zip:file://c:/path/to/access.logs.zip!/subdirectory2    subdirectory-access.log.*

Note: For some reason, the .+ doesn't work in the subdirectories; they still show the directory entries.
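Outside of PDI, the same idea, listing only the member files inside an archive and filtering them with a pattern, can be sketched with Python's standard zipfile module (an illustration of the concept, not of PDI's VFS code; the archive contents are made up):

```python
import io
import re
import zipfile

# Build a small in-memory zip mirroring the structure described above.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a-root-access.log", "root entries")
    zf.writestr("subdirectory1/subdirectory-access.log.1", "s1 entries")
    zf.writestr("subdirectory2/subdirectory-access.log.1", "s2 entries")

# List member files in one subdirectory matching a wildcard-style pattern,
# analogous to the File/Directory + Wildcard rows in the table above.
with zipfile.ZipFile(buf) as zf:
    pattern = re.compile(r"subdirectory1/subdirectory-access\.log\..*")
    matches = [name for name in zf.namelist() if pattern.fullmatch(name)]
print(matches)  # ['subdirectory1/subdirectory-access.log.1']
```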