Top 100+ Oozie Interview Questions And Answers
Question 1. What Is Apache Oozie?
Answer :
Apache Oozie is a Java web application used to schedule Apache Hadoop jobs. It is integrated with the Hadoop stack and supports Hadoop jobs for Apache MapReduce, Apache Pig, Apache Hive, and Apache Sqoop. Oozie is a scalable, reliable and extensible system. Oozie is used in production at Yahoo!, running more than 200,000 jobs every day.
Question 2. Mention Some Features Of Oozie?
Answer :
Oozie has a client API and a command line interface which can be used to launch, control and monitor jobs from a Java application.
Using its Web Service APIs you can control jobs from anywhere.
Oozie can execute jobs that are scheduled to run periodically.
Oozie can send email notifications upon completion of jobs.
Question 3. Explain The Need For Oozie?
Answer :
With Apache Hadoop becoming the open source de-facto standard for processing and storing Big Data, many other languages like Pig and Hive have followed, simplifying the process of writing big data applications based on Hadoop.
Although Pig, Hive and many others have simplified the process of writing Hadoop jobs, a single Hadoop job is often not sufficient to get the desired output. Many Hadoop jobs have to be chained and data has to be shared between the jobs, which makes the whole process very complex.
Question 4. What Are The Alternatives To The Oozie Workflow Scheduler?
Answer :
Azkaban - a batch workflow job scheduler
Apache NiFi - an easy to use, powerful, and reliable system to process and distribute data
Apache Falcon - a feed management and data processing platform
Question 5. Explain The Types Of Oozie Jobs?
Answer :
Oozie supports job scheduling for the full Hadoop stack, including Apache MapReduce, Apache Hive, Apache Sqoop and Apache Pig.
It consists of two components:
Workflow engine: responsible for storing and running workflows composed of Hadoop jobs, e.g., MapReduce, Pig, Hive.
Coordinator engine: runs workflow jobs based on predefined schedules and the availability of data.
Question 6. Explain Oozie Workflow?
Answer :
An Oozie Workflow is a collection of actions arranged in a Directed Acyclic Graph (DAG). Control nodes define job chronology, setting rules for starting and ending a workflow and controlling the workflow execution path with decision, fork and join nodes. Action nodes trigger the execution of tasks.
Workflow nodes are classified into control flow nodes and action nodes:
Control flow nodes: nodes that control the start and end of the workflow and the workflow job execution path.
Action nodes: nodes that trigger the execution of a computation/processing task.
Workflow definitions can be parameterized. Parameterization of workflow definitions is done using JSP Expression Language syntax, allowing not only variables as parameters but also functions and complex expressions, as illustrated in the sketch below.
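As a minimal sketch, a parameterized workflow definition (workflow.xml) could look like the following; the workflow name, the Pig action and the ${jobTracker}, ${nameNode}, ${input} and ${output} parameters are illustrative assumptions resolved from the job properties:

<workflow-app name="sample-wf" xmlns="uri:oozie:workflow:0.5">
    <start to="pig-node"/>
    <!-- Action node: triggers a Pig task, parameterized with EL expressions -->
    <action name="pig-node">
        <pig>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <script>script.pig</script>
            <param>INPUT=${input}</param>
            <param>OUTPUT=${output}</param>
        </pig>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <!-- Control flow nodes: kill ends the workflow with an error message -->
    <kill name="fail">
        <message>Pig action failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>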
Question 7. What Is Oozie Workflow Application?
Answer :
A workflow application is a ZIP file that contains the workflow definition and the necessary files to run all the actions.
It contains the following files (a typical layout is sketched after this list):
Configuration file – config-default.xml
App files – lib/ directory with JAR and SO files
Pig scripts
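A typical layout of such an application directory on HDFS might look as follows; the file names workflow.xml, wordcount.jar and script.pig are illustrative assumptions:

wordcount/                  workflow application root
    workflow.xml            workflow definition
    config-default.xml      default configuration
    lib/wordcount.jar       JAR and SO files used by the actions
    script.pig              Pig script referenced by a Pig action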
Question 8. What Are The Properties That We Have To Mention In The .properties File?
Answer :
Name Node
Job Tracker
oozie.wf.application.path
Lib Path
Jar Path
Question 9. What Are The Extra Files We Need When We Run A Hive Action In Oozie?
Answer :
hive.hql
hive-site.xml
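A hedged sketch of a Hive action referencing both files is shown below; the action name, the ${jobTracker} and ${nameNode} parameters and the transition targets are assumptions:

<action name="hive-node">
    <hive xmlns="uri:oozie:hive-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <!-- hive-site.xml supplies the Hive and metastore configuration -->
        <job-xml>hive-site.xml</job-xml>
        <!-- hive.hql contains the queries to be executed -->
        <script>hive.hql</script>
    </hive>
    <ok to="end"/>
    <error to="fail"/>
</action>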
Question 10. What Is Decision Node In Oozie?
Answer :
Decision nodes are switch statements that run different jobs based on the outcome of an expression.
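As a minimal sketch, a decision node might look like the following; the node names and the fs:exists predicate on an 'input' property are illustrative assumptions:

<decision name="check-input">
    <switch>
        <!-- run the processing action only if the input path exists -->
        <case to="process-data">${fs:exists(wf:conf('input'))}</case>
        <!-- otherwise skip straight to the end of the workflow -->
        <default to="end"/>
    </switch>
</decision>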
Question 11. Explain Oozie Coordinator?
Answer :
Oozie Coordinator jobs are recurrent Oozie Workflow jobs that are triggered by time and data availability. Oozie Coordinator can also manage multiple workflows that are dependent on the outcome of subsequent workflows. The outputs of subsequent workflows become the input to the next workflow. This chain is referred to as a 'data application pipeline'.
Oozie processes coordinator jobs in a fixed timezone with no DST (typically UTC); this timezone is referred to as the 'Oozie processing timezone'. The Oozie processing timezone is used to resolve coordinator job start/end times, job pause times and the initial instance of datasets. Also, all coordinator dataset instance URI templates are resolved to a datetime in the Oozie processing timezone.
The usage of Oozie Coordinator can be categorized into three different segments:
Small: consisting of a single coordinator application with embedded dataset definitions
Medium: consisting of a single shared dataset definition and a few coordinator applications
Large: consisting of single or multiple shared dataset definitions and several coordinator applications
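A minimal sketch of a coordinator definition (coordinator.xml) that triggers a workflow once a day is shown below; the coordinator name, the start and end times and the application path are assumed values:

<coordinator-app name="daily-coord" frequency="${coord:days(1)}"
                 start="2023-01-01T00:00Z" end="2023-12-31T00:00Z"
                 timezone="UTC" xmlns="uri:oozie:coordinator:0.4">
    <action>
        <workflow>
            <!-- HDFS path of the workflow application to run on each materialization -->
            <app-path>hdfs://bar.com:9000/usr/abc/wordcount</app-path>
        </workflow>
    </action>
</coordinator-app>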
Question 12. Explain Briefly About Oozie Bundle?
Answer :
Oozie Bundle is a higher-level Oozie abstraction that batches a set of coordinator applications. The user is able to start/stop/suspend/resume/rerun at the bundle level, resulting in better and easier operational control.
More specifically, the Oozie Bundle system allows the user to define and execute a group of coordinator applications, often called a data pipeline. There is no explicit dependency among the coordinator applications in a bundle. However, a user could use the data dependency of coordinator applications to create an implicit data application pipeline.
Oozie executes workflows based on:
Time Dependency (Frequency)
Data Dependency
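A hedged sketch of a bundle definition (bundle.xml) batching two coordinator applications is shown below; the bundle name, kick-off time and coordinator paths are assumptions:

<bundle-app name="sample-bundle" xmlns="uri:oozie:bundle:0.2">
    <controls>
        <!-- the bundle starts materializing its coordinators at this time -->
        <kick-off-time>2023-01-01T00:00Z</kick-off-time>
    </controls>
    <coordinator name="ingest-coord">
        <app-path>hdfs://bar.com:9000/usr/abc/ingest-coordinator</app-path>
    </coordinator>
    <coordinator name="aggregate-coord">
        <app-path>hdfs://bar.com:9000/usr/abc/aggregate-coordinator</app-path>
    </coordinator>
</bundle-app>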
Question 13. What Is Application Pipeline In Oozie?
Answer :
It is necessary to connect workflow jobs that run regularly, but at different intervals. The outputs of multiple subsequent runs of a workflow become the input to the next workflow. The result of chaining these workflows together is referred to as a data application pipeline.
Question 14. How Does Oozie Work?
Answer :
Oozie runs as a service in the cluster and clients submit workflow definitions for immediate or later processing. An Oozie workflow consists of action nodes and control-flow nodes.
An action node represents a workflow task, e.g., moving files into HDFS, running a MapReduce, Pig or Hive job, importing data using Sqoop or running a shell script or a program written in Java.
A control-flow node controls the workflow execution between actions by allowing constructs like conditional logic, where different branches may be followed depending on the result of an earlier action node. Start Node, End Node and Error Node fall under this category of nodes.
Start Node designates the start of the workflow job.
End Node signals the end of the job.
Error Node designates the occurrence of an error and the corresponding error message to be printed.
At the end of workflow execution, an HTTP callback is used by Oozie to update the client with the workflow status. Entry to or exit from an action node may also trigger a callback.
Question 15. How To Deploy Application?
Answer :
$ hadoop fs -put wordcount-wf hdfs://bar.com:9000/usr/abc/wordcount
Question 16. Mention Workflow Job Parameters?
Answer :
$ cat job.properties
oozie.wf.application.path=hdfs://bar.com:9000/usr/abc/wordcount
input=/usr/abc/input-data
output=/usr/abc/output-data
Question 17. How To Execute Job?
Answer :
$ oozie job -run -config job.properties
job: 1-20090525161321-oozie-xyz-W
Question 18. What Are All The Actions Can Be Performed In Oozie?
Answer :
Email Action (see the sketch after this list)
Hive Action
Shell Action
SSH Action
Sqoop Action
Writing a custom Action Executor
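For example, an email action might be defined as sketched below; the recipient address, subject and body text are illustrative assumptions:

<action name="notify">
    <email xmlns="uri:oozie:email-action:0.2">
        <to>team@example.com</to>
        <subject>Workflow ${wf:id()} finished</subject>
        <body>The workflow completed successfully.</body>
    </email>
    <ok to="end"/>
    <error to="fail"/>
</action>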
Question 19. Why We Use Fork And Join Nodes Of Oozie?
Answer :
A fork node splits one path of execution into multiple concurrent paths of execution.
A join node waits until every concurrent execution path of a preceding fork node arrives at it.
The fork and join nodes must be used in pairs, as sketched below. The join node assumes all concurrent execution paths are children of the same fork node.
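A minimal sketch of paired fork and join nodes running two actions concurrently; the node and action names are assumptions:

<fork name="parallel-load">
    <path start="load-hive"/>
    <path start="load-hbase"/>
</fork>
<!-- both forked paths must transition to this join node -->
<join name="joining" to="next-action"/>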
Question 20. Why Does Oozie Need Security?
Answer :
A user is not allowed to modify the job of another user.
Hadoop does not support the authentication of the end user.
Oozie has to verify and confirm its user before passing the job to Hadoop.

