
Top 100+ Hadoop Administration Interview Questions And Answers - May 31, 2020


Question 1. How Will You Decide Whether You Need To Use The Capacity Scheduler Or The Fair Scheduler?

Answer :

Fair Scheduling is the method in which resources are assigned to jobs such that all jobs get an equal share of resources over time.

Fair Scheduler can be used under the following circumstances:

i) If you need the jobs to make equal progress rather than follow the FIFO order, then you should use Fair Scheduling.

ii) If you have slow connectivity and data locality plays a crucial role and makes a significant difference to the job runtime, then you should use Fair Scheduling.

iii) Use Fair Scheduling if there is a lot of variability in utilization between pools.

Capacity Scheduler allows the Hadoop MapReduce cluster to run as a shared, multi-tenant cluster to maximize the utilization and throughput of the Hadoop cluster.

Capacity Scheduler can be used under the following circumstances:

i) If the jobs require scheduler determinism, then Capacity Scheduler can be useful.

ii) The Capacity Scheduler's memory-based scheduling approach is useful if the jobs have varying memory requirements.

iii) If you want to enforce resource allocation because you know the cluster utilization and workload very well, then use Capacity Scheduler.
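On a Hadoop 1.x cluster the scheduler is selected through the mapred.jobtracker.taskScheduler property in mapred-site.xml. A minimal sketch, assuming the fair scheduler jar is already on the JobTracker classpath:

  <property>
    <name>mapred.jobtracker.taskScheduler</name>
    <value>org.apache.hadoop.mapred.FairScheduler</value>
  </property>

Setting the value to org.apache.hadoop.mapred.CapacityTaskScheduler switches to the Capacity Scheduler instead; in both cases the JobTracker must be restarted for the change to take effect.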

Question 2. What Are The Daemons Required To Run A Hadoop Cluster?

Answer :

NameNode, DataNode, TaskTracker and JobTracker

Question 3. How Will You Restart A Namenode?

Answer :

The easiest way of doing this is to run the shell script that stops all the running daemons, i.e. stop-all.sh. Once this is done, restart the NameNode by running start-all.sh.
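
A minimal sketch of both approaches, assuming a standard $HADOOP_HOME installation; the hadoop-daemon.sh form restarts only the NameNode without touching the other daemons:

  $HADOOP_HOME/bin/stop-all.sh
  $HADOOP_HOME/bin/start-all.sh

  # or restart just the NameNode daemon
  $HADOOP_HOME/bin/hadoop-daemon.sh stop namenode
  $HADOOP_HOME/bin/hadoop-daemon.sh start namenode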

Question 4. Explain About The Different Schedulers Available In Hadoop?

Answer :

FIFO Scheduler – This scheduler does not consider the heterogeneity of the system but orders the jobs based on their arrival times in a queue.

COSHH – This scheduler considers the workload, the cluster and user heterogeneity when making scheduling decisions.

Fair Sharing – This Hadoop scheduler defines a pool for every user. The pool contains a number of map and reduce slots on a resource. Each user can use their own pool to execute jobs.
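
A minimal sketch of what a pool definition might look like in the Fair Scheduler allocation file on Hadoop 1.x; the pool name and the limits shown are illustrative:

  <?xml version="1.0"?>
  <allocations>
    <pool name="analytics">
      <minMaps>10</minMaps>
      <minReduces>5</minReduces>
      <maxRunningJobs>20</maxRunningJobs>
    </pool>
  </allocations>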

Question 5. List A Few Hadoop Shell Commands That Are Used To Perform A Copy Operation?

Answer :

fs -put
fs -copyToLocal
fs -copyFromLocal
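
For example (the paths are illustrative):

  hadoop fs -put /local/data/sales.csv /user/hadoop/sales.csv
  hadoop fs -copyFromLocal /local/data/sales.csv /user/hadoop/sales.csv
  hadoop fs -copyToLocal /user/hadoop/sales.csv /local/data/sales_copy.csv
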
Question 6. What Is The jps Command Used For?

Answer :

The jps command is used to verify whether the daemons that run the Hadoop cluster are running or not. The output of the jps command shows the status of the NameNode, Secondary NameNode, DataNode, TaskTracker and JobTracker.
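
On a single-node cluster with all daemons running, the output looks roughly like the following (the process IDs are illustrative):

  $ jps
  4821 NameNode
  4963 DataNode
  5107 SecondaryNameNode
  5248 JobTracker
  5390 TaskTracker
  5532 Jps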

Question 7. What Are The Important Hardware Considerations When Deploying Hadoop In Production Environment?

Answer :

Memory – The system's memory requirements will vary between the worker services and management services based on the application.

Operating System – A 64-bit operating system avoids restrictions on the amount of memory that can be used on worker nodes.

Storage – It is preferable to design a Hadoop platform by moving the compute activity to the data to achieve scalability and high performance.

Capacity – Large Form Factor (3.5”) disks cost less and store more when compared to Small Form Factor disks.

Network – Two TOR switches per rack provide better redundancy.

Computational Capacity – This can be determined by the total number of MapReduce slots available across all the nodes within a Hadoop cluster.

Question 8. How Many Namenodes Can You Run On A Single Hadoop Cluster?

Answer :

Only one.

Question 9. What Happens When The Namenode On The Hadoop Cluster Goes Down?

Answer :

The file system goes offline whenever the NameNode is down.

Question 10. What Is The conf/hadoop-env.sh File And Which Variable In The File Should Be Set For Hadoop To Work?

Answer :

This file provides the environment for Hadoop to run and includes variables such as HADOOP_CLASSPATH, JAVA_HOME and HADOOP_LOG_DIR. The JAVA_HOME variable must be set for Hadoop to run.
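
A minimal sketch of the relevant lines in conf/hadoop-env.sh; the JDK and log paths are illustrative and depend on the installation:

  export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
  export HADOOP_LOG_DIR=/var/log/hadoop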

Question 11. Apart From Using The jps Command, Is There Any Other Way That You Can Check Whether The Namenode Is Working Or Not?

Answer :

Use the command /etc/init.d/hadoop-0.20-namenode status.
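
For example, on a packaged installation that ships the service script (a sketch; the exact package and script name vary by distribution):

  sudo /etc/init.d/hadoop-0.20-namenode status

If no service script is available, hadoop dfsadmin -report is another quick check: it fails promptly when the NameNode cannot be reached.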

Question 12. In A MapReduce System, If The HDFS Block Size Is 64 MB And There Are 3 Files Of Size 127 MB, 64 KB And 65 MB With FileInputFormat. Under This Scenario, How Many Input Splits Are Likely To Be Made By The Hadoop Framework?

Answer :

2 splits each for the 127 MB and 65 MB files, and 1 split for the 64 KB file.
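
The arithmetic behind this, assuming the default FileInputFormat behaviour where each split corresponds to one 64 MB block: the 127 MB file is split into 64 MB + 63 MB (2 splits), the 65 MB file into 64 MB + 1 MB (2 splits), and the 64 KB file fits in a single block (1 split), giving 5 input splits in total.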

Question 13. Which Command Is Used To Verify If The Hdfs Is Corrupt Or Not?

Answer :

The Hadoop fsck (File System Check) command is used to check for missing or corrupt blocks.
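
For example, to check the entire file system and print block details (a minimal sketch):

  hadoop fsck / -files -blocks -locations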

Question 14. List Some Use Cases Of The Hadoop Ecosystem?

Answer :

Text Mining, Graph Analysis, Semantic Analysis, Sentiment Analysis, Recommendation Systems.

Question 15. How Can You Kill A Hadoop Job?

Answer :

hadoop job -kill jobID
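
For example (the job ID is illustrative):

  hadoop job -kill job_201805311428_0042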

Question 16. I Want To See All The Jobs Running In A Hadoop Cluster. How Can You Do This?

Answer :

Using the command hadoop job -list gives the list of jobs running in a Hadoop cluster.
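
The output looks roughly like the following (the job ID and timestamps are illustrative):

  $ hadoop job -list
  1 jobs currently running
  JobId                    State  StartTime      UserName  Priority  SchedulingInfo
  job_201805311428_0042    1      1527772800000  hadoop    NORMAL    NA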

Question 17. Is It Possible To Copy Files Across Multiple Clusters? If Yes, How Can You Accomplish This?

Answer :

Yes, it is possible to copy files across multiple Hadoop clusters and this can be accomplished using distributed copy. The DistCp command is used for intra- or inter-cluster copying.
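
For example, to copy a directory from one cluster to another (the NameNode hostnames and paths are illustrative):

  hadoop distcp hdfs://namenode1:8020/user/hadoop/input hdfs://namenode2:8020/user/hadoop/input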

Question 18. Which Is The Best Operating System To Run Hadoop?

Answer :

Ubuntu or another Linux distribution is the most preferred operating system to run Hadoop. Though the Windows OS can also be used to run Hadoop, it leads to several problems and is not recommended.

Question 19. What Are The Network Requirements To Run Hadoop?

Answer :

SSH is required to launch the server processes on the slave nodes.
A password-less SSH connection is required between the master, secondary machines and all the slaves.
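
A minimal sketch of setting up password-less SSH from the master to a slave; the user and hostname are illustrative:

  ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
  ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@slave-node-01
  ssh hadoop@slave-node-01    # should log in without prompting for a password
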
Question 20. The mapred.output.compress Property Is Set To True, To Make Sure That All Output Files Are Compressed For Efficient Space Usage On The Hadoop Cluster. If, Under A Particular Condition, A Cluster User Does Not Require Compressed Data For A Job, What Would You Suggest That He Do?

Answer :

If the user does not want to compress the data for a particular job, then he should create his own configuration file and set the mapred.output.compress property to false. This configuration file then should be loaded as a resource into the job.
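
Equivalently, if the job's driver uses ToolRunner, the property can be overridden for a single run from the command line (the jar name, class and paths are illustrative):

  hadoop jar my-job.jar com.example.MyJob -D mapred.output.compress=false /input /output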

Question 21. What Is The Best Practice To Deploy A Secondary Namenode?

Answer :

It is always better to deploy a secondary NameNode on a separate standalone machine. When the secondary NameNode is deployed on a separate machine it does not interfere with the operations of the primary NameNode.
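
On a Hadoop 1.x cluster this is typically achieved by listing the standalone host in the conf/masters file, which start-dfs.sh reads to decide where to start the Secondary NameNode (the hostname is illustrative):

  $ cat $HADOOP_HOME/conf/masters
  secondary-nn-01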

Question 22. How Often Should The Namenode Be Reformatted?

Answer :

The NameNode should never be reformatted. Doing so results in complete data loss. The NameNode is formatted only once, at the beginning, when it creates the directory structure for the file system metadata and the namespace ID for the entire file system.
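
The one-time initial format is performed with the command below; running it again on a cluster that already holds data destroys all HDFS metadata:

  hadoop namenode -format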

Question 23. If Hadoop Spawns 100 Tasks For A Job And One Of The Tasks Fails, What Does Hadoop Do?

Answer :

The task will be started again on a new TaskTracker and if it fails more than four times, which is the default setting (the default value can be changed), the job will be killed.
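
The retry limit is controlled per task type in mapred-site.xml; a minimal sketch of raising it, where the value 6 is illustrative:

  <property>
    <name>mapred.map.max.attempts</name>
    <value>6</value>
  </property>
  <property>
    <name>mapred.reduce.max.attempts</name>
    <value>6</value>
  </property>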

Question 24. How Can You Add And Remove Nodes From The Hadoop Cluster?

Answer :

To add new nodes to the HDFS cluster, the hostnames should be added to the slaves file and then the DataNode and TaskTracker should be started on the new node.
To remove or decommission nodes from the HDFS cluster, the hostnames should be removed from the slaves file and -refreshNodes should be executed.
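
A minimal sketch of both operations, run from the master node; the hostnames and the exclude-file path (which must match dfs.hosts.exclude) are illustrative:

  # add a node: list it in the slaves file, then start the daemons on it
  echo "new-datanode-05" >> $HADOOP_HOME/conf/slaves
  ssh new-datanode-05 "$HADOOP_HOME/bin/hadoop-daemon.sh start datanode"
  ssh new-datanode-05 "$HADOOP_HOME/bin/hadoop-daemon.sh start tasktracker"

  # decommission a node: list it in the exclude file, then refresh the NameNode
  echo "old-datanode-02" >> $HADOOP_HOME/conf/excludes
  hadoop dfsadmin -refreshNodes
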
Question 25. You Increase The Replication Level But Notice That The Data Is Under Replicated. What Could Have Gone Wrong?

Answer :

Nothing has necessarily gone wrong. If there is a large volume of data, replication takes time depending on the data size because the cluster has to copy the data, and it might take a few hours to complete.
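
Progress can be checked with fsck, whose summary reports the number of under-replicated blocks (a minimal sketch):

  hadoop fsck / | grep -i "under-replicated"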

Question 26. Explain About The Different Configuration Files And Where They Are Located?

Answer :

The configuration files are located in the “conf” sub-directory. Hadoop has 3 different configuration files: hdfs-site.xml, core-site.xml and mapred-site.xml.
