

Top 26 Hadoop Administration Interview Questions - Jul 25, 2022



Q1. The mapred.output.compress Property Is Set To True, To Make Sure That All Output Files Are Compressed For Efficient Space Usage On The Hadoop Cluster. If A Cluster User Does Not Want To Compress Data For A Particular Job, How Can They Do It?

If the user does not want to compress the data for a particular job, they should create their own configuration file and set the mapred.output.compress property to false. This configuration file then has to be loaded as a resource into the job.
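A minimal sketch of overriding the property for a single job, assuming the job's driver accepts Hadoop's generic options (the jar name, driver class and paths here are placeholders):

hadoop jar my-job.jar MyDriver -D mapred.output.compress=false /input/path /output/path    # per-job override; the cluster-wide default stays true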

Q2. How Will You Restart A Namenode?

The easiest way is to run the stop-all.sh shell script to stop all the running daemons, and then restart the NameNode by running start-all.sh.
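A minimal sketch of the restart, assuming the scripts are run from the Hadoop installation directory:

bin/stop-all.sh     # stops all running daemons, including the NameNode
bin/start-all.sh    # starts them again; the NameNode comes back up with the other daemons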

Q3. Which Is The Best Operating System To Run Hadoop?

Linux distributions such as Ubuntu are the most preferred operating systems to run Hadoop. Windows can also be used to run Hadoop, but it causes several problems and is not recommended.

Q4. Apart From Using The Jps Command, Is There Any Other Way To Check Whether The Namenode Is Working Or Not?

Use the command /etc/init.d/hadoop-0.20-namenode status.

Q5. How Will You Decide Whether You Need To Use The Capacity Scheduler Or The Fair Scheduler?

Fair Scheduling is the method of assigning resources to jobs such that, on average over time, all jobs get an equal share of the available resources.

Fair Scheduler can be used under the following circumstances:

i) If you want the jobs to make equal progress instead of following FIFO order, then you should use Fair Scheduling.

ii) If you have slow connectivity and data locality plays an important role and makes a considerable difference to the job runtime, then you should use Fair Scheduling.

iii) Use Fair Scheduling if there is a lot of variability in the utilization between pools.

Capacity Scheduler allows running the Hadoop MapReduce cluster as a shared, multi-tenant cluster to maximize the utilization and throughput of the cluster.

Capacity Scheduler can be used under the following circumstances:

i) If the jobs require scheduler determinism, then the Capacity Scheduler can be beneficial.

ii) The Capacity Scheduler's memory-based scheduling approach is beneficial if the jobs have varying memory requirements.

iii) If you need to enforce resource allocation because you know the cluster usage and workload very well, then use the Capacity Scheduler; see the configuration sketch after this list.
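The scheduler used by the JobTracker is configured in mapred-site.xml. The snippet below is an illustrative MR1-era entry selecting the Fair Scheduler; substituting org.apache.hadoop.mapred.CapacityTaskScheduler selects the Capacity Scheduler instead:

<property>
  <name>mapred.jobtracker.taskScheduler</name>
  <value>org.apache.hadoop.mapred.FairScheduler</value>
</property>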

Q6. How Many Namenodes Can You Run On A Single Hadoop Cluster?

Only one.

Q7. Is It Possible To Copy Files Across Multiple Clusters? If Yes, How Can You Accomplish This?

Yes, it is possible to copy files across multiple Hadoop clusters, and this can be accomplished using distributed copy. The DistCp command is used for intra-cluster or inter-cluster copying.
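An illustrative DistCp invocation between two clusters (the NameNode addresses and paths are placeholders):

hadoop distcp hdfs://namenode1:8020/source/path hdfs://namenode2:8020/target/path    # copies the data in parallel using a MapReduce job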

Q8. What Is The conf/hadoop-env.sh File And Which Variable In The File Should Be Set For Hadoop To Work?

This file provides the environment for Hadoop to run and contains the following variables: HADOOP_CLASSPATH, JAVA_HOME and HADOOP_LOG_DIR. The JAVA_HOME variable must be set for Hadoop to run.
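An illustrative fragment of conf/hadoop-env.sh; the paths are placeholders for whatever is installed on the node:

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk      # mandatory: the JVM used by all Hadoop daemons
export HADOOP_LOG_DIR=/var/log/hadoop             # optional: where daemon log files are written
export HADOOP_CLASSPATH=/opt/extra-libs/*.jar     # optional: extra entries added to the classpath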

Q9. I Want To See All The Jobs Running In A Hadoop Cluster. How Can You Do This?

The command hadoop job -list gives the list of jobs running in a Hadoop cluster.

Q10. How Often Should The Namenode Be Reformatted?

The NameNode should never be reformatted. Doing so will result in complete data loss. The NameNode is formatted only once, at the beginning, when it creates the directory structure for the file system metadata and the namespace ID for the entire file system.
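For reference, the one-time format step looks like the line below; it should never be run again on a cluster that already holds data:

hadoop namenode -format     # run exactly once, on a brand-new cluster only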

Q11. If Hadoop Spawns 100 Tasks For A Job And One Of The Tasks Fails, What Does Hadoop Do?

The task will be started again on a different TaskTracker, and if it fails more than four times, which is the default setting (the default value can be changed), the job will be killed.
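The retry limit is controlled per task type. The snippet below is an illustrative mapred-site.xml override of the default of 4 attempts for map tasks (reduce tasks have a matching mapred.reduce.max.attempts property):

<property>
  <name>mapred.map.max.attempts</name>
  <value>6</value>    <!-- allow map tasks two extra attempts before the job is failed -->
</property>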

Q12. Which Command Is Used To Verify If The Hdfs Is Corrupt Or Not?

The hadoop fsck (file system check) command is used to check for missing or corrupt blocks.
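An illustrative full-filesystem check (the root path can be narrowed to any HDFS directory):

hadoop fsck / -files -blocks -locations    # reports missing, corrupt and under-replicated blocks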

Q13. Explain The Different Schedulers Available In Hadoop?

FIFO Scheduler – This scheduler does not consider the heterogeneity of the system; it simply orders the jobs based on their arrival times in a queue.

COSHH – This scheduler considers the workload, cluster and user heterogeneity for scheduling decisions.

Fair Sharing – This Hadoop scheduler defines a pool for each user. The pool contains a number of map and reduce slots on a resource. Each user can use their own pool to execute jobs.

Q14. What Are The Network Requirements To Run Hadoop?

SSH is required in order to launch the server processes on the slave nodes.

A password-less SSH connection is required between the master, the secondary machine and all the slaves, as sketched below.
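A minimal sketch of setting up a password-less connection from the master to one slave; the user and hostname are placeholders and the steps are repeated for every slave:

ssh-keygen -t rsa -P ""                            # generate a key pair on the master
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@slave01    # install the public key on the slave
ssh hadoop@slave01                                 # should now log in without prompting for a password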

Q15. How Can You Kill A Hadoop Job?

hadoop job -kill jobID

Q16. List Some Use Cases Of The Hadoop Ecosystem?

Text Mining, Graph Analysis, Semantic Analysis, Sentiment Analysis, Recommendation Systems.

Q17. List A Few Hadoop Shell Commands That Are Used To Perform A Copy Operation?

fs -put

fs -copyToLocal

fs -copyFromLocal
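Illustrative usage of the three commands; the file names and HDFS paths are placeholders:

hadoop fs -put report.txt /user/hadoop/            # local file system -> HDFS
hadoop fs -copyFromLocal report.txt /user/hadoop/  # same direction as -put, but restricted to local sources
hadoop fs -copyToLocal /user/hadoop/report.txt .   # HDFS -> local file system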

Q18. What Is The Jps Command Used For?

The jps command is used to verify whether the daemons that run the Hadoop cluster are working or not. The output of the jps command shows the status of the NameNode, Secondary NameNode, DataNode, TaskTracker and JobTracker.
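Illustrative jps output on a node running every daemon (the process IDs are placeholders); a missing daemon name indicates that the corresponding service is not running:

4821 NameNode
4903 SecondaryNameNode
4987 JobTracker
5102 DataNode
5178 TaskTracker
6244 Jps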

Q19. What Happens When The Namenode On The Hadoop Cluster Goes Down?

The file system goes offline whenever the NameNode is down.

Q20. In A Mapreduce System, If The HDFS Block Size Is 64 MB And There Are 3 Files Of Size 127 MB, 64 KB And 65 MB With FileInputFormat, How Many Input Splits Are Likely To Be Made By The Hadoop Framework?

2 splits each for the 127 MB and 65 MB files, and 1 split for the 64 KB file: the 127 MB file spans two 64 MB blocks, the 65 MB file spills just past one block into a second split, and the 64 KB file fits well within a single block.

Q21. How Can You Add And Remove Nodes From The Hadoop Cluster?

To add new nodes to the HDFS cluster, the hostnames should be added to the slaves file, and then the DataNode and TaskTracker daemons should be started on the new node.

To remove or decommission nodes from the HDFS cluster, the hostnames should be removed from the slaves file and -refreshNodes should be executed, as sketched below.
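A minimal sketch of both operations, assuming the classic conf/slaves and dfs.hosts.exclude file layout:

# on a newly added node, start the worker daemons
hadoop-daemon.sh start datanode
hadoop-daemon.sh start tasktracker

# on the NameNode, after editing the slaves/exclude files, make it re-read its host lists
hadoop dfsadmin -refreshNodes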

Q22. What Are The Important Hardware Considerations When Deploying Hadoop In Production Environment?

Memory – The system's memory requirements will vary between the worker services and the management services, based on the application.

Operating System – A 64-bit operating system avoids restrictions on the amount of memory that can be used on worker nodes.

Storage – It is best to design the Hadoop platform so that compute activity is moved to the data, in order to achieve scalability and high performance.

Capacity – Large Form Factor (3.5") disks cost less and allow more storage compared to Small Form Factor disks.

Network – Two TOR (top-of-rack) switches per rack provide better redundancy.

Computational Capacity – This can be determined by the total number of MapReduce slots available across all the nodes within a Hadoop cluster.

Q23. You Increase The Replication Level But Notice That The Data Is Under Replicated. What Could Have Gone Wrong?

Nothing has necessarily gone wrong. If there is a large volume of data, replication simply takes time proportional to the data size, because the cluster has to copy the data, and it can take a few hours. Progress can be checked as shown below.
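Replication progress can be monitored with either of the following commands (run here against the whole filesystem, but any path works):

hadoop fsck / | grep -i "under-replicated"    # counts blocks still waiting for additional replicas
hadoop dfsadmin -report                       # per-DataNode capacity, usage and block totals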

Q24. What Is The Best Practice To Deploy A Secondary Namenode?

It is always better to deploy the secondary NameNode on a separate standalone machine. When the secondary NameNode is deployed on a separate machine it does not interfere with the operations of the primary NameNode.

Q25. Explain The Different Configuration Files And Where They Are Located?

The configuration files are located in the "conf" subdirectory. Hadoop has three different configuration files: hdfs-site.xml, core-site.xml and mapred-site.xml.
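An illustrative entry from core-site.xml; the NameNode host and port are placeholders:

<property>
  <name>fs.default.name</name>
  <value>hdfs://namenode-host:8020</value>    <!-- default file system URI used by all clients -->
</property>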

Q26. What Are The Daemons Required To Run A Hadoop Cluster?

NameNode, DataNode, TaskTracker and JobTracker



