Top 100+ Hadoop Cluster Interview Questions And Answers - May 31, 2020

Question 1. Explain About The Hadoop-core Configuration Files?

Answer :

Hadoop core is specified through resources. It is configured by two well-written XML files that are loaded from the classpath:

hadoop-default.xml - Read-only defaults for Hadoop, suitable for a single-machine instance.
hadoop-site.xml - It specifies the site configuration for the Hadoop distribution. The cluster-specific information is also provided by the Hadoop administrator. A minimal override is sketched below.
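For illustration, a minimal hadoop-site.xml override for a single-node setup might look like the following (the port value is an assumption, matching the examples later in this article):

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>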
Question 2. Explain In Brief The Three Modes In Which Hadoop Can Be Run?

Answer :

The three modes in which Hadoop can be run are:

Standalone (local) mode - No Hadoop daemons running; everything runs in a single Java Virtual Machine only.
Pseudo-distributed mode - Daemons run on the local machine, thereby simulating a cluster on a smaller scale.
Fully distributed mode - Runs on a cluster of machines.
Question 3. What Are The Features Of Standalone (Local) Mode?

Answer :

In standalone or local mode, there are no Hadoop daemons running, and everything runs in a single Java process. Hence, we don't get the benefit of distributing the code across a cluster of machines. Since it has no DFS, it utilizes the local file system. This mode is suitable only for running MapReduce programs by developers during various stages of development. It is the best environment for learning and good for debugging purposes.

Question 4. What Are The Features Of Fully Distributed Mode?

Answer :

In fully distributed mode, clusters range from a few nodes to 'n' number of nodes. It is used in production environments, where we have thousands of machines in the Hadoop cluster. The daemons of Hadoop run on these clusters. We have to configure separate masters and separate slaves in this distribution, the implementation of which is quite complex. In this configuration, Namenode and Datanode run on different hosts, and there are nodes on which the task tracker runs. The root of the distribution is referred to as HADOOP_HOME.

Question 5. Explain What Are The Main Features Of Pseudo Mode?

Answer :

In pseudo-distributed mode, each Hadoop daemon runs in a separate Java process, so it simulates a cluster, although on a small scale. This mode is used for both development and QA environments. Here, we need to make the configuration changes.

Question 6. What Are The Hadoop Configuration Files At Present?

Answer :

There are three configuration files in Hadoop:

1. conf/core-site.xml:

<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
</property>

2. conf/hdfs-site.xml:

<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>

3. conf/mapred-site.xml:

<property>
  <name>mapred.job.tracker</name>
  <value>localhost:9001</value>
</property>

Question 7. Can You Name Some Companies That Are Using Hadoop?

Answer :

Numerous organizations are using Hadoop, from large software companies and MNCs to small businesses. Yahoo is the top contributor, with many open-source Hadoop software projects and frameworks. Social media companies like Facebook and Twitter have been using it for a long time now for storing their massive data. Apart from that, Netflix, IBM, Adobe, and e-commerce sites like Amazon and eBay also use multiple Hadoop technologies.

Question 8. Which Is The Directory Where Hadoop Is Installed?

Answer :

Cloudera and Apache have the same directory structure. Hadoop is installed in /usr/lib/hadoop-0.20/.

Question 9. What Are The Port Numbers Of Name Node, Job Tracker And Task Tracker?

Answer :

The web UI port number for the Namenode is 50070, for the Job Tracker it is 50030, and for the Task Tracker it is 50060.

Question 10. Tell Us What Is A Spill Factor With Respect To The RAM?

Answer :

Spill factor is the size after which your files move to the temp directory. The Hadoop temp directory is used for this. The default value for io.sort.spill.percent is 0.80. A value less than 0.5 is not recommended.
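As a sketch, this threshold can be tuned in conf/mapred-site.xml; the value shown is just the default quoted above:

<property>
  <name>io.sort.spill.percent</name>
  <value>0.80</value>
</property>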

Question 11. Is fs.mapr.working.dir A Single Directory?

Answer :

Yes, fs.mapr.working.dir is just one directory.

Question 12. Which Are The Three Main hdfs-site.xml Properties?

Answer :

The three main hdfs-site.xml properties are (a configuration sketch follows the list):

dfs.name.dir, which gives you the location where metadata will be stored and whether the DFS is located on disk or on a remote location.
dfs.data.dir, which gives you the location where the data is going to be stored.
fs.checkpoint.dir, which is for the secondary Namenode.
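A minimal hdfs-site.xml sketch with these three properties (the paths are hypothetical):

<property>
  <name>dfs.name.dir</name>
  <value>/var/hadoop/name</value>
</property>
<property>
  <name>dfs.data.dir</name>
  <value>/var/hadoop/data</value>
</property>
<property>
  <name>fs.checkpoint.dir</name>
  <value>/var/hadoop/checkpoint</value>
</property>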
Question 13. How To Come Out Of The Insert Mode?

Answer :

To come out of the insert mode in the vi editor, press ESC, then

type :q (if you have not written anything) OR

type :wq (if you have written anything in the file) and then press ENTER.

Question 14. Tell Us What Cloudera Is And Why It Is Used In Big Data?

Answer :

Cloudera is the leading Hadoop distribution vendor in the Big Data market. It is termed the next-generation data management software that is required for business-critical data challenges, which include access, storage, management, business analytics, systems security, and search.

Question 15. We Are Using Ubuntu Operating System With Cloudera, But From Where We Can Download Hadoop Or Does It Come By Default With Ubuntu?

Answer :

This is a default configuration of Hadoop that you have to download from Cloudera or from Edureka's Dropbox and then run on your systems. You can also proceed with your own configuration, but you need a Linux box, be it Ubuntu or Red Hat. The installation steps are present at the Cloudera site or in Edureka's Dropbox. You can go either way.

Question 16. What Is The Main Function Of The ‘jps’ Command?

Answer :

The 'jps' command checks whether the Datanode, Namenode, tasktracker, jobtracker, and other components are running or not in Hadoop. One thing to remember is that if you have started the Hadoop services with sudo, then you need to run jps with sudo privileges as well; otherwise the status will not be shown.
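An illustrative run on a pseudo-distributed node (the process IDs, and which daemons appear, will vary with your setup):

$ sudo jps
4528 NameNode
4713 DataNode
4890 SecondaryNameNode
5087 JobTracker
5261 TaskTracker
5432 Jps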

Question 17. How Can I Restart Namenode?

Answer :

Run stop-all.sh and then run start-all.sh (see the sketch below), OR
Write sudo hdfs (press enter), su-hdfs (press enter), /etc/init.d/ha (press enter) and then /etc/init.d/hadoop-0.20-namenode start (press enter).
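As a sketch, the first approach amounts to the following; where the service init scripts are installed, the Namenode can also be restarted on its own (assuming the init script supports the usual restart action):

$ stop-all.sh     # stop all Hadoop daemons
$ start-all.sh    # start them again

$ sudo /etc/init.d/hadoop-0.20-namenode restart    # restart only the Namenode service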
Question 18. How Can We Check Whether Namenode Is Working Or Not?

Answer :

To check whether the Namenode is working or not, use the command /etc/init.d/hadoop-0.20-namenode status, or as simple as jps.

Question 19. What Is "fsck" And What Is Its Use?

Answer :

"fsck" is File System Check. FSCK is used to check the health of a Hadoop Filesystem. It generates a summarized report of the general health of the filesystem. 

Usage:  hadoop fsck /
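A few commonly used variations (the paths are illustrative):

$ hadoop fsck / -files -blocks -locations    # also list files, their blocks, and block locations
$ hadoop fsck /user/hadoop                   # check only a specific directory tree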

Question 20. At Times You Get A 'Connection Refused' Java Exception When You Run The File System Check Command hadoop fsck /?

Answer :

The most likely reason is that the Namenode is not running on your VM.

Question 21. What Is The Use Of mapred.job.tracker?

Answer :

The mapred.job.tracker property tells Hadoop which host and port the MapReduce Job Tracker runs at. If it is set to "local", then jobs are run in-process as a single map and reduce task.

Question 22. What Does /etc/init.d Do?

Answer :

/etc/init.d specifies where daemons (services) are placed and where to check the status of those daemons. It is very Linux specific, and has nothing to do with Hadoop.

Question 23. How Can We Look For The Namenode In The Browser?

Answer :

If you have to look for the Namenode in the browser, you don't have to give localhost:8021; the port number to look for the Namenode in the browser is 50070 (i.e., http://localhost:50070).

Question 24. How To Change From Su To Cloudera?

Answer :

To change from SU to Cloudera, just type exit.

Question 25. Which Files Are Used By The Startup And Shutdown Commands?

Answer :

'Slaves' and 'Masters' are used by the startup and the shutdown commands.

Question 26. What Do Masters And Slaves Consist Of?

Answer :

Masters contain a list of hosts, one per line, that are to host secondary namenode servers. Slaves consist of a list of hosts, one per line, that host datanode and task tracker servers.
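For illustration, with hypothetical hostnames, the two files might look like this:

conf/masters:
snn-host01

conf/slaves:
slave-host01
slave-host02
slave-host03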

Question 27. What Is The Function Of hadoop-env.sh? Where Is It Present?

Answer :

This file contains some environment variable settings used by Hadoop; it provides the environment for Hadoop to run. The path of JAVA_HOME is set here for it to run properly. The hadoop-env.sh file is present at the conf/hadoop-env.sh location. You can also create your own custom configuration file conf/hadoop-user-env.sh, which will let you override the default Hadoop settings.
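A typical hadoop-env.sh fragment (the JAVA_HOME path is an assumption and depends on your system):

# Java implementation that the Hadoop daemons should use
export JAVA_HOME=/usr/lib/jvm/java-6-sun

# Optional: maximum heap size for the daemons, in MB
export HADOOP_HEAPSIZE=1000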

Question 28. Can We Have Multiple Entries In The Master Files?

Answer :

Yes, we can have multiple entries in the Master files.

Question 29. In HADOOP_PID_DIR, What Does PID Stand For?

Answer :

PID stands for ‘Process ID’.

Question 30. What Does The hadoop-metrics.properties File Do?

Answer :

hadoop-metrics.properties is used for 'reporting' purposes. It controls the reporting for Hadoop. The default status is 'not to report'.

Question 31. What Are The Network Requirements For Hadoop?

Answer :

The Hadoop core uses Shell (SSH) to launch the server processes on the slave nodes. It requires a password-less SSH connection between the master and all the slaves and the secondary machines.
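A minimal sketch of setting up password-less SSH from the master to one slave (the user and hostname are hypothetical):

$ ssh-keygen -t rsa -P ""           # generate a key pair with an empty passphrase
$ ssh-copy-id hadoop@slave-host01   # append the public key to the slave's authorized_keys
$ ssh hadoop@slave-host01           # should now log in without prompting for a password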

Question 32. Why Do We Need A Password-less SSH In A Fully Distributed Environment?

Answer :

We need password-less SSH in a fully distributed environment because when the cluster is LIVE and running in fully distributed mode, the communication is too frequent. The Job Tracker should be able to send a task to a task tracker quickly.

Question 33. What Will Happen If A Namenode Has No Data?

Answer :

If a Namenode has no data, it cannot be considered a Namenode. In practical terms, a Namenode needs to have some data.

Question 34. What Happens To Job Tracker When Namenode Is Down?

Answer :

The Namenode is the main point which keeps all the metadata and keeps track of datanode failures with the help of heartbeats. When the Namenode is down, your cluster will be completely down, because the Namenode is the single point of failure in a Hadoop installation.

Question 35. Explain What Do You Mean By Formatting Of The Dfs?

Answer :

Like we do in Windows, the DFS is formatted for proper structuring of data. It is not generally recommended, as it formats the Namenode too in the process, which is not desired.
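For reference, the command that formats HDFS (and wipes the Namenode's metadata in the process, hence the caution above) is:

$ hadoop namenode -format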

Question 36. We Use Unix Variants For Hadoop. Can We Use Microsoft Windows For The Same?

Answer :

In practice, Ubuntu and Red Hat Linux are the best operating systems for Hadoop. On the other hand, Windows can be used, but it is not used frequently for installing Hadoop, as there are many support issues associated with it. The frequency of crashes and the subsequent restarts make it unattractive. As such, Windows is not recommended as a preferred environment for Hadoop installation, though users can give it a try for learning purposes in the initial stage.

Question 37. Which One Decides The Input Split - HDFS Client Or Namenode?

Answer :

The HDFS client does not decide. The input split is already specified in one of the configurations through which it is configured.

Question 38. Let's Take A Scenario: Say We Already Have Cloudera In A Cluster. Now If We Want To Form A Cluster On Ubuntu, Can We Do It? Explain In Brief.

Answer :

Yes, we can certainly do it. We have all the useful installation steps for creating a new cluster. The only thing that needs to be done is to uninstall the present cluster and install the new cluster in the targeted environment.

Question 39. Can You Tell Me If We Can Create A Hadoop Cluster From Scratch?

Answer :

Yes, we can certainly do that. Once we become familiar with the Apache Hadoop environment, we can create a cluster from scratch.

Question 40. Explain The Significance Of SSH? On Which Port Does SSH Work? Why Do We Need A Password In SSH Localhost?

Answer :

SSH is a secure shell communication; it is a secure protocol and the most common way of administering remote servers safely, being relatively simple and inexpensive to implement. A single SSH connection can host multiple channels and hence can transfer data in both directions. SSH works on port 22, which is the default port number. It can be configured to point to a new port number, but this is not recommended. On localhost, a password is required in SSH for security, and in situations where password-less communication is not set up.

Question 41. What Is Ssh? Explain In Detail About Ssh Communication Between Masters And The Slaves?

Answer :

Secure Socket Shell or SSH is a password-less secure communication that provides administrators with a secure way to access a remote computer, and data packets are sent across to the slave. This network protocol also has some format into which data is sent across. SSH communication is not only between masters and slaves but also between two hosts in a network. SSH appeared in 1995 with the introduction of SSH-1. Now SSH-2 is in use, with vulnerabilities coming to the fore when Edward Snowden leaked information by decrypting some SSH traffic.

Question 42. Can You Tell Us What Will Happen To A Namenode When The Job Tracker Is Not Up And Running?

Answer :

When the Job Tracker is down, it will not be in functional mode and all running jobs will be halted, because it is a single point of failure for MapReduce. The Namenode will still be present, so the cluster will still be accessible if the Namenode is working, even if the Job Tracker is not up and running. But you cannot run your Hadoop jobs.




