Top 39 Hadoop Cluster Interview Questions - Jul 25, 2022

Q1. How Can I Restart Namenode?

Run stop-all.sh and then run start-all.sh, OR

Write sudo hdfs (press Enter), su-hdfs (press Enter), /etc/init.d/ha (press Enter) and then /etc/init.d/hadoop-0.20-namenode start (press Enter).
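A minimal sketch of both approaches (the init-script path assumes a packaged Hadoop 0.20-era install, as in the answer above):

    stop-all.sh     # stop all Hadoop daemons
    start-all.sh    # start them all again

    # Or restart only the NameNode service:
    sudo /etc/init.d/hadoop-0.20-namenode stop
    sudo /etc/init.d/hadoop-0.20-namenode start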

Q2. What Are The Features Of Fully Distributed Mode?

In Fully Distributed mode, clusters range from a few nodes to 'n' nodes. It is used in production environments, where we have thousands of machines in the Hadoop cluster. The Hadoop daemons run on these clusters. We have to configure separate masters and separate slaves in this distribution, and the implementation is quite complex. In this configuration, the Namenode and Datanode run on different hosts, and there are separate nodes on which the task tracker runs. The root of the distribution is referred to as HADOOP_HOME.

Q3. How Can We Look For The Namenode In The Browser?

If you want to look for the Namenode in the browser, you do not use localhost:8021; the port number for the Namenode web UI is 50070 (for example, http://localhost:50070 on a local install).

Q4. How To Change From Su To Cloudera?

To change from su to Cloudera, simply type exit.

Q5. What Is Ssh? Explain In Detail About Ssh Communication Between Masters And The Slaves?

Secure Shell (SSH) is a password-less secure communication protocol that provides administrators with a secure way to access a remote computer, and data packets are sent across to the slaves over it. This network protocol also defines the format in which data is sent across. SSH communication happens not only between masters and slaves but also between any two hosts in a network. SSH appeared in 1995 with the introduction of SSH-1; now SSH-2 is in use, with its vulnerabilities coming to the fore when Edward Snowden leaked information about the decryption of some SSH traffic.

Q6. What Are The Hadoop Configuration Files At Present?

There are three configuration files in Hadoop:

conf/core-site.xml:

    <property>
      <name>fs.default.name</name>
      <value>hdfs://localhost:9000</value>
    </property>

conf/hdfs-site.xml:

    <property>
      <name>dfs.replication</name>
      <value>1</value>
    </property>

conf/mapred-site.xml:

    <property>
      <name>mapred.job.tracker</name>
      <value>localhost:9001</value>
    </property>

Q7. Explain What Do You Mean By Formatting Of The Dfs?

Like we do in Windows, the DFS is formatted for proper structuring of data. It is not usually recommended, because it formats the Namenode too in the process, which is not desired.
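For reference, the classic format command is shown below; running it wipes the Namenode's metadata, so treat it as illustration only:

    hadoop namenode -format    # re-initializes the Namenode's storage directory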

Q8. What Are The Network Requirements For Hadoop?

The Hadoop core uses Secure Shell (SSH) to launch the server processes on the slave nodes. It requires a password-less SSH connection between the master and all the slaves and the secondary machines.
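A minimal sketch of setting up password-less SSH from the master; the user and hostname are placeholders:

    ssh-keygen -t rsa -P ""       # generate a key pair with an empty passphrase
    ssh-copy-id hadoop@slave1     # append the public key to the slave's authorized_keys
    ssh hadoop@slave1             # should now log in without prompting for a password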

Q9. Can We Have Multiple Entries In The Master Files?

Yes, we can have multiple entries in the master files.

Q10. What Do Masters And Slaves Consist Of?

The masters file contains a list of hosts, one per line, that are to host secondary namenode servers. The slaves file contains a list of hosts, one per line, that host datanode and task tracker servers.
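For illustration, the two files might look like this (hostnames are made up):

    conf/masters:
        secondary-nn-host

    conf/slaves:
        slave1
        slave2
        slave3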

Q11. Can You Tell Me If We Can Create A Hadoop Cluster From Scratch?

Yes, we can certainly do that. Once we become familiar with the Apache Hadoop environment, we can create a cluster from scratch.

Q12. What Is The Use Of The Command Mapred.Job.Tracker?

The command mapred.job.tracker is used by the Job Tracker to list out the host and port that the MapReduce job tracker runs at. If it is "local", then jobs are run in-process as a single map and reduce task.
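As a sketch, the property lives in conf/mapred-site.xml; the host and port below are examples:

    <property>
      <name>mapred.job.tracker</name>
      <value>jobtracker-host:9001</value>   <!-- or "local" to run jobs in-process -->
    </property>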

Q13. We Use Unix Variants For Hadoop. Can We Use Microsoft Windows For The Same?

In practice, Ubuntu and Red Hat Linux are the best operating systems for Hadoop. Windows, on the other hand, can be used, but it is not used frequently for installing Hadoop, as there are many support problems associated with it. The frequency of crashes and the subsequent restarts make it unattractive. As such, Windows is not recommended as a preferred environment for Hadoop installation, though users can give it a try for learning purposes in the initial stage.

Q14. Why Do We Need A Password-less Ssh In Fully Distributed Environment?

We need password-less SSH in a Fully Distributed environment because when the cluster is live and running in Fully Distributed mode, the communication is too frequent. The job tracker should be able to send a task to a task tracker quickly.

Q15. Which One Decides The Input Split - Hdfs Client Or Namenode?

The HDFS Client does not decide. The input split is already specified in one of the configurations.

Q16. We Are Using The Ubuntu Operating System With Cloudera, But From Where Can We Download Hadoop, Or Does It Come By Default With Ubuntu?

This is a default configuration of Hadoop that you have to download from Cloudera or from Edureka's Dropbox and then run on your systems. You can also proceed with your own configuration, but you need a Linux box, be it Ubuntu or Red Hat. The installation steps are present at the Cloudera location or in Edureka's Dropbox. You can go either way.

Q17. Which Files Are Used By The Startup And Shutdown Commands?

The slaves and masters files are used by the startup and the shutdown commands.

Q18. Is Fs.Mapr.Working.Dir A Single Directory?

Yes, fs.mapr.working.dir is just one directory.

Q19. Can You Tell Us What Will Happen To A Namenode, When The Job Tracker Is Not Up And Running?

When the job tracker is down, it will not be in functional mode and all running jobs will be halted, because it is a single point of failure. Your entire cluster will be down, but the Namenode will still be present. As such, the cluster will still be accessible if the Namenode is working, even though the job tracker is not up and running. But you cannot run your Hadoop jobs.

Q20. In Hadoop_pid_dir, What Does Pid Stand For?

PID stands for ‘Process ID’.

Q21. What Happens To Job Tracker When Namenode Is Down?

The Namenode is the main component that holds all the metadata and keeps track of datanode failures with the help of heartbeats. As such, when a Namenode is down, your cluster will be completely down, because the Namenode is the single point of failure in a Hadoop installation.

Q22. How Can We Check Whether Namenode Is Working Or Not?

To check whether the Namenode is working or not, use the command /etc/init.d/hadoop-0.20-namenode status, or simply jps.
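Both checks, side by side (the init-script path assumes a packaged 0.20-era install):

    sudo /etc/init.d/hadoop-0.20-namenode status   # service status
    jps                                            # look for a NameNode entry in the output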

Q23. Explain What Are The Features Of Standalone (local) Mode?

In standalone or local mode there are no Hadoop daemons running, and everything runs in a single Java process. Hence, we don't get the benefit of distributing the code across a cluster of machines. Since it has no DFS, it utilizes the local file system. This mode is suitable only for running MapReduce programs by developers during various stages of development. It is the best environment for learning and is good for debugging purposes.

Q24. Explain In Brief The Three Modes In Which Hadoop Can Be Run?

The three modes in which Hadoop can be run are:

Standalone (local) mode - No Hadoop daemons running; everything runs in a single Java Virtual Machine only.

Pseudo-distributed mode - Daemons run on the local machine, thereby simulating a cluster on a smaller scale.

Fully distributed mode - Runs on a cluster of machines.

Q25. What Will Happen If A Namenode Has No Data?

If a Namenode has no data, it cannot be considered a Namenode. In practical terms, a Namenode needs to have some data.

Q26. Which Is The Directory Where Hadoop Is Installed?

Cloudera and Apache have the same directory structure. Hadoop is installed in /usr/lib/hadoop-0.20/.

Q27. What Does The Hadoop-metrics.Properties File Do?

hadoop-metrics.properties is used for reporting purposes. It controls the reporting for Hadoop. The default setting is 'not to report'.
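For illustration, the 'do not report' default is expressed by pointing each context at the NullContext class (entries as in classic Hadoop metrics configuration):

    dfs.class=org.apache.hadoop.metrics.spi.NullContext
    mapred.class=org.apache.hadoop.metrics.spi.NullContext
    jvm.class=org.apache.hadoop.metrics.spi.NullContext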

Q28. Explain The Significance Of Ssh? What Is The Port On Which Ssh Works? Why Do We Need A Password In Ssh Localhost?

SSH is a secure shell communication; it is a secure protocol and the most common way of administering remote servers safely, and it is relatively simple and inexpensive to implement. A single SSH connection can host multiple channels and can hence transfer data in both directions. SSH works on port 22, which is the default port number. However, it can be configured to point to a new port number, although this is not recommended. On localhost, a password is required in SSH for security, and in situations where password-less communication is not set up.

Q29. Explain What Are The Main Features Of Pseudo Mode?

In Pseudo-distributed mode, each Hadoop daemon runs in a separate Java process; as such, it simulates a cluster, though on a small scale. This mode is used for both development and QA environments. Here, we need to make the configuration changes.

Q30. Which Are The Three Main Hdfs-site.Xml Properties?

The three main hdfs-site.xml properties are listed below; a sketch of the file follows the list.

dfs.name.dir, which gives you the location where the metadata will be stored and where the DFS is located – on disk or on a remote machine.

dfs.data.dir, which gives you the location where the data is going to be stored.

fs.checkpoint.dir, which is for the secondary Namenode.
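A sketch of how these could look in conf/hdfs-site.xml; the paths are placeholders:

    <property>
      <name>dfs.name.dir</name>
      <value>/var/hadoop/name</value>
    </property>
    <property>
      <name>dfs.data.dir</name>
      <value>/var/hadoop/data</value>
    </property>
    <property>
      <name>fs.checkpoint.dir</name>
      <value>/var/hadoop/checkpoint</value>
    </property>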

Q31. What Is "fsck" And What Is Its Use?

"fsck" is File System Check. FSCK is used to check the fitness of a Hadoop Filesystem. It generates a summarized document of the overall fitness of the filesystem. 

Usage:  hadoop fsck /
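A couple of common variants (flags as supported by the classic fsck tool):

    hadoop fsck / -files -blocks   # also list files and the blocks that make them up
    hadoop fsck / -delete          # delete corrupted files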

Q32. Tell Us What Cloudera Is And Why It Is Used In Big Data?

Cloudera is the leading Hadoop distribution vendor in the Big Data marketplace. It is termed the next-generation data management software, required for business-critical data challenges that include access, storage, management, business analytics, systems security, and search.

Q33. Explain About The Hadoop-core Configuration Files?

Hadoop core is specified by two resources. It is configured by two well-written XML files that are loaded from the classpath:

hadoop-default.xml - Read-only defaults for Hadoop, suitable for a single-machine instance.

hadoop-site.xml - It specifies the site configuration for the Hadoop distribution. The cluster-specific information is also provided by the Hadoop administrator.

Q34. What Are The Port Numbers Of Name Node, Job Tracker And Task Tracker?

The web UI port number for the Namenode is 50070, for the job tracker it is 50030, and for the task tracker it is 50060.

Q35. How To Come Out Of The Insert Mode?

To come out of the insert mode, press ESC, then

type :q (if you have not written anything) OR

type :wq (if you have written something in the file), and then press ENTER.

Q36. What Does /etc/init.d Do?

/etc/init.d specifies where daemons (services) are located and lets you see the status of these daemons. It is very Linux-specific, and has nothing to do with Hadoop.

Q37. Tell Us What Is A Spill Factor With Respect To The Ram?

Spill factor is the size after which your files move to the temp file; the hadoop-temp directory is used for this. The default value for io.sort.spill.percent is 0.80. A value less than 0.50 is not recommended.
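As a sketch, the threshold could be tuned in conf/mapred-site.xml; the value shown is the default:

    <property>
      <name>io.sort.spill.percent</name>
      <value>0.80</value>
    </property>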

Q38. Can You Name Some Companies That Are Using Hadoop?

Numerous companies are using Hadoop, from large software companies and MNCs to small firms. Yahoo is the top contributor, with many open-source Hadoop software projects and frameworks. Social media companies like Facebook and Twitter have been using it for a long time now for storing their mammoth data. Apart from that, Netflix, IBM, Adobe, and e-commerce websites like Amazon and eBay also use multiple Hadoop technologies.

Q39. What Is The Function Of Hadoop-env.Sh? Where Is It Present?

This file contains some environment variable settings used by Hadoop; it provides the environment for Hadoop to run. The path of JAVA_HOME is set here for it to run properly. The hadoop-env.sh file is present at the conf/hadoop-env.sh location. You can also create your own custom configuration file, conf/hadoop-user-env.sh, which will let you override the default Hadoop settings.
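A couple of typical lines from the file (the JDK path is an example and will differ per machine):

    export JAVA_HOME=/usr/lib/jvm/java-6-sun   # example path; point this at your own JDK
    export HADOOP_HEAPSIZE=1000                # heap size in MB used by the Hadoop daemons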



