Interview Questions For Comprehensive MapReduce
Q1. What is Hadoop MapReduce?
Ans: The Hadoop MapReduce framework is used to process large data sets in parallel across a Hadoop cluster. Data analysis uses a two-step map and reduce process.
Q2. How does Hadoop MapReduce work?
Ans: Taking word count as an example, during the map phase MapReduce counts the words in each document, while in the reduce phase it aggregates the counts per document across the entire collection. During the map phase the input data is divided into splits for analysis by map tasks running in parallel across the Hadoop framework.
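The two phases can be imitated in a few lines of plain Python. This is only a sketch of the idea on a single machine, not Hadoop's API; the function names map_phase, reduce_phase and run_word_count are made up for illustration.

```python
from collections import defaultdict

def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]

def reduce_phase(word, counts):
    """Reduce step: sum all the counts emitted for one word."""
    return word, sum(counts)

def run_word_count(documents):
    # Map: each document is handled independently (in parallel on a real cluster).
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    # Group the values by key so each reduce call sees every count for one word.
    grouped = defaultdict(list)
    for word, count in mapped:
        grouped[word].append(count)
    # Reduce: aggregate across the whole collection.
    return dict(reduce_phase(w, c) for w, c in grouped.items())

counts = run_word_count(["the quick fox", "the lazy dog", "The fox"])
print(counts)
```

Here each input string stands in for one input split; on a cluster the map calls would run as separate tasks.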
Q3. Explain what shuffling is in MapReduce.
Ans: The process by which the system performs the sort and transfers the map outputs to the reducers as inputs is known as the shuffle.
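As a rough illustration of what the shuffle produces, the sketch below sorts map output pairs by key and groups the values, which is the shape of input a reducer receives. It runs in one process and the name shuffle is hypothetical; a real cluster also moves the data across the network.

```python
from itertools import groupby
from operator import itemgetter

def shuffle(map_outputs):
    """Sort map output pairs by key, then collect the values for each key."""
    ordered = sorted(map_outputs, key=itemgetter(0))
    return [(key, [value for _, value in group])
            for key, group in groupby(ordered, key=itemgetter(0))]

map_outputs = [("fox", 1), ("the", 1), ("dog", 1), ("the", 1)]
reducer_inputs = shuffle(map_outputs)
print(reducer_inputs)
```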
Q4. Explain what Distributed Cache is in the MapReduce framework.
Ans: Distributed Cache is an important feature provided by the MapReduce framework. When you want to share some files across all nodes in a Hadoop cluster, DistributedCache is used. The files could be executable JAR files or simple properties files.
Q5. Explain what NameNode is in Hadoop.
Ans: NameNode in Hadoop is the node where Hadoop stores all the file location information in HDFS (Hadoop Distributed File System). In other words, NameNode is the centrepiece of an HDFS file system. It keeps the record of all the files in the file system, and tracks the file data across the cluster or multiple machines.
Q6. Explain what JobTracker is in Hadoop. What actions does Hadoop follow?
Ans: In Hadoop, JobTracker is used for submitting and tracking MapReduce jobs. The JobTracker runs in its own JVM process.
Hadoop performs the following actions
Client applications submit jobs to the JobTracker
JobTracker communicates with the NameNode to determine the data location
JobTracker locates TaskTracker nodes near the data or with available slots
It submits the work to the chosen TaskTracker nodes
When a task fails, the JobTracker is notified and decides what to do next
The TaskTracker nodes are monitored by the JobTracker
Q7. Explain what heartbeat is in HDFS.
Ans: A heartbeat is a signal used between a DataNode and the NameNode, and between a TaskTracker and the JobTracker. If the NameNode or JobTracker does not receive the signal, it is assumed that there is a problem with the DataNode or TaskTracker.
Q8. Explain what combiners are and when you should use a combiner in a MapReduce job.
Ans: Combiners are used to increase the efficiency of a MapReduce program. A combiner reduces the amount of data that needs to be transferred across to the reducers. If the operation performed is commutative and associative, you can use your reducer code as a combiner. The execution of the combiner is not guaranteed in Hadoop.
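The effect of a combiner can be seen in a small simulation. The sketch below is plain Python rather than Hadoop code, and all names in it (map_words, combine, reduce_sum) are invented for the example. It uses a sum, which is commutative and associative, so the final result is the same with or without the combiner while fewer pairs reach the reducer.

```python
from collections import Counter

def map_words(line):
    """Map step: one (word, 1) pair per word."""
    return [(word, 1) for word in line.split()]

def combine(pairs):
    """Mapper-side pre-aggregation; same summing logic as the reducer."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return sorted(totals.items())

def reduce_sum(all_pairs):
    """Reduce step: total count per word over everything received."""
    totals = Counter()
    for word, count in all_pairs:
        totals[word] += count
    return dict(totals)

# Two simulated mappers, each with its own output.
mapper_outputs = [map_words("a b a a"), map_words("b b a")]
without_combiner = [p for out in mapper_outputs for p in out]
with_combiner = [p for out in mapper_outputs for p in combine(out)]

result_plain = reduce_sum(without_combiner)
result_combined = reduce_sum(with_combiner)
print(len(without_combiner), len(with_combiner), result_combined)
```

Because Hadoop may skip the combiner, a job should never depend on it for correctness, only for reduced transfer volume.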
Q9. What happens when a DataNode fails?
Ans: When a DataNode fails
JobTracker and NameNode detect the failure
All tasks on the failed node are rescheduled
NameNode replicates the user's data to another node
Q10. Explain what Speculative Execution is.
Ans: In Hadoop, during Speculative Execution a certain number of duplicate tasks are launched. Multiple copies of the same map or reduce task can be executed on different slave nodes using Speculative Execution. In simple words, if a particular node is taking a long time to complete a task, Hadoop will create a duplicate task on another node. The copy that finishes the task first is retained and the copies that do not finish first are killed.
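Q11. Explain what the basic parameters of a Mapper are.
Ans: The basic parameters of a Mapper are
LongWritable and Text (the input key and value types in the usual word count example)
Text and IntWritable (the output key and value types in the same example)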
Q12. Explain what the function of the MapReduce partitioner is.
Ans: The function of the MapReduce partitioner is to make sure that all the values of a single key go to the same reducer, which in turn helps distribute the map output evenly over the reducers.
Q13. Explain what the difference is between an Input Split and an HDFS Block.
Ans: The logical division of data is known as a Split, while the physical division of data is known as an HDFS Block.
Q14. Explain what happens in TextInputFormat.
Ans: In TextInputFormat, each line in the text file is a record. The value is the content of the line, while the key is the byte offset of the line. For instance, Key: LongWritable, Value: Text.
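To make the key and value concrete, the following sketch produces the same kind of (byte offset, line) records from an in-memory byte string. It is a simplified stand-in written in plain Python, and text_records is a hypothetical helper, not part of Hadoop.

```python
def text_records(data: bytes):
    """Yield (byte_offset, line_text) pairs, one per line of the input."""
    offset = 0
    for raw_line in data.splitlines(keepends=True):
        # The key is where the line starts; the value is the line without its terminator.
        yield offset, raw_line.rstrip(b"\r\n").decode("utf-8")
        offset += len(raw_line)

records = list(text_records(b"hello world\nfoo\nbar baz\n"))
print(records)
```

The first line occupies 12 bytes including its newline, so the second record starts at offset 12 and the third at offset 16.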
Q15. Mention the main configuration parameters that the user needs to specify to run a MapReduce job.
Ans: The user of the MapReduce framework needs to specify
The job's input locations in the distributed file system
The job's output location in the distributed file system
Class containing the map function
Class containing the reduce function
JAR file containing the mapper, reducer and driver classes
Q16. Explain what WebDAV is in Hadoop.
Ans: WebDAV is a set of extensions to HTTP that supports editing and updating files. On most operating systems WebDAV shares can be mounted as filesystems, so it is possible to access HDFS as a standard filesystem by exposing HDFS over WebDAV.
Q17. Explain what Sqoop is in Hadoop.
Ans: Sqoop is a tool used to transfer data between relational database management systems (RDBMS) and Hadoop HDFS. Using Sqoop, data can be imported from an RDBMS such as MySQL or Oracle into HDFS, and exported from HDFS files to an RDBMS.
Q18. Explain how JobTracker schedules a task.
Ans: The TaskTracker sends heartbeat messages to the JobTracker at regular short intervals so that the JobTracker knows the TaskTracker is active and functioning. The message also informs the JobTracker about the number of available slots, so the JobTracker can stay up to date with where in the cluster work can be delegated.
Q19. Explain what SequenceFileInputFormat is.
Ans: SequenceFileInputFormat is used for reading sequence files. A sequence file is a specific compressed binary file format that is optimized for passing data between the output of one MapReduce job and the input of another MapReduce job.
Q20. Explain what conf.setMapperClass does.
Ans: conf.setMapperClass sets the mapper class and everything related to the map job, such as reading data and generating a key-value pair out of the mapper.
Q21. Explain what Hadoop is.
Ans: It is an open-source software framework for storing data and running applications on clusters of commodity hardware. It provides enormous processing power and massive storage for any type of data.
Q22. Mention the difference between an RDBMS and Hadoop.
Ans: The differences between an RDBMS and Hadoop are
An RDBMS is a relational database management system, whereas Hadoop is a node-based flat structure
An RDBMS is used for OLTP processing, whereas Hadoop is currently used for analytical and big data processing
In an RDBMS, the database cluster uses the same data files stored in shared storage, whereas in Hadoop the data can be stored independently in each processing node
In an RDBMS you need to preprocess data before storing it, whereas in Hadoop you do not need to preprocess data before storing it
Q23. Mention Hadoop's core components.
Ans: Hadoop core components include HDFS and MapReduce.
Q24. What is NameNode in Hadoop?
Ans: NameNode in Hadoop is where Hadoop stores all the file location information in HDFS. It is the master node on which the JobTracker runs, and it holds the metadata.
Q25. Mention the data components used by Hadoop.
Ans: The data components used by Hadoop are
Q26. Mention the data storage component used by Hadoop.
Ans: The data storage component used by Hadoop is HBase.
Q27. Mention the most common input formats defined in Hadoop.
Ans: The most common input formats defined in Hadoop are TextInputFormat, KeyValueTextInputFormat and SequenceFileInputFormat.
Q28. In Hadoop, what is InputSplit?
Ans: It splits input files into chunks and assigns each split to a mapper for processing.
Q29. For a Hadoop job, how will you write a custom partitioner?
Ans: To write a custom partitioner for a Hadoop job, follow these steps
Create a new class that extends the Partitioner class
Override the getPartition method
In the wrapper that runs the MapReduce job, add the custom partitioner to the job by using the setPartitionerClass method, or add the custom partitioner to the job as a config file
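The steps above can be mirrored in a language-neutral way. The sketch below is plain Python, not the Hadoop Java API: FirstLetterPartitioner and run_job are hypothetical names, get_partition plays the role of getPartition, and passing the partitioner into run_job stands in for registering it on the job.

```python
class FirstLetterPartitioner:
    """Hypothetical custom partitioner: route keys by their first letter."""

    def get_partition(self, key, value, num_reduce_tasks):
        if not key:
            return 0
        # Keys sharing a first letter always land in the same partition.
        return ord(key[0].lower()) % num_reduce_tasks

def run_job(pairs, partitioner, num_reduce_tasks):
    """Stand-in for the driver: send each pair to the partition the partitioner picks."""
    partitions = [[] for _ in range(num_reduce_tasks)]
    for key, value in pairs:
        index = partitioner.get_partition(key, value, num_reduce_tasks)
        partitions[index].append((key, value))
    return partitions

pairs = [("apple", 1), ("avocado", 2), ("banana", 3)]
partitions = run_job(pairs, FirstLetterPartitioner(), 2)
print(partitions)
```

Whatever rule is used, the returned index has to fall between 0 and the number of reduce tasks minus one.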
Q30. For a job in Hadoop, is it possible to change the number of mappers to be created?
Ans: No, it is not possible to change the number of mappers to be created. The number of mappers is determined by the number of input splits.
Q31. Explain what a sequence file is in Hadoop.
Ans: A sequence file is used to store binary key/value pairs. Unlike a regular compressed file, a sequence file supports splitting even when the data inside the file is compressed.
Q32. When the NameNode is down, what happens to the JobTracker?
Ans: The NameNode is the single point of failure in HDFS, so when the NameNode is down your cluster becomes unavailable.
Q33. Explain how indexing in HDFS is done.
Ans: Hadoop has a unique way of indexing. Once the data is stored as per the block size, HDFS keeps storing the last part of the data, which says where the next part of the data will be.
Q34. Explain whether it is possible to search for files using wildcards.
Ans: Yes, it is possible to search for files using wildcards.
Q35. List Hadoop's three configuration files.
Ans: The three configuration files are core-site.xml, mapred-site.xml and hdfs-site.xml.
Q36. Explain how you can check whether the NameNode is running besides using the jps command.
Ans: Besides using the jps command, to check whether the NameNode is running you can also use
/etc/init.d/hadoop-0.20-namenode status
Q37. Explain what "map" is and what "reducer" is in Hadoop.
Ans: In Hadoop, a map is a phase in HDFS query solving. A map reads data from an input location, and outputs a key-value pair according to the input type.
In Hadoop, a reducer collects the output generated by the mapper, processes it, and creates a final output of its own.
Q38. In Hadoop, which file controls reporting?
Ans: In Hadoop, the hadoop-metrics.properties file controls reporting.
Q39. List the network requirements for using Hadoop.
Ans: The network requirements for using Hadoop are:
Password-less SSH connection
Secure Shell (SSH) for launching server processes
Q40. Mention what rack awareness is.
Ans: Rack awareness is the way in which the NameNode determines how to place blocks based on the rack definitions.
Q41. Explain what a TaskTracker is in Hadoop.
Ans: A TaskTracker in Hadoop is a slave node daemon in the cluster that accepts tasks from a JobTracker. It also sends heartbeat messages to the JobTracker at regular short intervals, to confirm that it is still alive.
Q42. Mention what daemons run on a master node and slave nodes?
Ans: The daemon that runs on the master node is "NameNode"
The daemons that run on each slave node are "TaskTracker" and "DataNode"
Q43. Explain how you can debug Hadoop code.
Ans: The popular methods for debugging Hadoop code are:
By using the web interface provided by the Hadoop framework
By using Counters
Q44. Explain what storage and compute nodes are.
Ans: The storage node is the machine or computer where your file system resides to store the processing data.
The compute node is the computer or machine where your actual business logic will be executed.
Q45. Mention the use of the Context Object.
Ans: The Context Object allows the mapper to interact with the rest of the Hadoop system. It includes configuration data for the job, as well as interfaces which allow it to emit output.
Q46. Mention what is the next step after Mapper or MapTask?
Ans: The next step after Mapper or MapTask is that the output of the Mapper is sorted, and partitions are created for the output.
Q47. Mention the default partitioner in Hadoop.
Ans: In Hadoop, the default partitioner is a "Hash" Partitioner.
Q48. Explain the purpose of RecordReader in Hadoop.
Ans: In Hadoop, the RecordReader loads the data from its source and converts it into (key, value) pairs suitable for reading by the Mapper.
Q49. Explain how data is partitioned before it is sent to the reducer if no custom partitioner is defined in Hadoop.
Ans: If no custom partitioner is defined in Hadoop, then the default partitioner computes a hash value for the key and assigns the partition based on the result.
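The hash-then-assign idea can be sketched as below. The partition formula (clear the sign bit of the hash, then take the remainder by the number of reduce tasks) follows Hadoop's HashPartitioner; the hash function here imitates Java's String.hashCode purely for illustration, which is an assumption, since real key types such as Text define their own hashCode. Both function names are made up for the example.

```python
def java_string_hash(s: str) -> int:
    """Imitate Java's String.hashCode (for BMP characters) as a signed 32-bit int."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - 0x100000000 if h >= 0x80000000 else h

def hash_partition(key: str, num_reduce_tasks: int) -> int:
    # Mask off the sign bit so the value is non-negative, then take the modulus.
    return (java_string_hash(key) & 0x7FFFFFFF) % num_reduce_tasks

for key in ["hello", "ab", "hello"]:
    print(key, hash_partition(key, 4))
```

The same key always maps to the same partition, which is what guarantees that all values for a key reach one reducer.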
Q50. Explain what happens when Hadoop spawns 50 tasks for a job and one of the tasks fails.
Ans: It will restart the task on some other TaskTracker. Only if the task fails more than the defined limit is the job killed.
Q51. Mention the best way to copy files between HDFS clusters.
Ans: The best way to copy files between HDFS clusters is by using multiple nodes and the distcp command, so the workload is shared.
Q52. Mention the difference between HDFS and NAS.
Ans: HDFS data blocks are distributed across the local drives of all machines in a cluster, while NAS data is stored on dedicated hardware.
Q53. Mention how Hadoop is different from other data processing tools.
Ans: In Hadoop, you can increase or decrease the number of mappers without worrying about the volume of data to be processed.
Q54. Mention what job the conf class does.
Ans: The JobConf class separates different jobs running on the same cluster. It does the job-level settings, such as declaring a job in a real environment.
Q55. Mention the Hadoop MapReduce API contract for a key and value class.
Ans: For a key and value class, there are two Hadoop MapReduce API contracts
The value must implement the org.apache.hadoop.io.Writable interface
The key must implement the org.apache.hadoop.io.WritableComparable interface
Q56. Mention the three modes in which Hadoop can be run.
Ans: The three modes in which Hadoop can be run are
Pseudo-distributed mode
Standalone (local) mode
Fully distributed mode
Q57. Mention what the text input format does.
Ans: The text input format creates a line object that is a hexadecimal number. The value is considered as the whole line of text, while the key is considered as the line object. The mapper will receive the value as a 'Text' parameter and the key as a 'LongWritable' parameter.
Q58. Mention how many InputSplits are made by the Hadoop framework.
Ans: Hadoop will make 5 splits (the counts below are consistent with a 64 MB block size, which is assumed here)
1 split for 64 KB files
2 splits for 65 MB files
2 splits for 127 MB files
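The arithmetic behind these counts can be checked with a short calculation. This is a simplified model that assumes a 64 MB block size and one split per started block; num_splits is a hypothetical helper, and a real input format may apply extra rules, so actual counts on a cluster can differ.

```python
KB = 1024
MB = 1024 * KB
BLOCK_SIZE = 64 * MB  # assumed block size, not read from any Hadoop configuration

def num_splits(file_size_bytes, block_size=BLOCK_SIZE):
    """Simplified model: one split per started block (ceiling division)."""
    return (file_size_bytes + block_size - 1) // block_size

sizes = [64 * KB, 65 * MB, 127 * MB]
splits = [num_splits(size) for size in sizes]
total = sum(splits)
print(splits, total)
```

Under this model the 65 MB and 127 MB files each spill into a second block, which gives 1 + 2 + 2 = 5 splits.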
Q59. Mention what distributed cache is in Hadoop.
Ans: Distributed cache in Hadoop is a facility provided by the MapReduce framework. At the time of execution of the job, it is used to cache files. The framework copies the necessary files to the slave node before the execution of any task at that node.
Q60. Explain how the Hadoop classpath plays a vital role in stopping or starting Hadoop daemons.
Ans: The classpath will consist of a list of directories containing JAR files to stop or start daemons.