Top MapReduce Interview Questions and Answers
MapReduce is both a technique for processing data and a programming model for Java-based distributed computing. The MapReduce algorithm consists of two important phases: Map and Reduce. The map function takes up the dataset and transforms it by breaking individual elements into tuples (key/value pairs). This MapReduce Interview Questions blog consists of a selection of sample questions asked by experts. So, before going for your interview, go through the following MapReduce interview questions:
Q1. Compare MapReduce and Spark
Q2. What is MapReduce?
Q3. Illustrate a simple example of the working of MapReduce.
Q4. What are the main components of a MapReduce job?
Q5. What is Shuffling and Sorting in MapReduce?
Q6. What is a Partitioner and what is it used for?
Q7. What are Identity Mapper and Chain Mapper?
Q8. What main configuration parameters are specified in MapReduce?
Q9. Name the job control options specified by MapReduce.
Q10. What is InputFormat in Hadoop?
Basic Interview Questions
1. Compare MapReduce and Spark
| Criteria | MapReduce | Spark |
| Processing speed | Good | Excellent |
| Standalone mode | Needs Hadoop | Can work independently |
| Ease of use | Needs extensive Java programs | APIs for Python, Java, and Scala |
| Versatility | Not optimized for real-time and machine learning applications | Optimized for real-time and machine learning applications |
2. What is MapReduce?
Referred to as the core of Hadoop, MapReduce is a programming framework for processing large sets of data, or big data, across thousands of servers in a Hadoop cluster. The concept of MapReduce is similar to that of batch scale-out data processing systems. The term MapReduce refers to the two major phases of a Hadoop program.
First is the map() function, which converts a set of data into another by breaking individual elements into key/value pairs (tuples). Then comes the reduce() function, in which the output from the map, i.e., the tuples, serves as the input and is combined into a smaller set of tuples. As the name suggests, the map function always takes place before the reduce.
3. Illustrate a simple example of the working of MapReduce.
Let us take a simple example to understand the working of MapReduce. In real-world projects and applications, this will be far more elaborate and complex, as the data we deal with in Hadoop and MapReduce is extensive and massive.
Assume you have five files, and each file consists of two columns, i.e., key/value pairs: a city name and its recorded temperature. Here, the city name is the key and the temperature is the value.
- San Francisco, 22
- Los Angeles, 15
- Vancouver, 30
- London, 25
- Los Angeles, 16
- Vancouver, 28
- London, 12
Note that each file may contain data for the same city multiple times. Now, out of this data, we need to calculate the maximum temperature for each city across these five files. As explained, the MapReduce framework will divide the work into five map tasks; each map task performs its function on one of the five files and returns the maximum temperature for each city, for example:
(San Francisco, 22)(Los Angeles, 16)(Vancouver, 30)(London, 25)
Similarly, the other mappers process the remaining four files and produce intermediate results, for example as below.
(San Francisco, 32)(Los Angeles, 2)(Vancouver, 8)(London, 27)
(San Francisco, 29)(Los Angeles, 19)(Vancouver, 28)(London, 12)
(San Francisco, 18)(Los Angeles, 24)(Vancouver, 36)(London, 10)
(San Francisco, 30)(Los Angeles, 11)(Vancouver, 12)(London, 5)
These results are then passed to the reduce function, where the input from all the files is combined to output a single value per city. The final result here would be:
(San Francisco, 32)(Los Angeles, 24)(Vancouver, 36)(London, 27)
These calculations are performed instantly and are very efficient for computing results over a huge dataset.
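As a minimal sketch of how this example might be written with Hadoop's new (org.apache.hadoop.mapreduce) Java API — the class names are illustrative, and each input line is assumed to have the form "City, Temperature":
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Emits (city, temperature) for every "City, Temperature" line it reads.
public class MaxTemperatureMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] parts = value.toString().split(",");
        if (parts.length == 2) {
            String city = parts[0].trim();
            int temperature = Integer.parseInt(parts[1].trim());
            context.write(new Text(city), new IntWritable(temperature));
        }
    }
}

// Receives all temperatures for one city and emits only the maximum.
class MaxTemperatureReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int max = Integer.MIN_VALUE;
        for (IntWritable value : values) {
            max = Math.max(max, value.get());
        }
        context.write(key, new IntWritable(max));
    }
}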
4. What are the main components of a MapReduce job?
- Main driver class: provides the job configuration parameters.
- Mapper class: must extend the org.apache.hadoop.mapreduce.Mapper class and implements the map() method.
- Reducer class: must extend the org.apache.hadoop.mapreduce.Reducer class.
5. What is Shuffling and Sorting in MapReduce?
Shuffling and Sorting are two important processes that run simultaneously while the mapper and reducer are working.
The process of transferring data from the mappers to the reducers is called Shuffling. It is a mandatory operation for the reducers to proceed with their jobs, as the output of the shuffle process serves as the input for the reduce tasks.
In MapReduce, the output key/value pairs between the map and reduce phases (i.e., after the mapper) are automatically sorted before being transferred to the reducer. This feature is useful in programs where you need sorting at certain stages. It also saves the programmer's overall time.
6. What is a Partitioner and what is it used for?
The Partitioner is another important phase; it controls the partitioning of the intermediate map output keys using a hash function. The partitioning process determines to which reducer a key/value pair (of the map output) is sent. The number of partitions is equal to the total number of reduce tasks for the job.
HashPartitioner is the default Partitioner class available in Hadoop, which implements the following function:
int getPartition(K key, V value, int numReduceTasks)
The function returns the partition number, where numReduceTasks is the fixed number of reducers.
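For illustration (CityPartitioner is a made-up class name, not part of Hadoop), a custom Partitioner could mirror the default hashing behaviour like this and would be registered on the job with job.setPartitionerClass(CityPartitioner.class):
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Routes each (city, temperature) pair to a reducer based on the city's hash,
// mirroring what the default HashPartitioner does.
public class CityPartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numReduceTasks) {
        // Mask the sign bit so the result is never negative.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}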
7. What are Identity Mapper and Chain Mapper?
Identity Mapper is the default Mapper class provided by Hadoop. When no other Mapper class is defined, IdentityMapper is executed. It simply writes the input to the output and does not perform any computations or calculations on the data.
The class name is org.apache.hadoop.mapred.lib.IdentityMapper.
Chain Mapper is the implementation of a simple Mapper class through chained operations across a set of Mapper classes, within a single map task. Here, the output of the first mapper becomes the input of the second mapper, the second mapper's output becomes the input of the third mapper, and so on until the last mapper.
The class name is org.apache.hadoop.mapreduce.lib.chain.ChainMapper.
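A rough usage sketch with the new API, typically placed in the driver's main() method (UpperCaseMapper and TokenCountMapper are hypothetical mapper classes used only for illustration):
Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "chained mappers");

// First mapper in the chain; its output types must match the next mapper's input types.
ChainMapper.addMapper(job, UpperCaseMapper.class,
        LongWritable.class, Text.class, Text.class, Text.class,
        new Configuration(false));

// Second mapper consumes the first mapper's output within the same map task.
ChainMapper.addMapper(job, TokenCountMapper.class,
        Text.class, Text.class, Text.class, IntWritable.class,
        new Configuration(false));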
8. What main configuration parameters are specified in MapReduce?
MapReduce programmers need to specify the following configuration parameters to perform the map and reduce jobs (see the driver sketch after this list):
- The input location of the job in HDFS.
- The output location of the job in HDFS.
- The input and output formats.
- The classes containing the map and reduce functions, respectively.
- The JAR file containing the mapper, reducer, and driver classes.
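Tying these together, a driver for the earlier temperature example might look like the following sketch (paths come from command-line arguments; class and job names are illustrative):
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class MaxTemperatureDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "max temperature");
        job.setJarByClass(MaxTemperatureDriver.class);          // the JAR holding mapper, reducer, and driver

        FileInputFormat.addInputPath(job, new Path(args[0]));   // input location of the job in HDFS
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output location of the job in HDFS

        job.setInputFormatClass(TextInputFormat.class);         // input format
        job.setOutputFormatClass(TextOutputFormat.class);       // output format

        job.setMapperClass(MaxTemperatureMapper.class);         // class containing the map function
        job.setReducerClass(MaxTemperatureReducer.class);       // class containing the reduce function

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}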
9. Name the job control options specified by MapReduce.
Since this framework supports chained operations, wherein the output of one map job serves as the input for another, there is a need for job controls to manage these complex operations.
The various job control options are (see the snippet after this list):
- Job.submit(): submits the job to the cluster and returns immediately.
- Job.waitForCompletion(boolean): submits the job to the cluster and waits for its completion.
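In code, with the new org.apache.hadoop.mapreduce.Job API, the two options look like this (use one or the other on a given job):
// Fire-and-forget: submit the job to the cluster and return immediately.
job.submit();

// Blocking: submit the job and wait for it to finish; the flag enables progress logging.
boolean success = job.waitForCompletion(true);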
10. What is InputFormat in Hadoop?
Another important feature in MapReduce programming, InputFormat defines the input specifications for a job. It performs the following functions:
- Validates the input specification of the job.
- Splits the input file(s) into logical instances called InputSplits. Each of these splits is then assigned to an individual Mapper.
- Provides the RecordReader implementation to extract input records from the above instances for further processing by the Mapper.
Intermediate Interview Questions
11. What is the difference between an HDFS block and an InputSplit?
An HDFS block splits data into physical divisions, while an InputSplit in MapReduce splits input files logically.
While the InputSplit is used to control the number of mappers, the size of the splits is user-defined. In contrast, the HDFS block size is fixed at 64 MB by default, i.e., for 1 GB of data, there will be 1 GB / 64 MB = 16 splits/blocks. However, if the input split size is not defined by the user, it takes the default HDFS block size.
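If the number of mappers needs to be influenced through the split size rather than the HDFS block size, the new-API FileInputFormat exposes limits for it; a small sketch (the 64 MB and 128 MB figures are illustrative):
// Require at least 64 MB and at most 128 MB per input split.
FileInputFormat.setMinInputSplitSize(job, 64L * 1024 * 1024);
FileInputFormat.setMaxInputSplitSize(job, 128L * 1024 * 1024);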
12. What is TextInputFormat?
It is the default InputFormat for plain text files in a given job; it also handles compressed input files, such as those with a .gz extension. In TextInputFormat, files are broken into lines, where the key is the position (byte offset) in the file and the value is the line of text. Programmers can also write their own InputFormat.
The class hierarchy is:
java.lang.Object
org.apache.hadoop.mapreduce.InputFormat<K,V>
org.apache.hadoop.mapreduce.lib.input.FileInputFormat<LongWritable,Text>
org.apache.hadoop.mapreduce.lib.input.TextInputFormat
13. What is JobTracker?
JobTracker is a Hadoop service used for processing MapReduce jobs in the cluster. It submits and tracks the jobs on the specific nodes that have the data. Only one JobTracker runs on a single Hadoop cluster, on its own JVM process. If the JobTracker goes down, all the running jobs halt.
14. Explain job scheduling through JobTracker.
JobTracker communicates with the NameNode to identify the data location and submits the work to the TaskTracker node. The TaskTracker plays an important role, as it notifies the JobTracker of any job failure. It also sends heartbeat messages reassuring the JobTracker that it is still alive. Later, the JobTracker is responsible for the follow-up actions: it may either resubmit the job, mark a specific record as unreliable, or blacklist it.
15. What is SequenceFileInputFormat?
SequenceFileInputFormat is an input format for reading sequence files, a compressed binary file format, and it extends FileInputFormat. It passes data between the output and input phases of consecutive MapReduce jobs (i.e., from the output of one MapReduce job to the input of another MapReduce job).
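As a sketch of chaining two jobs through sequence files (the path is illustrative), the first job writes with SequenceFileOutputFormat and the second reads it back with SequenceFileInputFormat:
// Job 1 writes its results as a binary sequence file...
job1.setOutputFormatClass(SequenceFileOutputFormat.class);
SequenceFileOutputFormat.setOutputPath(job1, new Path("/tmp/intermediate"));

// ...and job 2 reads that sequence file as its input.
job2.setInputFormatClass(SequenceFileInputFormat.class);
SequenceFileInputFormat.addInputPath(job2, new Path("/tmp/intermediate"));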
16. How do you set the number of mappers and reducers for Hadoop jobs?
Users can configure the JobConf variable to set the number of mappers and reducers:
job.setNumMapTasks(n)
job.setNumReduceTasks(n)
17. Explain JobConf in MapReduce.
JobConf is the primary interface for defining a MapReduce job in Hadoop for job execution. JobConf specifies the Mapper, Combiner, Partitioner, Reducer, InputFormat, and OutputFormat implementations, along with other advanced facets of the job.
18. What is a MapReduce Combiner?
Also known as a semi-reducer, the Combiner is an optional class that combines the map output records that share the same key. The main function of a combiner is to accept the outputs from the Map class and pass those key/value pairs on to the Reducer class.
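Because taking a maximum is both associative and commutative, the reducer from the temperature example sketched earlier could double as a combiner; a small sketch:
// Run the same max-temperature logic on each mapper's local output
// before it is shuffled to the reducers.
job.setCombinerClass(MaxTemperatureReducer.class);
job.setReducerClass(MaxTemperatureReducer.class);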
19. What is RecordReader in MapReduce?
RecordReader is used to read key/value pairs from an InputSplit by converting the byte-oriented view of the data and presenting a record-oriented view to the Mapper.
20. Describe the Writable data types in MapReduce.
Hadoop reads and writes data in a serialized form via the Writable interface. The Writable interface has several implementing classes, such as Text (for storing String data), IntWritable, LongWritable, FloatWritable, and BooleanWritable. Users are free to define their own Writable classes as well.
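A minimal custom Writable, assuming a hypothetical TemperatureReading value type, only needs to implement write() and readFields():
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

// A value type holding a city's temperature reading; Hadoop serializes it
// with write() and rebuilds it with readFields().
public class TemperatureReading implements Writable {
    private int temperature;

    public TemperatureReading() { }                 // required no-arg constructor

    public TemperatureReading(int temperature) {
        this.temperature = temperature;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(temperature);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        temperature = in.readInt();
    }

    public int getTemperature() {
        return temperature;
    }
}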
Advanced Interview Questions
21. What is OutputCommitter?
OutputCommitter describes the commit of task output for a MapReduce job. FileOutputCommitter is the default class available for OutputCommitter in MapReduce. It performs the following operations:
- Creates a temporary output directory for the job during initialization.
- Then, it cleans up the job, i.e., removes the temporary output directory after job completion.
- Sets up the task's temporary output.
- Identifies whether a task needs a commit. The commit is applied if required.
- JobSetup, JobCleanup, and TaskCleanup are important tasks during the output commit.
22. What is a "map" in Hadoop?
In Hadoop, a guide is a stage in HDFS inquiry settling. A guide peruses information from an information area, and yields a key worth pair as indicated by the information type.
23. What is a "reducer" in Hadoop?
In Hadoop, a reducer gathers the yield produced by the mapper, measures it, and makes its very own last yield.
24. What are the parameters of mappers and reducers?
The four parameters of a mapper are:
- LongWritable (input)
- Text (input)
- Text (intermediate output)
- IntWritable (intermediate output)
The four parameters of a reducer are:
- Text (intermediate output)
- IntWritable (intermediate output)
- Text (final output)
- IntWritable (final output)
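These parameters appear directly in the generic signatures of the mapper and reducer classes; for the temperature example sketched earlier, the signatures would be:
// Mapper<input key, input value, intermediate output key, intermediate output value>
public class MaxTemperatureMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> { /* map() as shown earlier */ }

// Reducer<intermediate output key, intermediate output value, final output key, final output value>
public class MaxTemperatureReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> { /* reduce() as shown earlier */ }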
25. What are the key differences between Pig and MapReduce?
Pig is a data-flow language; the key focus of Pig is managing the flow of data from the input source to the output store. As part of managing this data flow, it moves data, feeding it to process 1, then taking that output and feeding it to process 2, and so on. Its core features include preventing the execution of subsequent stages if a previous stage fails, managing temporary storage of data, and, most importantly, compressing and reordering processing steps for faster processing. While this could be done for any kind of processing task, Pig is written specifically for managing the data flow of MapReduce-style jobs. Most, if not all, jobs in Pig are MapReduce jobs or data-movement jobs. Pig also allows custom functions to be added, which can be used for processing in Pig; some default ones are ordering, grouping, distinct, count, and so on.
MapReduce, on the other hand, is a data-processing paradigm; it is a framework for application developers to write code in so that it can easily be scaled to petabytes of data. This creates a separation between the developer who writes the application and the developer who scales it. Not every application can be ported to MapReduce, but a good number can, ranging from complex ones like k-means to simple ones like counting uniques in a dataset.
26. What is partitioning?
Partitioning is the process of identifying the reducer instance that will receive a given piece of the mapper's output. Before the mapper emits its (key, value) pairs to the reducers, it identifies the reducer that will act as the recipient of that output. All values for a given key, regardless of which mapper produced them, must go to the same reducer.
27. How do you set which framework will be used to run a MapReduce program?
By using the mapreduce.framework.name property. It can be set to (see the snippet after this list):
- local
- classic
- yarn
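Programmatically, this property can be set on the job's Configuration (a sketch; the same property is usually placed in mapred-site.xml instead):
Configuration conf = new Configuration();
conf.set("mapreduce.framework.name", "yarn");   // or "local" / "classic"
Job job = Job.getInstance(conf, "example job");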
28. What platform and Java version are required to run Hadoop?
Java 1.6.x or a higher version is good for Hadoop, preferably from Sun. Linux and Windows are the supported operating systems for Hadoop, but BSD, Mac OS/X, and Solaris are also well known to work.
29. Can MapReduce programs be written in any language other than Java?
Yes, MapReduce can be written in many programming languages: Java, R, C++, and scripting languages (Python, PHP). Any language able to read from stdin, write to stdout, and parse tab and newline characters should work. Hadoop Streaming (a Hadoop utility) allows you to create and run Map/Reduce jobs with any executable or script as the mapper and/or the reducer.

