
Top MapReduce Interview Questions and Answers

Q1a. What is MapReduce?

Ans: MapReduce is a programming model and an associated implementation for processing and generating massive data sets with a parallel, distributed algorithm on a cluster.

Or

What is MapReduce?

Referred to as the core of Hadoop, MapReduce is a programming framework for processing large data sets, or big data, across thousands of servers in a Hadoop cluster. The concept of MapReduce is similar to other cluster scale-out data processing frameworks. The term MapReduce refers to the two essential phases a Hadoop application performs.

First is the map() task, which converts a set of data into another set, breaking individual elements down into key/value pairs (tuples). Then the reduce() job comes into play, in which the output from the map, i.e. the tuples, serves as the input and is combined into a smaller set of tuples. As the name suggests, the map job always takes place before the reduce one.
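The classic word-count program illustrates both phases. The sketch below is a minimal example, not a definitive implementation; the class and field names are my own, and it assumes the standard org.apache.hadoop.mapreduce API: map() emits a (word, 1) pair for every word, and reduce() sums the counts for each word.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCount {

  // Map phase: break each input line into words and emit (word, 1).
  public static class TokenizerMapper
      extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce phase: sum all counts received for the same word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) {
        sum += v.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }
}
```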

Q1b. What is Hadoop MapReduce?

Ans: The Hadoop MapReduce framework is used for processing large data sets in parallel across a Hadoop cluster. Data analysis uses a two-step map and reduce process.

Q2. How does Hadoop MapReduce work?

Ans: MapReduce works in two phases. During the map phase, the input data is divided into splits that are analyzed by map tasks running in parallel across the Hadoop cluster; in a word-count job, for example, each map task counts the words in its document. During the reduce phase, the intermediate results are aggregated across the whole collection.

Q3. Explain what shuffling is in MapReduce.

Ans: The process by which the system performs the sort and transfers the map outputs to the reducer as inputs is known as the shuffle.

Q4. Explain what Distributed Cache is in the MapReduce framework.

Ans: Distributed Cache is an important feature provided by the MapReduce framework. When you want to share some files across all nodes in a Hadoop cluster, DistributedCache is used. The files can be executable JAR files or simple properties files.

Q5. Explain what the NameNode is in Hadoop.

Ans: The NameNode in Hadoop is the node where Hadoop stores all the file location information for HDFS (Hadoop Distributed File System). In other words, the NameNode is the centerpiece of an HDFS file system. It keeps the record of all the files in the file system and tracks the file data across the cluster or multiple machines.

Q6. Explain what the JobTracker is in Hadoop. What actions does Hadoop follow?

Ans: In Hadoop, the JobTracker is used for submitting and tracking MapReduce jobs. The JobTracker runs in its own JVM process.

Hadoop performs the following actions:

The client application submits jobs to the JobTracker.

The JobTracker communicates with the NameNode to determine the data location.

The JobTracker locates TaskTracker nodes near the data or with available slots.

It submits the work to the chosen TaskTracker nodes.

When a task fails, the JobTracker is notified and decides what to do next.

The TaskTracker nodes are monitored by the JobTracker.

Q7. Explain what a heartbeat is in HDFS.

Ans: A heartbeat is a signal used between a DataNode and the NameNode, and between a TaskTracker and the JobTracker. If the NameNode or JobTracker stops receiving the signal, it is considered that there is some problem with the DataNode or TaskTracker.

Q8. Explain what combiners are and when you should use a combiner in a MapReduce job.

Ans: Combiners are used to increase the efficiency of a MapReduce program. The amount of data that has to be transferred across to the reducers can be reduced with the help of combiners. If the operation performed is commutative and associative, you can use your reducer code as a combiner. The execution of the combiner is not guaranteed in Hadoop.
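A minimal driver sketch of this, reusing the WordCount classes assumed in the earlier example: because summing counts is commutative and associative, the same reducer class is registered as the combiner. Hadoop may run the combiner zero or more times, so the job must remain correct without it.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

public class CombinerSetup {
  public static Job configure(Configuration conf) throws Exception {
    Job job = Job.getInstance(conf, "word count with combiner");
    job.setJarByClass(CombinerSetup.class);
    job.setMapperClass(WordCount.TokenizerMapper.class);
    // Local pre-aggregation of map output; may run zero or more times.
    job.setCombinerClass(WordCount.IntSumReducer.class);
    job.setReducerClass(WordCount.IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    return job;
  }
}
```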

Q9. What happens when a DataNode fails?

Ans: When a DataNode fails:

The JobTracker and NameNode detect the failure.

All tasks that were running on the failed node are re-scheduled.

The NameNode replicates the user's data to another node.

Q10. Explain what Speculative Execution is.

Ans: In Hadoop, during Speculative Execution a certain number of duplicate tasks are launched. Using Speculative Execution, multiple copies of the same map or reduce task can be executed on different slave nodes. In simple terms, if a particular node is taking a long time to complete a task, Hadoop will create a duplicate task on another node. The task that finishes first is retained, and the duplicates that do not finish first are killed.

Q11. Explain what the basic parameters of a Mapper are.

Ans: The basic parameters of a Mapper are:

LongWritable and Text (input key and value)

Text and IntWritable (output key and value)

Q12. Explain what the function of the MapReduce partitioner is.

Ans: The function of the MapReduce partitioner is to make sure that all the values for a single key go to the same reducer, which eventually helps to distribute the map output evenly over the reducers.

Q13. Explain what the difference is between an InputSplit and an HDFS Block.

Ans: The logical division of data is known as a split, while the physical division of data is known as an HDFS block.


Q14. Explain what happens in TextInputFormat.

Ans: In TextInputFormat, each line in the text file is a record. The value is the content of the line, while the key is the byte offset of the line. For example, key: LongWritable, value: Text.

Q15. Mention the main configuration parameters that the user needs to specify to run a MapReduce job.

Ans: The user of the MapReduce framework needs to specify the following (a minimal driver sketch follows this list):

The job's input location in the distributed file system

The job's output location in the distributed file system

The input format

The output format

The class containing the map function

The class containing the reduce function

The JAR file containing the mapper, reducer and driver classes
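Below is a minimal driver sketch that wires up each of the parameters listed above. It assumes the WordCount mapper and reducer from the earlier sketch; the input and output paths are taken from the command line and are placeholders.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WordCountDriver {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");

    job.setJarByClass(WordCountDriver.class);              // JAR containing mapper, reducer and driver
    job.setMapperClass(WordCount.TokenizerMapper.class);   // class containing the map function
    job.setReducerClass(WordCount.IntSumReducer.class);    // class containing the reduce function

    job.setInputFormatClass(TextInputFormat.class);        // input format
    job.setOutputFormatClass(TextOutputFormat.class);      // output format
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    FileInputFormat.addInputPath(job, new Path(args[0]));   // job's input location in HDFS
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // job's output location in HDFS

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```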

Q16. Explain what WebDAV is in Hadoop.

Ans: WebDAV is a set of extensions to HTTP to support editing and updating files. On most operating systems, WebDAV shares can be mounted as filesystems, so it is possible to access HDFS as a standard filesystem by exposing HDFS over WebDAV.

Q17. Explain what Sqoop is in Hadoop.

Ans: Sqoop is a tool used to transfer data between a relational database management system (RDBMS) and Hadoop HDFS. Using Sqoop, data can be imported from an RDBMS such as MySQL or Oracle into HDFS, as well as exported from HDFS back to an RDBMS.

Q18. Explain how the JobTracker schedules a task.

Ans: The TaskTracker sends heartbeat messages to the JobTracker, usually every few minutes, to reassure the JobTracker that it is active and functioning. The message also informs the JobTracker about the number of available slots, so the JobTracker can stay up to date on where in the cluster work can be delegated.

Q19. Explain what SequenceFileInputFormat is.

Ans: SequenceFileInputFormat is used for reading files in sequence. It is a specific compressed binary file format that is optimized for passing data from the output of one MapReduce job to the input of another MapReduce job.

Q20. Explain what conf.setMapperClass does.

Ans: conf.setMapperClass sets the mapper class and everything related to the map job, such as reading the data and generating a key-value pair out of the mapper.

Q21. Explain what Hadoop is.

Ans: It is an open-source software framework for storing data and running applications on clusters of commodity hardware. It provides enormous processing power and massive storage for any kind of data.

Q22. Mention the difference between an RDBMS and Hadoop.

Ans:

RDBMS    |    Hadoop
RDBMS is a relational database management system    |    Hadoop is a node-based, flat structure
It is used for OLTP processing    |    It is currently used for analytical and big data processing
In an RDBMS, the database cluster uses the same data files stored in shared storage    |    In Hadoop, the data can be stored independently on each processing node
You need to preprocess data before storing it    |    You don't need to preprocess data before storing it

Q23. Mention the Hadoop core components.

Ans: The Hadoop core components include:

HDFS

MapReduce

Q24. What is the NameNode in Hadoop?

Ans: The NameNode in Hadoop is where Hadoop stores all the file location information for HDFS. It is the master node on which the JobTracker runs, and it holds the metadata.

Q25. Mention the data components used by Hadoop.

Ans: The data components used by Hadoop are:

Pig

Hive

Q26. Mention the data storage component used by Hadoop.

Ans: The data storage component used by Hadoop is HBase.

Q27. Mention the most common input formats defined in Hadoop.

Ans: The most common input formats defined in Hadoop are:

TextInputFormat

KeyValueInputFormat

SequenceFileInputFormat

Q28. In Hadoop, what is an InputSplit?

Ans: It splits input files into chunks and assigns each split to a mapper for processing.

 

Q29. For a Hadoop job, how will you write a custom partitioner?

Ans: To write a custom partitioner for a Hadoop job, follow these steps (a sketch appears after the list):

Create a new class that extends the Partitioner class

Override the getPartition method

In the wrapper that runs the MapReduce job:

Add the custom partitioner to the job by using the setPartitionerClass method, or add the custom partitioner to the job as a config file
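A minimal sketch following those steps is shown below. The routing rule (splitting keys by their first letter) and the class name are purely illustrative assumptions, not anything prescribed by Hadoop.

```java
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Step 1: extend the Partitioner class.
public class FirstLetterPartitioner extends Partitioner<Text, IntWritable> {

  // Step 2: override getPartition.
  @Override
  public int getPartition(Text key, IntWritable value, int numReduceTasks) {
    String k = key.toString();
    if (numReduceTasks == 0 || k.isEmpty()) {
      return 0; // map-only jobs (or empty keys) always use partition 0
    }
    // Illustrative rule: keys starting with a-m go to one partition, the rest to another.
    char first = Character.toLowerCase(k.charAt(0));
    return (first <= 'm' ? 0 : 1) % numReduceTasks;
  }
}

// Step 3, in the driver that runs the MapReduce job:
//   job.setPartitionerClass(FirstLetterPartitioner.class);
```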

Q30. For a job in Hadoop, is it possible to change the number of mappers to be created?

Ans: No, it is not possible to change the number of mappers to be created. The number of mappers is determined by the number of input splits.

Q31. Explain what a sequence file is in Hadoop.

Ans: A sequence file is used to store binary key/value pairs. Unlike a regular compressed file, a sequence file supports splitting even when the data inside the file is compressed.

Q32. When the NameNode is down, what happens to the JobTracker?

Ans: The NameNode is the single point of failure in HDFS, so when the NameNode is down the cluster is off and no jobs can run.

Q33. Explain how indexing in HDFS is done.

Ans: Hadoop has a unique way of indexing. Once the data is stored according to the block size, HDFS keeps storing the last part of the data, which indicates where the next part of the data will be.

Q34. Explain whether it is possible to search for files using wildcards.

Ans: Yes, it is possible to search for files using wildcards.

Q35. List Hadoop's three configuration files.

Ans: The three configuration files are:

core-site.xml

mapred-site.xml

hdfs-site.xml

Q36. Explain how you can check whether the NameNode is working, besides using the jps command.

Ans: Besides using the jps command, to check whether the NameNode is running you can also use:

/etc/init.d/hadoop-0.20-namenode status

Q37. Explain what "map" and "reducer" are in Hadoop.

Ans: In Hadoop, a map is a phase in HDFS query solving. A map reads data from an input location and outputs a key-value pair according to the input type.

In Hadoop, a reducer collects the output generated by the mapper, processes it, and creates a final output of its own.

Q38. In Hadoop, which file controls reporting?

Ans: In Hadoop, the hadoop-metrics.properties file controls reporting.

Q39. List the network requirements for using Hadoop.

Ans: The network requirements for using Hadoop are:

Password-less SSH connections

Secure Shell (SSH) for launching server processes

Q40. Mention what rack awareness is.

Ans: Rack awareness is the way the NameNode decides how to place blocks, based on the rack definitions.

Q41. Explain what a TaskTracker is in Hadoop.

Ans: A TaskTracker in Hadoop is a slave node daemon in the cluster that accepts tasks from a JobTracker. It also sends heartbeat messages to the JobTracker every few minutes to signal that it is still alive.

Q42. Mention which daemons run on the master node and the slave nodes.

Ans:

The daemon that runs on the master node is the NameNode.

The daemons that run on each slave node are the TaskTracker and the DataNode.

Q43. Explain how you will debug Hadoop code.

Ans: The popular methods for debugging Hadoop code are:

By using the web interface provided by the Hadoop framework

By using counters

Q44. Explain what storage and compute nodes are.

Ans:

The storage node is the machine or computer where your file system resides to store the data being processed.

The compute node is the machine or computer where your actual business logic is executed.

Q45. Mention what the use of the Context object is.

Ans: The Context object allows the mapper to interact with the rest of the Hadoop system. It includes configuration data for the job, as well as interfaces which allow it to emit output.

Q46. Mention what the next step is after the Mapper or MapTask.

Ans: The next step after the Mapper or MapTask is that the output of the Mapper is sorted, and partitions are created for the output.

Q47. Mention what the default partitioner in Hadoop is.

Ans: In Hadoop, the default partitioner is the "Hash" partitioner.

Q48. Explain what the purpose of the RecordReader is in Hadoop.

Ans: In Hadoop, the RecordReader loads the data from its source and converts it into (key, value) pairs suitable for reading by the Mapper.

Q49. Explain how data is partitioned before it is sent to the reducer if no custom partitioner is defined in Hadoop.

Ans: If no custom partitioner is defined in Hadoop, then the default partitioner computes a hash value for the key and assigns the partition based on the result.

Q50. Explain what happens when Hadoop has spawned 50 tasks for a job and one of the tasks fails.

Ans: Hadoop will restart the task on another TaskTracker; only if the task fails more than the defined limit will the job be killed.

Q51. Mention what the best way is to copy files between HDFS clusters.

Ans: The best way to copy files between HDFS clusters is by using multiple nodes and the distcp command, so the workload is shared.

Q52. Mention what the difference is between HDFS and NAS.

Ans: HDFS data blocks are distributed across the local drives of all machines in a cluster, while NAS data is stored on dedicated hardware.

Q53. Mention how Hadoop is different from other data processing tools.

Ans: In Hadoop, you can increase or decrease the number of mappers without worrying about the volume of data to be processed.

Q54. Mention what the JobConf class does.

Ans: The JobConf class separates different jobs running on the same cluster. It handles the job-level settings, such as declaring a job in a real environment.

Q55. Mention what the Hadoop MapReduce API contract is for a key and value class.

Ans: For a key and value class, there are two Hadoop MapReduce API contracts (a sketch appears after the list):

The value class must implement the org.apache.hadoop.io.Writable interface

The key class must implement the org.apache.hadoop.io.WritableComparable interface
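A minimal sketch of that contract is shown below. The CityYear key type is a made-up example; it implements WritableComparable so the framework can serialize it and sort it during the shuffle.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;

public class CityYear implements WritableComparable<CityYear> {
  private String city = "";
  private int year;

  @Override
  public void write(DataOutput out) throws IOException {   // serialization
    out.writeUTF(city);
    out.writeInt(year);
  }

  @Override
  public void readFields(DataInput in) throws IOException { // deserialization
    city = in.readUTF();
    year = in.readInt();
  }

  @Override
  public int compareTo(CityYear other) {                    // sort order for keys
    int c = city.compareTo(other.city);
    return (c != 0) ? c : Integer.compare(year, other.year);
  }

  @Override
  public int hashCode() {                                    // used by the default hash partitioner
    return city.hashCode() * 31 + year;
  }
}
```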

Q56. Mention the three modes in which Hadoop can be run.

Ans: The three modes in which Hadoop can be run are:

Pseudo-distributed mode

Standalone (local) mode

Fully distributed mode

Q57. Mention what the text input format does.

Ans: The text input format creates a line object, which is a hexadecimal number. The value is considered the whole line of text, while the key is considered the line object. The mapper receives the value as a 'Text' parameter and the key as a 'LongWritable' parameter.

Q58. Mention how many InputSplits are made by the Hadoop framework (for files of 64 KB, 65 MB and 127 MB with a 64 MB block size).

Ans: Hadoop will make 5 splits:

1 split for the 64 KB file

2 splits for the 65 MB file

2 splits for the 127 MB file

Q59. Mention what distributed cache is in Hadoop.

Ans: Distributed cache in Hadoop is a facility provided by the MapReduce framework. It is used to cache files at the time the job executes. The framework copies the necessary files to the slave node before the execution of any task on that node.

Q60. Explain how the Hadoop classpath plays a vital role in stopping or starting Hadoop daemons.

Ans: The classpath consists of a list of directories containing the JAR files needed to stop or start the daemons.

Q61. Compare MapReduce and Spark.

Ans:

Criteria    |    MapReduce    |    Spark
Processing speed    |    Good    |    Exceptional
Standalone mode    |    Needs Hadoop    |    Can work independently
Ease of use    |    Needs extensive Java programs    |    APIs for Python, Java, & Scala
Versatility    |    Not optimized for real-time & machine learning applications    |    Suited to real-time & machine learning applications

Q62. Can a MapReduce program be written in any language other than Java?

Ans: Yes, MapReduce can be written in many programming languages: Java, R, C++, and scripting languages (Python, PHP). Any language able to read from stdin, write to stdout, and parse tab and newline characters should work. Hadoop Streaming (a Hadoop utility) allows you to create and run Map/Reduce jobs with any executable or script as the mapper and/or the reducer.

Q63. Illustrate a simple example of the working of MapReduce.

Ans: Let's take a simple example to understand how MapReduce functions. In real-time projects and applications this will be far more elaborate and complex, as the data we deal with in Hadoop and MapReduce is vast and massive.

Assume you have five files, and each file contains two columns that together form a key/value pair in each record: a city name and the temperature recorded for it. Here, the name of the city is the key and the temperature is the value.

San Francisco, 22

Los Angeles, 15

Vancouver, 30

London, 25

Los Angeles, 16

Vancouver, 28

London, 12

It is important to note that each file may contain the data for the same city multiple times. Now, out of this data, we want to calculate the maximum temperature for each city across these five files. As explained, the MapReduce framework will divide this into five map tasks, and each map task will work on one of the five files and return the maximum temperature for each city in that file.

(San Francisco, 22)(Los Angeles, 16)(Vancouver, 30)(London, 25)

Similarly, each of the remaining mappers performs this for the other four files and produces intermediate results, for example like those below.

(San Francisco, 32)(Los Angeles, 2)(Vancouver, 8)(London, 27)

(San Francisco, 29)(Los Angeles, 19)(Vancouver, 28)(London, 12)

(San Francisco, 18)(Los Angeles, 24)(Vancouver, 36)(London, 10)

(San Francisco, 30)(Los Angeles, 11)(Vancouver, 12)(London, 5)

These results are then passed to the reduce task, where the input from all files is combined to output a single value per city. The final result here would be:

(San Francisco, 32)(Los Angeles, 24)(Vancouver, 36)(London, 27)
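A minimal reducer sketch for this example is shown below. It assumes the mapper emits (city, temperature) pairs as Text/IntWritable; the class name is my own.

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MaxTemperatureReducer
    extends Reducer<Text, IntWritable, Text, IntWritable> {

  @Override
  protected void reduce(Text city, Iterable<IntWritable> temperatures, Context context)
      throws IOException, InterruptedException {
    int max = Integer.MIN_VALUE;
    for (IntWritable t : temperatures) {
      max = Math.max(max, t.get());  // keep the highest temperature seen for this city
    }
    context.write(city, new IntWritable(max));
  }
}
```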

Q64. What are the main components of a MapReduce job?

Ans: Main driver class: provides the job configuration parameters.

Mapper class: must extend the org.apache.hadoop.mapreduce.Mapper class and performs execution of the map() method.

Reducer class: must extend the org.apache.hadoop.mapreduce.Reducer class.

Q65. What is Shuffling and Sorting in MapReduce?

Ans: Shuffling and sorting are two major processes operating simultaneously during the working of the mapper and reducer.

The process of transferring data from the mapper to the reducer is shuffling. It is a mandatory operation for reducers to proceed with their jobs, because the shuffled data serves as the input for the reduce tasks.

In MapReduce, the output key-value pairs between the map and reduce stages (after the mapper) are automatically sorted before moving to the reducer. This feature is helpful in programs where you need sorting at some stage. It also saves the programmer's overall time.

Q66. What is the Partitioner and what is its usage?

Ans: The partitioner is yet another important phase that controls the partitioning of the intermediate map output keys using a hash function. The process of partitioning determines to which reducer a key-value pair (of the map output) is sent. The number of partitions is equal to the total number of reduce tasks for the job.

HashPartitioner is the default class available in Hadoop, which implements the following function: int getPartition(K key, V value, int numReduceTasks)

The function returns the partition number using numReduceTasks, the number of configured reducers. A minimal sketch of this behaviour follows.
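The sketch below mirrors what a hash-based partitioner does: the key's hashCode is masked to stay non-negative and taken modulo the number of reduce tasks. It is an illustrative reimplementation, not Hadoop's own class.

```java
import org.apache.hadoop.mapreduce.Partitioner;

public class SimpleHashPartitioner<K, V> extends Partitioner<K, V> {
  @Override
  public int getPartition(K key, V value, int numReduceTasks) {
    // Mask the sign bit so the result is non-negative, then take it modulo
    // the number of reducers to pick a partition in [0, numReduceTasks).
    return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
  }
}
```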

Q67. What is the Identity Mapper and the Chain Mapper?

Ans: The Identity Mapper is the default Mapper class provided by Hadoop. When no other Mapper class is defined, the Identity Mapper is executed. It only writes the input data to the output and does not perform any computations or calculations on the input data.

The class name is org.apache.hadoop.mapred.lib.IdentityMapper.

The Chain Mapper is the implementation of simple Mapper classes chained together within a single map task. In this, the output from the first mapper becomes the input for the second mapper, the second mapper's output becomes the input for the third mapper, and so on until the last mapper.

The class name is org.apache.hadoop.mapreduce.lib.chain.ChainMapper.
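Below is a sketch of chaining two mappers inside a single map task with ChainMapper. UpperCaseMapper and LengthMapper are made-up mappers used only to show that each mapper's output key/value types must feed the next mapper's input types; treat the exact addMapper arguments as an assumption to verify against your Hadoop version.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.chain.ChainMapper;

public class ChainedMapSetup {

  // Mapper 1: (LongWritable, Text) -> (Text, Text), upper-cases each line.
  public static class UpperCaseMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      context.write(new Text(value.toString().toUpperCase()), value);
    }
  }

  // Mapper 2: (Text, Text) -> (Text, IntWritable), emits the line length.
  public static class LengthMapper extends Mapper<Text, Text, Text, IntWritable> {
    @Override
    protected void map(Text key, Text value, Context context)
        throws IOException, InterruptedException {
      context.write(key, new IntWritable(value.getLength()));
    }
  }

  public static void configure(Job job) throws IOException {
    // Output types of the first mapper match the input types of the second.
    ChainMapper.addMapper(job, UpperCaseMapper.class,
        LongWritable.class, Text.class, Text.class, Text.class,
        new Configuration(false));
    ChainMapper.addMapper(job, LengthMapper.class,
        Text.class, Text.class, Text.class, IntWritable.class,
        new Configuration(false));
  }
}
```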

Q68. What main configuration parameters are specified in MapReduce?

Ans: MapReduce programmers need to specify the following configuration parameters to perform the map and reduce jobs:

The input location of the job in HDFS.

The output location of the job in HDFS.

The input and output formats.

The classes containing the map and reduce functions, respectively.

The .jar file for the mapper, reducer and driver classes.

Q69. Name the job control options specified by MapReduce.

Ans: Since this framework supports chained operations wherein the output of one map job serves as the input for another, there is a need for job controls to govern these complex operations.

The various job control options are (see the sketch after this list):

Job.submit(): submits the job to the cluster and returns immediately

Job.waitForCompletion(boolean): submits the job to the cluster and waits for its completion
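A minimal driver fragment contrasting the two calls is shown below; the mapper, reducer and input/output configuration are omitted and would have to be filled in as in the earlier driver sketch.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class JobControlExample {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "example");
    // ... mapper, reducer, input/output configuration as usual ...

    // Option 1: fire-and-forget submission; returns immediately.
    // job.submit();

    // Option 2: submit and block until the job finishes, printing progress to the console.
    boolean success = job.waitForCompletion(true);
    System.exit(success ? 0 : 1);
  }
}
```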

Q70. What is InputFormat in Hadoop?

Ans: Another important feature in MapReduce programming, InputFormat defines the input specifications for a job. It performs the following functions:

Validates the input specification of the job.

Splits the input file(s) into logical instances called InputSplits. Each of these splits is then assigned to an individual Mapper.

Provides the RecordReader implementation used to extract input records from the above instances for further Mapper processing.

Q71. What is the difference between an HDFS block and an InputSplit?

Ans: An HDFS block splits data into physical divisions, while an InputSplit in MapReduce splits input files logically.

While the InputSplit is used to control the number of mappers, the size of the splits is user defined. By contrast, the HDFS block size is fixed at 64 MB, i.e. for 1 GB of data there will be 1 GB / 64 MB = 16 splits/blocks. However, if the input split size is not defined by the user, it takes the default HDFS block size.

Q72. What is TextInputFormat?

Ans: It is the default InputFormat for plain text files in a given job, including input files with a .gz extension. In TextInputFormat, files are broken into lines, where the key is the position (byte offset) in the file and the value refers to the line of text. Programmers can also write their own InputFormat.

The class hierarchy is:

java.lang.Object

org.apache.hadoop.mapreduce.InputFormat<K,V>

org.apache.hadoop.mapreduce.lib.input.FileInputFormat<LongWritable,Text>

org.apache.hadoop.mapreduce.lib.input.TextInputFormat

Q73. Explain job scheduling through the JobTracker.

Ans: The JobTracker communicates with the NameNode to identify the data location and submits the work to the TaskTracker node. The TaskTracker plays a major role, as it notifies the JobTracker of any job failure; this is effectively the heartbeat reporter reassuring the JobTracker that it is still alive. Later, the JobTracker is responsible for the follow-up actions: it can either resubmit the job, mark a specific record as unreliable, or blacklist it.

Q74. What is SequenceFileInputFormat?

Ans: It is a compressed binary input format used to read sequence files; it extends FileInputFormat. It passes data between the output-input stages of MapReduce jobs (from the output of one MapReduce job to the input of another).

Q75. How do you set mappers and reducers for Hadoop jobs?

Ans: Users can configure the JobConf variable to set the number of mappers and reducers (a minimal sketch follows):

job.setNumMapTasks()

job.setNumReduceTasks()
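A minimal sketch using the classic JobConf API named above. Note that setNumMapTasks() is only a hint to the framework (the actual number of mappers follows the input splits), while setNumReduceTasks() is honored directly; the class name and numbers are placeholders.

```java
import org.apache.hadoop.mapred.JobConf;

public class TaskCountSetup {
  public static JobConf configure() {
    JobConf conf = new JobConf(TaskCountSetup.class);
    conf.setNumMapTasks(10);    // hint for the number of map tasks
    conf.setNumReduceTasks(4);  // actual number of reduce tasks
    return conf;
  }
}
```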

 

Q76. Explain JobConf in MapReduce.

Ans: It is the primary interface for defining a map-reduce job in Hadoop for job execution. JobConf specifies the Mapper, Combiner, Partitioner, Reducer, InputFormat and OutputFormat implementations, as well as other advanced job facets like comparators.

 

Q77. What is a MapReduce Combiner?

Ans: Also known as a semi-reducer, a Combiner is an optional class that combines the map output records that share the same key. The main function of a combiner is to accept inputs from the Map class and pass those key-value pairs on to the Reducer class.

Q78. What is the RecordReader in MapReduce?

Ans: The RecordReader is used to read key/value pairs from the InputSplit by converting the byte-oriented view and presenting a record-oriented view to the Mapper.

Q79. Define the Writable data types in MapReduce.

Ans: Hadoop reads and writes data in a serialized form through the Writable interface. The Writable interface has several implementing classes like Text (for storing String data), IntWritable, LongWritable, FloatWritable and BooleanWritable. Users are free to define their own Writable classes as well.

Q80. What is OutputCommitter?

Ans: OutputCommitter describes the commit of a MapReduce task. FileOutputCommitter is the default class available for OutputCommitter in MapReduce. It performs the following operations:

Creates a temporary output directory for the job during initialization.

Then, it cleans up the job, i.e. removes the temporary output directory after job completion.

Sets up the task's temporary output.

Identifies whether a task needs a commit. The commit is applied if required.

JobSetup, JobCleanup and TaskCleanup are important tasks during the output commit.

Q81. What is a "map" in Hadoop?

Ans: In Hadoop, a map is a phase in HDFS query solving. A map reads data from an input location and outputs a key-value pair according to the input type.

Q82. What is a "reducer" in Hadoop?

Ans: In Hadoop, a reducer collects the output generated by the mapper, processes it, and creates a final output of its own.

Q83. What are the parameters of mappers and reducers?

Ans: The four parameters for mappers are:

LongWritable (input)

Text (input)

Text (intermediate output)

IntWritable (intermediate output)

The four parameters for reducers are:

Text (intermediate output)

IntWritable (intermediate output)

Text (final output)

IntWritable (final output)

Q84. What are the key differences between Pig and MapReduce?

Ans: Pig is a data flow language; the key focus of Pig is to manage the flow of data from the input source to the output store. As part of managing this data flow it moves data, feeding it to process 1, then taking the output and feeding it to process 2, and so on. Its core features are preventing execution of subsequent stages if a previous stage fails, managing temporary storage of data, and, most importantly, compressing and rearranging processing steps for faster processing. While this could be done for any type of processing task, Pig is written specifically for managing the data flow of MapReduce-type jobs. Most, if not all, jobs in Pig are MapReduce jobs or data movement jobs. Pig also allows custom functions to be added, which can be used for processing; some default ones are ordering, grouping, distinct, count and so on.

MapReduce, on the other hand, is a data processing paradigm; it is a framework for application developers to write code in so that it can easily be scaled to petabytes of data. This creates a separation between the developer who writes the application and the developer who scales it. Not all applications can be migrated to MapReduce, but a good number can, ranging from complex ones like k-means to simple ones like counting uniques in a dataset.

Q85. How do you set which framework will be used to run a MapReduce program?

Ans: Via the mapreduce.framework.name property (a sketch follows this list). It can be set to:

local

classic

yarn
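A minimal sketch of setting this programmatically is shown below; in practice the property is more commonly set in mapred-site.xml, and the class and job names here are placeholders.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class FrameworkSelection {
  public static Job yarnJob() throws Exception {
    Configuration conf = new Configuration();
    conf.set("mapreduce.framework.name", "yarn");  // or "local" / "classic"
    return Job.getInstance(conf, "job running on YARN");
  }
}
```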

Q86. What platform and Java version are required to run Hadoop?

Ans: Java 1.6.x or a higher version is required for Hadoop, preferably from Sun. Linux and Windows are the supported operating systems for Hadoop, but BSD, Mac OS X and Solaris are also known to work.
