Top PIG Interview Questions - Most Asked - Dec 27, 2020

1. Compare Pig and Hive.

Criteria     | Pig                                 | Hive
Language     | Pig Latin                           | SQL-like
Application  | Programming purposes                | Report creation
Operation    | Client side                         | Server side
Data support | Semi-structured                     | Structured
Connectivity | Can be called by other applications | JDBC & BI tool integration

2. Does Pig differ from MapReduce? If yes, how?

Yes. In MapReduce, the group-by operation is performed on the reducer side, while filtering and projection are implemented in the map phase. Pig Latin provides operations similar to those of MapReduce, such as group by, order by, and filter. We can analyze the Pig script and its data flow to catch errors early. Pig Latin is also cheaper to write and maintain than MapReduce Java code.

3. Explain the use of MapReduce in Pig.

Apache Pig programs are written in Pig Latin, a query language similar to SQL. To execute these queries, an execution engine is required. The Pig engine converts the queries into MapReduce jobs, so MapReduce acts as the execution engine and runs the programs as required.

Pig's operators use the Hadoop API; depending on the configuration, the job is executed in local mode or on a Hadoop cluster. Pig never passes any output to Hadoop; instead, it sets the inputs and data locations for MapReduce.

Pig Latin provides a set of standard data-processing operations, such as join, filter, group by, order by, union, and so on, which are mapped onto MapReduce tasks. A Pig Latin script describes a directed acyclic graph (DAG), where the edges are data flows and the nodes are operators that process the data.

4. Explain the uses of Pig.

We can use Pig in three categories:

ETL data pipeline: It helps populate our data warehouse. Pig can pipeline the data to an external application and wait until it finishes, so that it receives the processed data and continues from there. This is the most common use case for Pig.

Research on raw data.

Iterative processing.

5. Name the scalar data types and complex data types in Pig.

The scalar data types in Pig are int, float, double, long, chararray, and bytearray.

The complex data types in Pig are map, tuple, and bag.

Map: A set of key-value pairs in which the key is of type chararray and the value can be any Pig data type, including a complex type.

Example: ['city'#'bang','pin'#560001]

Here, city and pin are the keys, which map to values.

Tuple: An ordered collection of fields with a fixed length. Each field can be of any data type.

Bag: An unordered collection of tuples. The tuples in a bag are separated by commas.

Example: {('Bangalore', 560001),('Mysore',570001),('Mumbai',400001)}
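
For readers coming from a general-purpose language, the three complex types can be loosely modelled in Python. This is only an analogy and says nothing about how Pig stores data: a dict stands in for a map, a tuple for a tuple, and a list of tuples for a bag (a list only approximates a bag, since a bag has no meaningful order).

```python
# Rough Python analogies for Pig's complex types (illustrative only).

# Map: chararray keys mapped to values of any type.
city_map = {"city": "bang", "pin": 560001}

# Tuple: an ordered, fixed-length collection of fields.
city_tuple = ("Bangalore", 560001)

# Bag: a collection of tuples; order carries no meaning, duplicates are allowed.
city_bag = [("Bangalore", 560001), ("Mysore", 570001), ("Mumbai", 400001)]

# Fields of a tuple are addressed by position, like $0 and $1 in Pig.
first_field = city_tuple[0]

# Map values are looked up by key, like city_map#'pin' in Pig.
pin = city_map["pin"]
```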

6. Explain the use of the 'filter', 'group', 'order by', and 'distinct' keywords in Pig scripts.

Filter: Filter has the same functionality as the WHERE clause in SQL. A filter contains a predicate; if it evaluates to true for a given record, that record is passed down the pipeline. Otherwise, it is not. Predicates can contain various operators such as ==, >=, <=, and !=. Of these, == and != can also be applied to maps and tuples.

A = load 'inputs' as (name, address);
B = filter A by name matches 'CM.*';

Group by: The group statement collects together records with the same key. In SQL, the GROUP BY clause creates a group that must feed directly into one or more aggregate functions. In Pig Latin, there is no direct connection between group and aggregate functions.

input2 = load 'daily' as (exchanges, stocks);
grpds = group input2 by stocks;

Order by: The order statement sorts the data, producing a total order of the output data. The order syntax is similar to that of group: provide a key or set of keys to order your data by. For example:

input2 = load 'daily' as (exchanges, stocks);
grpds = order input2 by exchanges;

Distinct: The distinct statement is very simple to understand and use. It removes duplicate records, and the original data is left intact. It works only on entire records, not on individual fields. For example:

input2 = load 'daily' as (exchanges, stocks);
grpds = distinct input2;
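
The behaviour of these four operators can be sketched in plain Python. This is a model of the semantics described above, not of how Pig executes them; the sample records and the helper names (`pig_filter`, `pig_group`, `pig_order`, `pig_distinct`) are made up for illustration.

```python
from collections import defaultdict
import re

# Hypothetical sample records: (exchanges, stocks).
daily = [("NYSE", "CMA"), ("NASDAQ", "AAPL"), ("NYSE", "IBM"), ("NYSE", "CMA")]

def pig_filter(records, predicate):
    """FILTER: keep only the records for which the predicate is true."""
    return [r for r in records if predicate(r)]

def pig_group(records, key_index):
    """GROUP: collect records that share a key into one bag per key."""
    groups = defaultdict(list)
    for r in records:
        groups[r[key_index]].append(r)
    return dict(groups)

def pig_order(records, key_index):
    """ORDER: sort all records by the given key."""
    return sorted(records, key=lambda r: r[key_index])

def pig_distinct(records):
    """DISTINCT: drop duplicate whole records; the input is left untouched.

    The result is sorted only to make this sketch deterministic.
    """
    return sorted(set(records))

# Pig's 'matches' must match the whole string, hence fullmatch.
starts_with_cm = pig_filter(daily, lambda r: re.fullmatch(r"CM.*", r[1]) is not None)
by_stock = pig_group(daily, 1)
ordered = pig_order(daily, 0)
unique = pig_distinct(daily)
```

Note that `pig_group` returns the grouped records themselves, with no aggregate applied, which mirrors the point that group and aggregate functions are separate steps in Pig Latin.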

7. Explain the LOAD keyword in Pig scripts.

LOAD helps load data from the file system. It is a relational operator.

The first step in a data flow language is to specify the input, which is done using the 'load' keyword.

Syntax: LOAD 'mydata' [USING function] [AS schema];
Example: A = LOAD 'intellipaat.txt';
A = LOAD 'intellipaat.txt' USING PigStorage('\t');

8. What are the relational operations in Pig? Explain any two with examples.

The relational operations in Pig:

foreach, order by, filter, group, distinct, join, limit.

foreach: It takes a set of expressions and applies them to every record in the data pipeline, passing the results on to the next operator.

A = LOAD 'input' AS (emp_name:chararray, emp_id:long, emp_add:chararray, phone:chararray, preferences:map[]);
B = FOREACH A GENERATE emp_name, emp_id;

Filter: It contains a predicate and lets us select which records will be retained in our data pipeline.

Syntax: alias = FILTER alias BY expression;

Here, alias is the name of the relation, BY is a required keyword, and the expression must evaluate to a Boolean.

Example: M = FILTER N BY F5 == 4;

9. Does Pig support multi-line commands?

Yes, Pig supports both single-line and multi-line commands. With a single-line command, Pig executes the data but does not store it in the file system; with multi-line commands, it stores the data into '/output', so the data can be stored in HDFS.

10. Explain the different execution modes available in Pig.

Pig has three execution modes:

Interactive mode or Grunt mode.

Batch mode or Script mode.

Embedded mode.

Interactive mode or Grunt mode: Pig's interactive shell is known as the Grunt shell. It starts when no file is specified for Pig to run.

grunt> run scriptfile.pig
grunt> exec scriptfile.pig

Batch mode or Script mode: Pig executes the commands specified in the script file.

Embedded mode: We can embed Pig programs in Java and run them from Java.

11. What are the exception handling operators in Pig scripts?

The following diagnostic operators are used for handling exceptions in Pig scripts.

  • DUMP: Displays the results on screen.
  • DESCRIBE: Displays the schema of a particular relation.
  • ILLUSTRATE: Displays the step-by-step execution of a sequence of Pig statements.
  • EXPLAIN: Displays the execution plan for Pig Latin statements.

12. Differentiate between the physical plan and logical plan in Pig scripts.

Both plans are created when a Pig script is executed.

Physical plan: It describes the physical operators Pig will use to execute the script, without referring to how they will execute in MapReduce; the series of MapReduce jobs is derived from it. It includes three physical operators: Local Rearrange, Global Rearrange, and Package. Loading and storing functions are resolved in the physical plan.

Example: A: Load(/emp:PigStorage(' '))

Logical plan: The logical plan is a plan created for each line in the Pig script. It is produced after semantic checking and basic parsing. With each line, the logical plan for the program grows, because every statement has its own logical plan. Loading and storing functions are not resolved in the logical plan.

Example: X: (Name: LOLoad schema: emp_id#36:bytearray,emp_name#37:bytearray,city#38:bytearray,salary#39:bytearray)Required Fields:null

13. Is Pig script case sensitive?

Pig script is both case sensitive and case insensitive. The names of user-defined functions, fields, and relations are case sensitive; i.e., INTELLIPAAT is not the same as intellipaat, and M = load 'test' is not the same as m = load 'test'. Pig script keywords, on the other hand, are case insensitive; i.e., LOAD is the same as load.

14. Highlight the difference between the group and cogroup operators in Pig.

Both operators can work with one or more relations, and they are functionally identical; by convention, group is used with a single relation and cogroup with two or more. The group operator collects all records with the same key. Cogroup is a combination of group and join. It is a generalization of group: instead of collecting the records of one input based on a key, it collects the records of n inputs based on a key. We can cogroup up to 127 relations at a time.
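
The difference in output shape is easier to see in a small model. The Python sketch below is only an approximation of the semantics described above; the function names and sample data are hypothetical. Grouping one input yields one bag per key, while cogrouping n inputs yields n bags per key, one per input, with an empty bag where an input has no record for that key.

```python
from collections import defaultdict

def group(records, key_index):
    """Model of GROUP: one bag of records per key, from a single input."""
    out = defaultdict(list)
    for r in records:
        out[r[key_index]].append(r)
    return dict(out)

def cogroup(inputs, key_index=0):
    """Model of COGROUP: per key, one bag per input (empty if no match)."""
    keys = {r[key_index] for records in inputs for r in records}
    return {
        k: tuple([r for r in records if r[key_index] == k] for records in inputs)
        for k in keys
    }

# Hypothetical inputs, both keyed on the first field.
owners = [("alice", "cat"), ("bob", "dog"), ("alice", "fish")]
friends = [("alice", "carol"), ("dave", "erin")]

grouped = group(owners, 0)
cogrouped = cogroup([owners, friends])
```

Unlike a join, the cogroup model keeps keys that appear in only one input, such as "bob" and "dave" here.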

15. What is the function of the UNION and SPLIT operators? Give examples.

The UNION operator merges the contents of two or more relations.

Syntax: grunt> Relation_name3 = UNION Relation_name1, Relation_name2;
Example: grunt> INTELLIPAAT = UNION intellipaat_data1, intellipaat_data2;

The SPLIT operator divides the contents of a relation into two or more relations.

Syntax: grunt> SPLIT Relation1_name INTO Relation2_name IF (condition1), Relation3_name IF (condition2);
Example: SPLIT student_details into student_details1 if marks<35, student_details2 if (8590);
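
As a rough model of the two operators, the Python sketch below treats a relation as a list of records. The data, the helper names, and the thresholds are assumptions made for illustration. In this model, union concatenates its inputs and keeps duplicates, and a split record goes to every output whose condition it satisfies, so it can land in several outputs or in none.

```python
def union(*relations):
    """Model of UNION: concatenate relations; duplicates are kept."""
    merged = []
    for relation in relations:
        merged.extend(relation)
    return merged

def split(relation, *conditions):
    """Model of SPLIT: one output relation per condition.

    A record goes to every output whose condition it satisfies, so it may
    appear in several outputs or in none of them.
    """
    return [[r for r in relation if cond(r)] for cond in conditions]

# Hypothetical student records: (name, marks).
batch1 = [("asha", 30), ("ravi", 88)]
batch2 = [("meena", 92), ("john", 50)]

students = union(batch1, batch2)

# The thresholds below are assumptions chosen for illustration.
failed, distinction = split(students, lambda r: r[1] < 35, lambda r: r[1] > 85)
```

Here ("john", 50) satisfies neither condition and so appears in neither output. The list order in this sketch is incidental; Pig does not guarantee the order of records after a union.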

16. How can we see only the top 15 records from student.txt out of 100 records in the HDFS directory?

We first load student.txt into a relation named student. We can then see the top 15 records using the limit operator:

student = load 'student.txt';
Result = limit student 15;

17. What is the use of BloomMapFile?

It is a class that extends MapFile, so its functionality is similar to MapFile. It is used in the HBase table format. BloomMapFile uses dynamic Bloom filters to provide a quick membership test for the keys.
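
To show what a membership test with a Bloom filter means, here is a minimal sketch in Python. It is not Hadoop's implementation: it uses a fixed-size bit array rather than a dynamic filter, and the class name, sizes, and hashing scheme are assumptions made for illustration. A key that was added is always reported as possibly present; a key that was never added is usually reported as absent, though false positives can occur.

```python
import hashlib

class BloomFilter:
    """A small fixed-size Bloom filter (illustrative, not Hadoop's code)."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = [False] * size_bits

    def _positions(self, key):
        # Derive num_hashes bit positions by salting a single hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))

bloom = BloomFilter()
for key in ["row-001", "row-002", "row-003"]:
    bloom.add(key)
```

The benefit for a keyed file is that a lookup for a missing key can often be rejected by the filter alone, without searching the file.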

18. How does the Pig platform handle relational systems data?

There are two ways Pig can work with relational datasets.

Load relational data directly into the Hadoop system, where Pig can access it.

Using database connectors, Pig can load data directly from a relational database system, and we can access it.

19. What are the drawbacks of Pig?

Some of the drawbacks of Pig are:

  • Pig is not a good choice for real-time use cases.
  • Pig is not useful when you need to fetch a single record from a huge dataset.
  • Since it works on MapReduce, it works in batches.

20. Mention the common features of Pig and Hive.

The common features of both Hive and Pig are:

  • Internally, both convert their commands into MapReduce.
  • Both technologies provide high-level abstractions.
  • Both do not support low-latency queries.
  • Both do not support OLAP or OLTP.

21. Differentiate between Pig Latin and Pig Engine.

Pig Latin is a scripting language, like Perl, for searching huge data sets. It is made up of a series of transformations and operations that are applied to the input data to produce output data.

Pig engine is an environment for executing Pig Latin programs. It converts Pig Latin operators into a series of MapReduce jobs.

22. Explain the terms in the syntax below.

EXPLAIN [-script pigscript] [-out path] [-brief] [-dot] [-param param_name = param_value] [-param_file file_name] alias;

  • script: Used to specify a Pig script.
  • out: Used to specify the output path (directory).
  • brief: Does not expand nested plans.
  • dot: Outputs a plan that can be passed to the dot utility for graphical display; it produces a directed acyclic graph (DAG) of the plans in any supported format (.gif, .jpg …).
  • alias: Name of a relation.
  • param param_name = param_value: used to see

23. What are the stats classes in the org.apache.pig.tools.pigstats package?

The stats classes in the package are:

  • PigStats 
  • JobStats 
  • OutputStats 
  • InputStats.



