Hive Interview Questions and Answers
Q1. What is Hive?
Ans: Hive is an ETL and data warehousing tool built on top of the Hadoop Distributed File System (HDFS). It is a data warehouse framework for querying and analyzing data stored in HDFS. Hive is open-source software that lets programmers analyze large data sets on Hadoop.
Q2. When should you use Hive?
Hive is useful when building data warehouse applications
When you are dealing with static data rather than dynamic data
When the application can tolerate high latency (long response times)
When a large data set is maintained
When you are using queries rather than scripting
Q3. What are the different modes of Hive?
Ans: Depending on the size of the data nodes in Hadoop, Hive can operate in two modes.
These modes are:
Local mode
MapReduce mode
Q4. When should you use MapReduce mode?
Ans: MapReduce mode is used when:
Hive has to work on large data sets and the query is going to execute in parallel.
Hadoop has multiple data nodes and the data is distributed across different nodes.
Large data sets need to be processed with better performance.
Q5. What are the key components of the Hive architecture?
Ans: The key components of the Hive architecture include:
User Interface
Compiler
Metastore
Driver
Execution Engine
Q6. What are the different types of tables available in Hive?
Ans: There are two types of tables available in Hive:
Managed table: In a managed table, both the data and the schema are under the control of Hive.
External table: In an external table, only the schema is under the control of Hive.
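The difference shows up most clearly in the DDL. The following is a minimal sketch; the table names, columns, and the HDFS path /data/sales are illustrative assumptions, not taken from the text above.

```sql
-- Managed table: Hive owns the data. DROP TABLE removes metadata and data files.
CREATE TABLE sales_managed (
  id INT,
  amount DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- External table: Hive owns only the schema. DROP TABLE removes the metadata,
-- while the files under LOCATION are left in place.
CREATE EXTERNAL TABLE sales_external (
  id INT,
  amount DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/sales';  -- hypothetical path
```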
Q7. What is the Metastore in Hive?
Ans: The Metastore is a central repository in Hive. It stores schema information (metadata) in an external database.
Q8. What does Hive consist of?
Ans: Hive consists of three main parts:
Hive Clients
Hive Services
Hive Storage and Computing
Q9. What types of database does Hive support?
Ans: For single-user metadata storage, Hive uses the Derby database; for multi-user or shared metadata, Hive uses MySQL.
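Q10. What are Hive's default read and write classes?
Ans: Hive's default read and write classes are:
TextInputFormat / HiveIgnoreKeyTextOutputFormat
SequenceFileInputFormat / SequenceFileOutputFormat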
Q11. What are the different modes of Hive?
Ans: The modes of Hive depend on the size of the data nodes in Hadoop.
These modes are:
Local mode
MapReduce mode
Q12. Why is Hive not suitable for OLTP systems?
Ans: Hive is not suitable for OLTP systems because it is designed for batch-oriented analysis and does not, by default, offer row-level insert and update operations.
Q13. Differentiate between Hive and HBase
Hive | HBase
Supports most SQL queries | Does not support SQL queries
Does not support record-level insert, update, and delete operations on a table | Supports record-level insert, update, and delete operations
It is a data warehouse framework | It is a NoSQL database
Hive runs on top of MapReduce | HBase runs on top of HDFS
Q14. What is a Hive variable? What do we use it for?
Ans: A Hive variable is created in the Hive environment and can be referenced by Hive scripts. It is used to pass values to Hive queries when the query starts executing.
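A short sketch of defining and referencing a variable in the hivevar namespace. The employee table, its columns, and the variable name min_salary are assumptions made for illustration.

```sql
-- Define a variable for the current session.
SET hivevar:min_salary=50000;

-- Hive substitutes the value into the query text before it is compiled.
SELECT name, salary
FROM employee
WHERE salary > ${hivevar:min_salary};
```

The same variable can also be supplied when a script is launched, for example with the --hivevar command-line option, so that the script itself stays unchanged.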
Q15. What is the ObjectInspector functionality in Hive?
Ans: The ObjectInspector functionality in Hive is used to analyze the internal structure of columns, rows, and complex objects. It provides access to the internal fields inside those objects.
Q16. What is HiveServer2 (HS2)?
Ans: It is a server interface that performs the following functions:
It allows remote clients to execute queries against Hive
It retrieves the results of those queries
Its current version, based on Thrift RPC, includes some advanced features:
Multi-client concurrency
Authentication
Q17. What does the Hive query processor do?
Ans: The Hive query processor converts a query into a graph of MapReduce jobs, together with the execution-time framework, so that the jobs can be executed in the order of their dependencies.
Q18. What are the components of a Hive query processor?
Ans: The components of a Hive query processor include:
Logical Plan Generation
Physical Plan Generation
UDFs and UDAFs
Q19. What are partitions in Hive?
Ans: Hive organizes tables into partitions.
Partitioning is a way of dividing a table into related parts based on the values of partition keys.
Partitioning is useful when the table has one or more partition keys.
Partition keys are the basic elements that determine how the data is stored in the table.
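As a rough illustration, a partitioned table might be declared and queried as follows. The table page_views, its columns, and the partition values are hypothetical.

```sql
-- Partition keys are declared separately from the regular columns.
-- Each distinct (view_date, country) pair is stored in its own directory.
CREATE TABLE page_views (
  user_id BIGINT,
  url STRING
)
PARTITIONED BY (view_date STRING, country STRING);

-- Register one partition explicitly.
ALTER TABLE page_views ADD PARTITION (view_date='2015-01-01', country='US');

-- Filtering on the partition keys lets Hive read only the matching directory.
SELECT count(*)
FROM page_views
WHERE view_date = '2015-01-01' AND country = 'US';
```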
Q20. When should you choose an "Internal Table" and when an "External Table" in Hive?
Ans: In Hive you can choose an internal table:
If the data to be processed is available in the local file system.
If we want Hive to manage the complete lifecycle of the data, including its deletion.
You can choose an external table:
If the data to be processed is available in HDFS.
If the files are also being used outside of Hive.
Q21. Can a view have the same name as a Hive table?
Ans: No. The name of a view must be unique compared to all other tables and views present in the same database.
Q22. What are views in Hive?
Ans: In Hive, views are similar to tables. They are generated based on requirements.
We can save the definition of any result set as a view in Hive.
Usage is similar to that of views in SQL.
Views are read-only: they can be queried like tables, but they cannot be the target of LOAD, INSERT, or ALTER operations.
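A minimal sketch of creating, querying, and dropping a view. The employee table, its columns, and the view name high_paid are assumed for the example.

```sql
-- The view stores only the query definition, not the result rows.
CREATE VIEW high_paid AS
SELECT name, salary
FROM employee
WHERE salary > 100000;

-- A view is queried like a table.
SELECT name FROM high_paid;

-- Dropping the view leaves the underlying table untouched.
DROP VIEW high_paid;
```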
Q23. How does Hive deserialize and serialize data?
Ans: When reading, Hive first uses the InputFormat, whose RecordReader reads records from the file. The SerDe's deserializer then turns each record into a row object, and the ObjectInspector is used to access the individual fields of that row. When writing, the row is serialized by the SerDe and written out as a record through the OutputFormat.
Q24. What are buckets in Hive?
The data in a partition can be divided further into buckets.
The division is based on the hash of particular columns selected in the table.
Q25. How can you enable buckets in Hive?
Ans: In Hive, you can enable buckets by using the following command:
SET hive.enforce.bucketing=true;
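For context, a sketch of how a bucketed table might be defined and populated. The tables users_bucketed and users_staging, the column names, and the bucket count of 8 are all assumptions. The hive.enforce.bucketing property applies to Hive releases before 2.0; later releases always enforce bucketing.

```sql
-- Ask Hive to honor the bucket definition when writing (pre-2.0 releases).
SET hive.enforce.bucketing=true;

-- Rows are assigned to one of 8 buckets by the hash of user_id.
CREATE TABLE users_bucketed (
  user_id BIGINT,
  name STRING
)
CLUSTERED BY (user_id) INTO 8 BUCKETS;

-- Populate from a hypothetical staging table; Hive writes one file per bucket.
INSERT OVERWRITE TABLE users_bucketed
SELECT user_id, name FROM users_staging;

-- Sample a single bucket instead of scanning the whole table.
SELECT * FROM users_bucketed TABLESAMPLE (BUCKET 1 OUT OF 8 ON user_id);
```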
Q26. Can you override Hadoop MapReduce configuration in Hive?
Ans: Yes, you can override Hadoop MapReduce configuration in Hive.
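One way to do this is with the SET command inside a session. The sketch below uses Hadoop 2 property names; the specific values are arbitrary examples rather than recommendations.

```sql
-- Override the number of reducers for jobs launched from this session.
SET mapreduce.job.reduces=4;

-- Override the memory requested for each map task, in megabytes.
SET mapreduce.map.memory.mb=2048;

-- Print the value currently in effect.
SET mapreduce.job.reduces;
```

Overrides made this way last only for the current session and do not change the cluster's configuration files.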
Q27. How can you change a column's data type in Hive?
Ans: You can change a column's data type in Hive by using the command:
ALTER TABLE table_name CHANGE column_name column_name new_datatype;
Q28. What is the difference between ORDER BY and SORT BY in Hive?
SORT BY sorts the data within each reducer. You can use any number of reducers for a SORT BY operation.
ORDER BY sorts all of the data together, which has to pass through one reducer. Thus, ORDER BY in Hive uses a single reducer.
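The two clauses can be compared side by side. This sketch assumes a hypothetical employee table with name, dept, and salary columns.

```sql
-- Total order over the whole result: all rows pass through a single reducer.
SELECT name, salary FROM employee ORDER BY salary DESC;

-- Each reducer sorts only its own rows; the combined output is not globally ordered.
SELECT name, salary FROM employee SORT BY salary DESC;

-- DISTRIBUTE BY sends all rows of a department to the same reducer,
-- so every department ends up sorted by salary within its reducer's output.
SELECT name, dept, salary
FROM employee
DISTRIBUTE BY dept
SORT BY dept, salary DESC;
```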
Q29. When do you use explode in Hive?
Ans: Hadoop developers sometimes take an array as input and need to convert it into separate table rows. Hive uses explode to convert complex data types into the desired table format.
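A small sketch of explode on an array column. The orders table and its columns are illustrative assumptions.

```sql
-- A table with an array column.
CREATE TABLE orders (
  order_id INT,
  items ARRAY<STRING>
);

-- explode() alone emits one row per array element.
SELECT explode(items) AS item FROM orders;

-- LATERAL VIEW keeps the other columns next to each exploded element.
SELECT order_id, item
FROM orders
LATERAL VIEW explode(items) exploded AS item;
```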
Q30. How can you stop a partition from being queried?
Ans: You can stop a partition from being queried by using the ENABLE OFFLINE clause with the ALTER TABLE statement.
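A sketch of the statement, assuming a hypothetical table logs partitioned by a single key dt. The OFFLINE clause belongs to older Hive releases and was removed in Hive 2.0, so this applies only to versions that still support it.

```sql
-- Take one partition offline; queries that touch it will fail.
ALTER TABLE logs PARTITION (dt='2015-01-01') ENABLE OFFLINE;

-- Make the partition queryable again.
ALTER TABLE logs PARTITION (dt='2015-01-01') DISABLE OFFLINE;
```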
Q31. Compare Pig and Hive
Criteria | Pig | Hive
Architecture | Procedural data flow language | SQL-like declarative language
Application | Programming purposes | Report creation
Operational field | Client side | Server side
Support for Avro files | Yes | Yes (via the Avro SerDe)
Q32. What is Hive? What is the current version of Hive? Explain ACID transactions in Hive.
Ans: Hive is an open-source data warehouse system. We can use Hive for querying and analyzing large data sets stored in Hadoop files. Its query language is similar to SQL. At the time of writing, the current version of Hive is 0.13.1. Hive supports ACID transactions; ACID stands for Atomicity, Consistency, Isolation, and Durability. ACID transactions are provided at the row level through the INSERT, UPDATE, and DELETE operations (row-level UPDATE and DELETE were introduced in Hive 0.14).
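A rough sketch of a transactional table, assuming Hive 0.14 or later with the server-side transaction components (such as the compactor) already configured. The accounts table and its columns are hypothetical.

```sql
-- Session settings needed for transactions.
SET hive.support.concurrency=true;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

-- In these releases ACID tables must be bucketed, stored as ORC,
-- and flagged as transactional.
CREATE TABLE accounts (
  id INT,
  balance DOUBLE
)
CLUSTERED BY (id) INTO 4 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

-- Row-level operations on the transactional table.
INSERT INTO TABLE accounts VALUES (1, 100.0), (2, 250.0);
UPDATE accounts SET balance = balance + 50 WHERE id = 1;
DELETE FROM accounts WHERE id = 2;
```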
Q33. What is a Hive variable? What do we use it for?
Ans: A Hive variable is created in the Hive environment and is referenced by Hive scripts. It lets us pass values to Hive queries when the query starts executing. Scripts that use variables are run with the source command.
Q34. What kind of data warehouse application is suitable for Hive? What are the types of tables in Hive?
Ans: Hive is not considered a full database. The design rules and constraints of Hadoop and HDFS put restrictions on what Hive can do. Hive is most suitable for data warehouse applications that involve:
Analyzing relatively static data.
Less demanding response times.
No rapid changes in data.
Hive does not provide the fundamental features required for OLTP (Online Transaction Processing). Hive is suitable for data warehouse applications over large data sets.
There are two types of tables in Hive:
Managed table
External table
Q35. Can we change settings within a Hive session? If yes, how?
Ans: Yes, we can change settings within a Hive session by using the SET command. It lets us change Hive job settings for a particular query.
Example: The following command ensures that buckets are populated according to the table definition.
hive> SET hive.enforce.bucketing=true;
We can see the current value of any property by using SET with the property name. SET on its own lists all the properties whose values have been set by Hive.
hive> SET hive.enforce.bucketing;
This list does not include the Hadoop defaults. To include them, use:
hive> SET -v;
It lists all the properties, including the Hadoop defaults in the system.
Q36. Is it possible to add 100 nodes when we already have 100 nodes in Hive? How?
Ans: Yes. Nodes are added to the underlying Hadoop cluster by following the steps below.
Take a new system and create a new username and password.
Install SSH and set up SSH connections with the master node.
Add the SSH public RSA key to the authorized_keys file.
Add the new data node's host name, IP address, and other details to the /etc/hosts and slaves files:
168.1.102 slave3.in slave3
Start the DataNode on the new node.
Log in to the new node, for example with su hadoop or ssh -X email@example.com.
Start HDFS on the newly added slave node by using the following command:
./bin/hadoop-daemon.sh start datanode
Check the output of the jps command on the new node.
Q37. Explain the concatenation function in Hive with an example.
Ans: The CONCAT function joins the input strings. We can specify any number of strings, separated by commas.
With CONCAT, a delimiter such as '-' has to be repeated between every pair of strings. If the delimiter is the same for all of the strings, Hive provides another function, CONCAT_WS, in which the delimiter is specified once, as the first argument.
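The two functions can be compared directly. The strings below are arbitrary, and a SELECT without a FROM clause assumes Hive 0.13 or later.

```sql
-- CONCAT: the delimiter has to be repeated between the parts.
SELECT CONCAT('Hive', '-', 'is', '-', 'a', '-', 'warehouse');
-- Hive-is-a-warehouse

-- CONCAT_WS: the delimiter is given once, as the first argument.
SELECT CONCAT_WS('-', 'Hive', 'is', 'a', 'warehouse');
-- Hive-is-a-warehouse
```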
Q38. Explain the TRIM and REVERSE functions in Hive with examples.
Ans: The TRIM function deletes the spaces surrounding a string.
TRIM(' INTELLIPAAT ');
To remove only the leading spaces, use LTRIM.
To remove only the trailing spaces, use RTRIM.
The REVERSE function reverses the characters in a string.
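A short sketch of the four functions on a sample string. The input values are arbitrary, and a SELECT without a FROM clause assumes Hive 0.13 or later.

```sql
SELECT TRIM('  hive  ');    -- 'hive'    (both sides trimmed)
SELECT LTRIM('  hive  ');   -- 'hive  '  (leading spaces removed)
SELECT RTRIM('  hive  ');   -- '  hive'  (trailing spaces removed)
SELECT REVERSE('hive');     -- 'evih'
```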
Q39. How do you change a column's data type in Hive? Explain RLIKE in Hive.
Ans: We can change the column data type by using ALTER and CHANGE.
The syntax is:
ALTER TABLE table_name CHANGE column_name column_name new_datatype;
Example: Suppose we want to change the data type of the salary column from integer to bigint in the employee table.
ALTER TABLE employee CHANGE salary salary BIGINT;
RLIKE: RLIKE is a special operator in Hive that matches a string against a regular expression. A RLIKE B evaluates to true if any substring of A matches the regular expression B.
'Intellipaat' RLIKE 'tell' → True
'Intellipaat' RLIKE '^I.*' → True (this is a regular expression)
Q40. What are the components used in the Hive query processor?
Ans: The components of a Hive query processor include:
Logical Plan Generation
Physical Plan Generation
UDFs and UDAFs
Q41. What are buckets in Hive?
Ans: The data within a partition is divided further into buckets. The data is split on the basis of the hash of particular table columns.
Q42. Explain how to access subdirectories recursively in Hive queries.
Ans: By using the commands below, we can access subdirectories recursively in Hive:
hive> SET mapred.input.dir.recursive=true;
hive> SET hive.mapred.supports.subdirectories=true;
Hive tables can then be pointed at the higher-level directory, which suits a directory structure such as /data/country/state/city/
Q43. What are the components used in the Hive query processor?
Ans: The components of a Hive query processor include:
Logical Plan Generation
Physical Plan Generation
UDFs and UDAFs
Q44. How do you skip header rows from a table in Hive?
Ans: Log files often begin with header records. Suppose a file starts with three header lines that we do not want to include in our Hive query. To skip them, set the skip.header.line.count table property, which tells Hive how many leading lines of each file to ignore.
-- The column list is missing in the source; the columns below are placeholders.
CREATE EXTERNAL TABLE employee (name STRING, id INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ' STORED AS TEXTFILE
TBLPROPERTIES ("skip.header.line.count"="3");
Q45. What is the maximum size of the string data type supported by Hive? Mention the binary formats Hive supports.
Ans: The maximum size of the string data type supported by Hive is 2 GB.
Hive supports the text file format by default, and it also supports the binary formats: Sequence files, ORC files, Avro data files, and Parquet files.
Sequence files: A general binary format that is splittable, compressible, and row-oriented.
ORC files: ORC stands for Optimized Row Columnar. It is a record-columnar, column-oriented storage format. It divides the table into row splits; within each split, the values of the first column are stored first, followed by the values of each subsequent column.
Avro data files: Like sequence files, they are splittable, compressible, and row-oriented, and they additionally support schema evolution and multilingual bindings.
Q46. What is the priority order of Hive configuration?
Ans: Hive uses the following precedence hierarchy when setting properties, from highest to lowest:
The SET command in Hive
The command-line --hiveconf option
hive-site.xml
hive-default.xml
hadoop-site.xml
hadoop-default.xml
Q47. If you run a select * query in Hive, why does it not run MapReduce?
Ans: The hive.fetch.task.conversion property lets Hive avoid the latency of MapReduce overhead: when executing simple queries such as a SELECT with a filter or a LIMIT, Hive skips MapReduce and fetches the data directly.
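The behavior can be switched per session. A minimal sketch, assuming a hypothetical employee table:

```sql
-- 'more' lets simple SELECT, filter, and LIMIT queries run as a direct fetch.
SET hive.fetch.task.conversion=more;
SELECT * FROM employee LIMIT 10;

-- 'none' disables the conversion, so the same query launches a job.
SET hive.fetch.task.conversion=none;
SELECT * FROM employee LIMIT 10;
```

The property also accepts the value minimal, which restricts the direct fetch to a narrower set of queries.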
Q48. How can Hive improve performance with ORC format tables?
Ans: We can store Hive data in a highly efficient manner in the Optimized Row Columnar (ORC) file format. It overcomes many limitations of other Hive file formats. Using ORC files improves performance when reading, writing, and processing the data.
SET hive.compute.query.using.stats=true;
-- The column list is missing in the source; the two columns below are placeholders.
CREATE TABLE orc_table (
  id INT,
  name STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ':'
LINES TERMINATED BY '\n'
STORED AS ORC;
Q49. Explain the functionality of ObjectInspector.
Ans: It helps to analyze the internal structure of a row object and the individual structure of columns in Hive. It also provides a uniform way to access complex objects that can be stored in multiple formats in memory:
An instance of a Java class
A standard Java object
A lazily initialized object
The ObjectInspector describes the structure of the object and also the ways to access the internal fields inside the object.
Q50. Whenever we run a Hive query, a new metastore_db is created. Why?
Ans: A local metastore is created when we run Hive in embedded mode. Before creating it, Hive checks whether the metastore already exists. The metastore location is defined in the configuration file hive-site.xml by the property "javax.jdo.option.ConnectionURL", whose default value is "jdbc:derby:;databaseName=metastore_db;create=true". Because this is a relative path, a new metastore_db is created in whichever directory Hive is started from. To change this behavior, set the location to an absolute path so that the same metastore is used from any directory.
Q51. How can we access subdirectories recursively?
Ans: By using the commands below, we can access subdirectories recursively in Hive:
hive> SET mapred.input.dir.recursive=true;
hive> SET hive.mapred.supports.subdirectories=true;
Hive tables can then be pointed at the higher-level directory, which suits a directory structure such as /data/country/state/city/
Q52. What are the uses of explode in Hive?
Ans: Hadoop developers take arrays as input and convert them into separate table rows. Hive uses explode to convert complex data types into the desired table format.
Q53. What mechanisms are available for connecting from applications when we run Hive as a server?
Thrift Client: Using Thrift, you can call Hive commands from various programming languages, for example C++, PHP, Java, Python, and Ruby.
JDBC Driver: Hive provides a Type 4 (pure Java) JDBC driver.
ODBC Driver: The ODBC driver supports the ODBC protocol.
Q54. How do we write our own custom SerDe?
Ans: In most cases, end users only want to read their own data format rather than write to it, so they need to write a Deserializer rather than a full SerDe.
Example: The RegexDeserializer deserializes the data using the configuration parameter 'regex' and a list of column names.
If our SerDe supports DDL, we probably want to implement a protocol based on DynamicSerDe instead of writing a SerDe from scratch, because it is non-trivial to write a "thrift DDL" parser.
Q55. Mention the date data type in Hive. Name the Hive collection data types.
Ans: The TIMESTAMP data type stores dates in java.sql.Timestamp format.
There are three collection data types in Hive:
ARRAY
MAP
STRUCT
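A sketch of a table that uses the three collection types, together with the syntax for reading their elements. The table and column names are illustrative assumptions.

```sql
CREATE TABLE employee_profile (
  name    STRING,
  skills  ARRAY<STRING>,                    -- ordered list of values
  phones  MAP<STRING, STRING>,              -- key/value pairs
  address STRUCT<city:STRING, zip:STRING>   -- named fields
);

-- Arrays use a zero-based index, maps use a key, structs use dot notation.
SELECT skills[0], phones['home'], address.city
FROM employee_profile;
```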
Q56. Can we run UNIX shell commands from Hive? Can Hive queries be executed from script files? How? Give an example.
Ans: Yes, we can run UNIX shell commands from Hive by putting a ! mark before the command. For example, !pwd at the hive prompt prints the current directory.
We can execute Hive queries from script files by using the source command.
hive> source /path/to/file/file_with_query.hql