
Top Hive Interview Questions – Most Asked - Dec 27, 2020


Hive, or Apache Hive, is database software that allows you to read, write, and manage large datasets stored in a distributed storage platform using SQL. In this Hive Interview Questions blog, we will cover the questions that are typically asked by recruiters during a Hive job interview. We aim to prepare you for the following Hive interview questions:

Q1. Differentiate between Pig and Hive.

Q2. How to skip header lines from a table in Hive?

Q3. What is a Hive variable? What do we use it for?

Q4. Explain the process to access subdirectories recursively in Hive queries.

Q5. Can we change the settings within a Hive session? If yes, how?

Q6. Is it possible to add 100 nodes when we already have 100 nodes in Hive? If yes, how?

Q7. Explain the concat function in Hive with an example.

Q8. Explain the Trim and Reverse functions in Hive with examples.

Q9. How to change the column data type in Hive? Explain RLIKE in Hive.

Q10. What are the components used in the Hive Query Processor?

1. Differentiate between Pig and Hive.

| Criteria | Apache Pig | Apache Hive |
| Nature | Procedural data flow language | Declarative SQL-like language |
| Application | Used for programming | Used for report creation |
| Used by | Researchers and programmers | Mainly data analysts |
| Operates on | The client side of a cluster | The server side of a cluster |
| Accessing raw data | Not as fast as HiveQL | Faster with built-in features |
| Schema or data type | Always defined in the script itself | Stored in the local database |
| Ease of learning | Takes a little extra time and effort to master | Easy to learn for database experts |

2. How to skip header lines from a table in Hive? 

Imagine that the header records in a table are as follows:

System=…
Version=…
Sub-version=…

Suppose we do not want to include the above three lines of headers in our Hive query. To skip these header lines from our table in Hive, we set a table property:

CREATE EXTERNAL TABLE employee (
name STRING,
job STRING,
dob STRING,
id INT,
salary INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ' STORED AS TEXTFILE
LOCATION '/user/data'
TBLPROPERTIES ("skip.header.line.count"="3");

3. What is a Hive variable? What do we use it for? 

Hive variables are variables created in the Hive environment that can be referenced by Hive scripts. They allow us to pass values to a Hive query when the query starts executing. They are typically used together with the source command.
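
For illustration, here is a minimal sketch of setting a variable in the hivevar namespace and substituting it into a query (the variable and table names are hypothetical):

-- Set a session variable, then reference it with ${hivevar:...} substitution
hive> SET hivevar:tbl=employee;
hive> SELECT * FROM ${hivevar:tbl} LIMIT 10;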

4. Explain the process to access subdirectories recursively in Hive queries.

By using the below commands, we can access subdirectories recursively in Hive:

hive> SET mapred.input.dir.recursive=true;
hive> SET hive.mapred.supports.subdirectories=true;

Hive tables can be pointed to the higher-level directory, which is suitable for a directory structure like:

/data/country/state/city/

5. Can we change the settings within a Hive session? If yes, how?

Yes, we can change the settings within a Hive session using the SET command. It helps change the Hive job settings for a given query. For example, the following command ensures that buckets are populated according to the table definition:

hive> SET hive.enforce.bucketing=true;

We can see the current value of any property by using SET with the property name. SET by itself will list all the properties with the values set by Hive.

hive> SET hive.enforce.bucketing;

hive.enforce.bucketing=true

This list will not include the Hadoop defaults. For those, we should use the below command:

SET -v

It will list all the properties in the system, including the Hadoop defaults.

6. Is it possible to add 100 nodes when we already have 100 nodes in Hive? If yes, how?

Yes, we can add the nodes by following the below steps:

Step 1: Take a new system; create a new username and password.

Step 2: Install SSH and set up SSH connections with the master node.

Step 3: Add the SSH public_rsa id key to the authorized_keys file.

Step 4: Add the new DataNode's hostname, IP address, and other details to the /etc/hosts slaves file:

192.168.1.102 slave3.in slave3

Step 5: Start the DataNode on the new node.

Step 6: Log in to the new node, e.g., su hadoop, or use:

ssh -X hadoop@192.168.1.103

Step 7: Start HDFS on the newly added slave node by using the following command:

./bin/hadoop-daemon.sh start datanode

Step 8: Check the output of the jps command on the new node.

7. Clarify the link work in Hive with a model. 

The link capacity will join the info strings. We can determine 

'n' number of strings isolated by a comma. 

Model: 

CONCAT_WS ('-',’Intellipaat’,’is’,’a’,’eLearning’,‘provider’);

Yield: 

Intellipaat-is-a-eLearning-provider

Without fail, we set the restrictions of the strings by '- '. In the event that it is basic for each string, at that point Hive gives another order: 

CONCAT_WS

For this situation, we need to indicate the set furthest reaches of the administrator first as follows: 

CONCAT_WS ('-',’Intellipaat’,’is’,’a’,’eLearning’,‘provider’);

Yield: 

Intellipaat-is-a-eLearning-provider

Intermediate Interview Questions

8. Explain the Trim and Reverse functions in Hive with examples.

The trim function will remove the spaces associated with a string.

Example:

TRIM(' INTELLIPAAT ');

Output:

INTELLIPAAT

To remove only the leading spaces:

LTRIM(' INTELLIPAAT');

To remove only the trailing spaces:

RTRIM('INTELLIPAAT ');

In the reverse function, the characters of the string are reversed.

Example:

REVERSE('INTELLIPAAT');

Output:

TAAPILLETNI

9. How to change the column data type in Hive? Explain RLIKE in Hive.

We can change the column data type by using ALTER and CHANGE as follows:

ALTER TABLE table_name CHANGE column_name column_name new_datatype;

For instance, if we want to change the data type of the salary column from integer to bigint in the employee table, we can use the following:

ALTER TABLE employee CHANGE salary salary BIGINT;

RLIKE: It is a special function in Hive that matches a string against a Java regular expression. 'A RLIKE B' evaluates to true if any substring of A matches the regular expression B.

Example:

'Intellipaat' RLIKE 'tell' -> True
'Intellipaat' RLIKE '^I.*' -> True (this is a regular expression)

10. What are the components used in the Hive Query Processor?

Following are the components of the Hive Query Processor:

  • Parse and Semantic Analysis (ql/parse)
  • Metadata Layer (ql/metadata)
  • Type Interfaces (ql/typeinfo)
  • Sessions (ql/session)
  • Map/Reduce Execution Engine (ql/exec)
  • Plan Components (ql/plan)
  • Hive Function Framework (ql/udf)
  • Tools (ql/tools)
  • Optimizer (ql/optimizer)

11. What are Buckets in Hive? 

Buckets in Hive are used to divide Hive table data into multiple files or directories. They are used for efficient querying.
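
As an illustration, here is a minimal sketch of a bucketed table definition (the table and column names are hypothetical):

-- Rows are hashed on id into 4 bucket files, which helps sampling and joins
CREATE TABLE employee_bucketed (
id INT,
name STRING)
CLUSTERED BY (id) INTO 4 BUCKETS;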

12. What kind of data warehouse application is suitable for Hive? What are the types of tables in Hive?

Hive is not considered a full database. The design rules and regulations of Hadoop and HDFS put restrictions on what Hive can do. However, Hive is most suitable for data warehouse applications because it:

  • Analyzes relatively static data
  • Does not require fast response times
  • Does not make rapid changes in data

Although Hive does not provide the crucial features required for Online Transaction Processing (OLTP), it is suitable for data warehouse applications on large datasets. There are two types of tables in Hive:

  • Managed tables
  • External tables

13. What is the definition of Hive? What is the current version of Hive? Explain ACID transactions in Hive.

Hive is an open-source data warehouse system. We can use Hive for analyzing and querying large datasets. Its query language is similar to SQL. The current version of Hive is 0.13.1. Hive supports ACID (Atomicity, Consistency, Isolation, and Durability) transactions. ACID transactions are provided at the row level. Following are the operations Hive uses to support ACID transactions (a sketch follows the list):

  • Insert
  • Delete
  • Update
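
Here is a minimal sketch of these operations on a transactional table; the table and values are hypothetical (ACID tables in Hive must be bucketed and stored as ORC):

-- The transactional table property enables ACID insert/update/delete
CREATE TABLE txn_demo (id INT, name STRING)
CLUSTERED BY (id) INTO 2 BUCKETS
STORED AS ORC
TBLPROPERTIES ("transactional"="true");

INSERT INTO txn_demo VALUES (1, 'alpha'), (2, 'beta');
UPDATE txn_demo SET name = 'gamma' WHERE id = 1;
DELETE FROM txn_demo WHERE id = 2;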

14. What is the maximum size of a string data type supported by Hive? Explain how Hive supports binary formats.

The maximum size of a string data type supported by Hive is 2 GB. Hive supports the text file format by default, and it also supports the binary formats: sequence files, ORC files, Avro data files, and Parquet files (a table-creation sketch follows the list).

  • Sequence file: It is a splittable, compressible, and row-oriented file with a general binary format.
  • ORC file: The Optimized Row Columnar (ORC) format file is a record-columnar, column-oriented storage file. It divides the table into row splits, and within each split the values of each column are stored together.
  • Avro data file: Like a sequence file, it is splittable, compressible, and row-oriented, but it additionally supports schema evolution and multilingual binding.
  • Parquet file: In the Parquet format, along with storing rows of data adjacent to each other, we can also store column values adjacent to each other, so that the dataset is partitioned both horizontally and vertically.
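
For illustration, a minimal sketch of selecting one of these storage formats at table-creation time (the table names are hypothetical):

-- The STORED AS clause picks the on-disk file format
CREATE TABLE events_orc (id INT, payload STRING) STORED AS ORC;
CREATE TABLE events_parquet (id INT, payload STRING) STORED AS PARQUET;
CREATE TABLE events_avro (id INT, payload STRING) STORED AS AVRO;
CREATE TABLE events_seq (id INT, payload STRING) STORED AS SEQUENCEFILE;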

15. What is the precedence order of Hive configuration?

We use the following precedence hierarchy for setting properties, from highest to lowest (a command-line sketch follows the list):

  • The SET command in Hive
  • The command-line -hiveconf option
  • hive-site.xml
  • hive-default.xml
  • hadoop-site.xml
  • hadoop-default.xml
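
For illustration, a minimal sketch of the two highest-precedence mechanisms (the property and query are only examples):

# Passed at launch; overrides the XML configuration files
hive --hiveconf hive.exec.parallel=true -e "SELECT 1;"

-- Set inside a session; overrides even the --hiveconf value
hive> SET hive.exec.parallel=false;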

16. If you run a SELECT * query in Hive, why doesn't it run MapReduce?

The hive.fetch.task.conversion property of Hive avoids the latency of MapReduce overhead; in effect, when executing simple queries such as SELECT, FILTER, LIMIT, etc., it skips the MapReduce job.
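
A minimal sketch of this behavior (the table name is hypothetical; 'more' is one of the property's supported values, alongside 'none' and 'minimal'):

-- With fetch-task conversion enabled, this simple scan is served
-- directly as a fetch task, with no MapReduce job launched
hive> SET hive.fetch.task.conversion=more;
hive> SELECT * FROM employee LIMIT 5;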

Advanced Interview Questions

17. How can we improve performance with ORC format tables in Hive?

We can store Hive data in a highly efficient manner in the Optimized Row Columnar (ORC) file format. It can overcome many limitations of the other Hive file formats. Using ORC files improves performance when Hive is reading, writing, and processing data.

SET hive.compute.query.using.stats=true;
SET hive.stats.dbclass=fs;
CREATE TABLE orc_table (
id INT,
name STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ':'
LINES TERMINATED BY '\n'
STORED AS ORC;

18. Explain the functionality of ObjectInspector.

ObjectInspector analyzes the internal structure of a row object and the individual structure of columns in Hive. It also provides a uniform way to access complex objects that can be stored in multiple formats in memory, such as:

  • An instance of a Java class
  • A standard Java object
  • A lazily initialized object

ObjectInspector tells us the structure of the object and also the ways to access the internal fields inside the object.

19. Whenever we run a Hive query, a new metastore_db is created. Why?

A local metastore is created when we run Hive in embedded mode. Before creating it, Hive checks whether a metastore already exists. This metastore property is defined in the configuration file hive-site.xml. The property is:

javax.jdo.option.ConnectionURL

with the default value:

jdbc:derby:;databaseName=metastore_db;create=true

Because the path is relative, a fresh metastore_db is created in whichever directory Hive is started from. Therefore, we have to change this location to an absolute path, so that the same metastore is used from any location.
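
For illustration, a minimal hive-site.xml sketch that pins the metastore to an absolute path (/var/lib/hive is an assumed location):

<!-- An absolute databaseName means every session reuses the same metastore -->
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:derby:;databaseName=/var/lib/hive/metastore_db;create=true</value>
</property>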

20. Differentiate between Hive and HBase.

| Hive | HBase |
| Enables most SQL queries | Does not allow SQL queries |
| Operations do not run in real time | Operations run in real time |
| A data warehouse framework | A NoSQL database |
| Runs on top of MapReduce | Runs on top of HDFS |

21. How can we access subdirectories recursively?

By using the below commands, we can access subdirectories recursively in Hive:

hive> SET mapred.input.dir.recursive=true;
hive> SET hive.mapred.supports.subdirectories=true;

Hive tables can be pointed to the higher-level directory, which is suitable for a directory structure like:

/data/country/state/city/

22. What are the uses of Hive Explode?

Hadoop developers take an array as input and convert it into separate table rows. To convert complex data types into the desired table formats, Hive uses explode.
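
As an illustration, a minimal sketch of exploding an array column into rows (the table and column names are hypothetical):

-- Each element of the subordinates array becomes its own output row
SELECT explode(subordinates) AS subordinate
FROM employees;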

23. What are the available mechanisms for connecting applications when we run Hive as a server?

  • Thrift Client: Using Thrift, we can call Hive commands from various programming languages, such as C++, PHP, Java, Python, and Ruby.
  • JDBC Driver: The JDBC driver enables accessing data with JDBC support, by translating calls from an application into SQL and passing the SQL queries to the Hive engine (a connection sketch follows this list).
  • ODBC Driver: It implements the ODBC API standard for the Hive DBMS, enabling ODBC-compliant applications to connect seamlessly to Hive.
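
For illustration, a minimal sketch of connecting over JDBC with the Beeline shell (the host, port, and database are assumed HiveServer2 defaults):

beeline -u "jdbc:hive2://localhost:10000/default" -e "SHOW TABLES;"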

24. How do we write our own custom SerDe?

Generally, end users prefer writing a Deserializer instead of a full SerDe, as they want to read their own data format rather than write to it; e.g., RegexDeserializer deserializes data with the help of the configuration parameter 'regex' and a list of column names.

If our SerDe supports DDL (i.e., a SerDe with parameterized columns and column types), we will most likely implement a protocol based on DynamicSerDe, rather than writing a SerDe from scratch. This is because the framework passes DDL to the SerDe through the 'Thrift DDL' format, and it is pointless to write a 'Thrift DDL' parser ourselves.

25. Mention the various date types supported by Hive.

The timestamp data type stores dates in the java.sql.Timestamp format.

The three collection data types in Hive are (a usage sketch follows the list):

  • Arrays
  • Maps
  • Structs
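
For illustration, a minimal sketch declaring all three collection types (the table and column names are hypothetical):

-- ARRAY, MAP, and STRUCT columns in a single table definition
CREATE TABLE collections_demo (
tags ARRAY<STRING>,
properties MAP<STRING, INT>,
address STRUCT<street:STRING, city:STRING>
);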

26. Can we run UNIX shell commands from Hive? Can Hive queries be executed from script files? If yes, how? Give an example.

Yes, we can run UNIX shell commands from Hive by putting a '!' mark before the command. For example, !pwd at the Hive prompt will display the current directory.

We can also execute Hive queries from script files using the source command.

Example:

hive> source /path/to/file/file_with_query.hql
