Top 26 Apache Tajo Interview Questions - Jul 25, 2022

Q1. Mention The Salient Features Of Apache Tajo?

Some salient features of Tajo are:

Superior scalability and optimized performance

Low latency

User-defined functions

Row/columnar storage processing framework

Compatibility with HiveQL and Hive MetaStore

Simple data flow and easy maintenance

Q2. How Tables Are Managed In Apache Tajo?

The logical view of a data source is defined as a table. The table consists of various properties such as logical schema, partitions, URL and so on. A Tajo table can be a directory in HDFS, a single file, one HBase table, or an RDBMS table.

The types of tables supported by Apache Tajo are:

External table: An external table needs the location property when the table is created. For example, if the data already exists as Text/JSON files or as an HBase table, it can be registered as a Tajo external table. The following query is an example of external table creation.

create external table sample(col1 int, col2 text, col3 int)

location 'hdfs://path/to/table';

Internal table: An internal table is also called a managed table. It is created in a pre-defined physical location called the tablespace.

create table table1(col1 int, col2 text);

By default, Tajo uses "tajo.warehouse.directory" located in "conf/tajo-site.xml". Tablespace configuration is used to assign a new location for the table.

Q3. What Storage Systems Are Supported By Tajo?

Tajo supports the following storage systems:

HDFS

JDBC

Amazon S3

Apache HBase

Elasticsearch

Q4. What Is Apache Tajo?

Apache Tajo is a relational and distributed data processing framework. It is designed for low-latency and scalable ad-hoc query analysis.

Tajo supports standard SQL and various data formats. Most Tajo queries can be executed without any modification.

Tajo provides fault tolerance through a restart mechanism for failed tasks, and it has an extensible query rewrite engine.

Tajo performs the necessary ETL (Extract, Transform and Load) operations to summarize large datasets stored on HDFS. It is an alternative to Hive/Pig.

Q5. How To Add Column In Apache Tajo?

To add a new column to the "students" table, use the following syntax:

ALTER TABLE <table_name> ADD COLUMN <column_name> <data_type>

alter table students add column grade text;

Q6. What Is Having Clause In Apache Tajo?

The HAVING clause lets you specify conditions that filter which group results appear in the final results. The WHERE clause places conditions on the selected columns, whereas the HAVING clause places conditions on the groups created by the GROUP BY clause.

SELECT column1, column2 FROM table1

     GROUP BY column HAVING [ conditions ]

select age from mytable group by age having sum(mark) > 200;
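The difference between WHERE and HAVING can be sketched with a small runnable example. This is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for a Tajo cluster, and the table contents are made-up sample data. The HAVING query itself matches the one above.

```python
import sqlite3

# In-memory SQLite database, used only as a stand-in for a Tajo table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, age INTEGER, mark INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, 20, 150), (2, 20, 120), (3, 21, 90), (4, 21, 80), (5, 22, 210)],
)

# HAVING filters whole groups after aggregation:
# age 20 sums to 270, age 21 to 170, age 22 to 210.
having_rows = conn.execute(
    "SELECT age FROM mytable GROUP BY age HAVING SUM(mark) > 200 ORDER BY age"
).fetchall()

# WHERE filters individual rows before any grouping happens.
where_rows = conn.execute(
    "SELECT age FROM mytable WHERE mark > 200 ORDER BY age"
).fetchall()

print(having_rows)  # groups whose total mark exceeds 200
print(where_rows)   # rows whose own mark exceeds 200
```

Age 20 passes the HAVING filter even though no single row for that age has a mark above 200, which is exactly what WHERE cannot express.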

Q7. Explain About Tablespace?

The locations in the storage system are defined by a tablespace. Tablespaces are supported only for internal tables and are accessed by their names. Each tablespace can use a different storage type. If no tablespace is specified, Tajo uses the default tablespace in the root directory. Tajo's internal table data can be accessed from another table only. It can be configured with a tablespace.

CREATE TABLE [IF NOT EXISTS] <table_name>

   [(column_list)] [TABLESPACE tablespace_name]

   [using <storage_type> [with (<key> = <value>, ...)]] [AS <select_statement>]

Q8. What Are Apache Tajo Sql Functions?

The SQL functions supported by Apache Tajo fall into the following categories:

Math Functions

String Functions

DateTime Functions

JSON Functions

Q9. Mention Some Basic Tajo Shell Commands?

Start server

$ bin/start-tajo.sh

Start shell

$ bin/tsql

List databases

default> \l

List built-in functions

default> \df

Describe function: \df function_name - This command returns the complete description of the given function.

default> \df sqrt

Quit terminal

default> \q

Cluster info

default> \admin -cluster

Show masters

default> \admin -showmasters

Q10. Explain About Tajo Worker Configuration?

Worker Heap Memory Size: The environment variable TAJO_WORKER_HEAPSIZE in conf/tajo-env.sh lets the Tajo worker use the specified heap memory size. If you want to adjust the heap memory size, set the TAJO_WORKER_HEAPSIZE variable in conf/tajo-env.sh with a proper size as follows:

TAJO_WORKER_HEAPSIZE=8000

The default size is 1000 (1GB).

Temporary Data Directory: TajoWorker stores temporary data on the local file system because it uses out-of-core algorithms. It is possible to specify one or more temporary data directories where temporary data will be stored.

Maximum number of parallel running tasks for each worker: Each worker can execute multiple tasks at a time. Tajo allows users to specify the maximum number of parallel running tasks for each worker.

Q11. Explain The Tajo Architecture?

Client: The client submits SQL statements to the Tajo Master to get the result.

Master: The Master is the main daemon. It is responsible for query planning and is the coordinator for workers.

Catalog server: Maintains the table and index descriptions. It is embedded in the Master daemon. The catalog server uses Apache Derby as the storage layer and connects via a JDBC client.

Worker: The Master node assigns tasks to worker nodes. TajoWorker processes data. As the number of TajoWorkers increases, the processing capacity also increases linearly.

Query Master: The Tajo Master assigns a query to the Query Master. The Query Master is responsible for controlling a distributed execution plan. It launches the TaskRunner and schedules tasks to the TaskRunner. The main role of the Query Master is to monitor the running tasks and report them to the Master node.

Node Managers: Manage the resources of the worker node and decide on allocating requests to the node.

TaskRunner: Acts as a local query execution engine. It is used to run and monitor the query process. The TaskRunner processes one task at a time.

It has the following three main attributes:

Logical plan - An execution block which created the task.

A fragment - an input path, an offset range, and schema.

Fetch URIs

Query Executor: It is used to execute a query.

Storage service: Connects the underlying data storage to Tajo.

Q12. Explain Tajo Configuration Files?

Tajo's configuration is based on Hadoop's configuration system.

Tajo uses two config files:

catalog-site.xml - configuration for the catalog server.

tajo-site.xml - configuration for other Tajo modules. Tajo has a variety of internal configs. If you don't set a config explicitly, its default value is used. Tajo is designed to need only a few configs in usual cases, so you may not need to be concerned with the configuration.

By default, there is no tajo-site.xml in the $TAJO_HOME/conf directory. If you want to set some configs, first copy $TAJO_HOME/conf/tajo-site.xml.template to tajo-site.xml. Then, add the configs to your tajo-site.xml.

Q13. How To Drop Database In Apache Tajo?

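The syntax used to drop a database is:

DROP DATABASE <database_name>

Ex: switch to another database first, then drop the one you no longer need.

test> \c default

default> drop database test;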

Q14. Explain Different Queries Performed By Apache Tajo?

Predicates: A predicate is an expression that evaluates to TRUE, FALSE, or UNKNOWN. Predicates are used in the search condition of WHERE and HAVING clauses, and in other constructs that require a Boolean value.

Explain: EXPLAIN is used to obtain the query execution plan of a statement, including its logical and global plans.

Join: SQL joins are used to combine rows from two or more tables.

The following are the different types of SQL joins:

Inner join

Left, right, and full outer join

Cross join

Self join

Natural join
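How three of these join types differ can be sketched with a small runnable example. It uses Python's built-in sqlite3 module as a stand-in for Tajo, and the table names, column names, and rows are hypothetical sample data. A left outer join is used as the outer-join case because not every SQLite version supports FULL OUTER JOIN.

```python
import sqlite3

# In-memory SQLite database standing in for two Tajo tables (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE marks (student_id INTEGER, mark INTEGER)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?)", [(1, "Adam"), (2, "Amit"), (3, "Bob")]
)
conn.executemany("INSERT INTO marks VALUES (?, ?)", [(1, 95), (2, 80)])

# Inner join: only students that have a matching row in marks.
inner_rows = conn.execute(
    "SELECT s.name, m.mark FROM students s "
    "INNER JOIN marks m ON s.id = m.student_id ORDER BY s.id"
).fetchall()

# Left outer join: every student, with NULL (None) where no mark exists.
left_rows = conn.execute(
    "SELECT s.name, m.mark FROM students s "
    "LEFT OUTER JOIN marks m ON s.id = m.student_id ORDER BY s.id"
).fetchall()

# Cross join: every combination of a student row and a mark row.
cross_count = conn.execute(
    "SELECT COUNT(*) FROM students CROSS JOIN marks"
).fetchone()[0]

print(inner_rows)
print(left_rows)
print(cross_count)
```

The student without a mark disappears from the inner join, appears with a NULL mark in the left outer join, and the cross join returns 3 x 2 = 6 rows because it applies no join condition.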

Q15. What Are The Benefits Of Apache Tajo?

Apache Tajo offers the following benefits:

Easy to use

Simplified architecture

Cost-based query optimization

Vectorized query execution plan

Fast delivery

Simple I/O mechanism and support for various types of storage

Fault tolerance

Q16. How To Create Index Statement In Apache Tajo?

The CREATE INDEX statement is used to create indexes on tables. An index is used for fast retrieval of data. The current version supports indexes only for plain TEXT formats stored on HDFS.

CREATE INDEX [ name ] ON table_name ( column_name )

create index student_index on mytable(id);

Q17. How To Set Property In Apache Tajo?

SET PROPERTY is used to change a table's properties.

ALTER TABLE students SET PROPERTY 'compression.type' = 'RECORD',

'compression.codec' = 'org.apache.hadoop.io.compress.SnappyCodec';

Q18. What Are The Window Functions Provided By Apache Tajo?

Window functions execute on a set of rows and return a single value for each row. A window function in a query defines its window using the OVER() clause.

The OVER() clause has the following capabilities:

Defines window partitions to form groups of rows. (PARTITION BY clause)

Orders rows within a partition. (ORDER BY clause)

Some of the window functions are:

rank()

row_number()

lead(value[, offset integer[, default any]])

lag(value[, offset integer[, default any]])

first_value(value)

last_value(value)
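How PARTITION BY and ORDER BY shape a window can be sketched with a small runnable example. It uses Python's built-in sqlite3 module as a stand-in for Tajo, and it assumes a SQLite build recent enough to support window functions (3.25 or later). The table and its rows are made-up sample data.

```python
import sqlite3

# In-memory SQLite database standing in for a Tajo table (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (name TEXT, age INTEGER, mark INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [("Adam", 20, 90), ("Amit", 20, 75), ("Bob", 21, 60), ("Carol", 21, 85)],
)

# One window per age group, ordered by mark from highest to lowest.
# rank() numbers the rows inside each partition; lag() looks one row back
# inside the same partition and falls back to 0 when there is no such row.
rows = conn.execute(
    """
    SELECT name, age, mark,
           rank() OVER (PARTITION BY age ORDER BY mark DESC) AS rnk,
           lag(mark, 1, 0) OVER (PARTITION BY age ORDER BY mark DESC) AS prev_mark
    FROM mytable
    ORDER BY age, rnk
    """
).fetchall()

for row in rows:
    print(row)
```

Unlike GROUP BY, the query still returns one output row per input row. The ranking restarts at 1 for each age because each partition is processed independently.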

Q19. What Are The Different Data Formats Supported By Apache Tajo?

Text

JSON

Parquet

RCFile

SequenceFile

ORC

Q20. Explain About The PostgreSQL Storage Handler?

Tajo supports a PostgreSQL storage handler. It enables user queries to access database objects in PostgreSQL. It is a built-in storage handler in Tajo, so you can easily configure it.

{
  "spaces": {
    "postgre": {
      "uri": "jdbc:postgresql://hostname:port/database1",
      "configs": {
        "mapped_database": "sampledb",
        "connection_properties": {
          "user": "tajo",
          "password": "pwd"
        }
      }
    }
  }
}

Here, "database1" refers to the PostgreSQL database that is mapped to the database "sampledb" in Tajo.
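The mapping described above can be checked with a short sketch that parses such a configuration and reports which PostgreSQL database each Tajo database maps to. This is only an illustration using Python's json module. The helper name describe_mapping is hypothetical and not part of Tajo, and the JSON text assumes the nesting implied by the keys in the example above.

```python
import json

# Tablespace configuration as JSON text (nesting is an assumption based on the keys above).
CONFIG_TEXT = """
{
  "spaces": {
    "postgre": {
      "uri": "jdbc:postgresql://hostname:port/database1",
      "configs": {
        "mapped_database": "sampledb",
        "connection_properties": {"user": "tajo", "password": "pwd"}
      }
    }
  }
}
"""


def describe_mapping(config_text):
    """Return {tajo_database: postgresql_database} for every configured space.

    Hypothetical helper for illustration; it is not a Tajo API.
    """
    config = json.loads(config_text)
    mapping = {}
    for space in config["spaces"].values():
        # The PostgreSQL database name is the last path segment of the JDBC URI.
        pg_database = space["uri"].rsplit("/", 1)[1]
        mapping[space["configs"]["mapped_database"]] = pg_database
    return mapping


mapping = describe_mapping(CONFIG_TEXT)
print(mapping)
```

Parsing the text with json.loads also doubles as a quick well-formedness check before handing a configuration like this to a cluster.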

Q21. What Is Distinct Clause In Apache Tajo?

A table column may contain duplicate values. The DISTINCT keyword can be used to return only distinct (different) values.

SELECT DISTINCT column1, column2 FROM table_name;

select distinct age from mytable;
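The effect of DISTINCT can be seen in a small runnable sketch. It uses Python's built-in sqlite3 module as a stand-in for Tajo, and the rows are made-up sample data. The DISTINCT query is the same as the one above, with an ORDER BY added so the output order is stable.

```python
import sqlite3

# In-memory SQLite database standing in for a Tajo table (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [("Adam", 20), ("Amit", 20), ("Bob", 21), ("Carol", 22), ("Dave", 21)],
)

# Without DISTINCT every row contributes a value, duplicates included.
all_ages = [row[0] for row in conn.execute("SELECT age FROM mytable ORDER BY age")]

# With DISTINCT each value appears once.
distinct_ages = [
    row[0] for row in conn.execute("SELECT DISTINCT age FROM mytable ORDER BY age")
]

print(all_ages)
print(distinct_ages)
```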

Q22. What Are The Data Formats Supported By Apache Tajo?

Apache Tajo supports the following data formats:

JSON

Text file (CSV)

Parquet

Sequence File

AVRO

Protocol Buffer

Apache ORC

Q23. How To Create Database Statement In Apache Tajo?

The statement used to create a database in Tajo is CREATE DATABASE, and its syntax is:

CREATE DATABASE [IF NOT EXISTS] <database_name>

Ex: default> create database if not exists test;

Q24. Explain About Catalog Configuration?

If you want to customize the catalog service, copy $TAJO_HOME/conf/catalog-site.xml.template to catalog-site.xml. Then, add the following configs to catalog-site.xml. Note that the default configs are enough to launch a Tajo cluster in most cases.

tajo.catalog.master.addr - If you want to launch a Tajo cluster in distributed mode, you must specify this address. For more detailed information, see Default Ports.

tajo.catalog.store.class - If you want to change the persistent storage of the catalog server, specify the class name. Its default value is tajo.catalog.store.DerbyStore. In the current version, Tajo provides three persistent storage classes as follows:

tajo.catalog.store.DerbyStore - this storage class uses Apache Derby.

tajo.catalog.store.MySQLStore - this storage class uses MySQL.

tajo.catalog.store.MemStore - this is the in-memory storage. It is used only in unit tests to shorten their duration.
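Since Tajo's configuration follows Hadoop's configuration system, a catalog-site.xml entry takes the usual Hadoop shape of name/value pairs inside property elements. The sketch below generates such a fragment with Python's xml.etree module. The helper name build_site_xml is hypothetical, and the property and class names are taken from the list above. A real MySQL-backed catalog would likely need further connection settings that are not covered here.

```python
import xml.etree.ElementTree as ET


def build_site_xml(properties):
    """Render a dict of settings as a Hadoop-style <configuration> document.

    Hypothetical helper for illustration; it is not part of Tajo.
    """
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Switch the catalog store from the default Derby store to the MySQL store.
xml_text = build_site_xml(
    {"tajo.catalog.store.class": "tajo.catalog.store.MySQLStore"}
)
print(xml_text)

# Read the fragment back to confirm it is well-formed and holds the setting.
parsed = ET.fromstring(xml_text)
settings = {p.findtext("name"): p.findtext("value") for p in parsed.findall("property")}
```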

Q25. How To Insert Records In Apache Tajo?

To insert records into the 'test' table, type the following query.

db_sample> insert overwrite into test select * from mytable;

Q26. How Can We Launch A Tajo Cluster?

To launch the Tajo master, execute start-tajo.sh.

$ $TAJO_HOME/sbin/start-tajo.sh

After that, you can use tajo-cli to access the command line interface of Tajo. If you want to know how to use tsql, read the Tajo Interactive Shell document.

$ $TAJO_HOME/bin/tsql



