Cassandra Interview Questions and Answers
Q1. What is Cassandra?
Ans: Cassandra is an open-source distributed data storage system originally developed at Facebook for inbox search. It is designed to store and manage large amounts of data across commodity servers. It can serve as both:
A real-time data store for online applications.
A read-intensive database for business intelligence systems.
Q2. What is Cassandra used for, and why use it?
Ans: Cassandra was designed to handle big data workloads across multiple nodes without any single point of failure. Other reasons to use Cassandra include:
It is fault tolerant and offers tunable consistency.
It scales from gigabytes to petabytes.
It is a wide-column (column-family) database.
It has no single point of failure.
It needs no separate caching layer.
It has a flexible schema design.
It offers flexible data storage, easy data distribution, and fast writes.
It provides atomic, isolated, and durable writes at the partition level, though not full relational ACID transactions (see Q37).
It is multi-data-center and cloud capable.
Q3. What is a composite type in Cassandra?
Ans: In Cassandra, a composite type lets you define a key or a column name as a concatenation of data of different types. You can use two kinds of composite type:
Row key
Column name
Q4. How does Cassandra store data?
All data is stored as bytes.
When you specify a validator, Cassandra ensures those bytes are encoded as required.
A comparator then orders the columns based on the ordering specific to that encoding.
Composites are just byte arrays with a specific encoding: for each component, Cassandra stores a two-byte length, followed by the byte-encoded component, followed by a one-byte end-of-component marker.
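The layout described above can be illustrated with a short Python sketch. It is a simplified model of the composite encoding, not Cassandra's own code: it assumes the end-of-component byte is always 0 (Cassandra uses other values when building slice bounds), and the function names are hypothetical.

```python
import struct


def encode_composite(components, eoc=0):
    """Encode byte-string components in the composite layout.

    Each component becomes: 2-byte big-endian length, the raw bytes,
    then a 1-byte end-of-component marker (assumed to be 0 here).
    """
    out = bytearray()
    for comp in components:
        out += struct.pack(">H", len(comp))  # two-byte length prefix
        out += comp                           # the byte-encoded component
        out.append(eoc)                       # end-of-component byte
    return bytes(out)


def decode_composite(data):
    """Split an encoded composite back into its byte components."""
    parts, i = [], 0
    while i < len(data):
        (length,) = struct.unpack_from(">H", data, i)
        i += 2
        parts.append(data[i:i + length])
        i += length + 1  # skip the component and its end-of-component byte
    return parts


# Hypothetical composite: a text part and a 4-byte big-endian integer part.
encoded = encode_composite([b"user42", (7).to_bytes(4, "big")])
print(encoded.hex())
print(decode_composite(encoded))
```

Because every component carries its own length, a comparator can walk the components one by one and compare each with the ordering of its own type.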
Q5. What are the main components of the Cassandra data model?
Ans: The main components of the Cassandra data model are:
Cluster
Keyspace
Column
Column family
Q6. What is a column family in Cassandra?
Ans: A column family in Cassandra is a collection of rows.
Q7. What is a cluster in Cassandra?
Ans: A cluster is a container for keyspaces. A Cassandra database is distributed over several machines that operate together. The cluster is the outermost container: it arranges the nodes in a ring and assigns data to them. Nodes hold replicas of each other's data, and a replica takes over if a node fails to handle its data.
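To make the ring idea concrete, here is a toy Python model built on several assumptions. Each node owns a single token, although real clusters usually use virtual nodes. MD5 from hashlib stands in for Cassandra's default Murmur3 partitioner. Replicas are chosen by walking clockwise around the ring, in the style of SimpleStrategy. The class and node names are hypothetical.

```python
import bisect
import hashlib


def token(key):
    # Stand-in partitioner: Cassandra's default is Murmur3Partitioner;
    # MD5 is used here only because it ships with Python's hashlib.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class Ring:
    """Toy token ring in which each node owns exactly one token."""

    def __init__(self, nodes):
        self.entries = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.entries]

    def replicas(self, key, rf):
        """Return rf distinct nodes, walking clockwise from the key's token."""
        n = len(self.entries)
        # First node whose token is >= the key's token; wrap around past the end.
        start = bisect.bisect_left(self.tokens, token(key)) % n
        return [self.entries[(start + i) % n][1] for i in range(min(rf, n))]


ring = Ring(["node-a", "node-b", "node-c", "node-d"])  # hypothetical node names
print(ring.replicas("user:42", rf=3))
```

Because the replica list always continues to the next nodes on the ring, losing one node still leaves other copies of its data reachable.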
Q8. List the other components of Cassandra.
Ans: The other components of Cassandra are:
Q9. What is a keyspace in Cassandra?
Ans: In Cassandra, a keyspace is a namespace that determines how data is replicated across nodes. A cluster typically contains one keyspace per application.
Q10. What is the syntax to create a keyspace in Cassandra?
Ans: The syntax for creating a keyspace in Cassandra is:
CREATE KEYSPACE <identifier> WITH <properties>
Q11. What values are stored in a Cassandra column?
Ans: A Cassandra column basically stores three values:
Column name
Value
Timestamp
Q12. When would you use ALTER KEYSPACE?
Ans: ALTER KEYSPACE is used to change properties such as the number of replicas and the durable_writes setting of a keyspace.
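To illustrate the statements in Q10 and Q12, the hypothetical Python helper below renders CREATE KEYSPACE and ALTER KEYSPACE statements with the replication and durable_writes properties. The keyspace and data center names are made up. The helper does no quoting or escaping, so treat it as a sketch for reading the syntax, not as something to feed untrusted input.

```python
def _replication_map(replication):
    # CQL map literal such as {'class': 'SimpleStrategy', 'replication_factor': 3}
    items = ", ".join(f"'{k}': {v!r}" for k, v in replication.items())
    return "{" + items + "}"


def keyspace_cql(verb, name, replication, durable_writes=None):
    """Render a CREATE or ALTER KEYSPACE statement (hypothetical helper).

    No quoting or escaping is done: for illustration only.
    """
    if verb not in ("CREATE", "ALTER"):
        raise ValueError("verb must be 'CREATE' or 'ALTER'")
    stmt = f"{verb} KEYSPACE {name} WITH replication = {_replication_map(replication)}"
    if durable_writes is not None:
        stmt += f" AND durable_writes = {str(durable_writes).lower()}"
    return stmt + ";"


print(keyspace_cql("CREATE", "shop", {"class": "SimpleStrategy", "replication_factor": 3}))
# Change the replica count per data center and turn off durable writes.
print(keyspace_cql("ALTER", "shop",
                   {"class": "NetworkTopologyStrategy", "dc1": 3, "dc2": 2},
                   durable_writes=False))
```

Both statements share the same WITH clause. ALTER KEYSPACE changes the stored properties, but existing data only reaches the new replica count after a repair (see Q50).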
Q13. What is Cassandra cqlsh?
Ans: cqlsh is the command-line shell that lets users communicate with the Cassandra database using CQL. With cqlsh, you can:
Define a schema
Insert data
Execute a query
Q14. What do the shell commands CAPTURE and CONSISTENCY do?
Ans: Cassandra provides many cqlsh shell commands. CAPTURE captures the output of a command and adds it to a file, while CONSISTENCY shows the current consistency level or sets a new one.
Q15. What is mandatory when creating a table in Cassandra?
Ans: A primary key is mandatory when creating a table. It is made up of one or more of the table's columns.
Q16. What must be taken care of when adding a column?
Ans: When adding a column, make sure that:
The column name does not conflict with an existing column name.
The table is not defined with the COMPACT STORAGE option.
Q17. What are Cassandra CQL collections?
Ans: CQL collections let you store multiple values in a single column. In Cassandra, you can use CQL collections in the following ways:
LIST: used when the order of the elements must be maintained and a value may be stored multiple times (holds an ordered list of elements, duplicates allowed).
SET: used to store a group of elements that are returned in sorted order (holds unique elements, no duplicates).
MAP: a data type used to store key-value pairs.
Q18. How does Cassandra write data?
Ans: Cassandra writes data in three stages:
Cassandra first writes data to a commit log, then to an in-memory table structure called the memtable, and finally, when the memtable is flushed, to an SSTable on disk.
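The three stages can be modelled with a toy Python class. This sketch makes several simplifying assumptions. The commit log and SSTables are JSON files in a temporary directory. The memtable is a plain dict that is flushed after a hypothetical limit of three keys. The whole commit log is emptied after a flush, whereas Cassandra instead recycles commit log segments once their data has been flushed.

```python
import json
import os
import tempfile


class ToyNode:
    """Toy write path: commit log, then memtable, then SSTable on flush."""

    def __init__(self, data_dir, memtable_limit=3):
        self.data_dir = data_dir
        self.commit_log_path = os.path.join(data_dir, "commitlog.jsonl")
        self.memtable = {}                    # key -> value, held in memory
        self.memtable_limit = memtable_limit  # hypothetical flush threshold
        self.sstables = []                    # paths of flushed, immutable files

    def write(self, key, value):
        # 1. Append to the commit log and fsync, so the write survives a crash.
        with open(self.commit_log_path, "a") as log:
            log.write(json.dumps({"k": key, "v": value}) + "\n")
            log.flush()
            os.fsync(log.fileno())
        # 2. Apply the write to the in-memory memtable.
        self.memtable[key] = value
        # 3. Once the memtable is full, flush it to an SSTable.
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        path = os.path.join(self.data_dir, f"sstable-{len(self.sstables)}.json")
        with open(path, "w") as f:
            json.dump(sorted(self.memtable.items()), f)  # rows sorted by key
        self.sstables.append(path)
        self.memtable = {}
        # The flushed data is on disk now, so the commit log can be emptied.
        open(self.commit_log_path, "w").close()


with tempfile.TemporaryDirectory() as d:
    node = ToyNode(d)
    for k, v in [("b", "2"), ("a", "1"), ("c", "3"), ("d", "4")]:
        node.write(k, v)
    print(len(node.sstables), node.memtable)
```

Writes are cheap because the only disk operation on the write path is a sequential append. Sorting happens in memory and is written out once per flush.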
Q19. What is a memtable in Cassandra?
Cassandra writes data to an in-memory structure known as the memtable.
It is an in-memory cache whose content is stored as key/column pairs.
Memtable data is sorted by key.
There is a separate memtable for each column family, and it retrieves column data by key.
Q20. What does an SSTable consist of?
Ans: An SSTable consists mainly of two files:
Index file (Bloom filter and key-offset pairs)
Data file (actual column data)
Q21. What is a Bloom filter used for in Cassandra?
Ans: A Bloom filter is a space-efficient data structure used to test whether an element is a member of a set. It can return false positives but never false negatives. In Cassandra, it is used to determine whether an SSTable may have data for a particular row, which saves I/O when performing a key lookup.
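The following minimal Python Bloom filter shows why the structure saves I/O: a negative answer means the key is definitely absent, so the SSTable need not be read. The bit-array size, the number of hash functions, and the use of SHA-256 with double hashing are illustrative assumptions. Cassandra sizes its filters from a configured false-positive chance and uses Murmur3 hashing.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: m bits, k hash positions derived from SHA-256."""

    def __init__(self, m_bits=1024, k=3):
        self.m, self.k = m_bits, k
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, item):
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        # Double hashing: the i-th position is (h1 + i * h2) mod m.
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item):
        # False means "definitely absent", so the SSTable read can be skipped.
        # True means "possibly present"; a false positive is possible.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


bf = BloomFilter()
for key in ["user:1", "user:2"]:
    bf.add(key)
print(bf.might_contain("user:1"))    # added keys are always reported present
print(bf.might_contain("user:999"))  # almost certainly False
```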
Q22. How does Cassandra write modified data into the commit log?
Cassandra appends modified data to the commit log.
The commit log acts as a crash-recovery log for data.
A write operation is never considered successful until the modified data has been appended to the commit log.
Data will not be lost once the commit log has been flushed to disk.
Q23. How does Cassandra delete data?
Ans: SSTables are immutable, so a row cannot be removed from an SSTable in place. When a row needs to be deleted, Cassandra instead writes a special marker value called a tombstone. When the data is read, a value covered by a tombstone is treated as deleted.
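Here is a small Python sketch of how a tombstone hides older data at read time, assuming last-write-wins by timestamp. The SSTables are modelled as plain dicts, and the Cell class and TOMBSTONE sentinel are hypothetical names. The real read path also consults memtables, caches, and Bloom filters.

```python
from dataclasses import dataclass

TOMBSTONE = object()  # hypothetical sentinel standing in for a deletion marker


@dataclass
class Cell:
    value: object
    timestamp: int  # write time; the newest timestamp wins


def read(key, sstables):
    """Merge one key's cells from several immutable SSTables (newest wins)."""
    cells = [table[key] for table in sstables if key in table]
    if not cells:
        return None
    newest = max(cells, key=lambda c: c.timestamp)
    # A tombstone newer than every live value hides the data.
    return None if newest.value is TOMBSTONE else newest.value


older = {"user:1": Cell("alice", timestamp=100)}
newer = {"user:1": Cell(TOMBSTONE, timestamp=200)}  # the delete is itself a write
print(read("user:1", [older]))         # alice
print(read("user:1", [older, newer]))  # None: the tombstone shadows "alice"
```

Neither SSTable is modified by the delete. The older value still exists on disk until compaction removes it.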
Q24. Explain the concept of tunable consistency in Cassandra.
Ans: Tunable consistency is a standout feature that makes Cassandra a preferred database for developers, analysts, and big data architects. Consistency refers to how up to date and synchronized a row of data is across all of its replicas. Cassandra's tunable consistency lets users choose the consistency level best suited to their use case. It supports two kinds of consistency: eventual consistency and strong consistency.
Eventual consistency guarantees that, if no new updates are made to a given data item, all accesses will eventually return the last updated value. Systems with eventual consistency are said to achieve replica convergence.
For strong consistency, Cassandra requires the following condition:
R + W > N, where
N – number of replicas
W – number of replicas that must acknowledge a write for it to succeed
R – number of replicas that must respond for a read to succeed
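The condition can be checked with a few lines of Python. The sketch assumes a quorum is floor(N/2) + 1 replicas. It compares the R + W > N test against a brute-force check that every possible read set shares at least one node with every possible write set, which is why the inequality yields strong consistency.

```python
from itertools import combinations


def quorum(n):
    """Majority of n replicas: floor(n / 2) + 1."""
    return n // 2 + 1


def is_strongly_consistent(r, w, n):
    """The R + W > N condition from the text."""
    return r + w > n


def always_overlap(r, w, n):
    """Brute force: does every r-node read set share a node with every w-node write set?"""
    nodes = range(n)
    return all(set(rs) & set(ws)
               for rs in combinations(nodes, r)
               for ws in combinations(nodes, w))


n = 3
print(is_strongly_consistent(quorum(n), quorum(n), n))   # True: 2 + 2 > 3
print(is_strongly_consistent(1, 1, n))                   # False: 1 + 1 <= 3
print(always_overlap(2, 2, 3), always_overlap(1, 1, 3))  # True False
```

With N = 3, QUORUM reads and writes (2 + 2 > 3) always overlap on at least one replica, so a read sees the latest acknowledged write, while ONE/ONE does not.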
Q25. How does Cassandra write?
Ans: Cassandra performs a write by applying two commits: it first writes to a commit log on disk and then commits to an in-memory structure called the memtable. Once both commits succeed, the write is complete. Writes are later flushed to disk in the table structure known as an SSTable (sorted string table). This design gives Cassandra fast write performance.
Q26. Describe the management tools for Cassandra.
Ans: DataStax OpsCenter: a web-based management and monitoring solution for Cassandra and DataStax clusters. It is free to download and includes an additional edition of OpsCenter.
SPM: primarily monitors Cassandra metrics along with various OS and JVM metrics. Besides Cassandra, SPM also monitors Hadoop, Spark, Solr, Storm, ZooKeeper, and other big data platforms. Its main features include correlation of events and metrics, distributed transaction tracing, real-time graphs with zooming, anomaly detection, and heartbeat alerting.
Q27. Define memtable.
Ans: Similar to a table, a memtable is an in-memory write-back cache whose content is stored in key and column format. Data in a memtable is sorted by key, and each column family has its own memtable that retrieves column data by key. The memtable stores writes until it is full and is then flushed to disk.
Q28. What is an SSTable? How is it different from relational tables?
Ans: SSTable stands for 'Sorted String Table', an important data file in Cassandra to which memtables are regularly flushed. SSTables are stored on disk and exist for each Cassandra table. Unlike relational tables, SSTables are immutable: no data items can be added or removed once an SSTable is written. For each SSTable, Cassandra creates three separate files: a partition index, a partition summary, and a Bloom filter.
Q29. Explain the concept of a Bloom filter.
Ans: Associated with each SSTable, a Bloom filter is an off-heap data structure (stored outside the Java heap, in native memory) that checks whether the SSTable may contain the requested data before any disk I/O is performed.
Q30. Explain the CAP theorem.
Ans: Systems must scale as more resources are needed, and the CAP theorem plays a major role in choosing a scaling approach for distributed systems. The Consistency, Availability, and Partition tolerance (CAP) theorem states that a distributed system like Cassandra can provide only two of these three guarantees at the same time.
One of them must be sacrificed. Consistency means the client always receives the most recent write. Availability means every request receives a reasonable response within minimal time. Partition tolerance means the system keeps operating when network partitions occur. Since network partitions cannot be ruled out in a distributed system, the two practical options are AP and CP. Cassandra is generally classified as AP, with tunable consistency letting it trade availability for stronger consistency.
Q31. What are the differences between a node, a cluster, and a data center in Cassandra?
Ans: A node is a single machine running Cassandra, while a cluster is a set of nodes that hold related data grouped together. Data centers are useful when serving customers in different geographic regions: you can group different nodes of a cluster into different data centers.
Q32. How do you write a query in Cassandra?
Ans: Using CQL (Cassandra Query Language). cqlsh is used to interact with the database.
Q33. Which operating systems does Cassandra support?
Ans: Linux, and Windows in older releases (Windows support was removed in Cassandra 4.0).
Q34. What is the Cassandra data model?
Ans: The Cassandra data model consists of four main components:
Cluster: made up of multiple nodes and keyspaces
Keyspace: a namespace that groups multiple column families, typically one per application
Column: consists of a column name, a value, and a timestamp
ColumnFamily: multiple columns referenced by a row key
Q35. What is CQL?
Ans: CQL (Cassandra Query Language) is the language used to access and query the Apache Cassandra distributed database. It includes a CQL parser that pushes all implementation details to the server. CQL syntax is similar to SQL, but it does not change the Cassandra data model.
Q36. Explain the concept of compaction in Cassandra.
Ans: Compaction is a maintenance process in Cassandra in which SSTables are reorganized to optimize the data structures on disk. It is needed because every memtable flush creates a new SSTable. There are two types of compaction in Cassandra:
Minor compaction: started automatically when a new SSTable is created. Cassandra merges similarly sized SSTables into one.
Major compaction: triggered manually using nodetool. It compacts all SSTables of a column family into one.
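Here is a simplified Python model of what compaction does when it merges SSTables: it keeps the newest cell for each key and purges tombstones whose grace period (gc_grace_seconds) has expired. SSTables are modelled as dicts that map a key to a (timestamp, value) pair, with None standing for a tombstone. Timestamps are plain seconds, and the names are hypothetical. The sketch also ignores a real safeguard: Cassandra only drops a tombstone when no SSTable outside the compaction could still hold data it shadows.

```python
def compact(sstables, now, gc_grace_seconds):
    """Merge SSTables into one, keeping the newest cell for each key.

    Each SSTable is a dict: key -> (timestamp, value); value None is a tombstone.
    Tombstones older than gc_grace_seconds are purged from the output.
    """
    merged = {}
    for table in sstables:
        for key, (ts, value) in table.items():
            if key not in merged or ts > merged[key][0]:
                merged[key] = (ts, value)  # last write wins
    result = {}
    for key, (ts, value) in sorted(merged.items()):  # output stays sorted by key
        if value is None and now - ts > gc_grace_seconds:
            continue  # expired tombstone: dropped with the data it shadowed
        result[key] = (ts, value)
    return result


old = {"a": (10, "x"), "b": (10, "y")}
new = {"a": (20, None), "c": (15, "z")}  # "a" was deleted at time 20
print(compact([old, new], now=1000, gc_grace_seconds=100))
print(compact([old, new], now=50, gc_grace_seconds=100))
```

In the first call the tombstone for "a" has expired, so both it and the old value disappear. In the second call the tombstone is kept so it can still reach replicas that missed the delete.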
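Q37. Does Cassandra support ACID transactions?
Ans: Unlike relational databases, Cassandra does not support full ACID transactions with rollback or locking. It offers atomicity and isolation at the partition level and durability through the commit log, with tunable rather than strict consistency.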
Q38. Explain cqlsh.
Ans: cqlsh stands for Cassandra Query Language Shell, the interactive CQL terminal. It is a Python-based command-line prompt used on Linux or Windows to execute CQL commands such as ASSUME, CAPTURE, CONSISTENCY, COPY, DESCRIBE, and many others. With cqlsh, users can define a schema, insert data, and execute queries.
Q39. What is a super column in Cassandra?
Ans: A Cassandra super column is a special element that groups similar collections of data. Super columns are key-value pairs whose values are columns. A super column is a sorted array of columns, and the elements follow this hierarchy: keyspace > column family > super column > column, a structure often pictured as nested JSON.
Like row keys, super column entries hold no independent values; they are used to collect other columns. Note that super column keys appearing in different rows do not necessarily match.
Q40. Define the consistency levels for write operations in Cassandra.
ALL: Highly consistent. A write must be written to the commit log and memtable on all replica nodes in the cluster.
EACH_QUORUM: A write must be written to the commit log and memtable on a quorum of replica nodes in every data center.
LOCAL_QUORUM: A write must be written to the commit log and memtable on a quorum of replica nodes in the same data center.
ONE: A write must be written to the commit log and memtable of at least one replica node.
TWO, THREE: Same as ONE, but for at least two and three replica nodes, respectively.
LOCAL_ONE: A write must be written to at least one replica node in the local data center.
SERIAL: Linearizable consistency, used to prevent unconditional updates.
LOCAL_SERIAL: Same as SERIAL, but confined to the local data center.
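The Python sketch below counts how many replica acknowledgements each of the listed write levels needs, given a replication factor per data center. It assumes a quorum is floor(RF/2) + 1 and uses hypothetical data center names. It leaves out SERIAL and LOCAL_SERIAL because they govern the Paxos phase of lightweight transactions rather than a simple replica count.

```python
def required_acks(level, rf_per_dc, local_dc):
    """Replica acknowledgements needed per data center for a write.

    rf_per_dc maps data center name -> replication factor in that DC.
    For ONE/TWO/THREE the acks may come from any DC, reported under "any".
    """
    def quorum(rf):
        return rf // 2 + 1

    fixed = {"ONE": 1, "TWO": 2, "THREE": 3}
    if level in fixed:
        return {"any": fixed[level]}
    if level == "ALL":
        return dict(rf_per_dc)  # every replica in every data center
    if level == "EACH_QUORUM":
        return {dc: quorum(rf) for dc, rf in rf_per_dc.items()}
    if level == "LOCAL_QUORUM":
        return {local_dc: quorum(rf_per_dc[local_dc])}
    if level == "LOCAL_ONE":
        return {local_dc: 1}
    raise ValueError(f"level not modelled in this sketch: {level}")


rf = {"dc1": 3, "dc2": 5}  # hypothetical data centers and replication factors
for lvl in ["ONE", "LOCAL_ONE", "LOCAL_QUORUM", "EACH_QUORUM", "ALL"]:
    print(lvl, required_acks(lvl, rf, local_dc="dc1"))
```

The LOCAL_ variants only wait on the local data center, which keeps write latency independent of cross-region links. EACH_QUORUM and ALL trade that latency for stronger guarantees.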
Q41. What is the difference between a column and a super column?
Ans: Both work on the principle of a tuple with a name and a value. However, a column's value is a simple value such as a string, while a super column's value is a map of columns with different data types.
Unlike columns, super columns do not have the third element, a timestamp.
Q42. What is a ColumnFamily?
Ans: As the name suggests, a ColumnFamily is a structure that can hold an unlimited number of rows. Each row is referenced by a key and holds key-value pairs, where the key is the column name and the value is the column data. It is much like a HashMap in Java or a dictionary in Python. Remember that rows are not limited to a predefined list of columns. A ColumnFamily is very flexible: one row may have 100 columns while another has only 2.
Q43. What is the SOURCE command used for in Cassandra?
Ans: The SOURCE command executes a file containing CQL statements.
Q44. What is Thrift?
Ans: Thrift is a legacy RPC protocol and API, combined with a code generation tool, that predates CQL. Cassandra used Thrift to provide access to the database from many programming languages; Thrift support was removed in Cassandra 4.0.
Q45. Explain tombstones in Cassandra.
Ans: A tombstone is a marker indicating that a column has been deleted. Marked columns are removed during compaction once gc_grace_seconds has passed. Tombstones are important because Cassandra supports eventual consistency. The deletion must reach every replica, so that a replica that missed the delete does not bring the old data back.
Q46. Which platforms does Cassandra run on?
Ans: Since Cassandra is a Java application, it can run on any platform that provides a Java Runtime Environment (JRE) and Java Virtual Machine (JVM). Cassandra also runs on Red Hat, CentOS, Debian, and Ubuntu Linux systems.
Q47. Which ports does Cassandra use?
Ans: By default, Cassandra uses port 7000 for cluster management, 9160 for Thrift clients, and 8080 for JMX. These are all TCP ports and can be changed in the configuration file bin/cassandra.in.sh. These defaults come from older releases; current versions use 7199 for JMX and 9042 for CQL native-protocol clients.
Q48. Can you add or remove column families in a running cluster?
Ans: Yes, but keep the following procedure in mind:
Do not forget to clear the commit log with 'nodetool drain'.
Shut down Cassandra and check that no data is left in the commit log.
Delete the SSTable files for the removed column families.
Q49. What is the replication factor in Cassandra?
Ans: The replication factor is the number of copies of each piece of data stored in the cluster. Increasing it stores more copies, which improves fault tolerance at the cost of extra storage.
Q50. Can we change the replication factor on a live cluster?
Ans: Yes, but you will need to run repair so that existing data is copied to match the new replica count.