Cassandra Crunch Interview Questions and Answers


Q1. Explain what Cassandra is.

Ans: Cassandra is an open source data storage system developed at Facebook for inbox search and designed for storing and managing large amounts of data across commodity servers. It can serve as both:

A real-time data store for online applications

A read-intensive database for business intelligence systems

Q2. What is the use of Cassandra, and why use it?

Ans: Cassandra was designed to handle big data workloads across multiple nodes without any single point of failure. The various factors responsible for using Cassandra are:

It is fault tolerant, with tunable consistency

Scalability from gigabytes to petabytes

It is a column-oriented database

No single point of failure

No need for a separate caching layer

Flexible schema design

It has flexible data storage, easy data distribution, and fast writes

It provides atomicity, isolation, and durability at the row level, though not full ACID (Atomicity, Consistency, Isolation, and Durability) transactions

Multi-data center and cloud capable

Data compression

Q3. Explain what a composite type is in Cassandra.

Ans: In Cassandra, a composite type lets you define a key or a column name as a concatenation of data of different types. You can use two kinds of composite type:

Row Key

Column Name

Q4. How does Cassandra store data?

Ans:

All data is stored as bytes

When you specify a validator, Cassandra ensures those bytes are encoded as required

Then a comparator orders the columns based on the ordering specific to the encoding

Composites are just byte arrays with a specific encoding: for each component, Cassandra stores a two-byte length, followed by the byte-encoded component, followed by an end-of-component byte.
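The composite layout described above can be sketched in a few lines of Python. This is an illustration only: it assumes a big-endian two-byte length prefix and a single zero byte as the terminator, and the function names `encode_composite` and `decode_composite` are hypothetical, not part of any Cassandra API.

```python
import struct

END_OF_COMPONENT = b"\x00"  # assumed terminator value for this sketch


def encode_composite(components):
    """Pack each component as <2-byte length><bytes><terminator>."""
    out = bytearray()
    for part in components:
        out += struct.pack(">H", len(part))  # unsigned 16-bit big-endian length
        out += part
        out += END_OF_COMPONENT
    return bytes(out)


def decode_composite(blob):
    """Reverse encode_composite and return the list of components."""
    parts, pos = [], 0
    while pos < len(blob):
        (length,) = struct.unpack_from(">H", blob, pos)
        pos += 2
        parts.append(blob[pos:pos + length])
        pos += length + 1  # skip the component and its terminator byte
    return parts


encoded = encode_composite([b"user42", b"2014-01-01"])
print(encoded.hex())
print(decode_composite(encoded))
```

Because every component carries its own length, components of different types can be concatenated and still split apart unambiguously.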

Q5. Mention the main components of the Cassandra Data Model.

Ans: The main components of the Cassandra Data Model are:

Cluster

Keyspace

Column

Column Family

Q6. What is Cassandra Data Model?

Ans: The Cassandra Data Model consists of four main components:

Cluster: Made up of multiple nodes and keyspaces

Keyspace: a namespace to group multiple column families, typically one per application

Column: consists of a column name, value and timestamp

ColumnFamily: multiple columns with row key reference.

Q7. Explain what a column family is in Cassandra.

Ans: A column family in Cassandra refers to a collection of rows.

Q8. Explain what a cluster is in Cassandra.

Ans: A cluster is a container for keyspaces. A Cassandra database is segmented over several machines that operate together. The cluster is the outermost container, which arranges the nodes in a ring format and assigns data to them. These nodes have a replica which takes charge in case of data handling failure.

Q9. List the other components of Cassandra.

Ans: The other components of Cassandra are:

Node

Data Center

Cluster

Commit log

Memtable

SSTable

Bloom Filter

Q10. Explain what a keyspace is in Cassandra.

Ans: In Cassandra, a keyspace is a namespace that determines data replication on nodes. A cluster typically contains one keyspace per application.

Q11. What is the syntax to create a keyspace in Cassandra?

Ans: The syntax for creating a keyspace in Cassandra is

CREATE KEYSPACE <identifier> WITH <properties>
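As a concrete instance of the syntax above, the sketch below builds a CREATE KEYSPACE statement as a Python string. The helper name `create_keyspace_cql`, the keyspace name `demo`, and the choice of `SimpleStrategy` with a replication factor of 3 are illustrative assumptions; the resulting text would then be run through cqlsh or a driver.

```python
def create_keyspace_cql(name, replication_factor, durable_writes=True):
    """Return a CREATE KEYSPACE statement using SimpleStrategy replication."""
    if not name.isidentifier():
        raise ValueError("keyspace name must be a simple identifier")
    replication = (
        "{'class': 'SimpleStrategy', "
        f"'replication_factor': {int(replication_factor)}}}"
    )
    return (
        f"CREATE KEYSPACE {name} "
        f"WITH replication = {replication} "
        f"AND durable_writes = {str(bool(durable_writes)).lower()};"
    )


print(create_keyspace_cql("demo", 3))
```

Here `<identifier>` becomes the keyspace name and `<properties>` becomes the replication map plus the optional `durable_writes` flag.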

Q12. Mention the values stored in a Cassandra column.

Ans: A Cassandra column basically holds three values:

Column Name

Value

Time Stamp

Q13. Mention when you can use ALTER KEYSPACE.

Ans: ALTER KEYSPACE can be used to change properties such as the number of replicas and the durable_writes setting of a keyspace.

Q14. Explain what Cassandra cqlsh is.

Ans: cqlsh is the command-line shell for CQL, the query language that lets users communicate with the Cassandra database. Using cqlsh, you can do the following:

Define a schema

Insert data

Execute a query

Q15. Mention what the shell commands CAPTURE and CONSISTENCY do.

Ans: There are various cqlsh shell commands in Cassandra. The CAPTURE command captures the output of a command and appends it to a file, while the CONSISTENCY command shows the current consistency level or sets a new one.

Q16. What is mandatory while creating a table in Cassandra?

Ans: While creating a table, a primary key is mandatory. It is made up of one or more columns of the table.

Q17. Mention what needs to be taken care of while adding a column.

Ans: While adding a column you need to take care that:

The column name does not conflict with existing column names.

The table is not defined with the compact storage option.

Q18. Mention what Cassandra CQL collections are.

Ans: Cassandra CQL collections let you store multiple values in a single variable. In Cassandra, you can use CQL collections in the following ways:

List: used when the order of the data needs to be maintained and a value may be stored multiple times (holds repeating elements)

Set: used for a group of elements to be stored and returned in sorted order (holds unique elements)

Map: a data type used to store key-value pairs of elements

Q19. Explain how Cassandra writes data.

Ans: Cassandra writes data in three stages:

Commitlog write

Memtable write

SSTable write

Cassandra first writes data to a commit log, then to an in-memory table structure called the memtable, and finally to an SSTable.
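The three stages can be modelled with a toy class. This is a simplified sketch, not Cassandra's implementation: the `Table` class, the `memtable_limit` threshold, and clearing the commit log on flush are assumptions made to keep the example short.

```python
class Table:
    """Toy model of the write path: commit log, memtable, then SSTable."""

    def __init__(self, memtable_limit=3):
        self.commit_log = []      # append-only record of every write
        self.memtable = {}        # in-memory key -> value
        self.sstables = []        # immutable, key-sorted tuples "on disk"
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))   # 1. durable append first
        self.memtable[key] = value             # 2. then the in-memory table
        if len(self.memtable) >= self.memtable_limit:
            self.flush()                       # 3. a full memtable goes to disk

    def flush(self):
        if not self.memtable:
            return
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable.clear()
        self.commit_log.clear()   # flushed data no longer needs replaying

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for sstable in reversed(self.sstables):   # newest SSTable first
            for k, v in sstable:
                if k == key:
                    return v
        return None


table = Table(memtable_limit=2)
for key, value in [("b", 1), ("a", 2), ("c", 3)]:
    table.write(key, value)

print(table.sstables)   # one flushed SSTable, sorted by key
print(table.memtable)   # the write that has not been flushed yet
```

The point of the ordering is that a write touches only an append-only log and memory, which is why writes are fast; disk-sorted files are produced later in bulk.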

Q20. Explain what a memtable is in Cassandra.

Ans:

Cassandra writes data to an in-memory structure called the memtable

It is an in-memory cache with content stored as key/column

Memtable data is sorted by key

There is a separate memtable for each ColumnFamily, and it retrieves column data from the key


Similar to a table, the memtable is an in-memory, write-back cache holding content in key and column format. The data in the memtable is sorted by key, and each ColumnFamily has a distinct memtable that retrieves column data via the key. It stores writes until it is full and is then flushed out.

Q21. Explain what an SSTable consists of.

Ans: An SSTable consists mainly of two files:

Index file (Bloom filter and key offset pairs)

Data file (actual column data)

Q22. What is an SSTable? How is it different from relational tables?

Ans: SSTable expands to 'Sorted String Table', an important data file in Cassandra that receives regularly flushed memtables. SSTables are stored on disk and exist for each Cassandra table. Being immutable, SSTables do not allow any further addition or removal of data items once written. For each SSTable, Cassandra creates three separate files: a partition index, a partition summary and a bloom filter.

Q23. Explain what a Bloom filter is used for in Cassandra.

Ans: A bloom filter is a space-efficient data structure used to test whether an element is a member of a set. In other words, it is used to determine whether an SSTable has data for a particular row. In Cassandra it is used to save I/O when performing a key lookup.
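A minimal Bloom filter sketch follows. It is not Cassandra's implementation: SHA-256 is used here only as a convenient stand-in hash, and the class name, bit-array size and hash count are arbitrary choices for the example.

```python
import hashlib


class BloomFilter:
    """Bit array plus k hashes; false positives are possible, false negatives are not."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)  # one byte per bit, for readability

    def _positions(self, key):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


sstable_filter = BloomFilter()
for row_key in ("alice", "bob"):
    sstable_filter.add(row_key)

print(sstable_filter.might_contain("alice"))  # True: the key was added
print(sstable_filter.might_contain("carol"))  # usually False; a false positive is possible
```

A negative answer is definitive, so the SSTable can be skipped without touching disk; a positive answer only means the SSTable has to be checked.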


Explain the concept of the Bloom filter.

Ans: Associated with an SSTable, the Bloom filter is an off-heap (off the Java heap, in native memory) data structure used to check whether any data is available in the SSTable before performing any disk I/O operation.


Q24. Explain how Cassandra writes changed data into the commitlog.

Ans:

Cassandra appends changed data to the commitlog

The commitlog acts as a crash recovery log for data

Until the changed data is appended to the commitlog, the write operation is never considered successful

Data will not be lost once the commitlog is flushed out to file.

Q25. Explain how Cassandra deletes data.

Ans: SSTables are immutable, so a row cannot be removed from them. When a row needs to be deleted, Cassandra assigns the column value a special value called a tombstone. When the data is read, the tombstone value is treated as deleted.
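The idea of deleting by writing a marker can be sketched as follows. The `TOMBSTONE` sentinel, the dictionary-based memtable and SSTable, and the integer timestamps are simplifications assumed for the example.

```python
TOMBSTONE = object()  # sentinel standing in for Cassandra's deletion marker


def delete(memtable, key, timestamp):
    """A delete is just another write whose value is the tombstone."""
    memtable[key] = (timestamp, TOMBSTONE)


def read(key, memtable, sstables):
    """Return the newest value for key, or None if it is absent or deleted."""
    candidates = [src[key] for src in [memtable, *sstables] if key in src]
    if not candidates:
        return None
    _, value = max(candidates, key=lambda cell: cell[0])  # latest timestamp wins
    return None if value is TOMBSTONE else value


old_sstable = {"user:1": (100, "alice"), "user:2": (100, "bob")}  # never modified
memtable = {}
delete(memtable, "user:1", timestamp=200)

print(read("user:1", memtable, [old_sstable]))  # None
print(read("user:2", memtable, [old_sstable]))  # bob
```

The old SSTable still holds the value for `user:1`; it is the newer tombstone that hides it from readers.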

Q26. Compare MongoDB and Cassandra.

Ans:

Criteria                MongoDB          Cassandra
Data Model              Document         Bigtable-like
Database scalability    Read             Write
Querying of data        Multi-indexed    Using key or scan

Q27. List the benefits of using Cassandra.

Ans: Unlike traditional databases, Apache Cassandra delivers near real-time performance, simplifying the work of developers, administrators, data analysts and software engineers.

Instead of a master-slave architecture, Cassandra is built on a peer-to-peer architecture with no single point of failure.

It also offers great flexibility, as it allows multiple nodes to be added to any Cassandra cluster in any datacenter. Further, any client can forward its request to any server.

Cassandra supports extensible scalability and can easily be scaled up and down as requirements change. With high throughput for read and write operations, this NoSQL application need not be restarted while scaling.

Cassandra is also well regarded for its strong data replication capability: it stores data at multiple locations, enabling users to retrieve data from another location if one node fails. Users can set the number of replicas they want to create.

It performs well on massive datasets, which makes it a popular NoSQL choice for many organizations.

It operates on a column-oriented structure, which speeds up and simplifies slicing. Data access and retrieval also become more efficient with the column-based data model.

Further, Apache Cassandra supports a schema-free/schema-optional data model, which removes the need to declare up front all the columns required by your application.

Q28. Explain the concept of Tunable Consistency in Cassandra.

Ans: Tunable Consistency is the feature that makes Cassandra a favored database choice of developers, analysts and big data architects. Consistency refers to data rows being up to date and synchronized on all their replicas. Cassandra's Tunable Consistency allows users to select the consistency level best suited to their use cases. It supports two kinds of consistency: Eventual Consistency and Strong Consistency.

The former guarantees that, when no new updates are made to a given data item, all accesses eventually return the last updated value. Systems with eventual consistency are said to have achieved replica convergence.

For Strong consistency, Cassandra supports the following condition:

R + W > N, where

N – Number of replicas

W – Number of nodes that need to agree for a successful write

R – Number of nodes that need to agree for a successful read
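The condition is easy to check numerically. The sketch below uses hypothetical helper names (`quorum`, `is_strongly_consistent`); the quorum size is taken as the majority of replicas, floor(N / 2) + 1.

```python
def quorum(replication_factor):
    """Majority of replicas: floor(N / 2) + 1."""
    return replication_factor // 2 + 1


def is_strongly_consistent(read_replicas, write_replicas, replication_factor):
    """True when every read set must overlap every write set (R + W > N)."""
    return read_replicas + write_replicas > replication_factor


n = 3
print(quorum(n))                                        # 2
print(is_strongly_consistent(quorum(n), quorum(n), n))  # True:  2 + 2 > 3
print(is_strongly_consistent(1, 1, n))                  # False: 1 + 1 <= 3
print(is_strongly_consistent(1, n, n))                  # True:  1 + 3 > 3
```

With N = 3, quorum reads and quorum writes always share at least one replica, so a read sees the latest acknowledged write; reading and writing at a single replica each does not give that guarantee.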

Q29. How does Cassandra write?

Ans: Cassandra performs a write by applying two commits: first it writes to a commit log on disk, and then it commits to an in-memory structure called the memtable. Once the two commits are successful, the write is complete. Writes are later flushed to disk in the table structure as an SSTable (sorted string table). Cassandra offers fast write performance.

Q30. Define the management tools in Cassandra.

Ans: DataStax OpsCenter: a web-based management and monitoring solution for Cassandra clusters and DataStax. It is free to download and includes an additional Edition of OpsCenter.

SPM primarily administers Cassandra metrics and various OS and JVM metrics. Besides Cassandra, SPM also monitors Hadoop, Spark, Solr, Storm, ZooKeeper and other Big Data platforms. The main features of SPM include correlation of events and metrics, distributed transaction tracing, real-time graphs with zooming, anomaly detection and heartbeat alerting.

Q31. Explain CAP Theorem.

Ans: With a strong requirement to scale systems when additional resources are needed, the CAP theorem plays a major role in shaping the scaling strategy. It is an effective way to reason about scaling in distributed systems. The Consistency, Availability and Partition tolerance (CAP) theorem states that in distributed systems like Cassandra, users can have only two of these three characteristics.

One of them needs to be sacrificed. Consistency guarantees that the client gets the most recent write, Availability means a reasonable response is returned within minimum time, and Partition tolerance means the system continues to operate when network partitions occur. The two options available are AP and CP.

Q32. State the differences between a node, a cluster and a datacenter in Cassandra.

Ans: While a node is a single machine running Cassandra, a cluster is a collection of nodes that hold similar types of data grouped together. Data centers are useful when serving customers in different geographical areas. You can group different nodes of a cluster into different data centers.

Q33. How do you write a query in Cassandra?

Ans: Using CQL (Cassandra Query Language). cqlsh is used for interacting with the database.

Q34. Which operating systems does Cassandra support?

Ans: Windows and Linux.

Q35. What is CQL?

Ans: CQL is the Cassandra Query Language, used to access and query the Apache distributed database. It consists of a CQL parser that passes all the implementation details to the server. The syntax of CQL is similar to SQL, but it does not alter the Cassandra data model.

Q36. Explain the concept of compaction in Cassandra.

Ans: Compaction refers to a maintenance process in Cassandra in which the SSTables are reorganized to optimize the data structures on disk. The compaction process is useful when interacting with the memtable. There are two types of compaction in Cassandra:

Minor compaction: started automatically when a new SSTable is created. Here, Cassandra condenses all the equally sized SSTables into one.

Major compaction: triggered manually using nodetool. Compacts all SSTables of a ColumnFamily into one.
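The merge step at the heart of compaction can be sketched as below. This is a simplified model under stated assumptions: each SSTable is a key-sorted tuple of `(key, (timestamp, value))` pairs, the newest timestamp wins, and tombstone purging is left out. The function name `compact` is hypothetical.

```python
def compact(sstables):
    """Merge SSTables into one, keeping the newest cell for each key."""
    merged = {}
    for sstable in sstables:
        for key, (timestamp, value) in sstable:
            if key not in merged or timestamp > merged[key][0]:
                merged[key] = (timestamp, value)
    return tuple(sorted(merged.items()))   # the output is again sorted by key


sstable_1 = (("a", (1, "old")), ("b", (1, "keep")))
sstable_2 = (("a", (2, "new")), ("c", (2, "add")))

print(compact([sstable_1, sstable_2]))
```

The inputs are never modified; compaction writes a new SSTable and the old ones can then be discarded, which is consistent with SSTables being immutable.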

Q37. Does Cassandra support ACID transactions?

Ans: Unlike relational databases, Cassandra does not support ACID transactions.

Q38. Explain Cqlsh.

Ans: Cqlsh expands to Cassandra Query Language Shell, which provides the CQL interactive terminal. It is a Python-based command-line prompt used on Linux or Windows to execute CQL commands such as ASSUME, CAPTURE, CONSISTENCY, COPY, DESCRIBE and many others. With cqlsh, users can define a schema, insert data and execute a query.

Q39. What is a SuperColumn in Cassandra?

Ans: A Cassandra Super Column is a unique element consisting of similar collections of data. Super columns are actually key-value pairs with columns as values. A super column is a sorted array of columns, and super columns follow a hierarchy when in action: keyspace > column family > super column > column data structure in JSON.

Similar to row keys, super column data entries contain no independent values but are used to collect other columns. It is interesting to note that super column keys appearing in different rows do not necessarily match.

Q40. Define the consistency levels for write operations in Cassandra.

Ans:

ALL: Highly consistent. A write must be written to the commitlog and memtable on all replica nodes in the cluster.

EACH_QUORUM: A write must be written to the commitlog and memtable on a quorum of replica nodes in all data centers.

LOCAL_QUORUM: A write must be written to the commitlog and memtable on a quorum of replica nodes in the same data center.

ONE: A write must be written to the commitlog and memtable of at least one replica node.

TWO, THREE: Same as ONE, but for at least two and three replica nodes, respectively.

LOCAL_ONE: A write must be written to at least one replica node in the local data center.


SERIAL: Linearizable consistency to prevent unconditional updates.

LOCAL_SERIAL: Same as SERIAL, but restricted to the local data center.

Q41. What is the difference between a Column and a Super Column?

Ans: Both elements work on the principle of a tuple having a name and a value. However, the former's value is a string, while the value in the latter is a map of columns with different data types.

Unlike Columns, Super Columns do not contain the third component, the timestamp.

Q42. What is ColumnFamily?

Ans: As the name suggests, a ColumnFamily refers to a structure having an unlimited number of rows. These are referred to by a key-value pair, where the key is the name of the column and the value represents the column data. It is much like a HashMap in Java or a dictionary in Python. Remember, the rows are not limited to a predefined list of columns here. Also, the ColumnFamily is quite flexible: one row may have 100 columns while another has only 2.

Q43. Define the use of the SOURCE command in Cassandra.

Ans: The SOURCE command is used to execute a file containing CQL statements.

Q44. What is Thrift?

Ans: Thrift is a legacy RPC protocol or API unified with a code generation tool for CQL. The purpose of using Thrift in Cassandra is to facilitate access to the DB across programming languages.

Q45. Explain Tombstone in Cassandra.

Ans: A tombstone is a row marker indicating a column deletion. These marked columns are deleted during compaction. Tombstones are of great significance because Cassandra supports eventual consistency, where the data must respond before any successful operation.

Q46. What platforms does Cassandra run on?

Ans: Since Cassandra is a Java application, it can run on any Java-driven platform, Java Runtime Environment (JRE) or Java Virtual Machine (JVM). Cassandra also runs on RedHat, CentOS, Debian and Ubuntu Linux platforms.

Q47. Name the ports Cassandra uses.

Ans: By default, Cassandra uses port 7000 for cluster management, 9160 for Thrift clients, and 8080 for JMX. These are all TCP ports and can be edited in the configuration file: bin/cassandra.in.sh

Q48. Can you add or remove Column Families in a working cluster?

Ans: Yes, but keep in mind the following steps.

Do not forget to clear the commitlog with 'nodetool drain'

Turn off Cassandra to check that there is no data left in the commitlog

Delete the sstable files for the removed CFs

Q49. What is the Replication Factor in Cassandra?

Ans: ReplicationFactor is the measure of the number of data copies existing. It is important to increase the replication factor to log into the cluster.

Q50. Can we change the Replication Factor on a live cluster?

Ans: Yes, but it will require running repair to adjust the replica count of existing data.

Q51. How do you iterate over all rows in a ColumnFamily?

Ans: Using get_range_slices. You can start iteration with the empty string, and after each iteration, the last key read serves as the start key for the next iteration.
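The paging pattern can be sketched without a live cluster. In the example below, `fetch_range` is a hypothetical in-memory stand-in for a get_range_slices call, and rows are ordered by key for simplicity (a real cluster may order rows by token instead). It assumes the start key is inclusive, so the first row of every page after the first repeats the last row already seen.

```python
import bisect

ROWS = {"k1": "a", "k2": "b", "k3": "c", "k4": "d", "k5": "e"}
SORTED_KEYS = sorted(ROWS)


def fetch_range(start_key, count):
    """Hypothetical stand-in: up to `count` rows with key >= start_key."""
    start = bisect.bisect_left(SORTED_KEYS, start_key)
    return [(k, ROWS[k]) for k in SORTED_KEYS[start:start + count]]


def iterate_all_rows(page_size=3):
    """Yield every row once, using the last key read as the next start key."""
    if page_size < 2:
        raise ValueError("page_size must be at least 2 to make progress")
    start_key = ""                    # the empty string means "from the beginning"
    first_page = True
    while True:
        page = fetch_range(start_key, page_size)
        rows = page if first_page else page[1:]   # inclusive start: skip the repeat
        for row in rows:
            yield row
        if len(page) < page_size:
            break                     # a short page means the range is exhausted
        start_key = page[-1][0]
        first_page = False


print([key for key, _ in iterate_all_rows()])
```

The page size needs to be at least 2 here, because one slot of each later page is spent on the repeated start key.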