Apache Kafka Interview Questions and Answers
Q1. What is Apache Kafka?
Ans: Apache Kafka is a publish-subscribe messaging system developed by Apache and written in Scala. It is a distributed, partitioned, and replicated log service.
Q2. What is the traditional method of message transfer?
Ans: The traditional method of message transfer includes two models:
Queuing: In queuing, a pool of consumers may read messages from the server, and each message goes to one of them.
Publish-Subscribe: In this model, messages are broadcast to all consumers.
Kafka offers a single consumer abstraction that generalizes both of the above: the consumer group.
Q3. What are the advantages of Apache Kafka over the traditional technique?
Ans: Apache Kafka has the following advantages over traditional messaging methods:
Fast: A single Kafka broker can serve thousands of clients by handling megabytes of reads and writes per second.
Scalable: Data is partitioned and spread over a cluster of machines to handle larger data volumes.
Durable: Messages are persistent and replicated within the cluster to prevent data loss.
Distributed by Design: It provides fault-tolerance guarantees and durability.
Q4. What is the meaning of broker in Kafka?
Ans: In a Kafka cluster, the term broker refers to a server.
Q5. Compare Kafka & Flume
Criteria            Kafka                                Flume
Data flow           Pull                                 Push
Hadoop integration  Loose                                Tight
Functionality       Publish-subscribe messaging system   System for data collection, aggregation & movement
Q6. What role does ZooKeeper play in a Kafka cluster?
Ans: Kafka is an open-source distributed system built to use ZooKeeper. ZooKeeper's basic responsibility is to coordinate the different nodes in a cluster. Offsets are also periodically committed to ZooKeeper, so that if any node fails, consumption can recover from the previously committed offset.
ZooKeeper is also responsible for configuration management, leader detection, detecting when a node leaves or joins the cluster, synchronization, and so on.
Q7. What is Kafka?
Ans: Kafka is a message broker project written in Scala. Kafka was originally developed by LinkedIn and was open-sourced in early 2011. The aim of the project is to provide the best platform for handling real-time data feeds.
Q8. Why do you think replication is critical in Kafka?
Ans: Replication ensures that published messages remain available and can be consumed in the event of a machine error, a program fault, or frequent software upgrades.
Q9. What major role does the Kafka Producer API play?
Ans: It wraps the two producers, kafka.producer.SyncProducer and kafka.producer.async.AsyncProducer. The main aim is to expose all the producer functionality to the client through a single API.
Q10. Distinguish between Kafka and Flume?
Ans: Flume's main use case is to ingest data into Hadoop. Flume is integrated with Hadoop's monitoring system, file formats, file system, and utilities such as Morphlines. Flume's design of sinks, sources, and channels means that Flume can move data among other systems flexibly, but its main feature is its Hadoop integration.
Flume is the best option when you have non-relational data sources, or when you have a long file to stream into Hadoop. Kafka's main use case is a distributed publish-subscribe messaging system. Kafka was not developed specifically for Hadoop, and using Kafka to read and write data to Hadoop is considerably trickier than it is with Flume.
Kafka can be used when you particularly need a highly reliable and scalable enterprise messaging system to connect multiple systems, such as Hadoop.
Q11. Describe the partitioning key?
Ans: Its role is to specify the target partition of the message within the producer. Usually, a hash-based partitioner determines the partition ID from the given key. Users can also use customized partitioners.
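The idea of a hash-based partitioner, and of a customized one, can be sketched in a few lines. This is a simplified model, not Kafka's API: the function names are hypothetical, and CRC32 is only a stand-in hash chosen because it is stable across runs.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a message key to a partition ID with a stable hash."""
    # CRC32 is a stand-in for illustration; Kafka's own default
    # partitioner uses a different hash function.
    return zlib.crc32(key) % num_partitions


def custom_partition(key: bytes, num_partitions: int) -> int:
    """Hypothetical custom rule: pin keys starting with b'vip-' to partition 0."""
    if key.startswith(b"vip-"):
        return 0
    return choose_partition(key, num_partitions)


# The same key always lands on the same partition.
first = choose_partition(b"user-42", 6)
second = choose_partition(b"user-42", 6)
```

Because the mapping depends only on the key and the partition count, all messages with the same key stay in order within one partition.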
Q12. Inside the producer, when does the QueueFullException occur?
Ans: QueueFullException typically occurs when the producer attempts to send messages at a pace the broker cannot handle. Since the producer does not block, users need to add enough brokers to collaboratively handle the increased load.
Q13. Can Kafka be used without ZooKeeper?
Ans: In the ZooKeeper-based Kafka versions described here, it is impossible to use Kafka without ZooKeeper, because it is not possible to bypass ZooKeeper and connect directly to the server. If ZooKeeper is down for any reason, no client request can be served. Newer Kafka releases can instead run in KRaft mode, which does not depend on ZooKeeper.
Q14. What are consumers or users?
Ans: Kafka provides a single consumer abstraction that covers both queuing and publish-subscribe: the consumer group. Consumers tag themselves with a consumer group name, and each message published to a topic is delivered to one consumer instance within each subscribing consumer group. Consumer instances can run in separate processes. The messaging model of the consumer is determined by the consumer groups:
If all consumer instances have the same consumer group, then this works like a traditional queue balancing load over the consumers.
If all consumer instances have different consumer groups, then this works like publish-subscribe and all messages are broadcast to all the consumers.
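The two cases above can be illustrated with a small in-memory simulation. It is a sketch under simplifying assumptions: the `deliver` function is hypothetical, and round-robin selection inside a group stands in for Kafka's partition-based assignment.

```python
from collections import defaultdict


def deliver(messages, consumers):
    """Deliver each message to one consumer instance per consumer group.

    `consumers` maps a consumer name to its group name. Inside a group,
    instances are picked round-robin (a simplification of Kafka's
    partition-based assignment).
    """
    groups = defaultdict(list)
    for name, group in consumers.items():
        groups[group].append(name)

    received = defaultdict(list)
    for i, msg in enumerate(messages):
        for members in groups.values():
            received[members[i % len(members)]].append(msg)
    return dict(received)


msgs = ["m0", "m1", "m2", "m3"]

# Same group: the messages are split between the two consumers (queue).
queue_like = deliver(msgs, {"c1": "g", "c2": "g"})

# Different groups: every consumer sees every message (publish-subscribe).
pubsub_like = deliver(msgs, {"c1": "g1", "c2": "g2"})
```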
Q15. Describe an offset?
Ans: The messages in a partition are each given a sequential ID number called an offset, which uniquely identifies each message within the partition. With the aid of ZooKeeper, Kafka stores the offsets of messages consumed for a specific topic and partition by each consumer group. (Newer Kafka versions store committed offsets in an internal Kafka topic instead.)
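A toy model may make the relationship between offsets and committed positions more concrete. Everything here is an assumption for illustration: `PartitionLog` is a hypothetical class, and the `committed` dictionary stands in for wherever the cluster actually keeps committed offsets.

```python
class PartitionLog:
    """Append-only log: a message's offset is its position in the log."""

    def __init__(self):
        self._messages = []

    def append(self, message):
        """Store a message and return the offset it was assigned."""
        self._messages.append(message)
        return len(self._messages) - 1

    def read_from(self, offset):
        """Return (offset, message) pairs starting at the given offset."""
        return [(o, m) for o, m in enumerate(self._messages) if o >= offset]


log = PartitionLog()
for text in ["a", "b", "c", "d"]:
    log.append(text)

committed = {}  # (group, topic, partition) -> next offset to read
key = ("billing", "orders", 0)

# First run: process two messages, then commit the next offset to read.
first_run = log.read_from(committed.get(key, 0))[:2]
committed[key] = first_run[-1][0] + 1

# After a restart, the group resumes from the committed offset.
resumed = log.read_from(committed[key])
```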
Q16. What do you know about the partitioning key?
Ans: A partition key can be specified to point to the target partition of a message in the Kafka producer. Usually, a hash-based partitioner determines the partition ID from the input, and users can also use customized partitioners.
Q17. Why is Kafka technology significant to use?
Ans: Kafka, being a distributed publish-subscribe system, has the following advantages:
Fast: Kafka comprises brokers, and a single broker can serve thousands of clients by handling megabytes of reads and writes per second.
Scalable: Data is partitioned and spread over a cluster of machines to handle larger data volumes.
Durable: Messages are persistent and replicated within the cluster to prevent data loss.
Distributed by Design: It provides fault-tolerance guarantees and robustness.
Q18. What is the maximum size of a message that a Kafka server can receive?
Ans: By default, the maximum size of a message that a Kafka server can receive is 1000000 bytes; the limit can be changed in the broker configuration.
Q19. What are the elements of Kafka?
Ans: The most important elements of Kafka are:
Topic – a group of messages of a similar kind
Producer – publishes messages to a topic
Consumer – subscribes to a variety of topics and pulls data from brokers
Broker – the place where published messages are stored
Q20. Explain what ZooKeeper is in Kafka? Can we use Kafka without ZooKeeper?
Ans: ZooKeeper is an open-source, high-performance coordination service for distributed applications, adopted by Kafka.
No, in a ZooKeeper-based deployment it is not possible to bypass ZooKeeper and connect directly to the Kafka broker. Once ZooKeeper is down, client requests cannot be served.
ZooKeeper is basically used to communicate between different nodes in a cluster.
In Kafka, it is used to commit offsets, so if a node fails, consumption can resume from the previously committed offset.
Apart from this, it also performs other activities such as leader detection, distributed synchronization, configuration management, identifying when a node leaves or joins the cluster, and tracking node status in real time.
Q21. Explain how messages are consumed by a consumer in Kafka?
Ans: Transfer of messages in Kafka is done using the sendfile API. It enables the transfer of bytes from disk to the socket within kernel space, saving the copies and calls between kernel space and user space.
Q22. Explain how you can improve the throughput of a remote consumer?
Ans: If the consumer is located in a different data center from the broker, you may need to tune the socket buffer size to amortize the long network latency.
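One common way to pick a starting value for the buffer is the bandwidth-delay product: the number of bytes that must be in flight to keep a link busy for one round trip. The sketch below only does that arithmetic; the function name and the example link figures (100 Mbit/s, 80 ms round trip) are assumptions, not values from Kafka.

```python
def socket_buffer_bytes(bandwidth_bits_per_s: int, rtt_ms: int) -> int:
    """Bandwidth-delay product in bytes for a given link speed and round trip."""
    # bits per second * seconds = bits in flight; divide by 8 for bytes.
    # Integer arithmetic: (bits/s * ms) / 1000 ms-per-s / 8 bits-per-byte.
    return bandwidth_bits_per_s * rtt_ms // (1000 * 8)


# Hypothetical cross-data-center link: 100 Mbit/s with an 80 ms round trip.
size = socket_buffer_bytes(100_000_000, 80)
```

A buffer much smaller than this value forces the sender to wait for acknowledgements before the link is full, which is what limits throughput on high-latency connections.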
Q23. Explain how you can get exactly-once messaging from Kafka during data production?
Ans: To get exactly-once messaging from Kafka, you need to do two things: avoid duplicates during data consumption and avoid duplication during data production.
Here are two ways to get exactly-once semantics during data production:
Use a single writer per partition, and every time you get a network error, check the last message in that partition to see if your last write succeeded.
Include a primary key (UUID or something) in the message and de-duplicate on the consumer.
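The second approach, a primary key in the message plus de-duplication on the consumer, might look roughly like this. It is a minimal in-memory sketch: `make_message` and `DedupConsumer` are hypothetical names, and a real consumer would need to bound or persist the set of seen IDs.

```python
import uuid


def make_message(payload):
    """Attach a unique ID so that retries of the same message can be detected."""
    return {"id": str(uuid.uuid4()), "payload": payload}


class DedupConsumer:
    """Processes each message ID at most once."""

    def __init__(self):
        self._seen = set()   # IDs already processed (kept in memory here)
        self.processed = []  # payloads that were actually handled

    def handle(self, message):
        """Return True if the message was processed, False if it was a duplicate."""
        if message["id"] in self._seen:
            return False
        self._seen.add(message["id"])
        self.processed.append(message["payload"])
        return True


msg = make_message("order-1")
consumer = DedupConsumer()

# The second call simulates the producer re-sending after a network error.
results = [consumer.handle(msg), consumer.handle(msg)]
```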
Q24. Explain how you can reduce churn in the ISR? When does a broker leave the ISR?
Ans: The ISR (in-sync replicas) is the set of message replicas that are completely synced up with the leader; in other words, the ISR has all messages that are committed. The ISR should always include all replicas until there is a real failure. A replica will be dropped out of the ISR if it deviates from the leader, that is, if it falls too far behind.
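The rule that a lagging replica drops out of the ISR can be modeled as a simple filter. This is an illustrative sketch only: the function name and the offset-based `max_lag` threshold are assumptions, and real brokers apply their own configurable lag criteria.

```python
def in_sync_replicas(leader_end_offset, follower_offsets, max_lag):
    """Return the followers whose lag behind the leader is within max_lag."""
    return sorted(
        replica
        for replica, offset in follower_offsets.items()
        if leader_end_offset - offset <= max_lag
    )


# broker-2 is 2 messages behind; broker-3 is 600 behind and drops out.
isr = in_sync_replicas(1000, {"broker-2": 998, "broker-3": 400}, max_lag=10)
```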
Q25. Why is replication required in Kafka?
Ans: Replication of messages in Kafka ensures that any published message is not lost and can be consumed in case of machine error, program error, or, more commonly, software upgrades.
Q26. What does it indicate if a replica stays out of the ISR for a long time?
Ans: If a replica stays out of the ISR for an extended time, it indicates that the follower is unable to fetch data as fast as data accumulates at the leader.
Q27. What happens if the preferred replica is not in the ISR?
Ans: If the preferred replica is not in the ISR, the controller will fail to move leadership to the preferred replica.
Q28. Is it possible to get the message offset after producing?
Ans: With the legacy producer, you cannot do this from a class that behaves as a producer; as in most queue systems, its role is to fire and forget the messages. The broker does the rest of the work, such as appropriate metadata handling with IDs, offsets, and so on. (The newer Java producer does return the offset of a sent record in its RecordMetadata.)
As a consumer of the message, you can get the offset from a Kafka broker. If you look in the SimpleConsumer class, you will notice it fetches MultiFetchResponse objects that include offsets as a list. In addition, when you iterate over the Kafka messages, you will get MessageAndOffset objects that include both the offset and the message sent.
Q29. Elaborate on the Kafka architecture.
Ans: A cluster contains multiple brokers since it is a distributed system. Each topic in the system is divided into multiple partitions, and each broker stores one or more of those partitions so that multiple producers and consumers can publish and retrieve messages at the same time.
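How a topic's partitions end up spread over brokers can be pictured with a round-robin placement. This is a deliberately simplified sketch: `assign_partitions` is a hypothetical helper, and it ignores replication and whatever placement strategy a real cluster uses.

```python
def assign_partitions(num_partitions, brokers):
    """Spread partition IDs over brokers in round-robin order."""
    assignment = {broker: [] for broker in brokers}
    for partition in range(num_partitions):
        owner = brokers[partition % len(brokers)]
        assignment[owner].append(partition)
    return assignment


# A topic with six partitions on a three-broker cluster.
layout = assign_partitions(6, ["b1", "b2", "b3"])
```

With each broker holding a different subset of partitions, producers and consumers working on different partitions talk to different machines in parallel.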
Q30. How do you start a Kafka server?
Ans: Given that Kafka uses ZooKeeper, we have to start the ZooKeeper server first.
One can use the convenience script packaged with Kafka to get a crude but effective single-node ZooKeeper instance:
> bin/zookeeper-server-start.sh config/zookeeper.properties
Now the Kafka server can start:
> bin/kafka-server-start.sh config/server.properties