Top 35 Apache Kafka Interview Questions - Jul 25, 2022

Q1. What Is Mod_vhost_alias?

This module creates dynamically configured virtual hosts by allowing the IP address and/or the Host: header of the HTTP request to be used as part of the pathname to determine which documents to serve. This allows easy use of a large number of virtual hosts with similar configurations.

Q2. List The Various Components In Kafka?

The four main components of Kafka are:

Topic – a stream of messages belonging to the same category

Producer – publishes messages to a topic

Broker – a set of servers where the published messages are stored

Consumer – subscribes to various topics and pulls data from the brokers.

Q3. When Do You Call The Cleanup Method?

The cleanup method is called when a Bolt is being shut down and should release any resources that were opened. There is no guarantee that this method will be called on the cluster: for instance, if the machine the task is running on blows up, there is no way to invoke it. The cleanup method is intended for when you run topologies in local mode (where a Storm cluster is simulated in-process) and you want to be able to run and kill many topologies without suffering any resource leaks.
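
For illustration, here is a minimal sketch of a bolt that releases a resource in cleanup(), assuming Storm 2.x's org.apache.storm API and a hypothetical output file path:

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.util.Map;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Tuple;

public class FileWritingBolt extends BaseRichBolt {
    private transient BufferedWriter writer;
    private transient OutputCollector collector;

    @Override
    public void prepare(Map<String, Object> conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
        try {
            // Hypothetical path; opened once per worker when the bolt is prepared.
            writer = new BufferedWriter(new FileWriter("/tmp/bolt-output.txt", true));
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    public void execute(Tuple input) {
        try {
            writer.write(input.getString(0));
            writer.newLine();
            collector.ack(input);
        } catch (IOException e) {
            collector.fail(input);
        }
    }

    @Override
    public void cleanup() {
        // Called on shutdown (reliably only in local mode): release open resources here.
        try {
            if (writer != null) writer.close();
        } catch (IOException e) {
            // Nothing more to do while shutting down.
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // This bolt emits nothing downstream.
    }
}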

Q4. Mention What Is The Traditional Method Of Message Transfer?

The traditional method of message transfer includes two approaches:

Queuing: In the queuing model, a pool of consumers may read messages from the server, and each message goes to one of them

Publish-Subscribe: In this model, messages are broadcast to all consumers

Kafka offers a single consumer abstraction that generalizes both of the above: the consumer group.

Q5. Explain The Role Of The Kafka Producer Api.

The role of Kafka's Producer API is to wrap the two producers – kafka.producer.SyncProducer and kafka.producer.async.AsyncProducer. The goal is to expose all of the producer functionality through a single API to the client.
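
In newer Kafka clients the same idea is exposed through the unified Java producer. A minimal sketch, assuming the org.apache.kafka.clients.producer API, with an illustrative broker address and topic name:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class SimpleProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // assumed broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            // Topic name and key are illustrative only.
            producer.send(new ProducerRecord<>("demo-topic", "key-1", "hello kafka"));
        }
    }
}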

Q6. Why Replication Is Required In Kafka?

Replication of messages in Kafka ensures that any published message is not lost and can be consumed in the event of machine error, program error, or routine software upgrades.

Q7. What Is The Maximum Size Of A Message That The Kafka Server Can Receive?

The maximum size of a message that the Kafka server can receive is 1,000,000 bytes.

Q8. Which Components Are Used For Stream Flow Of Data?

Bolt:- Bolts represent the processing logic unit in Storm. One can use bolts to do any kind of processing, such as filtering, aggregating, joining, interacting with data stores, talking to external systems, and so on. Bolts can also emit tuples (data messages) for subsequent bolts to process. Additionally, bolts are responsible for acknowledging the processing of tuples once they are done processing them.

Spout:- Spouts represent the source of data in Storm. You can write spouts to read data from data sources such as databases, distributed file systems, messaging frameworks, and so on. Spouts can broadly be classified as follows –

Reliable:- These spouts have the capability to replay the tuples (a unit of data in the data stream). This helps applications achieve 'at least once message processing' semantics, since in case of failures tuples can be replayed and processed again. Spouts that fetch data from messaging frameworks are generally reliable, as these frameworks provide a mechanism to replay the messages.

Unreliable:- These spouts do not have the capability to replay the tuples. Once a tuple is emitted, it cannot be replayed regardless of whether it was processed successfully or not. This kind of spout follows 'at most once message processing' semantics.

Tuple:- The tuple is the main data structure in Storm. A tuple is a named list of values, where each value can be of any type. Tuples are dynamically typed — the types of the fields do not need to be declared. Tuples have helper methods like getInteger and getString to get field values without having to cast the result. Storm needs to know how to serialize all the values in a tuple. By default, Storm knows how to serialize the primitive types, strings, and byte arrays. If you want to use another type, you will need to implement and register a serializer for that type.
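
As an illustration of bolts and tuple helper methods, here is a sketch assuming Storm 2.x's org.apache.storm API; the field positions, types, and processing rule are made up for the example:

import java.util.Map;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

public class OrderTotalBolt extends BaseRichBolt {
    private transient OutputCollector collector;

    @Override
    public void prepare(Map<String, Object> conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(Tuple input) {
        // Helper methods read fields without casting; positions are assumed here.
        String item = input.getString(0);
        Integer quantity = input.getInteger(1);
        collector.emit(new Values(item, quantity * 2));   // emit a tuple for downstream bolts
        collector.ack(input);                             // acknowledge successful processing
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("item", "doubled"));
    }
}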

Q9. How Can We Kill A Topology?

To kill a topology, simply run:

storm kill stormname

Give the same name to storm kill as you used when submitting the topology. Storm won't kill the topology right away. Instead, it deactivates all the spouts so that they don't emit any more tuples, and then Storm waits Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS seconds before destroying all the workers. This gives the topology enough time to finish any tuples it was processing when it got killed.

Q10. Explain When Using Field Grouping In Storm, Is There Any Time-out Or Limit To Known Field Values?

Field grouping in Storm uses a mod hash function to decide which task to send a tuple to, ensuring that tuples with the same field value are always processed by the same task. Because of that, you don't require any cache, so there is no time-out or limit to known field values.
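
A sketch of declaring a field grouping with TopologyBuilder, assuming Storm's Java API; WordSpout and WordCountBolt are hypothetical components that emit and consume a "word" field:

import org.apache.storm.generated.StormTopology;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;

public class FieldGroupingTopology {
    // WordSpout and WordCountBolt are hypothetical classes for this sketch.
    public static StormTopology build() {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("words", new WordSpout());
        // Tuples with the same "word" value are hashed (mod the task count) to the
        // same WordCountBolt task, so no per-value cache or timeout is needed.
        builder.setBolt("counter", new WordCountBolt(), 4)
               .fieldsGrouping("words", new Fields("word"));
        return builder.createTopology();
    }
}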

Q11. Explain How You Can Improve The Throughput Of A Remote Consumer?

If the consumer is located in a different data center from the broker, you may need to tune the socket buffer size to amortize the long network latency.
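
With the Java consumer, the relevant knob is receive.buffer.bytes (the TCP receive buffer). A hedged sketch with an illustrative value and addresses:

import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class RemoteConsumerConfig {
    public static KafkaConsumer<String, String> create() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "remote-dc-broker:9092");   // assumed address
        props.put("group.id", "remote-readers");                    // illustrative group
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        // Larger TCP receive buffer to keep a long, high-latency link full
        // (value is illustrative; tune to bandwidth * round-trip time).
        props.put("receive.buffer.bytes", 1024 * 1024);
        return new KafkaConsumer<>(props);
    }
}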

Q12. Does Apache Act As A Proxy Server?

Yes, it acts as a proxy server as well, using the mod_proxy module. This module implements a proxy, gateway, or cache for Apache. It implements proxying capability for AJP13 (Apache JServ Protocol version 1.3), FTP, CONNECT (for SSL), HTTP/0.9, HTTP/1.0, and (since Apache 1.3.23) HTTP/1.1. The module can be configured to connect to other proxy modules for these and other protocols.

Q13. What Are The Key Benefits Of Using Storm For Real Time Processing?

Easy to operate: Operating Storm is quite easy

Real fast: It can process 100 messages per second per node

Fault tolerant: It detects faults automatically and restarts the functional attributes

Reliable: It guarantees that each unit of data will be processed at least once or exactly once

Scalable: It runs across a cluster of machines.

Q14. What Is Struts And Explain Its Purpose?

Struts is an open source framework for creating Java web applications.

Q15. What Does It Indicate If Replica Stays Out Of Isr For A Long Time?

If a replica stays out of the ISR for an extended period of time, it indicates that the follower is unable to fetch data as fast as data accumulates at the leader.

Q16. Mention What Is Apache Kafka?

Apache Kafka is a publish-subscribe messaging system developed by Apache and written in Scala. It is a distributed, partitioned, and replicated log service.

Q17. Explain How You Can Reduce Churn In Isr? When Does Broker Leave The Isr?

The ISR is a set of message replicas that are completely synced up with the leader; in other words, the ISR contains all messages that are committed. The ISR should always include all replicas until there is a real failure. A replica will be dropped out of the ISR if it deviates from the leader.

Q18. Explain How You Can Get Exactly Once Messaging From Kafka During Data Production?

During data production, to get exactly-once messaging from Kafka you have to follow two things: avoiding duplicates during data consumption and avoiding duplication during data production.

Here are the two ways to get exactly-once semantics during data production:

Use a single writer per partition; whenever you get a network error, check the last message in that partition to see if your last write succeeded

Include a primary key (UUID or something) in the message and de-duplicate on the consumer, as sketched below
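
A sketch of the consumer-side de-duplication idea, assuming the standard Java consumer; the topic, group, and the in-memory set of seen keys are illustrative only:

import java.time.Duration;
import java.util.Collections;
import java.util.HashSet;
import java.util.Properties;
import java.util.Set;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class DedupConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // assumed broker
        props.put("group.id", "dedup-demo");                 // illustrative group
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        Set<String> seenKeys = new HashSet<>();              // in production this would be durable
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("payments"));   // illustrative topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    // The producer put a UUID in the key; skip anything already handled.
                    if (seenKeys.add(record.key())) {
                        System.out.println("processing " + record.value());
                    }
                }
            }
        }
    }
}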

Q19. Explain The Role Of The Offset?

Messages contained within the partitions are assigned a unique ID number called the offset. The role of the offset is to uniquely identify each message within the partition.

Q20. What Is Zeromq?

ZeroMQ is "a library which extends the standard socket interfaces with features traditionally provided by specialized messaging middleware products". Storm relies on ZeroMQ primarily for task-to-task communication in running Storm topologies.

Q21. Is It Possible To Get The Message Offset After Producing?

You cannot do that from a class that behaves as a producer, as in most queue systems; its role is to fire and forget the messages. The broker will do the rest of the work, such as appropriate metadata handling with IDs, offsets, and so on.

As a consumer of the message, you can get the offset from a Kafka broker. If you look in the SimpleConsumer class, you will notice that it fetches MultiFetchResponse objects that include offsets as a list. In addition, when you iterate over the Kafka messages, you will have MessageAndOffset objects that include both the offset and the message sent.
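
With the newer Java consumer API (shown here as an assumed alternative to SimpleConsumer), every polled record carries its partition and offset; a minimal sketch with illustrative topic and group names:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class OffsetPrinter {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // assumed broker
        props.put("group.id", "offset-demo");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("demo-topic"));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
            for (ConsumerRecord<String, String> record : records) {
                // Each record carries its partition and offset alongside the payload.
                System.out.printf("partition=%d offset=%d value=%s%n",
                        record.partition(), record.offset(), record.value());
            }
        }
    }
}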

Q22. Mention What Is The Meaning Of Broker In Kafka?

In a Kafka cluster, the term broker is used to refer to a server.

Q23. In Which Folder Are Java Applications Stored In Apache?

Java applications are not stored in Apache; Apache can only be connected to a separate Java webapp hosting webserver using the mod_jk connector.

Q24. How Do You Define A Partitioning Key?

Within the Producer, the role of a partitioning key is to indicate the destination partition of the message. By default, a hashing-based Partitioner is used to determine the partition ID given the key. Alternatively, users can also use customized Partitioners.
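
A sketch of a customized Partitioner, assuming the Java client's org.apache.kafka.clients.producer.Partitioner interface; the routing rule is purely illustrative. It would be registered on the producer via the partitioner.class property:

import java.util.Map;
import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;

public class RegionPartitioner implements Partitioner {
    @Override
    public void configure(Map<String, ?> configs) {
        // No configuration needed for this sketch.
    }

    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        int numPartitions = cluster.partitionsForTopic(topic).size();
        // Illustrative rule: pin keys starting with "EU-" to partition 0,
        // and spread everything else by key hash.
        if (key != null && key.toString().startsWith("EU-")) {
            return 0;
        }
        int hash = key == null ? 0 : key.hashCode();
        return (hash & Integer.MAX_VALUE) % numPartitions;
    }

    @Override
    public void close() {
        // Nothing to release.
    }
}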

Q25. What Are The Benefits Of Apache Kafka Over The Traditional Technique?

Apache Kafka has the following benefits over the traditional messaging technique:

Fast: A single Kafka broker can serve thousands of clients by handling megabytes of reads and writes per second

Scalable: Data is partitioned and streamlined over a cluster of machines to enable larger data

Durable: Messages are persistent and replicated within the cluster to prevent data loss

Distributed by Design: It provides fault tolerance guarantees and durability

Q26. Explain How Message Is Consumed By Consumer In Kafka?

Transfer of messages in Kafka is done by using the sendfile API. It enables the transfer of bytes from disk to the socket via kernel space, saving extra copies and avoiding round trips between kernel space and user space.

Q27. While Installing, Why Does Apache Have Three Config Files - Srm.Conf, Access.Conf And Httpd.Conf?

The first two are remnants from the NCSA times, and usually you should be fine if you delete the first two and stay with httpd.conf.

Q28. What Is Combiner Aggregator?

A CombinerAggregator is used to combine a set of tuples into a single field. It has the following signature:

public interface CombinerAggregator<T> {
    T init(TridentTuple tuple);
    T combine(T val1, T val2);
    T zero();
}
 

Storm calls the init() method with each tuple, and then repeatedly calls the combine() method until the partition is processed. The values passed into the combine() method are partial aggregations, the result of combining the values returned by calls to init().
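
A minimal sketch of an implementation, assuming Trident's org.apache.storm.trident packages; it sums the first field of each tuple:

import org.apache.storm.trident.operation.CombinerAggregator;
import org.apache.storm.trident.tuple.TridentTuple;

public class Sum implements CombinerAggregator<Long> {
    // Called once per tuple: extract the partial value to aggregate.
    public Long init(TridentTuple tuple) {
        return tuple.getLong(0);
    }

    // Called repeatedly to merge two partial aggregations.
    public Long combine(Long val1, Long val2) {
        return val1 + val2;
    }

    // Returned when a partition contains no tuples.
    public Long zero() {
        return 0L;
    }
}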

Q29. Explain What Is Zookeeper In Kafka? Can We Use Kafka Without Zookeeper?

ZooKeeper is an open source, high-performance coordination service used for distributed applications and adopted by Kafka.

No, it is not possible to bypass ZooKeeper and connect directly to the Kafka broker. Once ZooKeeper is down, it cannot serve client requests.

ZooKeeper is basically used to communicate between different nodes in a cluster

In Kafka, it is used to commit offsets, so if a node fails the offset can be retrieved from the previously committed offset

Apart from this, it also does other activities such as leader detection, distributed synchronization, configuration management, identifying when a new node joins or leaves the cluster, node status in real time, etc.

Q30. Explain How To Write The Output Into A File Using Storm?

In the Spout, when you are reading a file, create the FileReader object in the open() method, so that it is initialized once per worker node, and then use that object in the nextTuple() method.
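
A sketch of that pattern, assuming Storm 2.x's org.apache.storm API and a hypothetical input file path:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.Map;
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;

public class FileLineSpout extends BaseRichSpout {
    private transient BufferedReader reader;
    private transient SpoutOutputCollector collector;

    @Override
    public void open(Map<String, Object> conf, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
        try {
            // Initialize the reader once per worker, in open(), not in nextTuple().
            reader = new BufferedReader(new FileReader("/tmp/input.txt"));   // hypothetical path
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    public void nextTuple() {
        try {
            String line = reader.readLine();
            if (line != null) {
                collector.emit(new Values(line));   // one tuple per line
            }
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("line"));
    }
}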

Q31. Is It Necessary To Kill The Topology While Updating The Running Topology?

Yes, to update a running topology, the only option currently is to kill the current topology and resubmit a new one. A planned feature is a Storm swap command that swaps a running topology with a new one, ensuring minimal downtime and no chance of both topologies processing tuples at the same time.

Q32. Mention What Is The Difference Between Apache Kafka And Apache Storm?

Apache Kafka: It is a distributed and robust messaging system that can handle a huge amount of data and allows the passage of messages from one end-point to another.

Apache Storm: It is a real-time message processing system, and you can edit or manipulate data in real time. Apache Storm pulls the data from Kafka and applies the required manipulation.

Q33. In The Producer, When Does Queuefullexception Occur?

QueueFullException typically occurs when the Producer attempts to send messages at a pace that the Broker cannot handle. Since the Producer does not block, users will need to add enough brokers to collaboratively handle the increased load.

Q34. Mention What Happens If The Preferred Replica Is Not In The Isr?

If the preferred replica is not in the ISR, the controller will fail to move leadership to the preferred replica.

Q35. Explain The Concept Of Leader And Follower?

Every partition in Kafka has one server that plays the role of the Leader and zero or more servers that act as Followers. The Leader performs all read and write requests for the partition, while the role of the Followers is to passively replicate the Leader. In the event of the Leader failing, one of the Followers takes over the role of the Leader. This ensures load balancing of the server.



