Interview Questions.

Interview Questions For Apache Storm - Jul 17, 2022



Q1. Compare Spark & Storm





Criteria                     Spark                        Storm
Data operation               Data at rest                 Data in motion
Parallel computation         Data parallel                Task parallel
Latency                      Few seconds                  Sub-second
Deploying the application    Using Scala, Java, Python    Using Java API

Q2. Which components are used for the stream flow of data?

Ans: Three components are used for streaming data flow:

Bolt: Bolts represent the processing logic unit in Storm. One can use bolts to do any kind of processing, such as filtering, aggregating, joining, interacting with data stores, and talking to external systems. Bolts can also emit tuples (data messages) for subsequent bolts to process. Additionally, bolts are responsible for acknowledging the processing of tuples once they are done processing them.

Spout: Spouts represent the source of data in Storm. You can write spouts to read data from data sources such as databases, distributed file systems, and messaging frameworks. Spouts can broadly be classified into the following:

Reliable: These spouts have the capability to replay tuples (a tuple is a unit of data in a data stream). This helps applications achieve "at least once message processing" semantics, because in case of failures tuples can be replayed and processed again. Spouts that fetch data from messaging frameworks are generally reliable, as these frameworks provide the mechanism to replay messages.

Unreliable: These spouts don't have the capability to replay tuples. Once a tuple is emitted, it cannot be replayed, regardless of whether it was processed successfully or not. This type of spout follows "at most once message processing" semantics.

Tuple: The tuple is the main data structure in Storm. A tuple is a named list of values, where each value can be of any type. Tuples are dynamically typed; the types of the fields do not need to be declared. Tuples have helper methods like getInteger and getString to get field values without having to cast the result. Storm needs to know how to serialize all the values in a tuple. By default, Storm knows how to serialize the primitive types, strings, and byte arrays. If you want to use another type, you'll need to implement and register a serializer for that type.
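
The difference between reliable and unreliable spouts comes down to whether an emitted tuple is remembered until it is acknowledged. The following is a minimal plain-Java sketch of that idea, not Storm's spout API: the class ReliableSpoutSketch and its methods nextTuple, ack, fail, messageFor and pendingCount are hypothetical names chosen for illustration. An unreliable spout would simply skip the pending map.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical stand-in for a reliable spout; this is not Storm's spout interface.
class ReliableSpoutSketch {
    private final Queue<String> source = new ArrayDeque<>();
    // Tuples that were emitted but not yet acked or failed.
    private final Map<Integer, String> pending = new HashMap<>();
    private int nextId = 0;

    ReliableSpoutSketch(String... messages) {
        for (String m : messages) {
            source.add(m);
        }
    }

    // Emits the next message and remembers it under a message id.
    // Returns the id, or -1 when there is nothing left to emit.
    int nextTuple() {
        String msg = source.poll();
        if (msg == null) {
            return -1;
        }
        int id = nextId++;
        pending.put(id, msg);
        return id;
    }

    String messageFor(int id) {
        return pending.get(id);
    }

    // The tuple was fully processed, so it can be forgotten.
    void ack(int id) {
        pending.remove(id);
    }

    // The tuple failed or timed out, so it is queued again for replay.
    void fail(int id) {
        String msg = pending.remove(id);
        if (msg != null) {
            source.add(msg);
        }
    }

    int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        ReliableSpoutSketch spout = new ReliableSpoutSketch("a", "b");
        int first = spout.nextTuple();   // emits "a"
        spout.fail(first);               // "a" is queued again, behind "b"
        int second = spout.nextTuple();  // emits "b"
        spout.ack(second);
        int replay = spout.nextTuple();  // emits "a" a second time
        System.out.println(spout.messageFor(replay)); // prints "a"
    }
}
```

Because "a" is emitted twice, downstream processing may see it more than once, which is what "at least once" means in practice.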

Q3. What are the key benefits of using Storm for real-time processing?

Easy to operate: Operating Storm is quite easy.

Really fast: A benchmark cited by the Storm project clocked it at over a million tuples processed per second per node.

Fault tolerant: It detects faults automatically and restarts the affected functional components.

Reliable: It guarantees that each unit of data will be processed at least once or exactly once.

Scalable: It runs across a cluster of machines.

Q4. Does Apache act as a proxy server?

Ans: Yes, it can also act as a proxy by using the mod_proxy module. This module implements a proxy, gateway, or cache for Apache. It implements proxying capability for AJP13 (Apache JServ Protocol version 1.3), FTP, CONNECT (for SSL), HTTP/0.9, HTTP/1.0, and (since Apache 1.3.23) HTTP/1.1. The module can be configured to connect to other proxy modules for these and other protocols.

Q5. What is the use of Zookeeper in Storm?

Ans: Storm uses Zookeeper for coordinating the cluster. Zookeeper is not used for message passing, so the load that Storm places on Zookeeper is quite low. Single-node Zookeeper clusters should be sufficient for most cases, but if you want failover or are deploying large Storm clusters, you may want larger Zookeeper clusters. A few notes about Zookeeper deployment:

It's critical that you run Zookeeper under supervision, since Zookeeper is fail-fast and will exit the process if it encounters any error case.

It's critical that you set up a cron job to compact Zookeeper's data and transaction logs. The Zookeeper daemon does not do this on its own, and if you don't set up a cron job, Zookeeper will quickly run out of disk space.

Q6. What is ZeroMQ?

Ans: ZeroMQ is “a library which extends the standard socket interfaces with features traditionally provided by specialised messaging middleware products”. Storm relies on ZeroMQ primarily for task-to-task communication in running Storm topologies.

Q7. How many distinct layers are there in Storm's codebase?

Ans: There are three distinct layers to Storm's codebase.

First: Storm was designed from the very beginning to be compatible with multiple languages. Nimbus is a Thrift service and topologies are defined as Thrift structures. The usage of Thrift allows Storm to be used from any language.

Second: All of Storm's interfaces are specified as Java interfaces. So even though there's a lot of Clojure in Storm's implementation, all usage must go through the Java API. This means that every feature of Storm is always available via Java.

Third: Storm's implementation is largely in Clojure. Line-wise, Storm is about half Java code and half Clojure code. But Clojure is much more expressive, so in reality the great majority of the implementation logic is in Clojure.

Q8. What does it mean for a message to be "fully processed"?

A tuple coming off a spout can trigger many tuples to be created based on it. Consider, for example, the streaming word count topology:

TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("sentences", new KestrelSpout("kestrel.backtype.com",
                                               22133,
                                               "sentence_queue",
                                               new StringScheme()));
builder.setBolt("split", new SplitSentence(), 10)
        .shuffleGrouping("sentences");
builder.setBolt("count", new WordCount(), 20)
        .fieldsGrouping("split", new Fields("word"));

Ans: This topology reads sentences off a Kestrel queue, splits the sentences into their constituent words, and then emits for each word the number of times it has seen that word before. A tuple coming off the spout triggers many tuples being created based on it: a tuple for each word in the sentence and a tuple for the updated count for each word.

Storm considers a tuple coming off a spout "fully processed" when the tuple tree has been exhausted and every message in the tree has been processed. A tuple is considered failed when its tree of messages fails to be fully processed within a specified timeout. This timeout can be configured on a topology-specific basis using the Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS configuration and defaults to 30 seconds.

Q9. When do you call the cleanup method?

Ans: The cleanup method is called when a bolt is being shut down and should clean up any resources that were opened. There's no guarantee that this method will be called on the cluster: for example, if the machine the task is running on blows up, there's no way to invoke the method.

The cleanup method is intended for when you run topologies in local mode (where a Storm cluster is simulated in-process), and you want to be able to run and kill many topologies without suffering any resource leaks.

Q10. How can we kill a topology?

Ans: To kill a topology, simply run:

storm kill stormname

Give storm kill the same name you used when submitting the topology.

Storm won't kill the topology immediately. Instead, it deactivates all the spouts so that they don't emit any more tuples, and then waits Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS seconds before destroying all the workers. This gives the topology enough time to complete any tuples it was processing when it got killed.

Q11. What is CombinerAggregator?

Ans: A CombinerAggregator is used to combine a set of tuples into a single field. It has the following signature:

public interface CombinerAggregator<T> extends Serializable {
    T init(TridentTuple tuple);
    T combine(T val1, T val2);
    T zero();
}


Storm calls the init() method with each tuple, and then repeatedly calls the combine() method until the partition is processed. The values passed into the combine() method are partial aggregations, the result of combining the values returned by calls to init().
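
The init/combine/zero contract can be tried out without a cluster. The sketch below is plain Java under stated assumptions: Combiner is a local mirror of the three methods, a List<Object> stands in for TridentTuple, and the aggregate helper is a hypothetical driver, not how Trident schedules the calls. Starting the fold from zero() is a simplification made here so that an empty partition yields the zero value.

```java
import java.util.Arrays;
import java.util.List;

class CombinerAggregatorSketch {
    // Local mirror of the three methods; List<Object> stands in for TridentTuple.
    interface Combiner<T> {
        T init(List<Object> tuple);
        T combine(T val1, T val2);
        T zero();
    }

    // Counts tuples: every tuple contributes 1, and partial counts are summed.
    static class Count implements Combiner<Long> {
        public Long init(List<Object> tuple) { return 1L; }
        public Long combine(Long val1, Long val2) { return val1 + val2; }
        public Long zero() { return 0L; }
    }

    // Hypothetical driver: folds one partition by combining the init() value of each tuple.
    static <T> T aggregate(Combiner<T> combiner, List<List<Object>> partition) {
        T acc = combiner.zero();
        for (List<Object> tuple : partition) {
            acc = combiner.combine(acc, combiner.init(tuple));
        }
        return acc;
    }

    public static void main(String[] args) {
        List<List<Object>> partition = Arrays.asList(
                Arrays.<Object>asList("storm"),
                Arrays.<Object>asList("spark"),
                Arrays.<Object>asList("storm"));
        System.out.println(aggregate(new Count(), partition)); // prints 3
    }
}
```

Because combine() only ever sees partial aggregations, partial results computed on different partitions can be merged with the same method.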

Q12. What are the common configurations in Apache Storm?

Ans: There are a variety of configurations you can set per topology. A list of all the configurations you can set is available in the Storm documentation. The ones prefixed with "TOPOLOGY" can be overridden on a topology-specific basis (the other ones are cluster configurations and cannot be overridden). Here are some common ones that are set for a topology:

Config.TOPOLOGY_WORKERS: This sets the number of worker processes to use to execute the topology. For example, if you set this to 25, there will be 25 Java processes across the cluster executing all the tasks. If you had a combined parallelism of 150 across all components in the topology, each worker process will have 6 tasks running within it as threads.

Config.TOPOLOGY_ACKER_EXECUTORS: This sets the number of executors that will track tuple trees and detect when a spout tuple has been fully processed. By not setting this variable or setting it to null, Storm will set the number of acker executors to be equal to the number of workers configured for this topology. If this variable is set to 0, then Storm will immediately ack tuples as soon as they come off the spout, effectively disabling reliability.

Config.TOPOLOGY_MAX_SPOUT_PENDING: This sets the maximum number of spout tuples that can be pending on a single spout task at once (pending means the tuple has not been acked or failed yet). It is highly recommended that you set this config to prevent queue explosion.

Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS: This is the maximum amount of time a spout tuple has to be fully completed before it is considered failed. This value defaults to 30 seconds, which is sufficient for most topologies.

Config.TOPOLOGY_SERIALIZATIONS: You can register more serializers to Storm using this config so that you can use custom types within tuples.


Q13. Is it necessary to kill the topology while updating a running topology?

Ans: Yes. To update a running topology, the only option currently is to kill the current topology and resubmit a new one. A planned feature is to implement a storm swap command that swaps a running topology with a new one, ensuring minimal downtime and no chance of both topologies processing tuples at the same time.

Q14. How can Storm UI be used with a topology?

Ans: Storm UI is used for monitoring the topology. The Storm UI provides information about errors happening in tasks and fine-grained stats on the throughput and latency performance of each component of each running topology.

Q15. Why does Apache not include SSL?

Ans: SSL (Secure Socket Layer) data transport requires encryption, and many governments have restrictions upon the import, export, and use of encryption technology. If Apache included SSL in the base package, its distribution would involve all sorts of legal and bureaucratic issues, and it would no longer be freely available. Also, some of the technology required to talk to current clients using SSL is patented by RSA Data Security, who restricts its use without a license.

Q16. Does Apache include any kind of database integration?

Ans: Apache is a Web (HTTP) server, not an application server. The base package does not include any such functionality. The PHP project and the mod_perl project allow you to work with databases from within the Apache environment.

Q17. After installing, why does Apache have three config files: srm.conf, access.conf and httpd.conf?

Ans: The first two are remnants from the NCSA days, and generally you should be fine if you delete the first two and stick with httpd.conf.

srm.conf: This is the default file for the ResourceConfig directive in httpd.conf. It is processed after httpd.conf but before access.conf.

access.conf: This is the default file for the AccessConfig directive in httpd.conf. It is processed after httpd.conf and srm.conf.

httpd.conf: The httpd.conf file is well-commented and mostly self-explanatory.

Q18. How to check httpd.conf for consistency and any errors in it?

Ans: We can check the httpd configuration file by using the following command:

httpd -S

This command dumps out a description of how Apache parsed the configuration file. Careful examination of the IP addresses and server names may help uncover configuration mistakes. To run only a syntax check on the configuration file, use httpd -t.

Q19. Explain when to use fields grouping in Storm? Is there any timeout or limit to known field values?

Ans: Fields grouping in Storm uses a mod hash function to decide which task to send a tuple to, ensuring that tuples with the same field values are handled by the same task, in the order they arrive. For that, you don't require any cache. So there is no timeout or limit to known field values.

The stream is partitioned by the fields specified in the grouping. For example, if the stream is grouped by the "user-id" field, tuples with the same "user-id" will always go to the same task, but tuples with different "user-id" values may go to different tasks.
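
The "mod hash" routing described above can be illustrated in a few lines of plain Java. This is a sketch of the idea only: the class FieldsGroupingSketch and the method chooseTask are hypothetical names, and the exact hash function Storm applies to the grouping fields is assumed here to be the list's hashCode.

```java
import java.util.Arrays;
import java.util.List;

class FieldsGroupingSketch {
    // Picks a task index from the values of the grouping fields:
    // hash the values, then take the result modulo the number of tasks.
    // floorMod keeps the index non-negative even for negative hash codes.
    static int chooseTask(List<Object> groupingValues, int numTasks) {
        return Math.floorMod(groupingValues.hashCode(), numTasks);
    }

    public static void main(String[] args) {
        int numTasks = 4;
        int first = chooseTask(Arrays.<Object>asList("user-42"), numTasks);
        int second = chooseTask(Arrays.<Object>asList("user-42"), numTasks);
        // The same field value always maps to the same task.
        System.out.println(first == second); // prints true
    }
}
```

The task is computed from the tuple alone, so nothing has to be remembered about previously seen values; that is why no cache, timeout, or limit on known values is involved.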

Q20. What is mod_vhost_alias?


Ans: This module creates dynamically configured virtual hosts, by allowing the IP address and/or the Host: header of the HTTP request to be used as part of the path name to determine what files to serve. This allows for easy use of a large number of virtual hosts with similar configurations.

Q21. Is running Apache as root a security risk?

Ans: No. The root (parent) process opens port 80 but never serves requests on it itself, so no user actually accesses the site with root rights. If you kill the root process, you will see the other processes disappear as well.


Q22. What is MultiViews?

Ans: MultiViews search is enabled by the MultiViews option. It is the general name given to the Apache server's ability to provide language-specific document variants in response to a request. This is documented quite thoroughly in the content negotiation description page. Apache Week also carried an article on this subject. When several variants match a request, the server chooses the best match to the client's requirements and returns that document.

Q23. Does Apache include a search engine?


Ans: No. The Apache base package does not include a search engine, but third-party search engines can be used alongside it.

Q24. Explain how you can stream log files using Apache Storm?

Ans: To read from the log files, you can configure your spout to emit each line as it reads the log. The output can then be assigned to a bolt for analysis.

Q25. Mention how Storm applications can be useful in financial services?

Ans: In financial services, Storm can be helpful in the following areas:

Securities fraud:

Perform real-time anomaly detection on known patterns of activities and use learned patterns from prior modeling and simulations.

Correlate transaction data with other streams (chat, email, etc.) in a cost-effective parallel processing environment.

Reduce query time from hours to minutes on large volumes of data.

Build a single platform for operational applications and analytics that reduces total cost of ownership (TCO).

Order routing: Order routing is the process by which an order goes from the end user to an exchange. An order may go directly to the exchange from the customer, or it may go first to a broker who then routes the order to the exchange.

Pricing: Pricing is the process whereby a business sets the price at which it will sell its products and services, and may be part of the business's marketing plan.

Compliance violations: Compliance means conforming to a rule, such as a specification, policy, standard, or law. Regulatory compliance describes the goal that organizations aspire to achieve in their efforts to ensure that they are aware of and take steps to comply with relevant laws and regulations. Any breach of these rules is a compliance violation.

Q26. Can we use Active Server Pages (ASP) with Apache?

Ans: The Apache Web Server package does not include ASP support.

However, a number of projects provide ASP or ASP-like functionality for Apache. Some of these are:

Apache::ASP: Apache::ASP provides an Active Server Pages port to the Apache Web Server with Perl scripting only, and enables developing dynamic web applications with session management and embedded Perl code. There are also many powerful extensions, including XML taglibs, XSLT rendering, and new events not originally part of the ASP API.

mod_mono: It is an Apache 2.0/2.2/2.4.3 module that provides ASP.NET support for the web's favorite server, Apache. It is hosted inside Apache. Depending on your configuration, the Apache box could be one or a dozen separate processes; all of these processes send their ASP.NET requests to the mod-mono-server process. The mod-mono-server process in turn can host multiple independent applications. It does this by using Application Domains to isolate the applications from each other, while using a single Mono virtual machine.

Q27. Explain what TOPOLOGY_MESSAGE_TIMEOUT_SECS is in Apache Storm?

Ans: It is the maximum amount of time allotted to the topology to fully process a message released by a spout. If the message is not acknowledged within the given time frame, Apache Storm will fail the message on the spout.

Q28. Mention the difference between Apache Kafka and Apache Storm?

Apache Kafka: It is a distributed and robust messaging system that can handle huge amounts of data and allows passage of messages from one endpoint to another. Kafka is designed to allow a single cluster to serve as the central data backbone for a large organization. It can be elastically and transparently expanded without downtime. Data streams are partitioned and spread over a cluster of machines to allow data streams larger than the capability of any single machine and to allow clusters of coordinated consumers.

Apache Storm: It is a real-time message processing system, and you can edit or manipulate data in real time. Storm pulls the data from Kafka and applies the required manipulation. It makes it easy to reliably process unbounded streams of data, doing for real-time processing what Hadoop did for batch processing. Storm is simple, can be used with any programming language, and is a lot of fun to use.

Q29. What is ServerType directive in Apache Server?


Ans: It defines whether Apache runs as a standalone daemon that spawns its own child processes (standalone) or is started on demand by inetd (inetd). Keeping it inetd conserves resources.

The ServerType directive is included in Apache 1.3 for backward compatibility with older UNIX-based versions of Apache. By default, Apache is set to standalone, which means Apache runs as a separate application on the server. The ServerType directive isn't available in Apache 2.0.

Q30. In which folder are Java applications stored in Apache?

Ans: Java applications are not stored in Apache; it can only be connected to another web server that hosts Java web apps, using the mod_jk connector. mod_jk is a replacement for the older mod_jserv. It is a completely new Tomcat-Apache plug-in that handles the communication between Tomcat and Apache. There were several reasons for the replacement:

mod_jserv was too complex. Because it was ported from Apache/JServ, it brought with it lots of JServ-specific bits that aren't needed by Apache.

mod_jserv supported only Apache. Tomcat supports many web servers through a compatibility layer named the jk library. Supporting two different modes of work became problematic in terms of support, documentation, and bug fixes. mod_jk should fix that.

The layered approach provided by the jk library makes it easier to support both Apache 1.3.x and Apache 2.xx.

Better support for SSL. mod_jserv couldn't reliably identify whether a request was made via HTTP or HTTPS. mod_jk can, using the newer Ajpv13 protocol.