Top 27 Apache Storm Interview Questions - Jul 25, 2022

Q1. Explain How You Can Streamline Log Files Using Apache Storm?

To read from the log files, you can configure your spout to emit each line as it reads the log. The output can then be assigned to a bolt for analysis.

Q2. What Are The Key Benefits Of Using Storm For Real Time Processing?

Easy to operate: Operating Storm is quite easy.

Really fast: It can process a very large number of messages per second per node; the Storm project cites a benchmark of over one million tuples per second per node.

Fault tolerant: It detects faults automatically and restarts the failed components.

Reliable: It guarantees that each unit of data will be processed at least once or exactly once.

Scalable: It runs across a cluster of machines.

Q3. In Which Folder Are Java Applications Stored In Apache?

Java applications are not stored in Apache. Apache can only be connected to a separate web server that hosts Java web applications, using the mod_jk connector. mod_jk is a replacement for the older mod_jserv. It is a completely new Tomcat-Apache plug-in that handles the communication between Tomcat and Apache.

It replaced mod_jserv for several reasons:

mod_jserv was too complex: Because it was ported from Apache/JServ, it brought with it lots of JServ-specific bits that aren't needed by Apache.

mod_jserv supported only Apache: Tomcat supports many web servers through a compatibility layer named the jk library. Supporting two different modes of work became problematic in terms of support, documentation and bug fixes. mod_jk should fix that.

Layered approach: The layered approach provided by the jk library makes it easier to support both Apache 1.3.x and Apache 2.x.

Better support for SSL: mod_jserv couldn't reliably identify whether a request was made via HTTP or HTTPS. mod_jk can, using the newer Ajpv13 protocol.

Q4. Mention How Storm Application Can Be Beneficial In Financial Services?

In financial services, Storm can be helpful in the following areas:

Securities fraud:

Perform real-time anomaly detection on known patterns of activity and use learned patterns from prior modeling and simulations.

Correlate transaction data with other streams (chat, email, etc.) in a cost-effective parallel processing environment.

Reduce query time from hours to minutes on large volumes of data.

Build a single platform for operational applications and analytics that reduces total cost of ownership (TCO).

Order routing: Order routing is the process by which an order goes from the end user to an exchange. An order may go directly to the exchange from the customer, or it may go first to a broker who then routes the order to the exchange.

Pricing: Pricing is the process whereby a business sets the price at which it will sell its products and services, and may be part of the business's marketing plan.

Compliance violations: Compliance means conforming to a rule, such as a specification, policy, standard or law. Regulatory compliance describes the goal that organizations aspire to achieve in their efforts to ensure that they are aware of and take steps to comply with relevant laws and regulations. Any deviation from those rules is a compliance violation.

Q5. Can We Use Active Server Pages (ASP) With Apache?

The Apache web server package does not include ASP support. However, a number of projects provide ASP or ASP-like functionality for Apache.

Some of these are:

Apache::ASP :- Apache::ASP provides an Active Server Pages port to the Apache web server with Perl scripting only, and enables developing dynamic web applications with session management and embedded Perl code. There are also many powerful extensions, including XML taglibs, XSLT rendering, and new events not originally part of the ASP API.

mod_mono :- It is an Apache 2.0/2.2/2.4.3 module that provides ASP.NET support for the web's favorite server, Apache. It is hosted inside Apache. Depending on your configuration, the Apache box could be one or a dozen separate processes; all of these processes send their ASP.NET requests to the mod-mono-server process. The mod-mono-server process in turn can host multiple independent applications. It does this by using Application Domains to isolate the applications from each other, while using a single Mono virtual machine.

Q6. Why Doesn't Apache Include SSL?

SSL (Secure Sockets Layer) data transport requires encryption, and many governments have restrictions on the import, export, and use of encryption technology. If Apache included SSL in the base package, its distribution would involve all sorts of legal and bureaucratic issues, and it would no longer be freely available. Also, some of the technology required to talk to current clients using SSL was patented by RSA Data Security, who restricted its use without a license. This answer describes Apache 1.3; Apache 2.0 and later ship mod_ssl as part of the standard distribution.

Q7. What Is The ServerType Directive In Apache Server?

It defines whether Apache runs as a standalone daemon that spawns its own child processes (standalone) or is started on demand by the system's inetd process for each incoming connection (inetd). Running under inetd conserves resources while the server is idle, at the cost of starting a new process for every connection.

The ServerType directive is included in Apache 1.3 for backward compatibility with older UNIX-based versions of Apache. By default, Apache is set to standalone, which means Apache runs as a separate application on the server. The ServerType directive is not available in Apache 2.0.

Q8. What Is CombinerAggregator?

A CombinerAggregator is used to combine a set of tuples into a single field. It has the following signature:

public interface CombinerAggregator<T> extends Serializable {
    T init(TridentTuple tuple);
    T combine(T val1, T val2);
    T zero();
}

Storm calls the init() method with each tuple, and then repeatedly calls the combine() method until the partition is processed. The values passed into the combine() method are partial aggregations, the result of combining the values returned by calls to init(). If the partition is empty, the output of zero() is emitted.
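The init/combine/zero contract can be tried out without a Storm cluster. The following is a minimal sketch only: SimpleCombiner is a hypothetical stand-in for the Trident interface (its input type is generic instead of TridentTuple), and CountCombiner and CombinerDemo are illustrative names, not Storm classes. Starting the fold from zero() assumes that zero() is a neutral value for combine(), which holds for a count.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for Trident's CombinerAggregator: the input type is
// generic here (not TridentTuple) so the example runs without Storm.
interface SimpleCombiner<I, T> {
    T init(I input);
    T combine(T val1, T val2);
    T zero();
}

// Counts inputs: every input contributes 1, and partial counts are added.
class CountCombiner implements SimpleCombiner<String, Long> {
    public Long init(String input) { return 1L; }
    public Long combine(Long val1, Long val2) { return val1 + val2; }
    public Long zero() { return 0L; }
}

class CombinerDemo {
    // Folds one partition: init() per input, combine() to merge the partial
    // results, and zero() as the result for an empty partition.
    static <I, T> T aggregate(SimpleCombiner<I, T> combiner, List<I> partition) {
        T result = combiner.zero();
        for (I input : partition) {
            result = combiner.combine(result, combiner.init(input));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> partition = Arrays.asList("a", "b", "c");
        System.out.println(aggregate(new CountCombiner(), partition));
    }
}
```

Because combine() only ever sees partial results, the same combiner could merge counts computed on different partitions, which is the reason the interface separates init() from combine().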

Q9. When Do You Call The Cleanup Method?

The cleanup method is called when a bolt is being shut down and should clean up any resources that were opened. There's no guarantee that this method will be called on the cluster: for instance, if the machine the task is running on blows up, there's no way to invoke the method.

The cleanup method is intended for when you run topologies in local mode (where a Storm cluster is simulated in process), and you want to be able to run and kill many topologies without suffering any resource leaks.

Q10. Does Apache Include A Search Engine?

No. The Apache HTTP Server does not include a search engine, but many free and commercial search engines can be used with it.

Q11. How To Check httpd.conf For Consistency And Any Errors In It?

We can check the httpd configuration file using the following command.

httpd -S

This command dumps a description of how Apache parsed the configuration file. Careful examination of the IP addresses and server names may help uncover configuration mistakes. For a syntax check only, httpd -t can be used instead.

Q12. What Is ZeroMQ?

ZeroMQ is "a library which extends the standard socket interfaces with features traditionally provided by specialized messaging middleware products". Early Storm releases rely on ZeroMQ primarily for task-to-task communication in running Storm topologies; from Storm 0.9 onward, Netty is the default transport.

Q13. How Can Storm UI Be Used In A Topology?

Storm UI is used for monitoring the topology. The Storm UI provides information about errors happening in tasks and fine-grained stats on the throughput and latency performance of each component of each running topology.

Q14. What Does It Mean For A Message To Be "Fully Processed"?

A tuple coming off a spout can trigger thousands of tuples to be created based on it. Consider, for example, the streaming word count topology:

TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("sentences", new KestrelSpout("kestrel.backtype.com",
                                               22133,
                                               "sentence_queue",
                                               new StringScheme()));
builder.setBolt("split", new SplitSentence(), 10)
        .shuffleGrouping("sentences");
builder.setBolt("count", new WordCount(), 20)
        .fieldsGrouping("split", new Fields("word"));

This topology reads sentences off a Kestrel queue, splits the sentences into their constituent words, and then emits for each word the number of times it has seen that word before. A tuple coming off the spout triggers many tuples being created based on it: a tuple for each word in the sentence and a tuple for the updated count for each word.

Storm considers a tuple coming off a spout "fully processed" when the tuple tree has been exhausted and every message in the tree has been processed. A tuple is considered failed when its tree of messages fails to be fully processed within a specified timeout. This timeout can be configured on a topology-specific basis using the Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS configuration and defaults to 30 seconds.
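The "fully processed" and "failed" conditions can be pictured with a small bookkeeping model. This is a simplified sketch for illustration and is not how Storm's acker is implemented: TupleTreeTracker and its methods are hypothetical names, and it simply counts the tuples in one tree that have not finished yet.

```java
// Simplified model of tracking one spout tuple's tree; NOT Storm's acker.
class TupleTreeTracker {
    private final long startMillis;
    private final long timeoutMillis;
    private int pending = 1; // the spout tuple itself

    TupleTreeTracker(long startMillis, long timeoutMillis) {
        this.startMillis = startMillis;
        this.timeoutMillis = timeoutMillis;
    }

    // A bolt emitted `children` new tuples anchored to this tree.
    void emitted(int children) { pending += children; }

    // One tuple in the tree finished processing.
    void acked() { pending--; }

    // The tree is exhausted: every tuple in it has been processed.
    boolean fullyProcessed() { return pending == 0; }

    // Failed if the tree is still incomplete once the timeout has elapsed.
    boolean failed(long nowMillis) {
        return !fullyProcessed() && nowMillis - startMillis >= timeoutMillis;
    }

    public static void main(String[] args) {
        TupleTreeTracker tree = new TupleTreeTracker(0L, 30_000L);
        tree.emitted(3);   // a sentence split into three words
        tree.acked();      // the sentence tuple itself is done
        System.out.println(tree.fullyProcessed());
        tree.acked(); tree.acked(); tree.acked(); // the three word tuples
        System.out.println(tree.fullyProcessed());
    }
}
```

In the word count topology, the sentence tuple and every word and count tuple derived from it would all have to be accounted for before the spout tuple counts as fully processed.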

Q15. Does Apache Include Any Sort Of Database Integration?

Apache is a Web (HTTP) server, not an application server. The base package does not include any such functionality. The PHP project and the mod_perl project allow you to work with databases from within the Apache environment.

Q16. Explain What Is TOPOLOGY_MESSAGE_TIMEOUT_SECS In Apache Storm?

It is the maximum amount of time allotted to the topology to fully process a message released by a spout. If the message is not acknowledged in the given time frame, Apache Storm will fail the message on the spout.

Q17. What Is mod_vhost_alias?

This module creates dynamically configured virtual hosts by allowing the IP address and/or the Host: header of the HTTP request to be used as part of the path name to determine what files to serve. This allows for easy use of a large number of virtual hosts with similar configurations.

Q18. Mention The Difference Between Apache Kafka And Apache Storm?

Apache Kafka: It is a distributed and robust messaging system that can handle huge amounts of data and allows messages to pass from one endpoint to another. Kafka is designed to allow a single cluster to serve as the central data backbone for a large organization. It can be elastically and transparently expanded without downtime. Data streams are partitioned and spread over a cluster of machines to allow data streams larger than the capability of any single machine and to allow clusters of coordinated consumers.

Whereas:

Apache Storm: It is a real-time message processing system, and you can edit or manipulate data in real time. Storm pulls the data from Kafka and applies the required manipulation. It makes it easy to reliably process unbounded streams of data, doing for real-time processing what Hadoop did for batch processing. Storm is simple, can be used with any programming language, and is a lot of fun to use.

Q19. Explain When To Use Field Grouping In Storm? Is There Any Time-out Or Limit To Known Field Values?

Fields grouping in Storm uses a mod hash function to decide which task to send a tuple to, ensuring that tuples with the same field values are processed by the same task in the correct order. For that, you don't require any cache. So, there is no time-out or limit to known field values.

The stream is partitioned by the fields specified in the grouping. For example, if the stream is grouped by the "user-id" field, tuples with the same "user-id" will always go to the same task, but tuples with different "user-id"s may go to different tasks.
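The "hash, then mod" idea can be sketched in a few lines. This is an illustration under stated assumptions rather than Storm's actual routing code: FieldsGroupingSketch and chooseTask are hypothetical names, and the hash used here is simply Java's List.hashCode() over the grouping field values.

```java
import java.util.Arrays;
import java.util.List;

class FieldsGroupingSketch {
    // Picks a task index from the grouping field values: hash, then mod.
    // floorMod keeps the index non-negative even when the hash is negative.
    static int chooseTask(List<?> groupingValues, int numTasks) {
        return Math.floorMod(groupingValues.hashCode(), numTasks);
    }

    public static void main(String[] args) {
        int numTasks = 4; // assumed parallelism of the receiving bolt
        for (String userId : Arrays.asList("alice", "bob", "alice")) {
            int task = chooseTask(Arrays.asList(userId), numTasks);
            System.out.println(userId + " -> task " + task);
        }
    }
}
```

The task index is a pure function of the field values, so nothing has to be remembered per key. That is why there is no cache, time-out, or limit on the number of distinct values.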

Q20. What Are The Common Configurations In Apache Storm?

There are a variety of configurations you can set per topology. A list of all the configurations you can set can be found in the Storm documentation. The ones prefixed with "TOPOLOGY" can be overridden on a topology-specific basis (the other ones are cluster configurations and cannot be overridden).

Here are some common ones that are set for a topology:

1. Config.TOPOLOGY_WORKERS : This sets the number of worker processes to use to execute the topology. For example, if you set this to 25, there will be 25 Java processes across the cluster executing all the tasks. If you had a combined 150 parallelism across all components in the topology, each worker process will have 6 tasks running within it as threads.

2. Config.TOPOLOGY_ACKER_EXECUTORS : This sets the number of executors that will track tuple trees and detect when a spout tuple has been fully processed. By not setting this variable or setting it as null, Storm will set the number of acker executors to be equal to the number of workers configured for this topology. If this variable is set to 0, then Storm will immediately ack tuples as soon as they come off the spout, effectively disabling reliability.

3. Config.TOPOLOGY_MAX_SPOUT_PENDING : This sets the maximum number of spout tuples that can be pending on a single spout task at once (pending means the tuple has not been acked or failed yet). It is highly recommended you set this config to prevent queue explosion.

4. Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS : This is the maximum amount of time a spout tuple has to be fully completed before it is considered failed. This value defaults to 30 seconds, which is sufficient for most topologies.

5. Config.TOPOLOGY_SERIALIZATIONS : You can register more serializers to Storm using this config so that you can use custom types within tuples.
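The worker arithmetic in the TOPOLOGY_WORKERS item can be checked with a one-line calculation. This is a rough sketch, assuming tasks are spread evenly across workers; WorkerMath and tasksPerWorker are hypothetical names and not part of Storm.

```java
class WorkerMath {
    // Largest number of tasks any single worker holds when the total
    // parallelism is spread evenly across the workers (rounded up).
    static int tasksPerWorker(int totalParallelism, int workers) {
        return (totalParallelism + workers - 1) / workers;
    }

    public static void main(String[] args) {
        // 150 combined parallelism over 25 workers, as in the example above.
        System.out.println(tasksPerWorker(150, 25));
    }
}
```

When the parallelism does not divide evenly, the rounded-up value is the load of the busiest worker; the remaining workers hold one task fewer.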

Q21. Is It Necessary To Kill The Topology While Updating The Running Topology?

Yes, to update a running topology, the only option currently is to kill the current topology and resubmit a new one. A planned feature is to implement a Storm swap command that swaps a running topology with a new one, ensuring minimal downtime and no chance of both topologies processing tuples at the same time.

Q22. Which Components Are Used For Stream Flow Of Data?

For streaming data flow, three components are used:

Bolt :- Bolts represent the processing logic unit in Storm. One can use bolts to do any kind of processing such as filtering, aggregating, joining, interacting with data stores, talking to external systems, etc. Bolts can also emit tuples (data messages) for subsequent bolts to process. Additionally, bolts are responsible for acknowledging the processing of tuples once they have finished processing them.

Spout :- Spouts represent the source of data in Storm. You can write spouts to read data from data sources such as databases, distributed file systems, messaging frameworks, etc. Spouts can broadly be classified as follows:

Reliable: These spouts have the capability to replay tuples (a tuple is a unit of data in a data stream). This helps applications achieve "at least once" message processing semantics, since in case of failures tuples can be replayed and processed again. Spouts that fetch data from messaging frameworks are generally reliable, as these frameworks provide the mechanism to replay messages.

Unreliable: These spouts don't have the capability to replay tuples. Once a tuple is emitted, it cannot be replayed, irrespective of whether it was processed successfully or not. This type of spout follows "at most once" message processing semantics.

Tuple :- The tuple is the main data structure in Storm. A tuple is a named list of values, where each value can be any type. Tuples are dynamically typed: the types of the fields do not need to be declared. Tuples have helper methods like getInteger and getString to get field values without having to cast the result. Storm needs to know how to serialize all the values in a tuple. By default, Storm knows how to serialize the primitive types, strings, and byte arrays. If you want to use another type, you'll need to implement and register a serializer for that type.
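The difference between reliable and unreliable spouts comes down to whether an emitted message is remembered until it is acknowledged. The sketch below models the reliable case with a plain in-memory queue; ReplayableSource and its methods are hypothetical names for illustration and do not implement Storm's spout interface.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical in-memory source that can replay messages; not a Storm spout.
class ReplayableSource {
    private final Queue<String> queue = new ArrayDeque<>();
    private final Map<Long, String> pending = new HashMap<>();
    private long nextId = 0;

    void offer(String message) { queue.add(message); }

    // Emits the next message and remembers it until it is acked or failed.
    // Returns the message id, or -1 if nothing is queued.
    long emit() {
        String message = queue.poll();
        if (message == null) return -1;
        long id = nextId++;
        pending.put(id, message);
        return id;
    }

    // Processing succeeded: forget the message.
    void ack(long id) { pending.remove(id); }

    // Processing failed: put the message back so it is emitted again.
    void fail(long id) {
        String message = pending.remove(id);
        if (message != null) queue.add(message);
    }

    int pendingCount() { return pending.size(); }
    int queuedCount() { return queue.size(); }
}
```

An unreliable source would skip the pending map entirely: once emit() hands a message out there would be nothing left to replay, which gives "at most once" behavior.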

Q23. Is Running Apache As Root A Security Risk?

No. The root process opens port 80, but never listens on it, so no user will actually enter the site with root rights. If you kill the root process, you will see the other processes disappear as well.

Q24. How Can We Kill A Topology?

To kill a topology, simply run:

storm kill stormname

Give the same name to storm kill as you used when submitting the topology.

Storm won't kill the topology immediately. Instead, it deactivates all the spouts so that they don't emit any more tuples, and then Storm waits Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS seconds before destroying all the workers. This gives the topology enough time to complete any tuples it was processing when it got killed.

Q25. What Is The Use Of Zookeeper In Storm?

Storm uses Zookeeper for coordinating the cluster. Zookeeper is not used for message passing, so the load that Storm places on Zookeeper is quite low. Single-node Zookeeper clusters should be sufficient for most cases, but if you want failover or are deploying large Storm clusters you may want larger Zookeeper clusters. Instructions for deploying Zookeeper are available in the Zookeeper documentation.

A few notes about Zookeeper deployment:

It's critical that you run Zookeeper under supervision, since Zookeeper is fail-fast and will exit the process if it encounters any error case.

It's critical that you set up a cron to compact Zookeeper's data and transaction logs. The Zookeeper daemon does not do this on its own, and if you don't set up a cron, Zookeeper will quickly run out of disk space.

Q26. Does Apache Act As A Proxy Server?

Yes, it also acts as a proxy by using the mod_proxy module. This module implements a proxy, gateway or cache for Apache. It implements proxying capability for AJP13 (Apache JServ Protocol version 1.3), FTP, CONNECT (for SSL), HTTP/0.9, HTTP/1.0, and (since Apache 1.3.23) HTTP/1.1. The module can be configured to connect to other proxy modules for these and other protocols.

Q27. What Is Multiviews?

MultiViews search is enabled by the MultiViews option of the Options directive. It is the general name given to the Apache server's ability to provide language-specific document variants in response to a request. This is documented quite thoroughly in the content negotiation description page. In addition, Apache Week carried an article on this subject. When several variants match a request, the server chooses the best match to the client's requirements and returns that document.



