Top 100+ Parallel Computing Interview Questions And Answers - Jun 01, 2020

Question 1. What Is Shared-memory Architecture?

Answer :

A single address space is visible to all execution threads.

Question 2. What Is NUMA Memory Architecture?

Answer :

NUMA stands for Non-Uniform Memory Access. It is a special type of shared-memory architecture in which access times to different memory locations by a processor may vary, as may access times to the same memory location by different processors.

Question 3. Name Some Network Architectures Prevalent In Machines Supporting The Message Passing Paradigm?

Answer :

Ethernet, Infiniband, Tree.

Question 4. What Is Data-parallel Computation?

Answer :

Data is partitioned across parallel execution threads, each of which performs some computation on its partition, usually independently of the other threads.
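
As a minimal sketch of the idea, the Python snippet below splits a list into contiguous partitions and lets each worker compute on its own partition, combining the partial results at the end. The function names, the sum-of-squares workload and the worker count are illustrative assumptions. In standard CPython, threads show the structure of the computation rather than a real speed-up for CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(data, parts):
    """Split data into `parts` nearly equal contiguous chunks."""
    size, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def sum_of_squares(chunk):
    """Work done by one thread on its own partition only."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, workers=4):
    """Each worker handles one partition; partial results are combined last."""
    chunks = partition(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial = list(pool.map(sum_of_squares, chunks))
    return sum(partial)
```

No worker reads another worker's partition, which is what makes the computation data-parallel.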

Question 5. What Is Task-parallel Computation?

Answer :

The parallelism manifests across functions. A set of functions needs to be computed, and these may or may not have order constraints among them.

Question 6. What Is Task-latency?

Answer :

The time taken for a task to complete, measured from the moment the request for it is made.

Question 7. What Is Task-throughput?

Answer :

The number of tasks completed in a given time.

Question 8. What Is Speed-up?

Answer :

The ratio of some performance metric (like latency) obtained using a single processor to that obtained using a set of parallel processors.

Question 9. What Is Parallel Efficiency?

Answer :

The speed-up per processor.
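
A small sketch of both definitions, using latency as the performance metric. The function names and the timings in the usage note are hypothetical and only illustrate the arithmetic.

```python
def speedup(t_serial, t_parallel):
    """Ratio of single-processor latency to parallel latency."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, processors):
    """Speed-up divided by the number of processors used."""
    return speedup(t_serial, t_parallel) / processors
```

For example, a job that takes 100 s on one processor and 25 s on 8 processors has a speed-up of 4 and an efficiency of 0.5.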

Question 10. What Is An Inherently Sequential Task?

Answer :

One whose maximum speed-up (using any number of processors) is 1.

Question 11. What Is The Maximum Time Speed-up Possible According To Amdahl's Law?

Answer :

1/f, where f is the inherently sequential fraction of the time taken by the best sequential execution of the task.
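
The bound follows from the usual form of Amdahl's law, speed-up = 1 / (f + (1 - f)/p) for p processors, which approaches 1/f as p grows. The sketch below evaluates that formula; the function name is an assumption made for illustration.

```python
def amdahl_speedup(f, p):
    """Speed-up on p processors when a fraction f of the work is sequential."""
    return 1.0 / (f + (1.0 - f) / p)
```

With f = 0.1, the speed-up stays below 10 no matter how many processors are added, and with f = 1 (an inherently sequential task) it is exactly 1.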

Question 12. What Is SIMD?

Answer :

A class in Flynn's taxonomy of parallel architectures; it stands for single instruction, multiple data. In this architecture, different processing elements all execute the same instruction in a given clock cycle, with the respective data (e.g., in registers) being independent of each other.

Question 13. What Is Cache Coherence?

Answer :

Different processors may maintain their own local caches. This results in potentially multiple copies of the same data. Coherence means that accesses to the local copies behave as accesses to a single shared copy would, apart from the time taken to access.

Question 14. What Is A Hypercube Connection?

Answer :

A single node is a hypercube. An n-node hypercube is made of two n/2-node hypercubes, with their corresponding nodes connected to each other.

Question 15. What Is The Diameter Of An N-node Hypercube?

Answer :

log n (base 2). The diameter is the minimum number of links that must be traversed to get between the two nodes that are furthest apart.
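
A quick way to check this is to label the n = 2^d nodes with d-bit numbers, connect nodes whose labels differ in exactly one bit, and measure the longest shortest path with breadth-first search. The sketch below does that; the function names are illustrative assumptions.

```python
from collections import deque


def hypercube_neighbors(node, dims):
    """Neighbors differ from `node` in exactly one bit of its label."""
    return [node ^ (1 << bit) for bit in range(dims)]


def eccentricity(start, dims):
    """Largest BFS distance from `start` to any other node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in hypercube_neighbors(node, dims):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return max(dist.values())


def diameter(dims):
    """Diameter of the hypercube with 2**dims nodes."""
    return max(eccentricity(s, dims) for s in range(1 << dims))
```

For a 16-node hypercube (dims = 4) this gives a diameter of 4, which is log2 of 16.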

Question 16. How Does OpenMP Provide A Shared-memory Programming Environment?

Answer :

OpenMP uses pragmas to control the automatic creation of threads. Since the threads share the address space, they share memory. However, they are allowed a local view of the shared variables through "private" variables. The compiler allocates a copy of the variable for each thread and optionally initializes it with the original variable. Within the thread, references to a private variable are statically changed to refer to the new copy.

Question 17. What Is The Memory Consistency Model Supported By OpenMP?

Answer :

There is no "guaranteed" sharing/consistency of shared variables until a flush is called. Flush sets that overlap are sequentially consistent, and the writes to a variable become visible to every other thread at the point where the flush is serialized. This is slightly weaker than "weak consistency."

Question 18. How Are Threads Allocated To Processors When There Are More Threads Than The Number Of Processors?

Answer :

Once a thread has completed on a core, a new thread is run on it. The order can be controlled using the "schedule" clause.

Question 19. What Is Common CRCW PRAM?

Answer :

A Parallel Random Access Machine (PRAM) model of computation in which the processors can write to a common memory address in the same step, as long as they are all writing the same value.

Question 20. What Is The Impact Of Limiting PRAM Model To A Fixed Number Of Processors Or A Fixed Memory Size?

Answer :

PRAMs with higher capacities can be simulated (with linear slowdown).

Question 21. What Is The Impact Of Eliminating Shared Write From PRAM?

Answer :

It can be simulated by a CREW PRAM with a log n factor in the time. However, the algorithms in this model can become a little complicated, as they must ensure conflict-free writes.

Question 22. What Is The Significance Of Work Complexity Analysis?

Answer :

Time complexity does not account for the size of the machine. Work complexity is more reflective of practical efficiency. The work-time scheduling principle describes the expected time for a p-processor PRAM as work/p.

Question 23. What Does Bulk Synchronous Model Add To PRAM For Parallel Algorithm Analysis?

Answer :

PRAM assumes constant-time access to shared memory, which is unrealistic. BSP counts time in "message communication," and in this model a step is not initiated until the input data has arrived.

Question 24. Is It True That All NC Problems Parallelize Well?

Answer :

In general, NC problems do parallelize well, in the sense of having a poly-log solution in the PRAM model while having only a super-log solution in the RAM model. However, for problems that already have a poly-log solution in the RAM model, there may not be an effective speed-up.

Question 25. Is User Locking Required To Control The Order Of Access To Guarantee Sequential Consistency?

Answer :

Sequential consistency is independent of user locking but does require delaying of memory operations at the system level. The precise ordering of operations need not be pre-ordained by the program logic. There just must exist a global ordering that is consistent with the local view observed by each processor.

Question 26. What Is The Difference Between Processor And FIFO Consistency?

Answer :

In FIFO consistency, only writes from a single processor are visible in the order issued. In processor consistency, there additionally exists a global ordering of writes to any address x by different processes that is consistent with the local views.

Question 27. What Is False Sharing?

Answer :

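Sharing of a cache line by distinct variables. As a result, performance issues come into play. If such variables are not accessed together, the un-accessed variable is unnecessarily brought into cache along with the accessed variable. When different processors write to such variables, the line is repeatedly invalidated and transferred between caches even though no data is actually shared.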

Question 28. What Is A Task Dependency Graph?

Answer :

A directed graph with nodes representing tasks and an edge from task a to task b indicating that task b can only start after task a is completed.
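
As a sketch, such a graph can be written as a mapping from each task to the tasks it must wait for, and then grouped into "waves" of tasks that could run in parallel. The task names below are hypothetical, and the example relies on `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical tasks: each key maps to the set of tasks it must wait for.
deps = {
    "compile": {"fetch"},
    "test": {"compile"},
    "docs": {"fetch"},
    "package": {"test", "docs"},
}


def parallel_waves(dependencies):
    """Group tasks into waves; tasks within one wave do not depend on each other."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all tasks whose predecessors are done
        waves.append(ready)
        sorter.done(*ready)
    return waves
```

Here "compile" and "docs" land in the same wave because neither has an edge to the other, while "package" must wait for both "test" and "docs".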

Question 29. When Can An MPI Send Call Return?

Answer :

If it is a synchronous call, it can return only when the pairing call on another process is ready. For asynchronous versions, it can return as soon as the provided buffer is ready for re-use.

Question 30. What Is A Collective Communication Call?

Answer :

It is a call that must be made at all members of the communication group. No call can return until all calls have at least been made.

Question 31. How Can MPI Be Used For Shared Memory Style Programming?

Answer :

Each process registers its local memory and attaches it to a "window." Accesses via this window get translated to send or fetch requests to the desired member of the group. The pairing communication is handled by the MPI system asynchronously.

Question 32. What Is The Complexity Of Prefix Sum In PRAM Model?

Answer :

Time O(log n) and work O(n).
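
One well-known way to reach these bounds is the up-sweep/down-sweep (Blelloch) scan. The sketch below simulates it sequentially, counting rounds (each round's updates are independent, so a PRAM could do them in one step) and additions. It computes an exclusive prefix sum and assumes a non-empty input whose length is a power of two; the function name and the counters are illustrative assumptions.

```python
def blelloch_scan(values):
    """Exclusive prefix sum; len(values) must be a non-zero power of two.

    Returns (result, rounds, additions).
    """
    a = list(values)
    n = len(a)
    rounds = 0
    adds = 0
    # Up-sweep: build partial sums in a balanced tree.
    stride = 1
    while stride < n:
        for i in range(2 * stride - 1, n, 2 * stride):  # independent updates
            a[i] += a[i - stride]
            adds += 1
        stride *= 2
        rounds += 1
    # Down-sweep: push prefixes back down the tree.
    a[n - 1] = 0
    stride = n // 2
    while stride >= 1:
        for i in range(2 * stride - 1, n, 2 * stride):  # independent updates
            left = a[i - stride]
            a[i - stride] = a[i]
            a[i] += left
            adds += 1
        stride //= 2
        rounds += 1
    return a, rounds, adds
```

For n = 8 the simulation uses 2 log2 n = 6 rounds and 2(n - 1) = 14 additions, consistent with O(log n) time and O(n) work.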

Question 33. What Is The Time Complexity Of Optimal Merge Algorithm (on PRAM)?

Answer :

O(log log n), by first merging sub-sequences of the original lists of size n/(log log n) each. The remaining elements are inserted into the just-computed sequence in the next step.

Question 34. What Is Accelerated Cascading?

Answer :

The accelerated cascading technique combines a fast but work-inefficient algorithm with a work-optimal one. The problem is recursively divided into many smaller sub-problems, which are first solved using the optimal algorithm. The sub-results are then combined using the faster version of the algorithm.

Question 35. Why Must CUDA Divide Computation Twice: Into Grids And Then Blocks?

Answer :

The hardware is based on maximizing throughput. This has been done by allowing a large number of running threads, all with a live context. This implies that only a fixed number of threads can fit in the hardware. This in turn means that these threads cannot communicate with or depend on other threads that could not be fit and hence must wait for the first set of threads to complete execution. Hence, a two-level decomposition. Further, even the set of threads running together may execute on different SMs, and synchronization across SMs would be slow and onerous and is hence not supported.

Question 36. How Do Memory Operations In GPUs Differ From Those In CPUs?

Answer :

GPUs have a significantly smaller cache, making the average latency of memory operations much higher. This requires many concurrent threads to hide the latency. Also, the shared memory can be used as an opaque cache under the direct control of the programmer, making it possible to utilize the cache better in some situations. Further, because of SIMD warp instructions, multiple memory accesses are made per instruction. These accesses can be coalesced into a smaller number of real accesses if the address set is contiguous for global memory or strided for shared memory.
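
To make the coalescing point concrete for global memory, the sketch below counts how many aligned memory segments one warp touches for a given access stride. It is a simplified model, not a description of any particular GPU: the 32-thread warp, 4-byte words and 128-byte segments are assumptions chosen for illustration, and the function name is hypothetical.

```python
def segments_touched(stride_words, warp_size=32, word_bytes=4,
                     segment_bytes=128, base=0):
    """Number of distinct aligned segments accessed by one warp.

    Thread `tid` reads the word at index `tid * stride_words`.
    All sizes are assumed values for a simplified model.
    """
    addresses = [base + tid * stride_words * word_bytes
                 for tid in range(warp_size)]
    return len({addr // segment_bytes for addr in addresses})
```

Under these assumptions, contiguous accesses (stride 1) fall into a single segment, stride 2 touches two segments, and stride 32 touches 32 segments, one per thread.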

Question 37. How Can Two GPU Threads Communicate Through Shared Memory?

Answer :

If the threads belong to a non-divergent warp, writes before reads are visible to the read. Two threads in the same block must have an intervening sync for the write to affect the read. Two threads in different blocks within the same kernel cannot be guaranteed an order, and the read must be moved to a later kernel for the write to become visible.

Question 38. How Can Prefix Minima Be Found In O(1) Time?

Answer :

This can be computed by first finding all nearest smaller values in O(1) time and then finding, in O(1) time for every element (using O(n) processors for that element), the largest index smaller than its own whose element has no nearest smaller value on its left. The work complexity of O(n^2) can be improved using accelerated cascading.

Question 39. How Long Does Bitonic Sorting Require On PRAM?

Answer :

O(log^2 n)
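
The bound comes from the depth of the bitonic sorting network: log n merge phases, where phase i has i compare-exchange steps, giving log n (log n + 1)/2 steps in total. The sketch below simulates the network sequentially and counts those steps. It assumes the input length is a power of two, and the function name is an illustrative assumption.

```python
def bitonic_sort(values):
    """Sort with a bitonic network; len(values) must be a power of two.

    Returns (sorted_list, steps), where each step is one layer of
    independent compare-exchange operations.
    """
    a = list(values)
    n = len(a)
    steps = 0
    k = 2
    while k <= n:              # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:          # distance between compared elements
            for i in range(n):  # all pairs in this layer are independent
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            steps += 1
            j //= 2
        k *= 2
    return a, steps
```

For n = 8 the network has 3 * 4 / 2 = 6 layers, each of which a PRAM with n/2 processors could execute in constant time.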

Question 40. How Long Does Batcher’s Odd-even Merge Require?

Answer :

O(log n) time, O(n log n) work

Question 41. In Order To Balance Load For Parallel Bucket Sort Of N Elements, Uniformly Spaced Splitters Need To Be Selected. This Can Be Done By First Dividing The List Into B Lists And Choosing B Equi-spaced Samples From Each. The Final B Splitters Are Chosen Uniformly Spaced From These Samples. How Balanced Are The Buckets If These Splitters Are Used?

Answer :

No bucket will contain more than 2n/B elements.

Question 42. How Fast Can Two Sorted Lists Of Size N Each Be Merged Into One Using P Processors?

Answer :

O(n/p) time using optimal multi-way merge.

Question 43. How Fast Can A List Be Sorted Using P Processors Using Local Sorting Of N/p Elements Each Followed By Optimal Multi-way Merge?

Answer :

O((n/p) log n)

Question 44. When Stealing Load From A Random Loaded Processor, What Type Of Synchronization Is Needed?

Answer :

One needs to make sure that the queue being stolen from is operated on in a synchronized fashion: either locked or edited in a lock-free manner.

Question 45. How Can One Ensure Mutual Exclusion Without Locks?

Answer :

When references of two (or more) threads (or processes) can be serialized with respect to a variable, system primitives like compare-and-swap can help detect a conflict with another thread. Lock-free implementations generally detect the conflict atomically (e.g., using compare-and-swap); one thread succeeds while the other backs off and retries.
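
The retry pattern can be sketched as follows. Python exposes no hardware compare-and-swap, so the `AtomicInt` class below is a hypothetical stand-in that uses a private lock only to make the single compare-and-swap step atomic; the callers themselves never acquire a lock and simply retry when they lose a race.

```python
import threading


class AtomicInt:
    """Hypothetical stand-in for a memory word with compare-and-swap."""

    def __init__(self, value=0):
        self._value = value
        self._guard = threading.Lock()  # emulates hardware atomicity only

    def load(self):
        return self._value

    def compare_and_swap(self, expected, new):
        """Set the value to `new` only if it still equals `expected`."""
        with self._guard:
            if self._value == expected:
                self._value = new
                return True
            return False


def lock_free_increment(counter):
    """Read, compute, then try to publish; retry if another thread won."""
    while True:
        seen = counter.load()
        if counter.compare_and_swap(seen, seen + 1):
            return


def run(threads=4, increments=1000):
    """Increment one shared counter from several threads and return it."""
    counter = AtomicInt()

    def work():
        for _ in range(increments):
            lock_free_increment(counter)

    pool = [threading.Thread(target=work) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return counter.load()
```

No increment is lost: a failed compare-and-swap means another thread changed the value first, so the loser re-reads and tries again.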

Question 46. How Long Does The Parallel Version Of Prim’s Minimum Spanning Tree Finding Algorithm Require For A Graph With N Nodes Using P Processors?

Answer :

O(n^2/p + n log p)



