HBase Interview Questions and Answers
Q1. Compare HBase & Cassandra.
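Ans: The basis for the cluster differs. HBase runs on a Hadoop cluster (with HDFS for storage) and uses a master-based architecture (HMaster plus RegionServers), whereas Cassandra uses a masterless, peer-to-peer cluster in which every node plays the same role. A fuller comparison is given in Q58.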
Q2. What is Apache HBase?
Ans: It is a column-oriented database that is used to store sparse data sets. It runs on top of the Hadoop Distributed File System (HDFS). Apache HBase is a database that runs on a Hadoop cluster. Clients can access HBase data through either a native Java API or a Thrift or REST gateway, making it accessible from any language. Some of the key properties of HBase include:
NoSQL: HBase is not a traditional relational database (RDBMS). HBase relaxes the ACID (Atomicity, Consistency, Isolation, Durability) properties of conventional RDBMS systems in order to achieve much greater scalability. Data stored in HBase also does not need to fit into a rigid schema as with an RDBMS, making it ideal for storing unstructured or semi-structured data.
Wide-Column: HBase stores data in a table-like format with the ability to store billions of rows with millions of columns. Columns can be grouped together in “column families”, and the values of each column family are stored physically together.
Distributed and Scalable: HBase groups rows into “regions”, which define how table data is split over multiple nodes in a cluster. If a region gets too large, it is automatically split to share the load across more servers.
Consistent: HBase is architected to have “strongly consistent” reads and writes, as opposed to other NoSQL databases that are “eventually consistent”. This means that once a write has been performed, all read requests for that data will return the same value.
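To make the region idea concrete, here is a minimal JDK-only sketch (not HBase code) of a table whose rows are divided into key-range regions. The class RegionSketch, its maxRowsPerRegion limit, and the split-at-the-middle-key rule are illustrative assumptions: real HBase decides when to split based on data size rather than row count, and its split policy is configurable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical in-memory model of a table divided into regions by row-key range.
class RegionSketch {
    // A region owns the rows in [startKey, endKey); an empty endKey means "unbounded".
    static final class Region {
        final String startKey;
        String endKey;
        final TreeMap<String, String> rows = new TreeMap<>();

        Region(String startKey, String endKey) {
            this.startKey = startKey;
            this.endKey = endKey;
        }

        boolean contains(String row) {
            return row.compareTo(startKey) >= 0 && (endKey.isEmpty() || row.compareTo(endKey) < 0);
        }
    }

    final int maxRowsPerRegion; // assumed threshold; HBase uses region size in bytes
    final List<Region> regions = new ArrayList<>();

    RegionSketch(int maxRowsPerRegion) {
        this.maxRowsPerRegion = maxRowsPerRegion;
        regions.add(new Region("", "")); // a new table starts with a single region
    }

    // Route the row to the region covering its key; split that region if it grew too large.
    void put(String row, String value) {
        for (int i = 0; i < regions.size(); i++) {
            Region r = regions.get(i);
            if (r.contains(row)) {
                r.rows.put(row, value);
                if (r.rows.size() > maxRowsPerRegion) split(i);
                return;
            }
        }
    }

    // Split a region at its middle row key into two daughter regions.
    private void split(int index) {
        Region parent = regions.get(index);
        List<String> keys = new ArrayList<>(parent.rows.keySet());
        String splitKey = keys.get(keys.size() / 2);
        Region daughter = new Region(splitKey, parent.endKey);
        daughter.rows.putAll(parent.rows.tailMap(splitKey, true)); // upper half moves out
        parent.rows.tailMap(splitKey, true).clear();               // and is removed from the parent
        parent.endKey = splitKey;
        regions.add(index + 1, daughter);
    }
}
```

For example, with maxRowsPerRegion set to 4, inserting the keys row01 to row10 in order leaves four regions, the last of which starts at row07.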
Q3. Name the key components of HBase.
Ans: The key components of HBase are ZooKeeper, RegionServer, Region, Catalog Tables and HBase Master.
Q4. What is S3?
Ans: S3 stands for Simple Storage Service, and it is one of the file systems that HBase can use.
Q5. What is the use of the get() method?
Ans: The get() method is used to read data from a table.
Q6. Why is HBase used?
Ans: HBase is used because it provides random read and write operations and can perform a large number of operations per second on large data sets.
Q7. In how many modes can HBase run?
Ans: HBase has two run modes: standalone and distributed.
Q8. What is the difference between Hive and HBase?
Ans: HBase supports record-level operations, but Hive does not.
Q9. Define column families.
Ans: A column family is a group of columns, while a row is a group of column families.
Q10. Define standalone mode in HBase.
Ans: It is the default mode of HBase. In standalone mode, HBase does not use HDFS; it uses the local filesystem instead, and it runs all HBase daemons and a local ZooKeeper in the same JVM process.
Q11. What are decorating filters?
Ans: Decorating filters wrap another filter to modify, or extend, its behavior in order to gain additional control over the returned data. HBase ships with two of them: SkipFilter and WhileMatchFilter.
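The decorator idea can be sketched without HBase. SimpleFilter below is a hypothetical interface (its filterAllRemaining name is borrowed from HBase's Filter class, but the contract is heavily simplified), and WhileMatch imitates what HBase's WhileMatchFilter does: it delegates every decision to the wrapped filter, but turns the first rejection into "stop the whole scan" instead of "skip this row".

```java
import java.util.ArrayList;
import java.util.List;

class DecoratingFilterSketch {
    // Hypothetical minimal filter contract (not the HBase Filter API).
    interface SimpleFilter {
        boolean accept(String row);                              // keep this row?
        default boolean filterAllRemaining() { return false; }   // stop the scan?
    }

    // Base filter: accepts rows that start with a given prefix.
    static SimpleFilter prefix(String p) {
        return row -> row.startsWith(p);
    }

    // Decorator in the spirit of WhileMatchFilter: same decisions as the wrapped
    // filter, but after the first rejection it asks the scan to end entirely.
    static final class WhileMatch implements SimpleFilter {
        private final SimpleFilter inner;
        private boolean done = false;

        WhileMatch(SimpleFilter inner) { this.inner = inner; }

        @Override public boolean accept(String row) {
            boolean ok = inner.accept(row);
            if (!ok) done = true;
            return ok;
        }

        @Override public boolean filterAllRemaining() { return done; }
    }

    // Scans rows in order, asking the filter which rows to return and when to stop.
    static List<String> scan(List<String> rows, SimpleFilter filter) {
        List<String> out = new ArrayList<>();
        for (String row : rows) {
            if (filter.filterAllRemaining()) break;
            if (filter.accept(row)) out.add(row);
        }
        return out;
    }
}
```

With the rows a1, a2, b1, a3, a plain prefix("a") scan returns a1, a2 and a3, while the decorated scan returns only a1 and a2 because it stops at b1.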
Q12. What is the full form of YCSB?
Ans: YCSB stands for Yahoo! Cloud Serving Benchmark.
Q13. What is YCSB used for?
Ans: It can be used to run comparable workloads against different storage systems.
Q14. Which operating systems are supported by HBase?
Ans: HBase supports operating systems that support Java, such as Windows and Linux.
Q15. What is the most common file system used by HBase?
Ans: The most common file system for HBase is HDFS, i.e. the Hadoop Distributed File System.
Q16. Define pseudo-distributed mode.
Ans: Pseudo-distributed mode is simply distributed mode running on a single host.
Q17. What is the regionservers file?
Ans: It is a file that lists the known region server names.
Q18. Define MapReduce.
Ans: MapReduce, as a framework, was designed to solve the problem of processing terabytes of data and more in a scalable way.
Q19. What are the operational commands of HBase?
Ans: The operational commands of HBase are Get, Delete, Put, Increment, and Scan.
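A minimal, in-memory sketch of what these operations do to a table, using only the JDK. ToyTable is hypothetical and is not the HBase client API: it keeps rows sorted by key, stores values as strings for readability (HBase stores raw bytes and keeps counters as 8-byte longs), and ignores versions, using plain "family:qualifier" strings as column names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical stand-in for one HBase table: rows sorted by row key, each row
// mapping "family:qualifier" column names to values.
class ToyTable {
    private final NavigableMap<String, NavigableMap<String, String>> rows = new TreeMap<>();

    // Put: add or overwrite one cell.
    void put(String row, String column, String value) {
        rows.computeIfAbsent(row, k -> new TreeMap<>()).put(column, value);
    }

    // Get: return all cells of a single row (empty if the row does not exist).
    Map<String, String> get(String row) {
        return rows.getOrDefault(row, new TreeMap<>());
    }

    // Delete: remove one column from a row, dropping the row once it is empty.
    void delete(String row, String column) {
        NavigableMap<String, String> cells = rows.get(row);
        if (cells == null) return;
        cells.remove(column);
        if (cells.isEmpty()) rows.remove(row);
    }

    // Increment: treat the cell as a counter (missing = 0) and add delta to it.
    long increment(String row, String column, long delta) {
        long updated = Long.parseLong(get(row).getOrDefault(column, "0")) + delta;
        put(row, column, Long.toString(updated));
        return updated;
    }

    // Scan: return the row keys in [startRow, stopRow), in sorted order.
    List<String> scan(String startRow, String stopRow) {
        return new ArrayList<>(rows.subMap(startRow, true, stopRow, false).keySet());
    }
}
```

Keeping rows in a sorted map is what makes Scan cheap here: a key range is a contiguous slice, which mirrors why HBase keeps row keys sorted.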
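Q20. Which code is used to open a connection in HBase?
Ans: With the HBase 1.0+ client API, the following code opens a connection and gets a handle to the users table:
Configuration myConf = HBaseConfiguration.create();
Connection connection = ConnectionFactory.createConnection(myConf);
Table usersTable = connection.getTable(TableName.valueOf("users"));
// close usersTable and then connection when you are finished with them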
Q21. Which command is used to show the version?
Ans: The version command is used to show the version of HBase.
Syntax – hbase> version
Q22. What is the use of the tools command?
Ans: This command is used to list the HBase surgery tools.
Q23. What is the use of the shutdown command?
Ans: It is used to shut down the cluster.
Q24. What is the use of the truncate command?
Ans: It is used to disable, drop, and recreate the specified table.
Q25. Which command is used to run the HBase shell?
Ans: The command $ ./bin/hbase shell is used to run the HBase shell.
Q26. Which command is used to show the current HBase user?
Ans: The whoami command is used to show the current HBase user.
Q27. How do you delete a table with the shell?
Ans: To delete a table, first disable it and then drop it.
Syntax – hbase> disable 'tablename' followed by hbase> drop 'tablename'
Q28. What is the use of InputFormat in the MapReduce process?
Ans: InputFormat splits the input data, and then returns a RecordReader instance that defines the classes of the key and value objects and provides a next() method that is used to iterate over each input record.
Q29. What is the full form of MSLAB?
Ans: MSLAB stands for MemStore-Local Allocation Buffer.
Q30. Define LZO.
Ans: Lempel-Ziv-Oberhumer (LZO) is a lossless data compression algorithm that is focused on decompression speed and is written in ANSI C.
Q31. What is HBaseFsck?
Ans: HBase comes with a tool called hbck, which is implemented by the HBaseFsck class. It provides various command-line switches that influence its behavior.
Q32. What is REST?
Ans: REST stands for Representational State Transfer, which defines the semantics so that the protocol can be used in a generic way to address remote resources. It also provides support for different message formats, offering many choices for a client application to communicate with the server.
Q33. Define Thrift.
Ans: Apache Thrift is written in C++, but provides schema compilers for many programming languages, including Java, C++, Perl, PHP, Python, Ruby, and more.
Q34. What are the fundamental key structures of HBase?
Ans: The fundamental key structures of HBase are the row key and the column key.
Q35. What is JMX?
Ans: The Java Management Extensions (JMX) technology is the standard for Java applications to export their status.
Q36. What is Nagios?
Ans: Nagios is a very commonly used support tool for gaining qualitative data regarding cluster status. It polls current metrics on a regular basis and compares them with given thresholds.
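The threshold comparison described above can be sketched as a tiny check function. The exit-code convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) is the one Nagios plugins follow; the class name ThresholdCheck, the "higher values are worse" assumption, and any metric you feed it are hypothetical.

```java
// Hypothetical sketch of the check a Nagios-style plugin performs on one polled
// metric. Only the exit-code convention comes from Nagios; the rest is illustrative.
class ThresholdCheck {
    static final int OK = 0, WARNING = 1, CRITICAL = 2, UNKNOWN = 3;

    // Assumes higher values are worse (e.g. a queue length or a latency in ms).
    static int check(Double value, double warn, double crit) {
        if (value == null || value.isNaN()) return UNKNOWN; // metric could not be read
        if (value >= crit) return CRITICAL;
        if (value >= warn) return WARNING;
        return OK;
    }
}
```

A plugin built this way would poll the metric, call check, and exit with the returned code so that Nagios can raise the matching alert state.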
Q37. What is the syntax of the describe command?
Ans: The syntax of the describe command is: hbase> describe 'tablename'
Q38. What is the use of the exists command?
Ans: The exists command is used to check whether the specified table exists.
Q39. What is the use of MasterServer?
Ans: MasterServer is used to assign regions to the region servers, and it also handles load balancing.
Q40. What is HBase Shell?
Ans: The HBase shell is a JRuby-based command-line interface through which we communicate with HBase.
Q41. What is the use of ZooKeeper?
Ans: ZooKeeper is used to maintain configuration information and communication between region servers and clients. It also provides distributed synchronization.
Q42. Define catalog tables in HBase.
Ans: Catalog tables are used to maintain metadata information.
Q43. Define a cell in HBase.
Ans: A cell is the smallest unit of an HBase table; it holds a piece of data addressed by the tuple {row, column, version}.
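A cell can be modeled as a small value class addressed by (row, column family, qualifier, timestamp). The sketch below is hypothetical JDK code rather than HBase's Cell interface. Its comparator follows the order in which HBase sorts cells (row, family, and qualifier ascending, then timestamp descending so the newest version comes first), leaving out the key type that HBase also compares, and it compares Java strings where HBase compares raw bytes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of an HBase cell: (row, family, qualifier, timestamp) -> value.
class CellSketch {
    static final class Cell {
        final String row, family, qualifier;
        final long timestamp;
        final String value;

        Cell(String row, String family, String qualifier, long timestamp, String value) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.timestamp = timestamp;
            this.value = value;
        }

        @Override public String toString() {
            return row + "/" + family + ":" + qualifier + "/" + timestamp + "=" + value;
        }
    }

    // Row, family, qualifier ascending; timestamp descending (newest version first).
    static final Comparator<Cell> ORDER = Comparator
            .comparing((Cell c) -> c.row)
            .thenComparing(c -> c.family)
            .thenComparing(c -> c.qualifier)
            .thenComparing(Comparator.comparingLong((Cell c) -> c.timestamp).reversed());

    // Returns a sorted copy, leaving the input list untouched.
    static List<Cell> sorted(List<Cell> cells) {
        List<Cell> copy = new ArrayList<>(cells);
        copy.sort(ORDER);
        return copy;
    }
}
```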
Q44. Define compaction in HBase.
Ans: Compaction is a process that merges HFiles into a single file; once the merged file has been created, the old files are deleted.
Q45. What is the use of the HColumnDescriptor class?
Ans: HColumnDescriptor stores information about a column family, such as compression settings, the number of versions, and so on.
Q46. What is the function of HMaster?
Ans: HMaster is the MasterServer, which is responsible for monitoring all RegionServer instances in a cluster.
Q47. How many types of compaction are there in HBase?
Ans: There are two types of compaction: minor compaction and major compaction.
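The two compaction types can be contrasted with a simplified sketch. Each hypothetical "store file" is a sorted map from key to value, TOMBSTONE stands in for a delete marker, and files are passed from oldest to newest. It keeps only the newest value per key and merges with putAll, whereas HBase streams a merge over sorted HFiles and also applies version and TTL settings. The point is only that a minor compaction must keep delete markers, while a major compaction, which rewrites all files of a store, can drop them together with the data they hide.

```java
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch: each "store file" maps key -> value, TOMBSTONE marks a
// delete, and files are ordered oldest to newest so newer entries win.
class CompactionSketch {
    static final String TOMBSTONE = "<deleted>";

    // Minor compaction: merge the selected files into one. Delete markers are
    // kept, because files outside this compaction may still hold older values.
    static TreeMap<String, String> minor(List<TreeMap<String, String>> oldestToNewest) {
        TreeMap<String, String> merged = new TreeMap<>();
        for (TreeMap<String, String> file : oldestToNewest) {
            merged.putAll(file); // newer entries overwrite older ones
        }
        return merged;
    }

    // Major compaction: merge every file of the store, then drop deleted entries,
    // since no older file remains in which a deleted value could still live.
    static TreeMap<String, String> major(List<TreeMap<String, String>> allFilesOldestToNewest) {
        TreeMap<String, String> merged = minor(allFilesOldestToNewest);
        merged.values().removeIf(TOMBSTONE::equals);
        return merged;
    }
}
```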
Q48. Define HRegionServer in HBase.
Ans: It is the RegionServer implementation, which is responsible for managing and serving regions.
Q49. Which filter accepts the page size as a parameter in HBase?
Ans: PageFilter accepts the page size as a parameter.
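PageFilter is usually combined with client-side bookkeeping: fetch a page, remember the last row key, and start the next page just after it. The JDK-only sketch below models that loop over a sorted set of row keys; PagingSketch and page are hypothetical names. With the real client, the next scan typically starts at the last row key plus a trailing zero byte, because scan start rows are inclusive, and because PageFilter is applied separately on each region server, the client may receive more than pageSize rows and should enforce the limit itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;

// Hypothetical sketch of paging through sorted row keys.
class PagingSketch {
    // Returns up to pageSize rows that sort strictly after lastRowOfPreviousPage
    // (or from the beginning when it is null).
    static List<String> page(NavigableSet<String> rows, String lastRowOfPreviousPage, int pageSize) {
        NavigableSet<String> remaining = (lastRowOfPreviousPage == null)
                ? rows
                : rows.tailSet(lastRowOfPreviousPage, false); // exclusive: skip the row already returned
        List<String> out = new ArrayList<>();
        for (String row : remaining) {
            if (out.size() == pageSize) break; // enforce the page size on the client side
            out.add(row);
        }
        return out;
    }
}
```

Calling page repeatedly with the last row of the previous result walks the table one page at a time until an empty page comes back.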
Q50. Which method is used to access an HFile directly without using HBase?
Ans: The HFile.main() method is used to access an HFile directly without using HBase.
Q51. What kind of data can HBase store?
Ans: HBase can store any kind of data that can be converted into bytes.
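Because HBase is "bytes in, bytes out", every value has to be serialized first. The hypothetical helpers below use only the JDK to mirror what HBase's Bytes utility does for strings (UTF-8) and longs (8 bytes, big-endian), plus the unsigned byte-by-byte comparison HBase uses to order row keys. The comparison shows why encoding matters for keys: big-endian non-negative longs sort numerically, the string "10" sorts before "9", and negative longs sort after positive ones.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical JDK-only helpers for turning values into byte arrays and back.
class BytesSketch {
    static byte[] toBytes(String s) { return s.getBytes(StandardCharsets.UTF_8); }
    static String bytesToString(byte[] b) { return new String(b, StandardCharsets.UTF_8); }

    // ByteBuffer is big-endian by default, matching an 8-byte big-endian long encoding.
    static byte[] toBytes(long v) { return ByteBuffer.allocate(Long.BYTES).putLong(v).array(); }
    static long bytesToLong(byte[] b) { return ByteBuffer.wrap(b).getLong(); }

    // Unsigned, lexicographic comparison: the way HBase orders row keys.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xff, b[i] & 0xff);
            if (cmp != 0) return cmp;
        }
        return Integer.compare(a.length, b.length); // a shorter prefix sorts first
    }
}
```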
Q52. What is the use of Apache HBase?
Ans: Apache HBase is used when you need random, real-time read/write access to your Big Data. This project's goal is the hosting of very large tables (billions of rows by millions of columns) atop clusters of commodity hardware. Apache HBase is an open-source, distributed, versioned, non-relational database modeled after Google's Bigtable: A Distributed Storage System for Structured Data by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, Apache HBase provides Bigtable-like capabilities on top of Hadoop and HDFS.
Q53. What are the features of Apache HBase?
Ans:
Linear and modular scalability.
Strictly consistent reads and writes.
Automatic and configurable sharding of tables.
Automatic failover support between RegionServers.
Convenient base classes for backing Hadoop MapReduce jobs with Apache HBase tables.
Easy-to-use Java API for client access.
Block cache and Bloom filters for real-time queries.
Query predicate push-down via server-side filters.
Thrift gateway and a RESTful web service that supports XML, Protobuf, and binary data encoding options.
Extensible JRuby-based (JIRB) shell.
Support for exporting metrics via the Hadoop metrics subsystem to files or Ganglia, or via JMX.
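To illustrate the Bloom filter feature listed above, here is a minimal Bloom filter in plain Java. HBase can keep such a filter per store file so that a read can skip files that definitely do not contain the requested row (or row and column). The class BloomSketch, the bit-array size, the number of hash functions, and the double-hashing scheme (Java's Arrays.hashCode combined with FNV-1a) are illustrative assumptions, not HBase's implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical minimal Bloom filter: "false" is always correct, "true" may be a false positive.
class BloomSketch {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    BloomSketch(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // 32-bit FNV-1a, used as the second base hash.
    private static int fnv1a(byte[] key) {
        int h = 0x811C9DC5;
        for (byte b : key) {
            h ^= (b & 0xff);
            h *= 16777619;
        }
        return h;
    }

    // Double hashing: position i = (h1 + i * h2) mod numBits, for i = 0 .. numHashes - 1.
    private int[] positions(String key) {
        byte[] k = key.getBytes(StandardCharsets.UTF_8);
        int h1 = Arrays.hashCode(k);
        int h2 = fnv1a(k);
        int[] pos = new int[numHashes];
        for (int i = 0; i < numHashes; i++) {
            pos[i] = Math.floorMod(h1 + i * h2, numBits);
        }
        return pos;
    }

    void add(String key) {
        for (int p : positions(key)) bits.set(p);
    }

    // false means "definitely not present"; true means "possibly present".
    boolean mightContain(String key) {
        for (int p : positions(key)) {
            if (!bits.get(p)) return false;
        }
        return true;
    }
}
```

The trade-off is memory for disk reads: a larger bit array lowers the false-positive rate, so fewer store files are opened needlessly, but no setting ever causes a key that was added to be reported as absent.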
Q54. How do I upgrade Maven-managed projects from HBase 0.94 to HBase 0.96+?
Ans: In HBase 0.96, the project moved to a modular structure. Adjust your project's dependencies to rely on the hbase-client module or another module as appropriate, rather than a single JAR. You can model your Maven dependency after one of the following, depending on your targeted version of HBase. See Section 3.5, “Upgrading from 0.94.x to 0.96.x” or Section 3.3, “Upgrading from 0.96.x to 0.98.x” for more information.
Maven Dependency for HBase 0.98
Maven Dependency for HBase 0.96
Maven Dependency for HBase 0.94
Q55. How should I design my schema in HBase?
Ans: HBase schemas can be created or updated using the Apache HBase Shell or by using Admin in the Java API.
Tables must be disabled when making ColumnFamily changes, for example:
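Configuration config = HBaseConfiguration.create();
Connection connection = ConnectionFactory.createConnection(config);
Admin admin = connection.getAdmin(); // Admin is an interface, obtained from the connection
TableName table = TableName.valueOf("myTable");
admin.disableTable(table); // ColumnFamily changes are made on a disabled table
HColumnDescriptor cf1 = …;
admin.addColumn(table, cf1); // adding a new ColumnFamily
HColumnDescriptor cf2 = …;
admin.modifyColumn(table, cf2); // modifying an existing ColumnFamily
admin.enableTable(table); // bring the table back online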
Q56. What is the Hierarchy of Tables in Apache HBase?
Ans: The hierarchy for tables in HBase is as follows:
When a table is created, one or more column families are defined as high-level categories for storing data corresponding to an entry in the table. As is suggested by HBase being “column-oriented”, column family data for all table entries, or rows, are stored together. For a given (row, column family) combination, multiple columns can be written at the time the data is written. Therefore, two rows in an HBase table need not share the same columns, only column families. For each (row, column family, column) combination, HBase can store multiple cells, with each cell associated with a version, or timestamp, corresponding to when the data was written. HBase clients can choose to read only the most recent version of a given cell, or to read all versions.
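The hierarchy described above maps naturally onto nested sorted maps: row, then column family, then column qualifier, then timestamp, then value. VersionedTable below is a hypothetical JDK-only model, not HBase's API. It sorts timestamps newest-first so that reading "the most recent version" is just the first entry, and it never prunes old versions, whereas HBase keeps only the number of versions configured for each column family.

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical nested-map view: row -> family -> qualifier -> timestamp (newest first) -> value.
class VersionedTable {
    private final NavigableMap<String, NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>>> data =
            new TreeMap<>();

    // Write one version of a cell; two rows may use entirely different qualifiers.
    void put(String row, String family, String qualifier, long ts, String value) {
        data.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(family, f -> new TreeMap<>())
            .computeIfAbsent(qualifier, q -> new TreeMap<Long, String>(Comparator.reverseOrder()))
            .put(ts, value);
    }

    // Every stored version of one cell, newest first (empty if the cell does not exist).
    NavigableMap<Long, String> allVersions(String row, String family, String qualifier) {
        return data.getOrDefault(row, Collections.emptyNavigableMap())
                   .getOrDefault(family, Collections.emptyNavigableMap())
                   .getOrDefault(qualifier, Collections.emptyNavigableMap());
    }

    // The most recent value of a cell, or null if the cell does not exist.
    String latest(String row, String family, String qualifier) {
        NavigableMap<Long, String> versions = allVersions(row, family, qualifier);
        return versions.isEmpty() ? null : versions.firstEntry().getValue();
    }
}
```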
Q57. How can I troubleshoot my HBase cluster?
Ans: Always start with the master log. Normally it is just printing the same lines over and over again. If not, then there is an issue. Google or search-hadoop.com should return some hits for the exceptions you are seeing.
An error rarely comes alone in Apache HBase; usually, when something goes wrong, what follows may be hundreds of exceptions and stack traces coming from all over the place. The best way to approach this type of problem is to walk the log up to where it all began. For example, one trick with RegionServers is that they print some metrics when aborting, so grepping for Dump should get you around the start of the problem.
RegionServer suicides are “normal”, as this is what they do when something goes wrong. For example, if ulimit and max transfer threads (the two most important initial settings, see [ulimit] and dfs.datanode.max.transfer.threads) are not changed, at some point DataNodes will be unable to create new threads, which from the HBase point of view looks as though HDFS were gone. Think about what would happen if your MySQL database were suddenly unable to access files on your local file system; it is the same with HBase and HDFS.
Another very common reason to see RegionServers committing seppuku is when they enter prolonged garbage collection pauses that last longer than the default ZooKeeper session timeout. For more information on GC pauses, see the three-part blog post by Todd Lipcon and Long GC pauses above.
Q58. Compare HBase with Cassandra.
Ans: Both Cassandra and HBase are NoSQL databases, a term for which you can find numerous definitions. Generally, it means you cannot manipulate the database with SQL. However, Cassandra has implemented CQL (Cassandra Query Language), the syntax of which is obviously modeled after SQL.
Both are designed to manage extremely large data sets. HBase documentation proclaims that an HBase database should have hundreds of millions or, even better, billions of rows. Anything less, and you are advised to stick with an RDBMS.
Both are distributed databases, not only in how data is stored but also in how the data can be accessed. Clients can connect to any node in the cluster and access any data.
In both Cassandra and HBase, the primary index is the row key, but data is stored on disk such that column family members are kept in close proximity to one another. It is, therefore, important to plan the organization of column families carefully. To keep query performance high, columns with similar access patterns should be placed in the same column family. Cassandra lets you create additional, secondary indexes on column values. This can improve data access in columns whose values have a high level of repetition, such as a column that stores the state field of a customer's mailing address.
HBase lacks built-in support for secondary indexes but offers a number of mechanisms that provide secondary index functionality. These are described in HBase's online reference guide and on HBase community blogs.
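One common way to build such an index by hand is a second table whose row key starts with the indexed value, so that all entries for one value are adjacent and can be read with a range scan. The sketch below models both tables as sorted maps in plain Java; SecondaryIndexSketch, the state example, and the zero-byte separator are illustrative assumptions. HBase would not update the data table and the index table atomically, so a real implementation has to tolerate or repair partial failures.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical hand-maintained secondary index: the main "table" is keyed by user id,
// the index "table" by state + separator + user id.
class SecondaryIndexSketch {
    private static final char SEP = 0; // assumed separator that sorts before printable characters
    final NavigableMap<String, String> usersById = new TreeMap<>();    // userId -> state
    final NavigableMap<String, String> indexByState = new TreeMap<>(); // state + SEP + userId -> userId

    void putUser(String userId, String state) {
        String old = usersById.put(userId, state);
        if (old != null) indexByState.remove(old + SEP + userId); // keep the index in sync on updates
        indexByState.put(state + SEP + userId, userId);
    }

    // Range scan over [state + SEP, state + (SEP + 1)): exactly the keys prefixed by state + SEP.
    List<String> usersInState(String state) {
        String from = state + SEP;
        String to = state + (char) (SEP + 1);
        return new ArrayList<>(indexByState.subMap(from, true, to, false).values());
    }
}
```

The separator keeps prefixes apart: entries for a state such as "CAL" do not fall inside the scan range for "CA".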
Q59. Compare HBase with Hive.
Ans: Hive can help the SQL-savvy to run MapReduce jobs. Since it is JDBC-compliant, it also integrates with existing SQL-based tools. Running Hive queries could take a while, since they go over all the data in the table by default. Nonetheless, the amount of data can be limited via Hive's partitioning feature. Partitioning allows running a filter query over data that is stored in separate folders, and reading only the data that matches the query. It could be used, for example, to process only files created between certain dates, if the files include the date as part of their name.
HBase works by storing data as key/value pairs. It supports four primary operations: put to add or update rows, scan to retrieve a range of cells, get to return cells for a specified row, and delete to remove rows, columns or column versions from the table. Versioning is available so that previous values of the data can be fetched (the history can be deleted every now and then to clear space via HBase compactions). Although HBase includes tables, a schema is only required for tables and column families, not for columns, and it includes increment/counter functionality.
Hive and HBase are two different Hadoop-based technologies: Hive is an SQL-like engine that runs MapReduce jobs, and HBase is a NoSQL key/value database on Hadoop. But hey, why not use them both? Just as Google can be used for search and Facebook for social networking, Hive can be used for analytical queries while HBase is used for real-time querying. Data can even be read and written from Hive to HBase and back again.
Q60. What version of Hadoop do I need to run HBase?
Ans: Different versions of HBase require different versions of Hadoop. Consult the table below to find which version of Hadoop you will need:
HBase release number | Hadoop release number
0.90.4 (current stable) | ???
Releases of Hadoop can be found on the Apache Hadoop website. We recommend using the most recent version of Hadoop possible, as it will contain the most bug fixes. Note that HBase 0.2.x can be made to work on Hadoop 0.18.x. HBase 0.2.x ships with Hadoop 0.17.x, so to use Hadoop 0.18.x you must recompile Hadoop 0.18.x, remove the Hadoop 0.17.x jars from HBase, and replace them with the jars from Hadoop 0.18.x.
Also note that after HBase 0.2.x, the HBase release numbering scheme will change to align with the Hadoop release number on which it depends.