HBase is the open-source implementation of Google’s Bigtable, with slight modifications. It was created in 2007, initially as a contribution to Hadoop, and later became a top-level Apache project. It is a distributed, column-oriented key-value database built on top of the Hadoop Distributed File System (HDFS). It is horizontally scalable, which means that new nodes can be added to HBase as the data grows. But how is data read from or written to the tables? That is the topic of this blog.
Readers may want to go through a few earlier blogs, which might help in getting a clear understanding of HBase.
HBase Major Components
Two major components play a vital role in reading and writing data: HFile and the META table.
HFile is the lowest level of the HBase architecture, where the tables exist in physical form. It is important to understand this component, since reads and writes ultimately take place here.
The key features of HFile are:
- The row key is the primary identifier.
- Keys are stored in lexicographical order.
- According to this order, data is stored and split across the nodes.
- Each HFile belongs to exactly one region.
- HFiles store rows on disk as sorted KeyValues.
- When the MemStore accumulates more data than its limit, the entire sorted set is written to a new HFile in HDFS.
- HBase uses multiple HFiles per column family, containing the actual cells, or KeyValue instances.
- The highest sequence number is stored as a meta field in each HFile, to record where persisting ended previously and where to continue next.
- HFile contains a multi-layered index which allows HBase to search the data without having to read the whole file.
- HDFS replicates the WAL and HFile blocks.
- HFile block replication happens automatically.
- I/O in HBase happens at the HFile block level; the block size is 64 KB by default.
- One HFile can contain only one column family. What if the data in a particular HFile grows? A different HFile is created for the same column family, and writing continues there.
- No two column families can exist in a single HFile.
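To make the sorted layout and the index concrete, here is a minimal Python sketch of an HFile-like structure. It is a toy model, not HBase code: the class name `ToyHFile`, the `block_size` of two cells, and the single-level index are all assumptions made for illustration (a real HFile uses a multi-layered index and 64 KB blocks).

```python
from bisect import bisect_right


class ToyHFile:
    """Immutable, sorted key-value file with a one-level block index (toy model)."""

    def __init__(self, cells, block_size=2):
        # Cells are kept in lexicographical row-key order, as in an HFile.
        ordered = sorted(cells.items())
        # Split the sorted cells into fixed-size blocks (hypothetical size).
        self.blocks = [ordered[i:i + block_size]
                       for i in range(0, len(ordered), block_size)]
        # The index stores the first row key of every block.
        self.index = [block[0][0] for block in self.blocks]
        self.blocks_read = 0

    def get(self, row_key):
        # Use the index to pick the last block whose first key is <= row_key.
        pos = bisect_right(self.index, row_key) - 1
        if pos < 0:
            return None  # row_key sorts before every key in the file
        self.blocks_read += 1  # only one block is scanned per lookup
        for key, value in self.blocks[pos]:
            if key == row_key:
                return value
        return None


hfile = ToyHFile({"row10": "a", "row2": "b", "row1": "c", "row3": "d"})
print(hfile.index)        # ['row1', 'row2']
print(hfile.get("row3"))  # d
```

Note that "row10" sorts before "row2" because the order is lexicographical, not numeric. This is why row keys with numeric parts are often zero-padded.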
The HRegionServer manages how HFiles are grouped into HRegions.
The other component is the META table. It is mainly used in the read operation, since a read needs to know which HRegionServer to contact for the actual data.
The META table is also updated whenever region assignments change, for example after a region split or move, so that the next read is routed to the right server.
- The META table is an HBase table that keeps a list of all regions in the system.
- The .META. table is like a B-tree.
- The .META. table structure is as follows:
– Key: region start key, region id
– Values: RegionServer
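The key and value structure above can be sketched as a small lookup in Python. This is only an illustration of the idea: the `META` rows, host names, and the `locate_region` function are hypothetical, and the real client does this lookup through the HBase client library.

```python
from bisect import bisect_right

# Hypothetical META rows: (region start key, region id) -> RegionServer.
# Rows are ordered by start key; the first region starts at the empty key.
META = [
    (("", "region-1"), "rs1.example.com"),
    (("g", "region-2"), "rs2.example.com"),
    (("p", "region-3"), "rs3.example.com"),
]


def locate_region(row_key, meta=META):
    """Return (region id, RegionServer) for the region that covers row_key."""
    start_keys = [key[0] for key, _server in meta]
    # The covering region is the last one whose start key is <= row_key.
    pos = bisect_right(start_keys, row_key) - 1
    (_start_key, region_id), server = meta[pos]
    return region_id, server


print(locate_region("apple"))  # ('region-1', 'rs1.example.com')
print(locate_region("zebra"))  # ('region-3', 'rs3.example.com')
```

Because the regions are ordered by start key, finding the right RegionServer is a search over sorted keys rather than a scan of every region.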
When the client gives a command to Write, the following steps occur:
- The write is first directed to the Write Ahead Log (WAL), where it is recorded as a log entry. The WAL is not where the data is ultimately stored; it exists for fault tolerance. If an error occurs later while writing the data, HBase can recover from the WAL.
- Once the log entry is made, the data is forwarded to the MemStore, an in-memory write buffer on the region server. All data is written to the MemStore first, which makes writes faster than in a typical disk-bound RDBMS (relational database).
- Later, the data is flushed to an HFile, where it is actually stored in HDFS. When the MemStore is full, its contents are written to a new HFile.
- Once the data has been written to the WAL and the MemStore, an ACK (acknowledgement) is sent to the client as confirmation; the client does not wait for the flush to an HFile.
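The write steps above can be sketched as a toy model in Python. Everything here is an assumption for illustration: the class `ToyRegionWriter`, the `flush_threshold` counted in rows (HBase uses a size in bytes), and the in-process lists standing in for the WAL and HDFS.

```python
class ToyRegionWriter:
    """Toy write path: WAL append, MemStore update, flush to a sorted HFile."""

    def __init__(self, flush_threshold=3):
        self.wal = []        # append-only log, kept for recovery
        self.memstore = {}   # in-memory write buffer
        self.hfiles = []     # each flush produces one new sorted "HFile"
        self.flush_threshold = flush_threshold  # hypothetical limit, in rows

    def put(self, row_key, value):
        # 1. Record the edit in the write-ahead log first.
        self.wal.append((row_key, value))
        # 2. Apply the edit to the in-memory MemStore.
        self.memstore[row_key] = value
        # 3. Flush when the MemStore reaches its limit. The flush runs inline
        #    here to keep the sketch short; HBase flushes in the background.
        if len(self.memstore) >= self.flush_threshold:
            self.flush()
        # 4. Acknowledge the write to the client.
        return "ACK"

    def flush(self):
        # Write the whole MemStore out as one sorted, immutable file.
        self.hfiles.append(sorted(self.memstore.items()))
        self.memstore = {}


writer = ToyRegionWriter(flush_threshold=2)
writer.put("row2", "x")
writer.put("row1", "y")
print(writer.hfiles)  # [[('row1', 'y'), ('row2', 'x')]]
```

The rows arrive out of order but come out sorted in the flushed file, which is what lets an HFile be searched through its index later.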
The read process starts when a client sends a request to HBase. The request first goes to ZooKeeper, which keeps status information for the distributed system that HBase is part of. Refer to the figure above.
- ZooKeeper holds the location of the META table, which is hosted on an HRegionServer. When the client asks ZooKeeper, it returns the address of that table (1).
- The client then contacts that HRegionServer and reads the META table, which gives the address of the region that holds the data to be read (2).
- In the specific HRegion, the process first checks the BlockCache, which holds data from previous reads. If the user queries the same records again, the client gets the data almost immediately. If the data is found, the process returns it to the client as the result (3).
- If the data is not found in the BlockCache, the process searches the MemStore, which holds recently written data that has not yet been flushed to an HFile. If the data is found, the process returns it to the client as the result (4).
- If the data is not found in the MemStore either, the process searches the HFiles. The data is located here, and once the search is complete, the process takes the required data and moves forward (5).
- The data taken from the HFile is the most recently read data and may be read again by the user. It is therefore written to the BlockCache so that the client can access it instantly next time (6). This is the benefit of step 6: the next read of the same data can complete right after step 3.
- Once the data is written to the BlockCache and the search is complete, the required data is returned to the client along with an ACK (acknowledgment) (7).
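Steps 3 to 7 can be sketched as a short Python function. This is a simplified model that follows the order described above: the function `read_row` and the plain dictionaries are hypothetical, and the cache here holds individual rows, whereas HBase caches whole HFile blocks.

```python
def read_row(row_key, block_cache, memstore, hfiles):
    """Return (value, source) following BlockCache -> MemStore -> HFile."""
    # Step 3: data cached by a previous read is returned right away.
    if row_key in block_cache:
        return block_cache[row_key], "BlockCache"
    # Step 4: recent writes that are not flushed yet live in the MemStore.
    if row_key in memstore:
        return memstore[row_key], "MemStore"
    # Step 5: otherwise search the HFiles, newest first.
    for hfile in reversed(hfiles):
        if row_key in hfile:
            # Step 6: cache the value so the next read stops at step 3.
            block_cache[row_key] = hfile[row_key]
            # Step 7: return the data to the client.
            return hfile[row_key], "HFile"
    return None, "not found"


cache, memstore = {}, {"row9": "new"}
hfiles = [{"row1": "old"}, {"row1": "newer", "row2": "b"}]
print(read_row("row1", cache, memstore, hfiles))  # ('newer', 'HFile')
print(read_row("row1", cache, memstore, hfiles))  # ('newer', 'BlockCache')
```

The second call for the same row is served from the cache, which is the benefit of step 6.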
This is how HBase performs read and write operations internally.