Transactions in Hive

In this blog post, we have explained about the row-level transactions available in Hive. This post will provide you a good idea of how to implement the row-level transactions on the Hive table.

Before beginning with the transactions in Hive, let’s look at the ACID properties, which are vital for any transaction.

What is ACID?

ACID stands for Atomicity, Consistency, Isolation, and Durability.

Atomicity means, a transaction should complete successfully or else it should fail completely i.e. it should not be left partially. Consistency ensures that any transaction will bring the database from one valid state to another state. Isolation states that every transaction should be independent of each other i.e. one transaction should not affect another. And Durability states that if a transaction is completed, it should be preserved in the database even if the machine state is lost or a system failure might occur.

These ACID properties are essential for a transaction and every transaction should ensure that these properties are met.

Transactions in Hive

Transactions in Hive are introduced in Hive 0.13, but they only partially fulfill the ACID properties like atomicity, consistency, durability, at the partition level. Here, Isolation can be provided by turning on one of the locking mechanisms available with zookeeper or in memory.

But in Hive 0.14, new API’s have been added to completely fulfill the ACID properties while performing any transaction.

Transactions are provided at the row-level in Hive 0.14. The different row-level transactions available in Hive 0.14 are as follows:

  1. Insert
  2. Delete
  3. Update

There are numerous limitations with the present transactions available in Hive 0.14. ORC is the file format supported by Hive transaction. It is now essential to have ORC file format for performing transactions in Hive. The table needs to be bucketed in order to support transactions.

Row-level Transactions Available in Hive 0.14

Let’s perform some row-level transactions available in Hive 0.14. Before creating a Hive table that supports transactions, the transaction features present in Hive needs to be turned on, as by default they are turned off.

The below properties needs to be set appropriately in hive shell , order-wise to work with transactions in Hive:

If the above properties are not set properly, the ‘Insert’ operation will work but ‘Update’ and ‘Delete’ will not work and you will receive the following error:

Creating a Table That Supports Hive Transactions

The above syntax will create a table with name ‘college’ and the columns present in the table are ‘clg_id, clg_name, clg_loc’. We are bucketing the table by ‘clg_id’ and the table format is ‘orc’, also we are enabling the transactions in the table by specifying it inside the TBLPROPERTIES as ‘transactional’=’true’

We have successfully created a table with name ‘college’ which supports row-level transactions of Hive.

The create table can be checked using the command show tables.

Inserting Data into a Hive Table

The above command is used to insert row wise data into the Hive table. Here, each row is seperated by ‘( )’ brackets.

Now, we have successfully inserted the data into the Hive table.

The contents of the table can be viewed using the command select * from college

From the above image, we can see that the data has been inserted successfully into the table.

Now if we try to re-insert the same data again, it will be appended to the previous data as shown below:

 

Updating the Data in Hive Table

The above command is used to update a row in Hive table.

From the above image, we can see that we have received an error message. This means that the Update command is not supported on the columns that are bucketed.

In this table, we have bucketed the ‘clg_id’ column and performing the Update operation on the same column, so we have go the error

FAILED: SemanticException[Error 10302]: Updating values of bucketing columns is not supported. Column clg_id

Now let’s perform the update operation on Non bucketed column

We have successfully updated the data.

The updated data can be checked using the command select * from college.

We can see that the data has been updated successfully.

Now let’s perform the Delete operation on the same table.

Deleting a Row from Hive Table

The above command will delete a single row in the Hive table.

We have now successfully deleted a row from the Hive table. This can be checked using the command select * from college.

We can see that there is no row with clg_id =1. This means that we have successfully deleted the row from the Hive table.

This is how the transactions or row-wise operations are performed in Hive.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s