Managed and External Tables in Hive

In this blog, we will be discussing the types of tables in Hive and the difference between them and how to create those tables and when to use those tables for a particular dataset.

Hive has two types of tables:

  • Managed table

  • External table

Let us see about the above tables in detail

Managed table

Managed table is also called as Internal table. This is the default table in Hive. When we create a table in Hive without specifying it as a managed or external, by default we will get a Managed table.

If we create a table as a managed table, the table will be created in a specific location in HDFS.

By default, the table data will be created in /usr/hive/warehouse directory of HDFS.

If we delete a Managed table, both the table data and metadata for that table will be deleted from the HDFS.

Let us create a managed table with the below command.

1
create table employee(Name String, Sal Int) row format delimited fields terminated by ‘,’;

hive > create table employee(Name String, Sal Int) row format delimited fields terminated by ‘,’;

We have successfully created the table and to check the details of the table type the below command:

hive > describe formatted table_name;

In the above image we can see MANAGED_TABLE as the entry for the option Table type which means that we have created a Managed table.

We will try to load one sample dataset which we have created into the table by using the below command:

1
load data local inpath ‘path of the file’ into table employee;

If we check in the hdfs location we can get the contents of the table.

Check the contents of the table in HDFS by using the below command:

hadoop dfs ls hdfs://localhost:9000/user/hive/warehouse/employee

Now let us delete the above created table by using the command

drop table employee;

We have successfully deleted the table.

Now let us try to check the contents of the table in HDFS using the below command:

hadoop dfs ls hdfs://localhost:9000/user/hive/warehouse/employee

In the above image, you can see that it is displaying like No such file or directory because both the table and its contents are deleted from the HDFS location.

Now let us create a table as External table.

EXTERNAL TABLE

External table is created for external use as when the data is used outside Hive. Whenever we want to delete the table’s metadata and want to keep the table’s data as it is, we use External table. External table only deletes the schema of the table.

Let us create an external table by using the below command:

create external table employee(Name String, Sal Int) row format delimited fields terminated by ‘,’;

We have now successfully created the external table.

Let us check the details regarding the table using the below command:

describe formatted employee;

In the above image we can see the EXTERNAL_TABLE as the entry for the option Table type which says that the above table is an External table.

Now let us load some data into the table using the below command:

load data local inpath ‘path of the file’ into table employee;

We have successfully loaded data into the Hive table.

Let us check the contents in HDFS by using the below command:

hadoop dfs ls hdfs://localhost:9000/user/hive/warehouse/employee

In the above image we can see the contents of the External table inside the HDFS.

Let’s now delete the table using the below command:

drop table employee;

We have successfully deleted the table.

Now let us check the HDFS location of the table using the below command:

hadoop dfs ls hdfs://localhost:9000/user/hive/warehouse/employee

You can see that the contents of the table are still present in the HDFS location.

If we create an External table, after deleting the table only the metadata related to table is deleted but not the contents of the table.

When to use External and Managed table

Managed table

  • Data is temporary

  • Hive to Manage the table data completely not allowing any external source to use the table

  • Don’t want data after deletion

External table

  • The data is also used outside of Hive. For example, the data files are read and processed by an existing program that doesn’t lock the files

  • Hive should not own data and control settings, dirs, etc., you have another program or process that will do those things

  • You are not creating table based on existing table (AS SELECT)

  • Can create table back and with the same schema and point the location of the data

We hope this blog helped you in learning the importance of Managed and External tables in Hive and when to use those tables with particular data.

Regards

Anand Pandey

Beginner’s Guide for Hive

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s