Exporting Files From HDFS To MySQL Using SQOOP

Apache Sqoop is a tool designed to efficiently transfer bulk data between Hadoop and structured data stores such as relational databases.

In this blog, we will discuss about using Sqoop to export data from HDFS to MySQL, with weblog entry as an example.

Getting Started

Before you proceed, we recommend you to go through the blogs mentioned below which discuss about importing and exporting data into HDFS using Hadoop shell commands:

33 Frequently used HDFS shell commands

Steps to Export Data from HDFS to MySQL

Follow the steps below to transfer data from HDFS to MySQL table:

Step1:

Create a new database in the MySQL instance.

CREATE DATABASE db1;

Step 2:

Create a table named acad.

USE db1;

CREATE TABLE acad (

emp_id int(2),

emp_name varchar(10),

emp_sal int(10),

date date);

 

The image below indicates that the table inside MySQL is empty.

Figure 1

Give a command, describe <table name>, to show the various fields and types of it.

This will help in comparing the type of data present inside HDFS which is ready to be mapped.

Figure 2

The files inside HDFS must have the same format as that of MySQL table, to enable the mapping of the data.

Refer the screenshot below to see two files which are ready to be mapped inside MySQL.

Figure 3

 

Step3:

Export the input.txt and input2.txt file from HDFS to MySQL

sqoop export -m 1 –connect jdbc:MySQL://localhost/db1 –username sqoop –password root –table acad –export-dir /sqoop_msql/

To obtain a filtered map, we can use the following option:

–input-fields-terminated-by ‘/t’ –MySQL-delmiters

Where ‘/t’ denotes tab.

Once the table inside MySQL and data inside HDFS is ready to be mapped, we can execute the export command. Refer the screenshot below:

Figure 4

Once you give the export command, the job completion statement should be displayed.

Figure 5

Note that only Map job needs to be completed. Other error messages will be displayed because of software version compatibility. These errors can be ignored.

How it Works

Sqoop calls the JDBC driver written in the –connect statement from the location where Sqoop is installed. The –username and –password options are used to authenticate the user and Sqoop, internally generates the same command against the MySQL instance.

The –table argument defines the MySQL table name, that will receive the data from HDFS. This table must be created prior to running the export command. Sqoop uses the number of columns, their types, and the metadata of the table to validate the data inserted from the HDFS directory. When the export statement is executed, it initiates and creates INSERT statements in MySQl. For example, the export-job will read each line of the input.txt file from HDFS and produces the following intermediate statements.

 

INSERT INTO acad VALUES (5,”HADOOP”,50000,’2011-03-21′);

INSERT INTO acad VALUES(6,”SPARK”,600000,’2011-03-22′);

INSERT INTO acad VALUES(7,”JAVA”,700000,’2011-03-23′);

By default, Sqoop export creates INSERT statements. If the –update-key argument is stated, UPDATE statements will be created instead.

The -m argument sets the number of map jobs for reading the file splits from HDFS. Each mapper will have its own connection to the MySQL Server.

Now, on querying inside MySQL, we see that all the data is mapped inside the table.

Figure 6

Hope this blog was useful in understanding the process of exporting data from HDFS to MySQL, using sqoop. For more technical blogs keep visiting.

Regards

 

Anand Pandey

Advertisements
Categories:

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s