Beginner’s Guide for Sqoop

In this blog we will discuss the basics of Sqoop. You will also learn how to import data from an RDBMS into HDFS and export data from HDFS into an RDBMS using Sqoop.

Note: We assume that Sqoop is already installed on your system and that all the required connectors and JAR files have been copied into Sqoop's lib folder.

Importing a Table from RDBMS to HDFS

Here we are using sample MySQL data and importing it into HDFS using Sqoop. Let us first see how to create a table in MySQL.

  • Login to your MySQL shell.
  • Create a database by using the below command:

  • To work in the created database, use the following command:
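The two commands referred to above look like the following; the database name Acadgild matches the one used throughout this post:

create database Acadgild;

use Acadgild;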

Now we are in the Acadgild database.

  • Create a table employee by using the below command:
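A possible create statement is shown below. Only the column names emp_id, emp_name, and emp_sal come from this post; the column types are assumptions.

create table employee (
  emp_id int,            -- employee id
  emp_name varchar(50),  -- employee name
  emp_sal int            -- employee salary
);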

A table named employee has been created with the columns emp_id, emp_name, and emp_sal. The schema of this table can be checked using the following command:
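For example, MySQL's describe statement lists the column names and types:

describe employee;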

 

  • Let’s now try inserting some sample data into the created table by using the below command:

  • We can check the inserted data by using this command: select * from employee;
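Taken together, the insert and the check might look like this; the sample rows below are illustrative values only, not data from the original post:

insert into employee values (1, 'John', 10000);
insert into employee values (2, 'Alice', 20000);
insert into employee values (3, 'Bob', 30000);

select * from employee;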

We have successfully created a table in MySQL, and we will now import it into HDFS using Sqoop.

The following command can be used to import the table into HDFS.

Syntax:
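The angle-bracket placeholders below should be replaced with your own values:

sqoop import --connect jdbc:mysql://<host>/<database_name> --username <mysql_username> --password <mysql_password> --table <table_name> -m <number_of_mappers>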

Here we are connecting to MySQL through the JDBC connector and using the database Acadgild. It is necessary to specify the MySQL username, the password, and the table name.

In our case, the command will be as shown below:
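Assuming MySQL is running on localhost and the credentials are root/root (placeholders, substitute your own), the command might look like this:

sqoop import --connect jdbc:mysql://localhost/Acadgild --username root --password root --table employee -m 1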

Here '-m' specifies the number of map tasks that run simultaneously, and '-m 1' means that only one map task will run.

Note: If we do not use -m 1 at the end of the statement, Sqoop runs with its default of four map tasks, and the table data is split across multiple part files in HDFS instead of a single one.

The data in the RDBMS has now been successfully imported into HDFS. By default, the file will be stored at /user/$user_name/table_name/part-m-00000.

Refer to the below image for the same.
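The imported file can also be verified from the command line with hadoop fs -cat (replace the placeholder with your own user name):

hadoop fs -cat /user/<user_name>/employee/part-m-00000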

Importing all the Tables in a Database to HDFS

We can import all the tables present in a database by specifying import-all-tables as shown in the below command:
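A sketch of the command, again assuming localhost and root/root as placeholder connection details:

sqoop import-all-tables --connect jdbc:mysql://localhost/Acadgild --username root --password root -m 1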

When we specify import-all-tables in the command, Sqoop internally calls its 'import all tables' tool. This tool imports every table in the database into HDFS, and a parent directory of our own can be chosen with the --warehouse-dir option (the --target-dir option used later in this post applies only to single-table imports).

When using the above command, a separate MapReduce job is launched for every table, and the tables are imported into HDFS sequentially.

We now have two tables in our MySQL database.

When we check the /user/$user_name/ directory of HDFS, we can see that the two tables have been successfully imported. You can refer to the below screenshot for the same.


Importing Table Data into a Specific Directory in HDFS

We can also import data into a specific directory by specifying it with --target-dir, as shown in the below command:
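For example, to land the employee table in a directory of our choice (the path /sqoop_import/employee is only an example):

sqoop import --connect jdbc:mysql://localhost/Acadgild --username root --password root --table employee --target-dir /sqoop_import/employee -m 1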

Importing Specific Table Data into HDFS

We can also import specific rows of an RDBMS table into HDFS by specifying a where clause in the Sqoop command, as shown below:
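A sketch with an example filter; the condition emp_sal > 15000 and the target directory are only illustrative:

sqoop import --connect jdbc:mysql://localhost/Acadgild --username root --password root --table employee --where "emp_sal > 15000" --target-dir /sqoop_import/employee_filtered -m 1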

We can check the contents of the imported table data by using the command hadoop fs -cat '/path of the file'. You can refer to the below screenshot for the same.

Importing a Table as a Sequence File into HDFS

It is possible to store the RDBMS data as a sequence file by specifying --as-sequencefile, as shown in the below command:
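For example, with the same placeholder connection details and an example target directory:

sqoop import --connect jdbc:mysql://localhost/Acadgild --username root --password root --table employee --as-sequencefile --target-dir /sqoop_import/employee_seq -m 1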

As the sequence file stores the contents in binary format, we will get the binary output as shown in the below screen shot.

We have now successfully stored the RDBMS data in a sequence file using Sqoop.

Importing a Table as an Avro File into HDFS

We can also store the RDBMS data as an Avro file by specifying --as-avrodatafile, as shown in the below command:
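For example, again with placeholder connection details and an example target directory:

sqoop import --connect jdbc:mysql://localhost/Acadgild --username root --password root --table employee --as-avrodatafile --target-dir /sqoop_import/employee_avro -m 1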

When storing a file using the Avro file format, we will get an output file with a .avro extension, and the contents inside the file will be in binary format. We can see the same in the below screenshot:

We have now successfully stored the contents of RDBMS in an Avro file using Sqoop.

The above-mentioned scenarios are just a few of the ways to import data from an RDBMS into HDFS using Sqoop.

Now, let’s take a look at how to export data from HDFS to RDBMS using Sqoop.

Export Data from HDFS to RDBMS

To export the data in HDFS to MySQL, we first need to create a table in MySQL that matches the schema of the data in HDFS.

Let's export the Customers data set in HDFS, shown in the below screenshot, to the RDBMS.

It consists of the fields Customer_Id, Customer_name, and Customer_city.

Let's create a table in MySQL using the below command:
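A possible create statement is shown below. Only the column names come from the data set described above; the column types are assumptions.

create table customers (
  Customer_Id int,            -- customer id
  Customer_name varchar(50),  -- customer name
  Customer_city varchar(50)   -- customer city
);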

We can check the schema of the created table using the following command:
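For example, MySQL's describe statement again lists the column names and types:

describe customers;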

We have the Customers data set in HDFS at the location specified below.

/Customers.csv

We will now export the data at the above HDFS location into the RDBMS using Sqoop, through the below command:
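A sketch of the export command. The connection details are placeholders as before, the customers table is assumed to live in the same Acadgild database, and a comma delimiter is assumed since the input is a CSV file:

sqoop export --connect jdbc:mysql://localhost/Acadgild --username root --password root --table customers --export-dir /Customers.csv --input-fields-terminated-by ',' -m 1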

The data has now been successfully transferred from HDFS to RDBMS. We can check the data in MySQL using the below command:
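For example, a simple select shows the exported rows:

select * from customers;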

We can see that the contents of the data set are now available inside the customers table in MySQL. Thus, we have successfully transferred data from HDFS to an RDBMS.

We hope this post helped you learn how to import and export data between HDFS and an RDBMS. Keep visiting our blog for more updates on Big Data and other technologies.

