Hadoop Interview Questions Based on Sqoop and Kafka

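What will happen if the target directory already exists during a Sqoop import?

Ans: Sqoop runs a map-only MapReduce job, and like other MapReduce jobs it will not write into an existing output directory. If the target directory is already present, the import fails with an exception (typically FileAlreadyExistsException).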
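As a minimal sketch, the Python function below only assembles a sqoop import command line; it does not run Sqoop. The JDBC URL, the table orders and the HDFS path are hypothetical. Two real Sqoop options avoid the failure: --delete-target-dir removes the existing directory before importing, and --append writes the new files into the existing directory instead.

```python
import shlex


def sqoop_import_cmd(table, target_dir, if_exists="fail"):
    """Build (not run) a sqoop import command line.

    if_exists: "fail"   -> default behaviour, the job fails if target_dir exists
               "delete" -> add --delete-target-dir (old data is removed first)
               "append" -> add --append (new files are added to the directory)
    """
    args = [
        "sqoop", "import",
        "--connect", "jdbc:mysql://dbhost/shop",  # hypothetical JDBC URL
        "--table", table,
        "--target-dir", target_dir,
    ]
    if if_exists == "delete":
        args.append("--delete-target-dir")
    elif if_exists == "append":
        args.append("--append")
    elif if_exists != "fail":
        raise ValueError(f"unknown if_exists mode: {if_exists!r}")
    return args


# Print the command as it would be typed in a shell.
print(shlex.join(sqoop_import_cmd("orders", "/data/orders", if_exists="delete")))
```

Choose --delete-target-dir when each run should replace the previous data, and --append when each run adds new files next to the old ones.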

What is the use of the warehouse directory in a Sqoop import?

Ans: The warehouse directory (--warehouse-dir) is an HDFS parent directory for the imported table. With --target-dir, all files are written directly into the given directory. With --warehouse-dir, Sqoop creates a child directory named after the table inside it, and all the files are stored in that child directory.
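A small Python sketch of the path rule described above, assuming a hypothetical table orders and hypothetical HDFS paths. Sqoop does not accept --target-dir and --warehouse-dir together, so the sketch rejects that combination too; Sqoop's fallback when neither option is given (a directory under the user's HDFS home) is not modelled here.

```python
import posixpath


def import_output_dir(table, target_dir=None, warehouse_dir=None):
    """Return the HDFS directory a sqoop import writes its files to.

    --target-dir is used as-is; --warehouse-dir is a parent directory
    under which a child directory named after the table is created.
    """
    if target_dir and warehouse_dir:
        raise ValueError("--target-dir and --warehouse-dir cannot be combined")
    if target_dir:
        return target_dir
    if warehouse_dir:
        return posixpath.join(warehouse_dir, table)
    # Sqoop itself would fall back to the user's HDFS home; not modelled here.
    raise ValueError("this sketch needs target_dir or warehouse_dir")
```

With warehouse_dir="/user/etl/warehouse", importing several tables gives one subdirectory per table, which is why --warehouse-dir is the natural choice for import-all-tables style workloads.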

What is the default number of mappers in a Sqoop job?

Ans: 4. The number can be changed with the -m or --num-mappers option.
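Sqoop divides the work among the mappers by querying the minimum and maximum of the split column (the primary key, or the column given with --split-by) and giving each mapper a slice of that range. The function below is a simplified model of that idea for integer columns only; Sqoop's own splitter differs in details, so treat the exact boundaries as illustrative.

```python
def split_ranges(lo, hi, num_mappers=4):
    """Divide the inclusive integer range [lo, hi] into at most num_mappers
    contiguous, non-overlapping ranges, one per mapper.

    Simplified model of splitting on the --split-by column; empty slices
    are dropped when there are fewer values than mappers.
    """
    if num_mappers < 1:
        raise ValueError("num_mappers must be >= 1")
    total = hi - lo + 1
    size, extra = divmod(total, num_mappers)
    ranges = []
    start = lo
    for i in range(num_mappers):
        # The first `extra` ranges get one additional value each.
        end = start + size - 1 + (1 if i < extra else 0)
        if end >= start:
            ranges.append((start, end))
        start = end + 1
    return ranges
```

For ids 1 to 1000 and the default of 4 mappers this yields four slices of 250 ids each, so each map task issues its own bounded SELECT against the source table.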

How can data be brought directly into Hive using Sqoop?

Ans: Use the --hive-import option of sqoop import.
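A minimal Python sketch that only builds such a command line (nothing is executed); the JDBC URL and the table names are hypothetical. --hive-table and --hive-overwrite are real Sqoop options: the first sets the Hive table name (by default the source table name is used), the second replaces existing data in the Hive table.

```python
def sqoop_hive_import_cmd(table, hive_table=None, overwrite=False):
    """Build (not run) a sqoop import command that loads straight into Hive."""
    args = [
        "sqoop", "import",
        "--connect", "jdbc:mysql://dbhost/shop",  # hypothetical JDBC URL
        "--table", table,
        "--hive-import",
    ]
    if hive_table:
        # Without --hive-table, the Hive table gets the source table's name.
        args += ["--hive-table", hive_table]
    if overwrite:
        # Replace any data already present in the Hive table.
        args.append("--hive-overwrite")
    return args
```

Behind --hive-import, Sqoop first writes the data to HDFS and then has Hive create the table and load the files into it.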

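We want to import data from an RDBMS source into HDFS in CSV format, but a column in the RDBMS table contains commas (‘,’). How can the data be imported unambiguously?

Ans: You can use the --optionally-enclosed-by option (for example --optionally-enclosed-by '\"') so that fields containing the delimiter are enclosed in quotes.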
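The effect can be illustrated with Python's csv module rather than Sqoop itself: quoting only the fields that need it is analogous to what --optionally-enclosed-by is meant to achieve. The sample rows are hypothetical.

```python
import csv
import io

rows = [(1, "Smith, John", 42.5), (2, "Jane Doe", 17.0)]

buf = io.StringIO()
# QUOTE_MINIMAL encloses a field in quotes only when it contains the
# delimiter, the quote character or a line break.
writer = csv.writer(buf, delimiter=",", quotechar='"',
                    quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
writer.writerows(rows)
text = buf.getvalue()

# Reading the text back recovers three fields per row,
# even though one value contains a comma.
parsed = list(csv.reader(io.StringIO(text)))
```

Here "Smith, John" is written inside double quotes while "Jane Doe" is left unquoted, and a CSV-aware reader (for example a Hive table using a CSV SerDe) can then split the fields correctly.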

How can data be imported directly into HBase using Sqoop?

Ans: Use the --hbase-table option, together with --column-family to name the target column family. Sqoop imports data into the table specified as the argument to --hbase-table. Each row of the input table is transformed into an HBase Put operation on a row of the output table.
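The row-to-Put transformation can be pictured with a toy Python model; it uses plain dicts, not the HBase client API, and the column names are hypothetical. In Sqoop the row key column is chosen with --hbase-row-key (by default the split-by column) and the family with --column-family. Skipping None values is a choice of this sketch and is not meant to mirror Sqoop's null handling exactly.

```python
def row_to_put(row, row_key_col, column_family):
    """Toy model of turning one source row into an HBase Put.

    The value of row_key_col becomes the row key; every other column
    becomes a cell named 'family:qualifier'. Values are stored as strings.
    """
    row_key = str(row[row_key_col])
    cells = {
        f"{column_family}:{col}": str(value)
        for col, value in row.items()
        if col != row_key_col and value is not None  # sketch skips None values
    }
    return {"row": row_key, "cells": cells}
```

A source row {"id": 7, "name": "pen", "qty": 3} with row key id and family cf becomes row "7" with cells cf:name and cf:qty.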

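What is an incremental load in Sqoop?

Ans: An incremental load imports only the records that are new since the previous import. Specify --incremental append together with --check-column and --last-value, and the Sqoop job imports only rows whose check-column value is greater than the specified last value. (--incremental lastmodified, with a timestamp check column, also picks up updated rows.)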
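A Python sketch of the append-mode rule, working on in-memory rows instead of a database; the column name id and the sample rows are hypothetical. It corresponds roughly to --incremental append --check-column id --last-value 3.

```python
def incremental_append(rows, check_column, last_value):
    """Sketch of --incremental append semantics.

    Returns the rows whose check column is greater than last_value,
    plus the new last value to use for the next run.
    """
    new_rows = [r for r in rows if r[check_column] > last_value]
    new_last = max((r[check_column] for r in new_rows), default=last_value)
    return new_rows, new_last
```

Running it again with the returned last value imports nothing until new rows arrive, which is exactly the bookkeeping a saved Sqoop job automates.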

What is the benefit of using a Sqoop job?

Ans: When you must run an incremental import many times, you can save it as a Sqoop job (sqoop job --create) and run it with sqoop job --exec. Whenever the job runs, it automatically picks up the last imported value from the previous run, and the import starts after that value.
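A toy Python model of that behaviour, assuming the metastore can be pictured as a dict keyed by job name; the class, the job name orders_inc and the column id are hypothetical. On a real cluster the equivalent would be sqoop job --create orders_inc -- import ... --incremental append --check-column id --last-value 0, followed by sqoop job --exec orders_inc for each run.

```python
class SavedIncrementalJob:
    """Toy stand-in for a saved Sqoop job.

    The metastore is modelled as a plain dict that keeps the last
    imported value between runs.
    """

    def __init__(self, name, check_column, metastore, initial_last_value=0):
        self.name = name
        self.check_column = check_column
        self.metastore = metastore
        self.metastore.setdefault(name, initial_last_value)

    def execute(self, source_rows):
        last = self.metastore[self.name]
        new_rows = [r for r in source_rows if r[self.check_column] > last]
        if new_rows:
            # Remember where this run stopped, as the saved job does.
            self.metastore[self.name] = max(r[self.check_column] for r in new_rows)
        return new_rows
```

Because the last value lives in the metastore rather than on the command line, nobody has to look it up and pass --last-value by hand before each run.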

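Where does a Sqoop job store the last imported value?

Ans: In the Sqoop metastore. By default this is a private metastore under $HOME/.sqoop; a shared metastore can be used instead through the --meta-connect option.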

What is Kafka?

Ans: It is a distributed, partitioned and replicated publish-subscribe messaging framework.
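"Partitioned" means a topic is split into several partitions, and a keyed message always lands in the same partition, which keeps per-key ordering. The sketch below shows that idea only: Kafka's default partitioner hashes keys with murmur2, while zlib.crc32 is used here as a stand-in, so the partition numbers will not match a real cluster. Keys and values are hypothetical.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a message key to a partition with a stable hash.

    zlib.crc32 stands in for Kafka's murmur2-based default partitioner.
    """
    return zlib.crc32(key) % num_partitions


# Route a few keyed events into 3 partitions (each a list acting as a log).
partitions = {p: [] for p in range(3)}
for key, value in [(b"user-1", "login"), (b"user-2", "login"), (b"user-1", "logout")]:
    partitions[choose_partition(key, 3)].append((key, value))
```

All events for user-1 end up in one partition in the order they were sent; replication then keeps copies of each partition on several brokers.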

How is Apache Kafka different from Apache Flume?

Ans: Kafka is a publish-subscribe messaging system, whereas Flume is a system for collecting, aggregating and moving data.

What are the important elements of Kafka?

Ans: Producers, consumers, brokers, and topics.
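The roles of these elements can be shown with a deliberately tiny Python model; the classes are hypothetical and are not the Kafka client API. The broker stores each topic as an append-only list (one partition, no replication), producers append to it, and a consumer reads forward from an offset.

```python
from collections import defaultdict


class ToyBroker:
    """Stores each topic as an append-only list; a message's offset
    is its index in that list (one partition, no replication)."""

    def __init__(self):
        self.topics = defaultdict(list)

    def append(self, topic, message):
        self.topics[topic].append(message)
        return len(self.topics[topic]) - 1  # offset of the new message

    def read(self, topic, offset, max_messages=10):
        return self.topics[topic][offset:offset + max_messages]


class ToyProducer:
    def __init__(self, broker):
        self.broker = broker

    def send(self, topic, message):
        return self.broker.append(topic, message)


class ToyConsumer:
    def __init__(self, broker, topic):
        self.broker = broker
        self.topic = topic
        self.position = 0  # offset of the next message to read

    def poll(self, max_messages=10):
        batch = self.broker.read(self.topic, self.position, max_messages)
        self.position += len(batch)
        return batch
```

Reading does not remove messages from the log, so several consumers can read the same topic independently, each tracking its own position.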

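What role does ZooKeeper play in a Kafka cluster?

Ans: ZooKeeper's basic responsibility is coordination within the Kafka cluster: it stores cluster metadata, keeps track of which brokers are alive, and is used to elect the controller broker. (Recent Kafka versions can also run without ZooKeeper in KRaft mode.)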

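How can a consumer control the offsets it has consumed?

Ans: Through offset commits, either automatic or manual. With automatic commit (enable.auto.commit=true) the consumer commits offsets periodically, every auto.commit.interval.ms. With manual commit the application calls commitSync() or commitAsync() itself, typically after processing the records.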
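A toy Python model of the difference; the class and names are hypothetical and not the Kafka client API. The committed offset is where a restarted consumer resumes. The toy commits as soon as a batch is fetched, a deliberate simplification of the Java client's interval-based auto-commit, so it shows in its simplest form why committing before processing can skip messages after a crash, while committing after processing leads to re-delivery instead.

```python
class ToyCommittingConsumer:
    """Toy model of offset commits for one consumer group.

    committed_store maps group -> committed offset (where a restarted
    consumer resumes). auto_commit=True commits right after each fetch;
    otherwise the caller must call commit() after processing.
    """

    def __init__(self, log, committed_store, group, auto_commit):
        self.log = log
        self.store = committed_store
        self.group = group
        self.auto_commit = auto_commit
        self.position = committed_store.get(group, 0)

    def poll(self, max_messages=10):
        batch = self.log[self.position:self.position + max_messages]
        self.position += len(batch)
        if self.auto_commit:
            self.commit()
        return batch

    def commit(self):
        self.store[self.group] = self.position
```

If the consumer crashes between poll() and processing, the auto-commit variant resumes after the unprocessed batch, whereas the manual variant receives the same batch again, which gives at-least-once delivery as long as processing is idempotent.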

We hope the above questions help you answer the Hadoop interview questions asked at various organizations.


Anand Pandey

