Merging Files in HDFS

In this blog, we will discuss merging files in HDFS into a single file. Before proceeding further, we recommend that you refer to our blogs on HDFS. The links are provided below:

Beginners-Guide-For-HDFS

HDFS-Commands-For-Beginners

Merging multiple files is useful when you want to retrieve the output of a MapReduce computation with multiple reducers, where each reducer produces a part of the output.

The HDFS getmerge command can copy the files present in a given path in HDFS to a single concatenated file in the local filesystem.

hadoop fs -getmerge /user/hadoop/demo_files merged.txt

The getmerge command has the following syntax:

hadoop fs -getmerge [-nl] <src> <localdst>

The getmerge command takes three parameters:

  • <src> is the HDFS path to the directory that contains the files to be concatenated
  • <localdst> is the local filename of the merged file
  • [-nl] is an optional parameter that adds a newline character at the end of each file in the result file.
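
To make the effect of -nl concrete, here is a minimal local sketch. It does not call Hadoop at all: it only imitates the concatenation with plain shell commands, and the scratch directory and the part-0000x file names are made up for illustration.

```shell
#!/bin/sh
# Local stand-in for `hadoop fs -getmerge [-nl]`; no Hadoop calls are made here.
set -e

workdir=$(mktemp -d)              # hypothetical scratch directory
mkdir "$workdir/demo_files"

# Two sample "part" files whose content does not end with a newline.
printf 'alpha' > "$workdir/demo_files/part-00000"
printf 'beta'  > "$workdir/demo_files/part-00001"

# Without -nl: the contents are joined back to back.
cat "$workdir/demo_files"/part-* > "$workdir/merged_plain.txt"

# With -nl: a newline is written after the content of each file.
for f in "$workdir/demo_files"/part-*; do
  cat "$f"
  printf '\n'
done > "$workdir/merged_nl.txt"

echo "plain merge:"
cat "$workdir/merged_plain.txt"; echo
echo "merge with a newline after each file:"
cat "$workdir/merged_nl.txt"
```

When the source files do not end with a newline, the last line of one file and the first line of the next run together in the plain merge. That is the case where -nl is most useful.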

Steps to merge the files

Step 1:

We need to place more than one file inside the HDFS directory.

In the figure below, you can see that there are three files named acadgild, hadoop and FlumeData, on which we will perform the merge operation.

The content of the files is shown in the screenshot below.
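
If you do not have such files in HDFS yet, the step can be sketched roughly as follows. The file names come from this post, but the file contents and the HDFS directory /user/hadoop/demo_files are assumptions for illustration; adjust them to your own setup. The Hadoop commands run only if the hadoop CLI is installed.

```shell
#!/bin/sh
# Sketch of Step 1. The HDFS directory and the file contents are assumptions.
set -e

hdfs_dir=/user/hadoop/demo_files   # assumed HDFS directory
local_dir=$(mktemp -d)             # scratch directory for the sample files

# Create three small local files named as in this post; the contents are made up.
printf 'content of acadgild\n'  > "$local_dir/acadgild"
printf 'content of hadoop\n'    > "$local_dir/hadoop"
printf 'content of FlumeData\n' > "$local_dir/FlumeData"

if command -v hadoop >/dev/null 2>&1; then
  hadoop fs -mkdir -p "$hdfs_dir"               # create the target directory
  hadoop fs -put -f "$local_dir"/* "$hdfs_dir"  # upload the three files
  hadoop fs -ls "$hdfs_dir"                     # confirm that the files are present
  hadoop fs -cat "$hdfs_dir/acadgild"           # print the content of one file
else
  echo "hadoop CLI not found; skipping the HDFS upload"
fi
```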

 

Step 2:

We now type the getmerge command, as shown in the screenshot, to merge the files.

We have used the optional -nl parameter to add a newline after the content of each file.

A file with the merged content will be created at the location you specify on your local machine. In this case, a new file named merged_file is created, containing the content of acadgild, hadoop and FlumeData.

You can open the file directly to see the merged content. Refer to the figure below.

From the above figure, you can see that a single file is created after merging the content of the three individual files.
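
For readers without the screenshots, the merge step can be sketched as below. The output name merged_file comes from this post; the HDFS directory /user/hadoop/demo_files is an assumption, so replace it with the directory that holds your files. The Hadoop commands run only if the hadoop CLI is installed.

```shell
#!/bin/sh
# Sketch of Step 2. The HDFS directory is an assumption.
set -e

hdfs_dir=/user/hadoop/demo_files   # assumed HDFS directory holding the three files
local_out=merged_file              # local file that receives the merged content

if command -v hadoop >/dev/null 2>&1; then
  # Merge every file under hdfs_dir into one local file,
  # adding a newline after the content of each file.
  hadoop fs -getmerge -nl "$hdfs_dir" "$local_out"
  cat "$local_out"                 # view the merged content
else
  echo "hadoop CLI not found; skipping the merge"
fi
```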
