HDFS is a Java-based file system that provides scalable and reliable data storage in the Hadoop ecosystem, so you need to know the basic HDFS commands to work with it.
Let’s first discuss why HDFS is used and the advantages of using it in Hadoop.
HDFS – Features and Advantages
HDFS stands for Hadoop Distributed File System and is the core storage component of Hadoop. It is a Java-based file system and is where all the data in a Hadoop cluster resides. Hadoop is usually described as having a master-slave architecture, a name that comes from the way HDFS is organized.
It is called a master-slave architecture because a master controls all the slaves. In HDFS the master is called the NameNode and the slaves are called DataNodes.
Hadoop was restructured in its second version so that data stored in HDFS can be used by multiple types of data processing engines, not only MapReduce.
HDFS has become a key tool for managing pools of Big Data and supporting Big Data Analytics applications.
The advantages of HDFS in clusters are as follows:
- Offers a cost-effective storage solution for businesses.
- Uses commodity direct-attached storage and shares the cost of the network and computers.
- Is a highly scalable storage platform, because it can store and distribute very large data sets across hundreds of inexpensive servers that operate in parallel.
- Is flexible: businesses can use Hadoop to derive valuable insights from data sources such as social media, email conversations, or clickstream data.
- Is fast: HDFS keeps track of where data is located on the cluster, so processing can run on the servers that hold the data.
- Is fault tolerant: when data is sent to an individual node, it is also replicated to other nodes in the cluster, so another copy is available for use if a node fails.
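The trade-off behind replication can be sketched with a little arithmetic. The snippet below assumes a replication factor of 3, which is the usual default for the ‘dfs.replication’ setting, and a hypothetical 500 GB data set; both values are illustrative only.

```shell
# Assumed values for illustration only.
replication=3        # usual default for dfs.replication
data_gb=500          # hypothetical size of the data set

# Every block is stored 'replication' times, so raw disk usage scales with it.
raw_gb=$((data_gb * replication))

# A block stays readable as long as at least one replica survives.
survivable=$((replication - 1))

echo "Storing ${data_gb} GB uses ${raw_gb} GB of raw disk across the cluster"
echo "Each block remains readable after losing ${survivable} of its replicas"
```

Under these assumptions the cluster spends three times the raw disk space in exchange for tolerating the loss of two copies of any block.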
Basic HDFS Commands
Moving forward to HDFS commands, we need to understand the syntax of each command. The general syntax is as follows:
hadoop dfs [COMMAND [COMMAND_OPTIONS]]
This runs a file system command on the file system supported in Hadoop (HDFS). Hadoop 2 and later print a deprecation warning for ‘hadoop dfs’ and recommend the equivalent ‘hdfs dfs’.
Let’s discuss the most commonly used commands in detail.
1. Put Command
The ‘put’ command copies data from the local file system into HDFS.
Syntax: hadoop dfs -put </source path> </destination path>
2. List Command
The ‘ls’ (list) command displays all the files available under a particular path.
Syntax: hadoop dfs -ls </source path>
3. Get Command
The ‘get’ command copies the mentioned file from HDFS to the local file system.
Syntax: hadoop dfs -get </source path> </destination path>
4. Make Directory Command
The ‘mkdir’ command creates a new directory in the specified location.
Syntax: hadoop dfs -mkdir </source path>
5. View Contents of a Particular File
The ‘cat’ command displays the entire contents of a file.
Syntax: hadoop dfs -cat </path[filename]>
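The first five commands can be chained into a small round trip, sketched below. The HDFS directory ‘/user/demo/input’, the file names, and the ‘run’ helper are hypothetical and exist only for this illustration. The helper prints each command and executes it only when a ‘hadoop’ client is on the PATH. The ‘-p’ option of ‘mkdir’, which creates missing parent directories, assumes Hadoop 2 or later.

```shell
#!/bin/sh
# Hypothetical locations, chosen only for illustration.
WORK_DIR="$(mktemp -d)"
LOCAL_FILE="$WORK_DIR/sample.txt"
LOCAL_COPY="$WORK_DIR/sample_copy.txt"
HDFS_DIR="/user/demo/input"

# Print each command; execute it only if the hadoop client is installed.
run() {
  echo "+ $*"
  if command -v hadoop >/dev/null 2>&1; then
    "$@"
  fi
}

# Create a small local file to upload.
echo "hello hdfs" > "$LOCAL_FILE"

run hadoop dfs -mkdir -p "$HDFS_DIR"                      # create the target directory
run hadoop dfs -put "$LOCAL_FILE" "$HDFS_DIR"             # upload the local file
run hadoop dfs -ls "$HDFS_DIR"                            # confirm that it arrived
run hadoop dfs -cat "$HDFS_DIR/sample.txt"                # print its contents
run hadoop dfs -get "$HDFS_DIR/sample.txt" "$LOCAL_COPY"  # download a copy
```

On a machine without Hadoop the script only prints the commands, which makes it a safe way to review the sequence before running it against a real cluster.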
6. Copying a File from the Local File System to HDFS
The ‘copyFromLocal’ command copies a file from the local file system to HDFS.
Syntax: hadoop dfs -copyFromLocal </source path> </destination path>
7. Copying a File from HDFS to the Local File System
The ‘copyToLocal’ command copies files from HDFS to the local file system.
Syntax: hadoop dfs -copyToLocal </source path> </destination path>
8. Removing a File
The ‘rm’ command deletes a file stored in HDFS.
Syntax: hadoop dfs -rm </path[filename]>
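As a rough sketch, the copy and remove commands can be combined into a backup-and-cleanup sequence. The paths and file names below are hypothetical. The ‘-p’ option of ‘mkdir’ and the ‘-r’ option of ‘rm’ (recursive removal of a directory) assume Hadoop 2 or later. The commands run only if a ‘hadoop’ client is found on the PATH.

```shell
#!/bin/sh
# Hypothetical locations, chosen only for illustration.
WORK_DIR="$(mktemp -d)"
LOCAL_FILE="$WORK_DIR/report.txt"
LOCAL_COPY="$WORK_DIR/report_copy.txt"
HDFS_DIR="/user/demo/backup"

echo "quarterly numbers" > "$LOCAL_FILE"

if command -v hadoop >/dev/null 2>&1; then
  hadoop dfs -mkdir -p "$HDFS_DIR"
  # Local file system -> HDFS
  hadoop dfs -copyFromLocal "$LOCAL_FILE" "$HDFS_DIR"
  # HDFS -> local file system
  hadoop dfs -copyToLocal "$HDFS_DIR/report.txt" "$LOCAL_COPY"
  # Delete the single file, then the directory itself.
  hadoop dfs -rm "$HDFS_DIR/report.txt"
  hadoop dfs -rm -r "$HDFS_DIR"
else
  echo "hadoop client not found; nothing was run"
fi
```

Note that the command names are case sensitive: ‘copyFromLocal’ works, ‘copyfromlocal’ does not.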
9. Run the DFS File System Checking Utility
The ‘fsck’ command checks the consistency of the file system.
Syntax: hadoop fsck </file path>
10. Run the Cluster Balancing Utility
The ‘balancer’ command checks how evenly data is spread across the DataNodes in the cluster and moves blocks to even out disk usage.
Syntax: hadoop balancer
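Both utilities accept options that are worth knowing. The sketch below assumes a hypothetical path ‘/user/demo’ and uses the ‘-files’, ‘-blocks’ and ‘-locations’ options of ‘fsck’ along with the ‘-threshold’ option of ‘balancer’. The balancer can run for a long time on a real cluster, so treat this as an illustration rather than something to paste into production.

```shell
#!/bin/sh
# Hypothetical HDFS path to inspect.
CHECK_PATH="/user/demo"
# Balancer threshold in percent of disk capacity; 10 is the documented default.
THRESHOLD=10

if command -v hadoop >/dev/null 2>&1; then
  # Report each file, its blocks, and the DataNodes that hold every block.
  hadoop fsck "$CHECK_PATH" -files -blocks -locations

  # Move blocks until every DataNode is within THRESHOLD percent of the
  # cluster's average disk utilization.
  hadoop balancer -threshold "$THRESHOLD"
else
  echo "hadoop client not found; nothing was run"
fi
```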
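11. Check Directory Space in HDFS
The ‘du’ command shows the space occupied by a file or directory in the cluster. The ‘-s’ option prints a single summary total, and ‘-h’ prints sizes in a human-readable format.
Syntax: hadoop dfs -du -s -h </file path>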
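To see what the ‘-s’ option changes, the two forms can be run side by side. This is a minimal sketch that assumes a hypothetical directory ‘/user/demo/input’ and runs only when a ‘hadoop’ client is available.

```shell
#!/bin/sh
# Hypothetical HDFS directory.
HDFS_DIR="/user/demo/input"

if command -v hadoop >/dev/null 2>&1; then
  # Without -s: one line per file or subdirectory inside HDFS_DIR.
  hadoop dfs -du -h "$HDFS_DIR"
  # With -s: a single summary line for the whole directory.
  hadoop dfs -du -s -h "$HDFS_DIR"
else
  echo "hadoop client not found; nothing was run"
fi
```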
12. List All the Hadoop File System Shell Commands
Running the ‘fs’ command without arguments lists all the shell commands of the Hadoop File System.
Syntax: hadoop fs [options]
Last but not least, always ask for help!
13. Asking for Help
The ‘help’ command displays help for all commands, or for a particular command if its name is given.
Command: hadoop fs -help
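Help can also be narrowed to a single command. The sketch below uses ‘ls’ as an arbitrary example and also shows the related ‘-usage’ option, which prints only a short usage summary.

```shell
#!/bin/sh
# Command to look up; 'ls' is an arbitrary example.
CMD="ls"

if command -v hadoop >/dev/null 2>&1; then
  hadoop fs -help "$CMD"    # full description of the command and its options
  hadoop fs -usage "$CMD"   # short usage summary only
else
  echo "hadoop client not found; nothing was run"
fi
```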