Do we need Java to learn Hadoop?

When Doug Cutting, one of the creators of Hadoop, named his new framework after his son's toy elephant, little did he know that it would take the open source software world by storm. We can also safely presume that he never meant to create an elephantine misconception: that Java is required to master Hadoop. True, Hadoop is built on Java. But do you need Java to learn Hadoop? This blog answers that question for you.

Do you need Java to learn Hadoop?

Two important Hadoop ecosystem tools, Pig and Hive, show that you can work with Hadoop without a working knowledge of Java.

Pig is a high-level data flow language and execution framework for parallel computation, while Hive is a data warehouse infrastructure that provides data summarization and ad hoc querying. Pig is widely used by researchers and programmers, while Hive is a favourite with data analysts.

The productivity gain is large: a job that takes about 10 lines of Pig Latin can take roughly 200 lines of Java.

To work with Pig and Hive, you only need to learn Pig Latin and Hive Query Language (HQL), both of which require little more than a grounding in SQL. Pig Latin expresses processing as a sequence of data flow steps (load, filter, group, join) that correspond closely to familiar SQL operations, while HQL is a SQL dialect designed for querying large datasets stored in Hadoop. These languages are easy to learn, and more than 80% of Hadoop projects revolve around them.

Careers in Hadoop

Hadoop has become the poster boy of Big Data. Because it can store huge amounts of data, both structured and unstructured, across clusters of inexpensive commodity servers (on premises or in the cloud) with relatively little capital investment, Hadoop is near the top of every CIO's to-do list today. This has led to rapid growth in career opportunities around Hadoop.

To explore Hadoop job roles that do not have Java as a prerequisite, orient yourself to two critical aspects of Hadoop: storage and processing. For a job around Hadoop storage, learn how a Hadoop cluster functions and how Hadoop keeps its data secure and reliable. For this, knowing the nuances of the Hadoop Distributed File System (HDFS) and HBase, Hadoop's distributed database, will help tremendously.

If you choose to work on the processing side of Hadoop, you have Pig and Hive at your disposal. Both automatically translate your scripts and queries, behind the scenes, into jobs for MapReduce, Hadoop's Java-based programming model for processing data in parallel across a cluster.
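To give a sense of what Pig and Hive generate on your behalf, here is a minimal sketch of the map, shuffle and reduce phases of a word count, written in plain Java. It is a conceptual, in-memory simulation using only the Java standard library: it does not use Hadoop's actual `Mapper`/`Reducer` API, and the class and method names (`WordCountSketch`, `map`, `shuffle`, `reduce`, `wordCount`) are hypothetical, chosen for illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory simulation of the MapReduce model (no Hadoop APIs used).
public class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every word in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) { // skip the empty token produced by leading whitespace
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group emitted values by key; the TreeMap keeps keys sorted,
    // mirroring how MapReduce sorts keys before handing them to reducers.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            groups.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return groups;
    }

    // Reduce phase: sum the values collected for each key.
    static Map<String, Integer> reduce(Map<String, List<Integer>> groups) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : groups.entrySet()) {
            int sum = 0;
            for (int value : group.getValue()) {
                sum += value;
            }
            counts.put(group.getKey(), sum);
        }
        return counts;
    }

    // Runs the three phases over a list of input lines.
    public static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(map(line));
        }
        return reduce(shuffle(emitted));
    }

    public static void main(String[] args) {
        List<String> input = List.of("the elephant", "The toy elephant");
        System.out.println(wordCount(input)); // prints {elephant=2, the=2, toy=1}
    }
}
```

In Pig Latin or HQL, the same job shrinks to a few declarative statements (essentially a group-by-word with a count), and the tool works out the map, shuffle and reduce steps for you. On a real cluster, Hadoop would also distribute each phase across many machines, which this sketch does not attempt.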

So, without writing MapReduce code yourself, you can still control the entire life cycle of your project. As long as you master Pig and Hive, along with HDFS and HBase, Java can take a back seat.


Rare requirements for Java coding

However, Java coding is needed if you want to write user-defined functions (UDFs) for Pig, Hive and other tools, or to create custom input/output formats. We are happy to report that these requirements are rare.

Another scenario where basic Java coding might be necessary is debugging: in the rare event that a Hadoop program crashes, you might need to debug it using Java. Such debugging is likely to be only a minor part of your career.
