Pig Script in Local Mode

Step 1: Writing a Script. Open an editor (e.g. gedit) in your Cloudera Demo VM environment, and write the following command to create the ‘sample.pig’ file inside the home directory of the cloudera user: Command: gedit sample.pig. Let’s write a few Pig commands in the sample script! Let’s say our task is to read data from a […]
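The excerpt above is truncated before the script itself, so here is a minimal, hypothetical sketch of what a sample.pig for local mode might contain. The file name `input.txt` and the two-column schema are assumptions, not from the post:

```pig
-- sample.pig: hypothetical sketch; assumes a tab-delimited file
-- 'input.txt' in the cloudera user's home directory.
data   = LOAD 'input.txt' USING PigStorage('\t') AS (name:chararray, age:int);
adults = FILTER data BY age >= 18;   -- keep rows where age is 18 or more
DUMP adults;                         -- print results to the console
```

A script like this can be run in local mode (no cluster needed) with `pig -x local sample.pig`.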


Hadoop – the solution for deciphering the avalanche of Big Data – has come a long way since Google published its papers on the Google File System in 2003 and MapReduce in 2004. It created waves with its scale-out, rather than scale-up, strategy. Inroads from Doug Cutting and team at Yahoo and Apache Hadoop […]

HiveQL – Word Count

In this post, I am going to discuss how to write a word count program in Hive. Assume we have data in our table like the line below: “This is a Hadoop Post and Hadoop is a big data technology”, and we want to generate a word count like: a 2, and 1, Big 1 […]
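Since the post is cut off before its query, here is one common way to express a word count in HiveQL; the table name `docs` and its single string column `line` are assumptions for illustration:

```sql
-- Hypothetical sketch: assumes a table 'docs' with one string column 'line'.
-- split() breaks each line into an array of words; explode() via LATERAL VIEW
-- turns that array into one row per word, which we then group and count.
SELECT word, COUNT(*) AS cnt
FROM docs
LATERAL VIEW explode(split(line, ' ')) words AS word
GROUP BY word;
```

`split`, `explode`, and `LATERAL VIEW` are standard Hive features, though the actual post may structure its query differently.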

Pig Latin – Word Count

In this post, we learn how to write a word count program using Pig Latin. Assume we have data in the file like the line below: “This is a hadoop post hadoop is a bigdata technology”, and we want to generate output with the count of each word, like: (a,2) (is,2) (This,1) (class,1) […]
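The excerpt stops before the script, so here is the usual shape of a word count in Pig Latin; the input file name `input.txt` is an assumption:

```pig
-- Hypothetical sketch: assumes 'input.txt' contains lines of text.
lines  = LOAD 'input.txt' AS (line:chararray);
-- TOKENIZE splits a line into a bag of words; FLATTEN makes one row per word.
words  = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
grpd   = GROUP words BY word;
counts = FOREACH grpd GENERATE group, COUNT(words);
DUMP counts;
```

`TOKENIZE`, `FLATTEN`, `GROUP`, and `COUNT` are built-in Pig Latin operators, though the post’s own script may differ in detail.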

Linux Interview Questions For Beginners: Top Questions You Must Prepare For In 2016

System Administrator, Storage Administrator, Web Applications Expert, Database Administrator – these are just a handful of job titles that have seen an upsurge since October 2015 (according to Indeed.com). Job opportunities are skyrocketing, and with organizations adopting Linux far and wide, Linux Administrator roles are getting hard to fill. The signal has never been clearer […]

Running a MapReduce Program Using Oozie

In Big Data projects, different extract/transform/load (ETL) and pre-processing operations are needed before the actual processing jobs can start, and Oozie is a framework that helps automate this process and codify the work into repeatable and reusable units, or workflows. In this blog, we will learn how to create a workflow to run a […]
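For context, an Oozie workflow is defined in a `workflow.xml`. The sketch below shows the canonical structure of a single map-reduce action; the workflow name, property values, and transition names are illustrative assumptions, not taken from the post:

```xml
<!-- Hypothetical sketch of a workflow.xml with one map-reduce action. -->
<workflow-app name="mr-wordcount-wf" xmlns="uri:oozie:workflow:0.4">
  <start to="mr-node"/>
  <action name="mr-node">
    <map-reduce>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <configuration>
        <property>
          <name>mapred.input.dir</name>
          <value>${inputDir}</value>
        </property>
        <property>
          <name>mapred.output.dir</name>
          <value>${outputDir}</value>
        </property>
      </configuration>
    </map-reduce>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>MapReduce job failed</message>
  </kill>
  <end name="end"/>
</workflow-app>
```

The `${...}` placeholders are resolved from a job.properties file supplied when the workflow is submitted.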

Hadoop Admin Commands

In this blog, we will discuss some of the administration commands and how they work. Hadoop fsck Commands 1. hadoop fsck / The fsck command is used to check the HDFS file system. Different arguments can be passed with this command to emit different results. Please follow the screenshot below for […]
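As a quick illustration of the arguments the excerpt mentions (these commands require a running HDFS cluster; the path `/` checks the entire namespace):

```sh
# Check the health of the whole HDFS file system
hadoop fsck /

# More detail: list files, their blocks, and which datanodes hold each block
hadoop fsck / -files -blocks -locations
```

On recent Hadoop versions the same check is also available as `hdfs fsck /`.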

Beginner’s Guide for HDFS

This post is about how HDFS handles data in batches and in real time. It will address all your queries on how HDFS stores data coming from different sources and in different forms, and also covers the basics of HDFS, starting from what Hadoop is, the different versions of Hadoop, the changes in these versions, and […]

Merging Files in HDFS

In this blog, we will discuss merging files in HDFS to create a single file. Before proceeding further, we recommend you refer to our blogs on HDFS; the links are provided below: Beginners-Guide-For-HDFS HDFS-Commands-For-Beginners Merging multiple files is useful when you want to retrieve the output of a MapReduce computation with multiple reducers, […]
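One standard way to do this is the HDFS shell’s `getmerge` command, sketched below; the paths are illustrative assumptions and the commands require a running cluster:

```sh
# Merge all part files of a MapReduce output directory into one local file
hadoop fs -getmerge /user/cloudera/output merged.txt

# Alternative that keeps the result in HDFS: concatenate the part files
# and stream them back in as a single file
hadoop fs -cat /user/cloudera/output/part-* | hadoop fs -put - /user/cloudera/merged.txt
```

`getmerge` concatenates the source files in name order, which matches the `part-00000`, `part-00001`, … naming of reducer outputs.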

Setting up Hadoop Cluster on Cloud

This blog focuses on setting up a Hadoop cluster on the cloud. Before we start with the configuration, we need a Linux platform in the cloud; we will set up our pseudo-distributed-mode Hadoop cluster on an AWS EC2 instance. Note: Here we are assuming that you have an active AWS account and your Linux instance is running. Also, make sure […]

Big Data Terminologies You Must Know

In this blog, we will discuss the terminology of the Big Data ecosystem. This will give you a complete understanding of Big Data and its terms. Over time, Hadoop has become the nucleus of the Big Data ecosystem, as many new technologies have emerged and been integrated with Hadoop. So it’s important that, first, […]