Linux Commands You Must Know

In this blog we discuss the basic, essential commands that every IT professional working on the Linux platform should know. Computer-savvy users often consider Linux the best operating system because it can be customized more easily than its popular counterparts. To help you learn, a screenshot is attached with every command […]

Converting JSON into CSV Using Pig

In this blog we will see how to convert JSON data into CSV format. We created our own JSON data from a CSV file using the Avro file format, and we will use the same JSON data in this blog. You can also download the dataset from this link. We will […]

Transactions in Hive

In this blog post, we explain the row-level transactions available in Hive. This post will give you a good idea of how to implement row-level transactions on a Hive table. Before beginning with transactions in Hive, let’s look at the ACID properties, which are vital for any transaction. What is ACID? […]

HDFS Commands for Beginners

HDFS is a Java-based file system that provides scalable and reliable data storage in the Hadoop ecosystem. To work in HDFS, you need to know the basic HDFS commands. Let’s first discuss why HDFS is used and the advantages of using it in Hadoop. HDFS – Features and Advantages HDFS is popularly known as […]

Bucketing in Hive

In our previous post, we discussed the concept of partitioning in Hive. In this post, we will discuss the concept of bucketing in Hive, which gives a finer structure to Hive tables when performing queries on large datasets. As we all know, partitioning helps increase efficiency when performing a query […]

Partitioning in Hive

Introduction to partitioning: Hive has been one of the preferred tools for performing queries on large datasets, especially when full table scans are done on the datasets. In the case of tables that are not partitioned, all the files in a table’s data directory are read and then filters are applied to them as a […]

Integrating Hive with HBase

A brief introduction to Hive: Apache Hive is data warehouse software that facilitates querying and managing large datasets residing in distributed storage. Hive provides a SQL-like language called HiveQL for querying the data. Hive is considered friendlier and more familiar to users who are used to querying data with SQL. Hive is best […]

Beginner’s Guide for Spark

In this blog we will discuss the basics of Spark’s functionality and its installation. Apache Spark is a cluster computing framework that can run on top of the Hadoop ecosystem and handles different types of data. It is a one-stop solution to many problems. Spark has rich resources for handling data and most […]

How to Import Table from MySQL to HBase

Importing Table from MySQL to HBase In this blog, we will discuss how to import tables from a MySQL database into an HBase table. Before looking at how to import table contents from MySQL into an HBase table, we should first know why HBase came into the picture and how it overpowered […]

Read and Write Operations in HBase

HBase is an open-source implementation of Google’s Bigtable, with slight modifications. HBase was created in 2007, initially as a contribution to Hadoop, and later became a top-level Apache project. It is a distributed, column-oriented key-value database built on top of the Hadoop file system and is horizontally scalable, which means […]