Sentiment Analysis on Demonetization – Pig Use Case

Let us find out what people think about demonetization by analysing tweets from Twitter. The dataset is a collection of tweets gathered in CSV format. You can download the dataset from the link below or ask for the data via mail.

Now we will load the data into Pig using PigStorage as follows:
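A minimal sketch of the load step, assuming the CSV has been copied to a hypothetical HDFS path /demonetization-tweets.csv and using load_tweets as an illustrative relation name (neither is given in the text). No schema is declared, so fields are addressed by position and default to bytearray.

```pig
-- Hypothetical path; point this at wherever the dataset actually lives.
-- PigStorage(',') splits each input line on every comma.
load_tweets = LOAD '/demonetization-tweets.csv' USING PigStorage(',');
```

Because PigStorage splits on every comma, a tweet whose text itself contains commas will be spread across several fields. That is acceptable for a rough walkthrough but worth keeping in mind.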

Once the data has loaded, you can check the tweets in Pig by using the dump command.
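As a sketch, assuming the loaded relation is called load_tweets (an illustrative name), it is usually more practical to dump a few rows rather than the whole dataset:

```pig
-- LIMIT keeps the output short; DUMP on the full relation prints every tweet.
tweets_sample = LIMIT load_tweets 5;
DUMP tweets_sample;
```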

Here is a sample tweet:

The columns of the tweet dataset are as follows:

  • id
  • Text (Tweets)
  • favorited
  • favoriteCount
  • replyToSN
  • created
  • truncated
  • replyToSID
  • id
  • replyToUID
  • statusSource
  • screenName
  • retweetCount
  • isRetweet
  • retweeted

From these columns, we will extract the id and the tweet_text as follows:
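One way this projection might look, assuming the loaded relation is named load_tweets and the output relation extract_details (both illustrative names). The positions $0 and $1 follow the column list above, where id comes first and the tweet text second.

```pig
-- Keep only the first two columns and give them readable names.
-- Without a declared schema, both fields remain bytearray.
extract_details = FOREACH load_tweets GENERATE $0 AS id, $1 AS text;
```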

Now if you dump the extracted columns, you will get the id and the tweet_text as follows:

Now we will divide the tweet_text into words to calculate the sentiment of the whole tweet.

Each word in the tweet_text is taken and emitted as a new row.
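A sketch of the word-splitting step, assuming the two extracted columns live in a relation called extract_details (an illustrative name). TOKENIZE turns the text into a bag of words, and FLATTEN un-nests that bag so that each word gets its own row alongside the original id and text.

```pig
-- One output row per word: (id, text, word).
tokens = FOREACH extract_details
         GENERATE id, text, FLATTEN(TOKENIZE(text)) AS word;
```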

You can use the dump command to check the result. Here is a sample:

In the sample record above, you can see that the word RT has been taken and appears as the last field of a new record.

You can use the describe tokens command to check the schema of that relation, which is as follows:

tokens: {id: bytearray,text: bytearray,word: chararray}

Now we have to analyse the sentiment of each tweet by using the words in its text. We will rate each word according to its meaning, from +5 to -5, using the AFINN dictionary. AFINN is a word list of about 2,500 words, each rated with an integer from +5 (most positive) to -5 (most negative). You can download the dictionary from the following link:

AFINN dictionary

We will load the dictionary into Pig by using the statement below:
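A sketch of the dictionary load, assuming the AFINN file is tab-separated with one word and its integer score per line and has been copied to a hypothetical path /AFINN.txt. The relation name dictionary and the field names word and rating are illustrative.

```pig
-- Hypothetical path; each line is expected to look like: word<TAB>score
dictionary = LOAD '/AFINN.txt' USING PigStorage('\t')
             AS (word:chararray, rating:int);
```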

We can see the contents of the AFINN dictionary in the screenshot below.

Now, let’s perform a map-side join of the tokens relation with the dictionary contents using this statement:
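In Pig, a map-side join is requested with USING 'replicated'. The sketch below assumes the dictionary relation is called dictionary and has a word field (illustrative names). The small relation must be listed last because it is the one held in memory. LEFT OUTER keeps tokens that have no dictionary entry, with a null rating.

```pig
-- Replicated (map-side) join: the dictionary is small enough to fit in memory.
word_rating = JOIN tokens BY word LEFT OUTER,
                   dictionary BY word USING 'replicated';
```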

We can see the schema of the relation after the join operation by using the command below:
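The command in question would be along these lines:

```pig
-- Prints the joined schema; fields are prefixed with their source relation,
-- for example tokens::id.
DESCRIBE word_rating;
```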

In the output of describe word_rating, we can see that word_rating joins the tokens relation (consisting of id, tweet text and word) with the dictionary (consisting of word and rating).

Now we will extract the id, tweet text and word rating (from the dictionary) by using the statement below:
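A sketch of this projection, assuming the dictionary side of the join is named dictionary with a rating field (illustrative names). After a join, fields are referenced with the relation::field syntax to avoid ambiguity.

```pig
-- Keep the tweet id, the tweet text and the per-word score.
rating = FOREACH word_rating
         GENERATE tokens::id AS id,
                  tokens::text AS text,
                  dictionary::rating AS rate;
```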

We can now see the schema of the relation rating by using the command describe rating.

rating: {id: bytearray,text: bytearray,rate: int}

In the output of describe rating, we can see that our relation now consists of id, tweet text and rate (for each word).

Now, we will group the ratings of all the words in a tweet by using the statement below:

Here we have grouped by two fields, id and tweet text.
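A sketch of the grouping step; the output name word_group is illustrative.

```pig
-- Collect all word ratings that belong to the same (id, text) pair.
-- Each output row is (group, {bag of rating rows}).
word_group = GROUP rating BY (id, text);
```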

Now, let’s compute the average of the word ratings for each tweet.
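A sketch of the averaging step, assuming the grouped relation is called word_group and using avg_rate and tweet_rating as illustrative names. Pig's AVG ignores null values, so words that are missing from the dictionary do not affect the average. A tweet with no rated words at all ends up with a null average.

```pig
-- group holds the (id, text) pair; rating.rate is the bag of word scores.
avg_rate = FOREACH word_group
           GENERATE group, AVG(rating.rate) AS tweet_rating;
```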

Now we have calculated the average rating of each tweet from the ratings of its words.

From the above relation, we will get all the tweets, i.e., both positive and negative.

Here, we classify a tweet as positive when its average rating lies between 0 and 5, and as negative when its average rating is below 0 (down to -5).

We have now successfully performed sentiment analysis on Twitter data using Pig. We now have the tweets and their ratings, so let’s filter out the positive tweets.

Now we will filter the positive tweets using the below statement:
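A sketch of the positive filter, assuming the averaged relation is called avg_rate with a tweet_rating field (illustrative names):

```pig
-- Tweets whose average word rating is zero or higher count as positive.
-- Rows with a null tweet_rating are dropped by the comparison.
positive_tweets = FILTER avg_rate BY tweet_rating >= 0;
```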

Here are the sample tweets with positive ratings.

In the same way, we will filter the negative tweets as follows:

Here are the sample tweets with negative ratings.
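The negative filter is the mirror image, again assuming a relation avg_rate with a tweet_rating field (illustrative names):

```pig
-- Tweets whose average word rating is below zero count as negative.
negative_tweets = FILTER avg_rate BY tweet_rating < 0;
```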

Like this, you can perform sentiment analysis using Pig.

We hope that this blog helped you understand how to perform sentiment analysis on the views of different people using Pig. Keep visiting our site.


Anand Pandey

