Hive Use Case – Pokemon Data Analysis

In this post, we will be performing certain Hive queries to perform data analysis on Pokémon Go characters.

So, what is Pokémon Go?

Pokémon Go is a free-to-play, location-based augmented reality game developed by Niantic for iOS and Android devices. It was released only in July 2016 and only in selected countries. You can download Pokémon for free of cost and start playing. You can also use PokéCoins to purchase Pokéballs, the in-game item you need to be able to catch Pokémon.

Performing Pokémon Data Analysis –

You can download the dataset necessary for this data analysis from here.

The dataset consists of 11 columns and their respective description is as follows:

Pokemonid_Number: This column represents id of each Pokémon.

Name: This column represents the name of the Pokémon.

Type 1: This column represents the property of a Pokémon.

Type 2: This column represents the extended property of the same Pokémon.

A Pokémon may be one or both the types. For instance, Charmander is a Fire type, while Bulbasaur is both a Grass type as well as a Poison type. With the current 18-type system, there are 324 possible ways to assign these types to Pokémon, along with 171 unique combinations. As of Generation VI, 133 different type combinations have been used.

Total: This column represents the sum of all character points of a Pokémon (HP, Attack, Defense, Sp. Atk, Sp. Def, and Speed).

HP (Hit Points): This column represents Pokémon Hit Points, which is a value that determines how much damage a Pokémon can receive. When a Pokémon’s HP is down to ‘0’, the Pokémon will faint. HP is the most frequently affected stat of them all, as a depleting HP is a key factor in winning a battle.

Attack: This column represents the Attack stat.

Defense: This column represents the Defense stat.

Sp. Atk: This column represents a Pokémon’s Special Attack stat.

Sp. Def: This column represents a Pokémon’s Special Defense stat.

Speed: This column represents the speed stat of a Pokémon.

Let’s begin by creating a table to hold the dataset, as shown below.

The above command will create a table called ‘pokemon’, with the fields as shown in the dataset description. We have given the parameter to skip the header line, so while loading the dataset, this ‘pokemon’ table will ignore the header line.

Next, let’s load the dataset into the table as shown below.

Problem Statement 1:

Find out the average HP (Hit points) of all the Pokémon, using the below query.

In the above screenshot, you can see that the average Hit point of the Pokémon is 69.25875.

Problem Statement 2:

Create and insert values of existing table ‘pokemon’ into a new table ‘pokemon1’, with an additional column ‘power_rate’ to find the count of ‘powerful’ and ‘moderate’ from the table ‘pokemon1’.

Now, based on the average hit points, we will create another column called ‘power_rate’.

In order to segregate the Pokémon, we will use if condition inside the select statement, which will create one more column in our dataset. The if condition should be used in the following manner inside a Hive query.

if(expr1,expr2,expr3)

Now, we will create a table based on the condition that if the HP is greater than the average HP, then it is powerful, and if the HP is less than the average, then it is Moderate and a neutral condition is considered as powerless. The same is given as a Hive query below.

With the above query, a new column power_rate has been created.

Now, we will find out the number of powerful and moderate HP Pokémons present, using the below query.

You can see that in the above screenshot, we have the result as 422 Pokémons with moderate power and 378 Pokémons with high power.

Problem Statement 3:

Find out the top 10 Pokémons according to their HP’s using the below query.

You can see the list of top 10 Pokémons according to their HP’s.

Problem Statement 4:

Find out the top 10 Pokémons based on their Attack stat, using the below query.

You can see the list of top 10 Pokémons according to their attack.

Problem statement 5:

Find out the top 10 Pokémons based on their Defense stat, using the below query.

Now, we will see the list of top 10 Pokémons according to their defense, using the below query.

In the above screenshot, you can see the list of top 10 Pokémons according to their defense.

Problem statement 6:

Find out the top 10 Pokémons based on their total power.

In the above screenshot, you can see the list of top 10 Pokémons according to their all round powers.

Problem statement 7:

Find out the top 10 Pokémons having a drastic change in their attack and sp.attack, using the below query.

In the above screenshot, you can see the list of top 10 Pokémons having a drastic change in their attack and sp.attack.

Problem statement 8:

Find out the top 10 Pokémons having a drastic change in their defense and sp.defense, using the below query.


In the above screenshot, you can see the list of top 10 Pokémons having a drastic change in their defense and sp.defense.

Problem statement 9:

Find out the top 10 fastest Pokémons, using the below query.

We hope this post has been helpful in understanding how to perform data analysis using Hive. In the case of any queries, feel free to comment below and we will get back to you at the earliest.

Regards

Anand  Pandey

Advertisements
Categories:

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s