MapReduce Use Case – Uber Data Analysis

In this post, we will analyze the Uber dataset in Hadoop using MapReduce in Java.

The Uber dataset consists of four columns: dispatching_base_number, date, active_vehicles and trips. You can download the dataset from here:

https://drive.google.com/open?id=0B2nmxAJLHEE8YVA0Rkd5VjZvUzA
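Each line of the file is a comma-separated record. As a minimal sketch, assuming the file starts with a header row and the columns appear in the order listed above, one row can be split into its four fields like this. The `UberRecord` class and the sample row are hypothetical and only illustrate the layout.

```java
// Hypothetical record type for one row of the Uber dataset (not from the original post).
final class UberRecord {
    final String base;        // dispatching_base_number
    final String date;        // date, kept as text, e.g. "1/1/2015"
    final int activeVehicles; // active_vehicles
    final int trips;          // trips

    UberRecord(String base, String date, int activeVehicles, int trips) {
        this.base = base;
        this.date = date;
        this.activeVehicles = activeVehicles;
        this.trips = trips;
    }

    // Parses one CSV line; returns null for the header row or a malformed line.
    static UberRecord parse(String line) {
        String[] f = line.trim().split(",");
        if (f.length != 4 || f[0].equals("dispatching_base_number")) {
            return null; // wrong column count or header row
        }
        try {
            return new UberRecord(f[0].trim(), f[1].trim(),
                    Integer.parseInt(f[2].trim()), Integer.parseInt(f[3].trim()));
        } catch (NumberFormatException e) {
            return null; // non-numeric counts: treat as malformed
        }
    }

    public static void main(String[] args) {
        UberRecord r = parse("B02512,1/1/2015,190,1132"); // illustrative row
        System.out.println(r.base + " " + r.date + " " + r.activeVehicles + " " + r.trips);
    }
}
```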

Problem Statement 1:

In this problem statement, we will find the total number of trips for each dispatching base on each day of the week, which shows on which days each base has the most trips.

Source Code

Mapper Class:

From the Mapper, we will emit the combination of the base and the day of the week as the key and the number of trips as the value.

First, we parse the date, which is stored as a string, into a Date object using Java's SimpleDateFormat class. To get the day of the week, we call getDay(), which returns an integer from 0 (Sunday) to 6 (Saturday). We therefore created an array containing the days from Sunday to Saturday and used the value returned by getDay() as an index into that array to get the name of the day.
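A minimal sketch of this step, assuming dates written as `M/d/yyyy` (for example `1/1/2015`) and three-letter day names; both the format and the `UberDayOfWeek` class name are assumptions, not taken from the original code.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

// Hypothetical helper, not the author's code.
final class UberDayOfWeek {
    // Index 0 is Sunday, matching the range 0..6 returned by Date.getDay().
    private static final String[] DAYS = {"Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"};

    // Assumes dates such as "1/1/2015" (month/day/year).
    @SuppressWarnings("deprecation") // Date.getDay() is deprecated but still works
    static String dayOf(String text) {
        try {
            Date date = new SimpleDateFormat("M/d/yyyy").parse(text.trim());
            return DAYS[date.getDay()];
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a M/d/yyyy date: " + text, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(dayOf("1/1/2015")); // prints Thu
    }
}
```

Because getDay() is deprecated, `Calendar.get(Calendar.DAY_OF_WEEK)`, which returns 1 for Sunday through 7 for Saturday, is a possible replacement; subtracting 1 lets it index the same array.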

After this operation, we emit the combination of base number + day of the week as the key and the number of trips as the value.
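The map logic for one line can be sketched in plain Java, without Hadoop, as a function that returns the key/value pair. This assumes the same `M/d/yyyy` dates and a space between base and day in the key; `TripsMapSketch` is a hypothetical name. Inside an actual Hadoop `Mapper`, the pair would instead be written with `context.write(new Text(key), new IntWritable(trips))`.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.AbstractMap.SimpleEntry;
import java.util.Date;
import java.util.Map;

// Hypothetical plain-Java stand-in for the Mapper's map() method.
final class TripsMapSketch {
    private static final String[] DAYS = {"Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"};

    // key = "<base> <day>", value = trips; null for the header or a malformed line.
    @SuppressWarnings("deprecation") // Date.getDay()
    static Map.Entry<String, Integer> map(String line) {
        String[] f = line.split(",");
        if (f.length != 4) {
            return null;
        }
        try {
            Date date = new SimpleDateFormat("M/d/yyyy").parse(f[1].trim());
            String key = f[0].trim() + " " + DAYS[date.getDay()];
            int trips = Integer.parseInt(f[3].trim());
            return new SimpleEntry<>(key, trips);
        } catch (ParseException | NumberFormatException e) {
            return null; // header row ("date" is not a date) or bad record
        }
    }

    public static void main(String[] args) {
        System.out.println(map("B02512,1/1/2015,190,1132")); // prints B02512 Thu=1132
    }
}
```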

Reducer Class:

In the reducer, we calculate the sum of trips for each base on each day of the week.
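The reducer's arithmetic is a plain sum over all values that arrive for one "base day" key. A minimal sketch in plain Java, where `SumReducerSketch` is a hypothetical name and the `Iterable<Integer>` stands in for the `Iterable<IntWritable>` a Hadoop `Reducer` receives:

```java
import java.util.Arrays;

// Hypothetical plain-Java stand-in for the Reducer's reduce() method.
final class SumReducerSketch {
    // Adds up every value emitted for a single "<base> <day>" key.
    static int reduce(Iterable<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Hypothetical trip counts from two Thursdays at the same base.
        System.out.println(reduce(Arrays.asList(1132, 1000))); // prints 2132
    }
}
```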

Whole Source Code:
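As a minimal sketch, and not the author's original listing, the whole job for this problem statement can be simulated in memory with plain Java and no Hadoop libraries. It assumes `M/d/yyyy` dates and three-letter day names; `UberTripsLocalRun` and the sample rows are hypothetical. A `TreeMap` stands in for Hadoop's shuffle and sort, and each output line uses the tab separator of Hadoop's default `TextOutputFormat`.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Date;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical local simulation of the trips job (map -> shuffle -> reduce).
final class UberTripsLocalRun {
    private static final String[] DAYS = {"Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"};

    @SuppressWarnings("deprecation") // Date.getDay()
    static List<String> run(List<String> lines) {
        // Shuffle stand-in: groups values by key and keeps keys sorted.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        SimpleDateFormat format = new SimpleDateFormat("M/d/yyyy");
        for (String line : lines) {
            String[] f = line.split(",");
            if (f.length != 4) {
                continue;
            }
            try {
                // Map step: key = "<base> <day>", value = trips.
                Date date = format.parse(f[1].trim());
                String key = f[0].trim() + " " + DAYS[date.getDay()];
                int trips = Integer.parseInt(f[3].trim());
                grouped.computeIfAbsent(key, k -> new ArrayList<>()).add(trips);
            } catch (ParseException | NumberFormatException e) {
                // skip the header row and malformed records
            }
        }
        // Reduce step: sum each group and format "key<TAB>value".
        List<String> output = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) {
                sum += v;
            }
            output.add(e.getKey() + "\t" + sum);
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
                "dispatching_base_number,date,active_vehicles,trips",
                "B02512,1/1/2015,190,1132", // hypothetical rows
                "B02512,1/8/2015,200,1000",
                "B02765,1/1/2015,225,1765");
        run(input).forEach(System.out::println);
        // B02512 Thu	2132
        // B02765 Thu	1765
    }
}
```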

Running the Program:

First, we need to build a jar file for the program and run it as a normal Hadoop program, passing the input dataset path and the output directory path as shown below.

hadoop jar uber1.jar /uber /user/output1

In the output directory, a part file is created that contains the total number of trips for each base and day of the week.

Problem Statement 2:

In this problem statement, we will find the total number of active vehicles for each dispatching base on each day of the week, which shows on which days each base has the most active vehicles.

Source Code

Mapper Class:

From the Mapper, we will emit the combination of the base and the day of the week as the key and the number of active vehicles as the value.

First, we parse the date, which is stored as a string, into a Date object using Java's SimpleDateFormat class. To get the day of the week, we call getDay(), which returns an integer from 0 (Sunday) to 6 (Saturday). We therefore created an array containing the days from Sunday to Saturday and used the value returned by getDay() as an index into that array to get the name of the day.

After this operation, we emit the combination of base number + day of the week as the key and the number of active vehicles as the value.
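The only difference from the first mapper is which column supplies the value: active_vehicles instead of trips. A minimal sketch of a single map function parameterized by column index, assuming the column order given at the top of the post; `BaseDayMapSketch` and its constants are hypothetical names.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.AbstractMap.SimpleEntry;
import java.util.Date;
import java.util.Map;

// Hypothetical helper, not the author's code: one map function for both problems.
final class BaseDayMapSketch {
    static final int ACTIVE_VEHICLES = 2; // column index of active_vehicles (Problem Statement 2)
    static final int TRIPS = 3;           // column index of trips (Problem Statement 1)
    private static final String[] DAYS = {"Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"};

    // Returns ("<base> <day>", value of the chosen column), or null for the header or a bad line.
    @SuppressWarnings("deprecation") // Date.getDay()
    static Map.Entry<String, Integer> map(String line, int valueColumn) {
        String[] f = line.split(",");
        if (f.length != 4) {
            return null;
        }
        try {
            Date date = new SimpleDateFormat("M/d/yyyy").parse(f[1].trim());
            String key = f[0].trim() + " " + DAYS[date.getDay()];
            return new SimpleEntry<>(key, Integer.parseInt(f[valueColumn].trim()));
        } catch (ParseException | NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String row = "B02512,1/1/2015,190,1132"; // illustrative row
        System.out.println(map(row, ACTIVE_VEHICLES)); // prints B02512 Thu=190
        System.out.println(map(row, TRIPS));           // prints B02512 Thu=1132
    }
}
```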

Reducer Class:

In the reducer, we calculate the sum of active vehicles for each base on each day of the week.

Whole Source Code:

Running the Program:

First, we need to build a jar file for the program and run it as a normal Hadoop program, passing the input dataset path and the output directory path as shown below.

hadoop jar uber2.jar /uber /user/output2

In the output directory, a part file is created that contains the total number of active vehicles for each base and day of the week.
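The job reports a total for every base and day pair. To read off the single busiest day per base, the output lines can be scanned afterwards. This is a hypothetical follow-up step, not part of the original job, sketched in plain Java under the assumption that each line has the form `<base> <day><TAB><total>`; `BusiestDaySketch` and the sample totals are illustrative only.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical post-processing of the reducer output (not from the original post).
final class BusiestDaySketch {
    // Returns base -> day with the highest total; on a tie, the first line seen wins.
    static Map<String, String> busiestDay(List<String> reducerOutput) {
        Map<String, String> bestDay = new TreeMap<>();    // sorted by base for readable output
        Map<String, Integer> bestTotal = new HashMap<>();
        for (String line : reducerOutput) {
            String[] keyAndValue = line.split("\t");
            String[] baseAndDay = keyAndValue[0].split(" ");
            String base = baseAndDay[0];
            int total = Integer.parseInt(keyAndValue[1].trim());
            Integer current = bestTotal.get(base);
            if (current == null || total > current) {
                bestTotal.put(base, total);
                bestDay.put(base, baseAndDay[1]);
            }
        }
        return bestDay;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("B02512 Fri\t300", "B02512 Thu\t390", "B02765 Thu\t225");
        System.out.println(busiestDay(lines)); // prints {B02512=Thu, B02765=Thu}
    }
}
```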

We hope this post has been helpful in understanding the Uber data analysis use case using MapReduce. If you have any queries, feel free to comment below and we will get back to you at the earliest.

Regards

Anand Pandey
