Month: December 2016

Data aggregation using dplyr package in R (sample snippet)


In this post, I use the SuperStore data to explore some of the data wrangling functions from the dplyr and tidyr packages.

The data is an Excel workbook with three sheets: Orders, Returns, and Region. I load each sheet into its own dataset in R, join the datasets, and perform the necessary aggregations.

#----------------------------------------------------------------------
# Author: Naveen Goje, Email: naveen.goje@gmail.com
#----------------------------------------------------------------------

setwd("C:/R")

#----------------------------------------------------------------------
# 1 Load required libraries 
#----------------------------------------------------------------------

#For accessing and dumping excel files
install.packages("openxlsx")
library(openxlsx) 
#Used for data wrangling
install.packages("dplyr")
library(dplyr) 
#Used for data wrangling
install.packages("tidyr")
library(tidyr)

#----------------------------------------------------------------------
# 2 Load the three individual sheets into separate datasets
#----------------------------------------------------------------------
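# Assumed here: the workbook file name, the sheet order, and the
# data.storeReturns / data.storeUsers object names; adjust to your copy of the data.
superstore.wb <- loadWorkbook("Superstore.xlsx")

data.storeOrders  <- read.xlsx(superstore.wb, sheet = 1)  # Orders
data.storeReturns <- read.xlsx(superstore.wb, sheet = 2)  # Returns
data.storeUsers   <- read.xlsx(superstore.wb, sheet = 3)  # Region (called Users in step 3)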

# Sum of Sales by Product Category, sorted from lowest to highest total
aggr_1 <- data.storeOrders %>%
 group_by(Product.Category) %>%
 summarise(Total.Sales = sum(Sales)) %>%
 arrange(Total.Sales)

aggr_1

#----------------------------------------------------------------------
# 3 Join data sets and aggregate as per requirement
#----------------------------------------------------------------------
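# Inner join the two data sets Orders and Users by Region and look at Total Sales by Region
# (data.storeUsers is an assumed name for the data set loaded from the third sheet)
data.final <- inner_join(data.storeOrders, data.storeUsers, by = "Region")

aggr_2 <- data.final %>%
 group_by(Region) %>%
 summarise(Total.Sales = sum(Sales))

aggr_2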
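
To try the same join-and-aggregate pattern without the workbook, here is a minimal sketch on two small made-up data frames. The names orders, users, joined, and sales_by_region, along with every value in them, are invented for illustration and are not part of the SuperStore data.

```r
library(dplyr)

# Made-up stand-ins for the Orders and Users data sets (not the real SuperStore data)
orders <- data.frame(
  Region = c("East", "West", "East", "South"),
  Product.Category = c("Furniture", "Technology", "Technology", "Furniture"),
  Sales = c(100, 250, 50, 75),
  stringsAsFactors = FALSE
)
users <- data.frame(
  Region = c("East", "West"),
  Manager = c("Manager A", "Manager B"),
  stringsAsFactors = FALSE
)

# inner_join keeps only rows whose Region appears in both data frames,
# so the "South" order is dropped because users has no "South" row
joined <- inner_join(orders, users, by = "Region")

# Total sales per region, largest first
sales_by_region <- joined %>%
  group_by(Region) %>%
  summarise(Total.Sales = sum(Sales)) %>%
  arrange(desc(Total.Sales))

print(sales_by_region)
```

Because an inner join silently drops unmatched rows, it can be worth comparing nrow(orders) with nrow(joined) before trusting the totals. Swapping inner_join for left_join would keep the unmatched order, with NA in the Manager column.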