Data aggregation using dplyr package in R (sample snippet)

Posted on Updated on

In this post, I am using SuperStore data to explore some of the data wrangling functions from these two packages.

The data is in the form of an Excel workbook with three sheets namely – Orders, Returns, and Region. I am loading the three different sheets into separate datasets into R, joining them and performing necessary aggregations.

# Author: Naveen Goje, Email:


# 1 Load required libraries 

#For accessing and dumping excel files
#Used for data wrangling
#Used for data wrangling

# 2 Load three individuals sheets into separate datasets 


# Sum of Sales by Product Category 
 group_by(Product.Category) %>%
 summarise(Total.Sales = sum(Sales)) %>%


# 3 Join data sets and aggregate as per requirement
# Inner join the two data sets Order and Users by Region and look at Total Sales by Region %
 group_by(Region) %>%
 summarise(Total.Sales = sum(Sales))


Spotfire IronPython: Reset Markings

Posted on Updated on

Here is a code snippet for removing markings across visualizations/data tables within an analyses. This script will not reset the filters and only work on the marking. To illustrate this, I have constricted the date range for couple of date filters in the filter column.


I will now make some selections on the bar chart


To remove the markings without resetting the applied filters, we can execute the script through the button Reset Markings


Code snippet for reference: Reset_Markings_4

# Import required libraries
from Spotfire.Dxp.Data import *
from Spotfire.Dxp.Application.Filters import *

def resetMarking():
# Loop through each data table
for dataTable in Document.Data.Tables:
# Navigate through each marking in a given data table
for marking in Document.Data.Markings:
# Unmark the selection
rows = RowSelection(IndexSet(dataTable.RowCount, False))
marking.SetSelection(rows, dataTable)

# Call the function

Below is one more use of the snippet applied for multiple visualizations which use cascading markings.


When the analysis contains multiple markings, then visualizations using these multiple markings have to be set individually. Below I am removing the marking for a single data table.


Though we Unmark marking for a single visualization, the markings which have got applied through cascading are not reset in other visualizations. In the bottom visualization, the markings are not reset as it inherits markings from the center visualization.


In such cases, our script should do the job.


Cohort Analysis

Posted on Updated on

Cohort Analysis is a technique used to analyze characteristics of a cohort (a group of customers distinguished on a common characteristic) over time. It is actually another type of customer segmentation which extends the analysis over a defined period.

One of the frequently applied use case in sales function is to segment customer base based on some set of characteristics. The criteria could be to categorize them into groups who are likely to continue buying, who are likely to defect or who have already defected (went inactive).

Once these groups are formed, some of the common applications for analysis would be to:

  1. Study customer retention – use the results to learn about conversion rates of certain groups and accordingly focus marketing initiatives (may be try to focus on customers who could be retained)
  2. Forecast transactions for cohorts/individual customers and predict purchase volume
  3. Bring more business – Identify groups for upselling and cross-selling
  4. Estimate marketing costs by calculating lifetime value of a customer by cohort
  5. Improve customer experience based on individual customer needs across websites and stores


Marketing Analytics

Posted on Updated on

Marketing is hugely important for a business to succeed. Being able to clearly define marketing objectives and accordingly prioritize on marketing spend is one of the major challenges marketers face. And in order to tune their approach, marketers need important metrics from various business functions to determine marketing effectiveness. Below is an attempt to categorize some of the generally applied analytic techniques that can be used to measure the marketing performance.

Our first step in this analysis would be to identify relevant data sources and develop automation capabilities to streamline data into well-defined repositories. Next, we could use a combination of descriptive and predictive analytic techniques to gain insights. And further we could integrate different models and automate their execution to perform prescriptive analytics for continuous monitoring and feedback.

Marketing drives sales and sales in turn should help improve marketing strategy. Let’s look at some techniques to identify sales patterns and then work on improving our mix of marketing activities.

Applications Applicable Tools/Techniques Required Measures/Expected Results
Sales Performance (Descriptive) Visualizing data using Time Series Analysis and other metrics using standard/ad hoc reporting and operational dashboards that cater to different audiences Use accumulated data over time to learn about correlations and identify patterns
ARIMA models for time series data
Sales Performance (Predictive) Simple and multiple linear regression techniques for forecasting and simulation Determine future possibilities and predicting events to make more informed decisions
Customer Service
Applications Applicable Tools/Techniques Required Measures/Expected Results
Customer Acquisition and Retention Logistic Regression (Churn Analysis) Using historical data to identify ingress and egress of customers
Customer Segmentation Cluster Analysis Identify potential markets and improve on promotion, product, pricing and distribution decisions
Decision Trees
Hypothesis Testing
Product and Brand Feedback Text Analytics using Natural Language Processing Toolkit from Python Analyze unstructured data from social media platforms such as Facebook, Twitter, Yelp etc.
Sentiment Analysis using Stanford NLP
Customer Loyalty Logistic Regression Understand customer behavior and improve decisions around targeted promotions
Multivariate Analysis using Factor Analysis, Principal Component Analysis or Canonical Correlation Analysis
E-Marketing Clickstream Analysis (Traffic and E-commerce-based) Improve conversion and sales
Drive email marketing campaigns
Google Analytics for website statistics Search engine optimization (SEO)
Channel adaptation

Note: The above mentioned techniques can always be used across a set of problems depending on their applicability.

After analyzing the results from our analytical models, we have to take measures on improving crucial marketing activities such as generating leads, demand creation and product promotion. Further, above analysis could be used to design and implement marketing strategies including product and brand promotion, pricing strategy, distribution and customer service. And the findings can be employed in improving questionnaires and other mechanisms of collecting marketing data and customer feedback to learn about product performance and brand value.

With these new analytics capabilities, we can make predictions much more accurately and provide our marketing teams with new ideas to drive promotions and boost sales.

In general, adoption and effective application of these analytic techniques is challenging. Building the right analytics should be informed by industry knowledge and subject to the business function in context. However, this is a process which requires constructive iteration over a long term and in most cases should lead in optimizing marketing performance and delivering tremendous value to the organization.

Spotfire IronPython: Accessing Column Values in Script Context

Posted on Updated on

Below is a code snippet to pull data from a data table into Script context. The data from a particular column(s) could be used to perform validations or compared against values from other data tables.


IronPython code:

from Spotfire.Dxp.Data import *

columnToFetch='Order Date'
rowCount = activeTable.RowCount
rowsToInclude = IndexSet(rowCount,True)
cursor1 = DataValueCursor.CreateFormatted(activeTable.Columns[columnToFetch])
ctr1 = 0
for row in activeTable.GetRows(rowsToInclude,cursor1):
rowIndex = row.Index
val1 = cursor1.CurrentValue
ctr1 = ctr1 + 1
if (ctr1 == 5):

Further, we could push the data into an array for temporary storage and use as per requirement.

Spotfire: Show/Hide Legend and Switch View Type

Posted on Updated on

In some cases (for ex. Trellis view), to get more visualization space, we could simply hide the legend.

For this, on change of View Type (using a drop down property control – ViewType), I am invoking a script to show or hide the legend.

Below, creating the property control ViewType:


Script executed on change of drop down value: ShowHide_Legend_3

IronPython Code:

from Spotfire.Dxp.Application import Visuals
from Spotfire.Dxp.Application.Visuals import LineChart
from Spotfire.Dxp.Application.Visuals import Legend
for vis in Application.Document.ActivePageReference.Visuals:
if vis.Title == "Sales - Trellis View":
vis.As[LineChart]().Legend.Visible = False
if vis.Title == "Sales - Full View":
vis.As[LineChart]().Legend.Visible = True

In Full View, Color by legend is shown. ShowHide_Legend_2

Disabling the legend in Trellis view.ShowHide_Legend_1

Analyses (dxp) for reference from TibCommunity

Spotfire: Show/Hide Regression Line (Straight Line Fit)

Posted on Updated on

The Straight Line Fit from Lines & Curves could be customized and shown/hidden as per requirement.

Below is a Scatter plot used to add a straight line fit.


Make sure that the Straight Line Fit is the first fitting model in Lines & Curves (to use the script below).


Add a button with script as below. In the previous step, since the first fitting model is our Straight Line Fit, we refer it in the script as FittingModels[0]. If your Straight Line is the second fitting model, you could refer it as FittingModels[1] and so on. Additionally, we need to pass the Scatter Plot as Visualization input vis to the script.


IronPython Script:

from Spotfire.Dxp.Application.Visuals import ScatterPlot
for fm in vis.As[ScatterPlot]().FittingModels:
for vis in Application.Document.ActivePageReference.Visuals:
if vis.Title == "Sales by Profit":
if vis.As[ScatterPlot]().FittingModels[0].Enabled == True:
vis.As[ScatterPlot]().FittingModels[0].Enabled = False
vis.As[ScatterPlot]().FittingModels[0].Enabled = True

Now you can use the button to toggle and show/hide the regression line.