Saturday, March 22, 2014

Date Based Cohort Analysis for Adobe SiteCatalyst using R

Over the years I've generally avoided Excel. Being a programmer, I could just pick up python and write code to do what I needed, I didn't need to hack something together in Excel. But I always ended up back there for the charting.

Then I learned R and have even more reason to avoid Excel.

Recently I needed to implement date based cohorts in SiteCatalyst. While there are a few blog posts on how to do this in Excel using Report Builder (http://adam.webanalyticsdemystified.com/2013/03/07/conducting-cohort-analysis-with-adobe-sitecatalyst/ , http://blogs.adobe.com/digitalmarketing/mobile/what-is-mobile-cohort-analysis-and-why-should-you-use-it/) they didn't work for me. My team is all on MacOS, and Report Builder isn't.

In this example I'm going to use events tracked by the Mobile Library lifecycle stats. One plus of this solution is it doesn't require any SAINT classifiers to convert mobileinstalldate to a month/year.

The idea here is you use QueueTrended to chunk together uniqueusers by month, with mobileinstalldate as the counted event. If you look at the data output from QueueTrended is makes more sense. The rest is then using plyr and reshape2 to beat the data into the form we want. It works just fine with segments.

I'm not sharing my code that generates percentages yet because I'm not particularly happy with it yet. Drop me a line if you are interested.

And yes, the data is small, this is from a private unreleased product I am working on.

> print(cohort_chart)
startmonth 0 1 2 3 4
1 Oct 2013 9 1 0 0 0
2 Nov 2013 27 13 12 10 0
3 Dec 2013 35 14 9 0 0
4 Jan 2014 45 28 0 0 0
5 Feb 2014 29 0 0 0 0
view raw cohort chart hosted with ❤ by GitHub
# needed to access SiteCatalyst programatically
require(RSiteCatalyst) || install.packages("RSiteCatalyst")
library(RSiteCatalyst)
# has a handy yearmon class for dealing with months
require(zoo) || install.packages("zoo")
library(zoo)
library(reshape2)
library(plyr)
#Call SCAuth function to set credentials before usage
SCAuth("your creds", "your secret")
MonthCohortChart <- function(trended_data){
# my apologies, I'm new at R, there is probably
# a more elegant way of doing this
# first off, create a column for start day by month
f = ddply(trended_data,
.(year, month, day),
transform,
startmonth = as.yearmon(mobileinstalldate, "%m/%d/%Y") )
# then create one for the bucket month
j = ddply(f,
.(year, month, day),
transform,
bucketmonth = as.yearmon(ISOdate(year, month, day)) )
# lastly, calculate the cohort month
k = ddply(j,
.(year, month, day),
transform,
cohortmonth = round((bucketmonth - startmonth) * 12 ) )
# now remove all the negative cohort months as they are meaningless
#todo, clear the diagonals that don't matter instead of leaving them as 0
k = subset(k, cohortmonth >= 0)
# reshape into mobileinstalldate and name (month)
# sum together the number of unique users in each bucket
molten_form <- melt( k,
id=c("startmonth", "cohortmonth"),
measure.vars=c("uniquevisitors"))
final_output <- dcast(molten_form,
fun.aggregate=sum,
startmonth ~ cohortmonth)
}
uniques_by_month_cohorted <-
QueueTrended("amhammersmith-extpre",
"2013-10-01",
"2014-02-28",
"month",
"uniquevisitors",
"mobileinstalldate",
top="100000",
startingWith= "1",
segment_id="")
cohort_chart <- MonthCohortChart(uniques_by_month_cohorted)
view raw cohorts_in_R.r hosted with ❤ by GitHub
> print(uniques_by_month_cohorted)
mobileinstalldate name year month day uniquevisitors uniquevisitors_forselectedelements segment
1 12/19/2013 October 2013 2013 10 1 0 12
2 12/19/2013 November 2013 2013 11 1 0 41
3 12/19/2013 December 2013 2013 12 1 7 54
4 12/19/2013 January 2014 2014 1 1 2 74
5 12/19/2013 February 2014 2014 2 1 2 86
6 1/14/2014 October 2013 2013 10 1 0 12
7 1/14/2014 November 2013 2013 11 1 0 41
8 1/14/2014 December 2013 2013 12 1 0 54
9 1/14/2014 January 2014 2014 1 1 6 74
10 1/14/2014 February 2014 2014 2 1 4 86