Saturday, March 22, 2014

Date Based Cohort Analysis for Adobe SiteCatalyst using R

Over the years I've generally avoided Excel. Being a programmer, I could just pick up python and write code to do what I needed, I didn't need to hack something together in Excel. But I always ended up back there for the charting.

Then I learned R and have even more reason to avoid Excel.

Recently I needed to implement date based cohorts in SiteCatalyst. While there are a few blog posts on how to do this in Excel using Report Builder ( , they didn't work for me. My team is all on MacOS, and Report Builder isn't.

In this example I'm going to use events tracked by the Mobile Library lifecycle stats. One plus of this solution is it doesn't require any SAINT classifiers to convert mobileinstalldate to a month/year.

The idea here is you use QueueTrended to chunk together uniqueusers by month, with mobileinstalldate as the counted event. If you look at the data output from QueueTrended is makes more sense. The rest is then using plyr and reshape2 to beat the data into the form we want. It works just fine with segments.

I'm not sharing my code that generates percentages yet because I'm not particularly happy with it yet. Drop me a line if you are interested.

And yes, the data is small, this is from a private unreleased product I am working on.