Winning with Big Data - IBM Research

Earlier today (10th August 2012) IBM Research (@IBMResearch) ran a webinar titled “Box Office to Front Office – Winning with Big Data in Sports & Entertainment”. It was split into two halves, one focused on Entertainment and one focused more on Sports.

I tuned in primarily for:

Todd Yellin, VP of Product Innovation, Netflix
Ray Elias, CMO, StubHub

You might be able to see a replay of the video on the livestream.com page.

I watched most of the webinar and participated on social media.

However, if you don’t have time to go through the video yourself, I can sum up the key points for you here.

Data is core to delivering new products and services, and is crucial to both StubHub and Netflix.
Analytics (Data Science) is the key to distilling masses of data down to useful information.
Implicit data collection is imperative. If you are not doing it already, start now.

I’m a really big fan of implicit data collection. If you aren’t sure what it is, it’s very simple to grasp: implicit data is anything you can learn about your user/customer without having them explicitly tell you. At a very basic level, “clickstream” data from your websites is implicit data. If you serve videos like Netflix does, it also means understanding *how* your customers watched something: did they watch it all? Did they watch it over two nights? Did they switch to something else 5 minutes in?
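To make those three questions concrete, here is a minimal sketch of how they could be answered from raw playback records. Everything in it is my own assumption for illustration: the `PlaybackSession` shape, the `summarise_viewing` helper, and the thresholds (90% counts as "watched it all", 300 seconds as "5 minutes in"). It is not how Netflix does it.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PlaybackSession:
    """One continuous stretch of playback for a single title (hypothetical shape)."""
    day: date      # calendar day the session happened on
    start_s: int   # playback position at start, in seconds
    end_s: int     # playback position at stop, in seconds


def summarise_viewing(sessions, duration_s, finished_fraction=0.9, abandon_after_s=300):
    """Derive implicit signals from playback sessions; thresholds are assumptions."""
    # Furthest point the viewer ever reached in the title.
    furthest = max((s.end_s for s in sessions), default=0)
    return {
        # Did they (more or less) watch it all?
        "finished": furthest >= finished_fraction * duration_s,
        # Over how many separate days did they watch it?
        "nights": len({s.day for s in sessions}),
        # Did they give up within the first few minutes?
        "abandoned_early": furthest <= abandon_after_s,
    }


sessions = [
    PlaybackSession(date(2012, 8, 9), 0, 3000),
    PlaybackSession(date(2012, 8, 10), 3000, 5700),
]
summary = summarise_viewing(sessions, duration_s=6000)
```

None of these signals needs the viewer to rate or review anything, which is the whole point of collecting them implicitly.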

The really clever part about implicit data collection is knowing *what* you showed them, and what they responded to. For example, say you are an e-commerce site and you have a landing page with a selection of 24 products. Maybe you use some form of popularity metric to pick which 24 products you show – whatever.

If you are wholly reliant on “clickstream” as your data source, you will get nothing unless they click through to a product. However, you can be far more intelligent and track the mouse movements: did they mouse-over that summery maxi dress? Congratulations, you’ve just realised there is a world more information to be gleaned if you think of these as implicit “events”.
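As a rough sketch of the idea, assume the page logs three kinds of event per product: an "impression" when a product is shown, a "hover" on mouse-over, and a "click" on click-through. The `Event` record and the `engagement_rates` helper below are hypothetical names of my own, not any particular analytics API.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """A single implicit event (hypothetical schema)."""
    user_id: str
    product_id: str
    kind: str  # "impression", "hover" or "click"


def engagement_rates(events):
    """Return {product_id: (hover_rate, click_rate)}, both relative to impressions."""
    shown, hovered, clicked = Counter(), Counter(), Counter()
    for e in events:
        if e.kind == "impression":
            shown[e.product_id] += 1
        elif e.kind == "hover":
            hovered[e.product_id] += 1
        elif e.kind == "click":
            clicked[e.product_id] += 1
    # Only products that were actually shown get a rate.
    return {p: (hovered[p] / n, clicked[p] / n) for p, n in shown.items()}


events = [
    Event("u1", "maxi-dress", "impression"),
    Event("u1", "maxi-dress", "hover"),
    Event("u2", "maxi-dress", "impression"),
    Event("u1", "sandals", "impression"),
]
rates = engagement_rates(events)
```

With clickstream alone, both products here look identical (zero clicks). Once impressions and hovers are recorded, the maxi dress shows interest from half the people who saw it, while the sandals show none.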

I recently wrote a blog post about how Netflix does exactly this. By recording *what* they presented to you and how you interacted with it, Netflix can derive masses more information than by just relying on a dumb metric like content views implied from CDN logs.

The discussion went on to talk about familiar “Big Data == Big Brother” concerns. Todd Yellin downplayed the relevancy of mixing many facets of an individual’s data . o O ( does your fascination for Thriller movies mean you are more likely to buy Watermelons? )

Later a point was raised about being “covert” in your recommendations versus being “overt”. The risk of being entirely covert about your personalisation or recommendations is that your users may react adversely if they think you are ‘stalking’ them. The flip side is that being very overt about why something was recommended can garner trust with your users.

It raises an interesting point… Amazon are very overt: “We recommended this because you bought XYZ Espresso Machine”. Netflix, on the other hand, are kinda covert. Despite Netflix only recently launching in the UK, I have had access to it for years. I’ve struggled to grasp the relevancy of the recommendations it makes, and I’m not alone. I’d even argue that, despite the Netflix Prize attracting some pretty heavyweight models that solved the problem (mathematically speaking), humans are more complex than can be described by what I term “two dimensional recommender systems”.

Lastly, my highlight of the whole event was Todd Yellin saying “Up until a year ago, our Data Science team were called Business Intelligence”.
