Over the past couple of months I have met and talked to a lot of new and interesting people. Everywhere I go I encounter the same questions about Big Data. It's like a mass hysteria around what is, on the face of it, a simple concept: "volumes of data".
- “How much data is Big Data?”
- “I’m using NoSQL, is that Big Data?”
- “How many servers do I need to get started with Big Data?”
- “Do I have to use Hadoop to be Big Data?”
After spending many hours in total explaining my perspective on what "Big Data" is, I have managed to distil it into an amazingly simple concept.
Quite simply, it is this:
"When you need to process volumes of data at least two orders of magnitude greater than what you handle today, you are probably doing Big Data."
Please note that I didn't mention gigabytes, terabytes or even petabytes. Nor did I pontificate about how many billions of rows and columns you need before you count as 'Big Data'. When you think about it in terms of "volumes of data", all the complexity falls away.
Ask yourself these three simple questions:
- Are my existing tools at their data/performance capacity?
- If I am to scale up, is it going to be too expensive?
- Would being able to process two (or more) orders of magnitude more data give me a competitive advantage?
If you answered ‘Yes’ to any of the above, you are probably approaching a tipping point where you need to think about your future investment in tools.
For example, if your business is currently using Excel '97 (yes, really), which is limited to 65,536 rows, ask yourself how you would deal with 6.5 million rows of data, roughly a hundred times more. While this may get you laughed out of Silicon Valley by the 'Big Data cool kids', to you these volumes are the very definition of 'Big Data', and chances are you should be looking at MS SQL Server (at least).
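To put the rule of thumb into numbers, here is a toy sketch. It assumes that row counts are a fair stand-in for "volumes of data", and the function names are hypothetical, made up purely for illustration. It simply measures how many orders of magnitude (base-10 logarithm of the ratio) separate today's volume from the volume you need to handle.

```python
import math


def orders_of_magnitude(current_rows: int, target_rows: int) -> float:
    """Hypothetical helper: how many powers of ten target_rows sits above current_rows."""
    return math.log10(target_rows / current_rows)


def probably_big_data(current_rows: int, target_rows: int, threshold: float = 2.0) -> bool:
    """Hypothetical helper applying the rule of thumb: at least two orders of magnitude more."""
    return orders_of_magnitude(current_rows, target_rows) >= threshold


excel_97_limit = 65_536  # maximum rows in an Excel '97 worksheet

print(orders_of_magnitude(excel_97_limit, 6_553_600))  # 2.0 (6,553,600 is exactly 100x)
print(probably_big_data(excel_97_limit, 6_553_600))    # True
print(probably_big_data(excel_97_limit, 500_000))      # False (under one order of magnitude)
```

The threshold of 2.0 mirrors the "two orders of magnitude" wording; it is a heuristic, not a hard cut-off.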
Wait a minute, I didn't see any mention of NoSQL or elephants (Hadoop's mascot). You mean MS SQL Server can be used for 'Big Data'?
In this very simple example, yes. I guess the point I’m trying to get across is this…
Sure, you could go and splurge your company's time and money playing with these awesome shiny new tools, but sooner or later you're going to need to prove that the investment was worthwhile.
It makes far more sense to either:
- pick a problem your company is facing that can’t be solved with your existing tools
- find an ‘edge’ or ‘advantage’ that you could deliver, if you had the tools to realise it
Now that you have a clear objective and a set of requirements, you're ready to start looking for a new and innovative way to deliver something of clear value. Your bosses will be overjoyed if you can deliver real returns on their investment, and your company stands to benefit as a whole.
After all, I believe it was a recent Forbes article that stated: "By 2015, companies that are using 'Big Data' effectively will be 20% ahead of their competitors".
It’s not about how big your data is, it is all about how effectively you use it :o)