Big Data – Ponderings of a Practitioner – III

My earlier posts on this topic have taken a top-down view, from model building to implementation. In this post, I attempt to understand the bottom-up view of this world: the vendor landscape and tools in this space. Good summaries of this view are provided in:

a) Stonebraker’s post on Big Data – This takes the view that everything is data: it is large in volume, arriving at great rates in real time (velocity), and of different types/formats/semantics (variety). How do we support the conventional CRUD operations, and more, on this ever-increasing dataset?
b) The vendor landscape as provided in a), b), and a nuanced view of b) in c). In a), the big data world is seen as a continuation of the past three decades of database evolution, extended to include unstructured and streaming data, video, images, and audio. b) and c) view it from the positioning of different “tech” buckets, each focused on “improving” some aspect of an implementation.

c) The analytics services view: Every worthwhile real-world application has some or all of the pieces of this architecture. One can pick a variety of tools for each component (open source or proprietary), combine them in different ways, and use off-the-shelf tools such as R and SAS to analyze the data.

As I review the tools in this space, it is important to understand that these vendors’ value proposition is not to solve your “big data” problem, the one relevant to your business, but to sell tools. Only after resolving the issues from a top-down perspective can one constrain the technology choices and evolve the final solution incrementally. Vendors do not know your domain or your final application, so they cannot be held responsible for your outcome. Startups in this space adopt either the horizontal tech view (database tools, visualization tools, selling data) or the vertical view: solving a specific problem in a vertical such as marketing, advertising, or wellness. For example, Google is a big data company dealing with advertising at scale (vertically oriented); they built the big data toolkit to solve a vertical problem. Amazon’s recommender system is another application of big data at scale, built initially for books and later extended to other products.
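To make the vertical framing concrete, here is a toy sketch of the idea behind an Amazon-style “customers who bought X also bought Y” recommender. The item names and the simple co-occurrence counting are illustrative assumptions on my part, not Amazon’s actual method; the point is only that the value lives in the model, not in any particular tool.

```python
from collections import Counter, defaultdict

def build_cooccurrence(baskets):
    """Count how often each pair of items appears in the same purchase basket."""
    co = defaultdict(Counter)
    for basket in baskets:
        for x in basket:
            for y in basket:
                if x != y:
                    co[x][y] += 1
    return co

def recommend(co, item, k=3):
    """Return the top-k items most often co-purchased with `item`."""
    return [other for other, _count in co[item].most_common(k)]

# Hypothetical purchase history (each inner list is one customer's basket).
baskets = [
    ["book_a", "book_b"],
    ["book_a", "book_b", "book_c"],
    ["book_a", "book_c"],
    ["book_a", "book_b"],
]
co = build_cooccurrence(baskets)
print(recommend(co, "book_a"))  # ['book_b', 'book_c']
```

A real system at Amazon’s scale would have to compute these counts over far larger data (which is where the big data toolkit earns its keep), but the modeling decision of recommending by co-purchase comes first, exactly the top-down point above.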

Betting on a vertically oriented view has better odds, since the key to getting value out of big data is model building. Model-free approaches to big data, such as free-ranging analyses of data or tech investments without a well-bounded, specific purpose, are more or less bound to fail. Worthwhile, reliable models do not emerge out of the blue in any domain; they require work. The business advantage is that you get to exploit the benefits of the model until someone else figures it out, which is how all science and technology works. So the key question is: how does a tech leader run an “evaluative” project that provides some guidance on big data investments given limited resources? I will have some thoughts on this in future posts on this topic.