Strata 2014 Part 1: Hadoop, Bright Lights, Big City

If you’re running a conference in New York, there’s pretty much no middle ground between a large hotel and the Javits Center. And so this year, Strata Hadoop World made the leap, getting provisional access to a small part of the big convention center to see if it could fill the place. That turned out to be a foregone conclusion.

The obvious question was whether Hadoop, and Big Data, had in fact “crossed the chasm” to become a mainstream enterprise IT market. In case you were wondering, the O’Reilly folks got Geoffrey Moore up on the podium to answer that very question.

For Big Data-powered businesses, there’s little chasm to cross when you factor in the cloud. As Moore put it, if you only need to rent capacity on AWS, the cost of entry is negligible. All that early adopter, early majority, late majority stuff doesn’t really apply. A social site has a business model of getting a million eyes or nothing, and getting there is a matter of having the right buzz to go viral – the key is that there’s scant cost of entry and you get to fail fast. Save that thought – because the fail fast principle also applies to enterprises implementing Big Data projects (we’ll explain in Part 2 of this post, soon to come).

Enterprise adoption follows Moore’s more familiar chasm model – and there, we’re still at early majority, where the tools of the trade are arcane languages and frameworks like Spark and Pig. But the key, Moore says, is for “pragmatists” to feel pain; that is the chasm to late majority, the point where conventional wisdom is to embrace the new thing. Pragmatists in the ad industry are feeling pain responding to Google; the same goes for the media and entertainment sectors, where even cable TV mainstays such as HBO are willing to risk decades-old relationships with cable providers to embrace pure internet delivery.

According to Cloudera’s Mike Olson, Hadoop must “disappear” to become mainstream. That’s a 180 switch as the platform has long required specialized skills, even if you ran an off-the-shelf BI tool against it. Connecting from familiar desktop analytics tools is the easy part – they all carry interfaces that translate SQL to the query language that can run on Hive, or on any of the expanding array of interactive-SQL-on-Hadoop frameworks that are making Hadoop analytics more accessible (and SQL on Hadoop a less painful experience).

Between BI tools and frameworks like Impala, HAWQ, Tez, Big SQL, Big Data SQL, Query Grid, Drill, or Presto, we’ve got the last mile covered. But the first miles, which involve mounting clusters, managing and optimizing them, wrangling the data into shape, and governing the data, are still works in progress (there is some good news regarding data wrangling). The same goes for tools that hide the complexity and applications that move it under the hood.

No wonder that for many enterprises, offloading ETL cycles was their first use of Hadoop. Not that there’s anything wrong with that – moving ETL off Teradata, Oracle, or DB2 can yield savings because you’ve moved low-value workloads off platforms where you pay by footprint. Those savings can pay the bill while your team defines where it wants to go next.

We couldn’t agree with Olson more – Hadoop will not make it into the enterprise as this weird, difficult, standalone platform that requires special skills. Making a new platform technology like Hadoop “disappear” isn’t new – it’s been done before with BI and data warehousing. In fact, Hadoop and Big Data today are at the same point where BI and data warehousing were in the 1995 – 96 timeframe.

The resemblance is uncanny. At the time, data warehouses were unfamiliar and required special skills because few organizations or practitioners had relevant experience. Furthermore, SQL relational databases were the Big Data of their day, providing common repositories for data that was theoretically liberated from application silos (well, reality proved a bit otherwise). Once tools automated ETL, query, and reporting, BI and data warehousing in essence disappeared. Data warehouses became part of the enterprise database environment, while BI tools became routine additions to the enterprise application portfolio. Admittedly, the promise of BI and data warehousing was never completely fulfilled, as analytic dashboards for “everyman” remained elusive.

Back to the original question: have Hadoop and Big Data gone mainstream? The conference had little trouble filling up the hall and, questions about economic cycles notwithstanding, shouldn’t have issues occupying more of Javits next year. We’re optimists based on Moore’s “pragmatist pain” criterion – in some sectors, pragmatists will have little choice but to embrace the Big Data analytics that their rivals are already leveraging.

More specifically, we’re bullish in the short term and long term, but are concerned over the medium term. There’s been a lot of venture funding pouring into this space over the past year for platform players and tools providers. Some players, like Cloudera, have broken well past the billion-dollar valuation mark. Yet, if you look at the current enterprise paid installed base for Hadoop, conservatively we’re in the 1000 – 2000 range (depending on how you count). Even if these numbers double or triple over the next year, will that be enough to satisfy venture backers? And what about the impacts of Vladimir Putin or Ebola on the economy over the near term?

At Strata we had some interesting conversations with members of the venture community, who indicated that the money pouring in is 10-year money. That’s a lot of faith – but then again, there’s more pain spreading around certain sectors where leaders are taking leaps to analyze torrents of data from new sources. But ingesting the data or pointing an interactive SQL tool (or streaming or search) at it is the easy part. When you’re getting beyond the enterprise data walled garden, you have to wonder if you’re looking at the right data or asking the right questions. In the long run, that will be the gating factor as to how, whether, and when analyzing this data will become routine in the enterprise. And that’s what we’re going to talk about in Part 2.

We believe that self-service will be essential for enterprises to successfully embrace Big Data. We’ll explain why in our next post.