Strata 2015 post mortem: Does the Hadoop market have a glass ceiling?

The move of this year’s west coast Strata HadoopWorld conference to the same venue as Hadoop Summit gave the event a bit of a mano a mano air: who can throw the bigger, louder party?

But show business dynamics aside, the net takeaway from these events is looking at milestones in the development of the ecosystem. Indeed, the brunt of our time was spent “speed dating” with third party tools and applications that are steadily addressing the white space in the Big Data and Hadoop markets. While our sampling is hardly representative, we saw growth, not only from the usual suspects from the data warehousing world, but also from a growing population of vendors who are aiming to package machine learning algorithms, real-time streaming, more granular data security, along with new domains such as entity analytics. Databricks, inventor of Spark, announced in a keynote a new DataFrames initiative to make it easier for R and Python programmers accustomed to working on laptops to easily commandeer and configure clusters to run their computations using Spark.

Immediately preceding the festivities, the Open Data Platform initiative announced its existence, and Cloudera announced its $100 million 2014 numbers – ground we already covered. After the event, Hortonworks did its first quarterly financial call. Depending on how you count, they did nearly $50 million business last year; but the billings, which signify the pipeline, came in at $87 million. Hortonworks closed an impressive 99 new customers in Q4. There’s little question that Hortonworks has momentum, but right now, so does everybody. We’re at a stage in the market where a rising tide is lifting all boats; even the smallest Hadoop player – Pivotal – grew from token revenues to our estimate of $20 million Hadoop sales last year.

At this point, there’s nowhere for the Hadoop market to go but up, as we estimate that the paid enterprise installed base (at roughly 1200 – 1500) as just a fraction of the potential base. Or in revenues, our estimate of $325 million for 2014 (Hadoop subscriptions and related professional services, but not including third party software or services), up against $10 billion+ for the database market. Given that Hadoop is just a speck compared to the overall database market, what is the realistic addressable market?

Keep in mind that while Hadoop may shift some data warehouse workloads, the real picture is not necessarily a zero sum game, but the commoditization of the core database business. Exhibit One: Oracle’s recent X5 engineered systems announcement designed to meet Cisco UCS at its commodity price point. Yes, there will be some contention, as databases are converging and overlapping, competing for many of the same use cases.

But the likely outcome is that organizations will use more data platforms and grow accustomed to paying more commodity process – whether that is through open source subscriptions or cloud pay-by-the-drink (or both). The value-add increasingly will come from housekeeping tools (e.g., data security; access control and authentication; data lineage and audit for compliance; cluster performance management and optimization; lifecycle and job management; query management and optimization in a heterogeneous environment).

The takeaway here is that the tasks normally associated with the care and feeding of a database, not to mention the governance of data, grow far more complex when superseding traditional enterprise data with Big Data. So the Hadoop subscription business may only grow so far, but that will be just the tip of the iceberg regarding the ultimate addressable market.