The Open Data Platform is and is not like UNIX, Cloudera cracks $100m, and what becomes of Pivotal

How’s that for a mouthful?

It shouldn’t be surprising that the run-up to Strata is full of announcements designed to shape mindsets. And so today we have a trio of announcements that settle – for now – the question of whether Pivotal is still in the Hadoop business (or at least whether it still has its own distro); verify that Cloudera did indeed make $100m last year; and announce the formation of a cross-industry initiative, the Open Data Platform.

First, we’ll get our thoughts on Cloudera and Pivotal out of the way. Cloudera’s announcement didn’t surprise us: we had already estimated that it was on its way to a $100m year, given typical deal sizes of about $250k (outliers go a lot higher than that), a new-customer run rate that we pegged at roughly 50 per quarter, and of course subscription renewals that grow as customers expand their deployments. To put that in perspective, we’re still in a greenfield market where a rising tide is lifting all boats; we estimate that business is also doubling for most of Cloudera’s rivals – but Cloudera has had an obvious head start.

As for Pivotal, it has been the subject of much FUD in the wake of published reports last fall of a layoff of 60 employees on the Big Data side of its business. Word on the street was that Pivotal, the last vendor to enter the Hadoop distribution business, would be the first to leave – with Hortonworks the logical successor, since Pivotal had disclosed last summer that it would replace its Command Center with the Hortonworks-led Ambari project for Hadoop cluster management.

The news is that Pivotal is making a final break from its proprietary technology legacy and open sourcing everything – including the Greenplum database. And yes, Pivotal will offer OEM support for HDP, but it will still offer its own distribution, optimized for HAWQ and for integration with its other data engines, including the GemFire in-memory database. This announcement didn’t happen in a vacuum, though; it came in conjunction with today’s other announcement of the Open Data Platform, of which Pivotal and Hortonworks (along with IBM and others) are members. We’re frankly puzzled as to why Pivotal would continue offering its own distribution. But we’ll get back to that.

The Open Data Platform is an initiative that tries to put the toothpaste back into the tube: define, integrate, test, and certify a standard Hadoop core. Once upon a time, Apache Hadoop could be defined by its core projects – essentially what was on the Apache project home page. Since then there have been multiple overlapping and often competing projects for interactive SQL (do we use Hive or bypass it?), cluster management (Ambari or various vendors’ proprietary management tools), security, resource management (YARN for everything, or just batch jobs – and what about Mesos?), streaming (Storm or Spark Streaming), and so on. When even the core HDFS file system may not be in every distro, the question of what makes Hadoop, Hadoop remains key.

Of course, ODP is not just about defining core Hadoop, but about designating, in effect, a stable base into which value-added features or third-party software can reliably hook. It picks up where the Apache community – which simply designates which releases are stable – leaves off, by providing a formal certification base. That’s the type of thing that vendor consortia, rather than open source communities, are best equipped to deliver. For the record, ODP pledges to work alongside Apache.

So far, so good – except that this initiative comprises only half the global Hadoop vendor base. This is where the historical analogies with UNIX come in; recall the Open Software Foundation, which pitted everybody against the industry leader, Sun? ODP repeats that dynamic of the consortium vs. the market leaders – for now, the Cloudera and Amazon customer bases will outnumber those of the ODP committers.

Over time, OSF UNIXes remained overshadowed by Solaris, but eventually everybody turned their attention to dealing with Microsoft. After laying down arms, OSF morphed into The Open Group, which refocused on enterprise architecture frameworks and best practices.

The comparison between ODP and OSF holds only for the competitive dynamics. Otherwise, UNIX and Hadoop are different creatures. While both are commodity technologies, Hadoop is a destination product that enterprises buy, whereas UNIX (and Linux) are foundational components built into the purchase of servers and appliances. Don’t be misled by those who characterize Hadoop as a data operating system: enterprises are increasingly demanding capabilities – security, manageability, configurability, and recovery – that are expected of any data platform they would buy.

Where the narrative further differs is that Hadoop, unlike UNIX, lacks a common enemy. Hadoop will exist alongside, not instead of, other database platforms as they eventually meld into a fabric across which workloads are apportioned. So we don’t necessarily expect history to repeat itself with the Open Data Platform. ODP’s contribution will be the expectation of a non-moving target that becomes a consensus, if not an absolutely common one. It’s also the realization that value-add in Hadoop increasingly comes not from the core, but from the analytics that run on it and the connective glue that the platform provider supplies.

As for Pivotal and what it’s still doing in the Hadoop business, our expectation is that ODP provides the umbrella under which its native distribution converges with, and becomes a de facto dialect of, HDP. We believe that Pivotal’s value-add won’t be in the Hadoop distribution business, but in how it integrates GemFire and optimizes the implementation for its Cloud Foundry Platform-as-a-Service cloud.

Postscript: No good deed goes unpunished. Here’s Mike Olson’s take.