Further proof that Hadoop competition is moving up the stack toward areas such as packaged analytics, security, and data management and integration can be seen in Hortonworks’ latest series of announcements today – a refresh of the Hortonworks Data Platform with Ambari 2.0 and the acquisition of cloud deployment automation tool SequenceIQ.
Specifically, Ambari 2.0 provides much of the automation previously missing, such as automated rolling updates, restarts, Kerberos authentication, alerting, and health checks. Until now, automation of deployment, monitoring and alerting, root cause diagnosis, and authentication was a key differentiator for Cloudera Manager. While Hadoop systems management may not be a done deal (e.g., updating to major new dot zero releases is not yet a lights-out operation), the basic blocking and tackling is no longer a differentiator; any platform should have these capabilities. The recent debut of the Open Data Platform – where IBM and Pivotal are leveraging the core Hortonworks platform as the starting point for their Hadoop distributions – is further evidence. Ambari is the cornerstone of all implementations, although IBM will still offer a more “premium” value-add with options such as Platform Symphony and Adaptive MapReduce.
Likewise, Hortonworks’ acquisition of SequenceIQ is a move to even the score with Cloudera Director. Both handle automation of cloud deployment with policy-based elastic scaling (e.g., when to provision or kill compute nodes). The comparison may not yet be apples-to-apples; for instance, Cloudera Director has been a part of the Cloudera enterprise platform (the paid edition) since last fall, whereas the ink is just drying on the Hortonworks acquisition of SequenceIQ. And, while SequenceIQ’s product, Cloudbreak, is cloud infrastructure-agnostic and Cloudera Director right now supports only Amazon, that too will change.
More to the point is where competition is heading – we believe that it is moving from the core platform higher up the value chain to analytic capabilities and all forms of data management – stewardship, governance, and integration. In short, it’s a page out of the playbook of established data warehousing platforms that have had to provide value-add that could be embedded inside the database. Just take a look at Cloudera’s latest announcements: the acquisition of Xplain and a strategic investment in Cask. Xplain automates the design, integration, and optimization of data models to reduce or eliminate hurdles to conducting self-service analytics on Hadoop. Cask, on the other hand, provides hooks for developers to integrate applications with Hadoop – the third way that until now has been overlooked.
As Hadoop graduates from a specialized platform for complex, data science computing to an enterprise data lake, the blocking and tackling functions – e.g., systems management and housekeeping – become checklist items. What’s more important is how to manage data, make data and analytics more accessible beyond data scientists and statistical programming experts, and provide the security that is expected of any enterprise-grade platform.