MongoDB widens its sights

MongoDB has passed several key watersheds over the past year, including a major redesign of its core platform and a strategic shift in its management team. By now, the architectural transition is relatively old news; as we noted last winter, MongoDB 3.0 made the storage engine pluggable. So voila! Just like MySQL before it, Mongo becomes whatever you want it to be. Well, eventually, anyway – but today there's the option of substituting the more write-friendly WiredTiger engine, and in the near future, an in-memory engine now in preview could provide an even faster write-ahead cache to complement the new overcaffeinated tiger. And there are likely other engines to come.
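For the curious, the active engine is easy to verify from a driver. Below is a minimal sketch in Python with pymongo, assuming a MongoDB 3.0+ instance on localhost – started with, say, mongod --storageEngine wiredTiger – where the serverStatus command reports which engine is in play.

```python
from pymongo import MongoClient

client = MongoClient("localhost", 27017)

# serverStatus on MongoDB 3.0+ includes a storageEngine section
status = client.admin.command("serverStatus")
print(status["storageEngine"]["name"])  # e.g. "wiredTiger" or "mmapv1"
```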

From a platform – and market – standpoint, the core theme is Mongo broadening its aim. Initially, it will be through new storage engines that allow Mongo to be whatever you make of it. MongoDB has opened the fray with WiredTiger and the new in-memory data store, but with the publishing of the storage engine API, there are opportunities for other engines to plug in. At MongoDB's user conference, we saw one such result – the RocksDB engine developed at Facebook for extremely I/O-intensive workloads involving log data. And as we've speculated, there's nothing to stop other engines – even SQL ones – from plugging in.

Letting a thousand flowers bloom
Analytics is one example of where Mongo is spreading its focus. While Mongo and other NoSQL data stores are typically used for operational applications requiring fast reads and/or writes, there is also growing demand for in-line analytics – if only for operational simplicity. Why move data to a separate data warehouse, data mart, or Hadoop if it can be avoided? And why not embed some analytics with your operational applications? This is hardly an outlier – a key selling point for the latest generations of Oracle and SAP applications is the ability to embed analytics with transaction processing. Analytics evolves from an after-the-fact exercise to an inline process that is part of processing a transaction. Any real-time customer-facing or operational process is ripe for analytics that can prompt inline decisions, such as providing next-best offers or tweaking the operation of an industrial process, supply chain, or the delivery of a service. And so a growing number of MongoDB deployments are adding analytics to the mix.
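As a concrete, if hypothetical, illustration: MongoDB's aggregation framework already lets an application run this kind of in-line rollup directly against the operational store, no export required. The sketch below, in Python with pymongo, assumes a shop.orders collection with created, customer_id, and total fields – all names of our own invention.

```python
from datetime import datetime, timedelta
from pymongo import MongoClient

orders = MongoClient().shop.orders  # hypothetical database/collection

pipeline = [
    # restrict to the last 24 hours of operational data
    {"$match": {"created": {"$gte": datetime.utcnow() - timedelta(days=1)}}},
    # roll up order counts and revenue per customer, inside the database
    {"$group": {"_id": "$customer_id",
                "orders": {"$sum": 1},
                "revenue": {"$sum": "$total"}}},
    {"$sort": {"revenue": -1}},
    {"$limit": 10},
]

for row in orders.aggregate(pipeline):
    print(row["_id"], row["orders"], row["revenue"])
```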

It's almost a no-brainer for SQL BI tools to target JSON data per se, because the data has a structure. (Admittedly, this assumes the data is relatively clean, which in many cases is not a given.) But by nature, JSON has a more complex and potentially richer structure than SQL tables, to the degree that the data is nested. Yet most SQL tools do away with the nesting and hierarchies that are stored in JSON documents, "flattening" the structure into flat columns.
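To make the loss concrete, here is a small, hypothetical Python sketch of what that flattening typically does to a nested order document: the hierarchy survives only as dotted column names, and the fact that the line items form an array is gone.

```python
doc = {
    "customer": "acme",
    "orders": [
        {"sku": "A1", "qty": 2},
        {"sku": "B7", "qty": 1},
    ],
}

def flatten(d, prefix=""):
    """Naively collapse nesting into dotted column names, the way many
    SQL-facing tools do; array structure and hierarchy are lost."""
    flat = {}
    for key, value in d.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, name + "."))
        elif isinstance(value, list):
            for i, item in enumerate(value):
                if isinstance(item, dict):
                    flat.update(flatten(item, f"{name}.{i}."))
                else:
                    flat[f"{name}.{i}"] = item
        else:
            flat[name] = value
    return flat

print(flatten(doc))
# {'customer': 'acme', 'orders.0.sku': 'A1', 'orders.0.qty': 2,
#  'orders.1.sku': 'B7', 'orders.1.qty': 1}
```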

We've always wondered when analytic tools would wake up to the potential of querying JSON natively – that is, not flattening the structure, but incorporating that information when processing the query. The upcoming MongoDB 3.2 release will add a new connector to BI and visualization tools that will push down analytic processing into MongoDB, rather than requiring data to be extracted first to populate an external data mart or data warehouse for the analytic tool to target. But this enhancement is not so much about enriching the query with information pertaining to the JSON schema; it's more about efficiency, eliminating data transport.

But some emerging startups are looking to address that JSON-native query gap. jSonar demonstrated SonarW, a columnar data warehouse engine that plugs into the Mongo API, with a key difference: it carries metadata that gives a logical representation of the nested and hierarchical relationships. We saw a reporting tool from SlamData that applies similar context to the data, based on patent-pending algorithms that apply relational algebra to slicing, dicing, and aggregating deeply nested data. A taste of what nesting-aware querying looks like follows below.
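MongoDB's own aggregation framework hints at what querying the nesting directly buys you: the $unwind stage exposes array elements to the pipeline while keeping their link to the parent document, rather than throwing the hierarchy away up front. A minimal sketch, again in Python with pymongo and with hypothetical names (a customers collection holding an orders array, matching the document shape above):

```python
from pymongo import MongoClient

customers = MongoClient().shop.customers  # hypothetical collection

pipeline = [
    {"$unwind": "$orders"},                 # one document per nested line item
    {"$group": {"_id": "$orders.sku",       # aggregate across all parents
                "total_qty": {"$sum": "$orders.qty"}}},
    {"$sort": {"total_qty": -1}},
]

for row in customers.aggregate(pipeline):
    print(row["_id"], row["total_qty"])
```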

Who says JSON data has to be dirty?
A key advantage of NoSQL data stores like Mongo is that you don't have to worry about applying a strict schema or validation (e.g., ensuring that the database isn't sparse and that the data in the fields is not gibberish). But there's nothing inherent to JSON that rules out validation and robust data typing. MongoDB will be introducing a tool supporting schema validation for those use cases that demand it, plus a tool for visualizing the schema to provide a rough indication of unique fields and unique data (i.e., cardinality) within those fields. While maybe not a full-blown data profiling capability, it is a start.
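For a sense of what that looks like in practice, MongoDB 3.2's document validation attaches an ordinary query expression to a collection, which new writes must satisfy. A minimal sketch in Python with pymongo, with hypothetical collection and field names:

```python
from pymongo import MongoClient

db = MongoClient().shop  # hypothetical database

# the validator is a plain query expression enforced on inserts and updates
db.create_collection(
    "users",
    validator={
        "email": {"$type": "string", "$regex": "@"},  # must look like an email
        "age": {"$type": "int", "$gte": 0},
    },
)

db.users.insert_one({"email": "jane@example.com", "age": 34})  # passes
# db.users.insert_one({"email": 42})  # would be rejected by the validator
```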

Breaking the glass ceiling
The script for MongoDB has been familiar up 'til now: the entrepreneurial startup whose product has grown popular through grassroots appeal. The natural trajectory for MongoDB is to start engaging the C-level and the business, who write larger checks. A decade ago, MySQL played this role; it was a kind of Oracle or SQL Server Lite that was less complex than its enterprise cousins. That's been very much MongoDB's appeal. But by making the platform more extensible, MongoDB creates a technology path to grow up. Can the business grow with it?

Over the past year MongoDB's upper management team has largely been replaced; the CEO, CMO, and head of sales are all new. It's the classic story of startup visionaries followed by those experienced at building the business. President and CEO Dev Ittycheria, most recently from the venture community, previously took BladeLogic public before eventually selling it to BMC for $900 million in 2008. Its heads of sales and marketing come from similar backgrounds and long track records. While MongoDB is clearly not slacking off on product development, it is plowing much of its capitalization into building out its go-to-market operation.

The key challenge facing Mongo, and all the new data platform players, is where (or whether) they will break the proverbial glass ceiling. There are several perspectives on this challenge. For open source players like MongoDB, it is determining where the value-add lies. It's a moving target: functions that make a data store enterprise-grade, such as data governance, management, and security, were traditionally unique to the vendor and platform, but open source is eating away at that. Just look at the Hadoop world, where there's Ambari, while Cloudera and IBM offer their own management tooling, either as core or as an optional replacement. So this dilemma is hardly unique to MongoDB. Our take is that a lowest-common-denominator approach cannot be applied to governance, security, or management; rather, platform players like MongoDB must branch out and offer related value-add such as optimizations for cloud deployment, information lifecycle management, and so on.

Such a strategy of broadening the value-add grows even more important given market expectations for pricing – in essence, coping with the "I'm not going to pay a lot for this muffler" syndrome. The expectation with open source and other emerging platforms is that enterprises are not willing, or lack the budget, to pay the types of license fees customary with established databases and data warehouse systems. Yes, the land-and-expand play is critical for the likes of MongoDB, Cloudera, Hortonworks, and others for growing revenues. They may not replace the Oracles or Microsofts of the world, but they are angling to be the favorite for new-generation applications that supplement what's already on the back end (e.g., customer experience applications that enhance and work alongside classical CRM).

Landing and expanding into the enterprise, and broadening from data platform to data management, are familiar scripts. Even in an open source, commodity-platform world, these scripts will remain as important as ever for MongoDB.