MongoDB grows up

One could say that MongoDB has been at the right place at the right time. When web developers demanded a fast, read-intensive store of complex variably-structured data, the company formerly known as 10Gen came up with a simple engine backed by intuitive developer-friendly tooling. It grew incredibly popular for applications like product catalogs, tracking hierarchical events (like chat strings with responses), and some forms of web content management.

In a sense, MongoDB and JSON became the moral equivalents of MySQL and the LAMP stack, which were popular with web developers who needed an easy-to-deploy transactional SQL database sans all the overhead of an Oracle.

Some things changed. Over the past decade, the ranks of Internet developers expanded from web to include mobile. And the need for databases has now extended to variably structured data. Enter JSON. It provided that long-elusive answer to providing a simple operational database with an object-like representation of the world without the associated baggage (e.g., polymorphism, inheritance), using a language (JavaScript) and data structure that were already the lingua franca of web developers.

Like MySQL, Mongo was known for its simplicity. It had a simple data model, a query framework that was easy for developers to use, and well-developed indexing that made reads very fast. It’s been cited by DB-Engines as the fourth most popular database among practitioners.

And like MySQL, MongoDB was not known for its ability to scale (just ask Cassandra fans). For MySQL, the InnoDB engine provided a heart transplant that could turn MySQL into a serious database.

Fast forward, and some alumni from Sleepycat Software (which developed BerkeleyDB and was later bought by Oracle) founded WiredTiger, turning out an engine that could add similar scale to Mongo. WiredTiger offers a more write-friendly engine that aggressively takes advantage of configurable compression to scale and deliver high performance. And it provides a much more granular and configurable approach to locking that could alleviate many of the write bottlenecks that plagued Mongo.
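To make "configurable compression" concrete, here is a minimal sketch of how a per-collection compressor might be requested. The helper name `wiredtiger_collection_options` is hypothetical; the option layout (a `storageEngine` document carrying a WiredTiger `configString`) follows MongoDB's documented collection-creation options as we understand them, and the list of compressors is our assumption about what the 3.0 release accepts.

```python
# Hypothetical helper: builds the options document for creating a collection
# with a specific WiredTiger block compressor. Nothing here talks to a server.

# Assumption: these are the block compressors MongoDB 3.0 accepts.
VALID_COMPRESSORS = {"none", "snappy", "zlib"}


def wiredtiger_collection_options(block_compressor="snappy"):
    """Return collection-creation options selecting a WiredTiger compressor."""
    if block_compressor not in VALID_COMPRESSORS:
        raise ValueError(f"unsupported compressor: {block_compressor!r}")
    return {
        "storageEngine": {
            "wiredTiger": {
                "configString": f"block_compressor={block_compressor}"
            }
        }
    }


if __name__ == "__main__":
    # Trade more CPU for a smaller footprint on a hypothetical collection.
    print(wiredtiger_collection_options("zlib"))
```

With a driver such as PyMongo, these options would presumably be passed along the lines of `db.create_collection("events", **wiredtiger_collection_options("zlib"))` against a server running the WiredTiger engine; treat that call as an illustration rather than a tested recipe.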

History took interesting paths. Oracle bought Sleepycat and later inherited MySQL via the Sun acquisition. And last fall, MongoDB bought WiredTiger.

Which brings us to MongoDB 3.0, the release originally numbered 2.8. It’s no mystery (except for the dot-zero release number) that the WiredTiger engine would end up in Mongo, as their integration was destiny. Also not surprising is that the original MongoDB MMAP engine lives on. There is a huge installed base, and for existing read-heavy applications, it works perfectly well for a wide spectrum of use cases (e.g., recommendation engines). The new release makes the storage engine pluggable via a public API.

We’ve been down this road before; today MySQL has almost a dozen storage engines. Out of the gate, MongoDB will have the two supported by the company: classic MMAP or the industrial-strength WiredTiger engine. Then there’s also an “experimental” in-memory engine that’s part of this release. And off in the future, there’s no reason why HDFS, cloud-based object storage, or even SQL engines couldn’t follow.

The significance of the 3.0 release is that the MongoDB architecture becomes an extensible family. And in fact, this is quite consistent with trends that we at Ovum have been seeing with other data platforms, which are all overlapping and taking on multiple personas. That doesn’t mean that every database will become the same; rather, each will keep its area of strength while also being able to take on boundary cases. For instance:
• Hadoop platforms have been competing on adding interactive SQL;
• SQL databases have been adding the ability to query JSON data; and
• MongoDB is now adding the fast, scalable write capabilities associated with rival NoSQL engines like Cassandra or Couchbase, reducing the performance gap with key-value stores.

Database convergence or overlap doesn’t mean that you’ll suddenly use Hadoop to replace your data warehouse, or MongoDB instead of your OLTP SQL database. And if you really need fast write performance, key-value stores will probably remain your first choice. Instead, view these as extended capabilities that allow you to handle a greater variety of use cases, data types, and queries off the same platform with familiar development, query, and administration tools.

Back to MongoDB 3.0, there are a few other key enhancements with this release. Concurrency control (the source of those annoying write locks with the original MMAP engine) becomes more granular. Instead of having to lock the entire database for consistent writes, locks can now be confined to the collection level (a collection being the MongoDB equivalent of a table), reducing an annoying bottleneck. Meanwhile, WiredTiger adds document-level locking and more granular memory management to further improve write performance. Eventually, WiredTiger might even bring schema validation to Mongo.
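Why lock granularity matters can be shown with a toy model. The sketch below is not MongoDB's lock manager; every name in it (`Write`, `LockScope`, `writes_contend`) is hypothetical. It simply assumes that two writes contend when they would need the same lock, and compares three scopes: the whole database, a single collection, and a single document (the scope WiredTiger is generally described as offering).

```python
from enum import Enum
from typing import NamedTuple


class Write(NamedTuple):
    """A write aimed at one document (illustrative only)."""
    database: str
    collection: str
    doc_id: int


class LockScope(Enum):
    DATABASE = 1    # one lock per database
    COLLECTION = 2  # one lock per collection
    DOCUMENT = 3    # one lock per document


def lock_key(scope, write):
    """Return the identity of the lock this write would need at this scope."""
    if scope is LockScope.DATABASE:
        return (write.database,)
    if scope is LockScope.COLLECTION:
        return (write.database, write.collection)
    return (write.database, write.collection, write.doc_id)


def writes_contend(scope, first, second):
    """Two writes contend when they need the very same lock."""
    return lock_key(scope, first) == lock_key(scope, second)


if __name__ == "__main__":
    order = Write("shop", "orders", 1)
    cart = Write("shop", "carts", 7)
    other_order = Write("shop", "orders", 2)
    for scope in LockScope:
        print(scope.name,
              "orders vs carts:", writes_contend(scope, order, cart),
              "| two orders:", writes_contend(scope, order, other_order))
```

In this model, writes to different collections stop blocking each other once locks move from the database to the collection, and writes to different documents in the same collection only stop contending at document scope. Real engines layer intent locks and optimistic concurrency on top, so treat this as intuition rather than a description of the implementation.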

We don’t view this release as being about existing MongoDB customers migrating to the new engine; yes, the new engine will support the same tools, but it will require a one-time reload of the database. Instead, we view this as expanding MongoDB’s addressable market, with the obvious target being key-value stores like Cassandra, BerkeleyDB (now commercially available as Oracle NoSQL Database), or Amazon DynamoDB. It’s just what other data platforms are doing by adding their own share of capability overlaps.