The Second Wave of Analytics

Throughout the recession, business intelligence (BI) was one of the few growth markets in IT. Given that transactional systems that report “what” is happening are simply the price of entry for remaining in a market, BI and analytics systems answer the question of “why” something is happening, and ideally, provide intelligence that is actionable so you can know ‘how’ to respond. Not surprisingly, understanding the whys and hows are essential for maximizing the top line in growing markets, and pinpointing the path to survival in down markets. The latter reason is why BI has remained one of the few growth areas in the IT and business applications space through the recession.

Analytic databases are cool again. Teradata, the analytic database provider with a 30-year track record, had its strongest Q2 in what was otherwise a lousy 2010 for most IT vendors. Over the past year, IBM, SAP, and EMC took major acquisitions in this space, while some of the loudest decibels at this year’s Oracle OpenWorld were over the Exadata optimized database machine. There are a number of upstarts with significant venture funding, ranging from Vertica to Cloudera, Aster Data, ParAccel and others that are not only charting solid growth, but the range of varied approaches that reveal that the market is far from mature and that there remains plenty of demand for innovation.

We are seeing today a second wave of innovation in BI and analytics that matches the ferment and intensity of the 1995-96 era when data warehousing and analytic reporting went commercial. There isn’t any one thing that is driving BI innovation. At one end of the spectrum, you have Big Data, and at the other end, Fast Data — the actualization of real-time business intelligence. Advances in commodity hardware, memory density, parallel programming models, and emergence of NoSQL, open source statistical programming languages, cloud are bringing this all within reach. There is more and more data everywhere that’s begging to be sliced, diced and analyzed.

The amount of data being generated is mushrooming, but much of it will not necessarily be persisted to storage. For instance, if you’re a power company that wants to institute a smart grid, moving from monthly to daily meter reads multiplies your data volumes by a factor of 30, and if you decide to take readings every 15 minutes, better multiple all that again by a factor of 100. Much of this data will be consumed as events. Even if any of it is persisted, traditional relational models won’t handle the load. The issue is not only because of overhead of operating all the iron, but with it the concurrent need for additional equipment, space, HVAC, and power.

Unlike the past, when the biggest databases were maintained inside the walls of research institutions, public sector agencies, or within large telcos or banks, today many of the largest data stores on the Internet are getting opened through APIs, such as from Facebook. Big databases are no longer restricted to use by big companies.

Compare that to the 1995-96 time period when relational databases, which made enterprise data accessible, reached critical mass adoption; rich Windows clients, which put powerful apps on the desktop, became enterprise standard; while new approaches to optimizing data storage and productizing the kind of enterprise reporting pioneered by Information Builders, emerged. And with it all came the debates OLAP (or MOLAP) vs ROLAP, star vs. snowflake schema, and ad hoc vs. standard reporting. Ever since, BI has become ingrained with enterprise applications, as reflected by recent consolidations with the acquisitions of Cognos, Business Objects, and Hyperion by IBM, SAP, and Oracle. How much more establishment can you get?

What’s old is new again. When SQL relational databases emerged in the 1980s, conventional wisdom was that the need for indexes and related features would limit their ability to perform or scale to support enterprise transactional systems. Moore’s Law and emergence of client/server helped make mockery of that argument until the web, proliferation of XML data, smart sensory devices, and realization that unstructured data contained valuable morsels of market and process intelligence, in turn made mockery of the argument that relational was the enterprise database end-state.

In-memory databases are nothing new either, but the same hardware commoditization trends that helped mainstream SQL has also brought costs of these engines down to earth.

What’s interesting is that there is no single source or style of innovation. Just as 1995 proved a year of discovery and debate over new concepts, today you are seeing a proliferation of approaches ranging from different strategies for massively-parallel, shared-nothing architectures; columnar databases; massive networked and hierarchical file systems; and SQL vs. programmatic approaches. It is not simply SQL vs. a single post-SQL model, but variations that mix and match SQL-like programming with various approaches to parallelism, data compression, and use of memory. And don’t forget the application of analytic models to complex event processes for identifying key patterns in long-running events or coming through streaming data that is arriving in torrents too fast and large to ever consider putting into persistent storage.

This time, much of the innovation is coming from the open source world as evidenced by projects like the Java-based distributed computing platform Hadoop developed by Google; MapReduce parallel programming model developed by Google; the HIVE project that makes MapReduce look like SQL; the R statistical programming language. Google has added fuel to the fire by releasing to developers its BigQuery and Prediction API for analyzing large sets of data and applying predictive algorithms.

These are not simply technology innovations looking for problems, as use cases for Big Data or real-time analysis are mushrooming. Want to extend your analytics from structured data to blogs, emails, instant messaging, wikis, or sensory data? Want to convene the world’s largest focus group? There’s sentiment analysis to be conducted from Facebook; trending topics for Wikipedia; power distribution optimization for smart grids; or predictive analytics for use cases such as real-time inventory analysis for retail chains, or strategic workforce planning, and so on.

Adding icing to the cake was an excellent talk at a New York Technology Council meeting by Merv Adrian, a 30-year veteran of the data management field (who will soon be joining Gartner) who outlined the content of a comprehensive multi-client study on analytic databases that can be downloaded free from Bitpipe.

Adrian speaks of a generational disruption occurring to the database market that is attacking new forms of age old problems: how to deal with expanding datasets while maintaining decent performance. as mundane as that. But the explosion of data coupled with commoditization of hardware and increasing bandwidth have exacerbated matters to the point where we can no longer apply the brute force approach to tweaking relational architectures. “Most of what we’re doing is figuring out how to deal with the inadequacies of existing systems,” he said, adding that the market and state of knowledge has not yet matured to the point where we’re thinking about how the data management scheme should look logically.

So it’s not surprising that competition has opened wide for new approaches to solving the Big and Fast Data challenges; the market has not yet matured to the point where there are one or a handful of consensus approaches around which to build a critical mass practitioner base. But when Adrian describes the spate of vendor acquisitions over the past year, it’s just a hint of things to come.

Watch this space.

One thought on “The Second Wave of Analytics”

  1. Pingback: Quora

Comments are closed.