Vertica Turns Analytic Databases on Their Side
Monday, 24 March 2008

One of the early debates in the BI community has been whether conventional relational databases can be efficiently repurposed from transaction systems to analytics. Vertica has recently gone public with the first new idea since the emergence of OLAP and relational/OLAP databases a dozen years ago: turn relational databases on their side, orienting them by columns rather than rows.

Vertica’s technology responds to a well-known difference between analytic and transactional systems. In transaction systems, you are interested in specific records or groups of records; with analytics, you are looking for trends. So Vertica assumes that you’re looking for columns, which are fields, rather than rows, which are individual records. The company contends that conventional SQL queries run against row-based tables waste time and processing power because they scan through numerous irrelevant columns.
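To make the row-versus-column argument concrete, here is a minimal Python sketch. It is a toy in-memory model, not Vertica's storage format; the table, the field names, and the helper functions are all made up for illustration. It totals a single field two ways and counts how many stored values each layout has to touch.

```python
# Illustrative only: toy in-memory layouts, not Vertica's on-disk format.
# The table contents and field names below are hypothetical.
rows = [
    {"order_id": 1, "customer": "Acme", "state": "NY", "amount": 120.0},
    {"order_id": 2, "customer": "Bolt", "state": "CA", "amount": 75.5},
    {"order_id": 3, "customer": "Core", "state": "NY", "amount": 300.0},
]


def to_columns(rows):
    """Pivot a list of row dicts into one list of values per field."""
    columns = {}
    for row in rows:
        for field, value in row.items():
            columns.setdefault(field, []).append(value)
    return columns


def total_from_rows(rows, field):
    """Row layout: each whole record is read to reach a single field."""
    values_read = 0
    total = 0.0
    for row in rows:
        values_read += len(row)  # every field in the record is touched
        total += row[field]
    return total, values_read


def total_from_columns(columns, field):
    """Column layout: only the requested field's values are read."""
    values = columns[field]
    return sum(values), len(values)


columns = to_columns(rows)
row_total, row_reads = total_from_rows(rows, "amount")
col_total, col_reads = total_from_columns(columns, "amount")
print(row_total, row_reads)
print(col_total, col_reads)
```

Both layouts return the same total, but the row layout touches every field of every record while the column layout touches only the values in the one field the query asks for. The gap widens as tables gain more columns that a given query never uses.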

Of course, OLAP databases emerged a dozen years ago to deal with the same problem. With OLAP, data was aggregated and transformed from rows into dimensions, or views of selected aggregations of rows and columns. The problem is that OLAP scalability – although it has increased over the years – could never achieve parity with standard relational databases. Another drawback was that OLAP forced you to prepackage views, and therefore tended to discourage ad hoc querying.

Vertica’s column-based approach essentially eliminates the scaling issue, not only because it preserves relational table structures and can take advantage of data striping techniques familiar to DBAs, but also because it can compress column values. For instance, if there is a true/false column in a table of 30 million rows, Vertica can collapse the column to two entries, one holding the count of rows that are true and the other the count of rows that are false. The same goes for other columns that draw on a small set of known values, like dates, US states, and so on.
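One common way to get the kind of collapse described here is run-length encoding over a sorted column. The Python sketch below illustrates that general technique under stated assumptions; it is not a description of Vertica's actual encoder. The 18,000/12,000 split is invented, and the column is scaled down from 30 million rows to 30,000 so the example runs quickly.

```python
from itertools import groupby


def rle_encode(column):
    """Run-length encode a column as a list of (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(column)]


def rle_count(encoded, target):
    """Count rows equal to target using only the encoded runs."""
    return sum(length for value, length in encoded if value == target)


# Hypothetical data: a true/false column, scaled down to 30,000 rows.
flags = [True] * 18_000 + [False] * 12_000

# Sorting places equal values next to each other, so each distinct
# value forms exactly one run (False sorts before True in Python).
encoded = rle_encode(sorted(flags))

print(encoded)
print(rle_count(encoded, True))
```

After sorting, the 30,000 stored values shrink to two (value, count) pairs, and a query such as "how many rows are true" can be answered from those pairs without expanding the column. If the column is left unsorted, the encoding still works but produces one pair per run of identical neighbors, so the savings depend on how the data is ordered.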

The result, the company claims, is a 50x to 200x improvement in query response times compared to standard relational databases. And it achieves this with a database architecture that can still be accessed by standard SQL queries.

The company was co-founded by database pioneer Michael Stonebraker, who has had a hand in almost every kind of modern database, from relational to object to streaming (complex event processing). Having helped found Ingres, he later developed its open source successor Postgres, and in the 1990s founded Illustra before it was acquired by Informix (for trivia fans, the name has since been appropriated by a software company that generates illustrations for architectural firms). He co-founded Vertica after becoming bored at StreamBase.

Although the company shipped its first product in late 2006, it did not publicize the technology until the 2.0 release in February. This week, it announced a partnership with LogiXML, maker of a web-based BI reporting tool, which is basing product development on the Vertica engine.




