Category Archives: Virtualization

It’s happening: Hadoop and SQL worlds are converging

With Strata, IBM IOD, and Teradata Partners conferences all occurring this week, it’s not surprising that this is a big week for Hadoop-related announcements. The common thread of announcements is essentially, “We know that Hadoop is not known for performance, but we’re getting better at it, and we’re going to make it look more like SQL.” In essence, Hadoop and SQL worlds are converging, and you’re going to be able to perform interactive BI analytics on it.

The opportunity and challenge of Big Data from new platforms such as Hadoop is that it opens a new range of analytics. On one hand, Big Data analytics have updated and revived programmatic access to data, which happened to be the norm prior to the advent of SQL. There are plenty of scenarios where programmatic approaches are far more efficient, such as dealing with time series data or graph analysis to map many-to-many relationships. Programmatic access also leverages in-memory data grids such as Oracle Coherence, IBM WebSphere eXtreme Scale, GigaSpaces, and others, where programmatic development (usually in Java) proved more efficient for accessing highly changeable data for web applications in which traditional paths to the database would have been I/O-constrained. Conversely, Advanced SQL platforms such as Greenplum and Teradata Aster have provided support for MapReduce-like programming because, even with structured data, sometimes a Java programmatic framework is a more efficient way to rapidly slice through volumes of data.

Until now, Hadoop has not been for the SQL-minded. The initial path was to find someone to do data exploration inside Hadoop, and once you were ready to do repeatable analysis, ETL (or ELT) it into a SQL data warehouse. That's been the pattern with Oracle Big Data Appliance (use Oracle loader and data integration tools) and most Advanced SQL platforms; most data integration tools provide Hadoop connectors that spawn their own MapReduce programs to ferry data out of Hadoop. Some integration tool providers, like Informatica, offer tools to automate parsing of Hadoop data. Teradata Aster and Hortonworks have been talking up the potential of HCatalog, in actuality an enhanced version of Hive with RESTful interfaces, cost optimizers, and so on, to provide a more SQL-friendly view of data residing inside Hadoop.

But when you talk analytics, you can't simply write off the legions of SQL developers that populate enterprise IT shops. And beneath the veneer of chaos, there is an implicit order to most so-called "unstructured" data that is within the reach of programmatic transformation approaches that in the long run could likely be automated or packaged inside a tool.

At Ovum, we have long believed that for Big Data to cross over to the mainstream enterprise, it must become a first-class citizen within IT and the data center. The early pattern of skunk-works projects, led by elite, highly specialized teams of software engineers at Internet firms, was aimed at Internet-style problems (e.g., ad placement, search optimization, customer online experience) that are not the problems of mainstream enterprises. Nor is the model of recruiting high-priced talent to work exclusively on Hadoop sustainable for most organizations. It means that Big Data must be consumable by the mainstream of SQL developers.

Making Hadoop more SQL-like is hardly new
Hive and Pig became Apache Hadoop projects because of the need for SQL-like metadata management and data transformation languages, respectively; HBase emerged because of the need for a table store to provide a more interactive face – although, as a very sparse, rudimentary column store, it does not provide the efficiency of an optimized SQL database (or the extreme performance of some columnar variants). Sqoop in turn provides a way to pipeline SQL data into Hadoop, a use case that will grow more common as organizations look to Hadoop to provide scalable and cheaper storage than commercial SQL. While these Hadoop subprojects did not exactly make Hadoop look like SQL, they provided the building blocks that many of this week's announcements leverage.
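To make the Hive piece of that concrete, here is a minimal sketch (illustrative only) of how a SQL-minded developer reaches data that already lives in Hadoop. It assumes a classic HiveServer listening on localhost:10000, Hive's JDBC driver on the classpath, and a hypothetical weblogs table registered in the Hive metastore; the query itself is plain HiveQL, which Hive compiles into MapReduce jobs behind the scenes.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    // Illustrative sketch: a SQL-style query against data stored in Hadoop, via Hive.
    public class HiveQuerySketch {
        public static void main(String[] args) throws Exception {
            // Assumes the Hive JDBC driver is on the classpath and a HiveServer is running.
            Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
            Connection conn = DriverManager.getConnection("jdbc:hive://localhost:10000/default", "", "");
            Statement stmt = conn.createStatement();
            // Plain SQL-like HiveQL; Hive turns this into MapReduce jobs under the covers.
            ResultSet rs = stmt.executeQuery(
                    "SELECT referrer, COUNT(*) AS hits FROM weblogs GROUP BY referrer");
            while (rs.next()) {
                System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
            }
            conn.close();
        }
    }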

Progress marches on
One train of thought is that if Hadoop can look more like a SQL database, more operations could be performed inside Hadoop. That’s the theme behind Informatica’s long-awaited enhancement of its PowerCenter transformation tool to work natively inside Hadoop. Until now, PowerCenter could extract data from Hadoop, but the extracts would have to be moved to a staging server where the transformation would be performed for loading to the familiar SQL data warehouse target. The new offering, PowerCenter Big Data Edition, now supports an ELT pattern that uses the power of MapReduce processes inside Hadoop to perform transformations. The significance is that PowerCenter users now have a choice: load the transformed data to HBase, or continue loading to SQL.

There is growing support for packaging Hadoop inside a common hardware appliance with Advanced SQL. EMC Greenplum was the first out of the gate with DCA (Data Computing Appliance) that bundles its own distribution of Apache Hadoop (not to be confused with Greenplum MR, a software only product that is accompanied by a MapR Hadoop distro). Teradata Aster has just joined the fray with Big Analytics Appliance, bundling the Hortonworks Data Platform Hadoop; this move was hardly surprising given their growing partnership around HCatalog, an enhancement of the SQL-like Hive metadata layer of Hadoop that adds features such as a cost optimizer and RESTful interfaces that make the metadata accessible without the need to learn MapReduce or Java. With HCatalog, data inside Hadoop looks like another Aster data table.

Not coincidentally, there is a growing array of analytic tools that are designed to execute natively inside Hadoop. For now they are from emerging players like Datameer (providing a spreadsheet-like metaphor, and which just announced an app store-like marketplace for developers), Karmasphere (providing an application development tool for Hadoop analytic apps), or a more recent entry, Platfora (which caches subsets of Hadoop data in memory with an optimized, high-performance fractal index).

Yet, even with Hadoop analytic tooling, there will still be a desire to disguise Hadoop as a SQL data store, and not just for data mapping purposes. Hadapt has been promoting a variant where it squeezes SQL tables inside HDFS file structures – not exactly a no-brainer as it must shoehorn tables into a file system with arbitrary data block sizes. Hadapt’s approach sounds like the converse of object-relational stores, but in this case, it is dealing with a physical rather than a logical impedance mismatch.

Hadapt promotes the ability to query Hadoop directly using SQL. Now, so does Cloudera. It has just announced Impala, a SQL-based alternative to MapReduce for querying the SQL-like Hive metadata store, supporting most but not all forms of SQL processing (based on SQL-92; Impala lacks triggers, which Cloudera deems low priority). Both Impala and MapReduce rely on parallel processing, but that's where the similarity ends. MapReduce is a blunt instrument, requiring Java or other programming languages; it splits a job into multiple, concurrent, pipelined tasks where each step along the way reads data, processes it, writes it back to disk, and then passes it to the next task. Conversely, Impala takes a shared-nothing, MPP approach to processing SQL jobs against Hive; using HDFS, Cloudera claims roughly 4x performance against MapReduce; if the data is in HBase, Cloudera claims performance multiples up to a factor of 30. For now, Impala only supports row-based views, but with columnar storage (on Cloudera's roadmap), performance could double. Cloudera plans to release a real-time query (RTQ) offering that, in effect, is a commercially supported version of Impala.
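To illustrate the contrast, here is what that "blunt instrument" looks like in practice: the aggregation that Hive or Impala can express as a single SQL statement (SELECT referrer, COUNT(*) FROM weblogs GROUP BY referrer) has to be hand-coded in Java as a map step and a reduce step. The table layout (tab-delimited lines with the referrer in the third column) is a made-up assumption for the sketch.

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    // The same GROUP BY / COUNT(*) aggregation, spelled out as a MapReduce job:
    // the map step emits (referrer, 1) pairs, the reduce step sums them.
    public class ReferrerCount {

        public static class ReferrerMapper
                extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);

            @Override
            protected void map(LongWritable offset, Text line, Context context)
                    throws IOException, InterruptedException {
                String[] fields = line.toString().split("\t");
                if (fields.length > 2) {
                    context.write(new Text(fields[2]), ONE); // emit (referrer, 1)
                }
            }
        }

        public static class SumReducer
                extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text referrer, Iterable<IntWritable> counts, Context context)
                    throws IOException, InterruptedException {
                int total = 0;
                for (IntWritable count : counts) {
                    total += count.get();
                }
                context.write(referrer, new IntWritable(total)); // one row per referrer
            }
        }
    }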

By contrast, Teradata Aster and Hortonworks promote a SQL MapReduce approach that leverages HCatalog, an incubating Apache project that is a superset of Hive and that Cloudera does not currently include in its roadmap. For now, Cloudera claims bragging rights for performance with Impala; over time, Teradata Aster will promote the manageability of its single appliance and, with that appliance, has the opportunity to counter with hardware optimization.

The road to SQL/programmatic convergence
Either way – and this is of interest only to purists – any SQL extension to Hadoop will be outside the Hadoop project. But again, that’s an argument for purists. What’s more important to enterprises is getting the right tool for the job – whether it is the flexibility of SQL or raw power of programmatic approaches.

SQL convergence is the next major battleground for Hadoop. Cloudera is for now shunning HCatalog, an approach backed by Hortonworks and partner Teradata Aster. The open question is whether Hortonworks can instigate a stampede of third parties to overcome Cloudera’s resistance. It appears that beyond Hive, the SQL face of Hadoop will become a vendor-differentiated layer.

Part of the convergence will involve a mix of cross-training and tooling automation. Savvy SQL developers will cross-train to pick up some of the Java or Java-like programmatic frameworks that will be emerging. Tooling will help lower the bar, reducing the degree of specialized skills necessary. And for programming frameworks, in the long run, MapReduce won't be the only game in town. It will always be useful for large-scale jobs requiring brute-force, parallel, sequential processing. But the emerging YARN framework, which deconstructs MapReduce to generalize the resource management function, will provide the management umbrella for ensuring that different frameworks don't crash into one another by trying to grab the same resources. But YARN is not yet ready for primetime – for now it only supports the batch job pattern of MapReduce. And that means that YARN is not yet ready for Impala, or vice versa.

Of course, mainstreaming Hadoop – and Big Data platforms in general – is more than just a matter of making it all look like SQL. Big Data platforms must be manageable and operable by the people who are already in IT; they will need to pick up some new skills and grow accustomed to some new practices (like exploratory analytics), but the new platforms must also look and act familiar enough. Not all announcements this week were about SQL; for instance, MapR is throwing down a gauntlet to the Apache usual suspects by extending its management umbrella beyond the proprietary NFS-compatible file system that is its core IP to the MapReduce framework and HBase, making a similar promise of high performance. On the horizon, EMC Isilon and NetApp are proposing alternatives promising a more efficient file system but at the "cost" of separating the storage from the analytic processing. And at some point, the Hadoop vendor community will have to come to grips with capacity utilization issues, because in the mainstream enterprise world, no CFO will approve the purchase of large clusters or grids that get only 10 – 15% utilization. Keep an eye on VMware's Project Serengeti.

They must be good citizens in data centers that need to maximize resources (e.g., through virtualization and optimized storage); they must comply with existing data stewardship policies and practices; and they must fully support existing enterprise data and platform security practices. These are all topics for another day.

SpringSource’s surround ’em on every cloud strategy

We're trying to stifle the puns with SpringSource's announcement that it has now also become the preferred Java development platform for Google App Engine. Like SpringSource on a roll… OK, that's out of our system.

But coming atop the recent announcements of VMforce, along with key acquisitions of Gemstone, and to a lesser extent, RabbitMQ, we’d have to agree with VMware’s CTO Steve Herrod that VMware’s acquisition of SpringSource has not slowed the company down. Congrats to SpringSource’s Rod Johnson for keeping the momentum going under VMware’s watch, and hats off to VMware for making it all happen.

Short but sweet (we're behind with report deadlines for our day job), SpringSource's cloud strategy is to become as ubiquitous as possible: grab every potential Java PaaS platform in sight, and do end-arounds of IBM and Oracle, who have barely placed their feet inside the door for Java development platforms as a service. Their move reminds us of Duane Reade, the well-known Manhattan pharmacy chain whose long-time strategy was to saturate every street corner location to crowd rivals like CVS and Walgreens out of the market; as a desperation maneuver, Walgreens finally bit the bullet and snapped up Duane Reade, but in deference to its New York brand recognition, kept the name.

But given that Google App Engine is not exactly a mainstream enterprise platform (Google still struggles to understand the enterprise), for SpringSource the announcement carries more light than heat. The move nonetheless brings a halo effect to Google App Engine, which becomes more extensible with the Spring framework and with cool extras like Spring Roo, which eliminates a lot of coding legwork and is a good match for Google's Web Toolkit, which provides a warmer, fuzzier, but more importantly simpler way to piece together web apps. More importantly, it means you can now write and run something meaningful on Google App Engine without having to rely on Python. It provides clever potential upside for Google's newly announced App Engine for Business.

SpringSource's strategy is an end-around, not only of IBM and Oracle, but also of VMware itself. The latest announcement vindicates in part VMware's strategy for SpringSource, which we believe has been about building the de facto standard Java cloud platform. While we give hats off to VMware for accelerating SpringSource's expansion of its middleware stack and cloud strategy, VMware has been slower to leverage SpringSource internally, whether it be with:
1. Promotion of vCloud. That remains more a future bet for leapfrogging VMware past the increasingly commoditized hypervisor business, leveraging its market-leading virtualization management technologies to establish them as de facto standards for managing virtualization in the cloud.
2. Cross-fertilizing SpringSource’s dependency injection capabilities into virtualization, with the idea of simplifying virtualization in the same way that the original Spring framework simplified Java deployment.

SpringSource buying Gemstone: VMware’s written all over it

There they go again. Barely a month after announcing the acquisition of message broker Rabbit Technologies, SpringSource is adding yet one more piece to its middleware stack: it has announced the acquisition of Gemstone for its distributed data caching technology.

SpringSource’s Rod Johnson told us that he was planning to acquire such a technology even before VMware came into the picture, but make no mistake about it, VMware’s presence upped the ante.

SpringSource has been looking to fill out its stack vs. Oracle and IBM ever since its cornerstone acquisition of Covalent (which brought the expertise behind Apache Tomcat and bequeathed tc Server to the world) two years ago. Adding Gemstone's Gemfire becomes SpringSource's response to Oracle Coherence and IBM WebSphere XD. The technologies in question allow you to replicate data from varied sources into a single logical cache, which is critical if those sources are highly dispersed.

So what about VMware? Wasn't SpringSource planning to grow its stack anyway? There are deeper stakes at play: VMware's aspiration to make cloud and virtualization virtually synonymous – or at least to make virtualization essential to the cloud – falls apart if you don't have a scalable, high-performance way to manage and access data. Enterprises using the cloud are not likely to move all their data there, and they need a solution that allows hybrid strategies – invariably a mix of cloud-based and on-premises data resources – to be managed and accessed efficiently. Distributed data caching is essential.
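To show what we mean by a single logical cache in front of dispersed sources, here is a bare-bones, plain-Java sketch of the cache-aside idea that a distributed data grid generalizes. This is not the Gemfire API: a local map stands in for the replicated region a grid would spread across nodes, and the loader stands in for the on-premises system of record that a cloud-hosted application cannot afford to hit on every read.

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;

    // Concept sketch only (not the Gemfire or Coherence API): a local map plays the
    // role of the distributed "region"; the loader plays the role of the backing store.
    public class CacheAsideSketch<K, V> {

        /** Hypothetical callback to the backing store (database, web service, etc.). */
        public interface Loader<K, V> {
            V load(K key);
        }

        private final ConcurrentMap<K, V> region = new ConcurrentHashMap<K, V>();
        private final Loader<K, V> systemOfRecord;

        public CacheAsideSketch(Loader<K, V> systemOfRecord) {
            this.systemOfRecord = systemOfRecord;
        }

        public V get(K key) {
            V value = region.get(key);
            if (value == null) {
                value = systemOfRecord.load(key);  // cache miss: go back to the source once
                region.putIfAbsent(key, value);    // keep the copy close to the application
            }
            return value;
        }

        public void put(K key, V value) {
            region.put(key, value); // a real grid would replicate this write to its peers
        }
    }

A grid such as Gemfire adds the hard parts this sketch leaves out: replication, partitioning, and keeping that region consistent across many JVMs and sites.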

So the next question is: why didn't SpringSource, as a historically open source company that has always made open source acquisitions, buy open source Terracotta instead? Chances are, were SpringSource still independent, it probably would have, but VMware brings deeper pockets and deeper aspirations. Gemstone is the company that sold object-oriented databases back in the 90s, and once it grew obvious that they (and other OODBMS rivals like ObjectStore) weren't going to become the next Oracles, they adapted their expertise to caching. Gemfire emerged in 2002 and provided Wall Street and defense agencies an off-the-shelf alternative to homegrown development or a best-of-breed strategy. By comparison, although Terracotta boasts several Wall Street clients, its core base is in web caching for high-traffic B2C-oriented websites.

Bottom line: VMware needs the scale.

There are other interesting pieces that Gemstone brings to the party. It is currently developing SQLFabric, a project that embeds the Apache Derby open source relational database into Gemfire to make its distributed data grid fully SQL-compliant, which would be very strategic to VMware and SpringSource. It also has a shot-in-the-dark project, MagLev, which is more a curiosity for the mother ship. Conceivably it could provide the impetus for SpringSource to extend to the Ruby environment, but would require a lot more development work to productize.

Obviously as the deal won’t close immediately, both entities must be coy about their plans other than the obvious commitment to integrate products.

But there’s another angle that will be worth exploring once the ink dries: SpringSource has been known for simplicity. The Spring framework provided a way to abstract all the complexity out of Java EE, while tc Server, based on Tomcat, carries but a subset of the bells and whistles of full Java EE stacks. But Gemfire is hardly simple, and the market for distributed data grids has been limited to organizations with extreme processing needs who have extreme expertise and extreme budgets. Yet the move to cloud will mean, as noted above, that the need for logical data grids will trickle down to more of the enterprise mainstream, although the scope of the problem won’t be as extreme. It would make sense for the Spring framework to extend its dependency injection to a “lite” version of Gemfire (Gemcloud?) to simplify the hassle of managing data inside and outside of the cloud.

VMforce: Marriage of Necessity

Go to any vendor conference and it gets hard to avoid what has become "The Obligatory Cloud Presentation" or "Slide." It's beyond the scope of this discussion to debate hype vs. reality, but potential benefits like the elasticity of the cloud have made the idea too difficult to dismiss, even if most large enterprises remain wary of trusting the brunt of their mission-critical systems to some external hoster, SAS 70 certification or otherwise.

So it's not surprising that cloud has become a strategic objective for VMware and SpringSource, both before and after the acquisition that put the two together. VMware was busy forming its vCloud strategy to stay a step ahead of rivals that seek to make VMware's core virtualization hypervisor business a commodity, while SpringSource acquired Cloud Foundry to take its expanding Java stack to the cloud as such options were becoming available for .NET and emerging web languages and frameworks like Ruby on Rails.

Following last summer's VMware SpringSource acquisition, the obvious path would have placed SpringSource as the application development stack that would elevate vCloud from raw infrastructure as a service to a full development platform. That remains the goal, but it's hardly the shortest path to VMware's goals. At this point, VMware is still getting its arms around the assets that are now under its umbrella with SpringSource. As we speculated last summer, some features of the Spring framework itself, such as dependency injection (which abstracts dependencies so developers don't have to worry about writing all the necessary configuration files), might be applied to managing virtualization. But that's for another time, another day. VMware's more pressing need is to make vSphere the de facto standard for managing virtualization and vCloud the de facto standard for cloud virtualization (actually, if you think about it, it is virtualization squared: OS instances virtualized from hardware, and hardware virtualized from infrastructure).
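For readers who have not lived inside the Spring framework, here is a minimal, self-contained sketch of the dependency injection being referred to (the class names are hypothetical). The OrderService never constructs or looks up its repository; the container wires the two together, so swapping in a different implementation is a configuration change rather than a code change. That is the property we speculated could carry over to wiring virtualized resources.

    import org.springframework.context.annotation.AnnotationConfigApplicationContext;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;

    // Minimal dependency-injection sketch: the service declares what it needs,
    // and the Spring container supplies it at startup.
    public class DependencyInjectionSketch {

        public interface OrderRepository {
            String find(long id);
        }

        public static class InMemoryOrderRepository implements OrderRepository {
            public String find(long id) { return "order-" + id; }
        }

        public static class OrderService {
            private final OrderRepository repository;
            public OrderService(OrderRepository repository) { this.repository = repository; }
            public String describe(long id) { return "Found " + repository.find(id); }
        }

        @Configuration
        static class AppConfig {
            @Bean
            public OrderRepository orderRepository() { return new InMemoryOrderRepository(); }

            @Bean
            public OrderService orderService() { return new OrderService(orderRepository()); }
        }

        public static void main(String[] args) {
            AnnotationConfigApplicationContext ctx =
                    new AnnotationConfigApplicationContext(AppConfig.class);
            System.out.println(ctx.getBean(OrderService.class).describe(42L));
            ctx.close();
        }
    }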

In turn, Salesforce wants to become the de facto cloud alternative to Google, Microsoft, IBM, and, when they get serious, Oracle and SAP. The dilemma is that Salesforce up until now has built its own walled garden. That was fine when the platform was confined to CRM and third-party AppExchange providers who piggybacked on Salesforce's own multi-tenanted infrastructure using its proprietary Force.com environment with its "Java-like" Apex stored procedures language. But at the end of the day, Apex is not going to evolve into anything more than a niche Salesforce.com development platform, and Force.com is not about to challenge Microsoft .NET, or Java for that matter.

The challenge is that Salesforce, having made the modern incarnation of remote hosted computing palatable to the enterprise mainstream, now finds itself in a larger fishbowl outgunned in sheer scale by Amazon and Google, and outside the enterprise Java mainstream. Benioff conceded as much at the VMforce launch yesterday, characterizing Java as “the No. 1 developer language in the enterprise.”

So VMforce is the marriage of two suitors that each needed its own leapfrog: VMware into a ready-made cloud with existing brand recognition, and Salesforce into the wider enterprise Java mainstream.

Apps written using the Spring Java stack will gain access to Force.com services such as search, identity and security, workflow, reporting and analytics, the web services integration API, and mobile deployment. But it also means dilution of some features that make the Force.com platform what it is; the biggest departure is from the Apex stored-procedures architecture that runs directly inside the Salesforce.com relational database. Salesforce trades the scalability of a unitary architecture for scalability through a virtualized one.

It means that Salesforce morphs into a different creature, and now must decide whom it means to compete with, because it's not just Oracle anymore. Our bet is that it splits the difference with Amazon, as other SaaS providers like IBM that don't want to get weighed down by sunk costs have already done. If Salesforce wants to become the enterprise Java Platform-as-a-Service (PaaS) leader, it will have to ramp up capacity, and matching Amazon or Google in a capital investment race is a hopeless proposition.

Do we really need OSGi?

With the coming of Spring (framework, season, take your choice), but more to the point, the concurrent announcements of OSGi Enterprise Edition 4.2 and the Eclipse Gemini and Virgo projects, debate over OSGi has renewed. OSGi has seen great success where it is not seen – as the framework for dispensing Eclipse plug-ins, and as the invisible engine around which most of the household-name Java EE servers are now factored.

We've always been pretty bullish on what OSGi could do. It allows your server footprint to be truly dynamic – you can deploy and kill runtime components at will without taking the whole mess offline. There's a potential sustainability appeal to any technology that helps reduce footprint – as fewer apps mean fewer servers, less power, and less space.

Interestingly, OSGi could provide a lot of the elasticity at the appserver level that virtualization promises for OS images and that cloud promises for application deployment. And there's the rub – OSGi is hardly the only path to keeping your webfarm footprint contained. Significantly, while the goal is the same for each strategy – you only want as much resource as you need – they each take a different way of getting there. OSGi is a developer decision that addresses which application or middleware modules (or functionality) you actually want running at any time, while virtualization and cloud are largely IT operations decisions pinpointing images and the choice of what and how much infrastructure to provision.

Although the goal is common, the choice of strategy differs based on where elasticity is needed; furthermore, these are not all-or-nothing decisions. Conceivably, if you have a highly variable application that requires not only different amounts of processing capacity but different functionality at different times, then OSGi could complement your virtualization and/or cloud strategies. Let's say you process market feeds, the composition and mix of which change by time of day and by which trading centers are active around the globe. Or your organization is number-crunching end-of-period reports. Those are a couple of possibilities.
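To make the market-feed scenario concrete, here is a minimal sketch of the OSGi mechanics involved; the FeedProcessor service and the bundle are hypothetical. The framework can start or stop this bundle at runtime, publishing or withdrawing the service without restarting the server; a real bundle would also declare Bundle-SymbolicName, Bundle-Activator, and Import-Package headers in its MANIFEST.MF.

    import org.osgi.framework.BundleActivator;
    import org.osgi.framework.BundleContext;
    import org.osgi.framework.ServiceRegistration;

    // Sketch of the OSGi lifecycle: the bundle registers a service when it starts
    // and withdraws it when it stops; the framework can do either at runtime.
    public class FeedProcessorActivator implements BundleActivator {

        private ServiceRegistration registration;

        public void start(BundleContext context) {
            // Publish the service into the registry; other bundles can bind to it now.
            registration = context.registerService(
                    FeedProcessor.class.getName(), new MarketFeedProcessor(), null);
        }

        public void stop(BundleContext context) {
            // Withdraw the service; the rest of the server keeps running.
            registration.unregister();
        }

        /** Hypothetical service interface for the market-feed example above. */
        public interface FeedProcessor {
            void process(String feed);
        }

        static class MarketFeedProcessor implements FeedProcessor {
            public void process(String feed) {
                System.out.println("processing " + feed);
            }
        }
    }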

The problem is in knowledge and awareness. For most IT customers, OSGi is a black box. It's the way that WebSphere and WebLogic are architected. But that makes a difference to the vendors, not the customers, because customers don't know how to provision OSGi bundles and there are no best practices for bundling bundles into bigger pieces that can be identified as tangible modules. There is still a lot of OSGi misinformation and still a lot of debate out there. Of course, while virtualization and cloud are much better known, there's plenty of hype and debate about cloud, and concerns about unchecked use of virtualization.

So a couple of years after OSGi gained critical mass vendor acceptance, there remains a lack of tooling for configuring OSGi servers, not to mention best practices for deploying them. SpringSource, one of the first to develop an OSGi server, has now donated the technology to Eclipse as the reference implementation in Gemini, with Virgo becoming the technology development project. SpringSource's commercial direction is tc Server, which commercializes the tiny Tomcat servlet container; as of March 8, VMware is pushing tc Server through its channels and, for the next couple of months, is giving away two production CPU licenses and 60 days of evaluation support to VMware customers.

SpringSource’s fork in the road symbolizes the existential dilemma facing OSGi: if your goal is to simply reduce your web container footprint, 10-MByte Tomcat containers should do just fine. It scales out quite nicely as well, with LinkedIn serving 40 million web pages daily on Tomcat. So again, we ask, why do we need OSGi?

Give us your answers.

SpringSource: Back to our regularly scheduled program

With the ink not yet dry on VMware's offer to buy SpringSource, it's time for SpringSource to get back to its regularly scheduled program. That happened to be SpringSource's unveiling of the Cloud Foundry developer preview: this was the announcement that SpringSource was going to get out before the program got interrupted by the wheels of finance.

Cloud Foundry, a recent SpringSource acquisition, brings SpringSource's evolution from niche technology to lightweight stack provider full circle. Just as pre-Red Hat JBoss was considered a lightweight alternative to WebSphere and WebLogic, SpringSource is positioning itself as a kinder and gentler alternative to the growing JBoss-Red Hat stack. And that's where the VMware connection comes into play, but more about that later.

The key of course is that SpringSource rides on the popularity of the Spring framework around which the company was founded. The company claims the Spring framework now shows up in roughly half of all Java installations. Its success is attributable to the way that Spring simplifies deployment to Java EE. But as popular as the Spring framework is, as an open source company SpringSource monetizes only a fraction of all Spring framework deployments. So over the past few years it has been surrounding the framework with a stack of lightweight technologies that complement it, encompassing the:
• Tomcat servlet container (a lightweight Java server) and the newer dm Server that is based on OSGi technology;
• Hyperic as the management stack;
• Groovy and Grails, which provide dynamic scripting that is native to the JVM and an accompanying framework to make Groovy programming easy; and
• Cloud Foundry, which provided SpringSource the technology to mount its offerings in the cloud.

From a mercenary standpoint, putting all the pieces out in a cloud enables SpringSource to more thoroughly monetize the open source assets that otherwise gain revenue only through support subscriptions.

But in another sense, you could consider the SpringSource’s Cloud Foundry as the Java equivalent of what Microsoft plans to do with Azure. In both cases, the goal is Platform-as-a-Service offerings based on familiar technology (Java, .NET) that can run in and outside the cloud. Microsoft calls it Software + Services. What both also have in common is that they are still in preview and not likely to go GA until next year.

But beyond the fact that SpringSource’s offering is Java-based, the combination with VMware adds yet a more basic differentiator. While Microsoft Azure is an attempt to preserve the Windows and Microsoft Office franchise, when you add VMware to the mix, the goal on SpringSource’s side is to make the OS irrelevant.

There are other intriguing possibilities with the link to VMware, such as the possibility that some of the principles of the Spring framework (e.g., dependency injection, which abstracts dependencies so developers don't have to worry about writing all the necessary configuration files) might be applied to managing virtualization, which, untamed, could become quite a beast to manage. And as we mentioned last week in the wake of the VMware announcement, SpringSource could do with some JVM virtualization so that each time you need to stretch the processing of Java objects, you don't have to blindly sprawl out another VM container.

Fleshing out the Cloud

VMware’s proposed $362 million acquisition of SpringSource is all about getting serious in competing with Salesforce.com and Google App Engine as the Platform-as-a-Service (PaaS) cloud with the technology that everybody already uses.

This acquisition was a means to an end, pairing two companies that could not be less alike. VMware is a household name, sells software through traditional commercial licenses, and markets to IT operations. SpringSource is a grassroots, open source, developer-oriented firm whose business is a cottage industry by comparison. The cloud brought together two companies that each faced complementary limitations on their growth. VMware needed to grow out beyond its hardware virtualization niche if it was to regain its groove, while SpringSource needed to grow up and find deeper pockets to become anything more than a popular niche player.

The fact is that providing a virtualization engine, even if you pad it with management utilities that act like an operating system, is still a raw cloud with little pull unless you go higher up in the stack. Raw clouds appeal only to vendors that resell capacity or to large enterprises with deep benches of infrastructure expertise to run their own virtual environments. The rest of us need a player that provides a deployment environment, handles the plumbing, and is married to a development environment. That is what Salesforce's Force.com and Google's App Engine are all about. VMware's gambit is in a way very similar to Microsoft's Software + Services strategy: use the software and platforms that you are already used to, rather than some new environment, in a cloud setting. There's nothing more familiar to large IT environments than VMware's ESX virtualization engine, and in the Java community, there's nothing more familiar than the Spring framework, which – according to the company – accounts for roughly half of all Java installations.

With roughly $60 million in stock options for SpringSource's 150-person staff, VMware is intent on keeping the people, as it knows nothing about the Java business. Normally, we'd question a deal like this because the companies are so dissimilar. But the fact that they are complementary pieces to a PaaS offering gives the combination stickiness.

For instance, VMware's vSphere cloud management environment (in a fit of bravado, VMware calls it a cloud OS) can understand resource consumption of VM containers; with SpringSource, it gets to peer inside the black box and understand why those containers are hogging resources. That provides more flexibility and smarts for optimizing virtualization strategies, and can help cloud customers answer the question: do we need to spin out more VMs, perform some load balancing, or re-apportion all those Spring tc (Tomcat) servlet containers?

The addition of SpringSource also complements VMware's cloud portfolio in other ways. In his blog about the deal, SpringSource CEO Rod Johnson noted that the idea of pairing up with VMware's Lab Manager (that's the test lab automation piece that VMware picked up through the Akimbi acquisition) proved highly popular with Spring framework customers. In actuality, if you extend Lab Manager from simply spinning out images of testbeds to spinning out runtime containers, you would have VMware's answer to IBM's recently introduced WebSphere CloudBurst appliance.

VMware isn't finished, however. The most glaring omission is the need for distributed Java object caching to provide yet another alternative for scalability. If you only rely on spinning out more VMs, you get a highly rigid, one-dimensional cloud that will not provide the economies of scale and flexibility that clouds are supposed to provide. So we wouldn't be surprised if GigaSpaces or Terracotta were next in VMware's acquisition plans.

Private Cloudburst

To this day we've had a hard time getting our arms around just what exactly a private cloud is. More to the point, where does it depart from server consolidation? The common thread is that both cases involve some form of consolidation. But if you look at the definition of cloud, the implication is that what differentiates private cloud from server consolidation is that you're talking about a much greater degree of virtualization. Folks such as Forrester's John Rymer fail to see any difference at all.

The topic is relevant because it is IBM Impact conference time, and with that come product announcements. In this case, it's the new WebSphere CloudBurst appliance. It manages, stores, and deploys IBM WebSphere Server images to the cloud, providing a way to ramp up virtualized business services with the kind of dynamic response that cloud is supposed to enable. And since it is targeted at managing your resources inside the firewall, IBM is positioning this offering as an enabler for business services in the private cloud.

Before we start looking even more clueless than we already are, let's set a few things straight. There's no reason that you can't have virtualization when you consolidate servers; in the long run it makes the most of your limited physical and carbon footprints. Instead, when we talk private clouds, we're taking virtualization up a few levels: not just the physical instance of a database or application, or its VM container, but now the actual services it delivers. Or as Joe McKendrick points out, it's all about service orientation.

In actuality, that's the mode you operate in when you take advantage of Amazon's cloud. In its first generation, Amazon published APIs to its back end, but that approach hit a wall given that preserving state over so many concurrent active and dormant connections could never scale. They may be RESTful services, but they are still services that abstract the data services Amazon provides if you decide to dip into its pool.

But we’ve been pretty skeptical up to now about private cloud – we’ve wondered what really sets it apart from a well-managed server consolidation strategy. And there’s not exactly been a lot of product out there that lets you manage an internal server farm beyond the kind of virtualization that you get with a garden variety hypervisor.

So we agree with Joe that it's all about services. Services venture beyond hypervisor images to abstract the purpose and task that a service performs from how or where it is physically implemented. Consequently, if you take the notion to its logical extent, a private cloud is not simply a virtualized bank of server clusters, but a virtualized collection of services that are made available wherever there is space and, if managed properly, as close to the point of consumption as demand and available resources (and the cost of those resources) permit.

In all likelihood, early implementations of IBM's CloudBurst and anything of the like that comes along will initially be targeted at an identifiable server farm or cluster. In that sense, it is only a service abstraction away from what is really just another case of old-fashioned server consolidation (paired with IBM's established z/VM, you could really turn out some throughput if you already have the big iron there). But taken to its more logical extent, a private cloud that deploys service environments wherever there is demand and capacity, freed from the four walls of a single facility, will become the fruition of the idea.

Of course, there’s no free lunch. Private clouds are supposed to eliminate the uncertainty of running highly sensitive workloads outside the firewall. Being inside the firewall will not necessarily make the private cloud more secure than a public one, and by the way, it will not replace the need to implement proper governance and management now that you have more moving parts. That’s hopefully one lesson that SOA – dead or alive – should have taught us by now.

The Network is the Computer

It's funny how history sometimes takes strange turns. Back in the 1980s, Sun began building its empire in the workgroup by combining two standards: UNIX boxes with TCP/IP networks built in. Sun's "The Network is the Computer" message declared that computing was of little value without the network. Of course, Sun hardly had a lock on the idea: Bob Metcalfe devised the law stating that the value of a network is proportional to the square of the number of connected nodes, and Digital (DEC) (remember them?) actually scaled out the idea at the division level while Sun was elbowing its way into the workgroup.

Funny that DEC was there first, but they only got the equation half right – bundling a proprietary OS to a standard networking protocol. Fast forward a decade and Digital was history, and Sun was the dot in dot-com. Then go a few more years later: as Linux made even a "standard" OS like UNIX look proprietary, Sun suffers DEC's fate (OK, they haven't been acquired yet and still have cash reserves, if only they could figure out what to do when they finally grow up), and bandwidth and blades get commodity enough that businesses start thinking that the cloud might be a cheaper, more flexible alternative to the data center. Throw in a very wicked recession and companies are starting to think that the numbers around the cloud – cheap bandwidth, commodity OS, commodity blades – might provide the avoided-cost dollars they've all been looking for. That is, if they can be assured that placing data out in the cloud won't trigger any regulatory or privacy headaches.

So today it becomes official. After dropping hints for months, Cisco has announced that its Unified Computing System is to provide, in essence, a prepackaged data center:

Blades + Storage Networking + Enterprise Networking in a box.

By now you've probably read the headlines – that UCS is supposed to bring what observers like Dana Gardner term an iPhone-like unity to the piece parts that pass for data centers. It would combine blades, network devices, storage management, and VMware's virtualization platform (as you might recall, Cisco owns a $150 million chunk of VMware) to provide, in essence, a data center appliance in the cloud.

In a way, UCS is a closing of the circle that began with mainframe host/terminal architectures of a half century ago: a single monolithic architecture with no external moving parts.

Of course, just as Sun wasn't the first to exploit TCP/IP networking but got the lion's share of the credit, Cisco is hardly the first to bridge the gap between compute and network node. Sun already has a Virtual Network Machines Project for processing network traffic on general-purpose servers, while its Project Crossbow is supposed to make networks virtual as well, as part of its OpenSolaris project. Sounds like a nice open source research project to us that's limited to the context of the Solaris OS. Meanwhile, HP has ramped up its ProCurve business, which aims at the heart of Cisco territory. Ironically, the dancer left on the sidelines is IBM, which sold off its global networking business to AT&T over a decade ago, and its ROLM network switches nearly a decade before that.

It's also not Cisco's first foray out of the base of the network OSI stack. Anybody remember Application-Oriented Networking? Cisco's logic, building a level of content-based routing into its devices, was supposed to make the network "understand" application traffic. Yes, it secured SAP's endorsement for the rollout, but who were you really going to sell this to in the enterprise? Application engineers didn't care for the idea of ceding some of their domain to their network counterparts. On the other hand, Cisco's successful foray into storage networking proves that the company is not a one-trick pony.

What makes UCS different this go-round are several factors. The commoditization of hardware and firmware and the emergence of virtualization and the cloud make the division of networking, storage, and data center OS artificial. Recession makes enterprises hungry for found money, and the maturation of the cloud incents cloud providers to buy pre-packaged modules to cut acquisition costs and improve operating margins. Cisco's lineup of partners is also impressive – VMware, Microsoft, Red Hat, Accenture, BMC, etc. – but names and testimonials alone won't make UCS fly. The fact is that IT has no more hunger for data center complexity, the divisions between OS, storage, and networking no longer add value, and cloud providers need a rapid way of prefabricating their deliverables.

Nonetheless, we've heard lots of promises of all-in-one before. The good news is that this time around there's lots of commodity technology and standards available. But if Cisco is to offer a real alternative to IBM, HP, or Dell, it's got to make the data center-in-a-box or cloud-in-a-box a reality.