Category Archives: IT Infrastructure

It’s happening: Hadoop and SQL worlds are converging

With Strata, IBM IOD, and Teradata Partners conferences all occurring this week, it’s not surprising that this is a big week for Hadoop-related announcements. The common thread of announcements is essentially, “We know that Hadoop is not known for performance, but we’re getting better at it, and we’re going to make it look more like SQL.” In essence, Hadoop and SQL worlds are converging, and you’re going to be able to perform interactive BI analytics on it.

The opportunity and challenge of Big Data from new platforms such as Hadoop is that it opens a new range of analytics. On one hand, Big Data analytics have updated and revived programmatic access to data, which happened to be the norm prior to the advent of SQL. There are plenty of scenarios where programmatic approaches are far more efficient, such as dealing with time series data or graph analysis to map many-to-many relationships. Programmatic access also leverages in-memory data grids such as Oracle Coherence, IBM WebSphere eXtreme Scale, GigaSpaces, and others, where programmatic development (usually in Java) has proved more efficient for accessing highly changeable data in web applications where traditional paths to the database would have been I/O-constrained. Conversely, Advanced SQL platforms such as Greenplum and Teradata Aster have provided support for MapReduce-like programming because, even with structured data, a Java programmatic framework is sometimes a more efficient way to rapidly slice through volumes of data.

Until now, Hadoop has not been for the SQL-minded. The initial path was to find someone to do data exploration inside Hadoop, but once you’re ready to do repeatable analysis, ETL (or ELT) it into a SQL data warehouse. That’s been the pattern with Oracle Big Data Appliance (use Oracle loader and data integration tools) and most Advanced SQL platforms; most data integration tools provide Hadoop connectors that spawn their own MapReduce programs to ferry data out of Hadoop. Some integration tool providers, like Informatica, offer tools to automate parsing of Hadoop data. Teradata Aster and Hortonworks have been talking up the potential of HCatalog, in actuality an enhanced version of Hive with RESTful interfaces, cost optimizers, and so on, to provide a more SQL-friendly view of data residing inside Hadoop.

But when you talk analytics, you can’t simply write off the legions of SQL developers that populate enterprise IT shops. And beneath the veneer of chaos, there is an implicit order to most so-called “unstructured” data that is within the reach of programmatic transformation approaches that in the long run could likely be automated or packaged inside a tool.

At Ovum, we have long believed that for Big Data to cross over to the mainstream enterprise, it must become a first-class citizen with IT and the data center. The early pattern of skunk works projects, led by elite, highly specialized teams of software engineers from Internet firms, was aimed at solving Internet-style problems (e.g., ad placement, search optimization, customer online experience, etc.) that are not the problems of mainstream enterprises. Nor is the model of recruiting high-priced talent to work exclusively on Hadoop sustainable for most organizations. It means that Big Data must be consumable by the mainstream of SQL developers.

Making Hadoop more SQL-like is hardly new
Hive and Pig became Apache Hadoop projects because of the need for SQL-like metadata management and data transformation languages, respectively; HBase emerged because of the need for a table store to provide a more interactive face – although, as a very sparse, rudimentary column store, it does not provide the efficiency of an optimized SQL database (or the extreme performance of some columnar variants). Sqoop in turn provides a way to pipeline SQL data into Hadoop, a use case that will grow more common as organizations look to Hadoop to provide scalable and cheaper storage than commercial SQL. While these Hadoop subprojects did not exactly make Hadoop look like SQL, they provided building blocks that many of this week’s announcements leverage.
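For readers coming from the SQL side, here is a minimal sketch of what that SQL-like face looks like in practice: a Java client querying Hive over JDBC. The HiveServer endpoint, credentials, and the weblogs table are hypothetical, and the driver class name assumes the HiveServer2-era JDBC driver.

```java
// Minimal sketch: querying Hive's SQL-like layer over JDBC.
// The endpoint URL, credentials, table, and columns are hypothetical.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQuerySketch {
    public static void main(String[] args) throws Exception {
        // Hive JDBC driver (assumes the hive-jdbc jar is on the classpath)
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                "jdbc:hive2://hadoop-edge:10000/default", "analyst", "");
             Statement stmt = conn.createStatement()) {
            // A familiar SQL-style aggregation over data sitting in HDFS
            ResultSet rs = stmt.executeQuery(
                "SELECT status, COUNT(*) AS hits FROM weblogs GROUP BY status");
            while (rs.next()) {
                System.out.println(rs.getString("status") + "\t" + rs.getLong("hits"));
            }
        }
    }
}
```

Behind that familiar-looking query, Hive still compiles the statement into MapReduce jobs – which is precisely why the performance-oriented announcements below matter.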

Progress marches on
One train of thought is that if Hadoop can look more like a SQL database, more operations could be performed inside Hadoop. That’s the theme behind Informatica’s long-awaited enhancement of its PowerCenter transformation tool to work natively inside Hadoop. Until now, PowerCenter could extract data from Hadoop, but the extracts would have to be moved to a staging server where the transformation would be performed for loading to the familiar SQL data warehouse target. The new offering, PowerCenter Big Data Edition, now supports an ELT pattern that uses the power of MapReduce processes inside Hadoop to perform transformations. The significance is that PowerCenter users now have a choice: load the transformed data to HBase, or continue loading to SQL.

There is growing support for packaging Hadoop inside a common hardware appliance with Advanced SQL. EMC Greenplum was the first out of the gate with DCA (Data Computing Appliance), which bundles its own distribution of Apache Hadoop (not to be confused with Greenplum MR, a software-only product that is accompanied by a MapR Hadoop distro). Teradata Aster has just joined the fray with Big Analytics Appliance, bundling the Hortonworks Data Platform Hadoop; this move was hardly surprising given their growing partnership around HCatalog, an enhancement of the SQL-like Hive metadata layer of Hadoop that adds features such as a cost optimizer and RESTful interfaces that make the metadata accessible without the need to learn MapReduce or Java. With HCatalog, data inside Hadoop looks like another Aster data table.

Not coincidentally, there is a growing array of analytic tools that are designed to execute natively inside Hadoop. For now they are from emerging players like Datameer (which provides a spreadsheet-like metaphor and just announced an app store-like marketplace for developers), Karmasphere (which provides an application development tool for Hadoop analytic apps), or a more recent entry, Platfora (which caches subsets of Hadoop data in memory with an optimized, high-performance fractal index).

Yet, even with Hadoop analytic tooling, there will still be a desire to disguise Hadoop as a SQL data store, and not just for data mapping purposes. Hadapt has been promoting a variant where it squeezes SQL tables inside HDFS file structures – not exactly a no-brainer as it must shoehorn tables into a file system with arbitrary data block sizes. Hadapt’s approach sounds like the converse of object-relational stores, but in this case, it is dealing with a physical rather than a logical impedance mismatch.

Hadapt promotes the ability to query Hadoop directly using SQL. Now, so does Cloudera. It has just announced Impala, a SQL-based alternative to MapReduce for querying the SQL-like Hive metadata store, supporting most but not all forms of SQL processing (based on SQL 92; Impala lacks triggers, which Cloudera deems low priority). Both Impala and MapReduce rely on parallel processing, but that’s where the similarity ends. MapReduce is a blunt instrument, requiring Java or other programming languages; it splits a job into multiple concurrent, pipelined tasks where each step reads data, processes it, writes it back to disk, and then passes it to the next task. Conversely, Impala takes a shared-nothing, MPP approach to processing SQL jobs against Hive; using HDFS, Cloudera claims roughly 4x performance against MapReduce; if the data is in HBase, Cloudera claims performance multiples up to a factor of 30. For now, Impala only supports row-based views, but with columnar (on Cloudera’s roadmap), performance could double. Cloudera plans to release a real-time query (RTQ) offering that, in effect, is a commercially supported version of Impala.
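To make the contrast concrete, here is a hedged sketch of the MapReduce side of that comparison – the kind of Java boilerplate that a single SQL statement in Impala (say, a GROUP BY over a Hive table) is meant to spare the analyst. The log format, field positions, and HDFS paths are invented for illustration.

```java
// Sketch of a MapReduce job in Java (hypothetical log format and paths).
// Each map and reduce stage reads data, processes it, and writes results back to disk;
// the Impala equivalent would be a single SQL statement against the Hive metastore.
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class StatusCount {
    public static class StatusMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        private static final LongWritable ONE = new LongWritable(1);
        @Override
        protected void map(LongWritable key, Text line, Context ctx)
                throws IOException, InterruptedException {
            String[] fields = line.toString().split("\t");   // assume tab-delimited logs
            if (fields.length > 2) {
                ctx.write(new Text(fields[2]), ONE);          // assume field 2 is the HTTP status
            }
        }
    }

    public static class SumReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text status, Iterable<LongWritable> counts, Context ctx)
                throws IOException, InterruptedException {
            long total = 0;
            for (LongWritable c : counts) total += c.get();
            ctx.write(status, new LongWritable(total));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "status count");
        job.setJarByClass(StatusCount.class);
        job.setMapperClass(StatusMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.addInputPath(job, new Path("/data/weblogs"));       // hypothetical input
        FileOutputFormat.setOutputPath(job, new Path("/data/status_out"));  // hypothetical output
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```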

By contrast, Teradata Aster and Hortonworks promote a SQL MapReduce approach that leverages HCatalog, an incubating Apache project that is a superset of Hive and one that Cloudera does not currently include in its roadmap. For now, Cloudera claims bragging rights for performance with Impala; over time, Teradata Aster will promote the manageability of its single appliance and, with the appliance, has the opportunity to counter with hardware optimization.

The road to SQL/programmatic convergence
Either way – and this is of interest only to purists – any SQL extension to Hadoop will be outside the Hadoop project. But again, that’s an argument for purists. What’s more important to enterprises is getting the right tool for the job – whether it is the flexibility of SQL or raw power of programmatic approaches.

SQL convergence is the next major battleground for Hadoop. Cloudera is for now shunning HCatalog, an approach backed by Hortonworks and partner Teradata Aster. The open question is whether Hortonworks can instigate a stampede of third parties to overcome Cloudera’s resistance. It appears that beyond Hive, the SQL face of Hadoop will become a vendor-differentiated layer.

Part of the convergence will involve a mix of cross-training and tooling automation. Savvy SQL developers will cross-train to pick up some of the Java or Java-like programmatic frameworks that will be emerging. Tooling will help lower the bar, reducing the degree of specialized skills necessary. And for programming frameworks, in the long run, MapReduce won’t be the only game in town. It will always be useful for large-scale jobs requiring brute-force, parallel, sequential processing. But the emerging YARN framework, which deconstructs MapReduce to generalize the resource management function, will provide the management umbrella for ensuring that different frameworks don’t crash into one another by trying to grab the same resources. But YARN is not yet ready for primetime – for now it only supports the batch job pattern of MapReduce. And that means that YARN is not yet ready for Impala, or vice versa.

Of course, mainstreaming Hadoop – and Big Data platforms in general – is more than just a matter of making it all look like SQL. Big Data platforms must be manageable and operable by the people who are already in IT; those people will need some new skills and will have to grow accustomed to some new practices (like exploratory analytics), but the new platforms must also look and act familiar enough. Not all announcements this week were about SQL; for instance, MapR is throwing down a gauntlet to the Apache usual suspects by extending its management umbrella beyond the proprietary NFS-compatible file system that is its core IP to the MapReduce framework and HBase, making a similar promise of high performance. On the horizon, EMC Isilon and NetApp are proposing alternatives promising a more efficient file system but at the “cost” of separating the storage from the analytic processing. And at some point, the Hadoop vendor community will have to come to grips with capacity utilization issues, because in the mainstream enterprise world, no CFO will approve the purchase of large clusters or grids that get only 10 – 15% utilization. Keep an eye on VMware’s Project Serengeti.

They must be good citizens in data centers that need to maximize resource utilization (e.g., via virtualization and optimized storage); they must comply with existing data stewardship policies and practices; and they must fully support existing enterprise data and platform security practices. These are all topics for another day.

What will Splunk be when it grows up?

Much of the hype around Big Data is that not only are people generating more data, but so are machines. Machine data has always been there – it was traditionally collected by dedicated systems such as network node managers, firewalls, SCADA systems, and so on. But that’s where the data stayed.

Machine data is obviously pretty low level stuff. Depending on the format of data spewed forth by devices, it may be highly cryptic or may actually contain text that is human intelligible. It was traditionally considered low-density data that was digested either by specific programs or applications or by specific people – typically systems operators or security specialists.

Splunk’s reason for existence is putting this data onto a common data platform, then indexing it to make it searchable as a function of time. The operable notion is that the data can then be shared or correlated across applications, such as weblogs. Its roots are in the underside of IT infrastructure management systems, where Splunk is often the embedded data engine. An increasingly popular use case is security, where Splunk can reach across network, server, storage, and web domains to provide a picture of exploits that could be end-to-end, at least within the data center.
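Conceptually, “indexing machine data as a function of time” boils down to bucketing events by timestamp so they can be retrieved by time range. The sketch below is purely illustrative – it is not Splunk’s implementation – and it assumes log lines that begin with an ISO-8601 timestamp and one-minute buckets.

```java
// Conceptual sketch of time-bucketed indexing of log events (not Splunk's implementation).
// Assumes each log line begins with an ISO-8601 timestamp, e.g. "2012-10-24T13:05:17Z ...".
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class TimeIndexSketch {
    // Events bucketed by the minute they occurred in; TreeMap keeps buckets time-ordered.
    private final TreeMap<Instant, List<String>> index = new TreeMap<>();

    public void ingest(String logLine) {
        Instant ts = Instant.parse(logLine.split(" ", 2)[0]);   // assumed timestamp prefix
        Instant bucket = ts.truncatedTo(ChronoUnit.MINUTES);    // assumed one-minute buckets
        index.computeIfAbsent(bucket, k -> new ArrayList<>()).add(logLine);
    }

    // Retrieve every event whose bucket falls within [from, to) – search as a function of time.
    public List<String> search(Instant from, Instant to) {
        List<String> hits = new ArrayList<>();
        index.subMap(from, to).values().forEach(hits::addAll);
        return hits;
    }
}
```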

There’s been a bit of hype around the company, which IPO’ed earlier this year and reported a strong Q2. Consumer technology still draws the headlines (just look at how much the release of the iPhone 5 drowned out almost all other tech news this week). But after Facebook’s market dive, maybe the turn of events on Wall Street could be characterized as revenge of the enterprise, given the market’s previous infatuation with the usual suspects in the consumer space – mobile devices, social networks, and gaming.

Splunk has a lot of headroom. With machine data proliferating and the company promoting its offering as an operational intelligence platform, Splunk is well positioned as a company that leverages Fast Data. While Splunk is not split-second or deterministic real-time, its ability to build searchable indexes on the fly positions it nicely for tracking volatile environments as they change, as opposed to waiting after the fact (although Splunk can also be used for retrospective historical analysis).

But Splunk faces real growing pains, both up the value chain, and across it.

While Splunk’s heritage is in IT infrastructure data, the company bills itself as being about the broader category of machine data analytics. And there’s certainly lots of it around, given the explosion of sensory devices that are sending log files from all over the place, inside the four walls of a data center or enterprise, and out. There’s the Internet of Things. IBM’s Smarter Planet campaign over the past few years has raised general awareness of how instrumented and increasingly intelligent Spaceship Earth is becoming. Maybe we’re jaded, but it’s become common knowledge that the world is full of sensory points, whether it is traffic sensors embedded in the pavement, weather stations, GPS units, smartphones, biomedical devices, industrial machinery, oil and gas recovery and refining, not to mention the electronic control modules sitting between driver and powertrain in your car.

And within the enterprise, there may be plenty of resistance to getting the bigger picture. For instance, while ITO owns infrastructure data, marketing probably owns the Omniture logs; yet absent the means to correlate the two, it may not be possible to get the answer on why the customer did or did not make the purchase online.

For a sub-$200-million firm, this is all a lot of ground to cover. Splunk knows the IT and security market but lacks the breadth of an IBM to address all of the other segments across national intelligence, public infrastructure, smart utility grids, or healthcare verticals, to name a few. And it has no visibility above IT operations or appdev organizations. Splunk needs to pick its targets.

Splunk is trying to address scale – that’s where the Big Data angle comes in. Splunk is adding features to increase its scale, with the new 5.0 release adding federated indexing to boost performance over larger bodies of data. But for real scale, that’s where integration with Hadoop comes in, acting as a near-line archive for Splunk data that might otherwise be purged. Splunk offers two forms of connectivity: HadoopConnect, which provides a way to stream and transform Splunk data to populate HDFS, and Shuttl, a slower archival feature that treats Hadoop as a tape library (the data is heavily compressed with GZip). It’s definitely a first step – HadoopConnect, as the name implies, establishes connectivity, but the integration is hardly seamless or intuitive yet. It uses Splunk’s familiar fill-in-the-blank interface (we’d love to see something more point-and-click), with the data in Hadoop retrievable, but without Splunk’s familiar indexes (unless you re-import the data back to Splunk). On the horizon, we’d love to see Splunk tackle the far more challenging problem of getting its indexes to work natively inside Hadoop, maybe with HBase.
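For a sense of what treating Hadoop as a near-line archive involves at the plumbing level, here is a hedged sketch that writes gzip-compressed events into HDFS using the standard Hadoop FileSystem API. The paths and event source are hypothetical, and this is not how HadoopConnect or Shuttl are actually implemented.

```java
// Hedged sketch: archiving exported events into HDFS as gzip-compressed files,
// in the spirit of treating Hadoop as a near-line archive. Paths and the event
// source are hypothetical; this is not Splunk's HadoopConnect or Shuttl code.
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.zip.GZIPOutputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsArchiveSketch {
    public static void archive(List<String> events, String day) throws Exception {
        Configuration conf = new Configuration();       // picks up core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);
        Path target = new Path("/archive/splunk/" + day + "/events.gz");  // hypothetical layout
        try (Writer out = new OutputStreamWriter(
                new GZIPOutputStream(fs.create(target)), StandardCharsets.UTF_8)) {
            for (String event : events) {
                out.write(event);
                out.write('\n');
            }
        }
    }
}
```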

Then there’s the eternal question of making machine data meaningful to the business. Splunk’s search-based interface today is intuitive to developers and systems admins, as it requires knowledge of the types of data elements that are being stored. But it won’t work for anybody that doesn’t work with the guts of applications or computing infrastructure. Yet conveying that message becomes more critical as Splunk is used to correlate log files with higher-level sources – for example, correlating abandoned shopping carts with underlying server data to see if the missed sale was attributable to system bugs or the buyer changing her mind.

The log files that record how different elements of IT infrastructure perform are, in aggregate, telling a story about how well your organization is serving the customer. Yet the perennial challenge of all systems-level management platforms has been conveying the business impact of the events that generated those log files. For those who don’t have to dye their hair gray, there are distant memories of providers like CA, IBM, and HP promoting how their panes of glass displaying data center performance could tell a business message. There’s been the challenge for ITIL adopters to codify the running of processes in the data center to support the business. The list of stillborn attempts to convey business meaning to the underlying operations is endless.

So maybe given the hype of the IPO, the relatively new management team that is in place, and the reality of Splunk’s heritage, it shouldn’t be surprising that we heard two different messages and tones.

From recently-appointed product SVP Guido Schroeder, we heard talk of creating a semantic metadata layer that would, in effect, create de facto business objects. That shouldn’t be surprising, as during his previous incarnation he oversaw the integration of Business Objects into the SAP business. For anyone who has tracked the BI business over the years, the key to success has been creation of a metadata layer that not only codified the entities, but made it possible to attain reuse in ad hoc query and standard reporting. Schroeder and the current management team are clearly looking to take Splunk above IT operations to CIO level.

But attend almost any session at the conference, and the enterprise message was largely missing. That shouldn’t be surprising, as the conference itself was aimed at the people who buy Splunk’s tools – and they tend to be down in the depths of operations.

There were a few exceptions. One of the sessions in the Big Data track, led by Stuart Hirst, CTO of the Australian big data consulting firm Converging Data, communicated the importance of preserving the meaning of data as it moves through the lifecycle. In this case, it was a counterintuitive pitch against the conventional wisdom of Big Data, which is to ingest the data now and explore and classify it later. As Splunk data is ingested, it is time-stamped to provide a chronological record. Although this may be low-level data, as you bring more of it together, there should be a record of lineage, not to mention sensitivity (e.g., are customer-facing systems involved?).

From that standpoint, the notion of adding a semantic metadata layer atop its indexing sounds quite intuitive – assign higher-level meanings to buckets of log data that carry some business process meaning. For that, Splunk would have to rely on external sources – the applications and databases that run atop the infrastructure whose log files it tracks. That’s a tall order, and one that will require partners, not to mention the question of how to define which entities should be defined. Unfortunately, the track record for cross-enterprise repositories is not great; maybe there could be some leveraging of MDM implementations around customer or product that could provide some beginning frame of reference.

But we’re getting way, way ahead of ourselves here. Splunk is the story of an engineering-oriented company that is seeking to climb higher up the value chain in the enterprise. Yet, as it seeks to engage higher-level people within the customer organization, Splunk can’t afford to lose track of the base that has been responsible for its success. Splunk’s best route upward is likely through partnering with enterprise players like SAP. That doesn’t answer the question of how to expand its footprint to follow the spread of what is called machine data, but then again, that’s a question for another day. First things first: Splunk needs to pick its target(s) carefully.

Google and Motorola: Quick Post Mortem

There’s been plenty of excellent commentary on Google’s $12.5 billion deal for Motorola Mobility Inc. (MMI) over the past few days, and we’re certainly not going to rehash covered ground.

Clearly this is a lot of money that was invested defensively. Money that could have gone into research or acquisitions that would have grown the business or opened new markets.

Whatever.

That thought hit us this morning after reading a NY Times piece on the bull market for patents. It reinforced our thoughts after word of the deal broke: that this was money spent to arm Google against patent predators in courts of law. In this case, it’s predators sensing blood, seeking to slow down or at least exact royalties from the Android platform juggernaut.

Of course much of the issue stems from the subjective nature of software patents; that’s a longstanding issue given the iterative nature of software development. It is simply difficult, if not impossible, to prove that a software innovation does not base itself in some way on prior invention. Furthermore, the fact that software relies on other software to operate makes the notion of software patents even more dubious.

This doesn’t mean that software developers should get away with plagiarism. Although discovery is still underway, the evidence continues to get more damning in the Oracle-Google case over Dalvik, the Android VM that on closer inspection looks like the JVM in sheep’s clothing. The irony is that when Google was still pulling its (J)VM clean-room act, the company at the other end of the line was Sun. To us, this is a reflection of Google’s Not-Invented-Here mentality. Would it have killed them to secure a JVM license at the time, when they could have gotten far more reasonable terms from Sun – rather than Oracle, the new sheriff in town?

Just askin’.

Leo Apotheker to target HP’s forgotten business

Ever since its humble beginnings in the Palo Alto garage, HP has always been kind of a geeky company – in spite of Carly Fiorina’s superficial attempts to prod HP towards a vision thing during her aborted tenure. Yet HP keeps talking about getting back to that spiritual garage.

Software has long been the forgotten business of HP. Although – surprisingly – the software business was resuscitated under Mark Hurd’s reign (revenues have more than doubled as of a few years ago), software remains almost a rounding error in HP’s overall revenue pie.

Yes, Hurd gave the software business modest support. Mercury Interactive was acquired under his watch, giving the business a degree of critical mass when combined with the legacy OpenView business. But during Hurd’s era, there were much bigger fish to fry beyond all the internal cost cutting for which Wall Street cheered, but insiders jeered. Converged Infrastructure has been the mantra, reminding us one and all that HP was still very much a hardware company. The message remains loud and clear with HP’s recent 3PAR acquisition at a heavily inflated $2.3 billion, which was concluded in spite of the interim leadership vacuum.

The dilemma that HP faces is that, yes, it is the world’s largest hardware company (they call it technology), but the bulk of that is from personal systems. Ink, anybody?

The converged infrastructure strategy was a play at the CTO’s office. Yet HP is a large enough company that it needs to compete in the leagues of IBM and Oracle, and for that it needs to get meetings with the CEO. Ergo the rumors of feelers made to IBM Software’s Steve Mills, the successful offer to Leo Apotheker, and the agreement for Ray Lane to serve as non-executive chairman.

Our initial reaction was one of disappointment; others have felt similarly. But Dennis Howlett feels that Apotheker is the right choice “to set a calm tone” – that there won’t be a massive, debilitating reorg in the short term.

Under Apotheker’s watch, SAP stagnated, hit by the stillborn Business ByDesign and the hike in maintenance fees that, for the moment, made Oracle look warmer and fuzzier. Of course, you can’t blame all of SAP’s issues on Apotheker; the company was in a natural lull cycle as it was seeking a new direction in a mature ERP market. The problem with SAP is that, defensive acquisition of Business Objects notwithstanding, the company has always been limited by a “not invented here” syndrome that has tended to blind the company to obvious opportunities – such as inexplicably letting strategic partner IDS Scheer slip away to Software AG. Apotheker’s shortcoming was not providing the strong leadership to jolt SAP out of its inertia.

Instead, Apotheker’s – and Ray Lane’s for that matter – value proposition is that they know the side of the enterprise market that HP doesn’t. That’s the key to this transition.

The next question becomes acquisitions. HP has a lot on its plate already. It took at least 18 months for HP to digest the $14 billion acquisition of EDS, which provided a critical mass in IT services and data center outsourcing. It is still digesting nearly $7 billion of subsequent acquisitions of 3Com, 3PAR, and Palm to make its converged infrastructure strategy real. HP might be able to get backing to make new acquisitions, but the dilemma is that Converged Infrastructure is a stretch in the opposite direction from enterprise software. So it’s not just a question of whether HP can digest another acquisition; it’s an issue of whether HP can strategically focus in two different directions that ultimately might come together, but not for a while.

So let’s speculate about software acquisitions.

SAP, the most logical candidate, is, in a narrow sense, relatively “affordable” given that its stock is roughly 10 – 15% off its 2007 high. But SAP would obviously be the most challenging given the scale; it would be difficult enough for HP to digest SAP under normal circumstances, but with all the converged infrastructure stuff on its plate, it’s back to the question of how you can be in two places at once. Infor is a smaller company, but as it is also a polyglot of many smaller enterprise software firms, it would present HP with additional integration headaches that it doesn’t need.

HP may have little choice but to make a play for SAP if IBM or Microsoft were unexpectedly to actively bid. Otherwise, its best bet is to revive the relationship which would give both companies the time to acclimate. But in a rapidly consolidating technology market, who has the luxury of time these days?

Salesforce.com would be a logical stab, as it would reinforce HP Enterprise Services’ (formerly EDS) outsourcing and BPO business. It would be far easier for HP to get its arms around this business. The drawback is that Salesforce.com would not be very extensible as an application because it uses a proprietary stored-procedures database architecture. That would make it difficult to integrate with a prospective ERP SaaS acquisition, which would otherwise be the next logical step in growing the enterprise software footprint.

Informatica is often brought up – if HP is to salvage its Neoview BI business, it would need a data integration engine to help bolster it. Better yet, buy Teradata, which is one of the biggest resellers of Informatica PowerCenter – that would give HP a far more credible presence in the analytics space. Then it will have to ward off Oracle, which has an even more pressing need for Informatica to fill out the data integration piece in its Fusion middleware stack. But with Teradata, there would at least be a real anchor for the Informatica business.

HP has to decide what kind of company it needs to be as Tom Kucharvy summarized well a few weeks back. Can HP afford to converge itself in another direction? Can it afford not to? Leo Apotheker has a heck of a listening tour ahead of him.

HP analyst meeting 2010: First Impressions

Over the past few years, HP under Mark Hurd has steadily gotten its act together in refocusing on the company’s core strengths with an unforgiving eye on the bottom line. Sitting at HP’s annual analyst meeting in Boston this week, we found ourselves comparing notes with our impressions from last year. Last year, our attention was focused on Cloud Assure; this year, it’s the integration of EDS into the core business.

HP now bills itself as the world’s largest purely IT company and ninth in the Fortune 500. Of course, there’s the consumer side of HP that the world knows. But with the addition of EDS, HP finally has a credible enterprise computing story (as opposed to being merely an enterprise server company). Now we’ll get plenty of flak from our friends at HP for that one – as HP has historically had the largest market share for SAP servers. But let’s face it: prior to EDS, the enterprise side of HP was primarily a distributed (read: Windows or UNIX) server business. Professional services was pretty shallow, with scant knowledge of the mainframes that remain the mainstay of corporate computing. Aside from communications and media, HP’s vertical industry practices were few and far between. HP still lacks the vertical breadth of IBM, but with EDS it has gained critical mass in sectors ranging from federal to manufacturing, transport, financial services, and retail, among others.

Having EDS also makes credible initiatives such as Application Transformation, a practice that helps enterprises prune, modernize, and rationalize their legacy application portfolios. Clearly, Application Transformation is not a purely EDS offering; it was originated by Ann Livermore’s Enterprise Business group and draws upon HP Software assets such as discovery and dependency mapping, Universal CMDB, PPM, and the recently introduced IT Financial Management (ITFM) service. But to deliver, you need bodies and people that know the mainframe – where most of the apps being harvested or thinned out are. And that’s where EDS helps HP flesh this out into a real service.

But EDS is so 2009; the big news on the horizon is 3Com, a company that Cisco left in the dust before it rethought its product line and eked out a highly noticeable 30% market share for network devices in China. Once the deal is closed, 3Com will be front and center in HP’s converged computing initiative, which until now primarily consisted of blades and ProCurve VoIP devices. It gains a much wider range of network devices to compete head-on as Cisco itself goes up the stack to a unified server business. Once the 3Com deal is closed, HP will have to invest significant time, energy, and resources to deliver on the converged computing vision with an integrated product line, rather than a bunch of offerings that fill the squares of a PowerPoint matrix chart.

According to Livermore, the company’s portfolio is “well balanced.” We’d beg to differ when it comes to software, which accounts for a paltry 3% of revenues (a figure that, our friends at HP reiterated, underestimates the real contribution of software to the business).

It’s the side of the business that suffered from (choose one) benign or malign neglect prior to the Mark Hurd era. HP originated network node management software for distributed networks, an offering that eventually morphed into the former OpenView product line. Yet HP was so oblivious to its own software products that at one point its server folks promoted bundling of a rival product from CA. Nonetheless, somehow the old HP managed not to kill off OpenView or OpenCall (the product now at the heart of HP’s communications and media solutions) – although we suspect that was probably more out of neglect than intent.

Under Hurd, software became strategic, a development that led to the transformational acquisition of Mercury, followed by Opsware. HP had the foresight to place the Mercury, Opsware, and OpenView products within the same business unit because – in our view – the application lifecycle should encompass managing the runtime (although to this day HP has not really integrated OpenView with Mercury Business Availability Center; the products still appeal to different IT audiences). But there are still holes – modest ones on the ALM side, but major ones elsewhere, like in business intelligence, where Neoview sits alone, or in the converged computing stack and cloud-in-a-box offerings, which could use strong identity management.

Yet if HP is to become a more well-rounded enterprise computing company, it needs more infrastructural software building blocks. To our mind, Informatica would make a great addition that would point more attention to Neoview as a credible BI business, not to mention that Informatica’s data transformation capabilities could play key roles with its Application Transformation service.

We’re concerned that, as integration of 3Com is going to consume considerable energy in the coming year, the software group may not have the resources to conduct the transformational acquisitions that are needed to more firmly entrench HP as an enterprise computing player. We hope that we’re proven wrong.

AmberPoint finally gets acquired

Thanks go out to Oracle this morning for finally putting us out of our suspense. AmberPoint was one of a dwindling group of still-standing independents delivering runtime governance for SOA environments.

It’s a smart move for Oracle as it patches some gaps in its Enterprise Manager offering, not only in SOA runtime governance, but also in business transaction management – and, potentially, better visibility into non-Oracle systems. Of course, that visibility will in part depend on the kindness of strangers, as AmberPoint partners like Microsoft and Software AG might not be feeling the same degree of love going forward.

We’re surprised that AmberPoint was able to stay independent for as long as it had, because the task that it performs is simply one piece of managing the run time. When you manage whether services are connecting, delivering the right service levels to the right consumers, ultimately you are looking at a larger problem because services do not exist on their own desert island. Neither should runtime SOA governance. As we’ve stated again and again, it makes little sense to isolate runtime governance from IT Service Management. The good news is that with the Oracle acquisition, there are potential opportunities, not only for converging runtime SOA governance with application management, but as Oracle digests the Sun acquisition, providing full visibility down to infrastructure level.

But let’s not get ahead of ourselves here as the emergence of a unified, Oracle on Sun turnkey stack won’t happen overnight. And the challenge of delivering an integrated solution will be as much cultural as technical, as the jurisdictional boundary between software development and IT operations blurs. But we digress.

Nonetheless, over the past couple of years, AmberPoint itself has begun reaching out from its island of SOA runtime, as it extended its visibility to business transaction management. AmberPoint is hardly alone here, as we’ve seen a number of upstarts like AppDynamics or BlueStripe (typically formed by veterans of Wily and HP/Mercury) burrowing down into the space of instrumenting transactions from hop to hop. Transaction monitoring and optimization will become the next battleground of application performance management, and it is one that IBM, BMC, CA, HP, and Compuware are hardly likely to passively watch from the sidelines.

As for whether runtime SOA governance demands a Switzerland-style independent vendor approach, that leaves it up to the last one standing, SOA Software, to fight the good fight. Until now, AmberPoint and SOA Software have competed for the affections of Microsoft; AmberPoint has offered an Express web services monitoring product that is a free plug-in for Visual Studio (a version is also available for Java); SOA Software offers extensive .NET versions of its service policy, portfolio, repository, and service manager offerings.

Nonetheless, although AmberPoint isn’t saying anything outright about the WebLogic share of its 300-customer installed base, that platform was first among equals when it came to R&D investment and presence. BEA previously OEM’ed the AmberPoint management platform, an arrangement that Oracle ironically discontinued; well, in this case, the story ends happily ever after. As for SOA Software, we would be surprised if this deal didn’t push it into a closer embrace with Microsoft.

Postscript: Thanks to Anne Thomas Manes for updating me on AmberPoint’s alliances. They are/were with SAP, Tibco, and HP, in addition to Microsoft. The Software AG relationship has faded in recent years.

Of course all this M&A rearranges the dance floor in interesting ways. Oracle currently OEMs HP’s Systinet as its SOA registry, an arrangement that might get awkward now that Oracle’s getting into the hardware business. That will place into question virtually all of AmberPoint’s relationships.

SpringSource: Back to our regularly scheduled program

With the ink not yet dry on VMware’s offer to buy SpringSource, it’s time for SpringSource to get back to its regularly scheduled program. That happened to be SpringSource’s unveiling of the Cloud Foundry developer preview: the announcement that SpringSource was going to get out before the program got interrupted by the wheels of finance.

Cloud Foundry, a recent SpringSource acquisition, brings SpringSource’s evolution from niche technology to lightweight stack provider full circle. Just as pre-Red Hat JBoss was considered a lightweight alternative to WebSphere and WebLogic, SpringSource is positioning itself as a kinder and gentler alternative to the growing JBoss-Red Hat stack. And that’s where the VMware connection comes into play, but more about that later.

The key of course is that SpringSource rides on the popularity of the Spring framework around which the company was founded. The company claims the Spring framework now shows up in roughly half of all Java installations. Its success is attributable to the way that Spring simplifies deployment to Java EE. But as popular as the Spring framework is, as an open source company, SpringSource monetizes only a fraction of all Spring framework deployments. So over the past few years it has been surrounding the framework with a stack of lightweight technologies that complement it, encompassing:
• the Tomcat servlet container (a lightweight Java server) and the newer DM server, which is based on OSGi technology;
• Hyperic as the management stack;
• Groovy and Grails, which provide dynamic scripting that is native to the JVM and an accompanying framework to make Groovy programming easy; and
• Cloud Foundry, which provides SpringSource the technology to mount its offerings in the cloud.

From a mercenary standpoint, putting all the pieces out in a cloud enables SpringSource to more thoroughly monetize the open source assets that otherwise generate revenue only through support subscriptions.

But in another sense, you could consider SpringSource’s Cloud Foundry the Java equivalent of what Microsoft plans to do with Azure. In both cases, the goal is Platform-as-a-Service offerings based on familiar technology (Java, .NET) that can run in and outside the cloud. Microsoft calls it Software + Services. What both also have in common is that they are still in preview and not likely to go GA until next year.

But beyond the fact that SpringSource’s offering is Java-based, the combination with VMware adds yet a more basic differentiator. While Microsoft Azure is an attempt to preserve the Windows and Microsoft Office franchise, when you add VMware to the mix, the goal on SpringSource’s side is to make the OS irrelevant.

There are other intriguing possibilities with the link to VMware, such as the possibility that some of the principles of the Spring framework (e.g., dependency injection, which abstracts dependencies so developers don’t have to worry about writing all the necessary configuration files) might be applied to managing virtualization, which, untamed, could become quite a beast to manage. And as we mentioned last week in the wake of the VMware announcement, SpringSource could do with some JVM virtualization so that each time you need to stretch the processing of Java objects, you don’t have to blindly sprawl out another VM container.
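For those who haven’t worked with Spring, here is a minimal sketch of the dependency injection principle being invoked: the consuming class declares what it needs, and the container supplies a configured implementation at wiring time. The class names below are invented for illustration.

```java
// Minimal sketch of dependency injection in the Spring style; the types here are
// invented for illustration. The consuming class never constructs or configures its
// own dependency – the container supplies an implementation when the context is wired.
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

interface QuoteRepository {
    String latestQuote(String symbol);
}

@Component
class CachedQuoteRepository implements QuoteRepository {
    public String latestQuote(String symbol) {
        return "42.00";                 // stand-in for a real data-grid or JDBC lookup
    }
}

@Component
class QuoteService {
    private final QuoteRepository repository;

    @Autowired                          // Spring injects whichever QuoteRepository is configured
    QuoteService(QuoteRepository repository) {
        this.repository = repository;
    }

    String describe(String symbol) {
        return symbol + " = " + repository.latestQuote(symbol);
    }
}
```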

Fleshing out the Cloud

VMware’s proposed $362 million acquisition of SpringSource is all about getting serious in competing with Salesforce.com and Google App Engine as the Platform-as-a-Service (PaaS) cloud with the technology that everybody already uses.

This acquisition was a means to an end, pairing two companies that could not be less alike. VMware is a household name, sells software through traditional commercial licenses, and markets to IT operations. SpringSource is a grassroots, open source, developer-oriented firm whose business is a cottage industry by comparison. The cloud brought together two companies that each faced complementary limitations on their growth. VMware needed to grow out beyond its hardware virtualization niche if it was to regain its groove, while SpringSource needed to grow up and find deeper pockets to become anything more than a popular niche player.

The fact is that providing a virtualization engine, even if you pad it with management utilities that act like an operating system, is still a raw cloud with little pull unless you go higher up in the stack. Raw clouds have their appeal only to vendors that resell capacity or to large enterprises with the deep benches of infrastructure expertise to run their own virtual environments. For the rest of us, we need a player that provides a deployment environment that handles the plumbing and is married to a development environment. That is what Salesforce’s Force.com and Google’s App Engine are all about. VMware’s gambit is in a way very similar to Microsoft’s Software + Services strategy: use the software and platforms that you are already used to, rather than some new environment, in a cloud setting. There’s nothing more familiar to large IT environments than VMware’s ESX virtualization engine, and in the Java community, there’s nothing more familiar than the Spring framework, which – according to the company – accounts for roughly half of all Java installations.

With roughly $60 million in stock options for SpringSource’s 150-person staff, VMware is intent on keeping the people, as it knows nothing about the Java business. Normally, we’d question a deal like this because the companies are so dissimilar. But the fact that they are complementary pieces of a PaaS offering gives the combination stickiness.

For instance, VMware’s vSphere cloud management environment (in a fit of bravado, VMware calls it a cloud OS) can understand resource consumption of VM containers; with SpringSource, it gets to peer inside the black box and understand why those containers are hogging resources. That provides more flexibility and smarts for optimizing virtualization strategies, and can help cloud customers answer the question: do we need to spin out more VMs, perform some load balancing, or re-apportion all those Spring TC (Tomcat) servlet containers?

The addition of SpringSource also complements VMware’s cloud portfolio in other ways. In his blog about the deal, SpringSource CEO Rod Johnson noted that the idea of pairing with VMware’s Lab Manager (that’s the test lab automation piece that VMware picked up through the Akimbi acquisition) proved highly popular with Spring framework customers. In actuality, if you extend Lab Manager from simply spinning out images of testbeds to spinning out runtime containers, you would have VMware’s answer to IBM’s recently introduced WebSphere Cloudburst appliance.

VMware isn’t finished, however. The most glaring omission is the need for distributed caching of Java objects to provide yet another path to scalability. If you only rely on spinning out more VMs, you get a highly rigid, one-dimensional cloud that will not provide the economies of scale and flexibility that clouds are supposed to provide. So we wouldn’t be surprised if GigaSpaces or Terracotta were next in VMware’s acquisition plans.

What do Smarter Planets and Oil Refineries have in common?

Last week we paid our third visit in as many years to IBM’s Impact SOA conference. Comparing notes, if 2007’s event was about engaging the business, and 2008 was about attaining the basic blocking and tackling to get transaction system-like performance and reliability, this year’s event was supposed to provide yet another forum for pushing IBM’s Smarter Planet corporate marketing. We’ll get back to that in a moment.

Of course, given that conventional wisdom or hype has called 2009 the year of the cloud (e.g., here and here), it shouldn’t be surprising that cloud-related announcements grabbed the limelight. To recap: IBM announced WebSphere Cloudburst, an appliance that automates rapid deployment of WebSphere images to the private cloud (whatever that is — we already provided our two cents on that) and it released IBM’s BlueWorks, a new public cloud service for white boarding business processes that is IBM’s answer to Lombardi Blueprints.

But back to our regularly scheduled program: IBM has been pushing Smarter Planet since the fall. It came in the wake of a period when the rapid run-up and volatility in natural resource prices and global instability prompted renewed discussions over sustainability at decibel levels not heard since the late 70s. A Sam Palmisano speech delivered before the Council on Foreign Relations last November laid out what have since become IBM’s standard talking points. The gist of IBM’s case is that the world is more instrumented and networked than ever, which in turn provides the nervous system so we can make the world a better, cleaner, and, for companies, a more profitable place. A sample talking point: 67% of electrical power generation is lost to network inefficiencies – this at a time of national debate over setting up smart grids.

IBM’s Smarter Planet campaign is hardly anything new. It builds on Metcalfe’s law, which posits that the value of a network grows with the square of the number of users that join it. Put another way, a handful of sensors provides only narrow slices of disjoint data; fill that network in with hundreds or thousands of sensors, add some complex event processing logic to it, and now you can not only deduce what’s happening, but do things like predict what will happen or provide economic incentives that change human behavior so that everything comes out copasetic. Smarter Planet provides a raison d’etre for IBM’s Business Events Processing initiatives, which it began struggling to get its arms around last fall. It also makes use of IBM’s capacity for extreme-scale computing, but also prods it to establish relationships with new sets of industrial process control and device suppliers that are quite different from the world of ISVs and systems integrators.

So, if you instrumented the grid, you could take advantage of transient resources such as winds that this hour might be gusting in the Dakotas and, in the next hour, in the Texas Panhandle, so that you could even out generation to the grid and supplant more expensive gas-fired generation in Chicago. Or, as described by a Singaporean infrastructure official at the IBM conference, you can apply sensors to support congestion pricing, which rations scarce highway capacity based on demand, with the net result that it ramps up prices to what the market will bear at rush hour and funnels those revenues to expanding the subway system (too bad New York dropped the ball when a similar opportunity presented itself last year). The same principle could make supply chains far more transparent and driven by demand, with real-time predictive analytics, if you somehow correlate all that RFID data. The list of potential opportunities, which optimize consumption of resources in a resource-constrained economy, is limited only by the imagination.
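As a purely conceptual illustration of the kind of event processing logic being described – not IBM’s technology – the sketch below keeps a sliding window over sensor readings and flags when a rolling average crosses a threshold; the sensor, window size, and limit are invented.

```java
// Conceptual sketch of simple event-processing logic over sensor readings: a sliding
// window that raises an alert when the rolling average crosses a threshold.
// The sensor, window size, and threshold are invented for illustration.
import java.util.ArrayDeque;
import java.util.Deque;

public class PressureWindowSketch {
    private static final int WINDOW = 60;          // last 60 readings (assumed: one per second)
    private static final double THRESHOLD = 8.5;   // hypothetical pressure limit in bar

    private final Deque<Double> window = new ArrayDeque<>();
    private double sum = 0.0;

    /** Feed one sensor reading; returns true if the rolling average breaches the threshold. */
    public boolean onReading(double pressure) {
        window.addLast(pressure);
        sum += pressure;
        if (window.size() > WINDOW) {
            sum -= window.removeFirst();           // slide the window forward
        }
        double rollingAverage = sum / window.size();
        return rollingAverage > THRESHOLD;         // event logic: deduce an emerging condition
    }
}
```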

In actuality, what IBM described is a throwback to common practices established in highly automated industrial process facilities, where closed-loop process control has been standard practice for decades. Take oil refineries for example. The facilities required to refine crude are extremely capital-intensive, the processes are extremely complex and intertwined, and the scales of production so huge that operators have little choice but to run their facilities flat out 24 x 7. With margins extremely thin, operators are under the gun to constantly monitor and tweak production in real time so it stays in the sweet spot where process efficiency, output, and costs are optimized. Such data is also used for predictive trending to prevent runaway reactions and avoid potential safety issues such as a dangerous build-up of pressure in a distillation column.

So at base, a Smarter Planet is hardly a radical idea; it seeks to emulate what has been standard practice in industrial process control going back at least 30 years.

What’s a Service? Who’s Responsible?

Abbott and Costello aside, one of the most charged, ambiguous, and overused terms in IT today is Service. At its most basic, a service is a function or operation that performs a task. For IT operations, a service is a function, performed using a computing facility, that accomplishes a task for the organization. For software architecture, a Service in the formal, capital-“S” form is a loosely coupled function or process that is designed to be abstracted from the software application, physical implementation, and the data source; as a more generic lower-case “s,” a service is a function that is performed by software. And if you look at the Wikipedia definition, a service can refer to processes that are performed down at the OS level.

Don’t worry, we’ll keep the discussion above OS level to stay relevant — and to stay awake.

So why are we getting hung up on this term? It’s because it was all over the rather varied day that we had today, having split our time at (1) HP’s annual IT analyst conference for its Technology Solutions Group (that’s the 1/3 of the company that’s not PCs or printers); (2) a meeting of the SOA Consortium; and (3) yet another meeting with MKS, an ALM vendor that just signed an interesting resale deal with BMC that starts with the integration of IT Service Desk with issue and defect management in the application lifecycle.

Services, in both their software and IT operations senses, were all over our agenda today; we just couldn’t duck them. But more than just a coincidence of terminology, there is actually an underlying interdependency between the design and deployment of a software service and the IT services that are required to run it.

It was core to the presentation that we delivered to the SOA Consortium today, as our belief is that you cannot manage a SOA or application lifecycle without adequate IT Service Management (ITSM), a discipline for running IT operations that is measured and tracked by the services it delivers. We drew a diagram that was deservedly torn apart by our colleagues on the call, Beth Gold-Bernstein and Todd Biske. UPDATE: Beth has a picture of the diagram in her blog. In our diagram, we showed how, at run time, there is an intersection between the SOA lifecycle and ITSM – or more specifically, ITIL version 3 (ITIL is the best-known framework for implementing ITSM). Both maintained that interaction is necessary throughout the lifecycle; for instance, when the software development team is planning a service, it needs to get ITO in the loop to brace for release of the service – especially if the service is likely to drastically ramp up demand on the infrastructure.

The result of our discussion was not simply that services are joined, figuratively, at the head and neck bone – the software and IT operations implementations – but that, at the end of the day, somebody has to be accountable for ensuring that services are being developed and deployed responsibly. In other words, just making the business case for a service is not adequate if you can’t ensure that the infrastructure will be able to handle it. Lacking the second piece of the equation, you’d wind up with a scenario where the surgery was successful but the patient died. With the functional silos that comprise most IT organizations today, that would mean responsibility dispersed between the software (or in some cases, enterprise) architect and their equivalent(s) in IT Operations. In other words, everybody’s responsible, and nobody’s responsible.

The idea came up that maybe what’s needed is a service ownership role that transcends the usual definition (today, the service owner is typically the business stakeholder that sponsored development, and/or the architect that owns the design or software implementation). That is, a sort of uber role that ensures that the service (1) responds to a bona fide business need, (2) is consistent with enterprise architectural standards and does not needlessly duplicate what is already in place, and (3) won’t break IT infrastructure or physical delivery (e.g., assuring that ITO is adequately prepared).

While the last thing that the IT organizations needs is yet another layer of management, it may need another layer of responsibility.

UPDATE: Todd Biske has provided some more detail on what the role of a Service Manager would entail.