
Spark 2.0: Walking the tightrope

Here’s the second post summarizing our takeaways from the recent Spark Summit East.

In April or May, we’ll see Spark 2.0. The release is aimed at filling gaps, enhancing performance, and refactoring to nip API sprawl in the bud.

Rewinding the tape: in 2015 the Spark project added new entry points beyond Resilient Distributed Datasets (RDDs). We saw DataFrames, a schema-based data API that borrowed constructs familiar to Python and R developers. Besides opening Spark to SQL developers (who could write analytics against database-like tabular representations) and BI tools, the DataFrame API also leveled the playing field between Scala (the native language of Spark) and R, Python, and Java via a common API. But DataFrames give up the compile-time type safety and fine-grained control of RDDs, so Datasets were recently introduced to provide the best of both worlds: typed Spark data objects that can also be surfaced through a schema.

The Spark 2.0 release will consolidate the DataFrame and Dataset APIs into one; DataFrame becomes, in effect, a Dataset of rows (Dataset[Row]). Together, they will be positioned as Spark’s default interchange format and its richer API, carrying more semantics than the low-level RDD.
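For readers who haven’t touched these APIs, here is a minimal sketch, in Spark’s Java API, of the two entry points described above, with a comment on where Datasets land in 2.0. The file names and column names are hypothetical, and the DataFrame class shown is the Spark 1.x type that 2.0 folds into Dataset&lt;Row&gt;.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

public class EntryPointsSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("entry-points").setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);
        SQLContext sql = new SQLContext(sc);

        // Low-level RDD: opaque objects and functional transformations, with no schema
        // for the engine to optimize against
        JavaRDD<String> lines = sc.textFile("events.log");          // hypothetical input file
        long errors = lines.filter(l -> l.contains("ERROR")).count();
        System.out.println("error lines: " + errors);

        // DataFrame: schema-based and SQL-friendly; in Spark 2.0 this type becomes,
        // in effect, an alias for Dataset<Row>
        DataFrame events = sql.read().json("events.json");          // hypothetical input file
        events.groupBy("level").count().show();

        sc.stop();
    }
}
```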

If you want ease of use, go with Datasets; if you need fine-grained control to chase feeds and speeds, that’s where RDDs still fit in. And that’s where the next enhancement comes in. Spark 2.0 adds the first tweaks to the recently released Tungsten project (adding whole-stage code generation), which aims to sidestep the 20-year-old JVM object and garbage collection model with a more efficient mechanism for managing CPU and memory. That’s a key strategy for juicing Spark performance, and maybe one that will make Dataset performance good enough. The backdrop is that with in-memory processing and faster networks (10 GbE is becoming commonplace), the CPU has become the bottleneck. By eliminating the overhead of JVM garbage collection, Tungsten hopes to even the score with storage and network performance.
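As a hedged illustration of where those knobs surface to an application, the snippet below sets the off-heap memory and whole-stage code generation properties documented for Spark 1.6/2.0; the memory size and the toy query are arbitrary examples, not recommendations.

```java
import org.apache.spark.sql.SparkSession;

public class TungstenConfigSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("tungsten-sketch")
                .master("local[*]")
                // Manage a slice of memory off the JVM heap, outside the garbage collector's reach
                .config("spark.memory.offHeap.enabled", "true")
                .config("spark.memory.offHeap.size", "2g")          // example size; tune to the workload
                // Whole-stage code generation compiles a query stage into a single function
                .config("spark.sql.codegen.wholeStage", "true")      // on by default in 2.0
                .getOrCreate();

        // A trivial query just to exercise the code-generated path
        spark.range(1_000_000).selectExpr("sum(id)").show();

        spark.stop();
    }
}
```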

The final highlight of Spark 2.0 is Structured Streaming, which extends Spark SQL and DataFrames (which in turn are becoming part of Dataset) with a streaming API. That will allow streaming and interactive steps, which formerly had to be orchestrated as separate programs, to run as one. And it makes streaming analytics richer: instead of running basic filtering or count actions, you will be able to run more complex queries and transforms. The initial release in 2.0 will support ETL-style workloads, with richer querying to follow in later releases.
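As a rough sketch (not anything demonstrated at the Summit) of what that looks like, here is the stock example of a continuously updated count of lines arriving on a socket, written against the DataFrame-based streaming API in 2.0; the host and port are placeholders.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.StreamingQuery;

public class StructuredStreamingSketch {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder()
                .appName("structured-streaming-sketch")
                .master("local[*]")
                .getOrCreate();

        // An unbounded DataFrame: each line arriving on the socket becomes a new row
        Dataset<Row> lines = spark.readStream()
                .format("socket")
                .option("host", "localhost")   // placeholder source
                .option("port", "9999")
                .load();

        // The same DataFrame operations used for batch queries now run continuously
        Dataset<Row> counts = lines.groupBy("value").count();

        // The streaming "action": keep the running aggregate up to date and print it
        StreamingQuery query = counts.writeStream()
                .outputMode("complete")
                .format("console")
                .start();

        query.awaitTermination();
    }
}
```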

Beyond the 2.0 generation, Spark Streaming will finally get – catch this – streaming. Spark Streaming has been a misnomer, as it is really Spark microbatching. By contrast, rival open source streaming engines such as Storm and Flink give you the choice of true streaming (processing exactly one event at a time) or microbatch. In the future, Spark Streaming will give you that choice as well. Sometimes you want pure streaming, where you need to resolve down to a single event; other use cases are better suited to microbatch, where you can run more complex processing such as aggregations and joins. One other thing: Spark Streaming has never been known for low latency; at best it resolves batches of events in seconds rather than subseconds. When paired with Tungsten memory management, that should hopefully change.

Spark 2.0 walks a tightrope: adding functionality and consolidating APIs while trying not to break them. For now it leaves open the question of all the housekeeping that will be necessary when running Spark standalone. In the cloud, the cloud service provider should offer perimeter security, but for now more fine-grained access control will have to be implemented in the application or storage layers. There are some pieces – such as managing the lifecycle of Spark compute artifacts like RDDs or DataFrames – that may be the domain of third-party value-added tools. And if, as seems likely, Spark establishes itself as the successor to MapReduce for the bulk of complex Big Data analytics workloads, the challenge will be drawing the line between which innovations belong on the Apache side (to prevent fragmentation) and which sit better with third parties. We began that discussion in our last post. Later this year, we expect this discussion to hit the forefront.

Hadoop and Spark: A Tale of Two Cities

If it seems like we’ve been down this path before, well, maybe we have. June has been a month of juxtapositions, back and forth to the west coast for Hadoop and Spark Summits. The mood from last week to this has been quite contrasting. Spark Summit has the kind of canned heat that Hadoop conferences had a couple years back. We won’t stretch the Dickens metaphor.

Yeah, it’s human nature to say, down with the old and in with the new.

But let’s set something straight: Spark ain’t going to replace Hadoop, because we’re talking apples and oranges. Spark can run on Hadoop, and it can run on other data platforms. What it might replace is MapReduce, if Spark can overcome its scaling hurdles. And it could fulfill IBM’s vision of the next analytic operating system if it addresses mundane – but very important – concerns around scaling, high concurrency, and bulletproof security. Spark originated at UC Berkeley’s AMPLab back in 2009, with the founders going on to form Databricks. With roughly 700 contributors, Spark has ballooned into the most active open source project in the Apache community, barely two years after becoming an Apache project.

Spark is best known as a sort of in-memory replacement for MapReduce-style compute frameworks; both employ massively parallel compute and then shuffle interim results, with the difference that Spark caches those results in memory while MapReduce writes them to disk. But that’s just the tip of the iceberg. Spark offers a simpler programming model, better fault tolerance, and it’s far more extensible than MapReduce. Spark suits almost any form of iterative computation, and it was designed to support specific extensions; among the most popular are machine learning, microbatch stream processing, graph computing, and even SQL.
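To see why caching matters for this kind of iterative work, here is a minimal, hypothetical sketch in Spark’s Java API: a toy gradient-descent loop that estimates the mean of a file of numbers. The data is parsed once and pinned in memory, and every pass reuses the cached partitions, where a chain of MapReduce jobs would write and re-read interim results on disk between passes.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class IterativeCachingSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("iterative-sketch").setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Parse once and pin the result in memory; later passes reuse the cached partitions
        JavaRDD<Double> values = sc.textFile("measurements.txt")    // hypothetical input: one number per line
                .map(Double::parseDouble)
                .cache();
        long n = values.count();                                    // first action also materializes the cache

        double theta = 0.0;
        for (int i = 0; i < 10; i++) {
            final double current = theta;
            // One distributed pass per iteration, served from memory rather than disk
            double gradient = values.map(v -> current - v).reduce(Double::sum) / n;
            theta -= 0.5 * gradient;                                // toy gradient step toward the mean
        }
        System.out.println("estimated mean = " + theta);

        sc.stop();
    }
}
```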

By contrast, Hadoop is a data platform. It is one of many that can run Spark, because Spark is platform-independent. So you could also run Spark on Cassandra, other NoSQL data stores, or SQL databases, but Hadoop is the most popular target right now.

And let’s not forget Apache Mesos, another AMPLab project – this one for cluster management – with which Spark was originally closely associated.

There’s little question about the excitement level over Spark. By now the headlines have poured out over IBM investing $300 million, committing 3,500 developers, establishing a Spark open source development center a few BART stops from AMPLab in San Francisco, and aiming, directly and through partners, to educate 1 million professionals on Spark over the next few years (roughly 4 – 5x the current number registered for IBM’s online Big Data University). IBM views Spark’s strength as machine learning, and wants to make machine learning a declarative programming experience that will follow in SQL’s footsteps with its new SystemML language (which it plans to open source).

That’s not to overshadow Databricks’ announcement that its Spark developer cloud, in preview over the past year, has now gone GA. The big challenge facing Databricks was making its cloud scalable and sufficiently elastic to meet demand – and not become a victim of its own success. And there is the growing number of vendors embedding Spark within their analytic tools, streaming products, and development tools. The release announcement of Spark 1.4 brings new manageability, including the capability to automatically renew Kerberos tokens for long-running processes like streaming. But growing pains remain, like reducing the number of moving parts needed to make Spark a first-class citizen with Hadoop YARN.

By contrast, last week was about Hadoop becoming more manageable and more amenable to enterprise infrastructure, like shared storage as our colleague Merv Adrian pointed out. Not to mention enduring adolescent factional turf wars.

It’s easy to get excited by the idealism around the shiny new thing. While the sky seems the limit, the reality is that there’s lots of blocking and tackling ahead. There is also the need to engage not only developers but business stakeholders – through applications rather than development tools, and through success stories with tangible results. It’s a stage that the Hadoop community is only now starting to embrace.

HP does a 180 – Now it’s Apotheker’s Company

HP chose the occasion of its Q3 earnings call to drop the bomb. The company that under Mark Hurd’s watch focused on Converged Infrastructure, spending almost $7 billion to buy Palm, 3Com, and 3PAR, is now pulling a 180: ditching both the PC and Palm hardware businesses, and making an offer to buy Autonomy, one of the last major independent enterprise content management players, for roughly $11 billion.

At first glance, the deal makes perfect sense, given Leo Apotheker’s enterprise software orientation. From that standpoint, Apotheker has made some shrewd moves, putting aging enterprise data warehouse brand Neoview out of its misery, following up weeks later with the acquisition of Advanced SQL analytics platform provider Vertica. During the Q3 earnings call, Apotheker stated the obvious as to his comfort factor with Autonomy: “I have spent my entire professional life in software and it is a world that I know well. Autonomy is very complementary.”

There is potential synergy between Autonomy and Vertica, with Autonomy CEO Mike Lynch (who will stay on as head of the unit, reporting to Apotheker) suggesting that Autonomy’s user screens provide the long-missing front end to Vertica, and that both would be bound by a common “information layer.” Of course, with the acquisition not yet final, he did not give details on what that layer is, but for now we’d assume that integration will be at the presentation and reporting layer. There is clearly a lot more potential here — Vertica for now only holds structured data while Autonomy’s IDOL system holds everything else. In the long run we’d love to see federated metadata and an extension of Vertica to handle unstructured data, just as Advanced SQL rivals like Teradata’s Aster Data already do.

Autonomy, according to my Ovum colleague Mike Davis, who has tracked the company for years, is one of only three ECM providers that have mastered the universal document viewer – Oracle’s Stellent and an Australian open source player being the others. In contrast to HP (more about that in a moment), Autonomy is quite healthy, with its latest quarterly revenues up 16% year over year, operating margins in the mid-40% range, and a run rate that will take the company to its first billion-dollar year.

Autonomy is clearly a gem, but HP paid dearly for it. During Q&A on the earnings call, a Wall Street analyst brought matters back down to earth, questioning whether HP really got such a good deal, given that it was paying roughly 15% of its market cap for a company that will add only about 1% to its revenues.

Great – if expensive – acquisition aside, HP’s not doing so well these days. Excluding a few bright spots, such as its Fortify security software business, most of HP’s units are running behind last year. Q3 net revenue of $31.2 billion was up only 1% over last year, and down 2% when adjusted for constant currency. By contrast, IBM’s most recent results were up 12%, and 5% when currency adjusted. Dennis Howlett tweeted that it was now HP’s turn to undergo IBM’s near-death experience.

More specifically, HP Software was the bright spot, with 20% growth year over year and a 19.4% operating margin. By contrast, the printer and ink business – long HP’s cash cow – dropped 1% year over year, with the economy dampening demand from the commercial side, not to mention supply chain disruptions from the Japanese tsunami.

By contrast, services grew only 4% and are about to kick off yet another round of transformation. John Visentin, who ran HP’s Enterprise Services in the Americas region, comes in to succeed Ann Livermore. The problem, as Ovum colleague John Madden puts it, is that HP’s services business “has been in a constant state of transformation,” which is making some customers’ patience wear thin. Ever since acquiring EDS, HP has been trying – and trying – to push the legacy outsourcing business higher up the value chain, with its sights literally set in the cloud.

The trick is that as HP aims higher up the software and services food chain, it is dealing with a market that has longer sales cycles and long-term customer relationships that prize stability. Admittedly, when Apotheker was named CEO last fall, along with the appointment of enterprise software veteran Ray Lane to the board, the conventional wisdom was that HP would train its focus on enterprise software. To that extent, HP’s strategy over the past nine months has been almost consistent – save for earlier pronouncements on the strategic role of the tablet and WebOS business inherited with Palm.

But HP has been around for much longer than 9 months, and its latest shifts in strategy must be viewed with a longer perspective. Traditionally an engineering company, HP grew into a motley assortment of businesses. Before spinning off its geeky Agilent unit in 1999, HP consisted of test instruments, midrange servers and PCs, a token software business, and lest we forget, that printer business. Since then:
• The 2001 acquisition of Compaq, which cost a cool $25 billion under Carly Fiorina’s watch, pitted HP against Dell and gave the company an even more schizoid split personality between consumer and enterprise brands.
• Under Mark Hurd’s reign, software might have grown a bit (they did purchase Mercury after unwittingly not killing off their OpenView business), but the focus was directed at infrastructure – storage, switches, and mobile devices as part of the Converged Infrastructure initiative.
• In the interim, HP swallowed EDS, succeeding at what it failed to do with its earlier ill-fated pitch for PwC.

Then (1) Hurd gets tossed out and (2) almost immediately lands at Oracle; (3) Oracle pulls support for HP Itanium servers, (4) HP sues Oracle, and (5) its Itanium business sinks through the floor.

That sets the scene for today’s announcements that HP is “evaluating a range of options” (code speak for likely divestment) for its PC and tablet business – although it will keep WebOS on life support as its last gasp in the mobile arena. A real long shot: HP’s only hope for WebOS might be Android OEMs not exactly tickled pink about Google’s going into the handset business by buying Motorola’s mobile unit.

There is a logical rationale for dropping those businesses – PCs have always been a low-margin business in both sales and service, in spite of what HP claimed was an extremely efficient supply chain. Although a third of HP’s business, PCs delivered only 13% of its profits, and their revenue has been declining for several years. PCs were big enough to be a distraction and low-margin enough to be a drain. And with Palm, HP gained an elegant OS, but with a damaged brand that was too late to become the iOS alternative – Google had a 5-year head start. Another one bites the dust.

Logical moves, but it’s fair to ask: what is an HP? Given HP’s twists, turns, and about-faces, it’s a difficult question to answer. OK, HP is shedding its consumer businesses – except printers and ink, because in normal times they are too lucrative – but HP still has all this infrastructure business. It hopes to rationalize all of this by becoming a provider of cloud infrastructure and related services, with a focus on information management solutions.

As mentioned above, enterprises crave stability, yet HP’s track record over the past decade has been anything but stable. To be an enterprise provider, technology vendors must demonstrate that they have a consistent strategy and staying power, because enterprise clients don’t want to be left with orphaned technologies. To its credit, today’s announcements show the fruition of Apotheker’s enterprise software-focused strategy. But HP’s enterprise software customers and prospects need assurance that HP won’t pull another about-face when it comes time for Apotheker’s successor.

Postscript: Of course we all know how this one ended up. One good 180 deserved another. Exit Apotheker stage left. Enter Meg Whitman stage right. Reality has been reversed.

A Week of BPM

We’re closing out a week that’s been bookended by a couple of BPM-related announcements, from IBM and upstart Active Endpoints. The common thread is the quest for greater simplicity; the difference is that one de-emphasizes the B word itself, focusing on being a simple productivity tool that makes a complex enterprise app (in this case, Salesforce) easier to use, while the other amplifies a message loud and clear that this is the BPM that you always wanted BPM to be.

The backstory is that BPM has traditionally appealed to a narrow audience of specially skilled process analysts and modelers, and has yet to achieve mass-market status. Exhibit One: one of the larger independent BPM players still standing, Pegasystems, is a $330 million company. In the interim, there has been significant consolidation in this community as BPM has become one of the components expected in a middleware stack. A milestone has been Oracle’s latest Fusion BPM 11g R1, which unifies the engines and makes them first-class citizens of the Fusion middleware stack. While such developments have solidified BPM’s integration story, on its own that has not made BPM the enterprise’s next ERP.

The irony is that BPM has long been positioned as the business stakeholder’s path to application development, with the implication that a modeling/development environment that uses the terms of business process rather than programmatic commands should appeal to a broader audience. The drawback is that to get there, most BPM tools relied on proprietary languages that limited use to… you guessed it… a narrow cadre of business process architects.

Just over a year ago, IBM acquired Lombardi, one of the more innovative independents, which had always stressed simplicity. Not that IBM was lacking in BPM capability, but its offerings were based on aging engines centered on integration and document management/workflow use cases, respectively. As IBM software has not been known for its simplicity (many of its offerings still consist of multiple products requiring multiple installs, or a potpourri of offerings targeted at separate verticals or use cases), the fear was that Lombardi would get swallowed alive and emerge unrecognizable.

The good news is that Lombardi, in technology and product, has remained alive and well. We’ll characterize IBM Business Process Manager 7.5 as Lombardi giving IBM’s BPM suite a heart transplant; it dropped in a new engine to reinvigorate the old. As for the peanut gallery: Janelle Hill characterized it as IBM getting a Lombardotomy; Neil Ward-Dutton and Sandy Kemsley described it as a reverse takeover; Ward-Dutton debated Clay Richardson over whether the new release was more than just a new paint job; and Bruce Silver assessed it as IBM’s BPM endgame.

So what did IBM do in 7.5? The modeling design environment and repository are Lombardi’s, with the former IBM WebSphere Process Server (that’s the integration BPM) now being ported over to the Lombardi engine. It’s IBM’s initial answer to Oracle’s unification of process design and runtimes with the latest Fusion BPM.

IBM is not all the way there yet. Process Server models, which were previously contained in a flat file, are now stored in the repurposed Lombardi repository, but that does not yet make the models fully interoperable. You can just design them in the same environment and store them in the same repository. That’s a good start, and at least it shows that the Lombardi approach will not get buried. Beyond the challenge of integrating the model artifacts, the bigger question to us is whether the upscaling and extending of Lombardi to cover more use cases might be too much of a good thing. Can it avoid the clutter of Microsoft Office that resulted from functional scope creep?

As for FileNet, IBM’s document-centric BPM, that’s going to wait. It’s a very different use case – and note that even as Oracle unified two of its BPM engines, it has markedly omitted the document management piece. Document centric workflow is a well-ingrained use case and has its own unique process patterns, so the open question is whether it is realistic to expect that such a style of process management can fit in the same engine, or simply exist as external callable workflows.

At the other end of the week, we were treated to the unveiling of Active Endpoints’ new Cloud Extend release. As we tweeted, this is a personality transplant for Active Endpoints, as the company’s heritage has been with the geekier BPEL – and an even geekier branded tool, ActiveVOS.

OK, the Cloud branding is a break from geekdom, but it ain’t exactly original – there’s too much Cloud this and Cloud that going around the ISV community these days.

More to the point, Cloud Extend does not really describe what Active Endpoints is doing with this release. Cloud Extend is not cloud-enabling your applications; it is simply extending an application that, in this case, happens to run in the cloud (the company also has an on-premises version of this tool under the equally unintuitive brand Socrates).

In essence, Cloud Extend adds a workflow shell so that you can design workflows in a manner that appears as simple as creating a Visio diagram, while providing the ability to save and reuse them. There’s BPEL and BPMN under the hood, but in normal views you won’t see them. It also has neat capabilities that help you filter out extraneous workflow activities when working on a particular process. The result is screens that drive users to interact with Salesforce in a consistent manner, replacing the custom coding of systems integrators with a tool that should help those integrators do the same Salesforce customization job more quickly. Clearly, Salesforce should be just the first stop for this technology; we’d expect Active Endpoints to target other enterprise apps with its engine in due course.

We hate to spill the beans, but under the covers, this is BPM. But that’s not the point – and in fact it opens an interesting argument as to whether we should simply take advantage of the technology without having to make a federal case about it. It’s yet another approach to make the benefits of BPM more accessible to people who are not modeling notation experts, which is a good thing.

How should enterprises navigate forks in the Hudson?

A South Jersey neighbor of ours — runner, educator, and open source mischief maker Bob Bickel — recently blogged a status report on what’s been going on with the Jenkins open source project ever since it split off from Hudson.

That’s prompted us to wade in to ask the question that’s been glossed over by the theatrics: what about the user?

For background: this is a case of a promising grassroots technology that took off beyond expectation and became a victim of its own success – governance just did not keep up with the project’s growing popularity and attractiveness to enterprise developers. The sign of a mature open source project is that its governing body has successfully balanced the conflicting pressures of constant innovation and the need to slow things down for stable, enterprise-ready releases. Hudson failed in this regard.

That led to unfortunate conflicts that degenerated to stupid, petty, and entirely avoidable personality squabbles that in the end screwed the very enterprise users that were pumping much of the oxygen in. We know the actors on both sides – who in their everyday roles are normal, reasonable people that got caught up in mob frenzy. Both sides shot themselves in the feet as matters careened out of control. Go to SD Times if you want the blow by blow.

So what is Hudson – or Jenkins – and why is it important?

Hudson is a continuous integration (CI) server open source project that grew very popular with Java developers. The purpose of a CI server is to support the agile practice of continuous integration with a server that maintains the latest copy of the truth. The project was the brainchild of former Sun and Oracle – and current CloudBees – employee Kohsuke Kawaguchi.

Since the split, the project has continued as the Hudson and Jenkins branches, with Jenkins attracting the vast majority of committers and much livelier mailing list activity. Bickel has given us a good snapshot from the Jenkins side, with which he’s aligned: a diverse governance body has been established that plans to publish the results of its meetings and commit not only to continuing the crazy schedule of weekly builds, but also to “stable” quarterly releases. The plan is to go “stable” with the recent 1.400 release, for which a stream of patches is underway.

So most of the committers have gone to Jenkins. Case closed? From the Hudson side, Jason van Zyl of Sonatype, whose business was built around Apache Maven, states that the essential plug-ins are already in the existing Hudson version, and that the current work is more about consolidating the technology already in place, testing it, and refactoring to comply with JSR 330, built around the dependency injection technology popularized by the Spring framework. Although the promises are to keep the APIs stable, this is going to be a rewrite of the innards of Hudson.

Behind the scenes, Sonatype is competing on the natural affinity between Maven and Hudson, which share a large base of common users, while the emerging force behind Jenkins is CloudBees, which wants to establish itself as the leading platform for Java development in the cloud.

So if you’re wondering what to do, join the crowd. There are bigger commercial forces at work, but as far as you’re concerned, you want stable releases that don’t break the APIs you already use. Jenkins must prove it’s not just the favorite of the hard core, but that its governance structure has grown up to provide stability and assurance to the enterprise, while Hudson must prove that the new rewrite won’t destabilize the old, and that it has managed to retain the enterprise base in spite of all the noise otherwise.

Stay tuned.

April 28, 2011 update. Bob Bickel has reported to me that since the “divorce,” Jenkins has drawn 733 commits vs. 172 for Hudson.

May 4, 2011 update. Oracle has decided to submit the Hudson project to the Eclipse Foundation. Eclipse board member Mik Kersten voices his support of the effort. Oracle says it didn’t consider this before because going to Eclipse was originally perceived as too heavyweight. This leaves us wondering: why didn’t Oracle propose this earlier? Where was the common sense?

Big Data analytics in the cloud could be HP’s enterprise trump card

Unfortunately, scheduling conflicts have kept us from attending Leo Apotheker’s keynote today at the HP Analyst Summit in San Francisco. But yesterday, he tipped his cards on his new software vision for HP before a group of investment analysts. HP’s software focus is not to reinvent the wheel – at least when it comes to enterprise apps. Apotheker had to put to rest any notion that he’s about to stage a grudge match and buy the company that dismissed him. There is already plenty of coverage here, interesting comment from Tom Foremski (we agree with him about SAP being a non-starter), and the Software Advice guys, who are conducting a poll.

To some extent this has come as little surprise, given HP’s already stated plans for WebOS and its recently announced acquisition of Vertica. We do have one question though: what happened to Converged Infrastructure?

For now, we’re not revisiting the acquisition stakes, although if you follow the #HPSummit Twitter tag, you’ll probably see lots of ideas floating around after 9am Pacific time today. We’ll instead focus on the kind of company HP wants to be, based on its stated objectives.

1. Develop a portfolio of cloud services from infrastructure to platform services and run the industry’s first open cloud marketplace that will combine a secure, scalable and trusted consumer app store and an enterprise application and services catalog.

This hits two points on the checklist: it provides a natural market for all those PCs that HP sells, and it signals that HP wants to venture higher up the food chain than just selling lots of iron. That certainly makes sense. The last part is where we have a question: offering cloud services to consumers, the enterprise, and developers sounds at first blush like HP wants its cloud to be all things to all people.

The good news is that HP has a start on the developer side where it has been offering performance testing services for years – but is now catching up to providers like CollabNet (with which it is aligned and would make a logical acquisition candidate) and Rally in offering higher value planning services for the app lifecycle.

In the other areas – consumer apps and enterprise apps – HP is starting from square one. It obviously must separate the two, as cloud is just about the only thing that the two have in common.

For the consumer side, HP (like Google Android and everyone else) is playing catchup to Apple. It is not simply a matter of building it and expecting they will come. Apple has built an entire ecosystem around its iOS platform that has penetrated content and retail – challenging Amazon, not just Salesforce or a would-be HP, using its user experience as the basis for building a market for an audience that is dying to be captive. For its part, HP hopes to build WebOS to have the same “Wow!” factor as the iPhone/iPad experience. It’s got a huge uphill battle on its hands.

For the enterprise, it’s a more wide-open space where only Salesforce’s AppExchange has made any meaningful mark. Again, the key is a unifying ecosystem, with the most likely outlet being enterprise outsourcing customers of HP’s Enterprise Services (the former EDS operation). The key principle is that when you build a marketplace, you have to identify who your customers are and give them a reason to visit. A key challenge, as we’ve stated in our day job, is that enterprise applications are not the enterprise equivalent of those $2.99 apps that you’ll see in the App Store. The experience at Salesforce – the classic inversion of the long tail – is that the market is primarily for add-ons to the CRM application or use of the development platform, but most entries simply get buried deep down the list.

Enterprise app marketplaces are not simply going to provide a cheaper channel for solutions that still require consultative sales. We’ve suggested that they adhere more to the user group model, which also includes forums, chats, exchanges of ideas and, by the way, places to get utilities that can make enterprise software programs more useful. Enterprise app stores are not an end in themselves, but a means for reinforcing a community – whether for a core enterprise app or, more likely in HP’s case, the community of outsourcing customers that it already has.

2. Build webOS into a leading connectivity platform.
HP clearly hopes to replicate Apple’s success with iOS here – the key being that it wants to extend the next-generation Palm platform to its base of PCs and other devices. This one’s truly a Hail Mary pass designed to rescue the Palm platform from irrelevance in a market where iOS, Android, Adobe Flash, BlackBerry, and Microsoft Windows 7/Silverlight are battling it out. Admittedly, mobile developers have always tolerated fragmentation as a fact of life in this space – but of course that was when the stakes (with feature phones) were rather modest. With smart devices – in all their varied form factors from phone to tablet – becoming the next major consumer (and to some extent, enterprise) frontier, there’s a fresh battle for mindshare. That mindshare will be built on the size of the third-party app ecosystem that these platforms attract.

As Palm was always more an enterprise than a consumer platform – before the BlackBerry eclipsed it – HP’s likely WebOS venue will be the enterprise space. That’s another uphill battle, against Microsoft (which has the office apps), BlackBerry (with its substantial corporate email base), and yes, Apple, whose iPhones enterprise users are increasingly sneaking in the back door, just as they did with PCs 25 years ago.

3. Build presence with Big Data
Like (1), this also hits a key checkbox for where to sell all those HP PCs. HP has had a half-hearted presence here with the discontinued Neoview business. The Vertica acquisition was clearly the first one that had Apotheker’s stamp on it. Of HP’s announced strategies, this is the one that aligns most closely with the enterprise software strategy we’ve all expected Apotheker to champion. Obviously Vertica is the first step here – and there are many logical acquisitions that could fill this out, as we’ve noted previously regarding Tibco, Informatica, and Teradata. The important point is that classic business intelligence never really suffered through the recession, and arguably Big Data is becoming the next frontier for BI – not just a nice-to-have, but increasingly an expected cost of competition.

What’s interesting so far is that in all the talk about Big Data, relatively scant attention has been paid to utilizing the cloud to provide the scaling needed to conduct such analytics. We foresee a market where organizations that don’t necessarily want to buy all that infrastructure, and that run large advanced analytics jobs on an event-driven basis, consume the cloud for their Hadoop – or Vertica – runs. Big Data analytics in the cloud could be HP’s enterprise trump card.

The Second Wave of Analytics

Throughout the recession, business intelligence (BI) was one of the few growth markets in IT. Transactional systems that report “what” is happening are simply the price of entry for remaining in a market; BI and analytics systems answer the question of “why” something is happening and, ideally, provide intelligence that is actionable so you know “how” to respond. Not surprisingly, understanding the whys and hows is essential for maximizing the top line in growing markets, and for pinpointing the path to survival in down markets – the latter being why BI remained one of the few growth areas in the IT and business applications space through the recession.

Analytic databases are cool again. Teradata, the analytic database provider with a 30-year track record, had its strongest Q2 in what was otherwise a lousy 2010 for most IT vendors. Over the past year, IBM, SAP, and EMC made major acquisitions in this space, while some of the loudest decibels at this year’s Oracle OpenWorld were over the Exadata optimized database machine. There are a number of upstarts with significant venture funding – Vertica, Cloudera, Aster Data, ParAccel, and others – that are not only charting solid growth, but whose varied approaches reveal that the market is far from mature and that there remains plenty of demand for innovation.

We are seeing today a second wave of innovation in BI and analytics that matches the ferment and intensity of the 1995-96 era, when data warehousing and analytic reporting went commercial. There isn’t any one thing driving BI innovation. At one end of the spectrum you have Big Data, and at the other end Fast Data — the actualization of real-time business intelligence. Advances in commodity hardware, memory density, and parallel programming models, along with the emergence of NoSQL, open source statistical programming languages, and the cloud, are bringing this all within reach. There is more and more data everywhere that’s begging to be sliced, diced, and analyzed.

The amount of data being generated is mushrooming, but much of it will not necessarily be persisted to storage. For instance, if you’re a power company that wants to institute a smart grid, moving from monthly to daily meter reads multiplies your data volumes by a factor of 30, and if you decide to take readings every 15 minutes, multiply all that again by roughly a factor of 100. Much of this data will be consumed as events. And even where it is persisted, traditional relational models won’t handle the load – the issue being not only the overhead of operating all that iron, but the concurrent need for additional equipment, space, HVAC, and power.

Unlike the past, when the biggest databases were maintained inside the walls of research institutions, public sector agencies, or within large telcos or banks, today many of the largest data stores on the Internet are getting opened through APIs, such as from Facebook. Big databases are no longer restricted to use by big companies.

Compare that to the 1995-96 time period, when relational databases, which made enterprise data accessible, reached critical-mass adoption; rich Windows clients, which put powerful apps on the desktop, became the enterprise standard; and new approaches emerged for optimizing data storage and productizing the kind of enterprise reporting pioneered by Information Builders. And with it all came the debates: OLAP (or MOLAP) vs. ROLAP, star vs. snowflake schema, and ad hoc vs. standard reporting. Ever since, BI has become ingrained with enterprise applications, as reflected by the consolidation that saw Cognos, Business Objects, and Hyperion acquired by IBM, SAP, and Oracle, respectively. How much more establishment can you get?

What’s old is new again. When SQL relational databases emerged in the 1980s, conventional wisdom was that the need for indexes and related features would limit their ability to perform or scale to support enterprise transactional systems. Moore’s Law and the emergence of client/server made a mockery of that argument – until the web, the proliferation of XML data, smart sensory devices, and the realization that unstructured data contained valuable morsels of market and process intelligence in turn made a mockery of the argument that relational was the enterprise database end-state.

In-memory databases are nothing new either, but the same hardware commoditization trends that helped mainstream SQL have also brought the costs of these engines down to earth.

What’s interesting is that there is no single source or style of innovation. Just as 1995 proved a year of discovery and debate over new concepts, today you are seeing a proliferation of approaches: different strategies for massively parallel, shared-nothing architectures; columnar databases; massive networked and hierarchical file systems; and SQL vs. programmatic approaches. It is not simply SQL vs. a single post-SQL model, but variations that mix and match SQL-like programming with various approaches to parallelism, data compression, and use of memory. And don’t forget the application of analytic models to complex event processing, for identifying key patterns in long-running events or combing through streaming data that arrives in torrents too fast and too large to ever consider putting into persistent storage.

This time, much of the innovation is coming from the open source world, as evidenced by projects like the Java-based distributed computing platform Hadoop (inspired by Google’s infrastructure papers); the MapReduce parallel programming model that Google described; the Hive project, which makes MapReduce look like SQL; and the R statistical programming language. Google has added fuel to the fire by releasing to developers its BigQuery and Prediction API for analyzing large sets of data and applying predictive algorithms.

These are not simply technology innovations looking for problems, as use cases for Big Data or real-time analysis are mushrooming. Want to extend your analytics from structured data to blogs, emails, instant messaging, wikis, or sensory data? Want to convene the world’s largest focus group? There’s sentiment analysis to be conducted from Facebook; trending topics for Wikipedia; power distribution optimization for smart grids; or predictive analytics for use cases such as real-time inventory analysis for retail chains, or strategic workforce planning, and so on.

Adding icing to the cake was an excellent talk at a New York Technology Council meeting by Merv Adrian, a 30-year veteran of the data management field (who will soon be joining Gartner) who outlined the content of a comprehensive multi-client study on analytic databases that can be downloaded free from Bitpipe.

Adrian speaks of a generational disruption occurring in the database market that is attacking new forms of age-old problems: how to deal with expanding datasets while maintaining decent performance – as mundane as that. But the explosion of data, coupled with the commoditization of hardware and increasing bandwidth, has exacerbated matters to the point where we can no longer apply the brute-force approach of tweaking relational architectures. “Most of what we’re doing is figuring out how to deal with the inadequacies of existing systems,” he said, adding that the market and state of knowledge have not yet matured to the point where we’re thinking about how the data management scheme should look logically.

So it’s not surprising that competition has opened wide for new approaches to solving the Big and Fast Data challenges; the market has not yet matured to the point where there are one or a handful of consensus approaches around which to build a critical mass practitioner base. But when Adrian describes the spate of vendor acquisitions over the past year, it’s just a hint of things to come.

Watch this space.

First Takes from Rational Innovate 2010

To paraphrase Firesign Theatre, how can you be in two places at once when you’re not anywhere at all? We would have preferred being in at least two places, if not more, today, what with Microsoft TechEd in New Orleans, IBM Rational’s Innovate conference in Orlando and, for spare change, PTC’s media and analyst day just a cab ride away.

Rational’s message, which is that software is the invisible glue of smarter products, was much more business grounded than its call a year ago for collaborative software development, which we criticized back then as more like a call for repairing your software process as opposed to improving your core business.

The conference’s ongoing name changes reflect Rational’s repositioning, on which the Telelogic acquisition closed the circle. Two years ago, the event was called the Rational Software Development Conference; last year the word “Development” was dropped, and this year “Software” was replaced with “Innovate.” The vanishing of “Software” from the conference title is consistent with the invisible-glue motif.

Software is the means, not the end. Your business needs to automate its processes or make better products in a better way; software gets you there. As IBM’s message is Smarter Planet, Rational has emphasized Smarter Products rather than “Smarter Business Processes.” It’s not just a matter of force-fitting to a corporate slogan; Rational estimates compound annual growth of its systems-of-systems business (ergo, mostly the Telelogic side) to be well into the double digits over the next 4 – 5 years, compared to a fraction of that for its more traditional enterprise software modernization and IT optimization businesses.

Telelogic played the starring role for Smart Products. The core of the strategy is a newly announced Integrated Product Management umbrella of products and services for helping companies gain better control over their product lifecycles. Great lead, but for now scant detail. IBM’s strategy leverages Telelogic’s foothold with companies making complex engineered products, together with other assets such as IBM’s vertical industry frameworks. We also see strong potential synergy with Maximo, which completes the product lifecycle with the piece that follows the product into its service life.

IBM’s product management strategy places it on a collision course with the PLM community. IBM lays claim to managing the logical constraints of product development – coming from its base in requirements and portfolio management. By contrast, IBM claims that PLM vendors know only the physical constraints. The PLM folks – especially PTC – are developing their own counter strategies. For starters, there is PTC’s plan to offer Eclipse tooling that will start with its own branded support of CollabNet’s open source Subversion source code repository. Folks, this is the beginning of a grudge match that for now is only reinforcing the culture/turf divides demarcating software from the more traditional physical engineering disciplines.

Rational also introduced a workbench idea, which is a promising way to use SOA to mix and match capabilities from across its wide portfolio to address specific vertical industry problems or horizontal software development requirements. The idea is promising, but for now mostly vision. These workbenches take products – mostly Jazz offerings – and mash them up using the SOA architecture of the Jazz framework to create configurable integrations that address specific business and software delivery problems. We saw a demo of an early example: purpose-built integrations providing role-based views for correlating requirements to functional and performance tests, through to specific builds, accessed through the tools that BAs, testers, and developers use. On the horizon, IBM Rational is planning vertical workbenches that apply Rational tools to some of its vertical industry frameworks, addressing segments such as Android mobile device development, cybersecurity, and smarter cities.

The idea behind the workbenches is that they would not be rigid products but configurable mixes and matches of Rational and partner content, through interfaces developed via OSLC, IBM’s not-really-open-source community for building Jazz interfaces. This is a good use for IBM’s varied software and industry framework portfolio, but it will be challenging to sell, as these are neither standard catalog products nor, ideally, customized systems integrations. Sales needs to think outside the box to sell these workbenches, while customers need assurance that they are not paying for one-off systems integration engagements.

The good news is that with IBM’s expanding cloud offerings for Rational, these workbenches could be readily composed and delivered through the cloud on much shorter lead times than conventional packaged software. Aiding the workbenches is a new flexible token licensing system that expands on a model originated by Telelogic. Tokens are generic licenses that give you access to any piece of software within a designated group, allowing the customer to mix and match software – or IBM to do so through its Rational workbenches. IBM is combining this with the idea of term licensing to make it suited to cloud customers who are allergic to perpetual licensing. For now, tokens are available only for Telelogic and Jazz offerings, but IBM Software Group is investigating applying them to the other brands.

So how do you cost-justify these investments for the software side of smarter products? Rational GM Danny Sabbah’s keynote on software econometrics addressed the costing issue based on Rational’s “invisible glue, means not end” premise. We agree with Sabbah that traditional metrics for software development, such as defect rates, are simply internal measures that fail to express business value. Instead, Sabbah urges measuring the business outcomes of software development.

Sabbah’s arguments are hardly new. They rehash old debates over the merits of “hard” vs. “soft,” or tangible vs. intangible, costing. Traditionally, new capital investments, such as buying new software (or paying to develop it), were driven by ROI calculations that computed how much money you’d save; in most cases, those savings came from direct labor. Those were considered “hard” numbers because it was fairly straightforward to calculate how much labor a piece of automation would save. Savings are fine for the bottom line but do nothing for the top line. However, if you automated a process that allowed you to shorten product lead time by three weeks, how much money would you make with the extra selling time, and by getting to market earlier, how much benefit would accrue from becoming first to market? Common sense says that, all other factors being equal, getting a jump on sales should translate to more revenue and, in turn, bolster your competitive position. But such numbers were considered “soft” because there were few ways to scientifically quantify the benefits.

Sabbah’s plea for software econometrics simply revives this debate.

SpringSource’s surround ’em on every cloud strategy

We’re trying to stifle the puns with SpringSource’s announcement that it has now also become the preferred Java development platform for Google App Engine. Like SpringSource on a roll… OK, that’s out of our system.

But coming atop the recent announcements of VMforce, along with key acquisitions of Gemstone and, to a lesser extent, RabbitMQ, we’d have to agree with VMware CTO Steve Herrod that VMware’s acquisition of SpringSource has not slowed the company down. Congrats to SpringSource’s Rod Johnson for keeping the momentum going under VMware’s watch, and hats off to VMware for making it all happen.

Short but sweet (we’re behind on report deadlines for our day job): SpringSource’s cloud strategy is to become as ubiquitous as possible. Grab every potential Java PaaS platform in sight, and do end-arounds on IBM and Oracle, who have barely placed their feet inside the door of Java development platforms as a service. The move reminds us of Duane Reade, the well-known Manhattan pharmacy chain whose long-time strategy was to saturate every street-corner location to crowd rivals like CVS and Walgreens out of the market; as a desperation maneuver, Walgreen finally bit the bullet and snapped up Duane Reade but, in deference to its New York brand recognition, kept the name.

But given that Google App Engine is not exactly a mainstream enterprise platform (Google still struggles to understand the enterprise), for SpringSource the announcement carries more light than heat. The move nonetheless brings a halo effect to Google App Engine, which becomes more extensible with the Spring framework and with cool extras like Spring Roo, which eliminates a lot of coding legwork and is a good match for Google Web Toolkit and its warmer, fuzzier and, more importantly, simpler way to piece together web apps. More importantly, it means you can now write and run something meaningful on Google App Engine without having to rely on Python. It also offers clever potential upside for Google’s newly announced App Engine for Business.

SpringSource’s strategy is an end-around, not only of IBM and Oracle, but also of VMware itself. The latest announcement vindicates, in part, VMware’s strategy for SpringSource, which we believe has been about building the de facto standard Java cloud platform. While we tip our hats to VMware for accelerating SpringSource’s expansion of its middleware stack and cloud strategy, VMware has been slower to leverage SpringSource internally, whether it be with:
1. Promotion of vCloud. That remains more a future bet for leapfrogging VMware past the increasingly commoditized hypervisor business, leveraging its market-leading virtualization management technologies to establish them as de facto standards for managing virtualization in the cloud.
2. Cross-fertilizing SpringSource’s dependency injection capabilities into virtualization, with the idea of simplifying virtualization in the same way that the original Spring framework simplified Java deployment (see the sketch below for a reminder of what that dependency injection style looks like).
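For readers who haven’t touched the framework lately, here is a minimal, self-contained reminder of the dependency injection style in question; the class and bean names are invented for illustration and have nothing to do with anything VMware or SpringSource has shipped.

```java
import org.springframework.context.annotation.AnnotationConfigApplicationContext;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

public class DependencyInjectionSketch {

    interface GreetingSource { String fetch(); }

    // A component declares what it needs as a plain constructor argument...
    static class ReportService {
        private final GreetingSource source;
        ReportService(GreetingSource source) { this.source = source; }
        String run() { return "report: " + source.fetch(); }
    }

    // ...and the wiring lives in configuration, not in the component itself
    @Configuration
    static class AppConfig {
        @Bean
        GreetingSource greetingSource() { return () -> "hello from the container"; }

        @Bean
        ReportService reportService(GreetingSource source) { return new ReportService(source); }
    }

    public static void main(String[] args) {
        AnnotationConfigApplicationContext ctx = new AnnotationConfigApplicationContext(AppConfig.class);
        System.out.println(ctx.getBean(ReportService.class).run());
        ctx.close();
    }
}
```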

SpringSource buying Gemstone: VMware’s written all over it

There they go again. Barely a month after announcing the acquisition of message broker Rabbit Technologies, SpringSource is adding yet one more piece to its middleware stack: it has announced the acquisition of Gemstone for its distributed data caching technology.

SpringSource’s Rod Johnson told us that he was planning to acquire such a technology even before VMware came into the picture, but make no mistake about it, VMware’s presence upped the ante.

SpringSource has been looking to fill out its stack vs. Oracle and IBM ever since its cornerstone acquisition of Covalent (which brought the expertise behind Apache Tomcat and bequeathed tc Server to the world) two years ago. Adding Gemstone’s Gemfire becomes SpringSource’s response to Oracle Coherence and IBM WebSphere XD. The technologies in question allow you to replicate data from varied sources into a single logical cache, which is critical if those sources are highly dispersed.
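To ground the concept without pretending to reproduce Gemfire’s or Coherence’s actual APIs, here is a deliberately toy, single-JVM sketch of the idea: one logical key-value view that loads and caches entries on demand from whichever dispersed source owns them, so callers never care where the data physically lives. A real data grid would also partition and replicate those entries across many nodes.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class LogicalCacheSketch {

    // One logical cache fronting several dispersed sources (read-through style)
    static class LogicalCache {
        private final Map<String, String> cache = new ConcurrentHashMap<>();
        private final Function<String, String> loader;

        LogicalCache(Function<String, String> loader) { this.loader = loader; }

        String get(String key) {
            // Serve from memory if present; otherwise pull from the owning source and keep it
            return cache.computeIfAbsent(key, loader);
        }
    }

    public static void main(String[] args) {
        // Two "dispersed" sources standing in for, say, an on-premises database and a cloud store
        Map<String, String> onPremOrders = Map.of("order:1", "widget x 3");
        Map<String, String> cloudProfiles = Map.of("customer:42", "Jane Doe");

        LogicalCache grid = new LogicalCache(key ->
                key.startsWith("order:") ? onPremOrders.get(key) : cloudProfiles.get(key));

        System.out.println(grid.get("order:1"));      // loaded from the on-prem source, then cached
        System.out.println(grid.get("customer:42"));  // loaded from the cloud source, then cached
        System.out.println(grid.get("order:1"));      // now served straight from the in-memory cache
    }
}
```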

So what about VMware? Wasn’t SpringSource planning to grow its stack anyway? There are deeper stakes at play: VMware’s aspiration to make cloud and virtualization virtually synonymous – or at least to make virtualization essential to the cloud – falls apart if you don’t have a scalable, high-performance way to manage and access data. Enterprises using the cloud are not likely to move all their data there, and need a solution that supports hybrid strategies, which will invariably involve a mix of cloud- and on-premises-based data resources that must be managed and accessed efficiently. Distributed data caching is essential.

So the next question is, why didn’t SpringSource – historically an open source company that has always made open source acquisitions – buy open source Terracotta instead? Chances are, were SpringSource still independent, it probably would have, but VMware brings deeper pockets and deeper aspirations. Gemstone is the company that sold object-oriented databases back in the 90s, and once it grew obvious that it (and other OODBMS rivals like ObjectStore) wasn’t going to become the next Oracle, it adapted its expertise to caching. Gemfire emerged in 2002 and provided Wall Street firms and defense agencies an off-the-shelf alternative to homegrown development or a best-of-breed strategy. By comparison, although Terracotta boasts several Wall Street clients, its core base is in web caching for high-traffic B2C websites.

Bottom line: VMware needs the scale.

There are other interesting pieces that Gemstone brings to the party. It is currently developing SQLFabric, a project that embeds the Apache Derby open source relational database into Gemfire to make its distributed data grid fully SQL-compliant, which would be very strategic to VMware and SpringSource. It also has a shot-in-the-dark project, MagLev, which is more a curiosity for the mother ship. Conceivably it could provide the impetus for SpringSource to extend to the Ruby environment, but would require a lot more development work to productize.

Obviously as the deal won’t close immediately, both entities must be coy about their plans other than the obvious commitment to integrate products.

But there’s another angle that will be worth exploring once the ink dries: SpringSource has been known for simplicity. The Spring framework provided a way to abstract the complexity out of Java EE, while tc Server, based on Tomcat, carries but a subset of the bells and whistles of full Java EE stacks. Gemfire, however, is hardly simple, and the market for distributed data grids has been limited to organizations with extreme processing needs, extreme expertise, and extreme budgets. Yet the move to the cloud will mean, as noted above, that the need for logical data grids trickles down to more of the enterprise mainstream, even if the scope of the problem won’t be as extreme. It would make sense for the Spring framework to extend its dependency injection to a “lite” version of Gemfire (Gemcloud?) to simplify the hassle of managing data inside and outside of the cloud.