Are data warehouses becoming victims of their own success? It’s hard to ignore the reality that the appetite for BI analytics has grown steadily – it’s one of the few enterprise IT software markets to continue enjoying steady growth over the past decade or more. While BI has not yet morphed into that long-promised democratic knowledge tool for the masses, there’s little question that it has become firmly embedded as a pillar of enterprise IT. Increasingly, analytics are being integrated with transactions, and there’s a movement for self-service BI that aims to address the everyman gap.
On the data side, there’s little question about the impact of data warehousing and BI. Enterprises have increasingly voracious appetite for data. And there are more kinds of data coming in. As part of our day job, we globally surveyed large enterprise data warehouse users (DWs over a terabyte) a couple years back and discovered over half of them were already routinely conducting text analytics in addition to conventionally structured data.
While SQL platforms have steadily increased scale and performance (it’s easy to forget that 30 years ago, conventional wisdom was that they would never scale to support enterprise OLTP systems), the legwork of operating data warehouses is becoming a source of bottlenecks. Data warehouses and transactional systems have traditionally been kept apart because their workloads significantly differed; they were typically kept at arm’s length with separate staging servers in the middle tier, where ETL operations were performed.
Yet, surging data volumes are breaking this pattern. With growing data volumes has come an emerging pattern where data and processing are brought together on the same platform. The “ELT” pattern was thus born based on the notion that collocating transformation operations inside the data warehouse would be more efficient as it would reduce data movements. The downside of ELT, however, is that data transformation compute cycles compete for finite resource with analytics.
Enter Hadoop. First developed for Internet companies for solving what were considered unique Internet problems (e.g., search indexes, ad optimization, gaming, etc.), enterprises are intrigued by the new platform for its ability to broaden the scope of their analytics to accommodate data outside the traditional purview of the data warehouse. Indeed, Hadoop allows organizations to broaden their analytic view. Instead of a traditional “360 view” of the customer that is largely transaction-based, add in social or mobile data to get a fuller picture. The same goes with machine data for operations, or weblog data for any Internet site.
But Hadoop can have another role in supplementing the data warehouse, performing the heavy lift at less cost. For starters, inserting a Hadoop platform to perform data transformation can offload cycles from the data warehouse to an environment where compute (and storage) are far less expensive. The software is far less expensive and the hardware is pure commodity (x86, standard Ethernet, and cheap 1 – 3 TB disk). And the system is sufficiently scalable that, should you need to commandeer additional resources to crunch higher transformation loads, they can be added economically by growing out your clusters.
Admittedly, there’s no free lunch; Hadoop is not free, as in free beer.
1. Like most open source software, you will pay for support. However, when compared to the cost of licensing commercial database software (where fees are related to installation size), the cost of open source should be far more modest.
2. In the short run, you will likely pay more for Hadoop skills because they are not (yet) as plentiful as SQL. This is a temporary state of affairs; as with Java 1999, the laws of supply and demand will eventually resolve this hurdle.
3. Hadoop is not as mature a platform as off-the-shelf SQL counterparts, but that is also a situation that time will eventually resolve.
4. Hadoop adds another tier to your analytic platform environment. If you embraced ELT, it does introduce some additional data movement; but then again, you may not have to load all of that data to a SQL target in the end.
But even if Hadoop is not free, it presents a lower cost target for shifting transform compute cycles. More importantly, it adds new options for analytic processing. With SQL and Hadoop converging, there are new paths for SQL developers to access data in Hadoop without having to learn MapReduce. These capabilities will not eliminate SQL querying to your existing data warehouse, as such platforms are well-suited for routine queries (with many of them carrying their own embedded specialized functions). But they supplement them by providing the opportunity to conduct exploratory querying that rounds out the picture and provides the opportunity to test drive new analytics before populating them to the primary data warehouse.
Emergence of Hadoop is part of a trend towards away from monolithic data warehousing environments. While enterprise DWs live on, they are no longer the sole focus of the universe. For instance, Teradata, which has long been associated with enterprise data warehousing, now promotes a unified data architecture that acknowledges that you’ll need different types of platforms for different workloads: operational data store, interactive analytics, and data deep dives. IBM, Oracle, and Microsoft are similarly diversifying their data platforms. Hadoop is just the latest addition, bringing capabilities for bulk transformation and exploratory analytics in SQL (and other) styles.
We will be discussing this topic in more detail later this week in a webinar sponsored by Cloudera. Catch our session “Hadoop: Extending your Data Warehouse” on Thursday, May 9, 2013 at 2:00pm ET/ 11:00a PT. You can register for the session here.
That’s a paraphrase of a question raised by IDC analyst Carl Olofson at an IBM Big Data analyst event earlier this week. Carl’s question neatly summarized our impressions from the session, which centered around some big data announcements that IBM made. It concerned some new performance improvements that IBM has made that might render some issues with poorly formed SQL moot. More about that in a moment.
The question was all the more fitting and ironic given the setting – the event was held at IBM’s Almaden research facility, which happened to be the same place where Edgar (Ted) Codd invented SQL; IBM will video webcast excerpts on April 30.
Specifically, IBM made a series of announcements; while much of the press focused on announcement of a preview for IBM’s PureData for Hadoop appliance, to us the highlight was unveiling of a new architecture, branded BLU acceleration. Independent DB2 consultant Dave Beulke, whom we met at the launch, has published one of the best post mortems on the significance of the announcement.
BLU is supposed to be lightning fast. BNSF railroad, a BLU beta customer, reported performing a 4-billion row join in 8 milliseconds.
So what does this all mean?
Databases are assuming multiple personalities
BLU acceleration consists of a new engine that accelerates database performance. Let’s dissect that seemingly innocuous – and ambiguous – statement. Traditionally, the database and the underlying engine were considered one and the same. But increasingly, databases are evolving into broader data platforms with multiple personalities that are each designed for a specific form of processing or compute problem scenarios. Today’s DB2, Exadata, and Teradata 14 are not your father’s row-based data stores—they also have columnar support; even Microsoft SQL Server Parallel Warehouse Edition supports columnar indexing that can double as full-blown data tables alongside your row store. So you run rows for your existing applications (probably most of them are transactional) and run new analytic apps against the column store. Or if you’re a Pivotal HD (nee EMC Greenplum), run your SQL analytic queries against the relational engine, which happens to use Hadoop’s HDFS as the back end file system. And for Hadoop, new frameworks are emerging alongside MapReduce that are adding interactive, graph, and stream processing faces.
This week’s announcements by IBM of the BLU architecture marked yet another milestone in this trend. BLU is an engine that can exist side by side with DB2’s traditional row-based data store (it will be supported inside DB2 10.5). So you can run existing apps on the row store while migrating a select few to tap BLU. BLU is also being made available for IBM’s Informix TimeSeries 12.1, and in the long run, you’re likely to see it going into IBM’s other data platforms (think PureData for Analytics, Operational Analytics, and Hadoop models).
From IBM: More mixing and matching
In the same spirit, we expect to see IBM (and its rivals) do more mixing and matching in the future. We’re waiting for IBM to release an appliance that combines SQL analytics side by side with an instance of Hadoop, where you could run blended analytic queries (think: analytics from your CRM system alongside social, weblog, and mobile data harvested by Hadoop).
And while we’re on the topic of piling on data engines, IBM announced a preview of a JSON data store (think MongoDB style); it will become yet another engine to sit under the DB2 umbrella. We don’t expect MongoDB users to suddenly flock to buy DB2 licenses, but it will be a way to for existing DB2 shops to add an engine for developers who would otherwise implement their own Mongo one-off projects. The carrot is that IBM JSON takes advantage of data protection and security services of the DB2 platform that is not available from Mongo.
BLU includes a number of features that individually, are not that unique (although there may be debates regarding degree of optimization). But together, they form a well-rounded approach to not only accelerating processing inside a SQL platform, but allowing new types of analytic processing. For instance, think about applying some of the late-binding schema practices from the Hadoop world to SQL (don’t believe for a moment that analytics on Hadoop doesn’t involve structuring data, but you can do it on demand, for the specific problem).
Put another way, in the Hadoop world, the competitive spotlight currently is on convergence with SQL. And now in the SQL world, styles of analytic processing from the NoSQL side are bleeding into SQL. Consider it a case of man bites dog.
The laundry list for BLU includes:
• Columnar and in-memory processing. Most Advanced SQL (or NewSQL) analytic platforms such as Teradata Aster, Pivotal HD (and its Greenplum predecessors), Vertica, ParaAccel, and others incorporate columnar as a core design. Hadoop’s HBase database also uses column storage. And of course as noted above, columnar engines are increasingly being incorporated alongside existing row-oriented stores inside relational warhorses. Columnar lends itself well to analytics because it reduces table scanning (you only need to look at specific columns rather than across entire rows) and focuses on aggregate data rather than individual records.
• Data compression – Compression and columnar tend to go together because, when you focus on representing aggregates, you can greatly reduce the number of bits for providing the data you need, such as averages, means, or outliers. Almost every column store employs some form of compression with ratios in the double-digit to 1 territory common. BLU is differentiated by a feature that IBM calls “actionable compression:” you can read compressed data without de-compressing it first, which significantly boosts performance because you can avoid de-compress/re-compress compute cycles.
• Data skipping – Many analytic data stores incorporate algorithms for minimizing data scans, with BLU’s algorithms doing so by ferreting out non-relevant data.
There are more optimizations under the hood. For instance, BLU tiers active columnar data into and out of memory and/or Flash (solid state disk) drives. And while in memory, BLU optimizes processing so that several columns can be crammed into a single memory register; that may sound quite geeky, but this design pattern is a key ingredient to accelerating throughput.
IBM contends that its in-memory and Flash optimizations are “good enough” to the point that a 100% in-memory PureData appliance to counter SAP HANA is not likely. But for Flash, never say never. In our view, given rapidly declining prices, we wouldn’t be surprised to see IBM at some point come out with an all-Flash unit.
Again, what does this mean for SQL and the DBA?
Now, back to our original question: When performance is accelerated to such an extent, does it really matter whether you’ve structured your tables, tuned your database, or formed your SQL statements properly? At first blush, that sounds like a rather academic question, but consider that time spent modeling databases and optimizing queries is time diverted from taking on new problems that could cut into the development backlog. And there is historical precedent; in SQL’s early days, conventional wisdom was that it required so much processing overhead (compared to hierarchical file systems that prevailed at the time) that it would never scale for the enterprise. Well, Moore’s Law brute forced the solution; SQL processing didn’t get that much more efficient, but hardware got much more powerful. Will on-demand SQL acceleration do the same for database modeling and SQL querying? Will optimization and automation make DBAs obsolete?
It seemed sacrilegious that, nearing the 40th anniversary of SQL, that such a question was posed at the very place where the technology was born.
But matters aren’t quite so black and white; as one set of problems get solved, broader ones emerge. For the DBA, the multiple personalities of data platforms are changing the nature of problem-solving: instead of writing the best SQL statement, focus on defining and directing the right query, to the right data, on the right engine, at the right time.
For instance, a hot new mobile device is released to the market with huge fanfare, sales initially spike before unexpectedly dropping through the floor. Such a query might fuse SQL (from the CRM analytic system) with sentiment analysis (to see what customers and prospects were saying), graph analysis (to understand who is friends with, and influences, who), and time series (to see how sentiment changed over time). The query may run across SQL, Hadoop, and possibly another specialized data store.
Admittedly, there will be a significant role for automation to optimize such queries, but the trend points to a bigger reality for DBAs where they don’t worry as much about SQL schema or syntax per se, but focus more on optimizing (with the system’s help) data and queries in more global terms.
Flattening of Big Data architecture has become something of an epidemic. The largeness of Big Data has forced the middle and bottom layers of the stack – analytics and data – to converge; the accessibility of SQL married to the scale of Hadoop has driven a similar result. And now we’re seeing the top, middle, and in some cases lower levels of the stack converging with BI and transformation atop an increasingly ambitious data tier.
It began with the notion of making BI more self-service; give ordinary people the ability to make ad hoc queries without waiting for IT to clear its backlog. Tools like Tableau, QlikTech, and Spotfire have popularized visualization with intuitive front ends backed typically by some form of data caching mechanism for materializing views PDQ. Originally these approaches may have amounted to putting lipstick on a pig (e.g., big, ugly, complex SQL databases), but in many cases, these tools are packing more back end functionality to not simply paint pictures, but quickly assemble them. They are increasingly embedding their own transformation tools. While they are eliminating the ETL tier, they are definitely not eliminating the “T” – although that message tends to get blotted out by marketing hyperbole. That in turn is leading to the next step, which is elevating cache to becoming full-bore in-memory databases. So now you’ve collapsed, not only the ETL middle tier, but the data back end tier. We’re seeing platforms that would otherwise be classed Advanced SQL or NewSQL databases, like SiSense.
That phenomenon is also working its way on the NoSQL side where we see Platfora packaging, not only an in-memory caching tier for Hadoop, but also the means to marshal and transform data and views on the fly.
This is not simply a tale of flattening architecture for its own sake. The ramifications are basic changes to the analytics workflow and lifecycle. Instead of planning your data structures and queries ahead of time, generate schema and views on demand. Flattening the architecture facilitates this new style.
Traditionally with SQL – both for data warehousing and transaction systems – the process was completely different. You modeled the data and specified the tables at design time based on your knowledge of the content of the data and the queries and reports you were going to generate.
As you might recall, NoSQL was a reaction to the constraints of imposing a schema on the database at design time. By contrast, NoSQL did not necessarily do away with structure; it simply allowed the process to become more flexible. Collect the data and when it’s time to harvest it, explore it, discover the problem, and then derive the structure. And because in most NoSQL platforms, you still retain the data in raw form, you can generate a different schema as the nature of the problem, business challenge, or content of the data itself changes. Just run another series of MapReduce or similar processes to generate new views. Nonetheless, this view of flexible schema was borne with assumptions of a batch processing environment.
What’s changed is the declining cost of silicon-based storage: Flash (SSD) and memory (DRAM). That’s allowed those cute SQL D-I-Y visualization tools to morph into in-memory data platforms, because it was now cheap enough to gang terabytes of memory together. Likewise, it has cleared the way for Oracle and SAP to release in-memory platforms. And on the NoSQL side, it is making the notion of dynamic views from Hadoop thinkable.
We’re only at the beginning of the great rethink of analytic data views, schematizing processes, and architectural refactoring. It fires a shot across the bow to traditional BI players who have built their solutions on traditional schema at design rather than run time; in their case, it will require some significant architectural redesign. The old way will not disappear, as the contents of core end-of-period reporting and similar processes will not go away. There will always be a need for data warehouses and BI/reporting tools that provide repeatable, baseline query and reporting. And the analytic and data protection/housekeeping functions that are provided by established platforms will continue to be in demand. Astute BI and DW vendors should consider these new options as additive; they will be challenged by upstarts offering highly discounted pricing. For established vendors, the key is emphasizing the value add while they provide the means for taking advantage of the new, more flexible style of schema on demand.
Sadly, as we’re at the beginning of a new era of dynamic schema and dynamic analytics, there is also a lot of noise, like the dubious proposition that we can eliminate ETL. Folks, we are eliminating a tier, not the process itself. Even with Hadoop, when you analyze data, you inevitably end up forming it into a structure so you can grind out your analytics.
Disregard the noise and hype. You’re not going to replace your data warehouses for routine, mandated processes. But there will be new analytics will become more organic. It’s not simply a phenomenon in the Hadoop world, but with SQL as well. Your analytics infrastructure will flatten, and your schema and analytics will grow more flexible and organic.
Data warehousing and analytics have accumulated a reasonably robust set of best practices and methodologies since they emerged in the mid-1990s. Although not all enterprises are equally vigilant, the state of practices around data stewardship (e.g., data quality, information lifecycle management, privacy and security) is pretty mature.
With emergence of Big Data and new analytic data platforms that handle different kinds of data such as Hadoop, the obvious question is whether these practices still apply. Admittedly, not all Hadoop use cases have been for analytics, but arguably, the brunt of early implementations are. That reality is reinforced by how most major IT data platform household brands have positioned Hadoop: EMC Greenplum, HP Vertica, Teradata Aster and others paint a picture that Hadoop is an extension of your [SQL] enterprise data warehouse.
That provokes the following question: if Hadoop is an extension of your data warehouse or analytic platform environment, should the same data stewardship practices apply?
We’ll train our focus on quality. Hadoop frees your analytic data store of limits, both to quantity of data and structure, which were part and parcel of maintaining a traditional data warehouse. Hadoop’s scalability frees your organization to analyze all of the data, not just a digestible sample of it. And not just structured data or text, but all sorts of data whose structure is entirely variable. With Hadoop, the whole world’s an analytic theatre.
Significantly, with the spotlight on volume and variety, the spotlight has been off quality. The question is, with different kinds and magnitudes of data, does data quality still matter? Can you afford to cleanse multiple terabytes of data? Is “bad data” still bad?
The answers aren’t obvious. Traditional data warehouses treated “bad” data as something to be purged, cleansed, or reconciled. While the maxim “garbage in, garbage out” has been with us since the dawn of computing, the issue of data quality hit the fan when data warehouses provided the opportunity to aggregate more, diverse sources of data that was not necessarily consistent in completeness, accuracy, or structure. The fix was cleansing record by record based on the proposition that analytics required strict apples to apples comparisons.
Yet volume and variety of Hadoop data casts doubt on the practicality of traditional data hygiene practice. Remediating record by record will take forever, and anyway, it’s simply not going to be practice – or worthwhile – to cleanse log files which are highly variable (and low value) by nature. The variety of data, not only by structure, but also source, makes it more difficult to know what is the correct structure and form of any individual record. And given that individual machine data readings are often cryptic and provide little value except when aggregated at huge scale also militates against traditional practice.
So now Hadoop becomes a special case. However, given that Hadoop also supports a different approach to analytics, by reason, data should also be treated differently.
Exact Picture or Big Picture?
Quality in Hadoop becomes more of a broad spectrum of choice that depends on the nature of the application and the characteristics of the data – specifically, the 4 V’s. Is your application mission-critical? That might augur for a more vigilant practice of data quality, but that depends on whether the application requires strict audit trails and carries regulatory compliance exposure. In those cases, better get the data right. However, web applications such as searching engines or ad placement may also be mission-critical but not necessarily bring the enterprise to its knees if the data is not 100% correct.
So you’ve got to ask yourself the question: are you trying to get the big picture, or the exact one? In some cases, they may be different.
The nature of data in turn determines the practicality of cleansing strategies. More volume dictates against traditional record-by-record approaches, variety makes the job of clean sing more difficult, while high velocity makes it virtually impossible. For instance, high throughput complex event processing (CEP)/data streaming applications are typically implemented for detecting patterns that drive operational decisions; cleansing would add too much processing overhead for especially high-velocity/low latency apps. Then there’s the question of data value; there’s more value in a customer identity record an individual reading that is the output of a sensor.
A spectrum of data hygiene approaches
Enforcing data quality is not impossible in Hadoop. There are different approaches, that, depending on the nature of the data and application, may dictate different levels of cleansing or none at all.
A “crowdsourcing” approach widens the net of data collection to a larger array of sources with the notion that enough good data from enough sources will drown out the noise. In actuality, that’s been the de facto approach that has been taken with early adopters, and it’s a fairly passive one. But such approaches could be juiced up with trending analytics that dynamically track the sweet spot of good data to see if the norm is drifting.
Another idea is unleashing the power of data science, not only to connect the dots, but also correct them. We’re not suggesting that you turn your expensive (and rare) data scientists into data QA techs, but to apply the same methodologies for exploration to dynamically track quality. Other variants are applying approaches that apply cleansing logic, not at the point of data ingestion, but consumption; that’s critical for highly-regulated processes, such as assessing counter-party risk for capital markets. In one particular case, an investment bank used a rules-based, semantic domain model using the OMG’s Common Warehouse Model as a means for validating data consumed.
Bad Data may be good
Big Data in Hadoop may be different data, and may be analyzed differently. The same logic applies to “bad data” that in conventional terms appears as outlier, incomplete, or plain wrong. The operable question of why the data may be “bad” may yield as much value as analyzing data within the comfort zone. It’s the inverse of analyzing the drift over time of the sweet spot of good data. When there’s enough bad data, that makes it fair game for trending to check whether different components or pieces of infrastructure are drifting off calibration, or if the assumptions on what constitute “normal” conditions are changing. Like rising sea levels, typical daily temperature swings, for instance. Similar ideas could apply to human-readable data, where perceived outliers reflect flawed assumptions on the meaning of data, such as when conducting sentiment analysis. In Hadoop, bad data may be good.
In its rise to leadership of the ERP market, SAP shrewdly placed bounds around its strategy: it would stick to its knitting on applications and rely on partnerships with systems integrators to get critical mass implementation across the Global 2000. When it came to architecture, SAP left no doubt of its ambitions to own the application tier, while leaving the data tier to the kindness of strangers (or in Oracle’s case, the estranged).
Times change in more ways than one – and one of those ways is in the data tier. The headlines of SAP acquiring Sybase (for its mobile assets, primarily) and subsequent emergence of HANA, its new in-memory data platform, placed SAP in the database market. And so it was that at an analyst meeting last December, SAP made the audacious declaration that it wanted to become the #2 database player by 2015.
Of course, none of this occurs in a vacuum. SAP’s declaration to become a front line player in the database market threatens to destabilize existing relationships with Microsoft and IBM as longtime SAP observer Dennis Howlett commented in a ZDNet post. OK, sure, SAP is sick of leaving money on the table to Oracle, and it’s throwing in roughly $500 million in sweeteners to get prospects to migrate. But if the database is the thing, to meet its stretch goals, says Howlett, SAP and Sybase would have to grow that part of the business by a cool 6x – 7x.
But SAP would be treading down a ridiculous path if it were just trying to become a big player in the database market for the heck of it. Fortuitously, during SAP’s press conference on announcements of their new mobile and database strategies, chief architect Vishal Sikka tamped down the #2 aspirations as that’s really not the point – it’s the apps that count, and increasingly, it’s the database that makes the apps. Once again.
Back to our main point, IT innovation goes in waves; during emergence of client/server, innovation focused on database where the need was mastering SQL and relational table structures; during the latter stages of client/server and subsequent waves of Webs 1.0 and 2.0, activity shifted to the app tier, which grew more distributed. With emergence of Big Data and Fast Data, energy shifted back to the data tier given the efficiencies of processing data big or fast inside the data store itself. Not surprisingly, when you hear SAP speak about HANA, they describe an ability to perform more complex analytic problems or compound operational transactions. It’s no coincidence that SAP now states that it’s in the database business.
So how will SAP execute its new database strategy? Given the hype over HANA, how does SAP convince Sybase ASE, IQ, and SQL Anywhere customers that they’re not headed down a dead end street?
That was the point of the SAP announcements, which in the press release stated the near term roadmap but shed little light on how SAP would get there. Specifically, the announcements were:
• SAP HANA on BW is now going GA and at the low (SMB) end come out with aggressive pricing: roughly $3000 for SAP BusinessOne on HANA; $40,000 for HANA Edge.
• Ending a 15-year saga, SAP will finally port its ERP applications to Sybase ASE, with tentative target date of year end. HANA will play a supporting role as the real-time reporting adjunct platform for ASE customers.
• Sybase SQL Anywhere would be positioned as the mobile front end database atop HANA, supporting real-time mobile applications.
• Sybase’s event stream (CEP) offerings would have optional integration with HANA, providing convergence between CEP and BI – where rules are used for stripping key event data for persistence in HANA. In so doing, analysis of event streams could be integrated or directly correlating with historical data.
• Integrations are underway between HANA and IQ with Hadoop.
• Sybase is extending its PowerDesigner data modeling tools to address each of its database engines.
Most of the announcements, like HANA going GA or Sybase ASE supporting SAP Business suite, were hardly surprises. Aside from go-to-market issues, which are many and significant, we’ll direct our focus on the technology roadmaps.
We’ve maintained that if SAP were serious about its database goals, that it had to do three basic things:
1. Unify its database organization. The good news is that it has started down that path as of January 1 of this year. Of course, org charts are only the first step as ultimately it comes down to people.
2. Branding. Although long eclipsed in the database market, Sybase still has an identifiable brand and would be the logical choice; for now SAP has punted.
3. Cross-fertilize technology. Here, SAP can learn lessons from IBM which, despite (or because of) acquiring multiple products that fall under different brands, freely blends technologies. For instance, Cognos BI reporting capabilities are embedded into rational and Tivoli reporting tools.
The third part is the heavy lift. For instance, given that data platforms are increasingly employing advanced caching, it would at first glance seem logical to blend in some of HANA’s in-memory capabilities to the ASE platform; however, architecturally, that would be extremely difficult as one of HANA’s strengths –dynamic indexing – would be difficult to implement in ASE.
On the other hand, given that HANA can index or restructure data on the fly (e.g., organize data into columnar structures on demand), the question is, does that make IQ obsolete? The short answer is that while memory keeps getting cheaper, it will never be as cheap as disk and that therefore, IQ could evolve as near-line storage for HANA. Of course that begs the question as to whether Hadoop could eventually perform the same function. SAP maintains that Hadoop is too slow and therefore should be reserved for offline cases; that’s certainly true today, but given developments with HBase, it could easily become fast and cheap enough for SAP to revisit the IQ question a year or two down the road.
Not that SAP Sybase is sitting still with Hadoop integration. They are providing MapReduce and R capabilities to IQ (SAP Sybase is hardly alone here, as most Advanced SQL platforms are offering similar support). SAP Sybase is also providing capabilities to map IQ tables into Hadoop Hive, slotting IQ as alternative to HBase; in effect, that’s akin to a number of strategies to put SQL layers inside Hadoop (in a way, similar to what the lesser-known Hadapt is doing). And of course, like most of the relational players, SAP Sybase is also support the bulk ETL/ELT load from HDFS to HANA or IQ.
On SAP’s side for now is the paucity of Hadoop talent, so pitching IQ as an alternative to HBase may help soften the blow for organizations seeking to get a handle. But in the long run, we believe that SAP Sybase will have to revisit this strategy. Because, if it’s serious about the database market, it will have to amplify its focus to add value atop the new realities on the ground.
Informatica is within a year or two of becoming a $1 billion company, and the CEO’s stretch goal is to get to $3b.
Informatica has been on a decent tear. It’s had a string of roughly 30 consecutive growth quarters, growth over the last 6 years averaging 20%, and 2011 revenues nearing $800 million. Abbasi took charge back in 2004, lifting Informatica out of its midlife crisis by ditching an abortive foray into analytic applications, instead expanding from the company’s data transformation roots to data integration. Getting the company to its current level came largely through a series of acquisitions that then expanded the category of data integration itself. While master data management (MDM) has been the headliner, other recent acquisitions have targeted information lifecycle management (ILM), complex event processing (CEP), low latency messaging (ultra messaging), along with filling gaps in its B2B and data quality offerings. While some of those pieces were obvious additions, others such as ultra messaging or event processing were not.
CEO Sohaib Abbassi is talking about a stretch goal of $3 billion revenue. The obvious chunk is to deepen the company’s share of existing customer wallets. We’re not at liberty to say how much, but Informatica had a significant number of 6-figure deals. Getting more $1m+ deals will help, but on their own won’t triple revenue.
So how to get to $3 billion?
Obviously, two strategies: deepen the existing business while taking the original formula to expand the footprint of what’s data integration.
First, the existing business. Of the current portfolio, MDM is likely best primed to allow Informatica to more deeply penetrate the installed base. Most of its data integration clients haven’t yet done MDM, and it is not a trivial investment. And for MDM clients who may have started with a customer or product domain, there are always more domains to tackle. During Q&A, Abbasi listed MDM has having as much potential addressable market as the traditional ETL and data quality segments.
The addition of SAP and Oracle veteran Dennis Moore to the Informatica MDM team points to the classic tightrope for any middleware vendor that claims it’s not in the applications game – build more “solutions” or jumpstart templates to confront the same generic barrier that packaged applications software was designed to surmount: provide customers an alternative to raw toolsets or custom programming. For MDM, think industry-specific “solutions” like counter-party risk, or horizontal patterns like social media profiles. If you’re Informatica, don’t think analytic applications.
That’s part of a perennial debate (or rant) on whether middleware is the new enterprise application: you implement for a specific business purpose as opposed to technology project, such as application or data integration, and you implement with a product that offers patterns of varying granularity as a starting point. Informatica MDM product marketing director Ravi Shankar argues it’s not an application because applications have specific data models and logic that become their own de factor silos, whereas MDM solutions reuse the same core metadata engine for different domains (e.g., customer, product, operational process). Our contention? If it solves a business problem and it’s more than a raw programming toolkit, it’s a de facto application. If anybody else cares about this debate, raise your hand.
MDM is typically a very dry subject but demo’ing a social MDM straw man showing a commerce application integrated into Facebook perked Twitter debate among analysts in the room. The operable notion is that such a use of MDM could update the customer’s (some might say, victim’s) profile by the associations that they make in social networks. An existing Informatica higher educational client that shall remain anonymous actually used MDM to mine LinkedIn to prove that its grads got jobs.
This prompts the question, just because you can do it, should you. When a merchant knows just a bit too much about you – and your friends (who may not have necessarily opted in) – that more than borders on creepy. Informatica’s Facebook MDM integration was quite effective; as a pattern for social business, well, we’ll see.
So what about staking new ground? When questioned, Abbasi stated that Informatica had barely scratched the surface with productizing around several megatrend areas that it sees impacting its market: cloud, social media, mobile, and Big Data. More specifically:
• Cloud continues to be a growing chunk of the business. Informatica doesn’t have all of its tooling up in the cloud, but it’s getting there. Consumption of services from the Informatica Cloud continues to grow at a 100 – 150% annual run rate. Most of the 1500 cloud customers are new to Informatica. Among recent introductions are a wizard-driven Contact Validation service that verifies and corrects postal addresses from over 240 countries and territories. A new rapid connectivity framework further eases the ability of third parties to OEM Informatica Cloud services.
• Social media – there were no individual product announcements her per se, just that Informatica’s tools must increasingly parse data coming from social feeds. That covers MDM, data profiling and data quality. Much of it leverages HParser, the new Hadoop data parsing tool released late last year.
• Mobile – for now this is mostly a matter of making Informatica tools and apps (we’ll use the term) consumable on small devices. On the back end, there are opportunities for optimizing virtualizing and replicating data on demand to the edges of highly distributed networks. Aside from newly-announced features such as iPhone and Android support of monitoring the Informatica cloud, for now Informatica is making a statement of product direction.
• Big Data – Informatica, like other major BI and database vendors, have discovered big Data with a vengeance over the past year. The ability to extract from Hadoop is nothing special – other vendors have that – but Informatica took a step ahead with release of HParser last fall. In general there’s growing opportunity for tooling in a variety of areas touching Hadoop, with Informatica’s data integration focus being one of them. We expect to see extension of Informatica’s core tools to not only parse or extract from Hadoop, but increasingly, work natively inside HDFS on the assumption that customers are not simply using it as a staging platform anymore. We also see opportunities in refinements to HParser providing templates or other shortcuts for deciphering sensory data. ILM, for instance, is another obvious one. While Facebook et al might not archive or deprecate their Hadoop data, mere mortal enterprises will have to bite the bullet. Data quality in Hadoop in many cases may not demand the same degree of vigilance as SQL data warehouses, creating demand for lighter weight data profiling and cleansing tooling And for other real-time web centric use case, alternatives stores like MongoDB, Couchbase, and Cassandra may become new Informatica data platform targets.
What, no exit talk?
Abbasi commented at the end of the company’s annual IT analyst meeting that this was the first time in recent memory that none of the analysts asked who would buy Informatica when. Buttonholing him after the session, we got his take which, very loosely translated to Survivor terms, Informatica has avoided getting voted off the island.
At this point, Informatica’s main rivals – Oracle and IBM – have bulked up their data integration offerings to the point where an Informatica acquisition would no longer be gap filling; it would simply be a strategy of taking out a competitor – and with Informatica’s growth, an expensive one at that. One could then point to dark horses like EMC, Tibco, Teradata, or SAP (for obvious reasons we’ve omitted HP). A case might be made for EMC, or SAP if it remains serious in raising its profile as database player– but we believe both have bigger fish to fry. Never say never. But otherwise, the common thread is that data integration will not differentiate these players and therefore it is not strategic to their growth plans.
When we last left Oracle’s Big Data plans, there was definitely a missing piece. Oracle’s Big Data Appliance as initially disclosed at last fall’s OpenWorld was a vague plan that appeared to be positioned primarily as an appliance that would accompany and feed data to Exadata. Oracle did specify some utilities, such as an enterprise version of the open source R statistical processing program that was designed for multithreaded execution, plus a distribution of a NoSQL database based on Oracle’s BerkeleyDB as an alternative to Apache Hive. But the emphasis appeared to be extraction and transformation of data for Exadata via Oracle’s own utilities that were optimized for its platform.
As such, Oracle’s plan for Hadoop was competition, not for Cloudera (or Hortonworks), which featured a full Apache Hadoop platform, but EMC which offered a comparable, appliance-based strategy that pairs Hadoop with an Advanced SQL data store; and IBM, which took a different approach by emphasizing Hadoop as an analytics platform destination enhanced with text and predictive analytics engines, and other features such as unique query languages and file systems.
Oracle’s initial Hadoop blueprint lacked explicit support of many pieces of the Hadoop stack such as HBase, Hive, Pig, Zookeeper, and Avro. No more. With Oracle’s announcement of general availability of the Big Data appliance, it is filling in the blanks by disclosing that it is OEM’ing Cloudera’s CDH Hadoop distribution, and more importantly, the management tooling that is key to its revenue stream. For Oracle, OEM’ing Cloudera’s Hadoop offering fully fleshes out its Hadoop distribution and positions it as a full-fledged analytic platform in its own right; for Cloudera, the deal is a coup that will help establish its distribution as the reference. It is fully consistent with Cloudera’s goal to become the Red Hat of Hadoop as it does not aspire to spread its footprint into applications or frameworks.
Of course, whenever you put Oracle in the same sentence as OEM deal, the question of acquisition inevitably pops up. There are several reasons why an Oracle acquisition of Cloudera is unlikely.
1. Little upside for Oracle. While Oracle likes to assert maximum control of the stack, from software to hardware, its foray into productizing its own support for Red Hat Enterprise Linux has been strictly defensive; its offering has not weakened Red Hat.
2. Scant leverage. Compare Hadoop to MySQL and you have a Tale of Two Open Source projects. One is hosted and controlled by Apache, the other is hosted and controlled by Oracle. As a result, while Oracle can change licensing terms for MySQL, which it owns, it has no such control over Hadoop. Were Oracle to buy Cloudera, another provider could easily move in to fill the vacuum. The same would happen to Cloudera if, as a prelude to such a deal, it began forking from the Apache project with its own proprietary adds-ons or substitutions.
OEMs deals are a major stage of building the market. Cloudera has used its first mover advantage with Hadoop well with deals Dell, and now Oracle. Microsoft in turn has decided to keep the “competition” honest by signing up Hortonworks to (eventually) deliver the Hadoop engine for Azure.
OEM deals are important for attaining another key goal in developing the Hadoop market: defining the core stack – as we’ve ranted about previously. Just as Linux took off once a robust kernel was defined, the script will be identical for Hadoop. With IBM and EMC/MapR forking the Apache stack at the core file system level, and with niche providers like Hadapt offering replacement for HBase and Hive, there is growing variability in the Hadoop stack. However, to develop the third party ecosystem that will be vital to the development of Hadoop, a common target (and APIs for where the forks occur) must emerge. A year from now, the outlines of the market’s decision on what makes Hadoop Hadoop will become clear.
The final piece of the trifecta will be commitments from the Accentures and Deloittes of the world to develop practices based on specific Hadoop platforms. For now they are still keeping their cards close to their vests.
Hadoop World was sold out and it seemed like “For Hire” signs were all over the place –- or at least that’s what it said on the slides at the end of many of the presentations. “We’re hiring, and we’re paying 10% more than the other guys,” declared a member of the office of the CIO at JP MorganChase in a conference keynote. Not to mention predictions that there’s big money in big data. Or that Accel Partner’s announced a new $100 million venture fund for big data startups; Cloudera scored $40 million in D funding; and rival Hortonworks previously secured $20 million for Round A.
These are heady days. For some like Matt Asay it’s time to voice a word of caution for all the venture money pouring into Hadoop: Is the field bloating with more venture dollars than it can swallow?
The resemblance to Java 1999 was more than coincidental; like Java during the dot com bubble, Hadoop is a relatively new web-related technology undergoing its first wave of commercialization ahead of the buildup of the necessary skills base. We haven’t seen such a greenfield opportunity in the IT space in over a decade. And so the mood at the conference became a bit heady –– where else in the IT world today is the job scene a seller’s market?
Hadoop has come a long way in the past year. A poll of conference attendees showed at least 200 petabytes under management. And while Cloudera has had a decent logo slide of partners for a while, it is no longer the lonely voice in the wilderness for delivering commercial distributions and enterprise support of Hadoop. Within this calendar year alone, Cloudera has finally drawn the competition to legitimize Hadoop as a commercial market. You’ve got the household names from data management and storage -– IBM, Oracle, EMC, Microsoft, and Teradata — jumping in.
Savor the moment. Because the laws of supply and demand are going to rectify the skills shortage in Hadoop and MapReduce and the market is going to become more “normal.” Colleagues like Forrester’s Jim Kobielus predict Hadoop is going to enter the enterprise data warehousing mainstream; he’s also gone on record that interactive and near real-time Hadoop analytics are not far off.
Nonetheless, Hadoop is not going to be the end-all; with the learning curve, we’ll understand the use cases where Hadoop fits and where it doesn’t.
But before we declare victory and go home, we’ve got to get a better handle of what Hadoop is and what it can and should do. In some respects, Hadoop is undergoing a natural evolution that happens with any successful open source technology: there are always questions over what is the kernel and where vendors can differentiate.
Let’s start with the Apache Hadoop stack, which is increasingly resembling a huge brick wall where things are arbitrarily stacked atop one another with no apparent order, sequence, or interrelationship. Hadoop is not a single technology or open source project but –– depending on your perspective –– an ecosystem or a tangled jumble of projects. We won’t bore you with the full list here, but Apache projects are proliferating. That’s great if you’re an open source contributor as it provides lots of outlet for innovation, but if you’re at the consuming end in enterprise IT, the last thing you want is to have to maintain a live scorecard on what’s hot and what’s not.
Compounding the situation, there is still plenty of experimentation going on. Like most open source technologies that get commercialized, there is the question of where the open source kernel leaves off and vendor differentiation picks up. For instance, MapR and IBM each believe it is in the file system, with both having have their own answers to the inadequacies of the core Hadoop file system, (HDFS).
But enterprises need an answer. They need to know what makes Hadoop, Hadoop. Knowing that is critical, not only for comparing vendor implementations, but software compatibility. Over the coming year, we expect others to follow Karmasphere and create development tooling, and we also except new and existing analytic applications to craft solutions targeted at Hadoop. If that’s the case, we better know where to insist on compatibility. Defining Hadoop the way that Supreme Court justice Potter Stewart defined pornography (“I know it when I see it”) just won’t cut it.
Of course, Apache is the last place to expect clarity as that’s not its mission. The Apache Foundation is a meritocracy. Its job is not to pick winners, although it will step aside once the market pulls the plug as it did when it mothballed Project Harmony. That’s where the vendors come in –– they package the distributions and define what they support. What’s needed is not an intimidating huge rectangle showing a profile, but instead a concentric circle diagram. For instance, you’d think that the file system would be sacred to Hadoop, but if not, what are the core building blocks or kernel of Hadoop? Put that at the center of the circle and color it a dark red, blue, or the most convincing shade of elephant yellow. Everything else surrounds the core and is colored pale. We call upon the Clouderas, Hortonworks, IBMs, EMCs et al to step up the plate and define Hadoop.
Then there’s the question of what Hadoop does. We know what it’s done traditionally. It’s a large distributed file system that is used for offline, a.k.a., batch –– analytic runs grinding through ridiculous amounts of data. Hadoop literally chops huge problems down to size thanks a lot of things: it has a simple file structure and it brings computation directly to the data; leverages cheap commodity hardware; supports scaled-out clustering; has a highly distributed and replicated architecture; and uses the MapReduce pattern for dividing and pipelining jobs into lots of concurrent threads, and mapping them back to unity.
But we also caught a presentation from Facebook’s Jonathan Grey on how Hadoop and its HBase column store was adapted to real-time operation for several core applications at Facebook such as its unified messaging system, the polar opposite of a batch application. In summary, there were a number of brute force workarounds to make Hadoop and HBase more performant, such as extreme denormalization of data; heavy reliance on smart caching; and use of inverted indexes that point to the physical location of data, and so on. There’s little doubt that Hadoop won’t become a mainstream enterprise analytic platform until performance bottlenecks are addressed. Not surprisingly, there’s little doubt that the HBase Apache project is targeting interactivity as one of the top development goals.
Conversely, we also heard lots of mention about the potential for Hadoop to function as an online alternative to offline archiving. That’s fed by an architectural design assumption that Big Data analytic data stores allow organizations to analyze all the data, not just a sample of it. Organizations like Yahoo have demonstrated dramatic increases in click-through rates from using Hadoop to dissect all user interactions. That’s instead of using MySQL or other relational data warehouse that can only analyze a sampling. And the Yahoos and Googles of the world currently have no plan to archive their data –– they will just keep scaling their Hadoop clusters out and distributing them. Facebook’s messaging system –– which was used for rolling out real-time Hadoop, is also designed with the use case that old data will not be archived.
The challenge is that the same Hadoop cannot be all things to all people. Optimizing the same data store for interactive and online archiving is like violating the laws of gravity –– either you make the storage cheap or you make it fast. Maybe there will be different flavors of Hadoop, as data in most organizations outside the Googles, Yahoos, or Facebooks of the world is more mortal –– as are the data center budgets.
Admittedly, there is an emerging trend to brute force design databases for mixed workloads –– that’s the design pattern behind Oracle’s Exadata. But even Oracle’s Exadata strategy has limitations in that its design will be overkill for smaller-midsize organizations, and that is exactly why Oracle came out with the Oracle Database Appliance. Same engine, but optimized differently. As few organizations will have Google’s IT budget, Hadoop will also have to have personas –– one size won’t fit all. And the Hadoop community –– Apache and vendor alike –– has got to decide what Hadoop’s going to be when it grows up.
Tibco has been running on all cylinders of late. In earnings and revenues, it has kept up with the Joneses in the enterprise software neighborhood, running respectable 25% revenue and 30+% software license growth numbers in its most recent quarterly year over year results as we’ve noted in several of our recent Ovum research notes.
It is beginning to make the turn from its geeky roots towards more solution selling to the business side in tone and deed. Ever since the 2007 Spotfire acquisition – which brought real-time analytic visualizations – it has made several buys that are more targeted to the business rather than strictly the IT or CIO side. They include Netrics, for fuzzy logic technologies for pattern matching; Loyalty Lab, for managing customer affinity programs; and Nimbus, a recent addition, which adds process discovery and management of manual activity that comprise the other 80% of what happens inside an enterprise.
Of course it’s not as if Tibco were trying to pull an HP in doing a 180 on its business strategy (heaven forbid, we don’t need any more senseless Silicon Valley soap operas!). Core infrastructure plays, such as FTL ultra low latency messaging or the DataSynapse data grid, remain core to Tibco’s 2-second advantage mission. It’s just that, in modest but growing cases, the raw technology is being packaged as a black box underneath more business-focused solutions. For instance, Tibco is packaging solutions for retail such as Active Catalog and Active Fulfillment that underneath the hood bundle Tibco Business Events (CEP), Active Matrix BPM, and other pieces.
Of course, such transformations don’t come overnight, as there is the need to get field sales up to speed and accustomed to calling on new entry points at target prospect. Not surprisingly, Tibco is also ramping up vertical solutions, but on an opportunistic basis. An example: we met with a European telco customer that is using Business Events for monitoring devices (in this case, water meters) which may present an opportunity for Tibco to develop an M2M (machine-to-machine) event-driven integration solution that could be more widely applied to segments such as utilities or logistics/transportation.
Several of its recent acquisitions, such as Foresight, a healthcare payer EDI gateway; Open Sprit, for data integration for upstream oil and gas processing, are strictly vertical plays. Loyalty Labs, which provides analytics for customer affinity programs, has helped make retail one of its fastest growing verticals coming from a near-zero client base a few years back.
Tibco is traveling a similar road as IBM, but is starting from much earlier point in developing vertical solutions. As Tibco lacks the professional services presence of IBM, it has to cherry pick its vertical opportunities.
At this point, the major disrupters for Tibco are big data and mobility.
For mobile the challenge is integrating alerts from Tibco’s Business Events and Spotfire engines to clients; tibbr, its internal collaboration messaging platform, provides the logical environment for bringing its events feed out to mobile devices. This could be bolstered with its recent Nimbus acquisition, both for input (process discovery, using mobile devices to snap a picture, for instance) and output (for communicating how to perform manual processes out to the field).
Big data positioning and productization for Tibco is also a work in progress. Its message busses can in some cases handle enormous amounts of data; its business event engine could also provide feeds if Tibco can make the sensing agent more lightweight; its BPM offering could be configured to get triggered based on specific event patterns that may involve crunching of enormous volumes of event feeds.
But there is a brave new world of variably structured data that is becoming fair game for enterprises to sense and respond. We don’t expect Tibco to buy its own Advanced SDQL platform or create its own Hadoop distribution, as Tibco is not about data at rest, nor is it a database player (OK, its MDM offering does have to store master and reference data). Nonetheless, delivering the 2-second advantage in a big world where the data is getting bigger, bigger, and more heterogeneous raises the urgency for Tibco to distinguish itself in extending its visibility.
When we were asked by the executive marketing team of our impressions this year, our thoughts were, well, there was hardly anything newsworthy. That’s not necessarily a bad thing, as during a strategy roadmap presentation at this year’s Tibco TUCON conference, a timeline of Tibco acquisitions showed roughly a half dozen entries for 2010 and just one for this year. Over the past year Tibco has been preoccupied with absorbing the new acquisitions and so – Nimbus excluded – has not been active on this front lately. For instance, Tibco has integrated the Netrics fuzzy pattern matching engine into Business Events, where it belongs.. It has similarly blended the recently acquired data grid technology with Business Events. Check out Sandy Kemsley’s post for a more detailed blow-by-blow on how Tibco has rounded out its product portfolio over the past year.
With the swoon on Wall Street, Tibco has left its $250 cash stash alone, in spite of the fact that there are plenty of acquisition targets available at reasonable prices right now as a lot of venture funds are looking for exits. By its CFO’s words, the company is not as enormous as IBM or Oracle, where acquisitions don’t disrupt the entire company. Nonetheless, we expect that 2012 will grow more active in acquisitions – we hope that acquisition of a data quality provider makes the top of the shopping list.
HP chose the occasion of its Q3 earnings call to drop the bomb. The company that under Mark Hurd’s watch focused on Converged Infrastructure, spending almost $7 billion to buy Palm, 3COM, and 3PAR, is now pulling a 180 in ditching both the PC and Palm hardware business, and making an offer to buy Autonomy, one of the last major independent enterprise content management players, for roughly $11 billion.
At first glance, the deal makes perfect sense, given Leo Apotheker’s enterprise software orientation. From that standpoint, Apotheker has made some shrewd moves, putting aging enterprise data warehouse brand Neoview out of its misery, following up weeks later with the acquisition of Advanced SQL analytics platform provider Vertica. During the Q3 earnings call, Apotheker stated the obvious as to his comfort factor with Autonomy: “I have spent my entire professional life in software and it is a world that I know well. Autonomy is very complementary.”
There is potential synergy between Autonomy and Vertica, with Autonomy CEO Mike Lynch (who will stay on as head of the unit, reporting to Apotheker) that Autonomy’s user screens provide the long missing front end to Vertica, and that both would be bound by a common “information layer.” Of course, the acquisition not being final, he did not give details on what that layer is, but for now we’d assume that integration will be at presentation and reporting layer. There is clearly a lot more potential here — Vertica for now only holds structured data while Autonomy’s IDOL system holds everything else. In the long run we’d love to see federated metadata and also an extension of Vertica to handle unstructured data, just as Advanced SQL rivals like Teradata’s Aster Data already do.
Autonomy, according to my Ovum colleague Mike Davis who has tracked the company for years, is one of only three ECM providers that have mastered the universal document viewer – Oracle’s Stellent and an Australian open source player being the others. In contrast to HP (more about that in a moment), Autonomy is quite healthy with the latest quarterly revenues up 16% year over year, operating margins in the mid 40% range, and a run rate that will take the company to its first billion dollar year.
Autonomy is clearly a gem, but HP paid dearly for it. During Q&A on the earnings call, a Wall street analyst took matters back down to earth, asking whether HP got such a good deal, given that it was paying roughly 15% of its market cap for a company that will only add about 1% to its revenues.
Great, expensive acquisition aside, HP’s not doing so well these days. Excluding a few bright spots, such as its Fortify security software business, most of HP’s units are running behind last year. Q3 net revenue of $31.2 billion was up only 1% over last year, but down 2% when adjusted for constant currency. By contrast, IBM’s most recent results were up 12% and 5%, respectively, when currency adjusted. Dennis Howlett tweeted that it was now HP’s turn to undergo IBM’s near-death experience.
More specifically, HP Software was the bright spot with 20% growth year over year and 19.4% operating margin. By contrast, the printer and ink business – long HP’s cash cow – dropped 1% year over year with the economy dampening demand from the commercial side, not to mention supply chain disruptions from the Japanese tsunami.
By contrast, services grew only 4%, and is about to kick in yet another round of transformation. John Visenten, who ran HP’s Enterprise services in the Americas region, comes in to succeed Ann Livermore. The problem is, as Ovum colleague John Madden states it, HP’s services “has been in a constant state of transformation” that is making some customers’ patience wear thin. Ever since acquiring EDS, HP has been trying – and trying – to raise the legacy outsourcing business higher up the value chain, with its sights literally set in the cloud.
The trick is that as HP tries aiming higher up the software and services food chain, it deals with a market that has longer sales cycles and long-term customer relationships that prize stability. Admittedly, when Apotheker was named CEO last fall, along with enterprise software veteran ray Lane to the board, the conventional wisdom was that HP would train its focus on enterprise software. So to that extent, HP’s strategy over the past 9 months has been almost consistent – save for earlier pronouncements on the strategic role of the tablet and WebOS business inherited with Palm.
But HP has been around for much longer than 9 months, and its latest shifts in strategy must be viewed with a longer perspective. Traditionally an engineering company, HP grew into a motley assortment of businesses. Before spinning off its geeky Agilent unit in 1999, HP consisted of test instruments, midrange servers and PCs, a token software business, and lest we forget, that printer business. Since then:
• The 2001 acquisition of Compaq that cost a cool $25 billion, under Carly Fiorina’s watch. That pitted it against Dell and caused HP to assume an even more schizoid personality as consumer and enterprise brand.
• Under Mark Hurd’s reign, software might have grown a bit (they did purchase Mercury after unwittingly not killing off their OpenView business), but the focus was directed at infrastructure – storage, switches, and mobile devices as part of the Converged Infrastructure initiative.
• In the interim, HP swallowed EDS, succeeding at what it failed to do with its earlier ill-fated pitch for PwC.
Then (1) Hurd gets tossed out and (2) almost immediately lands at Oracle; (3) Oracle pulls support for HP Itanium servers, (4) HP sues Oracle, and (5) its Itanium business sinks through the floor.
That sets the scene for today’s announcements that HP is “evaluating a range of options” (code speak for likely divestment) for its PC and tablet business – although it will keep WebOS on life support as its last gasp in the mobile arena. A real long shot: HP’s only hope for WebOS might be Android OEMs not exactly tickled pink about Google’s going into the handset business by buying Motorola’s mobile unit.
There is logical rationale for dropping those businesses – PCs have always been a low margin business in both sales and service, in spite of what it claimed to be an extremely efficient supply chain. Although a third of its business, PCs were only 13% of HP’s profits, and have been declining in revenue for several years. PCs were big enough to provide a distraction and low enough margin to become a drain. And with Palm, HP gained an eloquent OS, but with a damaged brand that was too late to become the iOS alternative – Google had a 5-year headstart. Another one bites the dust.
Logical moves, but it’s fair to ask, what is an HP? Given HP’s twists, turns, and about-faces, a difficult one to answer. OK, HP is shedding its consumer businesses – except printers and ink because in normal times they are too lucrative – but HP still has all this infrastructure business. It hopes to rationalize all this in becoming a provider of cloud infrastructure and related services, with a focus on information management solutions.
As mentioned above, enterprises crave stability, yet HP’s track record over the past decade has been anything but. To be an enterprise provider, technology providers must demonstrate that they have a consistent strategy and staying power because enterprise clients don’t want to be left with orphaned technologies. To its credit, today’s announcements show the fruition of Apotheker’s enterprise software-focused strategy. But HP’s enterprise software customers and prospects need the assurance that HP won’t pull another about face when it comes time for Apotheker’s successor.
Postscript: Of course we all know how this one ended up. One good 180 deserved another. Exit Apotheker stage left. Enter Meg Whitman stage right. Reality has been reversed.
« Previous entries Next Page » Next Page »