Turn on the ignition of your car, back out of the parking space, and shift into drive. As you engaged the transmission, gently tapped the accelerator, and stepped on the brake, you didn't directly interact with the powertrain. Instead, your actions were detected by sensors and executed by actuators on electronic control units, which in turn got the car to shift, move, and stop.
Although Toyota's 2009-10 recalls were ultimately traced to misadjusted accelerator controls, speculation around the recalls turned the spotlight on the prominent role of embedded software, prompting the realization that when you operate your car today, you are driving by wire.
Today's automobiles increasingly look like consumer electronics products. They contain nearly as much software as an iPhone, and in the future will contain even more. According to IDC, the market for embedded software designed into engineered products (such as cars, refrigerators, airplanes, and consumer electronics) will double by 2015.
Automobiles are only the tip of the iceberg when it comes to smart products; today most engineered products, from refrigerators to industrial machinery to aircraft, feature smart control. Adding intelligence allows designers to develop flexible control logic that brings more functionality to products and provides ways to optimize operation for savings in weight, bulk, and cost.
Look at the hybrid car: to function, the battery, powertrain, gasoline and electric engines, and braking systems must all interoperate to attain fuel economy. It takes software to determine when to run the electric engine and when to recharge the battery. The degree of interaction between components is greater than in traditional electromechanical product designs. Features such as anti-lock braking or airbag deployment depend on processing data from multiple sources: wheel rotation, deceleration rate, steering, and so on.
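To make the idea of software-driven control concrete, here is a toy sketch of the kind of decision a hybrid powertrain controller makes on every control cycle. The function name, thresholds, and modes are all illustrative inventions, not any real vehicle's logic:

```python
# Toy sketch (not any real vehicle's control strategy) of the kind of
# decision a hybrid powertrain controller makes each control cycle.
def select_power_source(speed_kmh, battery_pct, demand_kw):
    """Pick a drive mode from simple, made-up thresholds."""
    if battery_pct < 20:
        return "gas_engine_and_charge"  # engine drives the wheels and recharges the battery
    if speed_kmh < 40 and demand_kw < 20:
        return "electric_only"          # low speed, low load: run on the battery
    return "blended"                    # otherwise combine both power sources

# City driving with a healthy battery stays electric.
print(select_power_source(speed_kmh=30, battery_pct=80, demand_kw=10))  # electric_only
```

The point is not the thresholds themselves but that they live in software, so they can be retuned without touching a single mechanical part.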
The growth of software content changes the ground rules for product development, which has traditionally been a very silo'ed process. There are well-established disciplines in mechanical and electrical engineering, each with its own set of tools, not to mention its own claim to ownership of the product design. Yet with software playing the role of the "brains" of product operation, engineering disciplines need to work more interactively across silos rather than rely on systems engineers to crack the whip on executing the blueprint.
We were reminded of this after a rather enjoyable, freewheeling IEEE webcast that we had with IBM Rational’s Dominic Tavasolli last week.
Traditionally, product design fell under the mechanical engineering domain, which designed the envelope and specified the geometry, components, materials, physical properties (such as resistance to different forms of stress) and determined the clearance within which electronics could be shoehorned.
Drill down deeper and you’ll note that each engineering domain has its full lifecycle of tools. It’s analogous to enterprise software development organizations, where you’ll often stumble across well entrenched camps of Microsoft, Java, and web programmers. Within the lifecycle there is a proliferation of tools and languages to deal with the wide variety of engineering problems that must be addressed when developing a product. Unlike the application lifecycle, where you have specific tools that handle modeling or QA, on the engineering side there are multiple tools because there are many different ways to simulate a product’s behavior in the real world to perform the engineering equivalent of QA. You might want to test mechanical designs for wind shear, thermal deformation, or compressive stresses, and electrical ones for their ability to handle voltage and disperse heat from processing units.
Now widen out the picture. Engineering and manufacturing groups each have their own definition of the product, expressed in the bill of materials (BOM): engineering's BOM details the design hierarchy, while the manufacturing BOM itemizes the inventory materials and the manufacturing processes needed to fabricate and assemble the product. That sets the stage for the question of who owns the product lifecycle management (PLM) process: the CAD/CAM folks or the ERP folks.
Into the mix between the different branches of engineering, and the silos between engineering and manufacturing, now introduce the software engineers. They used to be an afterthought, yet today their programs affect not only how product components and systems behave, but in many cases the physical specifications themselves. For instance, if software can make a motor run more efficiently, the mechanical engineers can then design a smaller, lighter engine.
In the enterprise computing world, we've long gotten hung up on the silos that divide different parts of IT from each other (developers vs. QA, DBAs, enterprise architects, systems operations) or IT from the business. However, the silos that plague enterprise IT are child's play compared to the situation in product development, where engineering groups are paired off against each other, and against manufacturing.
OK, so the product lifecycle is a series of fiefdoms; why bother making it more efficient? Because there is too much at stake in the success of a product: there are constantly escalating pressures to squeeze time, defects, and cost out of the product lifecycle. That's been the routine ever since the Japanese introduced American manufacturers to lean concepts back in the 1980s. But as automobiles and other complex engineered products add more intelligence, the challenge is leveraging the rapid innovation of the software and consumer electronics industries in product sectors where, of necessity, lead times stretch to a year or more.
There is no easy solution because there is no single solution. Each industry has different product characteristics that affect the length of the lifecycle and how product engineering teams are organized. Large, highly complex products such as automobiles, aircraft, or heavy machinery will have long lead times because of supply chain dependencies. At the other end of the scale, handheld consumer electronics or biomedical devices might not have heavy supply chain dependencies. But smart phones, for instance, have short product lifespans and are heavily driven by the fast pace of innovation in processing power and software capabilities, meaning that product lifecycles must be quicker for new products to catch the market window. Biomedical devices, on the other hand, are often compact but have significant regulatory hurdles to clear, which affects how the devices are tested.
The product lifecycle is a highly varied creature. The common thread is the need to integrate software engineering more effectively, which in turn forces the issue of integration and collaboration among the other engineering disciplines. It is no longer sufficient to rely on systems engineers to pull it all together at the end; as manufacturers have learned the hard way, it costs more to rework a design that doesn't fit together, perform well, or assemble readily with existing staff and facilities. The rapid evolution of software and processors also forces the issue of whether and where agile development processes can be coupled with the linear or hierarchical development processes that long-fuse products require.
There is no single lifecycle process that applies to all sectors, and no single set of tools that can perform every design and test function needed to get from idea to product. Ultimately the answer, loose as it is, is that larger product development organizations must work on the assumption that there are multiple sources of truth. The ALM and PLM worlds have at best worked warily at arm's length from each other, with a DMZ around requirements, change, and quality management. The reality is that no single constituency owns the product lifecycle; get used to federation that proceeds on rules of engagement that will remain industry- and organization-specific.
Ideally it would be great to integrate everything. Good luck. Outside of proprietary, vendor-specific frameworks, there is no associativity between tools that provides process-level integration. The best that can be expected at this point is integration at the data exchange level.
It’s a start.
Hadoop World was sold out, and it seemed like "For Hire" signs were everywhere, or at least that's what the slides at the end of many presentations said. "We're hiring, and we're paying 10% more than the other guys," declared a member of the office of the CIO at JPMorgan Chase in a conference keynote. Not to mention predictions that there's big money in big data. Or that Accel Partners announced a new $100 million venture fund for big data startups; Cloudera scored $40 million in Series D funding; and rival Hortonworks previously secured $20 million in its Series A.
These are heady days. For some like Matt Asay it’s time to voice a word of caution for all the venture money pouring into Hadoop: Is the field bloating with more venture dollars than it can swallow?
The resemblance to Java circa 1999 was more than coincidental; like Java during the dot-com bubble, Hadoop is a relatively new web-related technology undergoing its first wave of commercialization ahead of the buildup of the necessary skills base. We haven't seen such a greenfield opportunity in the IT space in over a decade. And so the mood at the conference became a bit heady: where else in the IT world today is the job scene a seller's market?
Hadoop has come a long way in the past year. A poll of conference attendees showed at least 200 petabytes under management. And while Cloudera has had a decent logo slide of partners for a while, it is no longer the lone voice in the wilderness delivering commercial distributions and enterprise support for Hadoop. Within this calendar year alone, Cloudera has finally drawn the competition needed to legitimize Hadoop as a commercial market. The household names of data management and storage (IBM, Oracle, EMC, Microsoft, and Teradata) are jumping in.
Savor the moment. Because the laws of supply and demand are going to rectify the skills shortage in Hadoop and MapReduce and the market is going to become more “normal.” Colleagues like Forrester’s Jim Kobielus predict Hadoop is going to enter the enterprise data warehousing mainstream; he’s also gone on record that interactive and near real-time Hadoop analytics are not far off.
Nonetheless, Hadoop is not going to be the end-all; as we climb the learning curve, we'll come to understand the use cases where Hadoop fits and where it doesn't.
But before we declare victory and go home, we've got to get a better handle on what Hadoop is and what it can and should do. In some respects, Hadoop is undergoing the natural evolution that happens with any successful open source technology: there are always questions over what constitutes the kernel and where vendors can differentiate.
Let's start with the Apache Hadoop stack, which increasingly resembles a huge brick wall where things are arbitrarily stacked atop one another with no apparent order, sequence, or interrelationship. Hadoop is not a single technology or open source project but, depending on your perspective, an ecosystem or a tangled jumble of projects. We won't bore you with the full list here, but Apache projects are proliferating. That's great if you're an open source contributor, as it provides lots of outlets for innovation, but if you're on the consuming end in enterprise IT, the last thing you want is to maintain a live scorecard of what's hot and what's not.
Compounding the situation, there is still plenty of experimentation going on. As with most open source technologies that get commercialized, there is the question of where the open source kernel leaves off and vendor differentiation picks up. For instance, MapR and IBM each believe it is in the file system, with each having its own answer to the inadequacies of the core Hadoop Distributed File System (HDFS).
But enterprises need an answer. They need to know what makes Hadoop, Hadoop. Knowing that is critical not only for comparing vendor implementations, but for software compatibility. Over the coming year, we expect others to follow Karmasphere in creating development tooling, and we also expect new and existing analytic applications to craft solutions targeted at Hadoop. If that's the case, we had better know where to insist on compatibility. Defining Hadoop the way Supreme Court justice Potter Stewart defined pornography ("I know it when I see it") just won't cut it.
Of course, Apache is the last place to expect clarity, as that's not its mission. The Apache Foundation is a meritocracy. Its job is not to pick winners, although it will step aside once the market pulls the plug, as it did when it mothballed Project Harmony. That's where the vendors come in: they package the distributions and define what they support. What's needed is not an intimidating huge rectangle showing a profile, but a concentric circle diagram. For instance, you'd think the file system would be sacred to Hadoop; if not, what are the core building blocks, the kernel, of Hadoop? Put that at the center of the circle and color it a dark red, blue, or the most convincing shade of elephant yellow. Everything else surrounds the core and is colored pale. We call upon the Clouderas, Hortonworks, IBMs, EMCs, et al. to step up to the plate and define Hadoop.
Then there's the question of what Hadoop does. We know what it's done traditionally: it's a large distributed file system used for offline (a.k.a. batch) analytic runs grinding through ridiculous amounts of data. Hadoop literally chops huge problems down to size thanks to a number of things: a simple file structure; bringing computation directly to the data; leveraging cheap commodity hardware; scaled-out clustering; a highly distributed and replicated architecture; and the MapReduce pattern for dividing and pipelining jobs into lots of concurrent threads, then mapping them back to unity.
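The MapReduce pattern itself is simple enough to sketch in a few lines of plain Python. This is a toy word-count illustration of the divide-and-recombine idea, not Hadoop's actual Java API; in a real cluster each map task would run on the node that holds its split of the data:

```python
from collections import Counter

def map_phase(chunk):
    # Map step: emit (word, count) pairs for one split of the input.
    return Counter(chunk.split())

def reduce_phase(partials):
    # Reduce step: merge the per-split counts back to a single result.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

# Each string stands in for one block of a huge distributed file.
chunks = ["to be or not to be", "be that as it may"]
partials = [map_phase(c) for c in chunks]  # in Hadoop, these run in parallel near the data
result = reduce_phase(partials)
print(result["be"])  # 3
```

The win is that the map steps are independent, so they scale out across as many cheap machines as you can rack up; only the final merge has to bring results back together.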
But we also caught a presentation from Facebook's Jonathan Grey on how Hadoop and its HBase column store were adapted to real-time operation for several core applications at Facebook, such as its unified messaging system, the polar opposite of a batch application. In summary, a number of brute-force workarounds were needed to make Hadoop and HBase more performant: extreme denormalization of data, heavy reliance on smart caching, use of inverted indexes that point to the physical location of data, and so on. There's little doubt that Hadoop won't become a mainstream enterprise analytic platform until those performance bottlenecks are addressed. Not surprisingly, the Apache HBase project has made interactivity one of its top development goals.
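Of those workarounds, the inverted index is the easiest to picture. Here is a toy Python sketch of the idea, mapping each term to the locations where it appears so a lookup can skip a full scan; the document ids and schema are invented for illustration, not Facebook's or HBase's actual design:

```python
# Toy inverted index: map each term to the set of "locations"
# (here, message ids) where it appears, so queries avoid scanning
# every record. All names are illustrative, not a real schema.
docs = {
    "msg1": "meet for lunch tomorrow",
    "msg2": "lunch was great",
}

index = {}
for doc_id, text in docs.items():
    for term in text.split():
        index.setdefault(term, set()).add(doc_id)

# A search consults the index instead of reading every message.
print(sorted(index["lunch"]))  # ['msg1', 'msg2']
```

In the real system the index entries point at physical storage locations rather than dictionary keys, but the principle is the same: trade extra write-time work and storage for fast reads.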
Conversely, we also heard plenty about the potential for Hadoop to function as an online alternative to offline archiving. That's fed by an architectural design assumption that Big Data analytic data stores allow organizations to analyze all the data, not just a sample of it. Organizations like Yahoo have demonstrated dramatic increases in click-through rates from using Hadoop to dissect all user interactions, rather than a MySQL or other relational data warehouse that can only analyze a sampling. And the Yahoos and Googles of the world currently have no plan to archive their data: they will just keep scaling their Hadoop clusters out and distributing them. Facebook's messaging system, the proving ground for real-time Hadoop, is likewise designed on the assumption that old data will not be archived.
The challenge is that the same Hadoop cannot be all things to all people. Optimizing the same data store for interactivity and online archiving is like trying to violate the laws of gravity: either you make the storage cheap or you make it fast. Maybe there will be different flavors of Hadoop, as data in most organizations outside the Googles, Yahoos, and Facebooks of the world is more mortal, as are the data center budgets.
Admittedly, there is an emerging trend to brute-force database designs for mixed workloads; that's the design pattern behind Oracle's Exadata. But even Oracle's Exadata strategy has limitations: its design is overkill for small to midsize organizations, which is exactly why Oracle came out with the Oracle Database Appliance. Same engine, optimized differently. As few organizations will have Google's IT budget, Hadoop will also have to have personas; one size won't fit all. And the Hadoop community, Apache and vendor alike, has got to decide what Hadoop's going to be when it grows up.