Words and Pictures

Data is one of those things that tends to get hidden behind black boxes. Before data becomes accessible, somebody must jump through hoops either to access the data or to make access intuitive for the rest of us. There’s been no shortage of tools over the years that hide the SQL, make non-SQL data sources look like SQL, and so on. Data transformation and integration remained an exceptionally thorny problem until, just over a decade ago, the emergence of data warehousing demanded a better solution, and Informatica invented it.

Informatica’s innovation was borrowing visual development techniques from 4GL RAD tools and backing them with a metadata engine to make data transformations visual and reusable. For his latest act, Informatica founder Gaurav Dhillon has returned to his roots, pushing ETL into the Web 2.0 world. His new venture, SnapLogic, exploits the emergence of RESTful web services, a.k.a. Web-Oriented Architecture (WOA), which represents data not as columns or rows in a relational table, but as web links that are searchable by Google. It combines that with a Microsoft Excel-like front end recognizable to power users, rather than the 4GL metaphors used by developers, and places the technology atop a commodity open source Apache Tomcat Java server. Using these technologies, you can connect to a growing array of data services, such as those offered by commercial providers like StrikeIron, or those that are already in the public or open source domain.

SnapLogic’s latest deals reveal its goal of making ETL available to small and midsize businesses (SMBs) that previously judged the technology too expensive and complex. It has inked deals with SugarCRM to provide a high-level front end that hides the complexity of its web services, and it has signed a deal to host its ETL services in Amazon’s EC2 compute cloud. The goal is to make the process as low-touch as possible.

Significantly, none of what SnapLogic is doing is totally new. Providers like CastIron commoditize, in an appliance, the most popular data transformations that Informatica and others already perform, while Salesforce simplifies access to its web services with features such as Microsoft Outlook plug-ins. The difference is SnapLogic’s go-to-market strategy and its leveraging of REST and open source, which makes the technology more platform-independent and more affordable than even Salesforce’s low prices (low, that is, compared to Siebel et al).

It’s an auspicious start, but there’s no free lunch. RESTful services are certainly less complex and lighter weight than SOAP-based web services, as you don’t have complex headers and a bewildering array of standards to contend with. REST is simply about data access, retrieval, and updating, which is both its greatest strength and its greatest weakness. If all you want is data services, REST does the job far more efficiently than SOAP calls to WSDL-described services. However, REST is not extensible in the way web services are, meaning you cannot make safeguards such as authentication, authorization, or access control intrinsic to it. In a risk-averse, increasingly compliance-driven business world, where leaks of consumer credit card numbers have grown all too routine, you need to incorporate safeguards. Add to that the dangers of the unprotected cloud, which Computerworld’s Ephraim Schwartz has reported in detail.
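
To make the weight difference concrete, here is a rough sketch in Python of what the same read looks like in each style. Everything specific in it is an assumption made for illustration: the data.example.com host, the customers resource, and the GetCustomer operation are hypothetical and are not SnapLogic's or anyone else's actual interface. The sketch only builds the requests; it sends nothing over the wire.

```python
from urllib.parse import urlencode
from xml.etree import ElementTree as ET

# Hypothetical host, used only for illustration.
BASE_URL = "https://data.example.com"

# SOAP 1.1 envelope namespace.
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"


def rest_request(resource, record_id, **filters):
    """Build a REST-style read: the URL itself names the data."""
    query = "?" + urlencode(filters) if filters else ""
    return "GET", f"{BASE_URL}/{resource}/{record_id}{query}"


def soap_request(operation, **params):
    """Build a comparable SOAP 1.1 call: an XML envelope POSTed to a service."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    # The Header element is the extension point where security and other
    # add-on standards attach their own blocks.
    ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, operation)  # hypothetical operation name
    for name, value in params.items():
        ET.SubElement(call, name).text = value
    headers = {
        "Content-Type": "text/xml; charset=utf-8",
        "SOAPAction": f'"{operation}"',
    }
    return "POST", f"{BASE_URL}/services", headers, ET.tostring(envelope)


if __name__ == "__main__":
    print(rest_request("customers", "42", fields="name"))
    print(soap_request("GetCustomer", id="42"))
```

The REST call is nothing more than a verb and a link, which is why it is so easy to consume. The SOAP call carries an envelope with a header slot, and that slot is where the extensibility discussed above lives. With the REST form, any safeguard has to come from somewhere outside the request format itself.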

The good news is that the problem is so general and widespread as to lend itself to the same kind of open source commoditization that could spawn a new generation of partners with which the SnapLogics of the world could round out their vision.