
Posted by Tom Fennelly on Nov 26, 2008 02:23 PM
Source -> Structured Event Stream (Visitor Logic) -> Result
Smooks can be used in one, or both, of the following ways:
In this article, we will do a whistle-stop tour of some of the capabilities provided by the Smooks v1.1 distribution, out of the box. By this we mean capabilities you can take advantage of without writing any code (ala mode #2 above). These include:
One of the key features of Smooks is the ability to easily configure it to process data of different formats (i.e. not just XML) in a standard way. This means that if you develop some custom Visitor Logic for Smooks, that code will immediately be able to process any of the supported data formats, just as the Smooks out of the box components (Java Binding etc) are able to do. Allied to this, if you develop a custom Reader implementation for a data format that is not supported out of the box (e.g. YAML), you immediately inherit the ability to use all available out of the box Visitor Logic (e.g. the Java Binding components) to process the data events generated from data streams of that type. This is possible because Smooks components process a standardized event stream (i.e. a canonical form).
Out of the box, Smooks provides support for processing XML, EDI, CSV, JSON and Java Objects. By default, Smooks reads the source data stream as XML (unless otherwise configured). The exception to this is Java Object Sources, which can be automatically recognized. For all other data format types, a "Reader" must be configured in the Smooks configuration. The following is an example of configuring the CSV reader:
<?xml version="1.0"?>
<smooks-resource-list xmlns="http://www.milyn.org/xsd/smooks-1.1.xsd"
                      xmlns:csv="http://www.milyn.org/xsd/smooks/csv-1.1.xsd">
    <csv:reader fields="firstname,lastname,gender,age,country" separator="|" quote="'" skipLines="1" />
</smooks-resource-list>
Readers for EDI, JSON etc. are similarly configured via their own configuration namespaces, i.e. <edi:reader></edi:reader>, <json:reader></json:reader> etc. These namespaced configurations are supported via the Extensible XSD based Configuration Model outlined earlier.
The job of the configured Reader is that of translating the source data stream into a structured data event stream (i.e. the canonical form - currently based on SAX2). Smooks listens to this stream of events, firing configured Visitor Logic (e.g. templating or binding resources) at the appropriate times.
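To make the Reader/Visitor split concrete, here is a minimal sketch using only the JDK's SAX interfaces. It is not Smooks code: the class and method names are hypothetical, the sample data is made up, and quote handling is left out. The element names (csv-set, csv-record, and one element per field) follow the names used in the configurations in this article.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch, not the Smooks implementation.
class CsvEventStreamSketch {

    // "Reader" side: translate delimited text into a stream of SAX events.
    static void readCsv(String csv, String[] fields, String separatorRegex,
                        int skipLines, ContentHandler handler) throws SAXException {
        AttributesImpl noAttrs = new AttributesImpl();
        handler.startDocument();
        handler.startElement("", "csv-set", "csv-set", noAttrs);
        String[] lines = csv.split("\n");
        for (int i = skipLines; i < lines.length; i++) {
            if (lines[i].trim().isEmpty()) continue;
            String[] values = lines[i].trim().split(separatorRegex, -1);
            handler.startElement("", "csv-record", "csv-record", noAttrs);
            for (int f = 0; f < fields.length && f < values.length; f++) {
                char[] text = values[f].toCharArray();
                handler.startElement("", fields[f], fields[f], noAttrs);
                handler.characters(text, 0, text.length);
                handler.endElement("", fields[f], fields[f]);
            }
            handler.endElement("", "csv-record", "csv-record");
        }
        handler.endElement("", "csv-set", "csv-set");
        handler.endDocument();
    }

    // "Visitor" side: sees only events, so it does not care that the source was CSV.
    static class FieldCollector extends DefaultHandler {
        private final String fieldName;
        private final StringBuilder text = new StringBuilder();
        final List<String> values = new ArrayList<>();

        FieldCollector(String fieldName) { this.fieldName = fieldName; }

        @Override public void startElement(String uri, String local, String qName, Attributes atts) {
            text.setLength(0); // start collecting text for the new element
        }
        @Override public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
        @Override public void endElement(String uri, String local, String qName) {
            if (qName.equals(fieldName)) values.add(text.toString());
        }
    }

    // Convenience wrapper using the field list from the CSV reader example.
    static List<String> collect(String csv, String field) {
        FieldCollector collector = new FieldCollector(field);
        String[] fields = {"firstname", "lastname", "gender", "age", "country"};
        try {
            readCsv(csv, fields, "\\|", 1, collector);
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return collector.values;
    }
}
```

Swapping readCsv for a reader of another format would leave FieldCollector untouched, which is the point of processing a canonical event stream.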
This is straightforward:
private Smooks smooks = new Smooks("/smooks-configs/customer-csv.xml");
public void transCustomerCSV(Reader csvSourceReader, Writer xmlResultWriter) {
smooks.filter(new StreamSource(csvSourceReader), new StreamResult(xmlResultWriter));
}
The Smooks.filter() method consumes the standard javax.xml.transform.Source and javax.xml.transform.Result types. The Smooks project also defines a number of new implementations.
XML is the easiest visualization of the event stream generated by a source data stream. So for an XML source, there's no real issue. For a non XML source (e.g. CSV), it's not so easy, because the source typically looks nothing like XML. To help with this, Smooks provides an Execution Report Generator tool. One of the uses of this tool is to help you visualize, as XML, the event stream generated by a non XML data source. It's also very useful as a debugging tool.
This report generation tool is injected into the Smooks ExecutionContext:
private Smooks smooks = new Smooks("/smooks-configs/customer-csv.xml");
public void transCustomerCSV(Reader csvSourceReader, Writer xmlResultWriter) {
ExecutionContext executionContext = smooks.createExecutionContext();
executionContext.setEventListener(new HtmlReportGenerator("target/report/report.html"));
smooks.filter(new StreamSource(csvSourceReader), new StreamResult(xmlResultWriter), executionContext);
}
The output of which is an HTML page as follows (in Smooks v1.1):
JBoss are in the process of building an Eclipse editor for Smooks as part of JBoss Tools. These tools will further simplify the process of visualizing, and working with, non XML data source event streams.
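Independently of the report tool, the idea of viewing a non XML event stream as XML can be sketched with the JDK alone: fire the events into an identity TransformerHandler and let it serialize them. This is only an illustration under assumptions (hypothetical class name, made-up sample data, no quote handling), and it says nothing about what the Smooks report itself looks like.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical sketch: serialize the events produced for a CSV source as XML.
class EventStreamAsXmlSketch {

    static String toXml(String csv, String[] fields, String separatorRegex, int skipLines) {
        try {
            SAXTransformerFactory factory = (SAXTransformerFactory) TransformerFactory.newInstance();
            TransformerHandler xmlWriter = factory.newTransformerHandler(); // identity: events in, XML out
            xmlWriter.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            xmlWriter.setResult(new StreamResult(out));

            AttributesImpl noAttrs = new AttributesImpl();
            xmlWriter.startDocument();
            xmlWriter.startElement("", "csv-set", "csv-set", noAttrs);
            String[] lines = csv.split("\n");
            for (int i = skipLines; i < lines.length; i++) {
                if (lines[i].trim().isEmpty()) continue;
                String[] values = lines[i].trim().split(separatorRegex, -1);
                xmlWriter.startElement("", "csv-record", "csv-record", noAttrs);
                for (int f = 0; f < fields.length && f < values.length; f++) {
                    char[] text = values[f].toCharArray();
                    xmlWriter.startElement("", fields[f], fields[f], noAttrs);
                    xmlWriter.characters(text, 0, text.length);
                    xmlWriter.endElement("", fields[f], fields[f]);
                }
                xmlWriter.endElement("", "csv-record", "csv-record");
            }
            xmlWriter.endElement("", "csv-set", "csv-set");
            xmlWriter.endDocument();
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not serialize event stream", e);
        }
    }
}
```

Each input line becomes a csv-record element whose children are named after the configured fields, which is the shape the selectors in the later configurations (e.g. csv-record/age) refer to.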
This use case is good in terms of demonstrating how a number of Smooks capabilities can be combined to perform a more complex task.
Continuing with the CSV example, we have the following basic requirements:
Smooks provides support for applying fragment based transforms using a number of popular templating technologies, including XSL and FreeMarker. Smooks also provides the ability to capture DOM NodeModels from the source event stream (again the source can be non XML), even when the SAX Filter is in use. With this, Smooks constructs "mini" DOM models from source data fragments and makes them available to other Smooks resources, such as FreeMarker templating and Groovy scripting resources. With this approach, you get some of the benefits of the DOM processing model, while still processing in a streamed environment. For the outlined use case, we will use FreeMarker as the templating technology.
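The "mini DOM inside a SAX stream" idea can be illustrated with plain JDK classes. The sketch below is not how DomModelCreator is implemented; it is a hypothetical handler that builds a small DOM tree for each selected fragment, hands it to a callback, and then lets go of it so that only one fragment is held at a time.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: per-fragment DOM models built while streaming with SAX.
class FragmentDomSketch {

    static class FragmentModelBuilder extends DefaultHandler {
        private final String fragmentName;
        private final Consumer<Element> onFragment;
        private final Document doc;
        private Node current; // null while we are outside a selected fragment

        FragmentModelBuilder(String fragmentName, Consumer<Element> onFragment) throws Exception {
            this.fragmentName = fragmentName;
            this.onFragment = onFragment;
            this.doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        }

        @Override public void startElement(String uri, String local, String qName, Attributes atts) {
            if (current == null && !qName.equals(fragmentName)) return; // not in a fragment: keep streaming
            Element element = doc.createElement(qName);
            for (int i = 0; i < atts.getLength(); i++) {
                element.setAttribute(atts.getQName(i), atts.getValue(i));
            }
            if (current != null) current.appendChild(element);
            current = element;
        }

        @Override public void characters(char[] ch, int start, int length) {
            if (current != null) current.appendChild(doc.createTextNode(new String(ch, start, length)));
        }

        @Override public void endElement(String uri, String local, String qName) {
            if (current == null) return;
            Node parent = current.getParentNode();
            if (parent == null) onFragment.accept((Element) current); // fragment root closed: publish the model
            current = parent;
        }
    }

    // Example consumer: pull one value out of each fragment model.
    static List<String> firstNames(String xml) {
        List<String> names = new ArrayList<>();
        try {
            FragmentModelBuilder builder = new FragmentModelBuilder("csv-record",
                    record -> names.add(record.getElementsByTagName("firstname").item(0).getTextContent()));
            SAXParserFactory.newInstance().newSAXParser().parse(new InputSource(new StringReader(xml)), builder);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return names;
    }
}
```

The callback gets random access within one record, as a template or script would, while the document as a whole is still processed as a stream.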
Smooks also provides out of the box support for routing data fragments (generated from source data fragments) to a number of different endpoint types, namely JMS, File and Database. As with everything else in Smooks, such capabilities can always be built on or replicated to other use cases e.g. plugging in a custom email routing Visitor component would be trivial. JBoss ESB (and other ESBs) provide custom Smooks Visitor components for performing fragment based ESB endpoint routing from inside a Smooks filtering process running on the ESB.
So configuring Smooks to fulfill the above use case is quite trivial:
<?xml version="1.0"?>
<smooks-resource-list xmlns="http://www.milyn.org/xsd/smooks-1.1.xsd"
                      xmlns:csv="http://www.milyn.org/xsd/smooks/csv-1.1.xsd"
                      xmlns:jms="http://www.milyn.org/xsd/smooks/jms-routing-1.1.xsd"
                      xmlns:ftl="http://www.milyn.org/xsd/smooks/freemarker-1.1.xsd">
      <params>
(1)       <param name="stream.filter.type">SAX</param>
      </params>
(2)   <csv:reader fields="firstname,lastname,gender,age,country" separator="|" quote="'" skipLines="1" />
(3)   <resource-config selector="csv-record">
          <resource>org.milyn.delivery.DomModelCreator</resource>
      </resource-config>
(4)   <jms:router routeOnElement="csv-record" beanId="csv_record_as_xml" destination="xmlRecords.JMS.Queue" />
(5)   <ftl:freemarker applyOnElement="csv-record">
(5.a)     <ftl:template>/templates/csv_record_as_xml.ftl</ftl:template>
          <ftl:use>
(5.b)         <ftl:bindTo id="csv_record_as_xml"/>
          </ftl:use>
      </ftl:freemarker>
</smooks-resource-list>
The FreeMarker template (5.a) can also be defined inline in the Smooks configuration (inside the <ftl:template></ftl:template> element), but in this case we define it in an external file:
<#assign csvRecord = .vars["csv-record"]> <#-- special assignment because csv-record has a hyphen -->
<customer fname='${csvRecord.firstname}' lname='${csvRecord.lastname}' >
    <gender>${csvRecord.gender}</gender>
    <age>${csvRecord.age}</age>
    <nationality>${csvRecord.country}</nationality>
</customer>
The above FreeMarker template references the
Smooks can be effectively used to populate Java Object models from any supported source data format. The populated Object model can be used as a result in its own right, or can be used as a model for a templating operation, i.e. the populated object models (stored in the bean context) are made available to the templating technologies (just like with the NodeModels).
Going with the CSV example again, we have a Customer Java class, as well as the Gender enum type (getters/setters omitted):
public class Customer {
private String firstName;
private String lastName;
private Gender gender;
private int age;
}
public enum Gender {
Male,
Female
}
The Smooks configuration for populating a list of this Customer object from the CSV stream would be as follows:
<?xml version="1.0"?>
<smooks-resource-list xmlns="http://www.milyn.org/xsd/smooks-1.1.xsd"
                      xmlns:csv="http://www.milyn.org/xsd/smooks/csv-1.1.xsd"
                      xmlns:jb="http://www.milyn.org/xsd/smooks/javabean-1.1.xsd">
(1)   <csv:reader fields="firstname,lastname,gender,age,country" separator="|" quote="'" skipLines="1" />
(2)   <jb:bindings beanId="customerList" class="java.util.ArrayList" createOnElement="csv-set">
(2.a)     <jb:wiring beanIdRef="customer" />
      </jb:bindings>
(3)   <jb:bindings beanId="customer" class="com.acme.Customer" createOnElement="csv-record">
          <jb:value property="firstName" data="csv-record/firstName" />
          <jb:value property="lastName" data="csv-record/lastName" />
          <jb:value property="gender" data="csv-record/gender" decoder="Enum" >
(3.a)         <jb:decodeParam name="enumType">com.acme.Gender</jb:decodeParam>
          </jb:value>
          <jb:value property="age" data="csv-record/age" decoder="Integer" />
      </jb:bindings>
</smooks-resource-list>
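For comparison, here is roughly what that binding configuration describes, written out by hand in plain Java. This is a sketch under assumptions rather than what Smooks generates or executes: the BindingSketch wrapper and its methods are hypothetical, the sample data is made up, and the Customer and Gender types are nested copies of the ones shown above so that the example stands alone.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical hand-written equivalent of the binding configuration.
class BindingSketch {

    enum Gender { Male, Female }

    static class Customer {
        String firstName;
        String lastName;
        Gender gender;
        int age;
    }

    private static final String[] FIELDS = {"firstname", "lastname", "gender", "age", "country"};

    // One "customer" bean per record, with the Enum and Integer decoding done explicitly.
    static Customer bind(Map<String, String> record) {
        Customer customer = new Customer();
        customer.firstName = record.get("firstname");
        customer.lastName = record.get("lastname");
        customer.gender = Enum.valueOf(Gender.class, record.get("gender")); // role of decoder="Enum"
        customer.age = Integer.parseInt(record.get("age"));                 // role of decoder="Integer"
        return customer;
    }

    // The "customerList" bean: created once per message, each customer wired into it.
    static List<Customer> bindAll(String csv) {
        List<Customer> customerList = new ArrayList<>();
        String[] lines = csv.split("\n");
        for (int i = 1; i < lines.length; i++) { // skip the header line, like skipLines="1"
            if (lines[i].trim().isEmpty()) continue;
            String[] values = lines[i].trim().split("\\|", -1);
            Map<String, String> record = new LinkedHashMap<>();
            for (int f = 0; f < FIELDS.length && f < values.length; f++) {
                record.put(FIELDS[f], values[f]);
            }
            customerList.add(bind(record));
        }
        return customerList;
    }
}
```

The hand-written version is tied to the pipe-delimited format; the configuration above binds from the event stream, so the same bindings would apply to any source that produces the same element names.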
Of course, a twist on the earlier Split, Transform and Route use case might be to route populated Customer objects to the JMS Queue, instead of XML generated by a FreeMarker template:
<?xml version="1.0"?>
<smooks-resource-list xmlns="http://www.milyn.org/xsd/smooks-1.1.xsd"
                      xmlns:csv="http://www.milyn.org/xsd/smooks/csv-1.1.xsd"
                      xmlns:jms="http://www.milyn.org/xsd/smooks/jms-routing-1.1.xsd"
                      xmlns:jb="http://www.milyn.org/xsd/smooks/javabean-1.1.xsd">
    <params>
        <param name="stream.filter.type">SAX</param>
    </params>
    <csv:reader fields="firstname,lastname,gender,age,country" separator="|" quote="'" skipLines="1" />
    <jms:router routeOnElement="csv-record" beanId="customer" destination="xmlRecords.JMS.Queue" />
    <jb:bindings beanId="customer" class="com.acme.Customer" createOnElement="csv-record">
        <jb:value property="firstName" data="csv-record/firstName" />
        <jb:value property="lastName" data="csv-record/lastName" />
        <jb:value property="gender" data="csv-record/gender" decoder="Enum" >
            <jb:decodeParam name="enumType">com.acme.Gender</jb:decodeParam>
        </jb:value>
        <jb:value property="age" data="csv-record/age" decoder="Integer" />
    </jb:bindings>
</smooks-resource-list>
And getting more complex, one could perform multiple routing operations for each csv-record, routing Customer Objects to the JMS Queue and FreeMarker generated XML messages to file.
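The shape of "several routes per record" can be sketched in plain Java. Nothing here is Smooks or JMS API: Route is a hypothetical type, and in-memory lists would stand in for the queue and the file. Each route pairs a transform of the record fragment with a destination, and every record is offered to every route as it streams past.

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical sketch of per-record fan-out to multiple destinations.
class MultiRouteSketch {

    // One route: transform the record fragment, then hand the result to a destination.
    static final class Route<R> {
        private final Function<String[], R> transform;
        private final Consumer<R> destination;

        Route(Function<String[], R> transform, Consumer<R> destination) {
            this.transform = transform;
            this.destination = destination;
        }

        void apply(String[] record) {
            destination.accept(transform.apply(record));
        }
    }

    // Split the message into records and apply every route to each one in turn.
    static void process(String csv, int skipLines, List<Route<?>> routes) {
        String[] lines = csv.split("\n");
        for (int i = skipLines; i < lines.length; i++) {
            if (lines[i].trim().isEmpty()) continue;
            String[] record = lines[i].trim().split("\\|", -1);
            for (Route<?> route : routes) {
                route.apply(record);
            }
        }
    }

    // Minimal XML escaping for text and single-quoted attribute values.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;").replace("'", "&apos;");
    }

    // Same output shape as the FreeMarker template shown earlier; expects all five fields.
    static String toCustomerXml(String[] r) {
        return "<customer fname='" + escape(r[0]) + "' lname='" + escape(r[1]) + "'>"
                + "<gender>" + escape(r[2]) + "</gender>"
                + "<age>" + escape(r[3]) + "</age>"
                + "<nationality>" + escape(r[4]) + "</nationality>"
                + "</customer>";
    }
}
```

A caller would register one Route whose destination stands for the JMS queue and another whose destination stands for the file; because each record is dispatched and then dropped, memory use does not grow with the size of the message.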
Inevitably, this question arises again and again. We have performed numerous ad hoc benchmarks on Smooks and our general findings were as outlined in the following subsections.
Smooks is in use in quite a few mission critical production environments today. Any time we receive queries re performance, it has always been due to a configuration issue (e.g. leaving Execution Report Generation turned on). Once resolved, users have always been very happy with performance. This is not very empirical, but does suggest to us that Smooks is not a "dog" in respect of performance.
The bottom line seems to be that Smooks Core is quite efficient, only adding a relatively low overhead on top of standard SAX based processing for XML. After that, performance depends on the configured Visitor Logic, what it is doing and how efficient it is.
The primary focus of Smooks v1.2 will be on providing more tools for processing of EDI messages. We also want to provide out of the box support for some of the more popular EDI message types.
As stated earlier, another important development for Smooks will be the work going on in the JBoss Tools project, where they are building an Eclipse Editor for Smooks.
Hopefully this article has given the reader a better insight into Smooks and its core capabilities. We hope people will download Smooks, take it for a spin, and provide feedback.
Sorry, I was interested until this article. SAX, DOM, heavy xml configuration? Joke.
Could you elaborate on that?
Maybe you could specifically address your comment to the use cases and problem areas Smooks is targeted at and tell us specifically why it is a bad approach and why the techs you listed are such a bad choice (and maybe what you think would be better). Is it just that XML, SAX etc are not "cool" enough? ;) It's very easy to make a comment like that, but it's also totally weightless. Taking the example problem looked at in the article (splitting a huge CSV message and routing the fragments as Java Objects or XML to a JMS destination).... I wouldn't consider the solution as being very "heavy" re configuration. What are you comparing Smooks to?
I've been very curious about Smooks for a while now. Thanks for the intro article.
oh and any plans to move to Stax Reading/Writing?
oh and any plans to move to Stax Reading/Writing?
I've done some playing with a Stax based filter and for some use cases, I think it can bring quite significant performance increases. For others, it's of no benefit at all. For many users however, it's actually irrelevant (performance aside) because unless they are actually writing custom visitor logic, they don't actually see the details of SAX, Stax etc... that's hidden away under the hood.
With our own custom code. With StAX, or a CSV JDBC driver and a programmed converter (in the Java binding case), the result will be of similar code length, not to mention performance, flexibility, and the refactoring difficulties due to hardcoded bean properties in XML. I work on ESB-like projects screaming for some standardized way of converting messages, but Smooks looks like clear overkill to me. Readers and writers for different formats are good, but the rest of Smooks only complicates the job.
So you're saying you believe: 1. Writing your own custom code (to do?), 2. Handcoding Stax (or manually using a CSV jdbc driver), 3. And "programmed converter" in java binding case (whatever that is exactly). Is more maintainable than: 1. Smooks configuration of 20 to 30 lines of XML. I guess we all have different ideas re what is maintainable and what is not.
Yes. Of course, we have our own little transformation library. Its usage code is shorter than any of the article examples. Is it even possible to use XML Schema to validate the sources and products of transformations? What about per-record (one or more records) transactions?
Smooks would be much more acceptable to me if: 1. the reader, router, binder... could be set up in Java code, not the current XML; 2. there were some more controlled way to drive the whole transformation, not only the visitor pattern, where you are limited to the current event and some custom-made context. In StAX you can easily chain readers and writers and make a nice pipeline on one stream of events. Plus you can stop consuming the stream whenever you want (in case of an invalid or malformed message). Great, but not possible in Smooks now.
Before I reply to the comments of Anthavio Lenz I want to tell you my background. I am one of the Smooks core developers, however I started as a Smooks user. I am also a software developer in a company that uses Smooks for several data processing solutions. So I am eating my own (mostly Tom's ;) ) dog food, but that also has the advantage that I can compare it in real life situations to other data processing solutions. Directly using a StAX, SAX, DOM, CSV or any other reader has the disadvantage that you are writing code that is very specific to the data format that you are reading. Switching format, which does happen, isn't so easy then. Writing code that can process two different formats in the same way isn't easy to do then either. You could, of course, write an abstraction layer to solve that. However Smooks already provides that and a lot more. So why take the trouble to reinvent the wheel? Another disadvantage is that code that uses low-level API readers gets a lot more complex, and therefore a lot less readable and a lot harder to maintain, when the complexity of the input data model increases. A great thing about Smooks is that complex data models still result in a consistent, easy to read, easy to maintain configuration. Smooks fits especially well in ESB environments. Because ESBs are also about declarative configuration, they fit naturally together. It is correct that a good Java API is just as important as a good XML configuration. At the moment the Smooks configuration API is a bit complex because of its highly flexible nature. But you can expect that in the future Smooks will have an improved Java API, in a similar way as the new XML namespace based configuration improved the XML configuration. It is correct that hard coding the properties in the XML file isn't that great, because refactoring won't work in those places. Hopefully the Smooks Eclipse plugin will provide a solution for that in the future.
It could also be that Smooks will get an annotation based way to do the bean binding directly in the Java beans. Per-record transactions can be done within Smooks. However, no default visitor is available for that now. It is only a small deal to provide a feature like that. Chaining readers and writers, if I understand you correctly, is already possible. For instance, the DomModelCreator visitor does that. It creates a DOM model from the selected node that can be used in other visitors like the FreeMarker visitor. The Smooks team has talked about a feature where you can stop consuming a document on a certain condition. That will probably be available in the Smooks 1.2 release. Smooks is a very good solution for processing any structured data model. But there is always room for improvement and the Smooks team, hopefully with help from outside contributors, will work hard on it.
Thanks for a little more detail. I don't think we would ever suggest that Smooks is the answer to all transformation needs that anyone would ever have. I think it's inevitable that you'll end up rolling your own in some situations and that some transforms would be more easily implemented outside Smooks. There's no silver bullet. I think you are coming to this from the position of someone that has effectively written their own "Smooks", and you're happy with your own solution. It also sounds like you're coming with the philosophy/preference for a Java based config, which is perfectly valid too. I don't think everyone starts from a position of already having a solution of their own (that they are happy with), or a preference for a non-XML based configuration. Smooks fills a gap here! Your comment re stax vs sax and not being able to abort sax is totally valid. This is a shortcoming that others have identified too and is something we plan on addressing. As I said in an earlier post... we plan on introducing stax support i.e. Smooks is not wedded to any of these technologies and can take advantage of them all. It's a process of evolving the project! I'd love to see your solution to one of the problems looked at in the article i.e. where you split a huge CSV message (or XML/EDI/Java... whatever) out into Java objects and route them to e.g. a JMS destination (or file or database - in a controlled manner!!). I mean that genuinely... I would be interested in seeing how you've done it in a way that can be standardized into a library that others can reuse easily (i.e. what we're striving to achieve with Smooks), and is also consistent with solutions to a number of other use cases. Maybe you could post your equivalent solution here, or email it to us on the Smooks mailing list.
I've been using Smooks for a few months. Splitting huge XML/CSV files, mapping to Java objects, checking objects (waiting for declarative validation with Smooks) and loading into DB/JMS was never so simple. How many times have you written your own code for this simple use case? Declarative configuration is very simple! You declare only "source", "what" and "destination". Don't be afraid of <50 lines of XML! ;-)
50 lines of XML is better than 50 lines of Java...
I totally agree. XML is much easier to manage than Java code. We have been using Smooks to parse huge CSV files. And it does a very very good job at it. It only takes about 2-3 seconds to parse about 700k records. It's really fast and very simple to configure. Love the XML configuration and love Smooks. Good work Tom!
Any plans for XQuery support ?
At first look it seems very interesting to me. Getting Java objects from the non-XML formats is very good. Have to try it. Thanks for the good article.