Article: Structured Event Streaming with Smooks

Posted by Mark Little on Nov 26, 2008

Sections: Operations & Infrastructure, Enterprise Architecture, Development, Architecture & Design
Topics: SOA Platforms, SOA, Open Source, SOA Appliance, ESB
Tags: XSLT, Smooks, XML

The Smooks project has been around for a while, but it probably started to come to prominence when JBossESB used it as the default transformation engine. Over the last year, other open source ESBs have also added support for Smooks transformation technology. However, as this InfoQ article from Tom Fennelly, the Smooks lead, shows, Smooks is capable of much more than just transformation.

Read Structured Event Streaming with Smooks on InfoQ.

  • This article is part of a featured topic series on SOA

17 comments

  1. Smooks is 90ties retro?

    by Anthavio Lenz

    Sorry, I was interested until this article. SAX, DOM, heavy xml configuration? Joke.

  2. Re: Smooks is 90ties retro?

    by Maurice Zeijen

    Could you elaborate on that?

  3. Re: Smooks is 90ties retro?

    by Tom Fennelly

    Maybe you could specifically address your comment to the use cases and problem areas Smooks is targeted at and tell us specifically why it is a bad approach and why the techs you listed are such a bad choice (and maybe what you think would be better). Is it just that XML, SAX etc. are not "cool" enough? ;) It's very easy to make a comment like that, but it's also totally weightless. Taking the example problem looked at in the article (splitting a huge CSV message and routing the fragments as Java objects or XML to a JMS destination)... I wouldn't consider the solution as being very "heavy" re configuration. What are you comparing Smooks to?

  4. Re: Smooks is 90ties retro?

    by Ivan Lazarte

    I've been very curious about Smooks for a while now. Thanks for the intro article.

  5. Re: Smooks is 90ties retro?

    by Ivan Lazarte

    oh and any plans to move to Stax Reading/Writing?

  6. Stax Support [was: Smooks is 90ties retro?]

    by Tom Fennelly

    "oh and any plans to move to Stax Reading/Writing?"

    I've done some playing with a StAX-based filter and, for some use cases, I think it can bring quite significant performance increases. For others, it's of no benefit at all. For many users, however, it's irrelevant (performance aside): unless they are writing custom visitor logic, they never see the details of SAX, StAX etc. That's hidden away under the hood.

  7. Re: Smooks is 90ties retro?

    by Anthavio Lenz

    With our own custom code. With StAX, or a CSV JDBC driver and a programmed converter (in the Java binding case), the result will be of similar code length, not to mention performance, flexibility, and the refactoring difficulties due to hardcoded bean properties in XML.
    I work on ESB-like projects that are screaming for some standardized way of converting messages, but Smooks looks like clear overkill to me. Readers and writers for different formats are good, but the rest of Smooks only complicates the job.
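
For comparison, a hand-written version of the article's use case (splitting a large CSV message into Java objects and routing each one) might look roughly like the sketch below. It is not the commenter's library and not Smooks code. The `Order` bean, its three fields, the `CsvSplitter` class, and the `Consumer` standing in for a JMS producer are all assumptions made for illustration, and the parsing assumes plain comma-separated fields with no quoting.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.function.Consumer;

// Hypothetical bean for one CSV record; the field names are illustrative only.
class Order {
    final String id;
    final String customer;
    final double total;

    Order(String id, String customer, double total) {
        this.id = id;
        this.customer = customer;
        this.total = total;
    }
}

class CsvSplitter {
    // Reads the CSV one line at a time, so the whole message is never held in memory,
    // and hands each bound Order to the sink (a stand-in for a JMS producer, file, or DAO).
    // Assumes plain comma-separated fields with no quoting or embedded commas.
    static int splitAndRoute(Reader csv, Consumer<Order> sink) {
        int count = 0;
        try (BufferedReader in = new BufferedReader(csv)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // skip blank lines
                }
                String[] f = line.split(",", -1);
                if (f.length != 3) {
                    throw new IllegalArgumentException("Expected 3 fields: " + line);
                }
                sink.accept(new Order(f[0].trim(), f[1].trim(),
                        Double.parseDouble(f[2].trim())));
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }
}
```

Note that the field order and types are hardcoded in Java here, which is the same coupling the comment criticizes in XML, only moved into code where refactoring tools can see it.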

  8. Re: Smooks is 90ties retro?

    by Tom Fennelly

    So you're saying you believe:
    1. Writing your own custom code (to do?),
    2. Handcoding Stax (or manually using a CSV jdbc driver),
    3. And "programmed converter" in java binding case (whatever that is exactly).

    Is more maintainable than:
    1. Smooks configuration of 20 to 30 lines of XML.

    I guess we all have different ideas re what is maintainable and what is not.

  9. Re: Smooks is 90ties retro?

    by Anthavio Lenz

    Yes. Of course, we have our own little transformation library. Its usage code is shorter than any of the article's examples.
    Is it even possible to use XML Schema to validate the sources and products of transformations? What about transactions per record (or per group of records)?
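
Independently of whether Smooks offers this, validating an XML source or result against an XML Schema can be done with the JDK's `javax.xml.validation` API. The sketch below is a minimal illustration under that assumption; the `XsdCheck` class and `isValid` method are hypothetical names, and it says nothing about how Smooks itself would integrate validation.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class XsdCheck {
    // Returns true when the XML document conforms to the given XML Schema.
    // Returns false when the document is invalid or the schema itself cannot be compiled.
    static boolean isValid(String xsd, String xml) {
        try {
            SchemaFactory factory =
                    SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
            Validator validator = schema.newValidator();
            // With no ErrorHandler set, validation errors are thrown as SAXException.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The same check could be run on the input before a transformation and on the XML output after it.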

  10. Re: Smooks is 90ties retro?

    by Anthavio Lenz

    Smooks would be much more acceptable to me if:
    1. The reader, router, binder... could be set up in Java code, not the current XML.
    2. There were some more controlled way to drive the whole transformation, not only the visitor pattern, where you are limited to the current event and some custom-made context. In StAX you can easily chain readers and writers and make a nice pipeline on one stream of events. Plus you can stop consuming the stream whenever you want (in case of an invalid or malformed message). Great, but not possible in Smooks now.
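
The StAX point made here, that the caller owns the read loop and can stop consuming the stream at any time, can be sketched with the JDK's `javax.xml.stream` API. The `StaxEarlyStop` class, the `readUntilBad` method, and the `<item>` and `<bad>` element names are hypothetical and chosen only for illustration; this is not Smooks code.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class StaxEarlyStop {
    // Collects the text of <item> elements and stops pulling events
    // as soon as a <bad> element starts; later content is never read.
    static List<String> readUntilBad(String xml) {
        List<String> items = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    int event = reader.next();
                    if (event != XMLStreamConstants.START_ELEMENT) {
                        continue;
                    }
                    String name = reader.getLocalName();
                    if ("bad".equals(name)) {
                        break; // pull parsing: the caller decides to stop here
                    }
                    if ("item".equals(name)) {
                        items.add(reader.getElementText());
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return items;
    }
}
```

With a push API such as SAX the parser drives the handler, so stopping early has to be signalled from inside a callback, which is the difference the comment is pointing at.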

  11. Re: Smooks is 90ties retro?

    by Maurice Zeijen

    Before I reply to the comments of Anthavio Lenz, I want to tell you my background. I am one of the Smooks core developers; however, I started as a Smooks user. I am also a software developer in a company that uses Smooks for several data processing solutions. So I am eating my own (mostly Tom's ;) ) dog food, but that also has the advantage that I can compare it in real-life situations to other data processing solutions.

    Directly using a StAX, SAX, DOM, CSV or any other reader has the disadvantage that you are writing code that is very specific to the data format you are reading. Switching formats, which does happen, isn't so easy then. Writing code that can process two different formats in the same way isn't easy either. You could, of course, write an abstraction layer to solve that. However, Smooks already provides that and a lot more. So why take the trouble to reinvent the wheel? Another disadvantage is that code that uses low-level API readers gets a lot more complex, and therefore a lot less readable and a lot harder to maintain, as the complexity of the input data model increases. A great thing about Smooks is that complex data models still result in a consistent, easy-to-read, easy-to-maintain configuration.
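
To make the "abstraction layer" idea concrete, here is a minimal sketch of what a hand-rolled one could look like: two format-specific parsers behind one interface, with the processing logic written once. Every name in it (`RecordParser`, `CsvRecordParser`, `KeyValueRecordParser`, `Totals`) and the key=value format are assumptions invented for illustration. It does not reflect how Smooks is implemented and, unlike Smooks, it loads all records into memory rather than streaming them.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical format-neutral view of input: each record is a field-name -> value map.
interface RecordParser {
    List<Map<String, String>> parse(String input);
}

// CSV with a header row; assumes no quoting or embedded commas.
class CsvRecordParser implements RecordParser {
    public List<Map<String, String>> parse(String input) {
        String[] lines = input.trim().split("\\R");
        String[] header = lines[0].split(",");
        List<Map<String, String>> records = new ArrayList<>();
        for (int i = 1; i < lines.length; i++) {
            String[] values = lines[i].split(",", -1);
            Map<String, String> record = new LinkedHashMap<>();
            for (int c = 0; c < header.length && c < values.length; c++) {
                record.put(header[c].trim(), values[c].trim());
            }
            records.add(record);
        }
        return records;
    }
}

// "key=value;key=value" records, one per line: a second, made-up format.
class KeyValueRecordParser implements RecordParser {
    public List<Map<String, String>> parse(String input) {
        List<Map<String, String>> records = new ArrayList<>();
        for (String line : input.trim().split("\\R")) {
            Map<String, String> record = new LinkedHashMap<>();
            for (String pair : line.split(";")) {
                String[] kv = pair.split("=", 2);
                record.put(kv[0].trim(), kv.length > 1 ? kv[1].trim() : "");
            }
            records.add(record);
        }
        return records;
    }
}

class Totals {
    // Processing logic written once, independent of the input format.
    static double sum(RecordParser parser, String input, String field) {
        double total = 0;
        for (Map<String, String> record : parser.parse(input)) {
            total += Double.parseDouble(record.get(field));
        }
        return total;
    }
}
```

For example, `Totals.sum(new CsvRecordParser(), csvText, "amount")` and `Totals.sum(new KeyValueRecordParser(), kvText, "amount")` run the same logic over both formats; supporting a third format means writing one more parser and nothing else.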

    Smooks fits especially well in ESB environments. Because ESBs are also about declarative configuration, the two fit together naturally.

    It is correct that a good Java API is just as important as a good XML configuration. At the moment the Smooks configuration API is a bit complex because of its highly flexible nature. But you can expect that in the future Smooks will have an improved Java API, in much the same way as the new XML namespace-based configuration improved the XML configuration.

    It is correct that hard coding the properties in the XML file isn't that great because the refactoring won't work in those places. Hopefully the Smooks Eclipse plugin will provide a solution for that in the future. It could also be that Smooks will get an annotation based way to do the bean binding directly in the Java beans.

    Per record transactions can be done within Smooks. However no default visitor is available for that now. It is only a small deal to provide a feature like that.

    Chaining readers and writers, if I understand you correctly, is already possible. For instance, the DomModelCreator visitor does that. It creates a DOM model from the selected node that can be used in other visitors, like the FreeMarker visitor.

    The Smooks team has talked about the feature where you can stop consuming a document on a certain condition. That will probably be available in the Smooks 1.2 release.

    Smooks is a very good solution for processing any structured data model. But there is always room for improvement, and the Smooks team, hopefully with help from outside contributors, will work hard on it.

  12. Re: Smooks is 90ties retro?

    by Tom Fennelly

    Thanks for a little more detail.

    I don't think we would ever suggest that Smooks is the answer to all transformation needs that anyone would ever have. I think it's inevitable that you'll end up rolling your own in some situations and that some transforms would be more easily implemented outside Smooks. There's no silver bullet.

    I think you are coming to this from the position of someone that has effectively written their own "Smooks", and you're happy with your own solution. It also sounds like you're coming with the philosophy/preference for a Java based config, which is perfectly valid too. I don't think everyone starts from a position of already having a solution of their own (that they are happy with), or a preference for a non-XML based configuration. Smooks fills a gap here!

    Your comment re stax vs sax and not being able to abort sax is totally valid. This is a shortcoming that others have identified too and is something we plan on addressing. As I said in an earlier post... we plan on introducing stax support i.e. Smooks is not wedded to any of these technologies and can take advantage of them all. It's a process of evolving the project!

    I'd love to see your solution to one of the problems looked at in the article i.e. where you split a huge CSV message (or XML/EDI/Java... whatever) out into Java objects and route them to e.g. a JMS destination (or file or database - in a controlled manner!!). I mean that genuinely... I would be interested in seeing how you've done it in a way that can be standardized into a library that others can reuse easily (i.e. what we're striving to achieve with Smooks), and is also consistent with solutions to a number of other use cases. Maybe you could post your equivalent solution here, or email it to us on the Smooks mailing list.

  13. Very simple and useful

    by Tomasz Juchniewicz

    I've been using Smooks for a few months. Splitting huge XML/CSV files, mapping to Java objects, checking objects (waiting for declarative validation with Smooks) and loading into DB/JMS has never been so simple. How many times have you written your own code for this simple use case? Declarative configuration is very simple! You declare only the "source", the "what" and the "destination".

    Don't be afraid of <50 lines of XML! ;-)

  14. Re: Very simple and useful

    by Tomasz Juchniewicz

    50 lines of XML is better than 50 lines of Java...

  15. Re: Very simple and useful

    by John Dondapati

    I totally agree. XML is much easier to manage than Java code. We have been using Smooks to parse huge CSV files, and it does a very, very good job at it. It only takes about 2-3 seconds to parse about 700k records. It's really fast and very simple to configure.

    Love the XML configuration and love Smooks. Good work Tom!

  16. XQuery?

    by wojtek serafin

    Any plans for XQuery support?

  17. Re: XQuery?

    by Rajagopal Yendluri

    At first look, it seems very interesting to me.
    Getting Java objects from non-XML formats is very good.
    I have to try it.

    Thanks for the good article.
