InfoQ

Using LINQ to XML Instead of XSLT for Transformations

Posted by Jonathan Allen on Jul 30, 2007 01:30 PM

Community: .NET
Topics: XML Databinding
Tags: XSLT, LINQ

Transforming XML from one format to another is a common task for many developers. To do this, most of them leave the confines of their general-purpose language and make calls to an XSLT library. But what if they didn't have to?

With LINQ to XML, it becomes much easier to manipulate XML using C# and VB. Eric White describes how one can perform XSLT-style transformations using C# 3.0.

The key to Eric's method is the ability to annotate XML nodes with additional information. Instead of altering the tree piecemeal, each pending change is stored as an annotation on the XElement it replaces. Eric writes:

"One advantage to taking this approach - as you formulate queries, you are always writing queries on the unmodified source tree. You need not worry about how modifications to the tree affect the queries that you are writing."

Once all of the pending changes have been generated, they are applied at one time. This is done via the XForm function, which creates a copy of the tree, making replacements where appropriate.
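
To make the two phases concrete, here is a minimal sketch of the idea in C#. It is not Eric White's actual XForm function, which is more capable; the method name ApplyPendingChanges and the element names (Root, Para, Note, p) are hypothetical and chosen only for illustration. The sketch assumes each pending change is stored as an XElement annotation on the element it replaces.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class AnnotationTransformSketch
{
    // Hypothetical, simplified stand-in for the XForm function described above.
    // Returns a copy of the tree in which every annotated element has been
    // replaced by the XElement stored in its annotation.
    public static XElement ApplyPendingChanges(XElement source)
    {
        // A pending replacement, if present, is an annotation of type XElement.
        XElement replacement = source.Annotation<XElement>();
        if (replacement != null)
            return new XElement(replacement); // copy, so the annotation stays untouched

        // No pending change: copy this element and recurse into child elements.
        // Attributes and non-element nodes are cloned by the XElement constructor
        // because they already belong to the source tree.
        return new XElement(
            source.Name,
            source.Attributes(),
            source.Nodes().Select(n => n is XElement e ? ApplyPendingChanges(e) : n));
    }

    public static string Run()
    {
        XElement source = XElement.Parse(
            "<Root><Para>one</Para><Note>skip</Note><Para>two</Para></Root>");

        // Phase 1: query the unmodified tree and record each pending change.
        // Adding an annotation does not alter the tree, so the query is unaffected.
        foreach (XElement para in source.Descendants("Para"))
            para.AddAnnotation(new XElement("p", para.Value.ToUpperInvariant()));

        // Phase 2: apply all pending changes at once while copying the tree.
        XElement result = ApplyPendingChanges(source);
        return result.ToString(SaveOptions.DisableFormatting);
    }

    public static void Main()
    {
        Console.WriteLine(Run());
    }
}
```

Because the replacements live in annotations rather than in the tree, the source document is left unchanged after the run, which is what allows every query in the first phase to see the original structure.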

You can learn more about this technique and get a copy of the XForm function from Eric White's blog. 

6 comments

  1. Why use objects

    Jul 30, 2007 7:20 PM by Frank Cohen

    Interesting move for the XLinq team. Anything that makes it easier to work with XML in an object-oriented environment is a good thing. I could not tell from Eric's blog whether this approach affects system performance at runtime. I investigated using XML and XQuery in an architecture I named FastSOA as a way to mitigate the performance problems I find in Java approaches to working with XML. My findings are in my book titled FastSOA from Morgan Kaufmann Publishers. The basic problem I found was all the object instantiation needed to move from serialized XML to objects, to business logic operating on the data, and then back to XML or RDBMS data formats. There's just too much object instantiation going on. I recommend using a domain-specific language like XQuery because certain XQuery implementations compile XQuery to Java bytecode and avoid objects entirely. (I also never did get my head into the non-proceduralness of XSLT, oh well.) -Frank Cohen http://www.pushtotest.com

  2. Re: Why use objects

    Jul 31, 2007 4:15 PM by Jonathan Allen

    As far as performance is concerned, it is way too early to tell. Because they are implemented so differently, I can easily see minor tweaks to a transformation shifting the advantage back and forth between the various techniques. I think the real gain from this technique is that you don't have to marshal all the data you might need from C# to XQuery or XSLT. If your transformation needs to do something expensive like a database call or hard calculation one time out of ten, you only pay the cost when you actually need it. On the other hand, XSLT is still a lot cleaner if you think in terms of "This is what my results are supposed to look like" rather than "This is how I transform my source." Needless to say, this is why I didn't conclude with "And this is why you should use X".

  3. this seems to be a step back ....

    Aug 2, 2007 12:43 AM by Ke Jin

    In the Java and C++ world, this has been a common and obvious practice for years. For simple transformations, we have the DOM and SAX APIs to parse XML documents. Based on the parsing result, one can use pure imperative Java and/or C++ code to generate DOM objects or simply output XML text streams without using an XSLT transformer API and certainly without XSLT style sheets. However, for generic applications that use XSLT today, declarative XSLT style sheet code is much more cost-effective to develop and maintain than imperative Java or C++ code. Also, XSLT is much easier to learn and use for non-programmers, namely business domain experts who have no Java or C++ programming skills, not to mention knowledge of the DOM or SAX APIs.

  4. Re: this seems to be a step back ....

    Aug 2, 2007 12:46 AM by Ke Jin

    Also, declarative XSLT style sheets are much easier to generate and verify with UI tools than imperative Java/C++ or C# code.

  5. Re: this seems to be a step back ....

    Aug 9, 2007 5:33 PM by Eric White

    FWIW, I absolutely agree about the usage scenarios for XSLT. In the LINQ to XML documentation, I have at least 4 or 5 examples that show how to use XSLT to transform an XML tree.

    XSLT transforms create a new tree, so XSLT does not alleviate the problem of too many short-lived objects. With respect to processor cost, I haven't done any metrics. However, when doing XSLT transforms using LINQ to XML, the XML tree has to be transformed into an XPathDocument internally, which is a big transform. Then XPath expressions have to be evaluated, and then the transform is effected. In contrast, the LINQ to XML queries that you use to add annotations are quite efficient (due to lazy evaluation of LINQ queries and the semantics of LINQ to XML axes), and adding annotations is cheap. I am going to bet that the pure LINQ to XML approach is more efficient. However, I'm not going to test this until my current deadlines are met :-)

    One more note: this technique can be expanded significantly. Possible improvements are:

    - Add modes, à la XSLT. Annotations are marked with modes. ApplyTransforms takes a mode as an attribute. The XForm function can also take a mode.
    - Allow for annotations on other types of nodes: attributes, text nodes (and processing instructions and comments for completeness).

    My only point about this post is that this is simply one approach to transforming XML trees when using LINQ to XML. It may be useful in some scenarios, but in other scenarios, XSLT may certainly be better.

  6. Re: this seems to be a step back ....

    Jun 30, 2008 5:12 PM by berkay NiQuiL
