A New Library and Tooling Package for Open XML

Office Open XML is an internationally recognized document standard based on a ZIP/XML representation of various Microsoft Office file formats. It competes with the Open Document Format (ODF), another internationally recognized standard, which is based on the native format of OpenOffice files. While it is possible to manipulate Open XML files using low-level APIs, the complexity of the format makes that a daunting challenge.
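
To make the "ZIP/XML representation" concrete, here is a minimal sketch in Python using only the standard library. It builds a tiny package in memory and reads the text back out. The part contents are hand-written for illustration and are assumed to resemble a minimal WordprocessingML package; they have not been validated against Word, and the helper names `build_package` and `read_text` are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Illustrative minimal parts; real documents carry many more (styles, settings, ...).
PARTS = {
    "[Content_Types].xml": (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
        '<Default Extension="rels" '
        'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
        '<Default Extension="xml" ContentType="application/xml"/>'
        '<Override PartName="/word/document.xml" '
        'ContentType="application/vnd.openxmlformats-officedocument.'
        'wordprocessingml.document.main+xml"/>'
        '</Types>'
    ),
    "_rels/.rels": (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<Relationships '
        'xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
        '<Relationship Id="rId1" '
        'Type="http://schemas.openxmlformats.org/officeDocument/2006/'
        'relationships/officeDocument" Target="word/document.xml"/>'
        '</Relationships>'
    ),
    "word/document.xml": (
        '<?xml version="1.0" encoding="UTF-8"?>'
        f'<w:document xmlns:w="{W_NS}"><w:body>'
        '<w:p><w:r><w:t>Hello</w:t></w:r></w:p>'
        '</w:body></w:document>'
    ),
}


def build_package() -> bytes:
    """Write the parts above into an in-memory ZIP archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, xml in PARTS.items():
            zf.writestr(name, xml)
    return buf.getvalue()


def read_text(package: bytes) -> list:
    """Return the text of every w:t element in the main document part."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    return [t.text or "" for t in root.iter(f"{{{W_NS}}}t")]


if __name__ == "__main__":
    print(read_text(build_package()))
```

Even this toy case needs a content-types part, a relationships part, and namespaced XML, which hints at why working at this level gets tedious quickly.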

The first generation of the Open XML SDK provided a thin layer on top of the raw XML. While better than nothing, it still required an intimate knowledge of the underlying format. As such, it wasn't of much interest, and most developers continued using the Office COM APIs. Unfortunately, the COM libraries are very problematic. They require the associated Office products to be installed and cannot be safely used from servers such as IIS. Even when they are accessed via standalone programs, developers need to take extreme care to avoid leaking instances of Word or Excel.

Open XML SDK 2.0 offers a higher-level API for manipulating Open XML documents. Unlike the previous version, it has specific APIs for each type of document. A deep understanding of the underlying file format is still required, but it is a stepping stone toward easier tooling.

Also included in this release is the Open XML SDK v2.0 Productivity Tool. The primary purpose of this tool is to reverse engineer a Word, PowerPoint, or Excel document. It will then generate C# code that can recreate the document. This tool can also be used to validate documents.

Community comments

  • Too hard to use for the common mortal, but there's a way

    by Francois Ward,

    That's a few months old, so not that new :)

    That being said, the SDK is too hard to use for the average programmer. You need a deep understanding of the structure of an Open XML document. The SDK lets you create invalid documents quite easily. For example, there is nothing stopping you from putting a string straight in a cell, even though by default cells expect the value to be in a string dictionary. You need to either do that, or change the type of the cell.

    Excel WILL open the invalid document, and prompt you to fix the corrupt data, and then it will work, but that isn't very friendly to your users.

    There's a solution I stumbled on while doing a project that required generating Excel 2007 files on the fly.

    closedxml.codeplex.com/

    This is a very actively maintained wrapper around the SDK. It only works with Excel files right now, but it is extremely user friendly, and intuitive (the documentation is top notch, but even without it you can generally guess how to do 90% of things).

    It doesn't do everything by any means, but 90% of common cases are covered. Give it a shot (disclaimer: I'm not associated with the project in any way, shape or form. I'm just a happy user).

  • too hard for small projects..

    by Eric Fleites,

    They provide C# class generation to create or clone a pre-existing document,
    but for small projects I continue using the old-school interop objects :)

  • Re: too hard for small projects..

    by Francois Ward,

    Even if you clone a pre-existing document, just adding something as simple as text in a cell is non-trivial, as I described above. You need to get "parts", find parts in those parts, the naming convention is non-intuitive, etc. If you understand OOXML at a low level, it all makes sense.

    Don't get me wrong, I've done it. It is just an order of magnitude harder than what you'd expect, and there's nothing stopping you from doing it wrong.

    Interop objects work great, but they aren't thread safe, so it's a no-go for serious web site development (with a large number of concurrent users). My understanding and my testing so far show that the OOXML SDK works fine in those environments (since all it is is a glorified XML manipulation SDK). There's Aspose, which I believe works fine too.

  • Re: Too hard to use for the common mortal, but there's a way

    by Jonathan Allen,

    That certainly looks interesting, I'll have to get an interview with them.

    And for the record, we don't mind self promotion. If you have a project that you think is worth talking about by all means let us know.
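
The cell-type pitfall raised in the first comment can be sketched at the XML level. The Python below uses only the standard library and builds the two forms of a string cell that SpreadsheetML defines: a cell of type `s`, whose value is an index into the shared string table, and a cell of type `inlineStr`, which carries its own text. The function names and the plain Python list standing in for the shared string table are assumptions made for illustration; this is not the Open XML SDK or ClosedXML API.

```python
import xml.etree.ElementTree as ET

# SpreadsheetML main namespace.
NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"


def shared_string_cell(ref, text, table):
    """Build a cell of type "s" whose <v> is an index into `table`.

    `table` is a plain list standing in for the workbook's shared
    string part (an assumption made for this sketch).
    """
    if text not in table:
        table.append(text)
    cell = ET.Element(f"{{{NS}}}c", {"r": ref, "t": "s"})
    ET.SubElement(cell, f"{{{NS}}}v").text = str(table.index(text))
    return cell


def inline_string_cell(ref, text):
    """Build a cell of type "inlineStr" that holds the text directly."""
    cell = ET.Element(f"{{{NS}}}c", {"r": ref, "t": "inlineStr"})
    inline = ET.SubElement(cell, f"{{{NS}}}is")
    ET.SubElement(inline, f"{{{NS}}}t").text = text
    return cell


if __name__ == "__main__":
    strings = []
    for cell in (shared_string_cell("A1", "Hello", strings),
                 inline_string_cell("B1", "Hello")):
        print(ET.tostring(cell, encoding="unicode"))
```

Writing the text straight into `<v>` without setting the `t` attribute leaves the cell at its default numeric type, which is the kind of mismatch that leads Excel to offer a repair on open.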
