A New Library and Tooling Package for Open XML

by Jonathan Allen on Jun 17, 2011

Office Open XML is an internationally recognized standard for documents that is based on a ZIP/XML representation of various Microsoft Office file formats. It competes with the Open Document Format (ODF), another internationally recognized standard based on the native format of OpenOffice files. While it is possible to manipulate Open XML files using low-level APIs, the complexity of the format makes that a daunting challenge.
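
To give a sense of what working at that low level involves, here is a minimal Python sketch that uses only the standard library. It is not taken from the Open XML SDK. The helper names `build_minimal_package` and `read_paragraph_text` are made up for illustration, and the package it writes holds a single `word/document.xml` part. A real .docx file also needs a `[Content_Types].xml` part and relationship parts, so Word is not expected to open this output.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# WordprocessingML main namespace, as used in word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def build_minimal_package(text):
    """Return the bytes of a ZIP holding one WordprocessingML part.

    Illustrative only: the content types and relationship parts that a
    real package requires are deliberately left out.
    """
    document_xml = (
        f'<w:document xmlns:w="{W_NS}">'
        f"<w:body><w:p><w:r><w:t>{escape(text)}</w:t></w:r></w:p></w:body>"
        "</w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("word/document.xml", document_xml)
    return buffer.getvalue()


def read_paragraph_text(package_bytes):
    """Return the text of each paragraph in word/document.xml."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    paragraphs = []
    for paragraph in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is spread across the <w:t> elements of its runs.
        runs = paragraph.iter(f"{{{W_NS}}}t")
        paragraphs.append("".join(run.text or "" for run in runs))
    return paragraphs


if __name__ == "__main__":
    data = build_minimal_package("Hello from raw XML")
    print(read_paragraph_text(data))
```

Even this toy example has to know the part name, the namespace, and the paragraph/run/text nesting, which hints at why hand-written code against the raw format gets unwieldy quickly.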

The first generation of the Open XML SDK provided a thin layer on top of the raw XML. While better than nothing, it still required an intimate knowledge of the underlying format. As such, it wasn’t of much interest, and most developers continued using the Office COM APIs. Unfortunately, the COM libraries are problematic: they require the associated Office products to be installed and cannot be safely used from servers such as IIS. Even when they are accessed via standalone programs, developers need to take extreme care to avoid leaking instances of Word or Excel.

Open XML SDK 2.0 offers a higher-level API for manipulating Open XML documents. Unlike the previous version, there are specific APIs for each type of document. A deep understanding of the underlying file format is still required, but it is a stepping stone.

Also included in this release is the Open XML SDK v2.0 Productivity Tool. The primary purpose of this tool is to reverse engineer a Word, PowerPoint, or Excel document: given an existing file, it generates C# code that can recreate that document. This tool can also be used to validate documents.

Too hard to use for the common mortal, but there's a way by Francois Ward

That's a few months old, so not that new :)

That being said, the SDK is too hard to use for the average programmer. You need a deep understanding of the structure of an Open XML document. The SDK lets you create invalid documents quite easily. For example, there is nothing stopping you from putting a string straight in a cell, even though by default cells expect the value to be in a string dictionary. You need to either do that, or change the type of the cell.

Excel WILL open the invalid document, and prompt you to fix the corrupt data, and then it will work, but that isn't very friendly to your users.
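
As a rough illustration of the two options described above, the sketch below builds SpreadsheetML cell elements with Python's standard library rather than with the SDK. A cell with `t="s"` stores an index into the shared string table in its `<v>` element, while a cell with `t="inlineStr"` carries the text itself inside `<is><t>`. The helper names `shared_string_cell` and `inline_string_cell` are hypothetical, and the code only builds the cell elements; writing the worksheet and the shared string table part is left out.

```python
import xml.etree.ElementTree as ET

# SpreadsheetML main namespace, as used in worksheet parts.
S_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"


def shared_string_cell(ref, text, shared_strings):
    """Build a cell that points at an entry in the shared string table.

    `shared_strings` is a plain list standing in for the table; the text
    is appended if it is not there yet.
    """
    if text not in shared_strings:
        shared_strings.append(text)
    cell = ET.Element(f"{{{S_NS}}}c", {"r": ref, "t": "s"})
    # The value is the zero-based index of the string, not the string itself.
    ET.SubElement(cell, f"{{{S_NS}}}v").text = str(shared_strings.index(text))
    return cell


def inline_string_cell(ref, text):
    """Build a cell whose type is changed so it can hold the text directly."""
    cell = ET.Element(f"{{{S_NS}}}c", {"r": ref, "t": "inlineStr"})
    inline = ET.SubElement(cell, f"{{{S_NS}}}is")
    ET.SubElement(inline, f"{{{S_NS}}}t").text = text
    return cell


if __name__ == "__main__":
    table = []
    for cell in (
        shared_string_cell("A1", "Hello", table),
        inline_string_cell("B1", "World"),
    ):
        print(ET.tostring(cell, encoding="unicode"))
```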

There's a solution I stumbled on while doing a project that required generating Excel 2007 files on the fly.

This is a very actively maintained wrapper around the SDK. It only works with Excel files right now, but it is extremely user friendly, and intuitive (the documentation is top notch, but even without it you can generally guess how to do 90% of things).

It doesn't do everything by any means, but 90% of common cases are covered. Give it a shot (disclaimer: I'm not associated with the project in any way, shape or form. I'm just a happy user).

too hard for small projects.. by Eric Fleites

They provide C# class generation to create or clone a pre-existing document,
but for small projects I continue using the old-school interop objects :)

Re: too hard for small projects.. by Francois Ward

Even if you clone a pre-existing document, just adding something as simple as text in a cell is non-trivial, as I described above. You need to get "parts", find parts in those parts, the naming convention is non-intuitive, etc. If you understand OOXML at a low level, it all makes sense.

Don't get me wrong, I've done it. It is just an order of magnitude harder than what you'd expect, and there's nothing stopping you from doing it wrong.

Interop objects work great, but they aren't thread safe, so it's a no-go for serious web site development (with a large number of concurrent users). My understanding and my testing so far show that the OOXML SDK works fine in those environments (since all it is is a glorified XML manipulation SDK). There's Aspose, which I believe works fine too.

Re: Too hard to use for the common mortal, but there's a way by Jonathan Allen

That certainly looks interesting. I'll have to get an interview with them.

And for the record, we don't mind self-promotion. If you have a project that you think is worth talking about, by all means let us know.
