Making OpenXML Easy with ClosedXML
When working with Excel documents, developers traditionally choose to either use raw XML or rely on the Office Automation libraries. But as most of you know, the Office Automation library is not appropriate for servers and working with XML can be quite tedious. ClosedXML bridges the gap by providing an easy to use Office-like API without the overhead of COM.
To introduce this library we spoke with Manuel De Leon of the ClosedXML project.
1. What originally prompted you to develop the ClosedXML library?
It started as a side project while I was working at an investment bank in NYC. We were toying with the idea of moving the Excel reporting to .NET. I looked at different libraries out there (free and commercial ones) but they all felt a little clunky. They felt like they were designed around the OpenXML specifications but I was looking for something designed around Excel and how it works (the fact that files get saved as OpenXML is just incidental).
Since we weren’t pressed for a solution I decided to create one that more closely matched my coding style. A little later I had to move to Colorado so I told my boss I was interested in the project and that I was going to continue developing it as an open source. That gave me the freedom to mold the library to how I like to work instead of what the company needed at the moment. I pretty much scrapped what I had done and started from scratch on a new and improved API, ClosedXML.
2. What particular problems do you see developers having with Microsoft’s OpenXML libraries?
Microsoft’s OpenXML SDK is to OpenXML documents like the Assembly language is to processors. You can use it to get your work done but it takes a tremendous amount of effort to do anything.
These are just a few of the things that make the SDK hard to work with:
- The SDK is just not intuitive to work with at all.
- You have to manually specify every little detail even if it has nothing to do with what you’re trying to accomplish.
- Even though you’re working with XML documents you have to create all elements in a specific order or the file fails to load.
- There are so many moving parts that fixing a problem becomes a guessing game.
This touches on why I chose the name ClosedXML. At the beginning all I wanted was to put “Hello World” on a cell. A few hundred lines of code later (using the SDK) I had my hello world program. At that point I thought “if this is what it’s like to work with an open framework I’d rather use a closed one”. It reminds me of a comment someone made on the forum: “When I saw the name of this project I knew it was the one I was looking for, someone who gets it.”
3. Is the ClosedXML API based on Excel’s COM API? If so, how closely do you intend to follow it?
It is loosely based on Excel’s COM API but it doesn’t try to mimic it. I do look at VBA as a reference whenever I implement something new but it probably won’t be easy to port VBA code to ClosedXML. That said the API’s are similar enough that if you’ve worked with VBA then you’ll have no trouble figuring out how to do things with ClosedXML.
When I'm planning a new part of the API my first question is "How would I want to use this?" Then I look at how Excel does it, how it's implemented in VBA, and how other spreadsheet components do it too. From that I try to come up with the most intuitive interface I can think of. Ruby’s "Principle of least surprise (according to Matz)" comes to mind.
4. Other companies such as SmartXLS offer commercial libraries for Excel. Why did you choose to release your version under an open source license instead?
When I started this project I never thought of commercializing it. All I had was an itch I needed to scratch. After a month or so I had a barely working version of the library. Since I’ve never worked on an open source project before I thought this would be a great opportunity to do so. I decided to use CodePlex because it looked friendlier than other code repositories.
Right now I continue the project because of the community. I enjoy working with other people, getting their feedback, and knowing that I’m making a difference in the way people work.
5. Is your project currently looking for additional developers?
Help is always welcomed. People contribute to the project in different ways, reporting bugs, submitting patches, etc. Every now and then someone comes up and says “I need X functionality and I’d like to build it for you.”
6. Are there any plans to make a Word version for ClosedXML?
From the very beginning I’ve always entertained the idea of supporting Word documents. Unfortunately there’s so much left to do with Excel documents that unless someone else decides to work on it, that functionality won’t be available any time soon.
Below is a code sample for processing a simple table stored in an Excel file. Manuel writes:
Worth noting are the things we don't have to do:
- Write the boilerplate code to locate the data (the table can start on A1, B1, B2, or any other cell).
- Maintain a row counter. We just iterate through the rows in the table.
- Adjust our code to skip the first row (headers), since it's a table .Rows() skips it for us.
- Find the column where the name is.
- Find the last cell in the row
ClosedXML is just beautiful to work with. It is so intuitive you barely need the documentation, and if you do, the doc is top notch. Everything works the way you'd expect it to (and I never used the COM API), and more. In one line of code you can say "set all the cells in this range to this color", you can create tables, do auto-re-sizing, almost anything.
When i used it the only major things missing was inserting images, graphs, and formulas, which for some can be major, but if you don't need those, ClosedXML is by far your best bet. I also don't know what the official stance is, but it worked flawlessly for us in a web environment (as opposed to the COM API which isn't thread safe).