Open Source Word Generator Using OpenXML SDK 2.0
OpenXML SDK 2.0 for MS Office provides strongly typed part classes to manipulate Open XML documents. WorddocGenerator, an open source utility for generating template-driven Word files, is one example of what can be done with this SDK. InfoQ got in touch with Atul Verma, the developer of this utility, to ask him a few questions about the project.
InfoQ: How is worddocgenerator different from other document generators like FlexDoc?
Atul: This utility:
- Doesn’t require Word to be installed for document generation
- Uses Open XML SDK 2.0 and Visual Studio 2010
- Uses content controls for document generation
- Provides many samples that cover different ways to generate a Word document, e.g.
  - Setting content using C# (no data binding)
  - Data-bound content controls
  - XPath expressions
  - Generating from XML (i.e. an XNode) or from an entity class (e.g. Order)
Though I have never used FlexDoc, I did see a warning message on its home page: “WARNING: Current version of fleXdoc depends on a feature of Microsoft Word, that has been removed (sort of) in Office 2010 due to patent issues! This also applies to US-versions of Office 2007 released after november 2009.” If that is the case, FlexDoc doesn’t seem to be appropriate for document generation.
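The XPath-based approach in the list above can be illustrated with a small sketch. It is written in Python purely for illustration (the utility itself is C#), and the tag names, the `TAG_XPATHS` mapping and the `values_for` helper are all assumptions rather than part of the utility's API. It shows only the idea of resolving a content control's Tag to values selected from an XML data source.

```python
import xml.etree.ElementTree as ET

# Sample data source; the element names are made up for this sketch.
ORDER_XML = """
<Order>
  <Id>42</Id>
  <Customer><Name>Ada</Name></Customer>
  <Items>
    <Item>Pen</Item>
    <Item>Ink</Item>
  </Items>
</Order>
"""

# Hypothetical mapping from a content control Tag to an XPath expression.
TAG_XPATHS = {
    "OrderId": "./Id",
    "CustomerName": "./Customer/Name",
    "Item": "./Items/Item",
}


def values_for(tag, root):
    """Return the text of every node the tag's XPath selects, in document order."""
    return [node.text for node in root.findall(TAG_XPATHS[tag])]


order = ET.fromstring(ORDER_XML)
```

A tag that selects one node would fill a single control, while a tag that selects several nodes would drive a repeated control.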
InfoQ: How do the refreshable components work? Do they connect to the server to fetch data?
Atul: The utility expects a Tag to be specified in the Word template for every content control that needs to be populated with data. During generation, each Tag is mapped to the appropriate value of the PlaceHolderType enum. The types of placeholders are:
- Recursive: Corresponds to controls where there is a 1:N relation between template and data, e.g. repeating a list of items.
- Non-Recursive: Corresponds to controls where there is a 1:1 relation between template and data, e.g. showing a user name.
- Ignore: No action is required for these controls.
- Container: Required only for refreshable documents. The container region is saved in a CustomXmlPart the first time a document is generated from the template. From then on, the saved container region is retrieved and used to refresh the document. This makes the document self-refreshable.
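The Tag-to-placeholder mapping described above can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the utility's C# code: the `PlaceHolderType` member names mirror the list above, but the tag names, the `TAG_TYPES` mapping, the helper functions and the simplified WordprocessingML fragment are all hypothetical. Container handling is omitted here.

```python
import copy
import xml.etree.ElementTree as ET
from enum import Enum

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


class PlaceHolderType(Enum):
    RECURSIVE = "recursive"          # 1:N, repeated once per data item
    NON_RECURSIVE = "non_recursive"  # 1:1, filled with a single value
    IGNORE = "ignore"                # left untouched
    CONTAINER = "container"          # region saved for later refresh (not handled here)


# Hypothetical tag names; a real template defines its own.
TAG_TYPES = {
    "CustomerName": PlaceHolderType.NON_RECURSIVE,
    "Item": PlaceHolderType.RECURSIVE,
    "Footer": PlaceHolderType.IGNORE,
}


def sdt(tag):
    """Build a simplified content control (w:sdt) carrying the given Tag."""
    return (f'<w:sdt><w:sdtPr><w:tag w:val="{tag}"/></w:sdtPr>'
            f'<w:sdtContent><w:p><w:r><w:t>placeholder</w:t></w:r></w:p>'
            f'</w:sdtContent></w:sdt>')


TEMPLATE = (f'<w:body xmlns:w="{W}">'
            f'{sdt("CustomerName")}{sdt("Item")}{sdt("Footer")}</w:body>')


def tag_of(control):
    return control.find("w:sdtPr/w:tag", NS).get(f"{{{W}}}val")


def set_text(control, value):
    control.find(".//w:t", NS).text = value


def generate(template_xml, data):
    """Fill each top-level content control according to its placeholder type."""
    body = ET.fromstring(template_xml)
    for control in list(body.findall("w:sdt", NS)):
        tag = tag_of(control)
        kind = TAG_TYPES.get(tag, PlaceHolderType.IGNORE)
        if kind is PlaceHolderType.NON_RECURSIVE:
            set_text(control, str(data[tag]))
        elif kind is PlaceHolderType.RECURSIVE:
            # Replace the single template control with one clone per data item.
            index = list(body).index(control)
            body.remove(control)
            for offset, value in enumerate(data[tag]):
                clone = copy.deepcopy(control)
                set_text(clone, str(value))
                body.insert(index + offset, clone)
    return body
```

Calling `generate(TEMPLATE, {"CustomerName": "Ada", "Item": ["Pen", "Ink"]})` fills the name once, repeats the item control twice, and leaves the ignored control as it was.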
I’ll explain the refresh operation with an example. I have a template, e.g. “Test.docx”. From my data layer (backed by a database) I get the data object for which the document needs to be generated, e.g. an Order. The first time the document is generated from the template, the content controls of Container type are saved to the CustomXmlPart. Let’s say the generated document is “TestOut.docx”, and that the Order later changes. To bring the document back in sync with the database, I need to refresh it: I take “TestOut.docx” and the latest Order object from the data layer and refresh the document. As the document is refreshable, I don’t need “Test.docx” for the refresh. I’ve covered all these types of placeholders in the samples.
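The generate-then-refresh flow from this example can be sketched in a few lines. Again this is a Python illustration only, with several loud assumptions: the package here is a bare zip with two parts and is not a valid .docx (a real one also needs content types and relationship parts), the part name `customXml/container.xml` is made up, and `render` uses plain token substitution as a stand-in for populating content controls.

```python
import io
import zipfile

DOCUMENT_PART = "word/document.xml"
CONTAINER_PART = "customXml/container.xml"  # hypothetical part name


def render(container_xml, order):
    """Stand-in for content control population: simple token substitution."""
    return (container_xml
            .replace("{OrderId}", order["id"])
            .replace("{Status}", order["status"]))


def _write(container_xml, order):
    """Write the populated document plus the unpopulated container region."""
    out = io.BytesIO()
    with zipfile.ZipFile(out, "w") as package:
        package.writestr(DOCUMENT_PART, render(container_xml, order))
        package.writestr(CONTAINER_PART, container_xml)
    return out.getvalue()


def generate(template_bytes, order):
    """First generation: the container region comes from the template."""
    with zipfile.ZipFile(io.BytesIO(template_bytes)) as package:
        container_xml = package.read(DOCUMENT_PART).decode("utf-8")
    return _write(container_xml, order)


def refresh(generated_bytes, order):
    """Refresh: the container region comes from the generated document itself."""
    with zipfile.ZipFile(io.BytesIO(generated_bytes)) as package:
        container_xml = package.read(CONTAINER_PART).decode("utf-8")
    return _write(container_xml, order)


def make_template(container_xml):
    """Build a minimal in-memory 'template' package for the sketch."""
    out = io.BytesIO()
    with zipfile.ZipFile(out, "w") as package:
        package.writestr(DOCUMENT_PART, container_xml)
    return out.getvalue()
```

Because `refresh` reads only the generated package, the original template is not needed after the first generation, which is the property the example describes.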
The utility requires a document, a data object and a generator, and returns the generated document. How the data is fetched is outside the utility’s concern. Word need not be installed for document generation.
I have added a sample that shows one way to refresh the document from within Word (e.g. right-click on the document and click Refresh Data) using document-level customizations for Word 2010. In this case the utility can be hosted on a server (where Word need not be installed) and invoked from the client (a Word document with the document-level customization). Please visit this link for more information.
InfoQ: How is the performance for generating multiple documents for the same data?
Atul: I haven’t done any performance benchmarking; however, document generation is quite fast. I wanted to create a utility for document generation using the Open XML SDK 2.0 as a proof of concept with samples. I’ll work on refactoring and performance in my spare time in the future.
InfoQ: Is a similar utility possible with Excel?
Atul: As this utility is specific to Word 2007/Word 2010, it won’t work with Excel. However, similar utilities and frameworks can easily be created for Excel using the Open XML SDK 2.0; ClosedXML is one such project.
InfoQ: This is a good example of what can be done using the OpenXML SDK – any other useful features that could ideally be added?
Atul: The purpose of creating this utility is to:
- Write minimal code to generate documents
- Show samples that generate documents using the approaches listed below
  - Generating documents that can be non-refreshable as well as refreshable
  - Generating documents from either an object (e.g. the Order class) or an XmlNode (using XPath expressions)
  - Setting values of content controls using C#
  - Using data-bound content controls
  - Appending documents to the primary document
I’d like to seek feedback about the samples that should be added to the utility.
Check out these blog posts for more details about this utility and to provide your feedback to Atul. To learn more about OpenXML SDK 2.0, you can refer to the XML in Office Developer resources as well as MSDN.