Document Production - An Overview

What is document production?

Document production can be thought of as a glorified mail merge; that is, adding data to a template document to produce an output document.

It is also known as document composition, document generation, or document assembly.

The use cases range from producing a single, highly customised output document (eg a legal contract) to producing millions of customer invoices.

The use cases also include report generation (think JasperReports, BIRT, etc).

Basic functionality

There are three basic requirements that all document assembly systems address:

inserting data (variable replacement)

conditional inclusion of paragraphs or other units of content

repeat (eg of list items, table rows, or other units of content)

It is conditional inclusion and repeats which distinguish document production from mail merge.
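
The three requirements above can be illustrated with a minimal sketch. The template here is a hypothetical list of text blocks, where the "if" and "repeat" keys are names invented for this example; real systems embed equivalent markers in a Word document or their own format.

```python
from string import Template

# Hypothetical template: each block is a line of text, optionally guarded by
# an "if" condition or repeated over a list named by "repeat".
TEMPLATE = [
    {"text": "Dear $name,"},
    {"text": "Your account is overdue.", "if": "overdue"},
    {"text": "- $desc: $amount", "repeat": "items"},
]

def assemble(template, data):
    """Merge data into the template and return the output text."""
    out = []
    for block in template:
        # Conditional inclusion: skip the block when its condition is falsy.
        if "if" in block and not data.get(block["if"]):
            continue
        text = Template(block["text"])
        if "repeat" in block:
            # Repeat: emit the block once per item, with the item's fields in scope.
            for item in data.get(block["repeat"], []):
                out.append(text.substitute({**data, **item}))
        else:
            # Variable replacement: insert data values into the fixed text.
            out.append(text.substitute(data))
    return "\n".join(out)
```

With overdue set to False and two items in the data, the output contains the greeting and two item lines, and the overdue paragraph is left out.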

There are several other common features:


inclusion/insertion of other documents (content reuse)
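
As a rough sketch of content reuse, the function below expands include markers by pulling text from a library of reusable fragments. The {{include name}} syntax and the library dictionary are assumptions made for this example, not the syntax of any particular product.

```python
import re

# Hypothetical include marker, eg {{include signature}}
INCLUDE = re.compile(r"\{\{include (\w+)\}\}")

def expand_includes(text, library, depth=0):
    """Replace each include marker with the named fragment from the library."""
    if depth > 10:
        # Guard against fragments that include each other in a cycle.
        raise ValueError("includes nested too deeply (possible cycle)")

    def replace(match):
        fragment = library[match.group(1)]
        # Fragments may themselves contain includes, so expand recursively.
        return expand_includes(fragment, library, depth + 1)

    return INCLUDE.sub(replace, text)
```

For example, with a library holding a "sig" fragment that itself includes a "company" fragment, a single marker in the main document pulls in both.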

Document Formats

Systems process a template document to generate one or more output documents.

Systems vary in the format used for their template documents. Some use Microsoft Word (or OpenOffice) documents; others use Word documents plus additional data files. Others require the user to adopt a proprietary, vendor-specific file format.

Interactive versus non-interactive

A major distinction is between systems which operate without human interaction (ie all the necessary data is read from some system and merged into the template), and those which obtain some of the data from a user (eg via a web form).

Many systems are capable of both interactive and non-interactive processing.
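
One way to picture a system that supports both modes is sketched below: data already available from a back-end system is merged directly, and only the fields it does not cover are requested from the user. The function names and the ask callback are hypothetical; in a real interactive system the callback would typically be a web form. The simple $name placeholder syntax is also an assumption, and the regular expression does not handle escaped dollar signs.

```python
import re
from string import Template

def missing_fields(template_text, system_data):
    """Return placeholder names in the template that the system data lacks."""
    needed = re.findall(r"\$(\w+)", template_text)
    # dict.fromkeys removes duplicates while keeping first-seen order.
    return [name for name in dict.fromkeys(needed) if name not in system_data]

def produce(template_text, system_data, ask=None):
    """Merge system data, asking the user (via ask) for anything missing."""
    data = dict(system_data)
    for name in missing_fields(template_text, data):
        if ask is None:
            # Non-interactive mode: all data must come from the system.
            raise KeyError(f"no value available for field: {name}")
        data[name] = ask(name)
    return Template(template_text).substitute(data)
```

Calling produce without an ask callback models a non-interactive run; passing a callback models an interactive one.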

Systems which are partially or wholly non-interactive can vary in how they obtain their data. Common sources include SQL databases, web services (SOAP), and XML.
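
As an illustration of two of these sources, the sketch below loads the data for one document from a SQL query and from an XML fragment, in both cases producing a plain dictionary ready to be merged into a template. The customer table, its columns and the XML layout are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

def data_from_sql(conn, customer_id):
    """Read one customer's fields from a (hypothetical) customer table."""
    conn.row_factory = sqlite3.Row  # rows then behave like mappings
    row = conn.execute(
        "SELECT name, balance FROM customer WHERE id = ?", (customer_id,)
    ).fetchone()
    if row is None:
        raise LookupError(f"no customer with id {customer_id}")
    return dict(row)

def data_from_xml(xml_text):
    """Read fields from a flat XML record: one child element per field."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}
```

Note that values read from XML arrive as strings, whereas SQL values keep their column types, so a real system would usually normalise the two before merging.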

Authoring versus Production

Systems typically have authoring and run-time/production components.

The authoring component is used to create the template document; this can be thought of as a one-off operation. If the template document is in Word format, the authoring component will typically be a Word Add-In (with obvious benefits in terms of author familiarity/training). If the system uses a proprietary file format, it will typically have its own bespoke authoring environment, as well as conversion utilities (for converting Word documents into the vendor format).

The run-time/production component is then used whenever a document is required. It reads the template document, and creates an output document. Interactive systems often have a web front end. Systems which use Word documents for their templates generally also allow assembly from within Microsoft Word.
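
A non-interactive production run can be sketched as a loop that reads the template once and writes one output document per data record. This is a simplified illustration using plain-text files and $name placeholders; the function name and the "id" field used for the output file name are assumptions, and a real system would be reading and writing Word or a vendor format.

```python
from pathlib import Path
from string import Template

def produce_batch(template_path, records, out_dir):
    """Write one output document per record; return the paths written."""
    # The template is read once and reused for every record.
    template = Template(Path(template_path).read_text(encoding="utf-8"))
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for record in records:
        # Hypothetical convention: each record carries an "id" used as the file name.
        out_path = out_dir / f"{record['id']}.txt"
        out_path.write_text(template.substitute(record), encoding="utf-8")
        written.append(out_path)
    return written
```

The same loop serves for a single bespoke document (one record) or a large invoice run (many records); only the size of the record list changes.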

Output Formats

Most systems produce Word documents, and possibly PDF, HTML and other formats.

For systems which don't use Microsoft Word as their underlying document format, output fidelity is usually a major issue, particularly where the source document utilises some of the richer features of Microsoft Word.