This ebook comes with a scorecard. Download and print the PDF file to follow along.
We certainly live in a time of choice. Apple’s now-famous and trademarked “There’s an App for That” debuted almost 10 years ago, at a time when the iTunes Store contained a mere 60,000 apps and games; there are now over 3 million! Choice is good, right? Well, yes, to a point. In the 1950s, a psychologist named William Hick developed a theory stating that the time it takes for a person to make a decision increases logarithmically as the number of possible choices he or she has increases. If you believe Hick’s Law applies to making decisions about business software, you can easily understand why the process can be both tedious and stressful. The purpose of this paper is to help you break your problem/opportunity down into its most important components, remove the clutter and allow you to evaluate your options quickly and clearly.
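Hick’s Law is commonly written as T = b * log2(n + 1), where n is the number of equally likely choices and b is a constant fitted from observation. The short Python sketch below illustrates that relationship only; the function name and the default value of b are assumptions made for the example, not measured values.

```python
import math


def hick_decision_time(n_choices: int, b: float = 1.0) -> float:
    """Estimate decision time with Hick's Law: T = b * log2(n + 1).

    b is an empirically fitted constant; 1.0 is a placeholder, so the
    result is in arbitrary "units" rather than seconds.
    """
    if n_choices < 0:
        raise ValueError("n_choices must be zero or greater")
    return b * math.log2(n_choices + 1)


# Compare a short list of options with a longer one.
for n in (3, 7, 63):
    print(n, "choices ->", hick_decision_time(n), "units")
```

Under this model, going from 3 options to 7 adds one unit of decision time rather than doubling it, which is the sense in which the increase is logarithmic.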
Nearly all businesses create documents in electronic and/or print form. And many of them contain data or information that originates from some place other than the document itself. Some are designed for internal communication and analysis and others are designed for customers and prospects. There are simple examples like an email that contains an automated signature line. And there are much more complex examples that cover a range, from performance reports and product catalogs to contracts and financial statements. What they have in common is that they all dynamically assemble content from external sources and format it within a predefined template to create documents—they are all data-powered documents.
The more repetitive or frequent this process becomes, the more important it is to consider systems to automate the tasks of design, assembly and output. Such systems often begin with the concept of a template. Templates can save a lot of time and reduce errors and inconsistencies by providing a common set of content, layout and formatting. As processes require more automation and flexibility, they may begin to include some level of integration with external sources of data, formatting and content. And finally, a fully automated system may do all of that plus autonomously manage output and distribution.
In this ebook, we’re going to focus on applications that include some degree of automation. This is where significant amounts of time, effort and costs come into play, and where your understanding of the options will have the greatest impact on your business outcomes.
Our challenge begins with a lot of fuzzy terminology and marketing speak used to describe equally fuzzy and overlapping software functionality. But we’re not on a mission to standardize the jargon. Rather, we’re going to explain some of the commonly used terms, features and applications, and highlight the practical similarities and differences between them.
The term “report” is at the root of much of the confusion. Traditionally, reports were assembled and printed documents. But more recently, reports are rendered in real-time to the screen for temporary consumption and often not considered to be documents at all unless they are exported or printed. That distinction is more perception than reality in that the screen renderings are technically HTML documents. Since the word “report” is commonly used to describe both screen and print documents we’ll acknowledge that here and won’t try to make any further distinction.
Applications for automating data output are so numerous and universal that they span virtually all sizes and types of industries. To get the big picture, we’ll start at the point where we first begin to see an underlying commonality. There are three broad software categories that together encompass the majority of use cases and applications: Document Generation, Business Intelligence and Database Publishing.
These three software categories each have an important business function, and each is accompanied by an entourage of software offerings with common core capabilities. But the categories themselves also overlap one another in a number of ways that include some capabilities, features, applications and roles.
The best way to understand data-powered documents may be to list some familiar examples. Here are a few of the most common document types and the systems that are typically used to generate them.
Sales & Transaction Reports
Data Navigation - Drill Down Reports
Quotes & Proposals
Invoices & Receipts
Catalogs & Brochures
Direct Marketing Mailers
Another great way to understand the systems that underlie data-powered documents is to understand the users who interact with them. Roles and titles vary from company to company and from industry to industry but there is a pattern. They all tend to involve a few different levels of technological skill depending on the area of interaction with the systems.
Business & Operations Users
Designers & Content Professionals
Print Production Specialists
Business Intelligence, or BI software, uses data to support tactical and strategic business activities. It’s a category that’s gotten a lot of attention in recent years. The concept is certainly nothing new. Accounting records dating back 7,000 years are known to exist. The BI buzz now centers around real-time information and predictive algorithms that are beginning to automate not just the collection and presentation of data, but even the recommended actions associated with the data.
“Business Intelligence (BI) comprises the strategies and technologies used by enterprises for the data analysis of business information. BI technologies provide historical, current and predictive views of business operations. Common functions of business intelligence technologies include reporting, online analytical processing, analytics, data mining, process mining, complex event processing, business performance management, benchmarking, text mining, predictive analytics and prescriptive analytics.”
“A set of methodologies, processes, architectures, and technologies that leverage the output of information management processes for analysis, reporting, performance management, and information delivery. Research coverage includes executive dashboards as well as query and reporting tools.”
We find it interesting that some authorities place query and reporting tools exclusively in this research category, even though query tools are used extensively in other categories and beyond, and reporting is a term associated with document output at least as often as with business intelligence. Gartner does a good job of painting a picture of the BI landscape. It describes four bands, Reporting, Analysis, Monitoring and Predicting, with each band including a few additional subcategories. That breakdown also paints a slightly daunting picture of the many nuances of a fairly simple concept. Virtually all companies employ some degree of BI processes and systems even if they are not referred to as such. This is partly due to the fact that BI functionality is often included as part of a broader operational system, and the output is given other names like “dashboard,” “report,” “metrics,” “spreadsheet” or “KPIs.”
An often used synonym for BI is “Analytics.” BI and Analytics are very similar, but while BI emphasizes the data and presentation, Analytics emphasizes the insights and actions indicated by that data. In either case, the output is known by the same set of names as mentioned above. BI is something that users have come to expect in most applications. It can be a valuable feature of an application or a stand-alone function. Once you determine the level of BI functionality needed in your application, you’ll need to carefully consider the currency of the data you’ll need, the output formats your users will want, and the potential need for specialized data sources such as data warehouses.
KEY FUNCTIONALITY: Business Intelligence
Of course, this simple set of functionality belies the complexity and breadth of features needed to deliver on these requirements. As of this writing, Wikipedia lists 64 companies in the Business Intelligence category. There are certainly more. And this doesn’t account for the many thousands (probably millions) of applications that include some degree of BI functionality in their otherwise categorized applications.
Document Generation is a term that’s not well known, but it describes a category of software arguably larger than the other two we’re comparing here. Consider the many documents that are part of your everyday routines: bills, receipts, account statements, personalized offers, or even your driver’s license. Then think of some of the business documents you deal with regularly, such as contracts and proposals, benefits statements, pay stubs, shipping labels, packing slips, and so on. This diverse array shares one trait: they are all data-powered documents assembled and generated by an automated system.
The systems that generate these documents are often hiding behind the curtain, or more accurately, embedded within other systems. They’re often referred to by the names of the specific output they generate; for instance, a receipt printer is a classic example of document generation, but never referred to as such. Some document generation systems are highly specialized while others have broad application. Some are part of a closed system while others are a part of a loosely integrated collection of systems. The configurations are as diverse as the documents they produce.
Doing a Web search for a definition of Document Generation immediately exposes the root of the confusion around this software category. There is no Wiki page or Webster’s definition. Instead, there are definitions offered by companies that have an interest in crafting a definition around their particular feature set, as well as a few offered by software review sites. The review sites confuse the space by featuring “leaders” in their categories that may demonstrate only a fraction of the features the sites themselves list as core to the category.
Full Disclosure: It’s impossible for us to be unbiased due to the fact that we at Windward Studios consider our products to be reporting and document generation solutions. However, rather than piling on new definitions, we’ll focus on the features that are core to the category and highlight how different solutions may emphasize certain features over others.
Database publishing is an area of automated media production where paginated documents are dynamically assembled for the purpose of mass reproduction. The most common application is product catalog production, and another that some of us may still remember is telephone directories. Product catalogs often contain detailed product images, descriptions and pricing that are maintained and managed in a central database. Catalogs tend to be repetitive in terms of style and layout from item to item or page to page, and therefore benefit greatly from the use of templates that automate content assembly and formatting. This is especially true as the number of items and pages increases and as publishing frequency increases.
Consider the difference between producing a 20-page fashion catalog once a year versus producing a weekly grocery store sale flyer. The small fashion catalog would benefit from the use of reusable templates, but little time would be saved by automating the content collection from an external database. On the other hand, a weekly grocery flyer could be generated in a tiny fraction of the time it would take to create a new flyer every week if it were generated by a database publishing system. In the case of parts catalogs that may contain many thousands of items, database publishing is essential.
Database publishing systems are rarely a single stand-alone application. At a minimum, they are made up of a content repository, a layout and design environment and a reproduction system (printing press), with each of those systems often having several sub-systems. High-volume publishing is a production-heavy process usually involving a team of people and partners. That’s where workflow management and integration with other systems becomes crucial. Creative collaboration, approval and proofing cycles, file processing and pre-press operations are all tightly managed processes that must be coordinated to be efficient.
Database publishing is a relatively specialized area of the data-powered document world, but the systems and companies that serve this space are very well developed and highly efficient.
There are a few more categories that frequently mingle with the ones we’ve listed. We won’t go into detail about these but they’re worth mentioning:
Document Management: This type of software has roots in version control, backup, storage and retrieval. These systems can work with data-powered documents but are more typically used to manage static documents.
Document Automation: This label covers some of the same territory as data-powered documents but leaves out the design, layout and interactive aspects of working with data-powered documents. It could be said that document automation is the back-end functionality of a data-powered document workflow.
Document Assembly: There are software companies that call their products document assembly solutions. They typically focus on dynamically assembling preexisting content elements. They also frequently include some database connectivity or mail-merge functionality and are therefore quite similar to document generation solutions. For the purpose of this paper, we’re considering document assembly a subset of document generation.
Document Creation: This label is somewhat misleading. It’s sometimes used as an alternative to document generation or document assembly but the word “creation” is too easily taken to mean a means of creation and doesn’t obviously imply the use of external data sources or automated systems.
Data Visualization: This software category is almost synonymous with BI and analytics as it emphasizes the display of and interaction with data. Here we’ll consider it primarily a feature of the BI software category.
Mail-Merge: This too is more a feature than a category. It’s really a very simple example of lightly personalizing a printed or electronic document. Some sophisticated products exist for high-speed applications like bulk mailing, but we’ll consider it a subset of data-powered document production.
Web Page/Site Builder Tools: These share some of the features and characteristics of the three main categories we’re discussing in this paper. The reason we’re not comparing these directly to the other three is that they are more typically classified as development tools, or in the case of content management systems such as WordPress and similar products, they are narrowly focused on blog-style content.
Electronic Signature: Not to be confused with digital signatures, this is specialized functionality that applies to the signing of legal and official documents as a replacement for handwritten personal signatures. It’s possible that this feature is part of an integrated system that includes document generation, but it isn’t a core capability of the document generation category.
You may have seen some popular graphics that feature an intentionally overwhelming number of technology categories. SiriusDecisions has one that lists over 130 distinct technology categories related to Sales, Marketing and Product. Chiefmartec.com has been producing a graphic dedicated solely to the Marketing space that now contains nearly 50 categories with roughly 5,000 logos stuffed into them. Categorization is a natural mental technique that helps us recall and understand information. And it’s useful in helping to cut away some of the clutter. But taken at face value, it’s not a prudent way to make software buying and architecture decisions. We recommend using high-level categorization to help reduce what Alvin Toffler referred to as “overchoice,” but keep your options open enough to consider what really matters:
A distinction is sometimes made between operational and analytical reports. Operational data is typically a very current snapshot of the state of business operations while analytical data often includes historical or periodic snapshots of data over time. Operational data output typically takes the form of a digital dashboard or occasionally a printed report while analytical data is often output to an interactive tabular format such as a spreadsheet, or in more sophisticated applications, a data visualization tool.
Many analytical reports model data and status changes over time. This type of report requires that historical data be collected and stored. This in turn often requires the presence of a special type of database known as a data warehouse, or more recently known as “big data.” It’s important to note that many business applications that have otherwise robust dashboard reporting capabilities do not inherently have the ability to model trends and patterns since they only store the current state of information rather than a history of how the data has changed over time. So, the functionality of BI systems can be quite dependent on the type of data sources they are integrated with.
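The difference between storing only the current state and storing a history can be shown in a few lines. The following Python sketch is illustrative only: the metric name, dates and values are made up, and a real data warehouse would hold far more than a short list of snapshots.

```python
from datetime import date

# An operational store keeps only the latest value; each update overwrites it,
# so there is nothing to compare against.
current_state = {"open_orders": 42}

# A historical store appends a dated snapshot instead of overwriting.
# (Hypothetical sample data.)
history = [
    (date(2024, 1, 1), 30),
    (date(2024, 2, 1), 36),
    (date(2024, 3, 1), 42),
]


def period_over_period_change(snapshots):
    """Return (date, change since the previous snapshot) for each period."""
    ordered = sorted(snapshots)
    return [
        (later_date, later_value - earlier_value)
        for (_, earlier_value), (later_date, later_value) in zip(ordered, ordered[1:])
    ]


trend = period_over_period_change(history)
```

With only current_state available, the same question ("how fast are open orders growing?") cannot be answered at all, which is why trend reporting depends on the data source and not just on the reporting tool.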
Data analysis is primarily the domain of the BI category. Document generation systems can collect and output operational statistics to a point, but lack the more sophisticated visualization capabilities found in leading BI systems. And, there are applications where a database publishing system could be used to output an “analysis” document such as an annual report. But those applications are few.
If you need to be able to visualize and interact with business data, BI tools are the strongest option. Interaction is a distinctive feature of many BI tools. Purpose-built BI tools typically provide a robust tool set for constructing analytical reports that allow users to interact with data. While the other software categories are centered around output to static formats, BI tools often display their output on screen and allow users to modify the output by sorting, filtering, tallying, pivoting and drilling into various levels of detail or summary.
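To make the interaction vocabulary concrete, here is a small Python sketch of sorting, filtering, tallying and pivoting over a handful of records. The sample data and function names are hypothetical, and real BI tools perform these operations interactively against much larger data sets.

```python
from collections import defaultdict

# Hypothetical sales records.
sales = [
    {"region": "East", "product": "A", "amount": 100},
    {"region": "East", "product": "B", "amount": 150},
    {"region": "West", "product": "A", "amount": 200},
    {"region": "West", "product": "B", "amount": 50},
]


def filter_rows(rows, **criteria):
    """Filtering (and a simple drill-down): keep rows matching every criterion."""
    return [r for r in rows if all(r[k] == v for k, v in criteria.items())]


def tally(rows, key):
    """Tallying: total the amount for each distinct value of `key`."""
    totals = defaultdict(int)
    for r in rows:
        totals[r[key]] += r["amount"]
    return dict(totals)


def pivot(rows, row_key, col_key):
    """Pivoting: cross-tabulate amounts by two dimensions."""
    table = defaultdict(dict)
    for r in rows:
        cell = table[r[row_key]]
        cell[r[col_key]] = cell.get(r[col_key], 0) + r["amount"]
    return dict(table)


# Sorting: largest sale first.
largest_first = sorted(sales, key=lambda r: r["amount"], reverse=True)
```

Drilling down from a regional total to its detail rows is then just filter_rows(sales, region="East").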
Interaction capabilities run a gamut from simple sorting and filtering to sophisticated modeling. If you need cutting-edge analytics and modeling capabilities you’ll need to be careful to select a BI tool that specifically offers these advanced features—not all do.
Document generation software straddles the space between BI and database publishing. BI tools often have the ability to output a document to the screen, or export a file as a PDF or tabular format, but they do so one-at-a-time with a single document. On the other hand, database publishing systems are built to dynamically assemble content and then output a single resulting document for mass reproduction, such as a printed catalog. Document generation systems excel in their ability to output many unique documents either one-at-a-time on demand, or serially in a batch. Consider the examples of a printed theater ticket and monthly bank account statement. A single ticket is printed on demand when a customer chooses a seat at the box office, while a bank will generate, print and mail unique statements to all of its customers on a particular day each month.
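The two output modes described above, one document on demand and many unique documents in a batch, can share the same template. The Python sketch below uses the standard library’s string.Template as a stand-in for a real document template; the field names and customer records are invented for illustration.

```python
from string import Template

# Hypothetical statement template; $name and $balance are placeholders.
STATEMENT = Template("Statement for $name\nClosing balance: $balance\n")


def generate_one(record: dict) -> str:
    """On-demand mode: build a single document for a single record."""
    return STATEMENT.substitute(record)


def generate_batch(records):
    """Batch mode: yield one unique document per record, in order."""
    for record in records:
        yield record["id"], generate_one(record)


customers = [
    {"id": "C001", "name": "Ana Ruiz", "balance": "1,250.00"},
    {"id": "C002", "name": "Ben Okoro", "balance": "310.75"},
]

# A monthly run produces every statement; a box-office style request would
# call generate_one() for just the record at hand.
documents = dict(generate_batch(customers))
```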
The ability of a system to automate document creation and dynamically assemble preexisting elements is fundamental to all systems that support data-powered documents. This capability alone can save businesses many hours of searching, copy/pasting, writing and editing common documents. It also provides much-improved consistency and error reduction. There are some types of applications that focus primarily on just this capability. They are typically designed to collect “chunks” of preexisting content based on some conditional logic and then assemble them in a pre-formatted template for output, usually one at a time or in small quantities. Some examples include sales proposal builders, regulatory reports, and benefits statements.
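The idea of collecting content “chunks” based on conditional logic can be sketched as follows. This is a simplified Python illustration, not how any particular product works: the content library, the rule functions and the context fields (units, country) are all hypothetical.

```python
# Each chunk pairs preexisting text with a rule that decides whether it applies.
CONTENT_LIBRARY = [
    {"text": "Standard terms apply.",
     "when": lambda ctx: True},
    {"text": "Volume discount: 10% off orders over 100 units.",
     "when": lambda ctx: ctx["units"] > 100},
    {"text": "International shipping surcharge applies.",
     "when": lambda ctx: ctx["country"] != "US"},
]


def assemble(ctx, library=CONTENT_LIBRARY):
    """Select the chunks whose rule matches, keeping library order."""
    selected = [chunk["text"] for chunk in library if chunk["when"](ctx)]
    return "\n\n".join(selected)


proposal_body = assemble({"units": 150, "country": "US"})
```

Because the text lives in one library and the rules live beside it, a wording change is made once and every future document picks it up.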
Some applications restrict access to certain components of a document while allowing users to edit other parts. When document requirements are heavy with boilerplate text and when editing is forbidden or restricted, document generation systems will excel. This is often the case with legal and government regulated documents. Database publishing systems can meet some of these requirements but since these types of documents are intended to be ad-hoc and one-off, document generation systems are better suited.
The ability to connect document templates to data sources is common to the three categories but the type and location of the data sources can vary widely. In a closed system, the content may be available locally within the application, while in loosely integrated systems the data may reside in multiple locations and may have very specific requirements for the retrieval of the data. Data and content are most often stored in a database, but sometimes in a file or document-based system. Connecting to databases and navigating complex relational table structures can be a highly technical task. Some BI tools and document generation solutions provide a layer of functionality to help deal with this, which could be a key consideration depending on the level of technical expertise of your users.
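As a rough idea of what “navigating relational table structures” involves, the Python sketch below uses the standard library’s sqlite3 module with an in-memory database. The three-table schema and sample rows are assumptions made purely for the example; a production source would differ in engine, schema and scale.

```python
import sqlite3


def build_demo_db():
    """Create a tiny, hypothetical customers/orders/items schema in memory."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY,
                             customer_id INTEGER REFERENCES customers(id));
        CREATE TABLE order_items (id INTEGER PRIMARY KEY,
                                  order_id INTEGER REFERENCES orders(id),
                                  description TEXT,
                                  quantity INTEGER,
                                  unit_price REAL);
        INSERT INTO customers VALUES (1, 'Acme Corp');
        INSERT INTO orders VALUES (10, 1);
        INSERT INTO order_items VALUES (100, 10, 'Widget', 3, 2.5);
        INSERT INTO order_items VALUES (101, 10, 'Gadget', 1, 10.0);
    """)
    return conn


def fetch_invoice_lines(conn, customer_id):
    """Join three tables to get the rows an invoice template would need."""
    query = """
        SELECT c.name, o.id, i.description, i.quantity * i.unit_price
        FROM customers AS c
        JOIN orders AS o ON o.customer_id = c.id
        JOIN order_items AS i ON i.order_id = o.id
        WHERE c.id = ?
        ORDER BY o.id, i.id
    """
    # The ? placeholder keeps user-supplied values out of the SQL text.
    return conn.execute(query, (customer_id,)).fetchall()


conn = build_demo_db()
invoice_lines = fetch_invoice_lines(conn, 1)
```

A tool that hides this join behind a point-and-click data picker is doing the same work on the user’s behalf.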
While we’re on the subject of data sources, it’s important to consider the security implications of connecting to external data sources that may contain sensitive information. The administrator of the data source will typically control access to the data and only allow the appropriate level of access to the system, but the nature of the data itself may dictate that the editing, viewing and output systems be appropriately secure.
As mentioned, all three of our categories feature solutions that are adept at connecting to external data sources. Because BI and document generation have such broad application, the solutions in these categories tend to have correspondingly broad integration capabilities. Compatibility is key, but performance is also a consideration as some data formats are more optimized for performance than others. If at all possible, you’ll want to avoid forcing compatibility by translating data from one format to another.
These may be the features that distinguish the software categories from one another more than any other. It’s also an important area of differentiation within each category. Generally speaking, BI design tools excel at displaying data in the form of charts, graphs and tables, but allow only the most basic layout of panels on a page with minimal control over style. On the other end of the spectrum, the design environments of database publishing tools are built to provide maximum control and freedom over layout and style but typically lack even simple data visualization capabilities like charts and graphs. Document generation design environments tend to be built around the format and features commonly found in office documents and therefore either mimic or build upon standard word processors, spreadsheet tools or presentation applications. These environments are typically quite capable of handling data as well as layout and style—to a point. They are somewhat less specialized and may lack a few of the most sophisticated capabilities of their counterparts. But what they lack in specialization, they make up for with broad functionality and familiarity.
It’s worth noting that not all systems include design environments. In closed systems where the data and the document formats rarely or never change, the developer may programmatically generate the output by “hard-coding” the format parameters into the system and bypass the need for an end-user design environment altogether.
Virtually all document systems make use of templates to help automate the application of format, style and layout. Templates are fundamental to all of the categories we’re discussing, but there are a few capabilities that set some solutions apart. Document generation and database publishing systems are both built to work with multi-page documents. They allow designers to select and apply a variety of style templates to individual pages or groups of pages within a single document. Some include pre-formatted component libraries and “smart” templates that apply specific formatting to special pages such as cover pages, tables of contents, new sections, etc. And some of the most advanced allow the template creators to lock specific components and regions of a document to enforce consistency.
Many data-powered documents depend on conditional logic to assemble the appropriate content under varying circumstances. For instance, an investment proposal that includes a break-even timeline would have to dynamically calculate and format the document content based on the particular investments that were being recommended to the client. It’s possible to perform conditional logic at several different points in the process. It could be done within the database itself, which may be desirable if you wish to store the calculated output. It could be done with custom application code executed between the database and the document generation system, or with a more advanced system, it could be performed on-the-fly within the document generation code. The latter may be the only possible solution if content formatting is also conditional.
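The break-even example can be sketched in Python to show a calculation and a conditional wording choice happening together at generation time. The model here is deliberately simplified (a constant monthly return with no discounting), and the function names, thresholds and wording are assumptions for illustration only.

```python
import math


def break_even_month(initial_cost: float, monthly_return: float):
    """Return the first month in which cumulative returns cover the cost.

    Simplified model: constant monthly return, no discounting.
    Returns None when the investment never breaks even.
    """
    if monthly_return <= 0:
        return None
    return math.ceil(initial_cost / monthly_return)


def break_even_sentence(name: str, initial_cost: float, monthly_return: float) -> str:
    """Pick the wording that goes into the proposal based on the result."""
    months = break_even_month(initial_cost, monthly_return)
    if months is None:
        return f"{name}: does not break even at the projected return."
    if months <= 12:
        return f"{name}: breaks even in {months} months (within the first year)."
    return f"{name}: breaks even in {months} months."


line = break_even_sentence("Solar upgrade", 12000, 1000)
```

The same logic could live in the database or in application code instead; placing it in the generation step matters mainly when the result also changes how the content is formatted.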
It may be preferable to input the conditions that drive the content assembly in real-time, and only store them temporarily for the purpose of generating a single document. Consider a sales proposal application where a salesperson may input a few pieces of information via a form that will dictate the contents of the proposal—details such as the product(s) being proposed, delivery timeline, discounts and so on. Some document generation systems include input capabilities and are ideal for applications where the conditional logic is dependent on ad-hoc input. Carefully consider the logic that will drive the assembly, manipulation and formatting of the content that will be included in your documents. A well-designed solution can save countless hours of design, coding, and administrative effort.
Personalized content is everywhere, maybe even a little too much so. But the ability to add personalized content to documents is essential for many types of data-powered documents such as account statements and medical test results. Sales contracts and proposals may also contain personalized content. Studies indicate that personalization results in a lift in business results, so it follows that this may be an important consideration even if it’s not absolutely required.
If you need to view and work with data that’s very current or even real-time, then BI solutions are probably your best bet. They are primarily designed for the transitory viewing of data. They often have only basic output capabilities, but because the freshness of the data is so important to business intelligence, output to static formats defeats one of the most important purposes of the BI category.
Document generation systems can collect and output data that is just as current as any BI tool but the applications for real-time data output to print formats are relatively few. Some examples could be found in logistics operations in the form of pick tickets and packing slips.
Many data-powered documents contain multiple pages. Some contain just a few while others could be hundreds of pages in length. There are several special considerations when dealing with multi-page documents. One would be the ability to automate page and section numbering, or for a table of contents to be automatically generated and linked if it’s an interactive document. Another important feature is the ability to dynamically flow content from one page to another. Variable data size and length pose a challenge to even the most robust solutions. The most capable tools have the ability to fit or flow text to columns and new pages as well as being able to split tables across pages and include header rows at the beginning of each new section.
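Splitting a table across pages while repeating its header row can be sketched as below. This Python example assumes every row has the same height, so a page holds a fixed number of rows; real layout engines measure the rendered content instead. The function name and sample rows are hypothetical.

```python
def paginate_table(header, rows, rows_per_page):
    """Split rows into pages, repeating the header row at the top of each.

    Assumes uniform row height, so each page fits `rows_per_page` data rows.
    """
    if rows_per_page < 1:
        raise ValueError("rows_per_page must be at least 1")
    pages = []
    for start in range(0, len(rows), rows_per_page):
        pages.append([header] + rows[start:start + rows_per_page])
    return pages


header = ["Item", "Qty"]
rows = [["Bolt", 10], ["Nut", 20], ["Washer", 30], ["Screw", 40], ["Rivet", 50]]
pages = paginate_table(header, rows, rows_per_page=2)
```

Five rows at two per page yield three pages, and the last page carries the header plus the single remaining row.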
Printing is the most common form of reproduction but it’s also possible to distribute data-powered documents as digital files. Setting up documents for high-volume print production comes with specialized requirements for color management, trim and bleed margins, and sometimes page form (signature) layouts. This again underscores the highly specialized nature of database publishing systems in contrast to the more general requirements for document generation and BI output.
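Trim and bleed come down to simple arithmetic: the bleed extends past the trim line on every edge, so it is added twice to each dimension. The Python sketch below assumes measurements in inches and a 0.125-inch bleed, a commonly quoted value that your printer may or may not use; confirm the actual figure with your print provider.

```python
def page_size_with_bleed(trim_width: float, trim_height: float, bleed: float = 0.125):
    """Return the (width, height) of the file to supply, bleed included.

    Bleed is added on all four sides, hence 2 * bleed per dimension.
    The 0.125 default is an assumption, not a universal standard.
    """
    if bleed < 0:
        raise ValueError("bleed cannot be negative")
    return (trim_width + 2 * bleed, trim_height + 2 * bleed)


# A catalog page trimmed to 8.5 x 11 inches.
full_width, full_height = page_size_with_bleed(8.5, 11.0)
```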
As mentioned earlier, some basic BI functionality is often built into applications whose main function is not exclusively BI. For example, Facebook’s built-in Activity Log displays a list of all the user’s activity with a couple of filters for time period and type of activity. This is simple BI functionality but Facebook is not a BI tool. There are, however, numerous purpose-built BI add-ons for Facebook that provide much more robust functionality. This is the case with many applications and should be a consideration when designing an application: how much BI functionality to build into the core application and how much to leave to external systems and processes.
Workflow is another broad term that can mean different things in different contexts. Where documents are concerned, workflow typically means collaboration, approvals, routing, scheduling and sometimes version control. Workflow is all about saving time by automating as many of the human processes as possible. Database publishing systems typically involve the most process steps and the largest teams and therefore excel in workflow management support. Document generation systems cover a wide gamut in this area as well. Since they are often part of office systems and processes, they too may include all of the above-mentioned features. However, they’re equally often embedded within closed systems and require no workflow management capabilities at all. When implementing data-powered document capabilities, always consider how the documents will flow through your processes and integrated systems.
The purpose of this document is to demystify data-powered documents and to help business users and developers make better, clearer decisions when choosing tools for their business. To that end, we hope this paper has helped you get past some of the confusing terminology and focus on the task(s) at hand.
Automated data-powered document systems can provide huge savings for companies that are able to choose the right systems for the right tasks and take advantage of their full capabilities. It’s easy to see that there is a significant amount of overlapping functionality across these categories. But the comparison also shows that BI and database publishing solutions are more specialized while document generation solutions cover a wider range of applications.
The type of documents you work with will be the primary driver of the systems you’ll want to use to manage and automate your data-powered document production. A close second is the type and location of the data sources you’re going to be accessing. Once you’ve chosen the right type of system, your next consideration will be the exact features that your developers, end-users and support staff will need. We hope you feel better equipped to make your decisions after reading this ebook!
If you've just discovered us, we're excited. Try Windward with our 30-day free trial and start creating documents in quick time with our low/no code solutions.
Document Automation (also known as document assembly) is the design of systems and workflows that assist in the creation of electronic documents. These include logic-based systems that use segments of preexisting text and/or data to assemble a new document.
Document Generation is the process of creating hundreds, thousands, or even millions of personalized and distinct documents for internal or external use from a single template. While Document Generation is a subset of automation, for some products (not all) you can’t get just the Document Generation component of a Document Automation solution.
Reporting Software is a subset of Document Generation. Reporting software can’t do documents. But Document Generation software easily creates reports.
Tags are elements placed in the document automation template (DOCX, PPTX, XLSX) that the docgen system acts on when generating a document. These tags can be data to insert, business logic rules to conditionally display or suppress content, and much more. Each vendor has its own term for “tags.”
Going forward, the word docgen will be used to stand for a Document Generation system in this guide. When something is a template-based Document Automation system, the word docauto will be used.
Every modern docgen product uses Microsoft Office as the template designer. While you can find a few very old products that have their own designer, you want to limit your consideration to those built on Office as it is far superior.
Some document generation solutions work with Word, Excel, & PowerPoint while others are Word only. If you need Excel & PowerPoint, then obviously, go with a solution that supports them too. If you only need document automation tools using Word, think carefully if you might want Excel or PowerPoint someday in the future.
Again: if you go with a Word document automation solution, be very sure you won’t ever want Excel or PowerPoint. Ever!
The docgen solutions that have a separate add-in or no add-in can usually work with any word processor that can save as a DOCX file. It all tends to work exactly the same. For a full Word clone, this can work every bit as well.
Google Docs in this case though tends to be problematic because Google Docs does not have the layout and formatting capability of Microsoft Word. Not even close. Your limit here is not the docgen app; it’s Google Docs. For most use cases, Google Docs is not up to the job.
Some docgen solutions include an add-in to help you place & edit the tags in the template. These come in two flavors, one much better than the other.
First, some automated document creation solutions have no add-in to assist in crafting tags. You usually end up with Notepad open, where you write all the various tags, and you copy from there and paste into Word. And for special uses, you type in the additional properties from memory or other notes.
This “no add-in” approach is slow, painful, & error prone. If you have 5 templates, each with 5 tags – then no big deal. But if every month you’re creating 100 templates, each with 150 tags, you’re now in hell.
While Windward can legitimately claim to be a "no add-in" solution for designing on platforms other than Windows, we find that approach so inferior that we state that Windward cannot be used for this use case.
We prefer to not get your business rather than provide you a significantly inferior approach.
Not only is it slow & expensive, but because it is a death march, designers will not put in the effort to make a business document template shine. They just want to be done.
The second approach (much better) is a second application (usually in a browser) that helps you write tags. You still have to copy & paste between this second app and Word, but the add-in provides all possible choices for tags and helps you write your queries.
Not all the side-by-side add-in approaches are the same. Play with each carefully to see how it works for you; not in simple sample cases, but in the more complex document templates you will need to create.
The third approach (best) is an add-in that becomes part of Word; adding additional tabs to the ribbon. This makes adding and revising tags a breeze because it works on the tag in the template. And while helping to write each tag, it can do so in the context of where it is in the template.
The incorporated add-in approach is by far the best in template based document generation. But by definition, it is limited to Office on Windows.
This add-in is one of the two features (the query wizard below is the other) that determines how much time your team will spend to design document templates, day after day, week after week, year after year. If one approach is 15 seconds longer, and you are going to create 500 templates each with just 35 tags (that’s low), that’s 73 hours.
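The arithmetic behind that 73-hour figure is easy to rerun with your own numbers. The sketch below is only a back-of-the-envelope calculator; the function name and parameters are ours, made up for illustration.

```python
def extra_design_hours(templates: int, tags_per_template: int,
                       extra_seconds_per_tag: float) -> float:
    """Extra design time, in hours, caused by a slower tag-editing workflow."""
    total_tags = templates * tags_per_template
    return total_tags * extra_seconds_per_tag / 3600  # 3600 seconds per hour


# The example from the text: 500 templates, 35 tags each, 15 extra seconds per tag.
hours = extra_design_hours(templates=500, tags_per_template=35,
                           extra_seconds_per_tag=15)
print(f"{hours:.1f} hours")  # 72.9 hours, which rounds to the 73 quoted above
```

Plug in your own template count and tag count; the total grows linearly with both, so a small per-tag delay adds up quickly.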
While all the Document Generation solutions require you to write code to call them (docauto is a no-code solution, so this is not an issue there), some of them require additional code for each template. This is called “code behind.”
In some cases, this code behind defines the data specification, for example when a template now also needs the hire date. These solutions don’t need code for every template, but a fair number of times a template will require additional data, or data ordered differently, and then you have a code change.
Even worse, some require code behind for each template. Each new template therefore means additional code. This is a giant hit.
Why? First, you have programmers involved in template design. That’s expensive and slows the process down. Second, each new template requires rebuilding your application and pushing it through test & staging.
The one advantage to code behind is the developers can build data on the fly as it’s needed, including data generated according to business rules within the code. But in almost all cases, doing so directly in the template, as opposed to in the code behind, is superior.
In other words, you want the template file to be everything.
1. How do you create a doclet?
The best solution is to select content in Word and save that as a doclet. If it's more restrictive than this, will those restrictions stop you from creating very useful doclets?
2. Does it bring the full formatting of the doclet into the document it is dropped into?
This is actually a very hard thing to do in Word if the doclet uses styles that exist in the template with the same name - but different settings.
3. What can be saved?
Just template content? Or can you also save datasources, parameters, and more? This is not as important, but it is still a timesaver.
4. After you drop it, is it complete? Or do you need to perform additional steps? For example, if a doclet uses a different datasource, is that datasource now also tied to the template?
Not that important, but nice to have.
5. Can doclets in a template be updated?
If a doclet is the company logo and the logo changed, can all the templates using that doclet be updated to the new logo universally?
The dropped doclets come in several flavors. The optimum are linked doclets where the content of the doclet is displayed in your template in full, fully laid out and formatted. And as it is linked, when the doclet itself is revised, that change immediately appears in your template and is used in every generated document.
The second flavor is a copied doclet. Once you drop it into your template, you can adjust it any way you wish, from formatting to tags in the content. But if the original doclet is changed, that change is not applied in your template. In some uses this is preferable, because you don’t want changes applied to existing templates.
The third flavor is a tag that imports the doclet. You don’t see the contents of the doclet in your template, but when the template is processed, it pulls in the live copy of the doclet. This is valuable when a select determines which doclet to import, for example when you need to pull in content based on the state the recipient of the document lives in.
The optimum of course is to have all three flavors available to use each as appropriate.
Your most common activity when creating templates will be writing the queries to select the data. You do this to select blocks of data, such as all the stocks you hold for a portfolio statement. You also do this for conditional logic in the template, such as adding insurance requirements to an offer letter if the recipient resides in California, or when placing a name in loan papers.
Some docgen products do not have query wizards. With no wizards, template creation is a developer-only task. And for developers, it will be slower. No wizards means you can never turn template creation over to business users.
You will do this hundreds of times in complex templates. Thousands of times across all the templates. You want this to be quick & easy. This functionality, more than everything else put together, determines how much time you will spend designing templates, and how pleasant it is.
When you evaluate different document automation solutions, have a business user use the system to craft the queries and see how well they do. They’ll be slow & hesitant at first. But it’s key to see if they can learn it and then be successful on their own.
In the case of conditional tags (if, switch, etc.) make sure it also works well on elements returned by other tags (usually the iterative tags). Because in this case, it’s not a query of the data, it’s a condition on data already returned.
Finally, keep in mind that no matter how brilliant the query wizards are, the user will also generally struggle with the structure of the data (the metadata). This can be displayed to the user, but they still need to learn what is where. Reducing what metadata is displayed, providing the descriptions for each node in the metadata, etc., can make the difference between a usable and unusable solution for business users.
If you have a single datasource, then skip this section – you don’t care.
Ok, you have multiple datasources, for example Salesforce & Marketo. And you have documents you want to populate with data from each. In this case you must get a docgen solution that lets you have tags in a single template that are marked for which datasource that tag is to be applied to.
Some automated document generation providers implement this in two passes: first applying all the Salesforce tags and then starting over and applying all the Marketo tags. This works fine if you are not intermixing the data.
Sometimes you need to intermix the data: for example, if your document lists all Account Executives (from Salesforce) and then within the data for an AE it lists the emails they were sent (from Marketo). Then you need a solution that processes all datasources simultaneously.
If you have multiple datasources, you will almost certainly need, sooner or later, a solution that processes multiple datasources simultaneously. If it’s not a must-have today, it probably will be a must-have in a year.
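To see why intermixing defeats a two-pass approach, consider the Account Executive example as plain code. This is only a sketch with made-up sample data and field names; the two lists stand in for Salesforce and Marketo and do not reflect either product's real API.

```python
# Hypothetical sample data standing in for two separate datasources.
salesforce_aes = [
    {"id": 1, "name": "Ana"},
    {"id": 2, "name": "Ben"},
]
marketo_emails = [
    {"ae_id": 1, "subject": "Q3 pricing update"},
    {"ae_id": 2, "subject": "New feature webinar"},
    {"ae_id": 1, "subject": "Renewal reminder"},
]


def emails_by_ae(aes, emails):
    """Outer loop over one datasource, inner select against the other."""
    report = []
    for ae in aes:
        # The inner select depends on the current row of the outer loop,
        # so both datasources have to be available at the same time.
        subjects = [e["subject"] for e in emails if e["ae_id"] == ae["id"]]
        report.append((ae["name"], subjects))
    return report


report = emails_by_ae(salesforce_aes, marketo_emails)
for name, subjects in report:
    print(name)
    for subject in subjects:
        print("  -", subject)
```

The inner select needs the current AE from the outer loop. A system that finishes every tag for one datasource before it starts on the other has no "current AE" left by the time it reaches the email tags.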
Some tags have a start and end location, such as the if and forEach (iterative) tags. Generally, these are used to repeat or conditionally include a row in a table or a paragraph of text. All solutions do this.
But as time goes on and you create more advanced & complex templates, you will find yourself wanting to start the iteration in the middle of a table or an if that removes two cells and adjusts the table correctly.
In addition, you almost certainly will need a forEach (iterative) tag that adds columns in a table, as opposed to rows. You may want a column for each product or each month in a dataset. Finally, watch out for any limitations on combinations. At the start you need a single forEach tag. A year later you are nesting five forEach tags within each other because it’s the only way to get what you want.
This is an area where it’s impossible to give guidance on what you may someday need. Your best bet is to select a solution that has no limitations on the start & end location.
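As an illustration of what a column-wise forEach has to accomplish, here is a minimal sketch that turns one-row-per-month data into one-column-per-month output. The data and the function name are hypothetical, and a real docgen engine does this inside the table layout rather than in a list of lists.

```python
# Hypothetical rows as a query might return them: one row per (product, month).
rows = [
    {"product": "Widget", "month": "Jan", "units": 10},
    {"product": "Widget", "month": "Feb", "units": 12},
    {"product": "Gadget", "month": "Jan", "units": 7},
    {"product": "Gadget", "month": "Feb", "units": 9},
]


def build_table(rows):
    """Return a header row plus one row per product, with one column per month."""
    months, products = [], []
    for r in rows:  # keep first-seen order for both axes
        if r["month"] not in months:
            months.append(r["month"])
        if r["product"] not in products:
            products.append(r["product"])
    units = {(r["product"], r["month"]): r["units"] for r in rows}
    table = [["Product"] + months]  # the column-wise iteration
    for p in products:              # the usual row-wise iteration
        table.append([p] + [units.get((p, m), 0) for m in months])
    return table


table = build_table(rows)
for line in table:
    print(line)
```

Note that this already nests two iterations (products and months), which is the kind of combination you should test for when evaluating.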
For a simple template, this doesn’t matter (much). But as the logic expands in a template, you find that you are adding a lot of control tags. The most common are the iterative (forEach) and conditional (if) tags. But even a moderately complex template will also have numerous query and set tags along with several additional tags.
These tags, if displayed, pollute the template and enlarge the layout in the template. Usually you’ll find the template looks quite different from the final generated report. This makes it difficult to truly imagine the final document from the template. It’s frustrating to have to constantly run test documents to see what you’re going to get.
You’ll be much happier if the designer can at the click of a button hide or show the control tags. Show them when you’re working on the template logic. Hide them when you’re working on the final layout and formatting. This option will save you time and more importantly will make the design experience more pleasant.
The best way to use content across multiple templates is to have that content in a child template that the parent templates all import. These imported templates can be brought in as an explicit filename or as a data query that returns the filename.
Trust me: unless your needs are incredibly simple, you need this. You can work around it even if you repeat the same content in 100 templates, but you’re giving yourself too much extra work when wording changes due to company directives or legislation.
One critical detail on imports: Does the system process tags in the imported child template? If all of your child templates are static text (legal clauses), then this does not matter. But if you need to include anything live (a person’s name, a date, a state of residence), then you need a solution that processes tags in the imported child template.
Finally, for Word only, how does it handle style mismatches? If the parent has the Normal style set to Times New Roman 12pt and the child has Normal set to Verdana 10pt, then what should the child paragraphs be styled as? This can be a royal pain because different users never have their styles matching.
Some systems convert the child to the parent formatting. Some retain the child formatting. And some (best solution) give you the option of either. The option is best but if it’s forced one of the two ways, make sure the system you get works that way.
Not having the expected styling on output is guaranteed to get upper management upset.
For the solutions that allow queries in the tags, you want one that also supports complex functions operating on the data. And not just simple functions like SUM() and COUNT() but most of what’s available in Excel. You will use Text and DateTime a lot.
In addition, can you add your own functions? Adding custom functions is often a significant component of providing a simple & easy design experience to business users. It’s also a lot safer. For complex calculations you write it once in the function and test it carefully. No worries about someone screwing it up writing it by hand in a template.
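The "write it once, test it carefully" idea looks roughly like this. The registry, the decorator, and the YEARS_OF_SERVICE function are all hypothetical names invented for this sketch; each product has its own mechanism for registering custom functions.

```python
from datetime import date

# Hypothetical registry mapping the name a template author types to real code.
FUNCTIONS = {}


def template_function(name):
    """Decorator that registers a function under the name used in templates."""
    def register(fn):
        FUNCTIONS[name] = fn
        return fn
    return register


@template_function("YEARS_OF_SERVICE")
def years_of_service(hire_date: date, as_of: date) -> int:
    """Completed years between the hire date and the as-of date."""
    years = as_of.year - hire_date.year
    # Subtract a year if the anniversary has not been reached yet.
    if (as_of.month, as_of.day) < (hire_date.month, hire_date.day):
        years -= 1
    return years


def call(name, *args):
    """What an engine would do when it meets the function name in a tag."""
    return FUNCTIONS[name](*args)


result = call("YEARS_OF_SERVICE", date(2015, 6, 15), date(2024, 6, 14))
print(result)  # 8, because the ninth anniversary is one day away
```

The anniversary check is exactly the sort of detail that is easy to get wrong when typed by hand into a template, and easy to unit test when it lives in one function.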
All of the products (I believe) support reading files protected by Basic, Digest, Negotiate, & OAuth2 authentication. But what about a special Authenticate & Authorize you created in your company for one set of files? Or something special to get to a JSON file from a REST service that is home grown?
First off, make sure the solution supports the standard protocols you use. You should get a yes. And if that’s all you have, fantastic; you can skip to the next section. If you have a home-grown A&A, find out what needs to be done to have the system access it. This is a custom Access Provider. And make sure that the same Access Provider is used for reading data files (XML & JSON), accessing OData, and importing files (templates & pictures).
If you want to create DOCX or XLSX files where an employee can then edit parts of it, this is incredibly valuable. For example, you are generating portfolio statements, and the legal disclaimers and actual financial results must not be changed, but there is a paragraph where the financial advisor can write a summary of the performance.
In this case, some of the solutions will carry document locking in DOCX & XLSX (PPTX does not have this) over to the output. So, if the template has locked all except one paragraph, then the generated DOCX will be locked except for that one paragraph.
Having the document locking functionality tends to make your lawyers very, very happy. It eliminates a source of serious legal liability.
What is provided here is all over the board. And it’s difficult to get specific about what is most useful to you, as opposed to the next person. The best advice here is just look at what they have and try it out when evaluating.
One tool is validating a template. Not running it, but inspecting it and providing information on errors found. A second tool is to generate the document and deliver a list of errors and warnings. For example, if some content is placed off the page, it was rendered but you don’t see it. In this case it’s useful to have a listing of content off the page.
In this category you can include tag settings - what to do if a select fails, returns nothing, etc. Some of these are particularly useful but in other cases, you can find yourself investing more time than it’s worth.
What if you are generating portfolio statements using a Word template? It has descriptive text, a chart showing performance, legal disclaimers, etc. But where it has a table showing the actual numbers, you want to place an embedded spreadsheet with the numbers.
Why? Because this way the recipient can open that spreadsheet and then, using Excel, measure that data any way they want. It’s a much-improved portfolio statement and something that makes the recipient go WOW.
If you want this, verify that the document automation solution you select not only carries embedded objects to the output, but also processes the tags in the embedded object if it is a DOCX/PPTX/XLSX file. To make good use of this functionality, the embedded object must be treated as a live template, not a static document.
If fully implemented, the output to any format, such as PDF, will include the displayed embedded object.
This is a DOCX -> PDF issue. Do you need to have form fields in the DOCX such as drop down, list or check box become the equivalent thing in PDF output? If so, you need to verify that this feature is supported.
In addition, make sure that the initial content/value in the form field can be set from data. If it’s just static values from the template, that tends to not be sufficient for all use cases.
And a suggestion. When you need an empty or checked box depending on data, don’t use a form field. Use the Wingdings empty-box and checked-box characters.
There are two XLSX -> XLSX issues. First, verify that a formula like SUM(D5:D5) expands to SUM(D5:D15) for the case where row 5, inside an iterative loop, becomes rows 5 to 15. It’s very useful to have the formula adjusted on the output (some products just write the literal value). This way, when someone adjusts, say, D7 to see what happens, all the formulas adjust to that difference.
The same goes for pivot tables. If a pivot table is for D1:H5 and the generated XLSX now has those rows as D1:H125, the pivot table should be adjusted to match. This is necessary to use the pivot tables in the generated XLSX.
If you’re going to generate XLSX for Excel Power Users, this is key.
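To make the formula adjustment concrete, here is a deliberately simplified sketch. It only stretches a same-column range that ends on the template row, and the function name is our own; real products also have to handle absolute references, other sheets, multi-column ranges, and more.

```python
import re


def expand_range_end(formula: str, column: str, template_row: int,
                     last_row: int) -> str:
    """Stretch ranges that end on the template row so they end on last_row."""
    pattern = re.compile(
        rf"(?<![A-Za-z])({column}\d+):{column}{template_row}(?!\d)"
    )
    return pattern.sub(rf"\1:{column}{last_row}", formula)


# Row 5 sat inside the iterative loop and became rows 5 to 15 on output.
expanded = expand_range_end("=SUM(D5:D5)", "D", 5, 15)
print(expanded)  # =SUM(D5:D15)
```

Because the output keeps a live formula rather than a literal value, changing D7 in the generated workbook recalculates the total.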
This is not an issue for docauto, just document generation.
There are three ways to call a docgen engine: direct calls to a library, calls to a RESTful server on premises, and calls to a hosted (SaaS) RESTful server. Ask if they have what you want.
One note on Hosted solutions: You will be sending data to that system. First, you want to make sure that the vendor is providing adequate security. Second, if your data is not allowed to go outside your country or region (E.U.), find out not just where the default server is, but also the failover server.
If you’re concerned enough about security to be asking these questions, you should probably host the RESTful server yourself. Even if you place it on AWS or Azure, you are controlling access to the server and its location.
If all your data is JSON (or any other single type), you don’t have to worry about what else the system can access. With that said, everything is getting more interconnected, and odds are that sooner or later you will have to access other datasource types.
Life is a lot safer if the solution can use data from SQL, XML, JSON, & OData. (And why OData? 150 other vendors’ datasources, from ACT to Salesforce to Zoho.) Not a deal breaker, but it will turn out to be useful.
See if you can create datasets from datasources. This is akin to views in SQL, but you are creating them in the template (no DBA needed). And you want them for XML, JSON, & OData too. A good guide to how robust the dataset implementation is: do datasets basically become another datasource? If so, that’s a full implementation.
Furthermore, it can take time and bandwidth to download the metadata from a datasource. We saw one DB2 database take 28 minutes to download the full metadata (yes – truly!). If you have datasources with large metadata structures, find out if they have a way to read the schema once and reuse that. (This is unlikely to ever be needed for XML or JSON–it’s SQL, OData, & any custom datasources.)
Finally, for XML, make sure it uses the XML schema if one is available.
Check that it renders in the output formats you need. Everyone does PDF, HTML, DOCX, XLSX, & PPTX (last two if they support that template type). Additional output formats might be useful, but odds are you’ll never need them.
Check the accuracy of the PDF output. Everyone is imperfect on this. And in their, and our, defense, Microsoft does not document how Word calculates page layout. It does not even specify how the spacing between two lines of single-spaced text is calculated. And it’s impossible to reverse engineer accurately: Word is clearly performing complex calculations, not just using the font metrics.
Everyone does their best. Some come closer than others. Look for a good match but accept it won’t be perfect.
All products have a way to pass parameters to the template to use in the queries. Check that they have all the data types you need (they probably do).
Check that parameters can be set in a select both as a true parameter (to avoid injection attacks) and as a string substitution if desired. Setting a value as a parameter is valuable not only to avoid an injection attack, but also to handle the case of passing the name O’Malley.
Does the designer have a way to show the structure of the tags in the document? And can you click on one to go to that tag? There is no need for this in simple templates, but when you get to 30+ tags it becomes useful. And at 80+ it becomes essential.
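The O’Malley problem is easy to reproduce. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the table and column names are made up, and your docgen product will expose parameters through its own tag syntax rather than through this API.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE borrowers (name TEXT, state TEXT)")
conn.executemany(
    "INSERT INTO borrowers VALUES (?, ?)",
    [("O'Malley", "CA"), ("Smith", "NY")],
)


def find_borrower(conn, name):
    # The ? placeholder sends the value separately from the SQL text,
    # so the apostrophe is treated as data, not as the end of a string.
    sql = "SELECT name, state FROM borrowers WHERE name = ?"
    return conn.execute(sql, (name,)).fetchall()


rows = find_borrower(conn, "O'Malley")
print(rows)

# String substitution pastes the value into the SQL text. The apostrophe
# ends the string literal early and the statement no longer parses.
unsafe_sql = "SELECT name, state FROM borrowers WHERE name = '%s'" % "O'Malley"
try:
    conn.execute(unsafe_sql)
    substitution_failed = False
except sqlite3.OperationalError as exc:
    substitution_failed = True
    print("string substitution failed:", exc)
```

The same mechanism that breaks on an innocent apostrophe is what an attacker exploits on purpose, which is why the parameter form should be the default and string substitution the exception.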
If you’ll always be under 50 tags, no big deal. But if you start under 50 tags and will grow to 200+ tags in a template, not having this will become a big deal. So think about where you’ll be in 5 years.
If you run a template and it takes forever, or it completes but it’s 2,000 pages long when you expected 2 pages, why? You can ask a DBA, and they can track your selects and tell you the problem.
It’s faster & easier if the template add-in has a tool that tells you for each iterative select how many rows of data it returns and how long the query took to complete. From this you can quickly find what is wrong.
Useful, not essential.
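If your product lacks such a tool, you can approximate it outside the template. The sketch below, again using sqlite3 and invented tables, reports row count and elapsed time per select and shows how a forgotten join condition inflates the row count.

```python
import sqlite3
import time


def profile_query(conn, sql, params=()):
    """Run a select and report how many rows it returned and how long it took."""
    start = time.perf_counter()
    rows = conn.execute(sql, params).fetchall()
    elapsed = time.perf_counter() - start
    return {"rows": len(rows), "seconds": elapsed}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER, owner TEXT)")
conn.execute("CREATE TABLE holdings (account_id INTEGER, ticker TEXT)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [(1, "Ana"), (2, "Ben"), (3, "Cy")])
conn.executemany("INSERT INTO holdings VALUES (?, ?)",
                 [(1, "AAA"), (1, "BBB"), (2, "CCC"), (3, "DDD")])

# The select the template author meant to write.
intended = profile_query(
    conn,
    "SELECT owner, ticker FROM accounts "
    "JOIN holdings ON holdings.account_id = accounts.id",
)
# The same select with the join condition forgotten: every account is
# paired with every holding.
runaway = profile_query(conn, "SELECT owner, ticker FROM accounts, holdings")

print(intended["rows"], runaway["rows"])  # 4 12
```

With three accounts the damage is small. With thousands, this is how a 2-page statement becomes a document that never finishes.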
This is used once and saves at most 15 minutes - but it is very nice to have. This is irrelevant for the solutions that have code behind – they create code for each template.
For the one-time code you add to your application to call the docgen system, it’s ideal if the product includes a generate-code feature that gives you sample code. That way you also know the correct way to call the engine.
Nice, not essential.
Fortunately, these are rarely needed. But when needed, they can be a big time saver. There are several different debuggers that may be in a docgen template designer add-in.
As stated above, these are rarely needed, so they're in the "useful but not important" category, except for that one time you really, really need one.