This ebook comes with a scorecard. Download and print the PDF file to follow along.
We certainly live in a time of choice. Apple’s now-famous and trademarked “There’s an App for That” debuted almost 10 years ago, at a time when the iTunes Store contained a mere 60,000 apps and games; there are now over 3 million! Choice is good, right? Well, yes, to a point. In the 1950s, a psychologist named William Hick developed a theory stating that the time it takes for a person to make a decision increases logarithmically as the number of possible choices he or she has increases. If you believe Hick’s Law applies to making decisions about business software, you can easily understand why choosing software can be both tedious and stressful. The purpose of this paper is to help you break your problem or opportunity down into its most important components, remove the clutter and evaluate your options quickly and clearly.
Nearly all businesses create documents in electronic and/or print form. And many of them contain data or information that originates from some place other than the document itself. Some are designed for internal communication and analysis and others are designed for customers and prospects. There are simple examples like an email that contains an automated signature line. And there are much more complex examples that cover a range, from performance reports and product catalogs to contracts and financial statements. What they have in common is that they all dynamically assemble content from external sources and format it within a predefined template to create documents—they are all data-powered documents.
The more repetitive or frequent this process becomes, the more important it is to consider systems to automate the tasks of design, assembly and output. Such systems often begin with the concept of a template. Templates can save a lot of time and reduce errors and inconsistencies by providing a common set of content, layout and formatting. As processes require more automation and flexibility, they may begin to include some level of integration with external sources of data, formatting and content. And finally, a fully automated system may do all of that plus autonomously manage output and distribution.
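As a rough illustration of the template idea, the following Python sketch merges one data record into a predefined template. The template text, the field names and the sample record are all hypothetical; a real system would typically fetch the record from a database and use a richer template format.

```python
from string import Template

# Hypothetical template: fixed wording and layout with named placeholders.
INVOICE_TEMPLATE = Template(
    "Invoice $number\n"
    "Bill to: $customer\n"
    "Amount due: $amount\n"
)

def render_invoice(record: dict) -> str:
    """Merge one data record into the template and return the document text."""
    return INVOICE_TEMPLATE.substitute(
        number=record["number"],
        customer=record["customer"],
        amount=f"${record['amount']:,.2f}",
    )

# Sample record standing in for a row fetched from an external data source.
sample = {"number": "INV-001", "customer": "Acme Corp", "amount": 1250.5}
print(render_invoice(sample))
```

The wording and layout live in one place, so every generated invoice stays consistent while only the data changes.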
In this ebook, we’re going to focus on applications that include some degree of automation. This is where significant amounts of time, effort and costs come into play, and where your understanding of the options will have the greatest impact on your business outcomes.
Our challenge begins with a lot of fuzzy terminology and marketing speak used to describe equally fuzzy and overlapping software functionality. But we’re not on a mission to standardize the jargon. Rather, we’re going to explain some of the commonly used terms, features and applications, and highlight the practical similarities and differences between them.
The term “report” is at the root of much of the confusion. Traditionally, reports were assembled and printed documents. But more recently, reports are rendered in real time to the screen for temporary consumption and are often not considered to be documents at all unless they are exported or printed. That distinction is more perception than reality, in that the screen renderings are technically HTML documents. Since the word “report” is commonly used to describe both screen and print documents, we’ll acknowledge that here and won’t try to make any further distinction.
Applications for automating data output are so numerous and universal that they span virtually all sizes and types of industries. To get a big picture, we’ll start at a point where we first begin to see an underlying commonality. There are three broad software categories that together encompass the majority of use cases and applications: Document Generation, Business Intelligence and Database Publishing.
These three software categories each have an important business function, and each is accompanied by an entourage of software offerings with common core capabilities. But the categories themselves also overlap one another in a number of ways that include some capabilities, features, applications and roles.
The best way to understand data-powered documents may be to list some familiar examples. Here are a few of the most common document types and the systems that are typically used to generate them.
Sales & Transaction Reports
Data Navigation - Drill Down Reports
Quotes & Proposals
Invoices & Receipts
Catalogs & Brochures
Direct Marketing Mailers
Another great way to understand the systems that underlie data-powered documents is to understand the users who interact with them. Roles and titles vary from company to company and from industry to industry but there is a pattern. They all tend to involve a few different levels of technological skill depending on the area of interaction with the systems.
Business & Operations Users
Designers & Content Professionals
Print Production Specialists
Business Intelligence, or BI software, uses data to support tactical and strategic business activities. It’s a category that’s gotten a lot of attention in recent years. The concept is certainly nothing new. Accounting records dating back 7,000 years are known to exist. The BI buzz now centers around real-time information and predictive algorithms that are beginning to automate not just the collection and presentation of data, but even the recommended actions associated with the data.
“Business Intelligence (BI) comprises the strategies and technologies used by enterprises for the data analysis of business information. BI technologies provide historical, current and predictive views of business operations. Common functions of business intelligence technologies include reporting, online analytical processing, analytics, data mining, process mining, complex event processing, business performance management, benchmarking, text mining, predictive analytics and prescriptive analytics.”
“A set of methodologies, processes, architectures, and technologies that leverage the output of information management processes for analysis, reporting, performance management, and information delivery. Research coverage includes executive dashboards as well as query and reporting tools.”
We find it interesting that some authorities exclusively include query and reporting tools in this research category while query tools are used extensively in other categories and beyond, and reporting is a term associated with document output as often as, or more often than, with business intelligence. Gartner does a good job of painting a picture of the BI landscape. It describes four bands (Reporting, Analysis, Monitoring and Predicting), each of which includes a few additional subcategories. The result is also a slightly daunting picture of the many nuances of a fairly simple concept. Virtually all companies employ some degree of BI processes and systems even if they are not referred to as such. This is partly due to the fact that BI functionality is often included as part of a broader operational system, and the output is given other names like “dashboard,” “report,” “metrics,” “spreadsheet” or “KPIs.”
An often used synonym for BI is “Analytics.” BI and Analytics are very similar, but while BI emphasizes the data and presentation, Analytics emphasizes the insights and actions indicated by that data. In either case, the output is known by the same set of names as mentioned above. BI is something that users have come to expect in most applications. It can be a valuable feature of an application or a stand-alone function. Once you determine the level of BI functionality needed in your application, you’ll need to carefully consider the currency of the data you’ll need, the output formats your users will want, and the potential need for specialized data sources such as data warehouses.
KEY FUNCTIONALITY: Business Intelligence
Of course, this simple set of functionality belies the complexity and breadth of features needed to deliver on these requirements. As of this writing, Wikipedia lists 64 companies in the Business Intelligence category. There are certainly more. And this doesn’t account for the many thousands (probably millions) of applications that include some degree of BI functionality in their otherwise categorized applications.
Document Generation is a term that’s not well known, but it describes a category of software arguably larger than the other two we’re comparing here. Consider the many documents that are part of your everyday routines: bills, receipts, account statements, personalized offers, or even your driver’s license. Then think of some of the business documents you deal with regularly, such as contracts and proposals, benefits statements, pay stubs, shipping labels, packing slips, and so on. What this diverse array shares is that they are all data-powered documents assembled and generated by an automated system.
The systems that generate these documents are often hiding behind the curtain, or more accurately, embedded within other systems. They’re often referred to by the names of the specific output they generate; for instance, a receipt printer is a classic example of document generation, but is never referred to as such. Some document generation systems are highly specialized while others have broad application. Some are part of a closed system while others are a part of a loosely integrated collection of systems. The configurations are as diverse as the documents they produce.
Doing a Web search for a definition of Document Generation immediately exposes the root of the confusion around this software category. There is no Wiki page or Webster’s definition. Instead, there are definitions offered by companies that have an interest in crafting a definition around their particular feature set, as well as a few offered by software review sites that confuse the space by featuring “leaders” in their categories that may only demonstrate a fraction of the features they themselves list as core to the category.
Full Disclosure: It’s impossible for us to be unbiased due to the fact that we at Windward Studios consider our products to be reporting and document generation solutions. However, rather than piling on new definitions, we’ll focus on the features that are core to the category and highlight how different solutions may emphasize certain features over others.
Database publishing is an area of automated media production where paginated documents are dynamically assembled for the purpose of mass reproduction. The most common application is product catalog production, and another that some of us may still remember is telephone directories. Product catalogs often contain detailed product images, descriptions and pricing that are maintained and managed in a central database. Catalogs tend to be repetitive in terms of style and layout from item to item or page to page, and therefore benefit greatly from the use of templates that automate content assembly and formatting. This is especially true as the number of items and pages increases and as publishing frequency increases.
Consider the difference between producing a 20-page fashion catalog once a year versus producing a weekly grocery store sale flyer. The small fashion catalog would benefit from the use of reusable templates, but little time would be saved by automating the content collection from an external database. On the other hand, a weekly grocery flyer could be generated in a tiny fraction of the time it would take to create a new flyer every week if it were generated by a database publishing system. In the case of parts catalogs that may contain many thousands of items, database publishing is essential.
Database publishing systems are rarely a single stand-alone application. At a minimum, they are made up of a content repository, a layout and design environment and a reproduction system (printing press), with each of those systems often having several sub-systems. High-volume publishing is a production-heavy process usually involving a team of people and partners. That’s where workflow management and integration with other systems become crucial. Creative collaboration, approval and proofing cycles, file processing and pre-press operations are all tightly managed processes that must be coordinated to be efficient.
Database publishing is a relatively specialized area of the data-powered document world, but the systems and companies that serve this space are very well developed and highly efficient.
There are a few more categories that frequently mingle with the ones we’ve listed. We won’t go into detail about these but they’re worth mentioning:
Document Management: This type of software has roots in version control, backup, storage and retrieval. These systems can work with data-powered documents but are more typically used to manage static documents.
Document Automation: This label covers some of the same territory as data-powered documents but leaves out the design, layout and interactive aspects of working with them. It could be said that document automation is the back-end functionality of a data-powered document workflow.
Document Assembly: There are software companies that call their products document assembly solutions. They typically focus on dynamically assembling preexisting content elements. They also frequently include some database connectivity or mail-merge functionality and are therefore quite similar to document generation solutions. For the purpose of this paper, we’re considering document assembly a subset of document generation.
Document Creation: This label is somewhat misleading. It’s sometimes used as an alternative to document generation or document assembly, but the word “creation” is too easily read as any means of creating a document and doesn’t obviously imply the use of external data sources or automated systems.
Data Visualization: This software category is almost synonymous with BI and analytics, as it emphasizes the display of and interaction with data. Here we’ll consider it primarily a feature of the BI software category.
Mail-Merge: This too is more a feature than a category. It’s really a very simple example of lightly personalizing a printed or electronic document. Some sophisticated products exist for high-speed applications like bulk mailing, but we’ll consider it a subset of data-powered document production.
Web Page/Site Builder Tools: These share some of the features and characteristics of the three main categories we’re discussing in this paper. We’re not comparing them directly to the other three because they are more typically classified as development tools or, in the case of content management systems such as WordPress and similar products, are narrowly focused on blog-style content.
Electronic Signature: Not to be confused with digital signatures, this is specialized functionality that applies to the signing of legal and official documents as a replacement for handwritten personal signatures. This feature may be part of an integrated system that includes document generation, but it isn’t a core capability of the document generation category.
You may have seen some popular graphics that feature an intentionally overwhelming number of technology categories. Sirius Decisions has one that lists over 130 distinct technology categories related to Sales, Marketing and Product. Chiefmartec.com has been producing a graphic dedicated solely to the Marketing space that now contains nearly 50 categories with roughly 5,000 logos stuffed into them. Categorization is a natural mental technique that helps us recall and understand information. And it’s useful in helping to cut away some of the clutter. But it’s not a prudent way to make software buying and architecture decisions when taken at face value. We recommend using high-level categorization to help reduce what Alvin Toffler referred to as “overchoice,” but keep your options open enough to consider what really matters:
A distinction is sometimes made between operational and analytical reports. Operational data is typically a very current snapshot of the state of business operations while analytical data often includes historical or periodic snapshots of data over time. Operational data output typically takes the form of a digital dashboard or occasionally a printed report while analytical data is often output to an interactive tabular format such as a spreadsheet, or in more sophisticated applications, a data visualization tool.
Many analytical reports model data and status changes over time. This type of report requires that historical data be collected and stored. This in turn often requires the presence of a special type of database known as a data warehouse, or more recently known as “big data.” It’s important to note that many business applications that have otherwise robust dashboard reporting capabilities do not inherently have the ability to model trends and patterns since they only store the current state of information rather than a history of how the data has changed over time. So, the functionality of BI systems can be quite dependent on the type of data sources they are integrated with.
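The difference between storing only the current state and storing a history can be sketched in a few lines of Python. The class names and the inventory figures below are invented for illustration; a real data warehouse is far more elaborate.

```python
from datetime import date

class CurrentStateStore:
    """Keeps only the latest value, like many operational dashboards."""
    def __init__(self):
        self.value = None

    def record(self, day: date, value: int) -> None:
        self.value = value  # the previous value is overwritten and lost

class HistoryStore:
    """Keeps every dated snapshot, so changes over time can be analyzed."""
    def __init__(self):
        self.snapshots = []

    def record(self, day: date, value: int) -> None:
        self.snapshots.append((day, value))

    def change_between(self, start: date, end: date) -> int:
        values = dict(self.snapshots)
        return values[end] - values[start]

current, history = CurrentStateStore(), HistoryStore()
for day, units in [(date(2024, 1, 1), 120), (date(2024, 2, 1), 95), (date(2024, 3, 1), 140)]:
    current.record(day, units)
    history.record(day, units)

print(current.value)  # only the latest figure is available
print(history.change_between(date(2024, 1, 1), date(2024, 3, 1)))
```

Only the second store can answer a question about change over time, which is why trend reporting depends on the data source as much as on the reporting tool.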
Data analysis is primarily the domain of the BI category. Document generation systems can collect and output operational statistics to a point, but lack the more sophisticated visualization capabilities found in leading BI systems. And, there are applications where a database publishing system could be used to output an “analysis” document such as an annual report. But those applications are few.
If you need to be able to visualize and interact with business data, BI tools are the strongest option. Interaction is a distinctive feature of many BI tools. Purpose-built BI tools typically provide a robust tool set for constructing analytical reports that allow users to interact with data. While the other software categories are centered around output to static formats, BI tools often display their output on screen and allow users to modify the output by sorting, filtering, tallying, pivoting and drilling into various levels of detail or summary.
Interaction capabilities run a gamut from simple sorting and filtering to sophisticated modeling. If you need cutting-edge analytics and modeling capabilities you’ll need to be careful to select a BI tool that specifically offers these advanced features—not all do.
Document generation software straddles the space between BI and database publishing. BI tools often have the ability to output a document to the screen, or export a file as a PDF or tabular format, but they do so one-at-a-time with a single document. On the other hand, database publishing systems are built to dynamically assemble content and then output a single resulting document for mass reproduction, such as a printed catalog. Document generation systems excel in their ability to output many unique documents either one-at-a-time on demand, or serially in a batch. Consider the examples of a printed theater ticket and monthly bank account statement. A single ticket is printed on demand when a customer chooses a seat at the box office, while a bank will generate, print and mail unique statements to all of its customers on a particular day each month.
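The two output modes described above can be sketched as follows. This is a minimal Python illustration, not a description of any particular product; the function names and the customer records are hypothetical.

```python
def render_statement(customer: dict) -> str:
    """Build one unique document from one customer's data."""
    return f"Statement for {customer['name']}: balance {customer['balance']:.2f}"

def generate_on_demand(customers: dict, customer_id: str) -> str:
    """Produce a single document at the moment it is requested."""
    return render_statement(customers[customer_id])

def generate_batch(customers: dict) -> list:
    """Produce one unique document per customer in a single run."""
    return [render_statement(c) for c in customers.values()]

# Hypothetical customer data standing in for an external data source.
customers = {
    "c1": {"name": "Ana", "balance": 310.0},
    "c2": {"name": "Ben", "balance": 42.5},
}
print(generate_on_demand(customers, "c2"))
for doc in generate_batch(customers):
    print(doc)
```

Both modes share the same rendering step; only the trigger differs, which is why one system can serve the ticket counter and the monthly statement run alike.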
The ability for a system to automate document creation and dynamically assemble preexisting elements is fundamental to all systems that support data-powered documents. This capability alone can save businesses many hours of searching, copy/pasting, writing and editing common documents. It also provides much-improved consistency and error reduction. There are some types of applications that focus primarily on just this capability. They are typically designed to collect “chunks” of preexisting content based on some conditional logic and then assemble them in a pre-formatted template for output, usually one at a time or in small quantities. Some examples include sales proposal builders, regulatory reports, and benefits statements.
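A minimal Python sketch of this kind of chunk assembly might look like the following. The content chunks, the field names and the conditions are all made up for illustration.

```python
# Hypothetical library of preexisting content chunks, each paired with a
# condition that decides whether the chunk belongs in a given document.
CHUNKS = [
    ("Thank you for considering our proposal.", lambda d: True),
    ("Because you are an existing customer, a loyalty discount applies.",
     lambda d: d["existing_customer"]),
    ("On-site installation is included.", lambda d: d["needs_installation"]),
    ("This offer is valid for 30 days.", lambda d: True),
]

def assemble(deal: dict) -> str:
    """Select the chunks whose condition holds and join them in order."""
    return "\n".join(text for text, applies in CHUNKS if applies(deal))

print(assemble({"existing_customer": True, "needs_installation": False}))
```

Keeping the wording in a shared library means a correction made once shows up in every document assembled afterward.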
Some applications restrict access to certain components of a document while allowing users to edit other parts. When document requirements are heavy with boilerplate text and when editing is forbidden or restricted, document generation systems will excel. This is often the case with legal and government regulated documents. Database publishing systems can meet some of these requirements but since these types of documents are intended to be ad-hoc and one-off, document generation systems are better suited.
The ability to connect document templates to data sources is common to the three categories, but the type and location of the data sources can vary widely. In a closed system, the content may be available locally within the application, while in loosely integrated systems the data may reside in multiple locations and may have very specific requirements for the retrieval of the data. Data and content are most often stored in a database, but sometimes in a file or document-based system. Connecting to databases and navigating complex relational table structures can be a highly technical task. Some BI tools and document generation solutions provide a layer of functionality to help with this, which could be a key consideration depending on the level of technical expertise of your users.
While we’re on the subject of data sources, it’s important to consider the security implications of connecting to external data sources that may contain sensitive information. The administrator of the data source will typically control access to the data and only allow the appropriate level of access to the system, but the nature of the data itself may dictate that the editing, viewing and output systems be appropriately secure.
As mentioned, all of our three categories feature solutions that are adept at connecting to external data sources. Because BI and document generation have such broad application, the solutions in these categories tend to have correspondingly broad integration capabilities. Compatibility is key, but performance is also a consideration as some data formats are more optimized for performance than others. If at all possible, you’ll want to avoid forcing compatibility by translating data from one format to another.
These may be the features that distinguish the software categories from one another more than any other. It’s also an important area of differentiation within each category. Generally speaking, BI design tools excel at displaying data in the form of charts, graphs and tables, but allow only the most basic layout of panels on a page with minimal control over style. On the other end of the spectrum, the design environments of database publishing tools are built to provide maximum control and freedom over layout and style but typically lack even simple data visualization capabilities like charts and graphs. Document generation design environments tend to be built around the format and features commonly found in office documents and therefore either mimic or build upon standard word processors, spreadsheet tools or presentation applications. These environments are typically quite capable of handling data as well as layout and style—to a point. They are somewhat less specialized and may lack a few of the most sophisticated capabilities of their counterparts. But what they lack in specialization, they make up for with broad functionality and familiarity.
It’s worth noting that not all systems include design environments. In closed systems where the data and the document formats rarely or never change, the developer may programmatically generate the output by “hard-coding” the format parameters into the system and bypass the need for an end-user design environment altogether.
Virtually all document systems make use of templates to help automate the application of format, style and layout. Templates are fundamental to all of the categories we’re discussing, but there are a few capabilities that set some solutions apart. Document generation and database publishing systems are both built to work with multi-page documents. They allow designers to select and apply a variety of style templates to individual pages or groups of pages within a single document. Some include pre-formatted component libraries and “smart” templates that apply specific formatting to special pages such as cover pages, tables of contents, new sections, etc. And some of the most advanced allow the template creators to lock specific components and regions of a document to enforce consistency.
Many data-powered documents depend on conditional logic to assemble the appropriate content under varying circumstances. For instance, an investment proposal that includes a break-even timeline would have to dynamically calculate and format the document content based on the particular investments that were being recommended to the client. It’s possible to perform conditional logic at several different points in the process. It could be done within the database itself, which may be desirable if you wish to store the calculated output. It could be done with custom application code executed between the database and the document generation system, or with a more advanced system, it could be performed on-the-fly within the document generation code. The latter may be the only possible solution if content formatting is also conditional.
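To make the break-even example concrete, here is a small Python sketch in which both the calculation and the formatting are decided at generation time. The simple cost-divided-by-return formula, the field names and the 36-month threshold are assumptions chosen for illustration, not a financial model.

```python
import math

def break_even_months(upfront_cost: float, monthly_return: float):
    """Return the whole months until returns cover the cost, or None."""
    if monthly_return <= 0:
        return None
    return math.ceil(upfront_cost / monthly_return)

def break_even_sentence(investment: dict) -> str:
    """Choose both the wording and the emphasis based on the computed value."""
    months = break_even_months(investment["cost"], investment["monthly_return"])
    if months is None:
        return f"{investment['name']}: no break-even point is projected."
    text = f"{investment['name']}: projected break-even in {months} months."
    # Conditional formatting: emphasize long horizons (threshold is an assumption).
    return text.upper() if months > 36 else text

print(break_even_sentence({"name": "Fund A", "cost": 12000, "monthly_return": 500}))
```

Because the emphasis depends on a value that is only known once the data is merged, this kind of rule has to run inside the generation step rather than in the database.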
It may be preferable to input the conditions that drive the content assembly in real-time, and only store them temporarily for the purpose of generating a single document. Consider a sales proposal application where a salesperson may input a few pieces of information via a form that will dictate the contents of the proposal: details such as the product(s) being proposed, delivery timeline, discounts and so on. Some document generation systems include input capabilities and are ideal for applications where the conditional logic is dependent on ad-hoc input. Carefully consider the logic that will drive the assembly, manipulation and formatting of the content that will be included in your documents. A well-designed solution can save countless hours of design, coding, and administrative effort.
Personalized content is everywhere, maybe even a little too much so. But the ability to add personalized content to documents is essential for many types of data-powered documents such as account statements and medical test results. Sales contracts and proposals may also contain personalized content. Studies indicate that personalization results in a lift in business results, so it follows that this may be an important consideration even if it’s not absolutely required.
If you need to view and work with data that’s very current or even real-time, then BI solutions are probably your best bet. They are primarily designed for the transitory viewing of data. They often have only basic output capabilities, but because the freshness of the data is so important to business intelligence, output to static formats defeats one of the most important purposes of the BI category.
Document generation systems can collect and output data that is just as current as any BI tool but the applications for real-time data output to print formats are relatively few. Some examples could be found in logistics operations in the form of pick tickets and packing slips.
Many data-powered documents contain multiple pages. Some contain just a few while others could be hundreds of pages in length. There are several special considerations when dealing with multi-page documents. One would be the ability to automate page and section numbering, or for a table of contents to be automatically generated and linked if it’s an interactive document. Another important feature is the ability to dynamically flow content from one page to another. Variable data size and length poses a challenge to even the most robust solutions. The most capable tools have the ability to fit or flow text to columns and new pages as well as being able to split tables across pages and include header rows at the beginning of each new section.
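Splitting a table across pages while repeating its header row can be sketched like this in Python. It is a simplified illustration that assumes a fixed number of rows per page; real layout engines measure actual row heights and available space.

```python
def paginate_table(header: str, rows: list, rows_per_page: int) -> list:
    """Split rows into pages, repeating the header row at the top of each page."""
    if rows_per_page < 1:
        raise ValueError("rows_per_page must be at least 1")
    pages = []
    for start in range(0, len(rows), rows_per_page):
        pages.append([header] + rows[start:start + rows_per_page])
    return pages

# Hypothetical table content of variable length.
rows = [f"Item {n}" for n in range(1, 6)]
for number, page in enumerate(paginate_table("Name", rows, rows_per_page=2), start=1):
    print(f"--- Page {number} ---")
    print("\n".join(page))
```

The number of pages is not known until the data arrives, which is what makes variable-length content harder to handle than a fixed layout.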
Printing is the most common form of reproduction but it’s also possible to distribute data-powered documents as digital files. Setting up documents for high-volume print production comes with specialized requirements for color management, trim and bleed margins, and sometimes page form (signature) layouts. This again underscores the highly specialized nature of database publishing systems in contrast to the more general requirements for document generation and BI output.
As mentioned earlier, some basic BI functionality is often built into applications whose main function is not exclusively BI. For example, Facebook’s built-in Activity Log displays a list of all the user’s activity with a couple of filters for time period and type of activity. This is simple BI functionality but Facebook is not a BI tool. There are, however, numerous purpose-built BI add-ons for Facebook that provide much more robust functionality. This is the case with many applications and should be a consideration when designing an application: how much BI functionality to build into the core application and how much to leave to external systems and processes.
Workflow is another broad term that can mean different things in different contexts. Where documents are concerned, workflow typically means collaboration, approvals, routing, scheduling and sometimes version control. Workflow is all about saving time by automating as many of the human processes as possible. Database publishing systems typically involve the most process steps and the largest teams and therefore excel in workflow management support. Document generation systems cover a wide gamut in this area as well. Since they are often part of office systems and processes, they too may include all of the above-mentioned features. However, they’re equally often embedded within closed systems and require no workflow management capabilities at all. When implementing data-powered document capabilities, always consider how the documents will flow through your processes and integrated systems.
The purpose of this document is to demystify data-powered documents and to help business users and developers make better, clearer decisions when choosing tools for their business. To that end, we hope this paper has helped you get past some of the confusing terminology and focus on the task(s) at hand.
Automated data-powered document systems can provide huge savings for companies that are able to choose the right systems for the right tasks and take advantage of their full capabilities. It’s easy to see that there is a significant amount of overlapping functionality across these categories. But it also shows that BI and database publishing solutions are more specialized while document generation solutions cover a wider range of applications.
The type of documents you work with will be the primary driver of the systems you’ll want to use to manage and automate your data-powered document production. A close second is the type and location of the data sources you’re going to be accessing. Once you’ve chosen the right type of system, your next consideration will be the exact features that your developers, end-users and support staff will need. We hope you feel better equipped to make your decisions after reading this ebook!