11 Tips for Creating Report Friendly Data

Your Data: Friend or Foe?

Here at Windward Studios we’ve seen numerous examples of how structuring customer data first leads to huge time savings in report design later. Along the way, we’ve also seen quite a few common errors.

We’re here to help you avoid these mistakes. This paper covers a quick look at the basics of data organization and features eleven useful tips to help you organize your data in a way that will save you time in the long run.

The Principles of Data Access

When you try to access a set of data from another program—any program—the process will run more smoothly when you keep in mind three key principles:

Principle #1: You are using a machine to get information from another machine.

Machines do not speak the language that we do, so working out a problem over coffee isn’t going to cut it. You need to have a basic understanding of how machines store and retrieve data.

Principle #2: Duplicate data can occur, but the good news is you can reference it uniquely.

When information on its face appears to be exactly the same but has different definitions by human standards, such as two individuals with the same first and last names and birth date, the machine storing that information usually creates a unique identifier in a separate field or record to make sure you can reference them uniquely.

Principle #3: Relations in data are crucial.

Everyone knows that relationships are important in life, and how you define those relationships and build them directly affects the ease with which you interact with people.

It is no different when it comes to data. Mapping out a clear structure for your data and knowing how items relate to other items from the start will make it easier to integrate your data with software applications.

Tips to Ensure a Reporting-Friendly Database

Tip #1: Always index your columns.

Indexing is a delicate balance of doing just enough without overdoing it.

Indexing improves the query response time of a select by creating a system-managed structure that allows the data to be directly referenced instead of searching for it. But since each modification of data in a user table potentially involves updating the indexes, adding or removing data rapidly can noticeably slow down performance. Conversely, too little indexing will decrease the performance of SQL selects when querying data.

To achieve this delicate balance, we recommend you:

  • Build your indexes around the order your queries use. If the data being returned is usually ordered by a certain column (e.g., dates), then it makes sense to index that column.
  • Make use of covering indexes. A covering index contains all the columns a query needs, so the database can answer that query from the index alone without reading the table.
  • Rebuild fragmented indexes regularly. As rows are inserted, updated and deleted, the physical order of an index drifts away from its logical order, and these mismatches slow down reads.
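
The recommendations above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a tuning guide: the orders table, its columns and the index name are hypothetical, and the way an engine reports its query plan (and how it rebuilds indexes) differs from product to product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table used only for this illustration.
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " order_date TEXT NOT NULL,"
    " customer TEXT NOT NULL,"
    " total REAL NOT NULL)"
)
conn.executemany(
    "INSERT INTO orders (order_date, customer, total) VALUES (?, ?, ?)",
    [
        ("2024-01-15", "Acme", 120.0),
        ("2024-02-03", "Birch", 75.5),
        ("2024-03-21", "Acme", 310.0),
        ("2024-02-17", "Cedar", 42.0),
    ],
)

# The index leads with the column the report sorts on (order_date) and also
# holds total, so it covers every column the query below needs.
conn.execute("CREATE INDEX idx_orders_date_total ON orders (order_date, total)")

query = (
    "SELECT order_date, total FROM orders "
    "WHERE order_date >= ? ORDER BY order_date"
)

# Ask SQLite how it intends to run the query; the last column is the description.
plan = conn.execute("EXPLAIN QUERY PLAN " + query, ("2024-02-01",)).fetchall()
plan_text = " ".join(row[3] for row in plan)
print(plan_text)

rows = conn.execute(query, ("2024-02-01",)).fetchall()
print(rows)

# SQLite's way of rebuilding an index; other engines use different commands.
conn.execute("REINDEX idx_orders_date_total")
```

If the plan mentions a covering index, the query is being answered without touching the table itself.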

Tip #2: Separate data into logical pieces and types.

Applications that collect data frequently do not store that data in a logical manner. This most often occurs with text fields. Prime examples are names, addresses, dates and numerical values.

If you store a name as a single string, e.g., “First Middle Last,” you will encounter problems later when you need to sort by first name only, last name only or a mixture of any of the three. Placing this data into separate columns ensures that you can sort and access your data in an optimized manner.

A trick that DBAs use is to create a query that assembles and returns the full name from these individual parts using the COALESCE function, which substitutes a fallback value when a part such as the middle name is missing. This avoids the duplicate data that an additional “full name” column would create.
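
As a rough sketch of that trick, the snippet below assembles a full name from separate columns using Python's sqlite3 module. The person table and its column names are hypothetical, and the || concatenation operator is what SQLite uses; some engines use CONCAT or + instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: the middle name is optional, so it allows NULL.
conn.execute(
    "CREATE TABLE person ("
    " id INTEGER PRIMARY KEY,"
    " first_name TEXT NOT NULL,"
    " middle_name TEXT,"
    " last_name TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO person (first_name, middle_name, last_name) VALUES (?, ?, ?)",
    [("Ada", None, "Lovelace"), ("John", "Ronald", "Tolkien")],
)

# ' ' || middle_name is NULL when the middle name is NULL, so COALESCE
# swaps in an empty string and no stray double space appears.
full_names = [
    row[0]
    for row in conn.execute(
        "SELECT first_name || COALESCE(' ' || middle_name, '') || ' ' || last_name"
        " AS full_name FROM person ORDER BY last_name, first_name"
    )
]
print(full_names)
```

The parts stay individually sortable, and the full name is built only when a report asks for it.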

Tip #3: Use views to logically group data from multiple tables.

SQL gives you great power to create complex questions and return a list of organized results. Sadly, the queries themselves are not always easy to construct. Furthermore, if you want to return this data again, you either have to reconstruct the query from scratch or have planned ahead and saved the initial query.

For data grouping that is complex and used often, there is a better way. The SQL-92 specification allows you to save queries in a structure called a “view” so that you can reference it later by its descriptive name.

You do this simply by executing the statement “CREATE VIEW view_name AS [Your Full Select Statement]”. You can then reference this view anytime by issuing a “SELECT * FROM view_name” statement. This not only makes it fast and easy to execute complex queries that you have previously built, but it also makes it simpler and less error-prone for others in your organization to execute them.
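
Here is a small sketch of saving a join as a view and reading it back, again using Python's sqlite3 module. The customers and orders tables and the view name customer_order_totals are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical tables for the illustration.
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        total REAL NOT NULL
    );
    INSERT INTO customers (id, name) VALUES (1, 'Acme'), (2, 'Birch');
    INSERT INTO orders (customer_id, total) VALUES (1, 100.0), (1, 50.0), (2, 20.0);

    -- The complex query is written once and saved under a descriptive name.
    CREATE VIEW customer_order_totals AS
        SELECT c.name AS customer,
               COUNT(o.id) AS order_count,
               SUM(o.total) AS total_spent
        FROM customers AS c
        JOIN orders AS o ON o.customer_id = c.id
        GROUP BY c.id, c.name;
    """
)

# Anyone can now query the view without knowing how the join is built.
view_rows = conn.execute(
    "SELECT * FROM customer_order_totals ORDER BY customer"
).fetchall()
print(view_rows)
```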

We see this often when users are first working with our AutoTag product. They need to correlate data and group results from multiple tables in their database. AutoTag’s drag and drop table design makes it easy for them to simply drag a VIEW from our Data Bin to the template and pick the columns they want returned in their data set.

Tip #4: Employ NOT NULL unless there is a reason not to.

You might be asking yourself: What is this NULL term?

Keep in mind that even an empty string is a value, and there are times where you need blank items like this in your database fields. NULL is a placeholder in your database that represents missing or unknown data.

In most databases that conform to the SQL-92 specification, each column either allows NULL for missing or unknown data or is declared NOT NULL so that a value is always required. Allowing NULL is useful because you can easily write queries that exclude NULL values (WHERE column IS NOT NULL), removing incomplete results from your report. Declaring a column NOT NULL keeps incomplete rows out of the table in the first place, which is why it is the safer default.

Note that NULL is not the same as 0, and it is not the same as an empty string either.
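
The difference between NULL, 0 and an empty string is easy to see in a short experiment. The sketch below uses Python's sqlite3 module with a hypothetical contact table; details such as how an empty string is treated can differ in other engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: email is required, fax is optional.
conn.execute(
    "CREATE TABLE contact ("
    " id INTEGER PRIMARY KEY,"
    " email TEXT NOT NULL,"
    " fax TEXT)"
)
conn.execute("INSERT INTO contact (email, fax) VALUES ('a@example.com', NULL)")
conn.execute("INSERT INTO contact (email, fax) VALUES ('b@example.com', '')")

# A NOT NULL column rejects missing data at insert time.
try:
    conn.execute("INSERT INTO contact (email, fax) VALUES (NULL, '555')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Comparing NULL with 0 yields "unknown", which Python receives as None.
null_equals_zero = conn.execute("SELECT NULL = 0").fetchone()[0]

# NULL and the empty string are counted separately.
missing_fax = conn.execute(
    "SELECT COUNT(*) FROM contact WHERE fax IS NULL"
).fetchone()[0]
empty_fax = conn.execute(
    "SELECT COUNT(*) FROM contact WHERE fax = ''"
).fetchone()[0]

print(rejected, null_equals_zero, missing_fax, empty_fax)
```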

Tip #5: Don’t use separators in your data.

When storing long strings of data, end users can be prone to creating entries with separators (e.g., City/State) to signify different segments of data in the string.

First off, just...don’t. This is a bad practice because SQL utilizes many special characters in queries themselves, and this could interfere with an otherwise properly functioning query.

Worse yet, the programming language you are using to handle queries and their results has its own reserved keywords and characters that may break your code. Common examples we have seen:

  • City/State
  • Comma-delimited lists
  • Semicolon-delimited lists

But this applies to more than long strings. It also applies to data types such as:

  • Phone numbers
  • Social security numbers
  • Driver’s license numbers

Remember, you can always use SQL functions or your native programming language to manipulate a text string and print it in the form you desire. But when encoding the information in your database, it is always best to stick with storing the information without separators. It is less work for you when parsing it in your native language, and it will cause fewer headaches in your queries and code in the long run.
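
As one possible illustration of "store it plain, format it on the way out", the sketch below strips separators from a phone number before storage and adds them back only for display. The function names and the ten-digit US format are assumptions made for the example.

```python
def strip_separators(raw: str) -> str:
    """Keep only the ASCII digits, e.g. '303-555-0142' -> '3035550142'."""
    return "".join(ch for ch in raw if ch.isascii() and ch.isdigit())


def format_us_phone(digits: str) -> str:
    """Format ten stored digits for display; the layout is an assumed convention."""
    if len(digits) != 10 or not (digits.isascii() and digits.isdigit()):
        raise ValueError("expected exactly ten digits")
    return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"


# What goes into the database column: digits only.
stored = strip_separators("303-555-0142")
# What the report shows: formatted at output time.
displayed = format_us_phone(stored)
print(stored, displayed)
```

Because the stored value has one predictable shape, changing the display format later means changing one function rather than cleaning every row.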

Tip #6: Properly structure your metadata.

Data typing is crucial when organizing your data. Storing dates and currencies in different formats leads to unexpected results when applications interface with your data structure.

If a date is stored as text, you will need to transform it using SQL functions into an SQL DATETIME object. This causes extra processing time and reduces the overall performance of your queries. The same can be said with currencies. Storing the currency symbol or storing the currency in a non-standard format, e.g., 1.000,56 instead of 1000.56, will also cause inconsistencies and errors with applications trying to utilize these currencies.

Therefore, it is best to store all numerical values in the standard decimal format, which you can later convert to other formats using SQL functions. You define this typing when you create your SQL column by specifying the column's data type. Examples are DATE, TIME, INTEGER, SMALLINT, BIGINT, DECIMAL, FLOAT, and VARCHAR, to mention a few.
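
If text values like these have already crept into your data, one approach is to normalize them before loading them into typed columns. The sketch below assumes amounts written with dot thousands separators and a comma decimal mark, and dates written as month/day/year; both helper names are hypothetical.

```python
from datetime import date, datetime
from decimal import Decimal


def parse_european_amount(text: str) -> Decimal:
    """Turn '1.000,56' (optionally with a euro sign) into Decimal('1000.56')."""
    cleaned = text.replace("\u20ac", "").strip()
    # Drop thousands separators, then switch the decimal comma to a point.
    cleaned = cleaned.replace(".", "").replace(",", ".")
    return Decimal(cleaned)


def parse_us_date(text: str) -> date:
    """Turn '03/21/2024' into a real date value (assumes month/day/year)."""
    return datetime.strptime(text, "%m/%d/%Y").date()


amount = parse_european_amount("1.000,56")
order_date = parse_us_date("03/21/2024")
print(amount, order_date.isoformat())
```

Once the values are real numbers and dates, they can go into DECIMAL and DATE columns and be formatted per locale only at output time.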

Tip #7: Be aware of the character sets used by your data.

Decades ago, data was encoded in 8 bits, or 256 unique characters, which seemed like a lot back then. But over time, more languages entered the computing landscape—languages like Mandarin Chinese, Thai, Korean, Japanese, Arabic and Russian—and suddenly 256 characters was not nearly enough to house an entire language symbol set.

So the developer powers-that-be got together and implemented an encoding called Unicode Transformation Format, 8-bit, or as everyone else lovingly calls it, UTF-8. This works a bit like a shift key on your keyboard: common characters are stored in a single byte, and special lead bytes signal that the bytes following them combine to reference a character outside the basic set. With up to four bytes (32 bits) per character, far more symbols can be encoded.

Okay, so great, we can now encode all the symbols that exist in these languages. Why should you care? You set your data to UTF-8 and forget it, right?

Not so fast.

The character set can be defined at the database, schema, table or column level. A UTF-8 character can take up to four bytes, and some database engines reserve space for that worst case, which takes up memory and decreases performance, so blanket encoding all fields to UTF-8 can be a waste. A better practice is to reserve UTF-8 for fields that really need it, such as names, addresses and block text that may contain multiple languages. Values that are always plain ASCII, such as MD5 hashes and internal codes, do not need it.
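
A quick way to see what UTF-8 costs in storage is to measure a few characters. The sample characters below are arbitrary picks for illustration; UTF-8 uses between one and four bytes per character depending on the character.

```python
# One character each from ASCII, Latin-1 Supplement, CJK, and emoji ranges.
samples = ["A", "\u00e9", "\u4e2d", "\U0001F600"]

# Number of bytes each character occupies once encoded as UTF-8.
byte_lengths = [len(ch.encode("utf-8")) for ch in samples]
print(byte_lengths)

# Plain ASCII text costs the same in UTF-8 as in a single-byte encoding.
ascii_code = "INV-2024-0001"
same_size = len(ascii_code.encode("utf-8")) == len(ascii_code.encode("ascii"))
print(same_size)
```

The practical cost therefore depends on what the field holds and on how your database engine allocates space for multi-byte columns.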

Tip #8: Remember that duplicate data is bad, period.

Data takes up space. Space requires hardware to store it. Hardware costs money. Searching and organizing data costs time, and time is money.

Therefore, the more duplicate data your system has, the more you are costing your organization by storing and processing it.

The most common example of this we see is with storing name objects. Consider an instance where you have a customer list. To add a new customer, you enter the first name and last name, and your customer is created. Then suppose this customer later becomes a reseller of your product, so you enter the first name and last name in the reseller table.

You can solve this conundrum easily by creating a person table in the database. Each person is an individual entity with a first name, last name and other personal information. If a person is a customer then we can create a foreign key in the customers table that matches the primary (indexed, see why indexes are helpful?) key in the person table.

The same process can take place in the reseller table. Suddenly we no longer have duplicate data, and we can now more powerfully filter our data because we have applied Tip #2 above to our table structures.

The best way to identify duplicate data is to first look for duplicate entry. If this is occurring, stop and ask yourself if there is a better way to organize the data so it only needs to be entered once and can be referred to many times by multiple items.
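
The person-table approach can be sketched as follows with Python's sqlite3 module. Table and column names are hypothetical; the point is that the name is entered once and both roles point to it by key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript(
    """
    -- Each person exists exactly once.
    CREATE TABLE person (
        id INTEGER PRIMARY KEY,
        first_name TEXT NOT NULL,
        last_name TEXT NOT NULL
    );
    -- Roles reference the person by foreign key instead of copying the name.
    CREATE TABLE customers (
        id INTEGER PRIMARY KEY,
        person_id INTEGER NOT NULL REFERENCES person(id)
    );
    CREATE TABLE resellers (
        id INTEGER PRIMARY KEY,
        person_id INTEGER NOT NULL REFERENCES person(id)
    );
    INSERT INTO person (id, first_name, last_name) VALUES (1, 'Dana', 'Reyes');
    INSERT INTO customers (person_id) VALUES (1);
    INSERT INTO resellers (person_id) VALUES (1);
    """
)

# A name change is made in one place only.
conn.execute("UPDATE person SET last_name = 'Reyes-Cole' WHERE id = 1")

customer_name = conn.execute(
    "SELECT p.first_name, p.last_name FROM customers AS c"
    " JOIN person AS p ON p.id = c.person_id"
).fetchone()
reseller_name = conn.execute(
    "SELECT p.first_name, p.last_name FROM resellers AS r"
    " JOIN person AS p ON p.id = r.person_id"
).fetchone()
print(customer_name, reseller_name)
```

Both the customer and the reseller views of this person pick up the change, because neither stores its own copy of the name.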

Tip #9: Data can be referenced in different dimensions, so reference your data effectively.

Duplicate data occurs not only at the row/column level but also at a tabular and database level.

You can keep data references between tables in order by utilizing foreign key and primary key relationships, but you may need to filter and correlate this data across different databases as well. The key item to pay attention to is the fact that data correlation can occur in different dimensions. This can be very powerful—but if you are not careful, it can also generate very confusing data result sets.

Take an example where you have two databases:

  • An Orders database containing order information
  • A Sales database containing sales information

There are also four tables that call upon data stored in these databases: a table of orders, a table of order detail, a table of sales people and a table of regions.

Five tables with connections between them

Now suppose you want to return the result of the top-grossing sales people, ordered by region, for each month. You need to relate the four tables with information stored in two different databases, and the one key piece of information to filter on is total sales.

Here it makes sense to create what is referred to in the industry as a fact table: a single table that contains the figures that tie the different data dimensions together and then makes use of foreign key/primary key relationships to link to these databases and tables.

In our example, we create a table with foreign keys linking to the orders table, sales people table and region table. The orders table returns the total amount of each order to a column in the fact table based upon the order details table. You could run a query on the total sales fact table to return all salespeople in a filtered region during a date range. We could take this further and create another table that uses SQL SUM functions to create monthly totals for each salesperson grouped by region and filtered by date range.

This prevents the need to create complex relationships directly in a query that only the DBA could assemble, allowing someone referencing the database to access the information in a single table in an intuitive manner.
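
A compact sketch of such a fact table follows, using Python's sqlite3 module. For simplicity it keeps everything in one in-memory database rather than two, and every table name, column name and figure is made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical dimension tables and sample data.
    CREATE TABLE regions (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE sales_people (
        id INTEGER PRIMARY KEY, name TEXT NOT NULL,
        region_id INTEGER NOT NULL REFERENCES regions(id));
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY, order_date TEXT NOT NULL,
        sales_person_id INTEGER NOT NULL REFERENCES sales_people(id));
    CREATE TABLE order_details (
        order_id INTEGER NOT NULL REFERENCES orders(id),
        amount REAL NOT NULL);

    INSERT INTO regions VALUES (1, 'East'), (2, 'West');
    INSERT INTO sales_people VALUES (1, 'Ana', 1), (2, 'Ben', 1), (3, 'Cy', 2);
    INSERT INTO orders VALUES
        (1, '2024-01-10', 1), (2, '2024-01-20', 2),
        (3, '2024-01-25', 1), (4, '2024-02-05', 3);
    INSERT INTO order_details VALUES
        (1, 100.0), (1, 50.0), (2, 80.0), (3, 25.0), (4, 300.0);

    -- The fact table: one row per order, keyed to every dimension,
    -- carrying the figure reports filter on (the order total).
    CREATE TABLE total_sales_fact (
        order_id INTEGER NOT NULL REFERENCES orders(id),
        sales_person_id INTEGER NOT NULL REFERENCES sales_people(id),
        region_id INTEGER NOT NULL REFERENCES regions(id),
        order_month TEXT NOT NULL,
        order_total REAL NOT NULL);

    INSERT INTO total_sales_fact
        SELECT o.id, sp.id, sp.region_id,
               substr(o.order_date, 1, 7), SUM(d.amount)
        FROM orders AS o
        JOIN sales_people AS sp ON sp.id = o.sales_person_id
        JOIN order_details AS d ON d.order_id = o.id
        GROUP BY o.id, sp.id, sp.region_id, substr(o.order_date, 1, 7);
    """
)

# Monthly totals per salesperson, grouped by region, highest first.
monthly = conn.execute(
    """
    SELECT f.order_month, r.name, sp.name, SUM(f.order_total) AS total
    FROM total_sales_fact AS f
    JOIN regions AS r ON r.id = f.region_id
    JOIN sales_people AS sp ON sp.id = f.sales_person_id
    GROUP BY f.order_month, r.name, sp.name
    ORDER BY f.order_month, r.name, total DESC
    """
).fetchall()
print(monthly)
```

The report query only has to touch the fact table and the small lookup dimensions; the order-detail arithmetic was done once, when the fact table was loaded.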

Tip #10: Make use of lookup tables to prevent grouping different data in a single table.

Three tables with connections between them

Let’s face it: Organizations need internal codes, and those who work with these codes day in and day out know the codes by heart. The medical billing industry is notorious for this, and we at Windward often encounter abbreviations for states, regions and countries.

But those who don’t know the codes by heart need what is called a lookup table (sometimes called a reference table) that relates the shortened code version to the full text version.

Referencing these full name values by their abbreviation equivalent is a great way to keep larger VARCHAR values minimized on other tables while allowing the select to return those large values as needed.
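
A lookup table can be as small as two columns. The sketch below, using Python's sqlite3 module, stores only the state code on a hypothetical offices table and joins to a states lookup table when the full name is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Lookup table: short code -> full text.
    CREATE TABLE states (code TEXT PRIMARY KEY, name TEXT NOT NULL);
    -- Other tables store only the short code.
    CREATE TABLE offices (
        id INTEGER PRIMARY KEY,
        city TEXT NOT NULL,
        state_code TEXT NOT NULL REFERENCES states(code));

    INSERT INTO states VALUES ('CO', 'Colorado'), ('MA', 'Massachusetts');
    INSERT INTO offices (city, state_code) VALUES ('Boulder', 'CO'), ('Boston', 'MA');
    """
)

# The select returns the long value only when a report asks for it.
offices = conn.execute(
    "SELECT o.city, s.name FROM offices AS o"
    " JOIN states AS s ON s.code = o.state_code"
    " ORDER BY o.city"
).fetchall()
print(offices)
```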

Tip #11: Use proper naming strategies (i.e., human-readable) for your data.

Humans love to be creative when naming things. Programmers and IT staff take this to a whole new level, often searching a popular domain dedicated to Naming Schemes. And while we may love referring to all the items we work with by their Nordic God names, when we expect other people not living in our world to work with it, well, it can become a problem.

Keeping your naming structures human-readable and intuitive is key to getting the most use out of your data sets. Giving your databases, tables and columns obscure names will only cause confusion and slow down everyone who works with them. (We have seen table names that are VERY long, on the order of 80-100 characters. This is not only a pain to look at from a database architecture view but even worse for the poor soul who has to write SQL queries against them.)

So what do you do when you have a creative DBA or you yourself are this DBA with a fixation for non-intuitive naming?

Remember that the alias is your friend.

A SQL alias, written with the AS keyword, is structured as follows:

For Column Level Aliases

SELECT column_name AS alias_name FROM table_name;

For Table Level Aliases

SELECT column_name(s) FROM table_name AS alias_name;

An example will show how changing the column or table name into an intuitive alias can not only shorten your SQL queries but also make your database more usable for end users.

Example image

You can see that column names labeled in Elvish, as well as very long table names, are difficult to type and not easy to grasp quickly. This makes it easy to make mistakes in queries. Using aliases to keep things short, simple and intuitive will save you and your database users many headaches down the road.
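
To make the effect of aliases concrete, here is a short sketch with Python's sqlite3 module. The unwieldy table and column names are invented stand-ins for the kind of naming described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- A deliberately unfriendly, hypothetical schema.
    CREATE TABLE tbl_cust_acct_mstr_rec_v2_final (cst_nm_fst TEXT, cst_nm_lst TEXT);
    INSERT INTO tbl_cust_acct_mstr_rec_v2_final VALUES ('Dana', 'Reyes');
    """
)

# The table alias (c) shortens the query; the column aliases give
# report designers readable names in the result set.
cursor = conn.execute(
    "SELECT c.cst_nm_fst AS first_name, c.cst_nm_lst AS last_name"
    " FROM tbl_cust_acct_mstr_rec_v2_final AS c"
)
column_names = [col[0] for col in cursor.description]
alias_rows = cursor.fetchall()
print(column_names, alias_rows)
```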

Well-Structured Data Leads to Efficient Reporting

We hope you’ve found these tips helpful. As you put them to use, remember to let your reporting software do as much of the work as possible when accessing your data.

If you find that your current database-reporting software makes for a clunky experience, we invite you to see how easy it is to create reports in Windward.

Here at Windward Studios, we think that reporting and document generation should be simple—not overly complex, tedious, or technical. Your reports deserve to look as impressive as the information they contain.

Why can’t designing documents linked to your databases be as easy as creating a Word document, Excel spreadsheet, or PowerPoint slide deck?

It can. Windward's software applications simplify how businesses design and generate professional reports and documents. Windward provides a unique experience using Microsoft Office to format and edit report templates, and the sophisticated engine pulls data from multiple sources and merges data into documents. Windward provides a hassle-free experience that can actually make generating reports fun.

If you've just discovered us, we're excited. Try Windward with our 30-day free trial and start creating documents in quick time with our low/no code solutions.

A Guide to Evaluating Document Automation & Document Generation Products

White Paper | June 2020

INTRODUCTION

This guide will walk you through how to determine which document automation solution and document generation product is best for you. No one product, not even ours, is best for all use cases.

This guide only discusses the document template design component. While just a part of any solution, this is generally the most important part as it’s where the lion’s share of users’ time will be spent and limitations in this restrict the types of documents that can be created. So find out how you can start building document generation systems from templates in this article.

DEFINITIONS

Document Automation (also known as document assembly) is the design of systems and workflows that assist in the creation of electronic documents. These include logic-based systems that use segments of preexisting text and/or data to assemble a new document.

Document Generation is the process of creating hundreds, thousands, or even millions of personalized and distinct documents for internal or external use from a single template. While Document Generation is a subset of automation, for some products (not all) you can’t get just the Document Generation component of a Document Automation solution.

Reporting Software is a subset of Document Generation. Reporting software can’t do documents. But Document Generation software easily creates reports.

Tags are elements placed in the document automation template (DOCX, PPTX, XLSX) that the docgen system acts on when generating a document. These tags can be data to insert, business logic rules to conditionally display or suppress content, and much more. Each vendor has their own term for “tags.”

NOTE:

Going forward, the word docgen will be used to stand for Document Generation system in this guide. When referring to a template-based Document Automation system, the word docauto will be used.

THE DESIGNER - MICROSOFT OFFICE

Every modern docgen product uses Microsoft Office as the template designer. While you can find a few very old products that have their own designer, you want to limit your consideration to those built on Office as it is far superior.

Some document generation solutions work with Word, Excel, & PowerPoint while others are Word only. If you need Excel & PowerPoint, then obviously, go with a solution that supports them too. If you only need document automation tools using Word, think carefully if you might want Excel or PowerPoint someday in the future.

Again: if you go with a Word document automation solution, be very sure you won’t ever want Excel or PowerPoint. Ever!

Google Docs, Star Office, etc.

The docgen solutions that have a separate add-in or no add-in can usually work with any word processor that can save as a DOCX file. It all tends to work exactly the same. For a full Word clone, this can work every bit as well.

Google Docs in this case though tends to be problematic because Google Docs does not have the layout and formatting capability of Microsoft Word. Not even close. Your limit here is not the docgen app; it’s Google Docs. For most use cases, Google Docs is not up to the job.

CRITICAL FUNCTIONALITY

The following eight items are key to the success of the docgen solution you select. If any of these is a mismatch for your needs and requirements, you will at best have a lousy solution. And you could very well fail. Understanding how each aligns to your use cases is critical to your success.

DESIGNER ADD-IN

Some docgen solutions include an add-in to help you place & edit the tags in the template. There are three approaches here, each better than the last.

First, some automated document creation solutions have no add-in to assist in crafting tags. You usually end up with Notepad open, where you write all the various tags, and you copy from there and paste into Word. And for special uses, you type in the additional properties from memory or other notes.

This “no add-in” approach is slow, painful, & error-prone. If you have 5 templates, each with 5 tags, then no big deal. But if every month you’re creating 100 templates, each with 150 tags, you’re now in hell.

FROM OUR CEO

While Windward can legitimately claim to be a "no Add-In" solution for designing on platforms other than Windows - we find that approach so inferior, we state that we cannot be used for this use case.

We prefer to not get your business rather than provide you a significantly inferior approach.

Not only is it slow & expensive, but because it is a death march, designers will not put in the effort to make a business document template shine. They just want to be done.

The second approach (much better) is a second application (usually in a browser) that helps you write tags. You still have to copy & paste between this second app and Word, but the add-in provides all possible choices for tags and helps you write your queries.

Not all the side-by-side add-in approaches are the same. Play with each carefully to see how it works for you; not in simple sample cases, but in the more complex document templates you will need to create.

The third approach (best) is an add-in that becomes part of Word; adding additional tabs to the ribbon. This makes adding and revising tags a breeze because it works on the tag in the template. And while helping to write each tag, it can do so in the context of where it is in the template.

The incorporated add-in approach is by far the best in template-based document generation. But by definition, it is limited to Office on Windows.

This add-in is one of the two features (the query wizard below is the other) that determine how much time your team will spend designing document templates, day after day, week after week, year after year. If one approach takes 15 seconds longer per tag, and you are going to create 500 templates each with just 35 tags (that’s low), that’s 73 hours.
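
The arithmetic behind that figure can be checked in a few lines, reading the 15 seconds as extra time per tag. The variable names are only for illustration; swap in your own volumes to estimate the cost for your team.

```python
# Figures taken from the paragraph above.
extra_seconds_per_tag = 15
templates = 500
tags_per_template = 35

# Total extra time spent placing tags, in seconds and then in hours.
total_seconds = extra_seconds_per_tag * templates * tags_per_template
total_hours = total_seconds / 3600

print(total_seconds, round(total_hours, 1))
```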

CODE BEHIND

While all the Document Generation solutions require you to write code to call them (docauto is a no-code solution so not an issue), some of them require additional code for each template. This is called “code behind.”

In some cases, this code behind is defining different data specifications, such as you now also need the hire date. For these solutions, you don’t need code for each template, but a fair number of times templates will require additional data, or data ordered differently, and you have a code change.

Even worse, some require code behind for each template. Therefore, each new template means additional code. This is a giant hit.

Why? First you have programmers involved in template design. That’s expensive and slows the process down. Second, each new template requires rebuilding your application and pushing it through test & staging.

The one advantage to code behind is the developers can build data on the fly as it’s needed, including data generated according to business rules within the code. But in almost all cases, doing so directly in the template, as opposed to in the code behind, is superior.

In other words, you want the template file to be everything.

DRAG & DROP DOCLETS

One (or several) users can create pieces of template content that are saved off as doclets. Then they or other users can drag those saved doclets and drop them onto a template. This gives template designers a way to create very complex templates easily by dragging in the needed components. It also eliminates repetitive tasks.

For each docgen app, evaluate their drag/drop on the following criteria:

1. How do you create a doclet?
The best solution is to select content in Word and save that as a doclet. If it's more restrictive than this, will those restrictions stop you from creating very useful doclets?

2. Does it bring the full formatting of the doclet into the document it is dropped into?
This is actually a very hard thing to do in Word if the doclet uses styles that exist in the template with the same name - but different settings.

3. What can be saved?
Just template content? Or can you also save datasources, parameters, and more? This is not as important, but it is still a timesaver.

4. After you drop a doclet, is it complete? Or do you need to perform additional steps? For example, if a doclet uses a different datasource, is that datasource now also tied to the template?
Not that important, but nice to have.

5. Can doclets in a template be updated?
If a doclet is the company logo and the logo changed, can all the templates using that doclet be updated to the new logo universally?

The dropped doclets come in three flavors. The first, and usually the most useful, is the linked doclet, where the content of the doclet is displayed in your template in full, fully laid out and formatted. And as it is linked, when the doclet itself is revised, that change immediately appears in your template and is used in every generated document.

The second is a plain copy. Once you drop the doclet into your template, you can adjust it any way you wish, from formatting to tags in the content. But if the original doclet is changed, that change is not applied in your template. In some uses this is preferable, when you don’t want changes applied to existing templates.

The third approach is there is a tag that will import the doclet. You don’t see the contents of the doclet in your template, but when the template is processed, it will pull the live copy of the doclet. This is valuable when you have a select that will determine which doclet to import. This is useful for cases like you need to pull in content based on the State the recipient of the document lives in.

The optimum of course is to have all three flavors available to use each as appropriate.

QUERY WIZARDS

Your most common activity creating templates will be writing the queries to select the data. You do this to select blocks of data such as all stocks you hold for a portfolio statement. You also do this for conditional logic in the template such as adding insurance requirements for an offer letter if they reside in California. Or when placing a name in loan papers.

Some docgen products do not have query wizards. With no wizards, template creation is a developer-only task. And for developers, it will be slower. No wizards means you can never turn template creation over to business users.

FROM OUR CEO

You will do this hundreds of times in complex templates. Thousands of times across all the templates. You want this to be quick & easy. This functionality, more than everything else put together, determines how much time you will spend designing templates, and how pleasant it is.

- David Thielen

When you evaluate different document creation automation solutions, have a business user use the system to craft the queries and see how well they do. They’ll be slow & hesitant at first. But it’s key to see if they can learn it and then be successful on their own.

In the case of conditional tags (if, switch, etc.) make sure it also works well on elements returned by other tags (usually the iterative tags). Because in this case, it’s not a query of the data, it’s a condition on data already returned.

Finally, keep in mind that no matter how brilliant the query wizards are, the user will also generally struggle with the structure of the data (the metadata). This can be displayed to the user, but they still need to learn what is where. Reducing what metadata is displayed, providing the descriptions for each node in the metadata, etc., can make the difference between a usable and unusable solution for business users.

MULTIPLE DATASOURCES

If you have a single datasource, then skip this section – you don’t care.

Ok, you have multiple datasources, for example Salesforce & Marketo. And you have documents you want to populate with data from each. In this case you must get a docgen solution that lets you have tags in a single template that are marked for which datasource that tag is to be applied to.

Some document generation providers implement this in two passes: first applying all the Salesforce tags and then starting over and applying all the Marketo tags. This works fine if you are not intermixing the data.

Sometimes you need to intermix the data: for example, if your document lists all Account Executives (from Salesforce) and then within the data for an AE it lists the emails they were sent (from Marketo). Then you need a solution that processes all datasources simultaneously.

If you have multiple datasources, you almost certainly will eventually need a solution that processes multiple datasources simultaneously. If it’s not a must-have today, it probably will be a must-have in a year.

TAGS START & END LOCATION

Some tags have a start and end location, such as the if and forEach (iterative) tags. Generally, these are used to repeat or conditionally include a row in a table or a paragraph of text. All solutions do this.

But as time goes on and you create more advanced & complex templates, you will find yourself wanting to start the iteration in the middle of a table or an if that removes two cells and adjusts the table correctly.

In addition, you almost certainly will need a forEach (iterative) tag that adds columns in a table, as opposed to rows. You may want a column for each product or each month in a dataset. Finally watch out for any limitations on combinations. At the start you need a single forEach tag. A year later you are nesting five forEach tags within each other as it’s the only way to get what you want.

This is an area where it’s impossible to give guidance on what you may someday need. Your best bet is to select a solution that has no limitations on the start & end location.

OPTIONALLY HIDING CONTROL TAGS

For a simple template, this doesn’t matter (much). But as the logic expands in a template, you find that you are adding a lot of control tags. The most common are the iterative (forEach) and conditional (if) tags. But even a moderately complex template will also have numerous query and set tags along with several additional tags.

These tags, if displayed, pollute the template and enlarge the layout in the template. Usually you’ll find the template looks quite different from the final generated report. This makes it difficult to truly imagine the final document from the template. It’s frustrating to have to constantly run test documents to see what you’re going to get.

You’ll be much happier if the designer can at the click of a button hide or show the control tags. Show them when you’re working on the template logic. Hide them when you’re working on the final layout and formatting. This option will save you time and more importantly will make the design experience more pleasant.

Even on something as simple as this table, the ability to hide the control tags is a clear benefit.

IMPORTED TEMPLATES

The best way to use content across multiple templates is to have that content in a child template that the parent templates all import. These imported templates can be brought in as an explicit filename or as a data query that returns the filename.

Trust me: unless your needs are incredibly simple, you need this. You can work around it even if you repeat the same content in 100 templates, but you’re giving yourself too much extra work when wording changes due to company directives or legislation.

One critical detail on imports: Does the system process tags in the imported child template? If all of your child templates are static text (legal clauses), then this does not matter. But if you need to include anything live (a person’s name, a date, a state of residence), then you need a solution that processes tags in the imported child template.
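Here is a small sketch of the difference, assuming an invented tag syntax (<<import name>> and <<out field>>) and an in-memory template store. Neither reflects any vendor's actual syntax; the point is only that the child's tags are resolved against the same data as the parent's.

```python
import re

# Hypothetical template store and tag syntax, for illustration only:
#   <<import name>> pulls in another template, <<out field>> emits a data value.
TEMPLATES = {
    "letter": "Dear <<out name>>,\n<<import disclaimer>>",
    "disclaimer": "This notice applies to residents of <<out state>>.",
}

TAG = re.compile(r"<<(import|out) (\w+)>>")

def render(template_name, data, depth=0):
    """Render a template, processing tags inside imported children too."""
    if depth > 10:  # guard against templates that import each other
        raise RecursionError("import nesting too deep")

    def expand(match):
        kind, arg = match.group(1), match.group(2)
        if kind == "import":
            # The child is rendered as a live template, not pasted as static text.
            return render(arg, data, depth + 1)
        return str(data[arg])

    return TAG.sub(expand, TEMPLATES[template_name])

output = render("letter", {"name": "Pat", "state": "Colorado"})
print(output)
```

A system that only pastes the child in as static text would leave the state tag unresolved in the output.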

Finally, for Word only, how does it handle style mismatches? If the parent has the Normal style set to Times New Roman 12pt and the child has Normal set to Verdana 10pt, then what should the child paragraphs be styled as? This can be a royal pain because different users never have their styles matching.

Some systems convert the child to the parent formatting. Some retain the child formatting. And some (best solution) give you the option of either. The option is best but if it’s forced one of the two ways, make sure the system you get works that way.

Not having the expected styling on output is guaranteed to get upper management upset.

IMPORTANT FUNCTIONALITY

One of these might be critical to your use case. Several might be useful. But in most cases, none is a must have. They do however provide a picture of the breadth of each product.

FUNCTIONS (MACROS) INCLUDED & CUSTOM

For the solutions that allow queries in the tags, you want one that also supports complex functions operating on the data. And not just simple functions like SUM() and COUNT() but most of what’s available in Excel. You will use the Text and DateTime functions a lot.

In addition, can you add your own functions? Adding custom functions is often a significant component of providing a simple & easy design experience to business users. It’s also a lot safer: for a complex calculation, you write the function once and test it carefully, with no worries about someone getting it wrong when writing it by hand in a template.

ACCESS PROVIDERS

All of the products (I believe) support reading files using Basic, Digest, Negotiate, & OAuth2 authentication. But what about a special Authenticate & Authorize you created in your company for one set of files? Or something special to get to a JSON file from a REST service that is home grown?

First off, make sure the solution supports the standard protocols you use. You should get a yes. And if that’s all you have – fantastic; you can skip to the next section. If you have a home-grown A&A, find out what needs to be done to have the system access it. This is a custom Access Provider. And make sure that the same Access Provider is used for reading data files (XML & JSON), accessing OData, and importing files (templates & pictures).

DOCUMENT LOCKING

If you want to create DOCX or XLSX files where an employee can then edit parts of the file, this is incredibly valuable. For example, you are generating portfolio statements and the legal disclaimers and actual financial results must not be changed, but there is a paragraph where the financial advisor can add a write-up summarizing the performance.

In this case, some of the solutions will carry document locking in DOCX & XLSX (PPTX does not have this) over to the output. So, if the template has locked all except one paragraph, then the generated DOCX will be locked except for that one paragraph.

FROM OUR CEO

Having the document locking functionality tends to make your lawyers very very happy. It eliminates a source of serious legal liability.

- David Thielen

VALIDATION, ERROR & WARNING HANDLING

What is provided here is all over the board. And it’s difficult to get specific about what is most useful to you, as opposed to the next person. The best advice here is just look at what they have and try it out when evaluating.

One tool is validating a template. Not running it, but inspecting it and providing information on errors found. A second tool is to generate the document and deliver a list of errors and warnings. For example, if some content is placed off the page, it was rendered but you don’t see it. In this case it’s useful to have a listing of content off the page.

In this category you can include tag settings: what to do if a select fails, returns nothing, etc. Some of these are particularly useful, but in other cases you can find yourself investing more time than it’s worth.

PROCESS EMBEDDED OFFICE OBJECTS

What if you are generating portfolio statements using a Word template? It has descriptive text, a chart showing performance, legal disclaimers, etc. But where it has a table showing the actual numbers, you want to place an embedded spreadsheet with the numbers.

Why? Because this way the recipient can open that spreadsheet and then, using Excel, measure that data any way they want. It’s a much-improved portfolio statement and something that makes the recipient go WOW.

If you want this, verify that the document automation vendor you select not only carries embedded objects to the output, but also processes the tags in the embedded object if it is a DOCX/PPTX/XLSX file. To make good use of this functionality, the embedded object must be treated as a live template, not a static document.

If fully implemented, the output to any format, such as PDF, will include the displayed embedded object.


This is generally not required, but it is an opportunity to make people love what you create.

WORD FORM FIELDS

This is a DOCX -> PDF issue. Do you need to have form fields in the DOCX such as drop down, list or check box become the equivalent thing in PDF output? If so, you need to verify that this feature is supported.

In addition, make sure that the initial content/value in the form field can be set from data. If it’s just static values from the template, that tends to not be sufficient for all use cases.

And a suggestion. When you need an empty or checked box depending on data, don’t use a form field. Use the Wingdings empty-box and checked-box characters.

EXCEL REFERENCES & PIVOT TABLES

These are two XLSX -> XLSX issues. First, verify that a formula like SUM(D5:D5) expands to SUM(D5:D15) for the case where row 5, inside an iterative loop, becomes rows 5 to 15. It’s very useful to have the formula adjusted on the output (some products just write the literal value). This way, when someone changes, say, D7 to see what happens, all the formulas recalculate to reflect that change.

The same goes for pivot tables. If a pivot table covers D1:H5 and the generated XLSX now has those rows as D1:H125, verify that the pivot table is adjusted to match. This is necessary to use the pivot tables in the generated XLSX.
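The range adjustment described above can be sketched in a few lines. This is a simplified illustration, not how any particular product does it: the expand_ranges helper is hypothetical, handles only plain ranges like D5:D5 (no sheet names or $ anchors), and only stretches ranges that end on the template row.

```python
import re

# Matches simple ranges such as D5:D5 (no sheet names or $ anchors).
RANGE = re.compile(r"([A-Z]+)(\d+):([A-Z]+)(\d+)")

def expand_ranges(formula, template_row, generated_rows):
    """Stretch ranges that end on template_row to cover the generated rows."""
    extra = generated_rows - 1

    def stretch(match):
        col1, row1, col2, row2 = match.groups()
        row1, row2 = int(row1), int(row2)
        if row2 == template_row:
            row2 += extra
        return f"{col1}{row1}:{col2}{row2}"

    return RANGE.sub(stretch, formula)

# Row 5 inside a loop became rows 5 to 15, i.e. 11 generated rows.
print(expand_ranges("SUM(D5:D5)", template_row=5, generated_rows=11))
# A pivot source range whose last row is repeated 121 times.
print(expand_ranges("D1:H5", template_row=5, generated_rows=121))
```

A full implementation also has to shift references that sit below the expanded rows, which this sketch deliberately leaves out.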

If you’re going to generate XLSX for Excel Power Users, this is key.

CAPABILITIES

These are mostly yes/no items. If you have data in SQL, the product needs to support SQL and you don’t care if it supports JSON. Same for programming language & output formats. So fast check-off here.

PROGRAMMING LANGUAGES

This is not an issue for docauto, just document generation.

There are three ways to call a docgen engine: direct calls to a library, calls to a RESTful server on premises, and calls to a hosted (SaaS) RESTful server. Ask if they have what you want.

One note on hosted solutions: You will be sending data to that system. First, you want to make sure that the vendor is providing adequate security. Second, if your data is not allowed to go outside your country or region (E.U.), find out not just where the default server is, but also the failover server.

If you’re concerned enough about security to be asking these questions, you should probably host the RESTful server yourself. Even if you place it on AWS or Azure, you are controlling access to the server and its location.

SUPPORTED DATASOURCES

If all your data is JSON (or any other type), you don’t have to worry about what else the system can access. With that said, everything is getting more interconnected and odds are sooner or later you will have to access other datasource types.

Life is a lot safer if the solution can use data from SQL, XML, JSON, & OData. (And why OData? 150 other vendors’ datasources, from ACT to Salesforce to Zoho.) Not a deal breaker but it will turn out to be useful.

See if you can create datasets from datasources. This is akin to views in SQL but you are creating them in the template (no DBA needed). And you want them for XML, JSON, & OData too. A good guide to how robust the dataset implementation is–do they basically become another datasource? If so, that’s a full implementation.

Furthermore, it can take time and bandwidth to download the metadata from a datasource. We saw one DB2 database take 28 minutes to download the full metadata (yes – truly!). If you have datasources with large metadata structures, find out if they have a way to read the schema once and reuse that. (This is unlikely to ever be needed for XML or JSON–it’s SQL, OData, & any custom datasources.)

Finally, for XML, make sure it uses the XML schema if one is available.

OUTPUT FORMATS

Check that it renders in the output formats you need. Everyone does PDF, HTML, DOCX, XLSX, & PPTX (last two if they support that template type). Additional output formats might be useful, but odds are you’ll never need them.

Check the accuracy of the PDF output. Everyone is imperfect on this. And in their, and our, defense, Microsoft does not document how Word calculates page layout. It does not specify how the spacing between 2 lines of single-spaced text is calculated. And it’s impossible to reverse engineer accurately–Word is clearly performing complex calculations, not just using the font metrics.

Everyone does their best. Some come closer than others. Look for a good match but accept it won’t be perfect.

MISCELLANEOUS

Are you still here? Wow – congratulations! We’re now into some features that you may find useful but are unlikely to be major. But these do make good tiebreakers in your decision. And you may find one of the below to be major; for example, output that must be auto-hyphenated.

PARAMETERS

All products have a way to pass parameters to the template to use in the queries. Check that they have all the data types you need (they probably do).

Check that parameters can be set in a select as both a parameter (to avoid injection attacks) and as a string substitution if desired. Setting as a parameter is valuable not only to avoid an injection attack, but also to handle the case of passing the name O’Malley.
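The O’Malley case is easy to reproduce. The sketch below uses Python's built-in sqlite3 module with a made-up customers table; it is not tied to any docgen product, but the same distinction between a bound parameter and string substitution applies to the selects in a template.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, last_name TEXT)")
conn.execute("INSERT INTO customers (last_name) VALUES (?)", ("O'Malley",))

last_name = "O'Malley"

# Bound parameter: the value travels separately from the SQL text,
# so the apostrophe is just data.
bound_rows = conn.execute(
    "SELECT id, last_name FROM customers WHERE last_name = ?", (last_name,)
).fetchall()

# String substitution: the apostrophe ends the SQL string literal early,
# so the statement no longer parses.
try:
    conn.execute(f"SELECT id FROM customers WHERE last_name = '{last_name}'")
    substitution_error = None
except sqlite3.OperationalError as exc:
    substitution_error = str(exc)

print(bound_rows)
print("substitution failed:", substitution_error is not None)
```

String substitution still has its place, for example when the table name itself varies, which is why it helps to have both options.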

CONDITIONAL FORMATTING

Excel has conditional formatting for a cell. But Word and PowerPoint do not. If you need conditional formatting, check if the solution you are looking at has it, and if so, if it’s sufficient.

AUTO-HYPHENATION

For output to PDF and a printer, if you want auto-hyphenation, make sure the solutions you are looking at offer it. Most people don’t care about this, or at least don’t care that strongly. But it’s a “must have” for a few.

TAG TREE

Does the designer have a way to show the structure of the tags in the document? And can you click on one to go to that tag? There is no need for this in simple templates, but when you get to 30+ tags it becomes useful. And at 80+ it becomes essential.

If you’ll always be under 50 tags, no big deal. But if you start under 50 tags and will grow to 200+ tags in a template, not having this will become a big deal. So think about where you’ll be in 5 years.

DATA COUNT

If you run a template and it takes forever, or it completes but it’s 2,000 pages long when you expected 2 pages – why? You can ask a DBA to trace your selects and tell you the problem.

It’s faster & easier if the template add-in has a tool that tells you for each iterative select how many rows of data it returns and how long the query took to complete. From this you can quickly find what is wrong.
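For a sense of what such a tool reports, here is a minimal sketch using Python's sqlite3 module. The profile_selects helper, the orders table, and the select names are all hypothetical; a real add-in would gather the same two numbers (row count and elapsed time) for each iterative select in the template.

```python
import sqlite3
import time

def profile_selects(conn, selects):
    """Run each select and report (name, row count, elapsed seconds)."""
    results = []
    for name, sql in selects.items():
        started = time.perf_counter()
        rows = conn.execute(sql).fetchall()
        elapsed = time.perf_counter() - started
        results.append((name, len(rows), elapsed))
    return results

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)", [(i, i % 3) for i in range(300)]
)

stats = profile_selects(conn, {
    "orders_for_customer": "SELECT * FROM orders WHERE customer_id = 1",
    # A missing WHERE clause shows up immediately as a much larger count.
    "all_orders": "SELECT * FROM orders",
})
for name, count, seconds in stats:
    print(f"{name}: {count} rows in {seconds:.4f}s")
```

An unexpectedly large row count usually points straight at the select that is missing a filter.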

Useful, not essential.

GENERATE CODE

This is used once and saves at most 15 minutes - but it is very nice to have. This is irrelevant for the solutions that have code behind – they create code for each template.

To show what code you need to add to your application to use the docgen system, it’s ideal if the product includes a generate code feature that provides you with sample code. As a bonus, you then know the correct way to call the engine.

Nice, not essential.

DEBUGGERS

Fortunately, these are rarely needed. But when needed, they can be a big time saver. There are several different debuggers that may be in a docgen template designer add-in.

  • Template Debugger - This is a means to step through applying data to a template to generate the document. You want the common features of breakpoints, single step, and viewing all data and state when you break into the debugger. This helps you debug the business logic in your template.
  • Connection Debugger - This is a means to help you find the right connection string to a datasource. (Usually SQL but it can also be a URL to XML/JSON/OData.) This will attempt the connection and if it fails, provide all exception information. It will also help with writing the connection string.
  • Query Debugger - This is a means to determine why a select is invalid. Again, the main use here is to try different selects and see the exceptions returned. It's a fast way to go through trial & error.

As stated above, these are rarely needed so they're in the "useful but not important" category - except that one time you really really need it.

TAGS

Every product has different names for the various tags. Here we use the tag names from Windward, but everyone has most of these.

It’s also important to look at the specific functionality of some tags. Can the import tag optionally insert a section break (Word) before/after the import? Can the forEach tag insert a section break, new workbook, and/or new slide on each iteration? Are bitmap & chart tags actual Office pictures & charts?

  • out
    Place the data from a select at this location
  • import
    Data returns a filename or URL, place what's in that file at this location
  • bitmap
    Some solutions handle bitmaps as part of the out & import tag. Others have a distinct tag for bitmaps (sometimes called pictures).
  • set
    Set the value of a parameter (new or existing)
  • query
    Reads one row of data to be used by other tags
  • forEach, endForEach
    Iterate through the rows of data in the query repeating the template content between them once for each row of data returned.
  • if, else, elseIf, endIf
    Conditionally include content between if and else or else and endIf based on the result of the query. The else is optional. Note: Windward does not have elseIf.
  • switch, case, endSwitch
    Like if but has multiple case statements within the switch.
  • link, endLink
    The query returns a URL for the link. The link is applied to all content between the start and end tag.
  • bookmark
    A link anchor inside the document. The name is set from the data.
  • chart
    A chart built from data. Some systems for DOCX/XLSX/PPTX output create an Office chart object (good) while some create a bitmap rendering of the chart (poor).

It's fascinating how much you can create with this set of 11 tags. There really is no limit, even though it's a moderately sized set of constructs. The power is in what each of them does under the covers.

DOCUMENT GENERATION PRODUCTS

Here is a list of document generation software that you can embed into your applications or solutions. The template design functionality is most of what’s key for these solutions.

  • Docmosis
    Generate documents and reports based on templates. Output in PDF/Doc/ODT from Java, PHP, C#, Ruby and more.  
  • Ecrion
    At Ecrion, we make customer communications management software for companies who want to establish genuine connections across multiple engagement channels.
  • Formstack
    Use Formstack document generation software to merge data into custom-built documents. It saves hours of time and money.
  • HotDocs
    No matter your industry or company size, HotDocs from AbacusNext has a solution to help speed up your document creation workflow.
  • Windward Studios
    The Global Leader in Document Generation Solutions. Revolutionize your docgen. Windward provides seamless integration in your CRM or custom apps.
  • XPertDoc
    Our specialty is document generation and automation. Our mission is to enable organizations to digitally transform their document processes.

DOCUMENT AUTOMATION PRODUCTS

Here is a list of Low Code/No Code solutions that provide an end-to-end document automation solution. This guide focused solely on the template design step of these total solutions. You need to also evaluate the additional functionality each provides.

  • Conga
    Conga’s end-to-end AI digital document transformation increases business-critical efficiencies, leads, and revenue generation. Automate for ROI...
  • Formstack
    Use Formstack document generation software to merge data into custom-built documents. It saves hours of time and money.  
  • Nintex
    Nintex is the market leader in end-to-end process management and workflow automation. Easily manage, automate, and optimize your processes with no code.
  • Templafy
    Templafy helps companies perfect every aspect of business document creation. Enable your employees to work faster & within company standards every time.
  • Windward Studios
    The Global Leader in Document Automation Solutions. Revolutionize your document generation. From a comprehensive SaaS or desktop solution, to seamless integration in your CRM, we have you covered. Take advantage of document automation using Word when you choose our software.

Apryse Software Corp. © 2024 All Rights Reserved.