
Document File Formats (Complete Guide)

Techie

The Most Popular File Formats

If you use a computer, you probably work with documents and files often, but you may not pay much attention to their formats. Yet a document's file format largely determines how a computer opens it and how it behaves.

Your computer probably has default settings that let you open, edit, and save documents without worrying about their format. That works until you email a document and the recipient tells you they cannot open it because of its format. At that point, knowing the different file formats and their extensions becomes important.

Document File Format Terminology

Document: You may be used to thinking of a document as a written copy of information. When it comes to computers, we use the word “document” for any file created with software, whether it holds text, video, images, or audio.

File: A file is where a document, or any other data, is stored on a computer. When you create a document, you save it as a file and give it a file name, so once it is saved, a document can also be called a file.

Format: This is the way the data in a file is organized and encoded. Think of it as the language of the file. For example, a Word document is usually saved in DOCX format; that is the language the document communicates in. Each kind of data is saved in a format suited to it.

Extension: An extension is the suffix at the end of a file name that identifies the format the file is saved in. For example, if you name a Word document tom when you save it, Word automatically adds the extension .docx, so the file is saved as “tom.docx”. That tells the user what format to expect. Note, however, that renaming the file to tom.pdf does not change its format. Changing a file's extension does not convert the file into a new format.
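To see that the extension is only a label, you can check a file's first few bytes, its "signature" or "magic number", which most formats begin with. The Python sketch below is a minimal illustration that assumes a short, hand-picked table of signatures (the names `SIGNATURES`, `sniff_format`, and `sniff_file` are our own); real tools such as the Unix `file` command know thousands.

```python
# Known file signatures ("magic numbers") found at the start of a file.
# This short table is illustrative, not exhaustive.
SIGNATURES = [
    (b"%PDF-", "PDF document"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"PK\x03\x04", "ZIP archive (also DOCX, XLSX, PPTX, ODT, JAR)"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "legacy Office file (DOC, XLS, PPT)"),
    (b"BM", "BMP image"),
]


def sniff_format(header: bytes) -> str:
    """Guess a file's real format from its first bytes, ignoring its name."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"


def sniff_file(path: str) -> str:
    """Read just enough of the file to cover the longest signature above."""
    with open(path, "rb") as f:
        return sniff_format(f.read(8))
```

A DOCX file renamed to tom.pdf still starts with the ZIP signature, so `sniff_file("tom.pdf")` would report a ZIP archive rather than a PDF. Very short signatures such as BMP's `BM` can produce false matches, so treat the result as a hint rather than proof.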

Throughout this guide we will be referring to the above terms, so it is good to take them in before proceeding to the rest of the guide.

Now let us look at some of the common file types:

Common File Types

There are numerous types of files, but some are used more often than others. You have probably come across many of the file types in this list, but you may also meet some new ones.

Text Files: Typing is probably the most common thing people do on a computer, and what they type is saved as a text file. A saved Word document is one example. There are many text file formats, which we will see later in this guide.

Image Files: You have probably worked with several image file types. Some produce much larger files than others; some are suited to particular uses, such as websites, while others are compressed for easier storage.

Audio Files: If you have saved music on your computer or another music device, you have saved an audio file, such as an MP3 or WAV file. These are two of the most common audio formats.

Internet Files: Because the internet is used to share files, it carries a mix of all of them. Text, audio, and image files like those mentioned above are also internet files, just in web-friendly formats. Formats used specifically on the web include HTML, JavaScript, and RSS, among many others.

Spreadsheets: If you work with figures a lot, you are probably familiar with these files. Microsoft Excel formats are the most common spreadsheet formats.

Video Files: Videos can be saved in many ways, and you may encounter several video file types as you download videos online. Common types include Windows Media Video (WMV), MP4, Adobe Flash (FLV), and MPEG.

As we said, there are many file types. The ones above are the most common and should give you a fair idea of what they are. For each file type to work as it should, it depends on its file format. Below are some of the most common document formats.

Common Types of Document Formatting

From the definitions above, you may have noticed that file formats and extensions are closely related. We identify each format below by its extension, so this section also serves as a list of common file extensions.

DOC: This was the format Microsoft Word used for documents before Word 2007. You can still save files in this format, although the newer DOCX format is now far more common. If you save a Word document in DOC format, some programs may fail to open it.

DOCX: With more than 1 billion people using Microsoft Word, DOCX is the most widely shared word-processing format. It is the default format for Microsoft Word documents, it replaced DOC, and it can be read by many different programs. A DOCX file is a compressed (ZIP) archive of several files, mostly XML, that hold the text, stylesheets, and other document information. Because it is compressed, you cannot read a DOCX file in a text editor, but you can unzip it and inspect the files that make it up. Applications other than Microsoft Word can also create DOCX files, including OpenOffice and LibreOffice.
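Because a DOCX file is just a ZIP archive, any ZIP library can list its parts. Here is a minimal Python sketch using the standard `zipfile` module; the function names are illustrative, and it assumes the main text lives in `word/document.xml`, which is where Word stores it.

```python
import io
import zipfile


def _open_zip(source):
    """Accept a file path, an open binary file, or raw bytes (e.g. an attachment)."""
    if isinstance(source, (bytes, bytearray)):
        source = io.BytesIO(source)
    return zipfile.ZipFile(source)


def list_docx_parts(source) -> list:
    """Return the names of the files packed inside a DOCX archive."""
    with _open_zip(source) as archive:
        return archive.namelist()


def read_document_xml(source) -> str:
    """Return the raw XML of the main document body."""
    with _open_zip(source) as archive:
        return archive.read("word/document.xml").decode("utf-8")
```

Run on a typical Word file, `list_docx_parts("report.docx")` (a hypothetical file name) usually shows parts such as `[Content_Types].xml`, `word/document.xml`, and `word/styles.xml`. XLSX and PPTX files are packaged the same way, so the same approach works for them.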

DOCM: These documents are like DOCX files, except that they can contain embedded macro code. Macros automate tasks in Word and are particularly helpful for repetitive work such as data entry.

TXT: These are plain text documents with no formatting. A typical TXT file is used for taking notes, and many programmers use them for writing code or instructions. Just about any program can open and read this format. You can create these files with Notepad on Windows or with Apple TextEdit (in plain-text mode) on a Mac.

HTML: Now meet a language that is also a file format. HTML stands for Hypertext Markup Language. It is the language of web pages, and it is stored as a plain text file. If you open an HTML file in a plain text editor like Notepad, you will see the markup tags (the commands) that a browser interprets but does not display on the actual web page. HTML is the behind-the-scenes text that makes websites work as intended.

PDF: If you have wondered why so many people ask for PDFs, it is because this format displays the same way in almost any environment. PDF stands for Portable Document Format, which makes it well suited to sharing files. A PDF shows the document exactly as its creator intended, and it is print-ready. PDFs are not designed for easy editing, however; changing one usually requires a dedicated PDF editor. Many applications can open files with the .pdf extension, although Adobe Acrobat Reader is the standard choice.

PPT and PPTX: People who often create presentations are familiar with these extensions. PPT is the older PowerPoint format, and presentations are now usually saved as PPTX. The format supports text, images, and video, and it can be opened in various programs, although Microsoft PowerPoint is the default program for PPTX files.

PPTM: PPTM files can be used to automate certain functions within a presentation. They are just like PPTX files but contain macros: embedded instructions that can carry out a series of tasks at the press of a single button.

XLS: If you are dealing with files that contain a lot of data, especially in tables or graphs, you will encounter this extension. XLS is the older Microsoft Excel spreadsheet format (used before Excel 2007), but many other programs can also create, edit, and save XLS files.

XLSX: Although originally designed for Microsoft Excel, XLSX files can be opened with most spreadsheet apps. Like DOCX, an XLSX file is a ZIP archive of XML files. Data in these documents is stored in rows and columns, which is convenient when dealing with figures.

XLSM: These are spreadsheet documents with embedded macros. XLSM files contain sets of instructions that automate spreadsheet tasks, helping users work faster by repeating actions automatically. Recording macros in Excel requires its developer tools (the Developer tab).


CSV: CSV (comma-separated values) files make it easy to export and import data. They store a table as plain text: each line is a record, and commas separate the fields, which is why CSV files contain a lot of commas. A text editor such as Notepad++ works well for viewing them, especially when the data is large. Programmers use the format to move data between applications, and you can also use it to export phone contacts.
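All those commas are why CSV needs quoting rules: a field that itself contains a comma is wrapped in double quotes. Here is a minimal sketch with Python's built-in `csv` module, using made-up contact data and illustrative function names.

```python
import csv
import io


def contacts_to_csv(contacts: list) -> str:
    """Write contact dicts as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["name", "phone", "company"])
    writer.writeheader()
    writer.writerows(contacts)  # fields containing commas are quoted automatically
    return buffer.getvalue()


def csv_to_contacts(text: str) -> list:
    """Read CSV text back into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))
```

For example, a company name like `Acme, Inc.` is written as `"Acme, Inc."`, so the comma inside it is not mistaken for a field separator when the file is imported again.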

ODT: This is an alternative to DOCX. It can hold text, objects, images, and styles. ODT files are OpenDocument Text files; they are the default format of free editors such as LibreOffice Writer, and most word processors can create them. If a recipient cannot open an ODT file, you can save a copy as DOCX before sending it.

BMP: Bitmap files store an image as a map (grid) of pixels, keeping the color information for every pixel. Because the image data is usually stored uncompressed, a BMP keeps its full quality when transferred. The large size of bitmap files, however, makes them impractical for everyday use.
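A quick calculation shows why bitmaps get so large. This Python sketch assumes the most common uncompressed BMP layout: a 14-byte file header, a 40-byte info header, no color palette, and pixel rows padded to a multiple of 4 bytes. The function name is our own.

```python
def bmp_file_size(width: int, height: int, bits_per_pixel: int = 24) -> int:
    """Size in bytes of an uncompressed 24- or 32-bit BMP (no palette)."""
    if bits_per_pixel not in (24, 32):
        raise ValueError("this sketch only handles 24- and 32-bit images")
    file_header = 14   # BITMAPFILEHEADER
    info_header = 40   # BITMAPINFOHEADER
    # Each row of pixels is padded up to a multiple of 4 bytes.
    row_size = (bits_per_pixel * width + 31) // 32 * 4
    return file_header + info_header + row_size * height
```

`bmp_file_size(1920, 1080)` returns 6,220,854 bytes: roughly 6 MB for a single Full HD image, because every pixel is stored in full.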

The list of document formats and their extensions keeps growing, and it would be impossible to cover them all. The following image and print formats are also worth knowing.

DIB: A device-independent bitmap (DIB) stores a graphic in a way that does not depend on any particular display device. Two-dimensional images can be stored in this format. It supports different color depths and makes it easy to move images from one device to another.

GIF: The Graphics Interchange Format is a portable bitmap format that supports animation and up to 8 bits per pixel, that is, a palette of at most 256 colors. It is used mainly for graphics and logos and is not advisable for photographs.

JPG/JPEG: This format is web friendly because it uses lossy compression, discarding some image detail to keep files small. For the same reason, it is not the best option for high-resolution printing.

PNG: Like JPEG, PNG (Portable Network Graphics) is used mainly for images on the web, but unlike JPEG it supports transparent backgrounds and uses lossless compression. PNG lets you save images with more colors than GIF while keeping them sharp.

TIFF: Tagged Image File Format is a lossless format that lets images keep their original quality. It is the best option for high-resolution photography and a good choice for preserving the quality of scanned images.

PS (PostScript): Files with the .ps extension are Adobe PostScript files, which publishers use to print text and images on the same page. A user can embed printing instructions within the file. PostScript is also a programming language: the file is a program that the printer runs to draw each page.
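Because a PostScript file is a program written as ordinary text, you can generate one from any language. The Python sketch below produces a minimal one-page PostScript program; the font, the position, and the function name are illustrative choices. Coordinates are in points (72 per inch) measured from the bottom-left corner of the page, and `showpage` tells the printer to output the page.

```python
def hello_postscript(text: str) -> str:
    """Return a minimal one-page PostScript program that prints `text`."""
    # Backslashes and parentheses must be escaped inside a PostScript string.
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return "\n".join([
        "%!PS",                                      # header identifying PostScript
        "/Helvetica findfont 24 scalefont setfont",  # choose a 24-point font
        "72 720 moveto",                             # 1 inch from left, 10 inches up
        f"({escaped}) show",                         # draw the text
        "showpage",                                  # output the finished page
    ]) + "\n"
```

Written to a file with a .ps extension, the output can be sent to a PostScript printer or opened in a PostScript viewer such as Ghostscript.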

Choosing Document Automation Formats

When choosing a format for document automation, give priority to the program you already use. In most cases that is Microsoft Word or a similar word processor, and the format you already work in is appropriate for automation. That usually means automated documents are DOCX, or the older DOC, since these are the default formats for most users.

If you need to automate reports, you will probably be comfortable using Excel format since figures are more commonly managed on an Excel sheet. You can then choose any of the Excel extensions to save the document template for automation.

Although PDF can also be used for document templates, it is not an easy format to work with. If you have ever tried to copy text from a PDF into Word, you know how much formatting can be lost. This is why DOCX is usually preferred. If, however, your document contains a lot of graphs, you may be better off producing it as a PDF. Another option is an HTML report template, which can control both the layout and the content of the output; this, however, requires someone who knows HTML and CSS.
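To illustrate the idea, here is a minimal HTML report template in Python. It uses the standard library's `string.Template` and `html.escape`, not any particular reporting product (it is not Windward's template engine); the CSS in the `<style>` block controls the layout, and the `$title` and `$rows` placeholders mark where data is merged in. The template and function names are hypothetical.

```python
from html import escape
from string import Template

# Hypothetical report template: CSS controls layout, $placeholders receive data.
REPORT_TEMPLATE = Template("""<!DOCTYPE html>
<html>
<head>
<style>
  body { font-family: sans-serif; margin: 2em; }
  table { border-collapse: collapse; width: 100%; }
  td, th { border: 1px solid #999; padding: 4px 8px; }
</style>
</head>
<body>
<h1>$title</h1>
<table>
<tr><th>Item</th><th>Amount</th></tr>
$rows
</table>
</body>
</html>
""")


def render_report(title: str, items: list) -> str:
    """Merge (name, amount) pairs into the template, escaping the data."""
    rows = "\n".join(
        f"<tr><td>{escape(name)}</td><td>{amount:,.2f}</td></tr>"
        for name, amount in items
    )
    return REPORT_TEMPLATE.substitute(title=escape(title), rows=rows)
```

Escaping the data matters: a value such as `Widgets & Co` becomes `Widgets &amp; Co`, so data can never break the page layout that the CSS defines.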

File Formats for Presenting

When you are dealing with presentations, you may combine many kinds of files, from text to video and everything in between. PowerPoint can incorporate many of these formats.

The format you choose depends on the type of presentation you are making. For example, if it has a lot of figures, you will probably use data from an Excel workbook, so you need an Excel format that works with PowerPoint, such as XLSX. If the presentation consists mostly of graphs, PDF may be the right format.

However, the standard PowerPoint format is PPTX, which opens in current versions of PowerPoint on a PC as well as on a smartphone with the PowerPoint app installed. You can also export a slide show as JPG files, so that every slide is saved as an ordinary image.

The following formats are very useful for sharing data, but they can also be dangerous if you do not follow safety procedures when handling them.

Executable Document File Downloads

Executable files come in various types, and you can identify them by the extensions that show their format. When you open a downloaded executable, your PC runs it. Usually this is no problem, since executable formats are used for installers as well as apps. The big problem is that the same formats are used by people trying to infect your PC with a virus.

An executable runs with the same privileges as the user who launches it, which can mean access to the entire system. If you have administrator privileges, so does the program, and it can install or run whatever it contains. You can see how useful this is when installing software or updating your antivirus, and also how dangerous it can be.

The rule of thumb is not to open any executable sent from an untrusted source. Many attackers give a virus a name that disguises it as useful software or an app you might be interested in. If you do not know where such a file came from, do not trust the download.
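Two simple checks catch common disguises. A file named like a document may actually be a program, which you can spot from its first bytes (Windows executables begin with `MZ`, Linux ones with `\x7fELF`). And a name like `invoice.pdf.exe` may appear as just `invoice.pdf` when Windows hides known file extensions. The Python sketch below assumes short, illustrative extension lists and function names of our own; it is a heuristic, not a replacement for antivirus software.

```python
import os

# Leading bytes ("magic numbers") of common executable formats.
EXECUTABLE_MAGIC = {
    b"MZ": "Windows executable (EXE/DLL)",
    b"\x7fELF": "Linux/Unix executable (ELF)",
}

# Illustrative extension lists; real security tools use far longer ones.
DOCUMENT_EXTENSIONS = {".pdf", ".doc", ".docx", ".xlsx", ".txt", ".jpg", ".png"}
EXECUTABLE_EXTENSIONS = {".exe", ".scr", ".bat", ".cmd", ".js", ".vbs", ".jar"}


def executable_kind(header: bytes):
    """Return the executable type the header matches, or None."""
    for magic, kind in EXECUTABLE_MAGIC.items():
        if header.startswith(magic):
            return kind
    return None


def is_disguised_executable(filename: str, header: bytes) -> bool:
    """True when the name says 'document' but the bytes say 'program'."""
    _, extension = os.path.splitext(filename.lower())
    return extension in DOCUMENT_EXTENSIONS and executable_kind(header) is not None


def has_double_extension(filename: str) -> bool:
    """True for names like 'invoice.pdf.exe' that can look like 'invoice.pdf'."""
    stem, last = os.path.splitext(filename.lower())
    _, inner = os.path.splitext(stem)
    return last in EXECUTABLE_EXTENSIONS and inner in DOCUMENT_EXTENSIONS
```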

Executable formats include the following extensions:

Additional Executable Formats

JAR: JAR (Java Archive) files are compressed ZIP archives that bundle Java classes and resources together with a manifest file, which makes them easy to execute. They also contain metadata, and because they are compressed they are faster to download: a complete application and its classes can be delivered in a single request.
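Since a JAR is a ZIP archive, its manifest (`META-INF/MANIFEST.MF`) can be read with any ZIP library; its `Main-Class` entry names the class that `java -jar` starts. Here is a minimal Python sketch that parses only the manifest's main section (the function name is illustrative).

```python
import io
import zipfile


def read_manifest(jar_source) -> dict:
    """Parse the main section of META-INF/MANIFEST.MF inside a JAR file.

    jar_source may be a path, an open binary file, or the JAR's raw bytes.
    """
    if isinstance(jar_source, (bytes, bytearray)):
        jar_source = io.BytesIO(jar_source)
    with zipfile.ZipFile(jar_source) as jar:
        text = jar.read("META-INF/MANIFEST.MF").decode("utf-8")

    attributes = {}
    last_key = None
    for line in text.splitlines():
        if not line:
            break  # a blank line ends the main section
        if line.startswith(" ") and last_key is not None:
            # Long values wrap onto continuation lines that start with one space.
            attributes[last_key] += line[1:]
        else:
            key, _, value = line.partition(":")
            last_key = key.strip()
            # The format is "Name: value" with exactly one space after the colon.
            attributes[last_key] = value[1:] if value.startswith(" ") else value
    return attributes
```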

EAR (Enterprise Archive): Java EE uses the EAR format to package modules into a single archive, so they can be deployed together on an application server. The archive can also hold XML deployment descriptors that describe how the modules are deployed.

Compressed Files

Compressed formats are mainly used for transporting files. A very large file can be compressed so that it is easier to send, and the recipient unzips it to access the data. Executable files can fall into this category too, for example self-extracting archives. Word documents in RTF format are good candidates for compression, since they tend to take up a lot of space.

Some compressed files need special software to open, while others open automatically. Many software updates are delivered as compressed files that unpack and install themselves after download, without the user having to open them.
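Here is what compressing and unpacking looks like in code: a minimal Python sketch using the standard `zipfile` module with DEFLATE compression, the method ZIP tools most commonly use. The function names are illustrative.

```python
import io
import zipfile


def zip_bytes(files: dict) -> bytes:
    """Sender's side: pack {name: bytes} into an in-memory ZIP archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, data in files.items():
            archive.writestr(name, data)
    return buffer.getvalue()


def unzip_bytes(archive_data: bytes) -> dict:
    """Recipient's side: unpack every member back into {name: bytes}."""
    with zipfile.ZipFile(io.BytesIO(archive_data)) as archive:
        return {name: archive.read(name) for name in archive.namelist()}
```

Repetitive content such as plain text shrinks dramatically, while files that are already compressed, such as JPEG or MP4, barely shrink at all.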

Once again, be very careful with compressed files: although they are helpful, they can be used to deliver viruses. If your system is set up to unzip compressed files automatically, malicious contents may be extracted without your involvement and end up infecting the system.
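Before extracting an archive, especially automatically, it helps to look at what is inside. This Python sketch flags members with risky extensions and paths that would land outside the extraction folder (for example `../`). The extension list and function name are assumptions for illustration, and the path check covers only the forward-slash paths that ZIP files normally use.

```python
import io
import posixpath
import zipfile

# Illustrative list of extensions that can run code when opened.
RISKY_EXTENSIONS = {".exe", ".bat", ".cmd", ".scr", ".js", ".vbs", ".jar", ".reg", ".ps1"}


def audit_archive(source) -> list:
    """Return reasons not to auto-extract a ZIP; an empty list means no red flags."""
    if isinstance(source, (bytes, bytearray)):
        source = io.BytesIO(source)
    problems = []
    with zipfile.ZipFile(source) as archive:
        for name in archive.namelist():
            # Normalize "a/../../x" to "../x" so escapes are easy to detect.
            normalized = posixpath.normpath(name)
            if normalized.startswith("/") or normalized == ".." or normalized.startswith("../"):
                problems.append(f"{name}: path escapes the extraction folder")
            extension = posixpath.splitext(name.lower())[1]
            if extension in RISKY_EXTENSIONS:
                problems.append(f"{name}: can run code ({extension})")
    return problems
```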

Common compressed file extensions include:

RAR: RAR files are compressed archives created with WinRAR. Like ZIP files, they must be extracted before you can use their contents, and they can be used on both Windows and macOS.

Scripts

Scripts are similar to executable files, but they are written in a scripting language and are not compiled the way executables are. You can open script files, such as PostScript files, in a text editor and inspect their source code. They can then be run like executables, so they are also a danger to your system. Be careful with scripts from sources you have not verified.

BAT: Batch (.bat) files contain a list of Windows commands that run in order, which makes them useful for automating repetitive tasks.

REG: REG files add or change values in the Windows registry. You can also export registry keys to a REG file as a backup before making changes, and share a REG file to apply the same registry changes on another computer.

Advances in technology make it easier to deal with different documents. Software can often choose a file format automatically based on the kind of document you are creating, and working in formats that many programs can open, such as DOCX, PDF, and XML, makes things easier still.

Since it is impossible to master every format right away, keep learning through experience once you understand the basics in this guide.

Windward offers products that automate documents and output them in a wide range of formats including HTML, PDF, DOCX, XLSX, PPTX, and a lot more.

Written by: Genesis Abel
Windward © 2020 All Rights Reserved.
