Everything you need to know about Locales

A long time ago, when I was a senior developer in the Windows group at Microsoft, I was sent to the Far East to help get the F.E. version of Windows 3.1 shipped. That was my introduction to localizing software – basically being pushed into the deep end of the pool and told to learn how to swim. This is where I learned that localization is a lot more than translation.

Note: One interesting thing we hit – the infamous Blue Screen of Death switched the screen into text mode. You can't display Asian languages in text mode. So we (and by we I mean me) came up with a system where we put the screen in VGA mode, stored the 12 pt. Courier bitmap at that resolution for just the characters used in BSoD messages, and rendered it that way. You kids today have it so easy :)

So keep in mind that taking locale into account can lead to some very unexpected work.

The Locale

OK, so fast-forward to today. What is a locale and what do you need to know? A locale is fundamentally the language and country a program is running under. (There can also be a variant added to the country, but use of this is extremely rare.) The two parts are independent, so any combination is possible. For example, a Spanish national in Germany would set es_DE so that their user interface is in Spanish (es) but their country settings are those of Germany (DE). Do not assume location based on language or vice versa.

The language part of the locale is very simple – it's the language in which you want to display your app's text. If the user is a Spanish speaker, you want to display all text in Spanish. But which dialect of Spanish? It is quite different between Spain and Mexico (just as in America we spell color while in England it's colour). So the country can impact the language used, depending on the combination.

All programming languages that support locale-specific resources (which is pretty much all of them today) use a fallback system. They will first look for a resource for the language_country combination. While es_DE has probably never been done, there often is an es_MX and es_ES. So for a locale set to es_MX it will first look for the es_MX resource. If that is not found, it then looks for the es resource. This is the resource for that language, but not specific to any country. Generally this is copied from the largest country (economically) for that language. If that is not found, it then goes to the "general" resource, which is almost always in the native language the program was written in.
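The lookup order described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular run-time implements it. The function names, the RESOURCES table, and its keys are all made up for the example, and the empty string stands in for the "general" resource.

```python
def fallback_chain(locale_name):
    """Return the resource names to try, most specific first.

    The empty string stands for the general (untranslated) resource.
    """
    language, _, country = locale_name.partition("_")
    chain = []
    if country:
        chain.append(f"{language}_{country}")
    chain.append(language)
    chain.append("")
    return chain


# Hypothetical resource tables, keyed by locale name.
RESOURCES = {
    "": {"computer": "Computer", "file": "File", "beta": "New beta feature"},
    "es": {"computer": "Ordenador", "file": "Archivo"},
    "es_MX": {"computer": "Computadora"},
}


def lookup(key, locale_name, resources=RESOURCES):
    """Return the string for key from the closest resource that defines it."""
    for name in fallback_chain(locale_name):
        table = resources.get(name, {})
        if key in table:
            return table[key]
    raise KeyError(key)


print(fallback_chain("es_MX"))      # ['es_MX', 'es', '']
print(lookup("computer", "es_MX"))  # Computadora (from es_MX)
print(lookup("file", "es_MX"))      # Archivo (falls back to es)
print(lookup("beta", "es_MX"))      # New beta feature (falls back to general)
print(lookup("computer", "es_DE"))  # Ordenador (no es_DE table, so es is used)
```

Note that es_MX only has to define the one string that differs, which is the whole point of the fallback.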

The theory behind this fallback is that you only have to define the more specific resources where they differ – and that is very useful. But even more importantly, when new parts of the UI are made and you want to ship beta copies, or you release before you can get everything translated, the translated parts are localized and the untranslated parts still display, but in English. This annoys the snot out of users in other countries, but it does get them the program sooner. (Note: We use Sisulizer for translating our resources – good product.)

The second half is the country. This is used primarily for number and date/time settings. This runs the gamut from what the decimal and thousands separator symbols are (12,345.67 in the U.S. is 12 345,67 in Russia) to what calendar is in use. The way to handle this is to use the run-time classes available for all operations on these elements when interacting with a user. Classes exist both for parsing user-entered values and for displaying them.

Keep a clear distinction between values that the user enters or that are displayed to the user, and values stored internally as data. A number is stored as a string in an XML file, but in that file it will always be "12345.67" regardless of locale (unless someone did something very stupid). Keep your data strongly typed and only do the locale-specific conversions when displaying text to, or parsing text from, the user. Storing data in a locale-specific format will bite you in the ass sooner or later.
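Here is a rough Python sketch of that boundary: locale-specific text on the user side, a typed value in the middle, and a locale-neutral string in storage. The SEPARATORS table is hand-written for the example and covers only two locales. It is an assumption made to keep the code self-contained, and real code should get these symbols from the run-time's own locale classes rather than a table like this. All function names are hypothetical.

```python
from decimal import Decimal

# Hypothetical, hand-written table for illustration only.
# In real code, ask the run-time's locale classes for these symbols.
SEPARATORS = {
    "en_US": {"decimal": ".", "group": ","},
    "de_DE": {"decimal": ",", "group": "."},
}


def parse_user_number(text, locale_name):
    """Convert text typed by the user into a locale-neutral Decimal."""
    sep = SEPARATORS[locale_name]
    neutral = text.strip().replace(sep["group"], "").replace(sep["decimal"], ".")
    return Decimal(neutral)


def format_for_user(value, locale_name):
    """Render a Decimal with the user's separators and two decimal places."""
    sep = SEPARATORS[locale_name]
    us_style = f"{value:,.2f}"
    # Swap through a placeholder so the two replacements do not collide.
    return (
        us_style.replace(",", "\x00")
        .replace(".", sep["decimal"])
        .replace("\x00", sep["group"])
    )


def to_storage(value):
    """Serialize for a data file: plain digits with '.' as the decimal mark."""
    return str(value)


amount = parse_user_number("12.345,67", "de_DE")  # what a German user typed
print(to_storage(amount))                # 12345.67 (goes into the XML file)
print(format_for_user(amount, "en_US"))  # 12,345.67
print(format_for_user(amount, "de_DE"))  # 12.345,67
```

The value only ever looks locale-specific at the two edges. Everything in between, including what lands on disk, is the same no matter who typed it.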

Chinese

Chinese does not have an alphabet but instead has a set of glyphs. The People's Republic of China several decades ago significantly revised how to draw the glyphs and this is called simplified. The Chinese glyphs used elsewhere continued with the original and that is called traditional. It is the exact same set of characters, but they are drawn differently. It is akin to our having both a text A and a script A – they both mean the same thing but are drawn quite differently.

This is more of a font issue than a translation issue, except that wording and usage have diverged a bit, in part due to the differences in approach between traditional and simplified Chinese. The end result is that you generally do want to have two Chinese language resources, one zh_CN (PRC) and one zh_TW (Taiwan). As to which should be the zh resource – that is a major geopolitical question and you're on your own (but keep in mind PRC has nukes – and you don't).

Strings with substituted values

So you need to display the message:

Display("The operation had the error: " + msg);

No, no, no! Because in another language the proper usage could be:

Display("The error: " + msg + " was caused by the operation");

Every modern run-time library has a construct where you can have a string resource "The operation had the error: {0}" and it will then substitute in your msg at {0}. (Some use a syntax other than {0}, {1}, …)

You store these strings in a resource file that can be localized. Then when you need to display the message, you load it from the resources, substitute in the variables, and display it. The combination of this, plus the number & date/time formatters make it easy to build up these strings. And once you get used to them, you'll find it easier than the old approach. (If you are using Visual Studio – download and install ResourceRefactoringTool to make this trivial.)
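As a small sketch of what that looks like, here it is in Python, whose built-in str.format happens to use the same {0}, {1} placeholders. The MESSAGES table, its keys, and the locale code "xx" are invented for the example. "xx" stands in for any language with a different word order, and its strings are left in English so the reordering is easy to see.

```python
# Hypothetical string resources. "xx" is a stand-in for a language whose
# word order differs; its text is kept in English for readability.
MESSAGES = {
    "en": {
        "op_error": "The operation had the error: {0}",
        "copied": "Copied {0} files to {1}",
    },
    "xx": {
        "op_error": "The error: {0} was caused by the operation",
        "copied": "To {1}, {0} files were copied",
    },
}


def message(locale_name, key, *args):
    """Load the template for the locale and substitute the values into it."""
    template = MESSAGES[locale_name][key]
    return template.format(*args)


print(message("en", "op_error", "disk full"))
# The operation had the error: disk full
print(message("xx", "op_error", "disk full"))
# The error: disk full was caused by the operation
print(message("xx", "copied", 3, "backup"))
# To backup, 3 files were copied
```

The calling code is identical for every locale. Only the resource string decides where each value lands in the sentence.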

Arabic, Hebrew, and complex scripts

Arabic & Hebrew are called bi-directional because parts of the text run right to left while other parts run left to right. Text in Arabic or Hebrew is written and read right to left. But when you get to Latin text or numbers, you jump to the left-most part of that run and read it left to right, then jump back to where that run started and read right to left again. And then there is punctuation and other non-letter characters, where the rules depend on where they are used.

Here's the bottom line – it is incredibly complex and there is no way you are going to learn how it works unless you take this on as a full-time job. But not to worry, again the run-time libraries for most languages have classes to handle this. The key is that the text for a line is stored in the order you read the characters. So in computer memory the characters are in the order you would read them, not the order in which they are displayed. In this way everything works normally except when you display the text and when you work out how to move the caret.
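To make the "stored in reading order" point concrete, here is a short Python sketch using the standard unicodedata module, which reports each character's bidirectional class (the input a display layer uses to reorder text). It does not do any reordering itself, since that is the run-time's job. The sample string and the function names are just for illustration.

```python
import unicodedata

# Hebrew letters, then Latin letters, then digits, stored in reading order.
TEXT = "\u05e9\u05dc\u05d5\u05dd abc 123"


def bidi_classes(text):
    """Return the Unicode bidirectional class of each character."""
    return [unicodedata.bidirectional(ch) for ch in text]


def needs_bidi_layout(text):
    """True if any character is strongly right-to-left (Hebrew R, Arabic AL)."""
    return any(cls in ("R", "AL") for cls in bidi_classes(text))


# Index 0 is the first letter you read, even though it is drawn at the far right.
print(unicodedata.name(TEXT[0]))  # HEBREW LETTER SHIN
print(bidi_classes(TEXT))
# ['R', 'R', 'R', 'R', 'WS', 'L', 'L', 'L', 'WS', 'EN', 'EN', 'EN']
print(needs_bidi_layout(TEXT))       # True
print(needs_bidi_layout("abc 123"))  # False
```

Searching, slicing, and comparing all work on this stored order as usual. Only drawing and caret handling need to care about the classes.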

Complex scripts like Indic scripts have a different problem. While they are read left to right, you can have cases where some combinations of letters are placed one above the other, so the string is no wider on the screen when the second letter is added. This tends to require a bit of care with caret movement but nothing more.
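A simplified sketch of that caret care, in Python: when moving the caret forward, skip over any marks that attach to the character before them, so the caret never lands in the middle of a stacked unit. This is an approximation for illustration only. It treats every character in a Unicode "Mark" category as attached, which is cruder than the full text segmentation rules a run-time library applies, and next_caret is a hypothetical helper.

```python
import unicodedata


def next_caret(text, index):
    """Return the next caret stop after index, skipping attached marks.

    Simplification: any character whose Unicode category starts with "M"
    (a mark) is treated as part of the character before it.
    """
    index += 1
    while index < len(text) and unicodedata.category(text[index]).startswith("M"):
        index += 1
    return index


latin = "e\u0301x"            # e + combining acute accent, then x
devanagari = "\u0915\u093fa"  # ka + vowel sign i, then a

print(len(latin), next_caret(latin, 0))            # 3 2
print(len(devanagari), next_caret(devanagari, 0))  # 3 2
print(next_caret("abc", 0))                        # 1
```

In both samples the string is three code units long but the first caret stop is at 2, because the second code unit is drawn on or around the first.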

We even have cases like this in English where ae is sometimes rendered as a single æ character. (When the human race invented languages, they were not thinking computer friendly.)

Don't Over-Stress it

It seems like a lot but it's actually quite simple. In most cases you need to display text based on the closest resource you have. And you use the number & date/time classes for all locales, including your native one. No matter where you live, most computer users are in another country speaking another language – so localizing well significantly increases your potential market.

And if you're a small company, consider offering a free copy for people who translate your product. When I created Page 2 Stage I offered a free copy (list price $79.95) for translating it – and got 28 translations. I also met some very nice people online in the process. For an enterprise level product, many times a VAR in another country will translate it for you at a reduced rate or even free if they see a good market potential. But in these cases, do the first translation in-house to get the kinks worked out.

One resource I find very useful is the Microsoft Language Portal where you can put in text in English and if that text is in any of the Microsoft products, it will give you the translation Microsoft used for a given language. This can give you a fast high-quality translation for up to 80% of your program in many cases.

Удачи! (Good Luck)

    David Thielen

    President/CEO at Windward Studios

    From his early years as a Senior Developer at Microsoft, to legendary designer of the popular Enemy Nations strategy game, to reporting and document generation guru, Dave has never lost his passion for building superb software and teams.

    david@windward.net
    https://www.linkedin.com/in/davethielen/
    © 2019 Windward Studios Inc.
