Charset issues

Page Contents

FreeMarker, as most Java applications, works with "UNICODE text" (UTF-16). Nonetheless, there are situations when it must deal with charsets, because it has to exchange data with the outer world that may uses various other charsets.

The charset of the input

When FreeMarker has to load a template file (or an unparsed text file), then it must know the charset of the file, since files are just raw byte sequences. You can use the encoding setting to specify the charset. This setting takes effect only when FreeMarker loads a template (parsed or unparsed) with the getTemplate method of Configuration. Note that the include directive uses this method internally, so the value of the encoding setting is significant for an already loaded template if the template contains include directive call.

The getter and setter method of the encoding setting is special in the first (configuration) layer. The getter method guesses the return value based on a Locale passed as parameter; it looks up the encoding in a table that maps locales to encodings (called encoding map), and if the locale was not found there, it returns the default encoding. You can fill the encoding map with the setEncoding(Locale locale, String encoding) method of the configuration; the encoding map is initially empty. The default encoding is initially the value of the file.encoding system property, but you always should set a default default with the setDefaultEncoding method, rather than relying on that. For new projects, a popular default encoding is utf-8.

You can give the charset directly by overriding the encoding setting in the template layer or runtime environment layer (When you specify an encoding as the parameter of getTemplate method, you override the encoding setting in the template layer.). If you don't override it, the effective value will be what the configuration.getEncoding(Locale) method returns for the effective value of the locale setting.

Also, instead of relying on this charset guessing mechanism, you can specify the charset of the template in the template file itself, with the ftl directive, like <#ftl encoding="utf-8">.

Note that the charset of the template is independent from the charset of the output that the tempalte generates (unless the enclosing software deliberately sets the output charset to the same as the template charset).

The charset of the output

Note:

The output_encoding setting/variable and the url built-in is available since FreeMarker 2.3.1. It doesn't exist in 2.3.

In principle FreeMarker does not deal with the charset of the output, since it writes the output to a java.io.Writer. Since the Writer is made by the software that encapsulates FreeMarker (such as a Web application framework), the output charset is controlled by the encapsulating software. Still, FreeMarker has a setting called output_encoding (starting from FreeMarker version 2.3.1). The enclosing software should set this setting (to the charset that the Writer uses), to inform FreeMarker what charset is used for the output (otherwise FreeMarker can't find it out). Some features, such as the url built-in, and the output_encoding special variable utilize this information. Thus, if the enclosing software doesn't set this setting then FreeMarker features that need to know the output charset can't be used.

If you write software that will use FreeMarker, you may wonder what output charset should you choose. Of course it depends on the consumer of the FreeMarker output, but if the consumer is flexible regarding this question, then the common practice is either using the charset of the template file for the output, or using UTF-8. Using UTF-8 is usually a better practice, because arbitrary text may comes from the data-model, which then possibly contains characters that couldn't be encoded with the charset of the template.

FreeMarker settings can be set for each individual template processing if you use Template.createProcessingEnvironment(...) plus Environment.process(...) instead of Template.process(...). Thus, you can set the output_encoding setting for each template execution independently:

Writer w = new OutputStreamWriter(out, outputCharset);
Environment env = template.createProcessingEnvironment(dataModel, w);
env.setOutputEncoding(outputCharset);
env.process();

Docs

Docs4dev

Title here

Charset issues

The charset of the input

The charset of the output