Language Tag Filtering and Lookup

The Java Programming language contains internationalization support for language tags, language tag filtering, and language tag lookup. These features are specified by IETF BCP 47 , which contains RFC 5646 "Tags for Identifying Languages" and RFC 4647 "Matching of Language Tags." This lesson describes how this support is provided in the JDK.

What Are Language Tags?

Language tags are specially formatted strings that provide information about a particular language. A language tag might be something simple (such as "en" for English), something complex (such as "zh-cmn-Hans-CN" for Chinese, Mandarin, Simplified script, as used in China), or something in between (such as "sr-Latn", for Serbian written using Latin script). Language tags consist of "subtags" separated by hyphens; this terminology is used throughout the API documentation.

The java.util.Locale class provides support for language tags. A Locale contains several different fields: language (such as "en" for English, or "ja" for Japanese), script (such as "Latn" for Latin or "Cyrl" for Cyrillic), country (such as "US" for United States or "FR" for France), variant (which indicates some variant of a locale), and extensions (which provides a map of single character keys to String values, indicating extensions apart from language identification). To create a Locale object from a language tag String, invoke Locale.forLanguageTag(String) , passing in the language tag as its only argument. Doing so creates and returns a new Locale object for use in your application.

Example 1:

package languagetagdemo;

import java.util.Locale;

public class LanguageTagDemo {
     public static void main(String[] args) {
         Locale l = Locale.forLanguageTag("en-US");
     }
}

Note that the Locale API only requires that your language tag be syntactically well-formed. It does not perform any extra validation (such as checking to see if the tag is registered in the IANA Language Subtag Registry).

What Are Language Ranges?

Language ranges (represented by class java.util.Locale.LanguageRange ) identify sets of language tags that share specific attributes. Language ranges are classified as either basic or extended, and are similar to language tags in that they consist of subtags separated by hyphens. Examples of basic language ranges include "en" (English), "ja-JP" (Japanese, Japan), and "*" (a special language range which matches any language tag). Examples of extended language ranges include "*-CH" (any language, Switzerland), "es-*" (Spanish, any regions), and "zh-Hant-*" (Traditional Chinese, any region).

Furthermore, language ranges may be stored in Language Priority Lists, which enable users to prioritize their language preferences in a weighted list. Language Priority Lists are expressed by placing LanguageRange objects into a java.util.List , which can then be passed to the Locale methods that accept a List of LanguageRange objects.

Creating a Language Range

The Locale.LanguageRange class provides two different constructors for creating language ranges:

  • public Locale.LanguageRange(String range)
  • public Locale.LanguageRange(String range, double weight)

The only difference between them is that the second version allows a weight to be specified; this weight will be considered if the range is placed into a Language Priority List.

Locale.LanguageRange also specifies some constants to be used with these constructors:

  • public static final double MAX_WEIGHT
  • public static final double MIN_WEIGHT

The MAX_WEIGHT constant holds a value of 1.0, which indicates that it is a good fit for the user. The MIN_WEIGHT constant holds a value of 0.0, indicating that it is not.

Example 2:

package languagetagdemo;

import java.util.Locale;

public class LanguageTagDemo {

     public static void main(String[] args) {
         // Create Locale
         Locale l = Locale.forLanguageTag("en-US");

         // Define Some LanguageRange Objects
         Locale.LanguageRange range1 = new Locale.LanguageRange("en-US",Locale.LanguageRange.MAX_WEIGHT);
         Locale.LanguageRange range2 = new Locale.LanguageRange("en-GB*",0.5);
         Locale.LanguageRange range3 = new Locale.LanguageRange("fr-FR",Locale.LanguageRange.MIN_WEIGHT);
     }
}

Example 2 creates three language ranges: English (United States), English (Great Britain), and French (France). These ranges are weighted to express the user's preferences, in order from most preferred to least preferred.

Creating a Language Priority List

You can create a Language Priority List from a list of language ranges by using the LanguageRange.parse(String) method. This method accepts a list of comma-separated language ranges, performs a syntactic check for each language range in the given ranges, and then returns the newly created Language Priority List.

For detailed information about the required format of the "ranges" parameter, see the API specification for this method.

Example 3:

package languagetagdemo;

import java.util.Locale;

import java.util.List;

public class LanguageTagDemo {

    public static void main(String[] args) {

        // Create Locale

        Locale l = Locale.forLanguageTag("en-US");

        // Create a Language Priority List

        String ranges = "en-US;q=1.0,en-GB;q=0.5,fr-FR;q=0.0";

        List<Locale.LanguageRange> languageRanges = Locale.LanguageRange.parse(ranges)

    }
}

Example 3 creates the same three language ranges as Example 2, but stores them in a String object, which is passed to the parse(String) method. The returned List of LanguageRange objects is the Language Priority List.

Filtering Language Tags

Language tag filtering is the process of matching a set of language tags against a user's Language Priority List. The result of filtering will be a complete list of all matching results. The Locale class defines two filter methods that return a list of Locale objects. Their signatures are as follows:

In both methods, the first argument specifies the user's Language Priority List as described in the previous section.

The second argument specifies a Collection of Locale objects to match against. The match itself will take place according to the rules specified by RFC 4647.

The third argument (if provided) specifies the "filtering mode" to use. The Locale.FilteringMode enum provides a number of different values to choose from, such as AUTOSELECT_FILTERING (for basic language range filtering) or EXTENDED_FILTERING (for extended language range filtering).

Example 4 provides a demonstration of language tag filtering.

Example 4:

package languagetagdemo;

import java.util.Locale;
import java.util.Collection;
import java.util.List;
import java.util.ArrayList;

public class LanguageTagDemo {

    public static void main(String[] args) {

        // Create a collection of Locale objects to filter
        Collection<Locale> locales = new ArrayList<>();
        locales.add(Locale.forLanguageTag("en-GB"));
        locales.add(Locale.forLanguageTag("ja"));
        locales.add(Locale.forLanguageTag("zh-cmn-Hans-CN"));
        locales.add(Locale.forLanguageTag("en-US"));

        // Express the user's preferences with a Language Priority List
        String ranges = "en-US;q=1.0,en-GB;q=0.5,fr-FR;q=0.0";
        List<Locale.LanguageRange> languageRanges = Locale.LanguageRange.parse(ranges);

        // Now filter the Locale objects, returning any matches
        List<Locale> results = Locale.filter(languageRanges,locales);

        // Print out the matches
        for(Locale l : results){
        System.out.println(l.toString());
        }
    }
}

The output of this program is:

en_US
en_GB

This returned list is ordered according to the weights specified in the user's Language Priority List.

The Locale class also defines filterTags methods for filtering language tags as String objects.

The method signatures are as follows:

Example 5 provides the same search as Example 4, but uses String objects instead of Locale objects.

Example 5:

package languagetagdemo;

import java.util.Locale;
import java.util.Collection;
import java.util.List;
import java.util.ArrayList;

public class LanguageTagDemo {

    public static void main(String[] args) {

        // Create a collection of String objects to match against
        Collection<String> tags = new ArrayList<>();
        tags.add("en-GB");
        tags.add("ja");
        tags.add("zh-cmn-Hans-CN");
        tags.add("en-US");

        // Express user's preferences with a Language Priority List
        String ranges = "en-US;q=1.0,en-GB;q=0.5,fr-FR;q=0.0";
        List<Locale.LanguageRange> languageRanges = Locale.LanguageRange.parse(ranges);

        // Now search the locales for the best match
        List<String> results = Locale.filterTags(languageRanges,tags);

        // Print out the matches
        for(String s : results){
            System.out.println(s);
        }
    }
} 

As before, the search will match and return "en-US" and "en-GB" (in that order).

Performing Language Tag Lookup

In contrast to language tag filtering, language tag lookup is the process of matching language ranges to sets of language tags and returning the one language tag that best matches the range. RFC4647 states that: "Lookup produces the single result that best matches the user's preferences from the list of available tags, so it is useful in cases in which a single item is required (and for which only a single item can be returned). For example, if a process were to insert a human-readable error message into a protocol header, it might select the text based on the user's language priority list. Since the process can return only one item, it is forced to choose a single item and it has to return some item, even if none of the content's language tags match the language priority list supplied by the user."

Example 6:

package languagetagdemo;

import java.util.Locale;
import java.util.Collection;
import java.util.List;
import java.util.ArrayList;

public class LanguageTagDemo {

    public static void main(String[] args) {

        // Create a collection of Locale objects to search
        Collection<Locale> locales = new ArrayList<>();
        locales.add(Locale.forLanguageTag("en-GB"));
        locales.add(Locale.forLanguageTag("ja"));
        locales.add(Locale.forLanguageTag("zh-cmn-Hans-CN"));
        locales.add(Locale.forLanguageTag("en-US"));

        // Express the user's preferences with a Language Priority List
        String ranges = "en-US;q=1.0,en-GB;q=0.5,fr-FR;q=0.0";
        List<Locale.LanguageRange> languageRanges = Locale.LanguageRange.parse(ranges);

        // Find the BEST match, and return just one result
        Locale result = Locale.lookup(languageRanges,locales);
        System.out.println(result.toString());
    }
}

In contrast to the filtering examples, the lookup demo in Example 6 returns the one object that is the best match (en-US in this case). For completenes, Example 7 shows how to perform the same lookup using String objects.

Example 7:

package languagetagdemo;

import java.util.Locale;
import java.util.Collection;
import java.util.List;
import java.util.ArrayList;

public class LanguageTagDemo {

    public static void main(String[] args) {
        // Create a collection of String objects to match against
        Collection<String> tags = new ArrayList<>();
        tags.add("en-GB");
        tags.add("ja");
        tags.add("zh-cmn-Hans-CN");
        tags.add("en-US");

        // Express user's preferences with a Language Priority List
        String ranges = "en-US;q=1.0,en-GB;q=0.5,fr-FR;q=0.0";
        List<Locale.LanguageRange> languageRanges = Locale.LanguageRange.parse(ranges);

        // Find the BEST match, and return just one result
        String result = Locale.lookupTag(languageRanges, tags);
        System.out.println(result);
    }
} 

This example returns the single object that best matches the user's Language Priority List.