Going Global: a Deep Dive to Build an Internationalization Framework

Key Takeaways

  • Internationalization (i18n) and localization (l10n) are critical processes in web development: internationalization ensures software can be adapted for different languages and regions, and localization is the actual adaptation of the software to meet those specific requirements.
  • JavaScript-focused i18n libraries (like i18next, react-intl, and react-i18next) dominate the field and help developers handle translations and locale-specific configurations efficiently, but they are only available for JavaScript-based web applications. There is a need for a language-agnostic framework for internationalization.
  • JSON is a widely-accepted format for storing translations and locale-specific configurations, allowing for easy integration and dynamic content replacement in various applications irrespective of the language and framework used.
  • Content Delivery Networks (CDN) can be strategically used to efficiently serve locale-specific configuration files, mitigating potential downsides of loading large configurations.
  • Building and integrating a custom internationalization framework with databases or data storage solutions enables dynamic and context-aware translations, enhancing the user experience for different regions and languages.

Dipping your toes into the vast ocean of web development? You’ll soon realize that the web isn’t just for English speakers -- it’s global. Before you’re swamped with complaints from a user in France staring at a confusing English-only error message, let’s talk about internationalization (often abbreviated as i18n) and localization.

What’s the i18n Buzz About?

Imagine a world where your software speaks fluently to everyone, irrespective of their native tongue. That’s what internationalization and localization achieve. While brushing it off is tempting, remember that localizing your app isn’t just about translating text. It’s about offering a tailored experience that resonates with your user’s culture, region, and language preferences.

However, a snag awaits. Dive into the tool chest of i18n libraries, and you’ll notice a dominance of JavaScript-focused solutions, particularly those orbiting React (like i18next, react-intl, and react-i18next).

Venture outside this JavaScript universe, and the choices start thinning out. More so, these readily available tools often wear a one-size-fits-all tag, lacking the finesse to cater to unique use cases.

But fret not! If the shoe doesn’t fit, why not craft one yourself? Stick around, and we’ll guide you on building an internationalization framework from scratch -- a solution that’s tailored to your app and versatile across languages and frameworks.

Ready to give your application a global passport? Let’s embark on this journey.

The Basic Approach

One straightforward way to grasp the essence of internationalization is by employing a function that fetches messages based on the user’s locale. Below is an example crafted in Java, which offers a basic yet effective glimpse into the process:

public class InternationalizationExample {

    public static void main(String[] args) {
        System.out.println(getWelcomeMessage(getUserLocale()));
    }

    public static String getWelcomeMessage(String locale) {
        switch (locale) {
            case "en_US":
                return "Hello, World!";
            case "fr_FR":
                return "Bonjour le Monde!";
            case "es_ES":
                return "Hola Mundo!";
            default:
                return "Hello, World!";
        }
    }

    public static String getUserLocale() {
        // This is a placeholder method. In a real-world scenario,
        // you'd fetch the user's locale from their settings or system configuration.
        return "en_US";  // This is just an example.
    }
}

In the example above, the getWelcomeMessage function returns a welcome message in the language specified by the locale. The locale is determined by the getUserLocale method. This approach, though basic, showcases the principle of serving content based on user-specific locales.

However, as we move forward, we’ll dive into more advanced techniques and see why this basic approach might not be scalable or efficient for larger applications.

Pros:

  • Extensive Coverage -- Given that all translations are embedded within the code, you can potentially cater to many languages without worrying about external dependencies or missing translations.
  • No Network Calls -- Translations are fetched directly from the code, eliminating the need for any network overhead or latency associated with fetching translations from an external source.
  • Easy Code Search -- Since all translations are part of the source code, searching for specific translations or troubleshooting related issues becomes straightforward.
  • Readability -- Developers can instantly understand the flow and the logic behind choosing a particular translation, simplifying debugging and maintenance.
  • Reduced External Dependencies -- There’s no reliance on external translation services or databases, which means one less point of failure in your application.

Cons:

  • Updates Require New Versions -- In the context of mobile apps or standalone applications, adding a new language or tweaking existing translations would require users to download and install the latest version of the app.
  • Redundant Code -- As the number of supported languages grows, the switch or conditional statements would grow proportionally, leading to repetitive and bloated code.
  • Merge Conflicts -- With multiple developers possibly working on various language additions or modifications, there’s an increased risk of merge conflicts in version control systems.
  • Maintenance Challenges -- Over time, as the application scales and supports more locales, managing and updating translations directly in the code becomes cumbersome and error-prone.
  • Limited Flexibility -- Adding features like pluralization, context-specific translations, or dynamically fetched translations with such a static approach is hard.
  • Performance Overhead -- For high-scale applications, loading large chunks of translation data when only a tiny fraction is used can strain resources, leading to inefficiencies.
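
The redundant-code problem can be eased a little without leaving the in-code approach. The following is a minimal sketch, assuming the same three messages as the switch statement above, that replaces the switch with a lookup table. The class name WelcomeMessages and the DEFAULT_LOCALE constant are illustrative and not part of the original example.

```java
import java.util.Map;

class WelcomeMessages {

    // Illustrative in-code table: locale tag -> welcome message.
    private static final Map<String, String> MESSAGES = Map.of(
            "en_US", "Hello, World!",
            "fr_FR", "Bonjour le Monde!",
            "es_ES", "Hola Mundo!");

    // Assumed default, mirroring the default branch of the switch.
    private static final String DEFAULT_LOCALE = "en_US";

    static String getWelcomeMessage(String locale) {
        // Map.of rejects null keys, so handle a missing locale explicitly.
        if (locale == null) {
            return MESSAGES.get(DEFAULT_LOCALE);
        }
        // Unknown locales fall back to the default message.
        return MESSAGES.getOrDefault(locale, MESSAGES.get(DEFAULT_LOCALE));
    }

    public static void main(String[] args) {
        System.out.println(getWelcomeMessage("fr_FR"));
        System.out.println(getWelcomeMessage("de_DE"));
    }
}
```

Adding a language becomes a one-line change to the table, but the translations still ship inside the code, so the remaining cons listed above continue to apply.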

Config-Based Internationalization

Building on the previous approach, we aim to retain its advantages and simultaneously address its shortcomings. To accomplish this, we’ll transition from hard-coded string values in the codebase to a config-based setup. We’ll utilize separate configuration files for each locale, encoded in JSON format. This modular approach simplifies the addition or modification of translations without making code changes.

Here’s how a configuration might look for the English and Spanish locales:

Filename: en.json

{
    "welcome_message": "Hello, World"
}

Filename: es.json

{
    "welcome_message": "Hola, Mundo"
}

Implementation in Java:

First, we need a way to read the JSON files. This usually means using a library such as Jackson or Gson. For the sake of this example, we’ll use Jackson.

import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.File;
import java.io.IOException;
import java.util.Map;

public class Internationalization {

    private static final String CONFIG_PATH = "/path_to_configs/";
    private final Map<String, String> translations;

    public Internationalization(String locale) throws IOException {
        ObjectMapper mapper = new ObjectMapper();
        // Read <locale>.json into a typed map; TypeReference avoids an unchecked raw Map.
        translations = mapper.readValue(
                new File(CONFIG_PATH + locale + ".json"),
                new TypeReference<Map<String, String>>() {});
    }

    public String getTranslation(String key) {
        // Returns a visible marker when the key is missing from the config.
        return translations.getOrDefault(key, "Key not found!");
    }
}

// A top-level class cannot be declared static; in a real project
// this class would live in its own file (Program.java).
class Program {

    public static void main(String[] args) throws IOException {
        Internationalization i18n = new Internationalization(getUserLocale());
        System.out.println(i18n.getTranslation("welcome_message"));
    }

    private static String getUserLocale() {
        // This method should be implemented to fetch the user's locale.
        // For now, let's just return "en" for simplicity.
        return "en";
    }
}

In the code above, the Internationalization class reads the JSON configuration for the provided locale when it is instantiated. The getTranslation method then looks up the translated string by its key.
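
Showing "Key not found!" to an end user is rarely desirable. One possible refinement, sketched below under stated assumptions, is to fall back to a default locale when a key is missing from the user's locale. The class name FallbackTranslations and the goodbye_message key are hypothetical, and the two maps are assumed to have been parsed from the JSON files already (for example with Jackson, as above).

```java
import java.util.Map;

class FallbackTranslations {

    private final Map<String, String> primary;   // e.g. parsed from es.json
    private final Map<String, String> fallback;  // e.g. parsed from en.json

    FallbackTranslations(Map<String, String> primary, Map<String, String> fallback) {
        this.primary = primary;
        this.fallback = fallback;
    }

    String getTranslation(String key) {
        // Prefer the user's locale, then the default locale.
        String value = primary.get(key);
        if (value == null) {
            value = fallback.get(key);
        }
        // Last resort: return the key itself so the gap is easy to spot in testing.
        return value != null ? value : key;
    }

    public static void main(String[] args) {
        FallbackTranslations i18n = new FallbackTranslations(
                Map.of("welcome_message", "Hola, Mundo"),
                Map.of("welcome_message", "Hello, World", "goodbye_message", "Goodbye"));
        System.out.println(i18n.getTranslation("welcome_message"));
        System.out.println(i18n.getTranslation("goodbye_message"));
    }
}
```

With this arrangement a partially translated locale still produces readable output, at the cost of occasionally mixing languages on one screen.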

Pros:

  • Retains all the benefits of the previous approach -- It offers extensive coverage, no network calls for translations once loaded, and the code remains easily searchable and readable.
  • Dynamic Loading -- Translations can be loaded dynamically based on the user’s locale. Only necessary translations are loaded, leading to potential performance benefits.
  • Scalability -- It’s easier to add a new language. Simply add a new configuration file for that locale, and the application can handle it without any code changes.
  • Cleaner Code -- The logic is separated from the translations, leading to cleaner, more maintainable code.
  • Centralized Management -- All translations are in centralized files, making it easier to manage, review, and update. This approach provides a more scalable and cleaner way to handle internationalization, especially for larger applications.

Cons:

  • Potential for Large Config Files -- As the application grows and supports multiple languages, the size of these config files can become quite large. This can introduce a lag in the initial loading of the application, especially if the config is loaded upfront.

Fetching Config from a CDN

One way to mitigate the downside of potentially large config files is to host them on a Content Delivery Network (CDN). The application can then load only the config file that matches the user’s locale, which keeps it fast and avoids downloading unnecessary data. When the user switches locales, or the application detects a different locale, the relevant config can be fetched from the CDN as required. This strikes a balance between speed and flexibility in a high-scale application.

For simplicity, assume a basic HTTP library is used to fetch the config file. We’ll use the fictional HttpUtil library in this Java example:

import java.util.Map;
import org.json.JSONObject;

public class InternationalizationService {

    private static final String CDN_BASE_URL = "https://cdn.example.com/locales/";

    public String getTranslatedString(String key) {
        String locale = getUserLocale();
        String configContent = fetchConfigFromCDN(locale);
        JSONObject configJson = new JSONObject(configContent);
        return configJson.optString(key, "Translation not found");
    }

    private String fetchConfigFromCDN(String locale) {
        String url = CDN_BASE_URL + locale + ".json";
        return HttpUtil.get(url);  // Assuming this method fetches content from a given URL
    }

    private String getUserLocale() {
        // Implement method to get the user's locale
        // This can be fetched from user preferences, system settings, etc.
        return "en";  // Defaulting to English for this example
    }
}

Note: The above code is a simplified example and may require error handling, caching mechanisms, and other optimizations in a real-world scenario.

The idea here is to fetch the necessary config file based on the user’s locale directly from the CDN. The user’s locale determines the URL of the config file, and once fetched, the config is parsed to get the required translation. If the key isn’t found, a default message is returned. The benefit of this approach is that the application only loads the necessary translations, ensuring optimal performance.
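
The example above downloads and parses the config on every call. The following is a minimal sketch of the caching and fallback the note mentions, not a definitive implementation. The class name CachedTranslationService is hypothetical, and the loader function is an assumption that stands in for "fetch the locale's JSON file from the CDN and parse it into a map".

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

class CachedTranslationService {

    // Assumed to fetch and parse <locale>.json; may throw if the CDN is unreachable.
    private final Function<String, Map<String, String>> loader;
    private final String defaultLocale;
    private final Map<String, Map<String, String>> cache = new ConcurrentHashMap<>();

    CachedTranslationService(Function<String, Map<String, String>> loader, String defaultLocale) {
        this.loader = loader;
        this.defaultLocale = defaultLocale;
    }

    String getTranslatedString(String locale, String key) {
        String value = configFor(locale).get(key);
        if (value == null && !locale.equals(defaultLocale)) {
            // Key or whole locale unavailable: try the default locale.
            value = configFor(defaultLocale).get(key);
        }
        return value != null ? value : "Translation not found";
    }

    private Map<String, String> configFor(String locale) {
        Map<String, String> cached = cache.get(locale);
        if (cached != null) {
            return cached; // already downloaded, no network call
        }
        try {
            Map<String, String> loaded = loader.apply(locale);
            if (loaded != null) {
                cache.put(locale, loaded);
                return loaded;
            }
        } catch (RuntimeException e) {
            // Fetch failed: the failure is not cached, so a later call can retry.
        }
        return Map.of();
    }
}
```

Each locale is fetched at most once per successful load, and an outage for one locale degrades to the default locale instead of an error. A production version would likely also need cache expiry so updated translations are picked up.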

Pros:

  • Inherits all advantages of the previous approach.
  • Easy to organize and add translations for new locales.
  • Efficient loading due to fetching only necessary translations.

Cons:

  • A very large config file can still slow the application’s initial load.
  • Strings must be static. Dynamic strings or strings that require runtime computation aren’t supported directly. This can be a limitation if you need to insert dynamic data within your translations.
  • Dependency on an external service (CDN). If the CDN fails or has issues, the application loses its ability to fetch translations.

These cons can be addressed, however. The first can be mitigated by loading only the current locale’s config file from the CDN, and only when it is needed. The second can be managed by using placeholders in the static strings and replacing them at runtime based on context. The third requires a robust error-handling mechanism and potentially some fallback strategies.

Dynamic String Handling

A more flexible solution is required when parts of the translated string are dynamic. Take Facebook as a real-life example. In News Feed, you have probably seen custom strings that represent the "Likes" for each post. If a post has only one like, you may see "John likes your post." If it has two likes, you may see "John and David like your post." If it has more than two, you may see "John, David and 100 others like your post." Several customizations are needed here; for instance, the verb switches between "likes" and "like" depending on how many people liked the post. How is this done?

Consider the example: "John, David and 100 other people recently reacted to your post." Here, "David," "John," "100," "people," and "reacted" are dynamic elements.

Let’s break this down:

  • "David" and "John" could be user names fetched from some user-related methods or databases.
  • "100" could be the total number of people reacting on a post excluding David and John, fetched from some post-related methods or databases.
  • "people" could be the plural form of the noun person when referring to a collective group.
  • "reacted" could be used when the user responds to a post with a heart, care, or angry icon instead of liking it.

One way to accommodate such dynamic content is to use placeholders in our configuration files and replace them at runtime based on context.

Here’s a Java example:

Configuration File (for English locale):

{
      "oneUserAction": "{0} {1} your post",
      "twoUserAction": "{0} and {1} {2} your post",
      "multiUserAction": "{0}, {1} and {2} other {3} recently {4} to your post",
      "people": "people",
      "likeSingular": "likes",
      "likePlural": "like",
      "reacted": "reacted"
}

Configuration File (for French locale):

{
      "oneUserAction": "{0} {1} votre publication",
      "twoUserAction": "{0} et {1} {2} votre publication",
      "multiUserAction": "{0}, {1} et {2} autres {3} ont récemment {4} à votre publication",
      "people": "personnes",
      "likeSingular": "aime",
      "likePlural": "aiment",
      "reacted": "réagi"
}

Java Implementation:

import java.text.MessageFormat;
import java.util.Locale;
import java.util.ResourceBundle;

public class InternationalizationExample {

    public static void main(String[] args) {
        // English examples
        System.out.println(createMessage("David", null, 1, Locale.US)); // One user
        System.out.println(createMessage("David", "John", 2, Locale.US)); // Two users
        System.out.println(createMessage("David", "John", 102, Locale.US)); // Multiple users

        // French examples
        System.out.println(createMessage("David", null, 1, Locale.FRANCE)); // One user
        System.out.println(createMessage("David", "John", 2, Locale.FRANCE)); // Two users
        System.out.println(createMessage("David", "John", 102, Locale.FRANCE)); // Multiple users
    }

    // count is the total number of users who liked or reacted to the post.
    private static String createMessage(String user1, String user2, int count, Locale locale) {
        // ResourceBundle.getBundle looks up MessagesBundle_<locale>.properties on the classpath,
        // so the keys shown above must be available in key=value form for this call to succeed.
        ResourceBundle messages = ResourceBundle.getBundle("MessagesBundle", locale);

        // The {0}, {1}, ... placeholders use MessageFormat syntax, not String.format syntax.
        if (count == 0) {
            return ""; // No likes received
        } else if (count == 1) {
            return MessageFormat.format(
                  messages.getString("oneUserAction"),
                  user1,
                  messages.getString("likeSingular")
            ); // For one like, returns "David likes your post"
        } else if (count == 2) {
            return MessageFormat.format(
                  messages.getString("twoUserAction"),
                  user1,
                  user2,
                  messages.getString("likePlural")
            ); // For two likes, returns "David and John like your post"
        } else {
            return MessageFormat.format(
                  messages.getString("multiUserAction"),
                  user1,
                  user2,
                  count - 2, // everyone except the two named users
                  messages.getString("people"),
                  messages.getString("reacted")
            ); // For 102 reactions, returns "David, John and 100 other people recently reacted to your post"
        }
    }
}
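
Java’s MessageFormat can also carry the singular/plural decision inside the pattern itself, through its choice sub-format, so the calling code does not need separate keys for "person" and "people". The following is a minimal sketch of that idea. The class name ReactionMessages and both patterns are illustrative assumptions, and the patterns are kept in an in-memory map only to make the example self-contained; in practice they would live in each locale’s config file.

```java
import java.text.MessageFormat;
import java.util.Locale;
import java.util.Map;

class ReactionMessages {

    // Illustrative patterns. {2,choice,...} picks the wording from the number in argument 2.
    // Note: a literal apostrophe in a MessageFormat pattern must be written as ''.
    private static final Map<Locale, String> MULTI_USER_PATTERNS = Map.of(
            Locale.US,
            "{0}, {1} and {2,number,integer} other {2,choice,1#person|1<people} "
                    + "recently reacted to your post",
            Locale.FRANCE,
            "{0}, {1} et {2,number,integer} {2,choice,1#autre personne a|1<autres personnes ont} "
                    + "récemment réagi à votre publication");

    static String multiUserMessage(String user1, String user2, int others, Locale locale) {
        // Unknown locales fall back to the English pattern.
        String pattern = MULTI_USER_PATTERNS.getOrDefault(locale, MULTI_USER_PATTERNS.get(Locale.US));
        // Passing the locale makes number formatting follow that locale's conventions.
        return new MessageFormat(pattern, locale).format(new Object[] {user1, user2, others});
    }

    public static void main(String[] args) {
        System.out.println(multiUserMessage("David", "John", 100, Locale.US));
        System.out.println(multiUserMessage("David", "John", 1, Locale.US));
        System.out.println(multiUserMessage("David", "John", 100, Locale.FRANCE));
    }
}
```

The choice sub-format only distinguishes numeric ranges, which is enough for English and French but not for languages with more plural categories; those typically need a dedicated plural-rules library.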

Conclusion

Developing an effective internationalization (i18n) and localization (l10n) framework is crucial for software applications, regardless of size. This approach ensures your application resonates with users in their native language and cultural context. While string translation is a critical aspect of i18n and l10n, it represents only one facet of the broader challenge of globalizing software.

Effective localization goes beyond mere translation. It also addresses aspects such as writing direction, which is right-to-left in languages like Arabic, and text length or size, since languages like Tamil may feature longer words than English. By carefully tailoring these strategies to your specific localization needs, you can deliver a truly global and culturally sensitive user experience.
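
Writing direction can be detected from the translated text itself. The sketch below uses the JDK’s java.text.Bidi class to read the base direction of a string; the class name TextDirection and the helper method are hypothetical, and how the result is applied to the layout depends on the UI toolkit in use.

```java
import java.text.Bidi;

class TextDirection {

    // Returns true when the base direction of the text is right-to-left.
    static boolean isRightToLeft(String text) {
        // DIRECTION_DEFAULT_LEFT_TO_RIGHT derives the base direction from the
        // first strongly directional character, defaulting to left-to-right.
        Bidi bidi = new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        return !bidi.baseIsLeftToRight();
    }

    public static void main(String[] args) {
        String arabicHello = "\u0645\u0631\u062D\u0628\u0627"; // Arabic word for "hello"
        System.out.println(isRightToLeft(arabicHello));
        System.out.println(isRightToLeft("Hello"));
    }
}
```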

Community comments

  • Not a deep dive article

    by Sheikh Sajid,

    This is not a deep dive article. There are far more things that need to be considered like date and time format, calendars, daylight savings, currency, units and so on.
    Also Java is used as an example, however, Java has far better standard libraries for i18n than the hand crafted code mentioned in the article.
    Refer docs.oracle.com/en/java/javase/21/intl/internat...
