Locale hero

I was interested in the internationalisation (‘i18n’) of Haskell programs. The Haskell wiki has a page on the subject, referring variously to: using the constructors of a Haskell type to represent messages, GNU’s gettext, or the Grammatical Framework programming language. A starting point was identifying the user’s locale. As is often the case, that was not straightforward on Windows.

Locale names

The POSIX.1-2017 standard (The Open Group Technical Standard Base Specifications, Issue 7) states, in Chapter 8 (Environment Variables) of the Base Definitions, that the environment variables supporting i18n can take the following values: "C"; "POSIX"; a value of the form language[_territory][.codeset] or (in the case of certain variables) language[_territory][.codeset][@modifier], where language, territory, codeset and modifier are all implementation-defined; or other implementation-defined values. "C" and "POSIX" are synonyms.

In the case of Linux, language is an ISO 639 language code, territory is an ISO 3166 country code, and codeset is a character set or encoding identifier like ISO-8859-1 or UTF-8.
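
The name format above can be taken apart with a few lines of Haskell. The following is only a minimal sketch: the type LocaleName, its field names and the functions splitAtSep and parseLocaleName are all hypothetical names invented for illustration, and no validation against ISO 639 or ISO 3166 is attempted. The special values "C" and "POSIX" do not follow the pattern, so a caller would need to check for them first.

```haskell
module Main (main) where

-- | Components of a POSIX-style locale name of the form
-- language[_territory][.codeset][@modifier].
-- (Hypothetical type for this sketch, not taken from any package.)
data LocaleName = LocaleName
  { lnLanguage  :: String
  , lnTerritory :: Maybe String
  , lnCodeset   :: Maybe String
  , lnModifier  :: Maybe String
  } deriving (Eq, Show)

-- | Split a string at the first occurrence of a separator. The second
-- component is 'Nothing' if the separator does not occur.
splitAtSep :: Char -> String -> (String, Maybe String)
splitAtSep sep s = case break (== sep) s of
  (before, _ : after) -> (before, Just after)
  (before, [])        -> (before, Nothing)

-- | Parse a locale name. The optional parts are peeled off from the right:
-- first the modifier, then the codeset, then the territory. The result is
-- 'Nothing' if no language remains.
parseLocaleName :: String -> Maybe LocaleName
parseLocaleName s
  | null language = Nothing
  | otherwise     = Just (LocaleName language territory codeset modifier)
 where
  (beforeModifier, modifier) = splitAtSep '@' s
  (beforeCodeset, codeset)   = splitAtSep '.' beforeModifier
  (language, territory)      = splitAtSep '_' beforeCodeset

main :: IO ()
main = mapM_ (print . parseLocaleName) ["en_GB.UTF-8", "de_DE@euro", "fr"]
```

Peeling from the right means that an underscore or a full stop appearing inside a later component does not disturb the earlier ones.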

Microsoft’s documentation of the Win32 API (from Windows Vista) specifies locale names that follow the language tagging conventions of RFC 4646 (since replaced by RFC 5646). The most complex patterns are <language>-<Script>-<REGION>_<sort order> or, for replacement locales, <language>-<Script>-<REGION>-x-<custom>.

Microsoft’s documentation of the Universal C runtime library specifies other locale names for use with the C function setlocale (see further below) and other functions.

C: setlocale and localeconv

The C programming language standard defines a function setlocale that selects the appropriate portion of the program’s locale as specified by its arguments, category and locale. At program startup, the equivalent of setlocale(LC_ALL, "C"); is executed. Other than values "C" and "", the values of the locale argument are implementation-defined strings. The C18 standard references ISO/IEC 9945-2 (now replaced by ISO/IEC/IEEE 9945:2009) as specifying locale formats.

The C standard also defines a function localeconv that provides access to LC_MONETARY and LC_NUMERIC information.

POSIX: nl_langinfo

The POSIX.1-2017 standard defines a C function nl_langinfo that provides access to more information than localeconv and in a more focussed way.

Haskell: setlocale and env-locale

The Haskell package setlocale exports a binding to the C setlocale function.
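
To illustrate what such a binding involves, here is a minimal sketch using GHC's CApiFFI extension directly. It is not the setlocale package's implementation; the names lcAll, c_setlocale, fromResult, queryLocale and setLocaleAll are hypothetical and chosen for this example. It assumes a C toolchain with locale.h is available to GHC.

```haskell
{-# LANGUAGE CApiFFI #-}
module Main (main) where

import Foreign.C.String (CString, peekCString, withCString)
import Foreign.C.Types (CInt (..))
import Foreign.Ptr (nullPtr)

-- The numeric value of the LC_ALL macro differs between C libraries, so it
-- is imported from the header rather than hard-coded.
foreign import capi "locale.h value LC_ALL" lcAll :: CInt

-- char *setlocale(int category, const char *locale);
foreign import capi "locale.h setlocale"
  c_setlocale :: CInt -> CString -> IO CString

-- | Convert the result of setlocale: a null pointer signals failure.
fromResult :: CString -> IO (Maybe String)
fromResult p
  | p == nullPtr = pure Nothing
  | otherwise    = Just <$> peekCString p

-- | Query the current LC_ALL locale without changing it, by passing a null
-- pointer as the locale argument.
queryLocale :: IO (Maybe String)
queryLocale = c_setlocale lcAll nullPtr >>= fromResult

-- | Set the LC_ALL locale. The empty string selects the locale from the
-- environment; other names are implementation-defined.
setLocaleAll :: String -> IO (Maybe String)
setLocaleAll name = withCString name (c_setlocale lcAll) >>= fromResult

main :: IO ()
main = do
  queryLocale >>= print      -- the locale in effect before any change
  setLocaleAll "" >>= print  -- the locale taken from the environment
  setLocaleAll "C" >>= print -- back to the "C" locale
```

The GHC runtime may itself adjust part of the locale at startup, so the first query is not guaranteed to report "C" even though the C standard specifies that as the initial locale of a C program.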

The Haskell package env-locale exports functions that make use of bindings to the POSIX nl_langinfo C function.

Unix: locale

On Unix-like operating systems, the command locale outputs information about locales. The version of MSYS2 provided by stack includes locale; the version provided by GHC does not. For example:

Haskell: system-locale and current-locale

The Haskell packages system-locale and current-locale assume the availability of the locale command and parse some of its output. Both packages depend on the old-locale package, which has been overtaken by the time package.
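
Parsing output of that kind is simple in principle. The sketch below is not the code of either package; parseLocaleLine, unquote and parseLocaleOutput are hypothetical names, and the sample text is an illustrative stand-in for the NAME=value lines (with values sometimes in double quotes) that the locale command typically prints, not captured output.

```haskell
module Main (main) where

import Data.Maybe (mapMaybe)

-- | Parse one line of the form NAME=value or NAME="value" into a pair.
-- Lines without '=' or with an empty name yield 'Nothing'.
parseLocaleLine :: String -> Maybe (String, String)
parseLocaleLine line = case break (== '=') line of
  (key, _ : value) | not (null key) -> Just (key, unquote value)
  _                                 -> Nothing

-- | Remove one pair of surrounding double quotes, if present.
unquote :: String -> String
unquote ('"' : rest)
  | not (null rest), last rest == '"' = init rest
unquote s = s

-- | Parse all recognisable lines of the output, ignoring the rest.
parseLocaleOutput :: String -> [(String, String)]
parseLocaleOutput = mapMaybe parseLocaleLine . lines

-- Illustrative sample only; real output depends on the system.
sample :: String
sample = unlines
  [ "LANG=en_GB.UTF-8"
  , "LC_CTYPE=\"en_GB.UTF-8\""
  , "LC_ALL="
  ]

main :: IO ()
main = do
  let settings = parseLocaleOutput sample
  mapM_ print settings
  print (lookup "LC_CTYPE" settings)
```

An approach like this depends on the locale command being on the path, which is the assumption that fails with the MSYS2 provided by GHC.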

Win32 API: GetUserDefaultLocaleName and GetLocaleInfoEx

The Win32 API function GetUserDefaultLocaleName retrieves the user default locale name, and the function GetLocaleInfoEx retrieves information about a locale specified by name.

Haskell: Win32

The Hackage package Win32-2.9.0.0 provides module System.Win32.NLS, which exports getUserDefaultLCID (a binding to the legacy GetUserDefaultLCID) but does not export a binding to GetLocaleInfoEx or the legacy GetLocaleInfo.

Consequently, I created a fork of Win32-2.9.0.0 and implemented bindings to GetLocaleInfoEx and other functions that had replaced legacy functions already exported by the package.

The implementation was not straightforward. First, the version of MSYS2 used by GHC 8.8.4 includes a header file, _mingw.h, that defines _WIN32_WINNT as 0x502 (Windows Server 2003) if it is undefined. Header file sdkddkver.h then defines WINVER as _WIN32_WINNT, if the latter is defined and the former is not. Windows Server 2003 pre-dates Windows Vista (0x0600). GHC dropped support for Windows XP from GHC 8.0.1 in May 2016. So, it was necessary to force WINVER and _WIN32_WINNT to be at least 0x0600.

Second, Win32-2.9.0.0 has a lower bound on its dependency on the base package of base-4.5.0.0 (GHC 7.4.1, from February 2012) and the repository uses continuous integration to test with GHC versions from GHC 7.6.3. The version of MSYS2 used by GHC before GHC 7.10.1 (March 2015) includes a header file winnls.h that excludes certain locale information constants or locale map flag constants introduced with Windows Vista. Also, Control.Applicative functions were not exported from the Prelude before GHC 7.10.1.

GetLocaleInfoEx, and some of the other Win32 API functions, take a pointer to an output buffer (LPWSTR) and its size (int) among their arguments and return an int. If the size is 0, the buffer is not used and the return is the size of the buffer required. If the buffer is too small, the function fails and returns 0. Module System.Win32.Utils did not export a helper function for that pattern; the existing try was for functions that returned the size of the buffer required, even if the buffer was too small.
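
The query-then-fill pattern described above can be sketched in portable Haskell. This is not the helper added to the fork: BufferFiller, trySized and mockFill are hypothetical names, and mockFill is a stand-in that only imitates the described behaviour so that the example runs without the Win32 API. The sketch assumes that the returned count includes the terminating null character and that the mock is given ASCII text.

```haskell
module Main (main) where

import Foreign.C.String (peekCWStringLen)
import Foreign.C.Types (CInt, CWchar)
import Foreign.Marshal.Array (allocaArray, pokeArray0)
import Foreign.Ptr (Ptr, nullPtr)

-- | The shape of function described above: it takes an output buffer and
-- the buffer's size in characters, and returns a character count, or 0 on
-- failure.
type BufferFiller = Ptr CWchar -> CInt -> IO CInt

-- | Call once with size 0 to learn the required size, allocate a buffer of
-- that size, then call again to fill it. 'Nothing' signals failure.
trySized :: BufferFiller -> IO (Maybe String)
trySized fill = do
  needed <- fill nullPtr 0
  if needed <= 0
    then pure Nothing
    else allocaArray (fromIntegral needed) $ \buf -> do
      written <- fill buf needed
      if written <= 0
        then pure Nothing
        -- Drop the terminating null from the count before decoding.
        else Just <$> peekCWStringLen (buf, fromIntegral written - 1)

-- | Stand-in for a Win32 function: reports the required size when the size
-- is 0, fails with 0 when the buffer is too small, and otherwise writes the
-- null-terminated text and returns the count.
mockFill :: String -> BufferFiller
mockFill s buf size
  | size == 0     = pure needed
  | size < needed = pure 0
  | otherwise     = do
      pokeArray0 0 buf (map (fromIntegral . fromEnum) s)
      pure needed
 where
  needed = fromIntegral (length s) + 1

main :: IO ()
main = trySized (mockFill "en-GB") >>= print
```

A real helper would also need to report the Win32 error code on failure rather than returning Nothing, which this sketch leaves out.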