Unicode

The Haskell type Char represents Unicode code points. I was struggling with the output of certain Unicode characters by Haskell code on Windows Terminal, under Windows 11. Specifically, output of a string of four ≣ (U+2263) characters either did not work at all or, when the OEM code page was changed in PowerShell Core 7.2.1 from the default 437 to 65001 (UTF-8) (using the command chcp 65001), appeared as ΓëúΓëúΓëúΓëú (in hex CE 93 C3 AB C3 BA, repeated).

The UTF-8 encoding of U+2263 is E2 89 A3. In code page 437, that sequence of bytes corresponds to the Unicode characters U+0393 (Γ), U+00EB (ë) and U+00FA (ú). The UTF-8 encoding of those three characters is CE 93 C3 AB C3 BA. So, what was being output was, apparently, UnicodeToUTF-8( CP437ToUnicode( UnicodeToUTF-8( ‘≣’ ) ) ).
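The double encoding described above can be reproduced in a few lines of Haskell. This is only a sketch using base: utf8Encode, cp437Decode, doubleEncode and hex are names invented here for illustration, and the code page 437 table is deliberately partial (ASCII plus the three bytes in question). It prints hex only, so it does not itself depend on the console encoding.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (chr, ord, toUpper)
import Data.Word (Word8)
import Numeric (showHex)

-- | Encode one Unicode code point as UTF-8 bytes.
utf8Encode :: Char -> [Word8]
utf8Encode c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. hi 6, cont 0]
  | n < 0x10000 = [0xE0 .|. hi 12, cont 6, cont 0]
  | otherwise   = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]
 where
  n = ord c
  -- Leading bits of the code point, for the first byte.
  hi s = fromIntegral (n `shiftR` s)
  -- A continuation byte: 10xxxxxx with six payload bits.
  cont s = 0x80 .|. fromIntegral ((n `shiftR` s) .&. 0x3F)

-- | Decode one byte as code page 437. Illustrative and partial:
-- only ASCII and the three bytes needed for this example are covered.
cp437Decode :: Word8 -> Maybe Char
cp437Decode 0xE2 = Just '\x0393' -- Greek capital gamma
cp437Decode 0x89 = Just '\x00EB' -- e with diaeresis
cp437Decode 0xA3 = Just '\x00FA' -- u with acute
cp437Decode b
  | b < 0x80  = Just (chr (fromIntegral b))
  | otherwise = Nothing

-- | UnicodeToUTF-8( CP437ToUnicode( UnicodeToUTF-8( c ) ) ).
doubleEncode :: Char -> Maybe [Word8]
doubleEncode c = concatMap utf8Encode <$> traverse cp437Decode (utf8Encode c)

-- | Render bytes as space-separated, two-digit, upper-case hex.
hex :: [Word8] -> String
hex = unwords . map byte
 where
  byte b = map toUpper (pad (showHex b ""))
  pad s = replicate (2 - length s) '0' ++ s

main :: IO ()
main = do
  putStrLn (hex (utf8Encode '\x2263'))          -- E2 89 A3
  putStrLn (maybe "?" hex (doubleEncode '\x2263')) -- CE 93 C3 AB C3 BA
```

The second line of output matches the bytes observed in the terminal, which is consistent with the explanation that the UTF-8 bytes were being reinterpreted as code page 437 before being encoded again.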

Solution

The solution was, first, to choose Change system locale... from the Administrative tab of the Region dialog, accessed from Administrative language settings under Time & language > Language & region.

Second, to check the option Beta: Use Unicode UTF-8 for worldwide language support on the resulting Region Settings dialog.

This is said to set the values ACP, MACCP and OEMCP under the Registry key HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage to 65001 (type REG_SZ). The values are said to indicate: ACP, the default ‘ANSI’ code page; MACCP, the default Macintosh code page; and OEMCP, the default OEM code page. Console windows use the OEM code page. Legacy, non-Unicode GUI-subsystem applications use the ‘ANSI’ code page.
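One way to check the effect of the setting from the Haskell side is to ask the runtime which text encodings it has selected and then try the problem character again. The following is a minimal sketch using only System.IO from base; describeEncoding is a helper name invented here, and the encoding names printed will depend on the GHC version and the environment.

```haskell
import System.IO

-- | Render a handle's encoding; Nothing means the handle is in binary mode.
describeEncoding :: Maybe TextEncoding -> String
describeEncoding Nothing    = "binary mode (no text encoding)"
describeEncoding (Just enc) = show enc

main :: IO ()
main = do
  -- The encoding the runtime derived from the environment at start-up.
  putStrLn ("locale encoding: " ++ show localeEncoding)
  -- The encoding currently attached to standard output.
  hGetEncoding stdout >>= putStrLn . ("stdout encoding: " ++) . describeEncoding
  -- U+2263, four times. If stdout's encoding cannot represent the
  -- character, this line may raise an exception instead of printing.
  putStrLn "\x2263\x2263\x2263\x2263"
```

If the last line prints ≣≣≣≣ rather than ΓëúΓëúΓëúΓëú, the console and the program agree on the encoding.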