How To Set Up Surrogate Fonts

 

 

 

 

If you encounter a display problem in which a single Unicode character appears as two empty square boxes, read this article to find out how to fix it. Specifically, this problem occurs when your Unicode character is greater than U+FFFF (65535 decimal) and your Unicode font can handle only characters whose values are 64K or less.

 

1. What is a surrogate character?

 

Unicode characters have scalar values ranging from 0 to just over a million (U+10FFFF). Characters above 64K (greater than U+FFFF, or 65535) are called supplementary Unicode characters. The entire range of Unicode characters is divided into 17 blocks of 64K values each. Each block is referred to as a plane, and the planes are numbered from 0 through 16. Plane 0 is also known as the Basic Multilingual Plane (BMP).

 

 

Note that all characters in Plane 0 can be represented as a single 16-bit value. Characters in the other planes are greater than 64K and can be represented either as a single 32-bit value (UCS-4) or as a pair of 16-bit values (UTF-16). In the latter representation the pair is commonly called a surrogate pair, which consists of a high-order 16-bit surrogate followed by a low-order 16-bit surrogate.
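The arithmetic behind planes and surrogate pairs can be sketched in a few lines of Python. This is only an illustration of the standard UTF-16 encoding rule; the function names are made up for this example and are not part of WinVNKey or Windows.

```python
from typing import Tuple


def plane_of(code_point: int) -> int:
    """Return the plane number (0 to 16) that a code point belongs to."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    return code_point >> 16  # each plane holds 0x10000 (64K) values


def to_surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a supplementary code point into (high, low) 16-bit surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only supplementary characters need a surrogate pair")
    offset = code_point - 0x10000          # 20 significant bits remain
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


def from_surrogate_pair(high: int, low: int) -> int:
    """Combine a (high, low) surrogate pair back into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
```

For example, U+20000 (the first character of Plane 2) splits into the pair 0xD840, 0xDC00, and combining that pair yields U+20000 again.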

 

Unfortunately, Windows by default recognizes and processes only 16-bit Unicode characters. When Windows encounters a surrogate pair, it treats the pair as two distinct 16-bit characters and displays them as such. The result is always two empty square boxes. The reason is that each 16-bit surrogate by itself is not a valid character in the Unicode Standard and is rendered as an empty square box by Unicode fonts; only the two surrogates together yield a single Unicode character.
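To see what a program that understands only 16-bit units is looking at, the following Python sketch lists the UTF-16 code units of a string. The helper names are hypothetical and chosen for this illustration only.

```python
import struct


def utf16_code_units(text):
    """Return the 16-bit UTF-16 code units of a string as a list of ints."""
    data = text.encode("utf-16-be")  # big-endian, no byte-order mark
    return list(struct.unpack(">%dH" % (len(data) // 2), data))


def is_surrogate(unit):
    """True if a 16-bit unit lies in the surrogate range D800 to DFFF."""
    return 0xD800 <= unit <= 0xDFFF


# One supplementary character becomes two units, and each unit taken
# alone is a surrogate rather than a displayable character.
units = utf16_code_units(chr(0x20000))
print([hex(u) for u in units], all(is_surrogate(u) for u in units))
```

A BMP character such as "A" produces a single unit that is not a surrogate, which is why only supplementary characters show up as two boxes.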

 

To make the long story short:

 

 

2. How to set up Windows to display surrogate characters?

 

According to Microsoft

 

     http://msdn.microsoft.com/library/default.asp?url=/library/en-us/intl/unicode_192r.asp

 

users have to set up the Windows registry as follows:

 

          [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\LanguagePack]

              SURROGATE=(REG_DWORD)0x00000002

          [HKEY_CURRENT_USER\Software\Microsoft\Internet Explorer\International\Scripts\42]

              IEFixedFontName=(Surrogate Font Face Name)

              IEPropFontName=(Surrogate Font Face Name)

 

Additional information is available at

 

Based on the information obtained from the websites above, WinVNKey provides a user interface that helps users change the registry easily:

 

Setting 1

This setting tells Windows to load Uniscribe, the engine that processes surrogate characters.

 

    [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\LanguagePack]

    SURROGATE=(REG_DWORD)0x00000002

 

Setting 2

This setting specifies the names of the fixed and proportional fonts that Internet Explorer uses to display surrogate characters.

 

    [HKEY_CURRENT_USER\Software\Microsoft\Internet Explorer\International\Scripts\42]

    IEFixedFontName=[Surrogate Font Face Name]

    IEPropFontName=[Surrogate Font Face Name]

 

Setting 3

This setting is optional and applies to Windows XP systems only. It specifies the fallback fonts for characters in the supplementary planes. You can specify as many planes as you like.

 

     [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\LanguagePack\SurrogateFallback]

     Plane1=(Name for fallback font for characters in Plane 1)

     Plane2=(Name for fallback font for characters in Plane 2)

     ... etc. ...
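For readers who prefer to prepare these values by hand instead of using WinVNKey, the three settings can be written out as the text of a .reg file. The Python sketch below only builds that text; it does not touch the registry. The function name, the example font names, and the assumption that the key path uses the usual "Windows NT" spelling (with a space) are illustrative choices, so review the output before importing it.

```python
# Key paths taken from the settings described above.
LANGUAGE_PACK = r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\LanguagePack"
IE_SCRIPT_42 = r"HKEY_CURRENT_USER\Software\Microsoft\Internet Explorer\International\Scripts\42"


def _quote(value):
    """Quote a string value, escaping backslashes and double quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_surrogate_reg(ie_font, plane_fonts=None):
    """Return .reg file text for Settings 1, 2 and (optionally) 3.

    ie_font     -- font face name for Internet Explorer (Setting 2)
    plane_fonts -- optional dict {plane number: fallback font name} (Setting 3)
    """
    lines = ["Windows Registry Editor Version 5.00", ""]
    # Setting 1: ask Windows to process surrogate characters.
    lines += ["[%s]" % LANGUAGE_PACK, '"SURROGATE"=dword:00000002', ""]
    # Setting 2: fixed and proportional fonts for Internet Explorer.
    lines += ["[%s]" % IE_SCRIPT_42,
              '"IEFixedFontName"=' + _quote(ie_font),
              '"IEPropFontName"=' + _quote(ie_font), ""]
    # Setting 3: per-plane fallback fonts.
    if plane_fonts:
        lines.append("[%s\\SurrogateFallback]" % LANGUAGE_PACK)
        for plane in sorted(plane_fonts):
            lines.append('"Plane%d"=%s' % (plane, _quote(plane_fonts[plane])))
        lines.append("")
    return "\n".join(lines)


# Example with placeholder font names; substitute fonts installed on your system.
print(build_surrogate_reg("Code2001", {2: "HAN NOM B"}))
```

Generating text rather than writing to the registry directly keeps the change easy to inspect and undo.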

 

3. How to use WinVNKey to change the surrogate registry settings?

The three registry settings above can be changed by using WinVNKey as follows:

 

Click on the Run button ==> Preferences ==> Surrogate Fonts

 

Note that there are no OK/Cancel buttons on the "Surrogate Fonts" page. Any change you make on the page takes effect immediately.

 

Enable the checkbox and set the surrogate registry value to 2.  This forces Windows to process surrogate characters.

 

 

This change is enough to display surrogate characters in applications such as Microsoft Word.  Of course, when you type surrogate characters, you have to select a Unicode font that supports those characters in order to see them.

 

If you want Internet Explorer to use certain fonts when it encounters surrogate characters, you can specify the font names as follows.

 

 

 

Finally, you can specify fallback fonts for Windows XP or later.

Suppose you are writing a document in Microsoft Word using the "Arial Unicode MS" font.  This font does not have any surrogate characters.  If you type Han/Nom surrogate characters in your document, you will see empty boxes in place of the characters.  However, if you tell Windows in advance which font is the fallback for Plane 2 characters, Windows will use that font to display the Han/Nom surrogate characters.  In the author's example the fallback font is "Han Nom 3.1B", but you should use a font that is installed on your system.

 

 

 

After setting up the surrogate registry with supplementary fonts, you can test whether the system works by browsing websites that use supplementary characters and checking whether you can see them. Such sites are listed at

 

http://www.i18nguy.com/surrogates.html

http://www.daouyen.com/NomDoc/CJKVB.htm

 

Specifically,

 

http://www.daouyen.com/NomDoc/CuTranLacDao.htm

http://www.i18nguy.com/unicode/plane1-utf-16.html

http://www.i18nguy.com/unicode-plane1-utf8.html

http://www.i18nguy.com/unicode-example-plane1.html

http://www.i18nguy.com/unicode/unicode-example-intro.html

http://homepage.mac.com/thgewecke/BeyondBMP.html
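If none of these pages is reachable, a small local test page serves the same purpose. The following Python sketch is a hypothetical helper, not something shipped with WinVNKey. It writes each character as a numeric character reference so that the file itself stays plain ASCII. The three sample code points are U+10330 and U+1D11E from Plane 1 and U+20000 from Plane 2.

```python
def build_test_page(code_points):
    """Return an HTML page with one table row per supplementary character."""
    rows = []
    for cp in code_points:
        # Left cell: the code point in U+ notation. Right cell: the character
        # itself, written as a hexadecimal numeric character reference.
        rows.append("<tr><td>U+%04X</td><td>&#x%X;</td></tr>" % (cp, cp))
    return ('<!DOCTYPE html>\n<html><head><meta charset="utf-8">'
            "<title>Supplementary character test</title></head>\n"
            "<body><table>\n" + "\n".join(rows) + "\n</table></body></html>\n")


if __name__ == "__main__":
    page = build_test_page([0x10330, 0x1D11E, 0x20000])
    # The file name is arbitrary; open the result in your browser.
    with open("surrogate_test.html", "w", encoding="ascii") as out:
        out.write(page)
```

If the right-hand column shows glyphs instead of empty boxes, the registry settings and the fonts are working for those planes.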

 

4. Download Surrogate Fonts For Plane 1

 

If you have no fonts for supplementary characters, you can download one from the Internet for free. At present the author of WinVNKey knows of one such font, CODE2001.ZIP (207,345 bytes), which covers a number of characters in Plane 1 (U+10000 through U+1FFFF):

 

http://home.att.net/~jameskass/code2001.htm

 

This font does not contain the Han/Nom surrogate characters in Plane 2 (Extension B).

5. Download Han Nom fonts for Planes 0 and 2

 

Plane 2 contains mostly Han/Nom surrogate characters in Extension B.  Commercial fonts for Extension B characters are available.  The largest font is perhaps SURSONG.TTF (41 MB), which contains about 65K Han/Nom characters in both the BMP (Plane 0) and Plane 2.  It is shipped with the Chinese version of Windows and with Microsoft Office Proofing Tools.  You can search on www.google.com for "sursong.ttf" and may be lucky enough to find a site that offers a free download of the font.

 

There are a few free Han Nom fonts that can be downloaded from the Viet Unicode website:

 

       1. Microsoft font "Arial Unicode MS" (Aruniupd.exe, 14 MB). This is a Plane 0 font, not a surrogate font.

 

       2. Fonts "HAN NOM A" (not a surrogate font) and "HAN NOM B" (a surrogate font), which are packaged together in HannomH.zip (27 MB, high resolution) or Hannom.zip (19 MB, low resolution).

 

Font "Arial Unicode MS" contains Han/Nom and Latin-based characters with values below 64K, i.e., Unicode characters expressible as U+xxxx with at most 4 hex digits following the plus sign.

 

Font "HAN NOM A" contains Han/Nom characters with values below 64K.

 

Font "HAN NOM B" contains the Han/Nom surrogate characters in Unicode Extension B.  These characters have values above 64K, i.e., they are expressible as U+xx..xx with 5 or 6 hex digits following the plus sign.
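The rule of thumb about hex digits can be checked mechanically. The Python sketch below, with hypothetical function names, formats a character in U+ notation and reports whether it lies above 64K and therefore needs a surrogate font such as "HAN NOM B".

```python
def u_notation(code_point):
    """Format a code point as U+ followed by at least 4 uppercase hex digits."""
    return "U+%04X" % code_point


def needs_surrogate_font(char):
    """True if the character is above U+FFFF (5 or 6 hex digits)."""
    return ord(char) > 0xFFFF


for ch in ("\u4E00", "\U00020000"):
    digits = len(u_notation(ord(ch))) - 2  # drop the leading "U+"
    print(u_notation(ord(ch)), digits, needs_surrogate_font(ch))
```

Here U+4E00 has 4 hex digits and is covered by a Plane 0 font, while U+20000 has 5 and needs a font with Extension B characters.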

 

It is generally a good idea to download both Aruniupd.exe and HannomH.zip.  Font "Arial Unicode MS" has many characters from many languages but lacks all Han/Nom characters in Extension B, which is why you also need HannomH.zip.  Always try the high resolution version, HannomH.zip, first.  If the installation fails because your Windows does not recognize the high resolution font, try the low resolution version, Hannom.zip.  Do not install both HannomH.zip and Hannom.zip, because the package installed later will replace the one installed first.

 

Because these font packages are large, if you can afford to download only one package, choose HannomH.zip (the two fonts HAN NOM A and HAN NOM B).

 

 

 

 
