The GB 18030-2022 Standard

History & Overview

China first published the GB 18030 standard in 2000 as GB 18030-2000 (信息技术 信息交换用汉字编码字符集 基本集的扩充 Information technology — Chinese ideograms coded character set for information interchange — Extension for the basic set), and it was revised five years later as GB 18030-2005 (信息技术 中文编码字符集 Information Technology — Chinese coded character set). Note that the title changed. Seventeen years later, GB 18030-2022 was published with the same title.

GB 18030-2022 cover
0x82358F33  U+9FA6 龦
0x82358F34 U+9FA7 龧
0x82358F35 U+9FA8 龨
0x82358F36 U+9FA9 龩
0x82358F37 U+9FAA 龪
0x82358F38 U+9FAB 龫
0x82358F39 U+9FAC 龬
0x82359030 U+9FAD 龭
0x82359031 U+9FAE 龮
0x82359032 U+9FAF 龯
0x82359033 U+9FB0 龰
0x82359034 U+9FB1 龱
0x82359035 U+9FB2 龲
0x82359036 U+9FB3 龳
0x82359135 U+9FBC 龼
0x82359136 U+9FBD 龽
0x82359137 U+9FBE 龾
0x82359138 U+9FBF 龿
0x82359139 U+9FC0 鿀
0x82359230 U+9FC1 鿁
0x82359231 U+9FC2 鿂
0x82359232 U+9FC3 鿃
0x82359233 U+9FC4 鿄
0x82359234 U+9FC5 鿅
0x82359235 U+9FC6 鿆
0x82359236 U+9FC7 鿇
0x82359237 U+9FC8 鿈
0x82359238 U+9FC9 鿉
0x82359239 U+9FCA 鿊
0x82359330 U+9FCB 鿋
0x82359331 U+9FCC 鿌
0x82359332 U+9FCD 鿍
0x82359333 U+9FCE 鿎
0x82359334 U+9FCF 鿏
0x82359335 U+9FD0 鿐
0x82359336 U+9FD1 鿑
0x82359337 U+9FD2 鿒
0x82359338 U+9FD3 鿓
0x82359339 U+9FD4 鿔
0x82359430 U+9FD5 鿕
0x82359431 U+9FD6 鿖
0x82359432 U+9FD7 鿗
0x82359433 U+9FD8 鿘
0x82359434 U+9FD9 鿙
0x82359435 U+9FDA 鿚
0x82359436 U+9FDB 鿛
0x82359437 U+9FDC 鿜
0x82359438 U+9FDD 鿝
0x82359439 U+9FDE 鿞
0x82359530 U+9FDF 鿟
0x82359531 U+9FE0 鿠
0x82359532 U+9FE1 鿡
0x82359533 U+9FE2 鿢
0x82359534 U+9FE3 鿣
0x82359535 U+9FE4 鿤
0x82359536 U+9FE5 鿥
0x82359537 U+9FE6 鿦
0x82359538 U+9FE7 鿧
0x82359539 U+9FE8 鿨
0x82359630 U+9FE9 鿩
0x82359631 U+9FEA 鿪
0x82359632 U+9FEB 鿫
0x82359633 U+9FEC 鿬
0x82359634 U+9FED 鿭
0x82359635 U+9FEE 鿮
0x82359636 U+9FEF 鿯
0x81318132–0x81319934  42     Arabic
0x8430BA32–0x8430FE35 59 Arabic Presentation Forms-A
0x84318730–0x84319530 84 Arabic Presentation Forms-B
0x8132E834–0x8132FD31 193 Tibetan
0x8134D238–0x8134E337 149 Mongolian
0x9034C538–0x9034C730 13 Mongolian Supplement
0x8134F434–0x8134F830 35 Tai Le
0x8134F932–0x81358437 83 New Tai Lue
0x81358B32–0x81359935 127 Tai Tham
0x82359833–0x82369435 1215 Yi Syllables
0x82369535–0x82369A32 48 Lisu
0x81339D36–0x8133B635 69 Hangul Jamo
0x8139A933–0x8139B734 51 Hangul Compatibility Jamo
0x8237CF35–0x8336BE36 3431 Hangul Syllables
0x9232C636–0x9232D635 133 Miao
0x81398B32–0x8139A135 214 Kangxi Radicals
0x8139EE39–0x82358738 6530 CJK Unified Ideographs Extension A
0x82358F33–0x82359636 66 CJK Unified Ideographs (URO)
0x95328236–0x9835F336 42711 CJK Unified Ideographs Extension B
0x9835F738–0x98399E36 4149 CJK Unified Ideographs Extension C
0x98399F38–0x9839B539 222 CJK Unified Ideographs Extension D
0x9839B632–0x9933FE33 5762 CJK Unified Ideographs Extension E
0x99348138–0x9939F730 7473 CJK Unified Ideographs Extension F
0xFE56  U+3447 㑇
0xFE55 U+3473 㑳
0xFE5A U+359E 㖞
0xFE5C U+360E 㘎
0xFE5B U+361A 㘚
0xFE60 U+3918 㤘
0xFE5F U+396E 㥮
0xFE62 U+39CF 㧏
0xFE65 U+39D0 㧐
0xFE63 U+39DF 㧟
0xFE64 U+3A73 㩳
0xFE68 U+3B4E 㭎
0xFE69 U+3C6E 㱮
0xFE6A U+3CE0 㳠
0xFE6F U+4056 䁖
0xFE70 U+415F 䅟
0xFE72 U+4337 䌷
0xFE78 U+43AC 䎬
0xFE77 U+43B1 䎱
0xFE7A U+43DD 䏝
0xFE7B U+44D6 䓖
0xFE7D U+464C 䙌
0xFE7C U+4661 䙡
0xFE80 U+4723 䜣
0xFE81 U+4729 䜩
0xFE82 U+477C 䝼
0xFE83 U+478D 䞍
0xFE85 U+4947 䥇
0xFE86 U+497A 䥺
0xFE87 U+497D 䥽
0xFE88 U+4982 䦂
0xFE89 U+4983 䦃
0xFE8A U+4985 䦅
0xFE8B U+4986 䦆
0xFE8D U+499B 䦛
0xFE8C U+499F 䦟
0xFE8F U+49B6 䦶
0xFE8E U+49B7 䦷
0xFE96 U+4C77 䱷
0xFE93 U+4C9F 䲟
0xFE94 U+4CA0 䲠
0xFE95 U+4CA1 䲡
0xFE97 U+4CA2 䲢
0xFE92 U+4CA3 䲣
0xFE98 U+4D13 䴓
0xFE99 U+4D14 䴔
0xFE9A U+4D15 䴕
0xFE9B U+4D16 䴖
0xFE9C U+4D17 䴗
0xFE9D U+4D18 䴘
0xFE9E U+4D19 䴙
0xFE9F U+4DAE 䶮
GB 18030-2022 code chart excerpt for 0x8233AF32 (U+4548 䕈)
GB 18030-2022 code chart excerpt for 0xA3A4 (U+FFE5 ¥)

No PUA Requirement

Except for 24 characters, all required characters in GB 18030-2005 had equivalent characters in the Unicode Standard. The characters that formerly were required to be represented using PUA (Private Use Area) code points can be classified into two groups. The first group involves mapping changes for 18 characters. The following list provides the two- or four-byte GB 18030 code points, their mappings to the Unicode Standard in GB 18030-2005, their mappings to the Unicode Standard in GB 18030-2022, and the characters, if any, that are displayed in the GB 18030-2022 code charts (on pp 11, 81, 160, and 242):

0xA6D9      U+E78D  U+FE10  ︐
0xA6DA U+E78E U+FE12 ︒
0xA6DB U+E78F U+FE11 ︑
0xA6DC U+E790 U+FE13 ︓
0xA6DD U+E791 U+FE14 ︔
0xA6DE U+E792 U+FE15 ︕
0xA6DF U+E793 U+FE16 ︖
0xA6EC U+E794 U+FE17 ︗
0xA6ED U+E795 U+FE18 ︘
0xA6F3 U+E796 U+FE19 ︙
0xFE59 U+E81E U+9FB4 龴
0xFE61 U+E826 U+9FB5 龵
0xFE66 U+E82B U+9FB6 龶
0xFE67 U+E82C U+9FB7 龷
0xFE6D U+E832 U+9FB8 龸
0xFE7E U+E843 U+9FB9 龹
0xFE90 U+E854 U+9FBA 龺
0xFEA0 U+E864 U+9FBB 龻
0x82359037 U+9FB4 U+E81E (none)
0x82359038 U+9FB5 U+E826 (none)
0x82359039 U+9FB6 U+E82B (none)
0x82359130 U+9FB7 U+E82C (none)
0x82359131 U+9FB8 U+E832 (none)
0x82359132 U+9FB9 U+E843 (none)
0x82359133 U+9FBA U+E854 (none)
0x82359134 U+9FBB U+E864 (none)
0x84318236 U+FE10 U+E78D (none)
0x84318237 U+FE11 U+E78F (none)
0x84318238 U+FE12 U+E78E (none)
0x84318239 U+FE13 U+E790 (none)
0x84318330 U+FE14 U+E791 (none)
0x84318331 U+FE15 U+E792 (none)
0x84318332 U+FE16 U+E793 (none)
0x84318333 U+FE17 U+E794 (none)
0x84318334 U+FE18 U+E795 (none)
0x84318335 U+FE19 U+E796 (none)
0xFE51      U+E816   (none)
0xFE52 U+E817 (none)
0xFE53 U+E818 (none)
0xFE6C U+E831 (none)
0xFE76 U+E83B (none)
0xFE91 U+E855 (none)
0x95329031 U+20087 𠂇
0x95329033 U+20089 𠂉
0x95329730 U+200CC 𠃌
0x9536B937 U+215D7 𡗗
0x9630BA35 U+2298F 𢦏
0x9635B630 U+241FE 𤇾

No CJK Compatibility Ideographs

Some of the required characters, 21 ideographs to be exact, were in the CJK Compatibility Ideographs block. 12 of them are actually CJK Unified Ideographs, which are still required according to GB 18030-2022. However, the nine that are actual CJK Compatibility Ideographs are no longer required, and a character is not displayed in the GB 18030-2022 code charts (on pp 80 and 81). The following list provides the two-byte GB 18030 code points, their mappings to the Unicode Standard in GB 18030-2000, GB 18030-2005, and GB 18030-2022, the ideographs themselves, and their canonically-equivalent CJK Unified Ideographs in parentheses:

0xFD9C  U+F92C 郎 (U+90CE 郎)
0xFD9D U+F979 凉 (U+51C9 凉)
0xFD9E U+F995 秊 (U+79CA 秊)
0xFD9F U+F9E7 裏 (U+88CF 裏)
0xFDA0 U+F9F1 隣 (U+96A3 隣)
0xFE40 U+FA0C 兀 (U+5140 兀)
0xFE41 U+FA0D 嗀 (U+55C0 嗀)
0xFE47 U+FA18 礼 (U+793C 礼)
0xFE49 U+FA20 蘒 (U+8612 蘒)

TGH 2013 Requirement

Incremental additions to the required portion of the GB 18030 standard are to be expected, and the 2022 update is no exception. The first section of this article mentioned the 66 ideographs that are now required for Implementation Level 1.

通用规范汉字表 (Tōngyòng Guīfàn Hànzìbiǎo) cover
0x9532A632  U+20164 𠅤
0x9533AA30 U+20676 𠙶
0x9534CE36 U+20CD0 𠳐
0x9535FE34 U+2139A 𡎚
0x95368C35 U+21413 𡐓
0x9632F737 U+235CB 𣗋
0x9634A937 U+23C97 𣲗
0x9634A938 U+23C98 𣲘
0x9634D133 U+23E23 𣸣
0x96378333 U+249DB 𤧛
0x96379335 U+24A7D 𤩽
0x96379B31 U+24AC9 𤫉
0x9639A936 U+25532 𥔲
0x9639AE34 U+25562 𥕢
0x9639B534 U+255A8 𥖨
0x9731A435 U+25ED7 𥻗
0x9731F837 U+26221 𦈡
0x9732B837 U+2648D 𦒍
0x9732E936 U+26676 𦙶
0x97338538 U+2677C 𦝼
0x9733E930 U+26B5C 𦭜
0x9733FC37 U+26C21 𦰡
0x97388237 U+27FF9 𧿹
0x9738EA36 U+28408 𨐈
0x9739AB30 U+28678 𨙸
0x9739AD39 U+28695 𨚕
0x9739CF30 U+287E0 𨟠
0x9830A833 U+28B49 𨭉
0x9830C137 U+28C47 𨱇
0x9830C235 U+28C4F 𨱏
0x9830C237 U+28C51 𨱑
0x9830C330 U+28C54 𨱔
0x9830FD31 U+28E99 𨺙
0x9834B536 U+29F7E 𩽾
0x9834B631 U+29F83 𩾃
0x9834B730 U+29F8C 𩾌
0x98368F39  U+2A7DD 𪟝
0x9836AC35 U+2A8FB 𪣻
0x9836AF33 U+2A917 𪤗
0x9836CB34 U+2AA30 𪨰
0x9836CC30 U+2AA36 𪨶
0x9836CF34 U+2AA58 𪩘
0x9837D838 U+2AFA2 𪾢
0x98388137 U+2B127 𫄧
0x98388138 U+2B128 𫄨
0x98388333 U+2B137 𫄷
0x98388334 U+2B138 𫄸
0x98389535 U+2B1ED 𫇭
0x9838B130 U+2B300 𫌀
0x9838BA39 U+2B363 𫍣
0x9838BC31 U+2B36F 𫍯
0x9838BC34 U+2B372 𫍲
0x9838BD35 U+2B37D 𫍽
0x9838CB30 U+2B404 𫐄
0x9838CC32 U+2B410 𫐐
0x9838CC35 U+2B413 𫐓
0x9838D433 U+2B461 𫑡
0x9838E137 U+2B4E7 𫓧
0x9838E235 U+2B4EF 𫓯
0x9838E332 U+2B4F6 𫓶
0x9838E335 U+2B4F9 𫓹
0x9838E535 U+2B50D 𫔍
0x9838E536 U+2B50E 𫔎
0x9838E936 U+2B536 𫔶
0x9838F536 U+2B5AE 𫖮
0x9838F537 U+2B5AF 𫖯
0x9838F631 U+2B5B3 𫖳
0x9838FB33 U+2B5E7 𫗧
0x9838FC36 U+2B5F4 𫗴
0x98398236 U+2B61C 𫘜
0x98398237 U+2B61D 𫘝
0x98398336 U+2B626 𫘦
0x98398337 U+2B627 𫘧
0x98398338 U+2B628 𫘨
0x98398430 U+2B62A 𫘪
0x98398432 U+2B62C 𫘬
0x98398E37 U+2B695 𫚕
0x98398E38 U+2B696 𫚖
0x98399131 U+2B6AD 𫚭
0x98399735 U+2B6ED 𫛭
0x9839AA33  U+2B7A9 𫞩
0x9839AD31 U+2B7C5 𫟅
0x9839B034 U+2B7E6 𫟦
0x9839B233 U+2B7F9 𫟹
0x9839B236 U+2B7FC 𫟼
0x9839B336 U+2B806 𫠆
0x9839B430 U+2B80A 𫠊
0x9839B538 U+2B81C 𫠜
0x9839C534  U+2B8B8 𫢸
0x9839FA31 U+2BAC7 𫫇
0x99308B33 U+2BB5F 𫭟
0x99308B36 U+2BB62 𫭢
0x99308E32 U+2BB7C 𫭼
0x99308E39 U+2BB83 𫮃
0x99309E31 U+2BC1B 𫰛
0x9930C039 U+2BD77 𫵷
0x9930C235 U+2BD87 𫶇
0x9930CD37 U+2BDF7 𫷷
0x9930D237 U+2BE29 𫸩
0x99318739 U+2C029 𬀩
0x99318830 U+2C02A 𬀪
0x99319437 U+2C0A9 𬂩
0x99319830 U+2C0CA 𬃊
0x9931B237 U+2C1D5 𬇕
0x9931B331 U+2C1D9 𬇙
0x9931B633 U+2C1F9 𬇹
0x9931C334 U+2C27C 𬉼
0x9931C436 U+2C288 𬊈
0x9931C734 U+2C2A4 𬊤
0x9931D239 U+2C317 𬌗
0x9931D937 U+2C35B 𬍛
0x9931DA33 U+2C361 𬍡
0x9931DA36 U+2C364 𬍤
0x9931F738 U+2C488 𬒈
0x9931F930 U+2C494 𬒔
0x9931F933 U+2C497 𬒗
0x99328C34 U+2C542 𬕂
0x9932A133 U+2C613 𬘓
0x9932A138 U+2C618 𬘘
0x9932A237 U+2C621 𬘡
0x9932A335 U+2C629 𬘩
0x9932A337 U+2C62B 𬘫
0x9932A338 U+2C62C 𬘬
0x9932A339 U+2C62D 𬘭
0x9932A431 U+2C62F 𬘯
0x9932A630 U+2C642 𬙂
0x9932A638 U+2C64A 𬙊
0x9932A639 U+2C64B 𬙋
0x9932BD34 U+2C72C 𬜬
0x9932BD37 U+2C72F 𬜯
0x9932C839 U+2C79F 𬞟
0x9932CC33 U+2C7C1 𬟁
0x9932D233 U+2C7FD 𬟽
0x9932E833 U+2C8D9 𬣙
0x9932E838 U+2C8DE 𬣞
0x9932E931 U+2C8E1 𬣡
0x9932EA39 U+2C8F3 𬣳
0x9932EC39 U+2C907 𬤇
0x9932ED32 U+2C90A 𬤊
0x9932EF31 U+2C91D 𬤝
0x99338830 U+2CA02 𬨂
0x99338932 U+2CA0E 𬨎
0x99339433 U+2CA7D 𬩽
0x99339837 U+2CAA9 𬪩
0x9933A535 U+2CB29 𬬩
0x9933A539 U+2CB2D 𬬭
0x9933A630 U+2CB2E 𬬮
0x9933A633 U+2CB31 𬬱
0x9933A730 U+2CB38 𬬸
0x9933A731 U+2CB39 𬬹
0x9933A733 U+2CB3B 𬬻
0x9933A737 U+2CB3F 𬬿
0x9933A739 U+2CB41 𬭁
0x9933A838 U+2CB4A 𬭊
0x9933A932 U+2CB4E 𬭎
0x9933AA34 U+2CB5A 𬭚
0x9933AA35 U+2CB5B 𬭛
0x9933AB34 U+2CB64 𬭤
0x9933AB39 U+2CB69 𬭩
0x9933AC32 U+2CB6C 𬭬
0x9933AC35 U+2CB6F 𬭯
0x9933AC39 U+2CB73 𬭳
0x9933AD32 U+2CB76 𬭶
0x9933AD34 U+2CB78 𬭸
0x9933AD38 U+2CB7C 𬭼
0x9933B331 U+2CBB1 𬮱
0x9933B435 U+2CBBF 𬮿
0x9933B436 U+2CBC0 𬯀
0x9933B630 U+2CBCE 𬯎
0x9933C336 U+2CC56 𬱖
0x9933C435 U+2CC5F 𬱟
0x9933D335 U+2CCF5 𬳵
0x9933D336 U+2CCF6 𬳶
0x9933D433 U+2CCFD 𬳽
0x9933D435 U+2CCFF 𬳿
0x9933D438 U+2CD02 𬴂
0x9933D439 U+2CD03 𬴃
0x9933D536 U+2CD0A 𬴊
0x9933E235 U+2CD8B 𬶋
0x9933E237 U+2CD8D 𬶍
0x9933E239 U+2CD8F 𬶏
0x9933E330 U+2CD90 𬶐
0x9933E435 U+2CD9F 𬶟
0x9933E436 U+2CDA0 𬶠
0x9933E534 U+2CDA8 𬶨
0x9933E539 U+2CDAD 𬶭
0x9933E630 U+2CDAE 𬶮
0x9933E939 U+2CDD5 𬷕
0x9933F036 U+2CE18 𬸘
0x9933F038 U+2CE1A 𬸚
0x9933F137 U+2CE23 𬸣
0x9933F230 U+2CE26 𬸦
0x9933F234 U+2CE2A 𬸪
0x9933FA36 U+2CE7C 𬹼
0x9933FB38 U+2CE88 𬺈
0x9933FC39 U+2CE93 𬺓
  1. GB 18030 is synchronized with ISO/IEC 10646, not with the Unicode Standard. The contents of GB 18030-2022 are synchronized with ISO/IEC 10646:2017 (aka Fifth Edition), which is equivalent to Unicode Version 11.0. This explains why the range 9FF0..9FFF is not included in Implementation Level 1. Those 16 ideographs were appended to the URO in Unicode Versions 13.0 (13) and 14.0 (3).
  2. New versions of GB 18030 incrementally include ideographs that are appended to the URO, whose precedent has been established by GB 18030-2022. I predicted this, and therefore predict that the same will be true of ideographs appended to Extension A. The key point here is that these are the only two CJK Unified Ideographs blocks that are required in their entirety, and that requirement would logically extend to any ideographs that are appended to them. This would impact Implementation Level 1, meaning critical for font implementations. Both the URO and Extension A blocks are now full, as of Unicode Versions 14.0 and 13.0, respectively.
0x82359637  U+9FF0 鿰
0x82359638 U+9FF1 鿱
0x82359639 U+9FF2 鿲
0x82359730 U+9FF3 鿳
0x82359731 U+9FF4 鿴
0x82359732 U+9FF5 鿵
0x82359733 U+9FF6 鿶
0x82359734 U+9FF7 鿷
0x82359735 U+9FF8 鿸
0x82359736 U+9FF9 鿹
0x82359737 U+9FFA 鿺
0x82359738 U+9FFB 鿻
0x82359739 U+9FFC 鿼
0x82359830  U+9FFD 鿽
0x82359831 U+9FFE 鿾
0x82359832 U+9FFF 鿿
0x82358739  U+4DB6 䶶
0x82358830 U+4DB7 䶷
0x82358831 U+4DB8 䶸
0x82358832 U+4DB9 䶹
0x82358833 U+4DBA 䶺
0x82358834 U+4DBB 䶻
0x82358835 U+4DBC 䶼
0x82358836 U+4DBD 䶽
0x82358837 U+4DBE 䶾
0x82358838 U+4DBF 䶿

About the Author

Dr Ken Lunde has worked for Apple as a Font Developer since 2021-08-02 (and was in the same role as a contractor from 2020-01-16 through 2021-07-30), is the author of CJKV Information Processing Second Edition (O’Reilly Media, 2009), and earned BA (1987), MA (1988), and PhD (1994) degrees in linguistics from The University of Wisconsin-Madison. Prior to working at Apple, he worked at Adobe for over twenty-eight years — from 1991-07-01 to 2019-10-18 — specializing in CJKV Type Development, meaning that he architected and developed fonts for East Asian typefaces, along with the standards and specifications on which they are based. He architected and developed the Adobe-branded “Source Han” (Source Han Sans, Source Han Serif, and Source Han Mono) and Google-branded “Noto CJK” (Noto Sans CJK and Noto Serif CJK) open source Pan-CJK typeface families that were released in 2014, 2017, and 2019, and published over 300 articles on Adobe’s now-static CJK Type Blog. Ken serves as the Unicode Consortium’s IVD (Ideographic Variation Database) Registrar, attends UTC and IRG meetings, participates in the Unicode Editorial Committee, became an individual Unicode Life Member in 2018, received the 2018 Unicode Bulldog Award, was a Unicode Technical Director from 2018 to 2020, became a Vice-Chair of the Emoji Subcommittee in 2019, published UTN #43 (Unihan Database Property “kStrange) in 2020, became the Chair of the CJK & Unihan Group in 2021, and published UTN #45 (Unihan Property History) in 2022. He and his wife, Hitomi, are proud owners of a His & Hers pair of acceleration-boosted 2018 LR Dual Motor AWD Tesla Model 3 EVs.

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Dr Ken Lunde

Dr Ken Lunde

72 Followers

Chair, Unicode® CJK & Unihan Group—Almaden Valley—San José—CA—USA—NW Hemisphere—Terra—Sol—Orion-Cygnus Arm—Milky Way—Local Group—Laniakea Supercluster