CJK characters

From Wikipedia for FEVERv2

In internationalization, CJK characters is a collective term for the Chinese, Japanese, and Korean languages, all of which include Chinese characters and derivatives in their writing systems, sometimes paired with other scripts. CJK characters_sentence_0

Occasionally, Vietnamese is included, making the abbreviation CJKV, since Vietnamese historically used Chinese characters as well. CJK characters_sentence_1

Collectively, the CJKV characters often include hànzì in Chinese, kanji and kana in Japanese, hanja and hangul in Korean, and hán tự or chữ nôm in Vietnamese. CJK characters_sentence_2

Character repertoire CJK characters_section_0

Standard Mandarin Chinese and Standard Cantonese are written almost exclusively in Chinese characters. CJK characters_sentence_3

Written Chinese requires over 3,000 characters for general literacy, but up to 40,000 characters for reasonably complete coverage. CJK characters_sentence_4

Japanese uses fewer characters: general literacy in Japanese can be expected with 2,136 characters. CJK characters_sentence_5

The use of Chinese characters in Korea is becoming increasingly rare, although idiosyncratic use of Chinese characters in proper names requires knowledge (and therefore availability) of many more characters. CJK characters_sentence_6

However, even today, students in South Korea are taught 1,800 characters. CJK characters_sentence_7

Other scripts used for these languages, such as bopomofo and the Latin-based pinyin for Chinese, hiragana and katakana for Japanese, and hangul for Korean, are not strictly "CJK characters", although CJK character sets almost invariably include them as necessary for full coverage of the target languages. CJK characters_sentence_8
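The distinction drawn above can be checked mechanically. The following is a minimal Python sketch, assuming that the prefix of a character's Unicode name is an adequate proxy for its script; the helper `script_of`, the `PREFIXES` table, and the sample characters are illustrative choices, not part of any standard.

```python
import unicodedata

# Name prefixes used as a rough proxy for script membership. This is an
# assumption made for illustration; it is not the Unicode Script property.
PREFIXES = {
    "CJK UNIFIED IDEOGRAPH": "Han",
    "HIRAGANA": "hiragana",
    "KATAKANA": "katakana",
    "HANGUL": "hangul",
    "BOPOMOFO": "bopomofo",
    "LATIN": "Latin",
}

def script_of(ch: str) -> str:
    """Return a coarse script label for one character (hypothetical helper)."""
    name = unicodedata.name(ch, "")
    for prefix, script in PREFIXES.items():
        if name.startswith(prefix):
            return script
    return "other"

# One sample each: Han, hiragana, katakana, hangul, bopomofo, pinyin vowel.
for ch in "漢あカ한ㄅā":
    print(ch, script_of(ch))
```

Only the first sample is a Han character; the others belong to the phonetic scripts that CJK character sets carry alongside it.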

Until the early 20th century, Classical Chinese was the written language of government and scholarship in Vietnam. CJK characters_sentence_9

Popular literature in Vietnamese was written in the chữ Nôm script, consisting of borrowed Chinese characters together with many characters created locally. CJK characters_sentence_10

By the end of the 1920s, both scripts had been replaced by writing in Vietnamese using the Latin-based Vietnamese alphabet. CJK characters_sentence_11

The sinologist Carl Leban (1971) produced an early survey of CJK encoding systems. CJK characters_sentence_12

Encoding CJK characters_section_1

The number of characters required for complete coverage of all these languages' needs cannot fit in the 256-character code space of 8-bit character encodings; it requires at least a 16-bit fixed-width encoding or a multi-byte variable-length encoding. CJK characters_sentence_13
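The arithmetic behind this can be made concrete. The sketch below, in Python, compares the code space of 8-bit and 16-bit fixed-width codes and shows how many bytes one Latin letter and one Han character occupy in a few encodings; the helper names `code_space` and `byte_lengths` and the sample characters are assumptions made for illustration.

```python
def code_space(bits: int) -> int:
    """Number of distinct values a fixed-width code of `bits` bits can hold."""
    return 2 ** bits

def byte_lengths(ch: str) -> dict:
    """Encoded length in bytes of one character under a few codecs."""
    codecs = ("utf-8", "utf-16-be", "shift_jis")
    return {codec: len(ch.encode(codec)) for codec in codecs}

print(code_space(8), code_space(16))
print(byte_lengths("A"))    # Latin letter
print(byte_lengths("漢"))   # Han character
```

In the variable-length encodings (UTF-8 and Shift JIS here) the Latin letter takes fewer bytes than the Han character, whereas the fixed-width 16-bit form spends two bytes on both.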

The 16-bit fixed-width encodings, such as those from Unicode up to and including version 2.0, are now deprecated for two reasons. First, more characters must be encoded than a 16-bit encoding can accommodate (Unicode 5.0 has some 70,000 Han characters). Second, the Chinese government requires that software in China support the GB 18030 character set. CJK characters_sentence_14
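A short Python sketch can illustrate the 16-bit limit. It takes one Han character whose code point lies above U+FFFF (U+20000, chosen here purely as an example) and encodes it in UTF-16 and in GB 18030 using Python's built-in codecs; the variable names are illustrative.

```python
# A Han character outside the 16-bit range (sample chosen for illustration).
ch = "\U00020000"

# Its code point cannot be stored in a single 16-bit unit.
print(f"U+{ord(ch):X} fits in 16 bits: {ord(ch) <= 0xFFFF}")

# UTF-16 represents it as a pair of 16-bit units (a surrogate pair).
utf16 = ch.encode("utf-16-be")
print("UTF-16  :", utf16.hex(), f"({len(utf16)} bytes)")

# GB 18030 also covers it, using a multi-byte sequence.
gb = ch.encode("gb18030")
print("GB 18030:", gb.hex(), f"({len(gb)} bytes)")

# Both encodings round-trip back to the same character.
assert utf16.decode("utf-16-be") == gb.decode("gb18030") == ch
```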

Although CJK encodings have common character sets, the encodings often used to represent them have been developed separately by different East Asian governments and software companies, and are mutually incompatible. CJK characters_sentence_15
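The incompatibility can be seen by encoding one shared character under several regional encodings. The sketch below is a minimal Python illustration, assuming the sample character 日 and the codec list `LEGACY_CODECS` as representative choices; `encode_all` is a hypothetical helper.

```python
# Regional encodings available as built-in Python codecs (illustrative selection).
LEGACY_CODECS = ["shift_jis", "euc_jp", "gb2312", "big5", "euc_kr"]

def encode_all(ch: str) -> dict:
    """Encode one character under every codec in LEGACY_CODECS."""
    return {codec: ch.encode(codec) for codec in LEGACY_CODECS}

encoded = encode_all("日")
for codec, raw in encoded.items():
    print(f"{codec:10} {raw.hex()}")

# Every codec yields a different byte sequence for the same character,
# so bytes written under one cannot be read reliably under another.
print(len(set(encoded.values())) == len(LEGACY_CODECS))
```

Each codec still decodes its own bytes back to 日; only the byte-level representations disagree.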

Unicode has attempted, with some controversy, to unify the character sets in a process known as Han unification. CJK characters_sentence_16
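As a small illustration of what unification means in practice, the Python sketch below decodes the same character from a Japanese and a Chinese national encoding and shows that both map to one Unicode code point. The sample character 直 and the helper `unified_code_point` are assumptions chosen for the example.

```python
import unicodedata

def unified_code_point(raw: bytes, codec: str) -> int:
    """Decode one character from a national encoding and return its code point."""
    return ord(raw.decode(codec))

ch = "直"
from_japanese = unified_code_point(ch.encode("shift_jis"), "shift_jis")
from_chinese = unified_code_point(ch.encode("gb2312"), "gb2312")

# Both national encodings land on the same unified code point.
print(f"U+{from_japanese:04X}", f"U+{from_chinese:04X}")
print(unicodedata.name(ch))
```

Because the code point is shared, any regional difference in how the character is drawn has to be handled outside the encoding, for example by font selection.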

CJK character encodings should consist minimally of Han characters plus language-specific phonetic scripts such as pinyin, bopomofo, hiragana, katakana and hangul. CJK characters_sentence_17

CJK character encodings include: CJK characters_sentence_18

The CJK character sets take up the bulk of the assigned Unicode code space. CJK characters_sentence_19
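This claim can be checked approximately against the Unicode database bundled with a Python interpreter. The sketch below counts code points whose name begins with "CJK UNIFIED IDEOGRAPH" and compares that with all assigned code points, treating unassigned, private-use, and surrogate code points as not assigned (an assumption of this sketch). The exact figures depend on the Unicode version shipped with the interpreter, and `han_share` is a hypothetical helper.

```python
import sys
import unicodedata

def han_share():
    """Return (Han ideograph count, assigned code point count)."""
    assigned = 0
    han = 0
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        # Skip unassigned (Cn), private-use (Co), and surrogate (Cs) code points.
        if unicodedata.category(ch) in ("Cn", "Co", "Cs"):
            continue
        assigned += 1
        if unicodedata.name(ch, "").startswith("CJK UNIFIED IDEOGRAPH"):
            han += 1
    return han, assigned

han, assigned = han_share()
print(f"Unicode {unicodedata.unidata_version}: "
      f"{han} of {assigned} assigned code points are Han ideographs "
      f"({han / assigned:.0%})")
```

The count covers unified Han ideographs only; adding hangul syllables and kana would raise the CJK share further.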

There is much controversy among Japanese experts on Chinese characters about the desirability and technical merit of the Han unification process used to map multiple Chinese and Japanese character sets into a single set of unified characters. CJK characters_sentence_20

All three languages can be written both left-to-right and top-to-bottom (right-to-left and top-to-bottom in ancient documents), but are usually considered left-to-right scripts when discussing encoding issues. CJK characters_sentence_21

Legal status CJK characters_section_2

Libraries cooperated on encoding standards for JACKPHY characters in the early 1980s. CJK characters_sentence_22

According to Ken Lunde, the abbreviation "CJK" was a registered trademark of Research Libraries Group (which merged with OCLC in 2006). CJK characters_sentence_23

The trademark owned by OCLC between 1987 and 2009 has now expired. CJK characters_sentence_24

See also CJK characters_section_3

CJK characters_unordered_list_0


Credits to the contents of this page go to the authors of the corresponding Wikipedia page: en.wikipedia.org/wiki/CJK characters.