ISO 639-3

From Wikipedia for FEVERv2

ISO 639-3:2007, Codes for the representation of names of languages – Part 3: Alpha-3 code for comprehensive coverage of languages, is an international standard for language codes in the ISO 639 series. ISO 639-3_sentence_0

It defines three-letter codes for identifying languages. ISO 639-3_sentence_1

The standard was published by the International Organization for Standardization (ISO) on 1 February 2007. ISO 639-3_sentence_2

ISO 639-3 extends the ISO 639-2 alpha-3 codes with an aim to cover all known natural languages. ISO 639-3_sentence_3

The extended language coverage was based primarily on the language codes used in the Ethnologue (volumes 10-14) published by SIL International, which is now the registration authority for ISO 639-3. ISO 639-3_sentence_4

It provides an enumeration of languages as complete as possible, including living and extinct, ancient and constructed, major and minor, written and unwritten. ISO 639-3_sentence_5

However, it does not include reconstructed languages such as Proto-Indo-European. ISO 639-3_sentence_6

ISO 639-3 is intended for use as metadata codes in a wide range of applications. ISO 639-3_sentence_7

It is widely used in computer and information systems, such as the Internet, in which many languages need to be supported. ISO 639-3_sentence_8

In archives and other information storage, it is used in cataloging systems, indicating what language a resource is in or about. ISO 639-3_sentence_9

The codes are also frequently used in the linguistic literature and elsewhere to compensate for the fact that language names may be obscure or ambiguous. ISO 639-3_sentence_10


Language codes ISO 639-3_section_0

Main article: List of ISO 639-3 codes ISO 639-3_sentence_11

ISO 639-3 includes all languages in ISO 639-1 and all individual languages in ISO 639-2. ISO 639-3_sentence_12

ISO 639-1 and ISO 639-2 focused on major languages, most frequently represented in the total body of the world's literature. ISO 639-3_sentence_13

Since ISO 639-2 also includes language collections and Part 3 does not, ISO 639-3 is not a superset of ISO 639-2. ISO 639-3_sentence_14

Where B and T codes exist in ISO 639-2, ISO 639-3 uses the T-codes. ISO 639-3_sentence_15

ISO 639-3_table_general_1

Example ISO language codesISO 639-3_table_caption_1
LanguageISO 639-3_header_cell_1_0_0 639-1ISO 639-3_header_cell_1_0_1 639-2 (B/T)ISO 639-3_header_cell_1_0_2 639-3 typeISO 639-3_header_cell_1_0_3 639-3 codeISO 639-3_header_cell_1_0_4
EnglishISO 639-3_header_cell_1_1_0 enISO 639-3_cell_1_1_1 engISO 639-3_cell_1_1_2 individualISO 639-3_cell_1_1_3 engISO 639-3_cell_1_1_4
GermanISO 639-3_header_cell_1_2_0 deISO 639-3_cell_1_2_1 ger/deuISO 639-3_cell_1_2_2 individualISO 639-3_cell_1_2_3 deuISO 639-3_cell_1_2_4
ArabicISO 639-3_header_cell_1_3_0 arISO 639-3_cell_1_3_1 araISO 639-3_cell_1_3_2 macroISO 639-3_cell_1_3_3 araISO 639-3_cell_1_3_4
individualISO 639-3_cell_1_4_0 arb + othersISO 639-3_cell_1_4_1
ChineseISO 639-3_header_cell_1_5_0 zhISO 639-3_cell_1_5_1 chi/zhoISO 639-3_cell_1_5_2 macroISO 639-3_cell_1_5_3 zhoISO 639-3_cell_1_5_4
MandarinISO 639-3_header_cell_1_6_0 individualISO 639-3_cell_1_6_1 cmnISO 639-3_cell_1_6_2
CantoneseISO 639-3_header_cell_1_7_0 individualISO 639-3_cell_1_7_1 yueISO 639-3_cell_1_7_2
MinnanISO 639-3_header_cell_1_8_0 individualISO 639-3_cell_1_8_1 nanISO 639-3_cell_1_8_2

As of 30 January 2020, the standard contains 7,868 entries. ISO 639-3_sentence_16

The inventory of languages is based on a number of sources including: the individual languages contained in 639-2, modern languages from the Ethnologue, historic varieties, ancient languages and artificial languages from the Linguist List, as well as languages recommended within the annual public commenting period. ISO 639-3_sentence_17

Machine-readable data files are provided by the registration authority. ISO 639-3_sentence_18

Mappings from ISO 639-1 or ISO 639-2 to ISO 639-3 can be done using these data files. ISO 639-3_sentence_19
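
The mapping described above can be sketched as a simple lookup. This is a minimal illustration, not the registration authority's tooling: the tab-separated layout and the column names (`id`, `part2b`, `part2t`, `part1`, `scope`, `name`) are assumptions made for the example, and the rows are taken only from the example table in this article.

```python
import csv
import io

# Hypothetical sample in a tab-separated layout. The column names are
# assumptions for illustration, not the actual header of the data files.
SAMPLE = (
    "id\tpart2b\tpart2t\tpart1\tscope\tname\n"
    "eng\teng\teng\ten\tI\tEnglish\n"
    "deu\tger\tdeu\tde\tI\tGerman\n"
    "ara\tara\tara\tar\tM\tArabic\n"
    "zho\tchi\tzho\tzh\tM\tChinese\n"
    "cmn\t\t\t\tI\tMandarin\n"
)


def build_lookup(tsv_text):
    """Map every ISO 639-1 and ISO 639-2 (B or T) code to its ISO 639-3 code."""
    lookup = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        for column in ("part1", "part2b", "part2t"):
            if row[column]:  # skip languages that have no shorter or older code
                lookup[row[column]] = row["id"]
    return lookup


LOOKUP = build_lookup(SAMPLE)
print(LOOKUP["de"], LOOKUP["ger"], LOOKUP["zh"])  # deu deu zho
```

Both the B code `ger` and the T code `deu` resolve to `deu`, which reflects the rule that ISO 639-3 uses the T-codes. Mandarin (`cmn`) has no entry in the lookup because it has no ISO 639-1 or ISO 639-2 code of its own.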

ISO 639-3 is intended to assume distinctions based on criteria that are not entirely subjective. ISO 639-3_sentence_20

It is not intended to document or provide identifiers for dialects or other sub-language variations. ISO 639-3_sentence_21

Nevertheless, judgments regarding distinctions between languages may be subjective, particularly in the case of language varieties without established literary traditions, usage in education or media, or other factors that contribute to language conventionalization. ISO 639-3_sentence_22

Therefore, the standard should not be regarded as an authoritative statement of what distinct languages exist in the world (about which there may be substantial disagreement in some cases), but rather simply one useful way for identifying different language varieties precisely. ISO 639-3_sentence_23

Code space ISO 639-3_section_1

Since the code is three-letter alphabetic, one upper bound for the number of languages that can be represented is 26 × 26 × 26 = 17,576. ISO 639-3_sentence_24

Since ISO 639-2 defines special codes (4), a reserved range (520) and B-only codes (22), 546 codes cannot be used in part 3. ISO 639-3_sentence_25

Therefore, a stricter upper bound is 17,576 − 546 = 17,030. ISO 639-3_sentence_26

The upper bound gets even stricter if one subtracts the language collections defined in 639-2 and the ones yet to be defined in ISO 639-5. ISO 639-3_sentence_27
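
The arithmetic in this section can be checked with a few lines of Python. The constant names are chosen for this sketch only; the counts are the ones stated above.

```python
import string

# Every three-letter combination of the 26 lowercase letters.
ALL_CODES = len(string.ascii_lowercase) ** 3

SPECIAL = 4      # special codes defined in ISO 639-2
RESERVED = 520   # reserved range
B_ONLY = 22      # codes that exist only as ISO 639-2 B codes

UNAVAILABLE = SPECIAL + RESERVED + B_ONLY
UPPER_BOUND = ALL_CODES - UNAVAILABLE

print(ALL_CODES, UNAVAILABLE, UPPER_BOUND)  # 17576 546 17030
```

The result does not yet subtract the language collections mentioned above, so the true number of usable codes is lower still.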

Macrolanguages ISO 639-3_section_2

Main article: ISO 639 macrolanguage ISO 639-3_sentence_28

There are 58 languages in ISO 639-2 which are considered, for the purposes of the standard, to be "macrolanguages" in ISO 639-3. ISO 639-3_sentence_29

Some of these macrolanguages had no individual language as defined by ISO 639-3 in the code set of ISO 639-2, e.g. 'ara' (Generic Arabic). ISO 639-3_sentence_30

Others like 'nor' (Norwegian) had their two individual parts ('nno' (Nynorsk), 'nob' (Bokmål)) already in ISO 639-2. ISO 639-3_sentence_31

That means some languages (e.g. 'arb', Standard Arabic) that were considered by ISO 639-2 to be dialects of one language ('ara') are now in ISO 639-3 in certain contexts considered to be individual languages themselves. ISO 639-3_sentence_32

This is an attempt to deal with varieties that may be linguistically distinct from each other, but are treated by their speakers as two forms of the same language, e.g. in cases of diglossia. ISO 639-3_sentence_33

For example: ISO 639-3_sentence_34

ISO 639-3_unordered_list_0

  • ara (Generic Arabic, 639-2)ISO 639-3_item_0_0
  • arb (Standard Arabic, 639-3)ISO 639-3_item_0_1

See ISO 639 macrolanguage for the complete list. ISO 639-3_sentence_35
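
One way an application might use the macrolanguage relationship is sketched below. The member lists contain only the codes mentioned in this article and are incomplete for Arabic and Chinese; the function names `macrolanguage_of` and `covers` are hypothetical helpers, not part of any standard API.

```python
# Partial member lists taken only from the examples in this article.
# The Arabic and Chinese macrolanguages have more members than shown here.
MACRO_MEMBERS = {
    "ara": ["arb"],
    "zho": ["cmn", "yue", "nan"],
    "nor": ["nno", "nob"],
}


def macrolanguage_of(code):
    """Return the macrolanguage that contains `code`, or None if not listed."""
    for macro, members in MACRO_MEMBERS.items():
        if code in members:
            return macro
    return None


def covers(requested, actual):
    """True if content tagged `actual` satisfies a request for `requested`."""
    return requested == actual or macrolanguage_of(actual) == requested


print(macrolanguage_of("arb"))  # ara
print(covers("zho", "yue"))     # True
print(covers("yue", "cmn"))     # False
```

The check is deliberately one-directional: a request for the macrolanguage `zho` accepts Cantonese content, but a request for Cantonese does not accept Mandarin.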

Collective languages ISO 639-3_section_3

See also: ISO 639-2 § Collective language codes, and ISO 639-5 ISO 639-3_sentence_36

"A collective language code element is an identifier that represents a group of individual languages that are not deemed to be one language in any usage context." ISO 639-3_sentence_37

These codes do not precisely represent a particular language or macrolanguage. ISO 639-3_sentence_38

While ISO 639-2 includes three-letter identifiers for collective languages, these codes are excluded from ISO 639-3. ISO 639-3_sentence_39

Hence ISO 639-3 is not a superset of ISO 639-2. ISO 639-3_sentence_40

ISO 639-5 defines 3-letter collective codes for language families and groups, including the collective language codes from ISO 639-2. ISO 639-3_sentence_41

Special codes ISO 639-3_section_4

Four codes are set aside in ISO 639-2 and ISO 639-3 for cases where none of the specific codes are appropriate. ISO 639-3_sentence_42

These are intended primarily for applications like databases where an ISO code is required regardless of whether one exists. ISO 639-3_sentence_43

ISO 639-3_unordered_list_1

  • mis (uncoded languages, originally an abbreviation for 'miscellaneous') is intended for languages which have not (yet) been included in the ISO standard.ISO 639-3_item_1_2
  • mul (multiple languages) is intended for cases where the data includes more than one language, and (for example) the database requires a single ISO code.ISO 639-3_item_1_3
  • und (undetermined) is intended for cases where the language in the data has not been identified, such as when it is mislabeled or was never labeled. It is not intended for cases such as Trojan where an unattested language has been given a name.ISO 639-3_item_1_4
  • zxx (no linguistic content / not applicable) is intended for data which is not a language at all, such as animal calls.ISO 639-3_item_1_5

In addition, 520 codes in the range qaa–qtz are 'reserved for local use'. ISO 639-3_sentence_44

For example, the Linguist List uses them for extinct languages. ISO 639-3_sentence_45

Linguist List has assigned one of them a generic value: qnp, unnamed proto-language. ISO 639-3_sentence_46

This is used for proposed intermediate nodes in a family tree that have no name. ISO 639-3_sentence_47
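
The special codes and the local-use range can be recognized with a small classifier. This is a minimal sketch: the function name `classify` and the category labels are assumptions for illustration, and the "other" category makes no claim that a code is actually assigned to a language.

```python
import string
from itertools import product

SPECIAL_CODES = {
    "mis": "uncoded languages",
    "mul": "multiple languages",
    "und": "undetermined",
    "zxx": "no linguistic content",
}


def classify(code):
    """Classify a three-letter code as 'special', 'local use' or 'other'."""
    if len(code) != 3 or not all(ch in string.ascii_lowercase for ch in code):
        raise ValueError(f"not a three-letter lowercase code: {code!r}")
    if code in SPECIAL_CODES:
        return "special"
    # For three lowercase letters, string order matches the range qaa to qtz.
    if "qaa" <= code <= "qtz":
        return "local use"
    return "other"


all_codes = ("".join(p) for p in product(string.ascii_lowercase, repeat=3))
local_use_count = sum(1 for c in all_codes if classify(c) == "local use")

print(classify("und"), "|", classify("qnp"), "|", classify("eng"))
print(local_use_count)  # 520
```

Counting the codes classified as local use reproduces the figure of 520: twenty possible second letters (a to t) times twenty-six third letters.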

Maintenance processes ISO 639-3_section_5

The code table for ISO 639-3 is open to changes. ISO 639-3_sentence_48

In order to protect stability of existing usage, the changes permitted are limited to: ISO 639-3_sentence_49

ISO 639-3_unordered_list_2

  • modifications to the reference information for an entry (including names or categorizations for type and scope),ISO 639-3_item_2_6
  • addition of new entries,ISO 639-3_item_2_7
  • deprecation of entries that are duplicates or spurious,ISO 639-3_item_2_8
  • merging one or more entries into another entry, andISO 639-3_item_2_9
  • splitting an existing language entry into multiple new language entries.ISO 639-3_item_2_10

The code assigned to a language is not changed unless there is also a change in denotation. ISO 639-3_sentence_50

Changes are made on an annual cycle. ISO 639-3_sentence_51

Every request is given a minimum period of three months for public review. ISO 639-3_sentence_52

The ISO 639-3 Web site has pages that describe "scopes of denotation" ( types) and types of languages, which explain what concepts are in scope for encoding and certain criteria that need to be met. ISO 639-3_sentence_53

For example, constructed languages can be encoded, but only if they are designed for human communication and have a body of literature, preventing requests for idiosyncratic inventions. ISO 639-3_sentence_54

The registration authority documents on its Web site the instructions given in the text of the ISO 639-3 standard on how the code tables are to be maintained. ISO 639-3_sentence_55

It also documents the processes used for receiving and processing change requests. ISO 639-3_sentence_56

A change request form is provided, and there is a second form for collecting information about proposed additions. ISO 639-3_sentence_57

Any party can submit change requests. ISO 639-3_sentence_58

When submitted, requests are initially reviewed by the registration authority for completeness. ISO 639-3_sentence_59

When a fully documented request is received, it is added to a published Change Request Index. ISO 639-3_sentence_60

Also, announcements are sent to the general LINGUIST discussion list at Linguist List and other lists the registration authority may consider relevant, inviting public review and input on the requested change. ISO 639-3_sentence_61

Any list owner or individual is able to request notifications of change requests for particular regions or language families. ISO 639-3_sentence_62

Comments that are received are published for other parties to review. ISO 639-3_sentence_63

Based on consensus in comments received, a change request may be withdrawn or promoted to "candidate status". ISO 639-3_sentence_64

Three months prior to the end of an annual review cycle (typically in September), an announcement is sent to the LINGUIST discussion list and other lists regarding Candidate Status Change Requests. ISO 639-3_sentence_65

All requests remain open for review and comment through the end of the annual review cycle. ISO 639-3_sentence_66

Decisions are announced at the end of the annual review cycle (typically in January). ISO 639-3_sentence_67

At that time, requests may be adopted in whole or in part, amended and carried forward into the next review cycle, or rejected. ISO 639-3_sentence_68

Rejections often include suggestions on how to modify proposals for resubmission. ISO 639-3_sentence_69

A public archive of every change request is maintained along with the decisions taken and the rationale for the decisions. ISO 639-3_sentence_70
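
The review process described in this section can be loosely modeled as a set of states and permitted transitions. The sketch below is an interpretation, not the registration authority's own workflow: the state names, the `advance` helper, and the assumption that a carried-forward request can reach the same outcomes in the next cycle are all hypothetical.

```python
from enum import Enum


class Status(Enum):
    SUBMITTED = "submitted"              # received, awaiting completeness review
    INDEXED = "indexed"                  # published in the Change Request Index
    CANDIDATE = "candidate"              # promoted to candidate status
    WITHDRAWN = "withdrawn"
    ADOPTED = "adopted"                  # in whole or in part
    CARRIED_FORWARD = "carried forward"  # amended, moved to the next cycle
    REJECTED = "rejected"


# Assumed transitions, read off the prose above.
DECISIONS = {Status.ADOPTED, Status.CARRIED_FORWARD, Status.REJECTED}
ALLOWED = {
    Status.SUBMITTED: {Status.INDEXED},
    Status.INDEXED: {Status.CANDIDATE, Status.WITHDRAWN},
    Status.CANDIDATE: DECISIONS,
    Status.CARRIED_FORWARD: DECISIONS,  # assumption: decided again next cycle
}


def advance(current, new):
    """Return `new` if the move is permitted, otherwise raise ValueError."""
    if new not in ALLOWED.get(current, set()):
        raise ValueError(f"cannot move from {current.value} to {new.value}")
    return new


status = Status.SUBMITTED
for step in (Status.INDEXED, Status.CANDIDATE, Status.ADOPTED):
    status = advance(status, step)
print(status.value)  # adopted
```

Terminal states such as withdrawn, adopted and rejected have no outgoing transitions, which matches the description of decisions being archived together with their rationale.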

Criticism ISO 639-3_section_6

Linguists Morey, Post and Friedman raise various criticisms of ISO 639, and in particular ISO 639-3: ISO 639-3_sentence_71

ISO 639-3_unordered_list_3

  • The three-letter codes themselves are problematic, because while officially arbitrary technical labels, they are often derived from mnemonic abbreviations for language names, some of which are pejorative. For example, Yemsa was assigned the code jnj, from pejorative "Janejero". These codes may thus be considered offensive by native speakers, but codes in the standard, once assigned, cannot be changed.ISO 639-3_item_3_11
  • The administration of the standard is problematic because SIL is a missionary organization with inadequate transparency and accountability. Decisions as to what deserves to be encoded as a language are made internally. While outside input may or may not be welcomed, the decisions themselves are opaque, and many linguists have given up trying to improve the standard.ISO 639-3_item_3_12
  • Permanent identification of a language is incompatible with language change.ISO 639-3_item_3_13
  • Languages and dialects often cannot be rigorously distinguished, and dialect continua may be subdivided in many ways, whereas the standard privileges one choice. Such distinctions are often based instead on social and political factors.ISO 639-3_item_3_14
  • ISO 639-3 may be misunderstood and misused by authorities that make decisions about people's identity and language, abolishing the right of speakers to identify or identify with their speech variety. Though SIL is sensitive to such issues, this problem is inherent in the nature of an established standard, which may be used (or mis-used) in ways that ISO and SIL do not intend.ISO 639-3_item_3_15

Martin Haspelmath agrees with four of these points, but not the point about language change. ISO 639-3_sentence_72

He disagrees because any account of a language requires identifying it, and we can easily identify different stages of a language. ISO 639-3_sentence_73

He suggests that linguists may prefer to use a codification that is made at the level since "it rarely matters to linguists whether what they are talking about is a language, a dialect or a close-knit family of languages." ISO 639-3_sentence_74

He also questions whether an ISO standard for language identification is appropriate since ISO is an industrial organization, while he views language documentation and nomenclature as a scientific endeavor. ISO 639-3_sentence_75

He cites the original need for standardized language identifiers as having been "the economic significance of translation and software localization," for which purposes the ISO 639-1 and 639-2 standards were established. ISO 639-3_sentence_76

But he raises doubts about industry need for the comprehensive coverage provided by ISO 639-3, including as it does "little-known languages of small communities that are never or hardly used in writing and that are often in danger of extinction". ISO 639-3_sentence_77

Usage ISO 639-3_section_7

ISO 639-3_unordered_list_4

  • EthnologueISO 639-3_item_4_16
  • Linguist ListISO 639-3_item_4_17
  • OLAC: the Open Language Archives CommunityISO 639-3_item_4_18
  • Microsoft Windows 8: Supports all codes in ISO 639-3 at the time of release.ISO 639-3_item_4_19
  • Wikimedia Foundation: New language-based projects (e.g. Wikipedias in new languages) must have an identifier from ISO 639-1, -2, or -3.ISO 639-3_item_4_20
  • Other standards that rely on ISO 639-3:ISO 639-3_item_4_21
    • Language tags as defined by the Internet Engineering Task Force (IETF), as documented in:ISO 639-3_item_4_22
      • BCP 47: Best Current Practice 47, which includesISO 639-3_item_4_23
      • , which superseded , which superseded . (Therefore, all standards which depend on any of these 3 IETF standards now use ISO 639-3.)ISO 639-3_item_4_24
    • The ePub 3.0 standard for language metadata uses Dublin Core Metadata elements. These language metadata elements in ePubs must contain valid codes for languages. RFC5646 points to ISO 639-3 for languages without shorter IANA codes.ISO 639-3_item_4_25
    • Dublin Core Metadata Initiative: DCMI Metadata Term for language, via IETF's (now superseded by ).ISO 639-3_item_4_26
    • Internet Assigned Numbers Authority (IANA): The W3C's internationalization effort recommends the use of the IANA Language Subtag Registry for selecting codes for languages. The IANA Language Subtag Registry depends on ISO 639-3 codes for languages which did not previously have codes in other parts of the ISO 639 standard.ISO 639-3_item_4_27
    • HTML5: via IETF's BCP 47.ISO 639-3_item_4_28
    • MARC library codes.ISO 639-3_item_4_29
    • MODS library codes: Incorporates IETF's (now superseded by ).ISO 639-3_item_4_30
    • Text Encoding Initiative (TEI): via IETF's BCP 47.ISO 639-3_item_4_31
    • Lexical Markup Framework: ISO specification for representation of machine-readable dictionaries.ISO 639-3_item_4_32
    • Unicode's Common Locale Data Repository: Uses several hundred codes from ISO 639-3 not included in ISO 639-2.ISO 639-3_item_4_33

Credits to the contents of this page go to the authors of the corresponding Wikipedia page: 639-3.