Enum: EncodingEnum

Character encoding schemes for text representation in different languages and scripts.

URI: data_sheets_schema:EncodingEnum

Permissible Values

Value Meaning Description
ASCII None American Standard Code for Information Interchange (7-bit, English characters only)
Big5 None Traditional Chinese character encoding (primarily Taiwan and Hong Kong)
EUC-JP None Extended Unix Code for Japanese
EUC-KR None Extended Unix Code for Korean
EUC-TW None Extended Unix Code for Traditional Chinese
GB2312 None Simplified Chinese character encoding standard
HZ-GB-2312 None 7-bit encoding for Simplified Chinese (GB2312)
ISO-2022-CN-EXT None Extended ISO-2022 encoding for Chinese (includes both Simplified and Traditional)
ISO-2022-CN None ISO-2022 encoding for Chinese
ISO-2022-JP-2 None Extended ISO-2022 encoding for Japanese (includes additional character sets)
ISO-2022-JP None ISO-2022 encoding for Japanese
ISO-2022-KR None ISO-2022 encoding for Korean
ISO-8859-10 None Latin-6 (Nordic languages - Danish, Norwegian, Swedish, Icelandic)
ISO-8859-11 None Latin/Thai encoding
ISO-8859-13 None Latin-7 (Baltic Rim languages)
ISO-8859-14 None Latin-8 (Celtic languages)
ISO-8859-15 None Latin-9 (Western European with Euro sign)
ISO-8859-16 None Latin-10 (South-Eastern European languages)
ISO-8859-1 None Latin-1 (Western European languages)
ISO-8859-2 None Latin-2 (Central European languages)
ISO-8859-3 None Latin-3 (South European languages - Turkish, Maltese, Esperanto)
ISO-8859-4 None Latin-4 (North European languages)
ISO-8859-5 None Latin/Cyrillic encoding
ISO-8859-6 None Latin/Arabic encoding
ISO-8859-7 None Latin/Greek encoding
ISO-8859-8 None Latin/Hebrew encoding
ISO-8859-9 None Latin-5 (Turkish)
KOI8-R None Russian character encoding (Kod Obmena Informatsiey)
KOI8-U None Ukrainian character encoding
Shift_JIS None Japanese character encoding (Microsoft and other systems)
UTF-16 None Unicode Transformation Format 16-bit (variable-width encoding)
UTF-32 None Unicode Transformation Format 32-bit (fixed-width encoding)
UTF-7 None Unicode Transformation Format 7-bit (for 7-bit channels)
UTF-8 None Unicode Transformation Format 8-bit (variable-width, most common Unicode encoding)
Windows-1250 None Windows code page for Central European languages
Windows-1251 None Windows code page for Cyrillic script
Windows-1252 None Windows code page for Western European languages
Windows-1253 None Windows code page for Greek
Windows-1254 None Windows code page for Turkish
Windows-1255 None Windows code page for Hebrew
Windows-1256 None Windows code page for Arabic
Windows-1257 None Windows code page for Baltic languages
Windows-1258 None Windows code page for Vietnamese
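
Several of the values above cover overlapping languages but are not byte-compatible, which is why the exact value matters. As a minimal illustration, the sketch below encodes the euro sign under four of the listed encodings. The codec names are Python's spellings and the helper `encode_or_none` is hypothetical; neither is part of the schema.

```python
# Encode the euro sign under several encodings listed in the table.
# Codec names are Python's; the helper name is an assumption of this sketch.
EURO = "\u20ac"  # U+20AC EURO SIGN


def encode_or_none(text, codec):
    """Return the encoded bytes, or None if the codec cannot represent the text."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError:
        return None


results = {
    codec: encode_or_none(EURO, codec)
    for codec in ("iso-8859-1", "iso-8859-15", "windows-1252", "utf-8")
}

for codec, data in results.items():
    print(codec, data)
# iso-8859-1 None
# iso-8859-15 b'\xa4'
# windows-1252 b'\x80'
# utf-8 b'\xe2\x82\xac'
```

ISO-8859-1 has no euro sign at all, while ISO-8859-15 and Windows-1252 place it at different byte values, so data labelled with the wrong one of these three would decode to a different character or fail.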

Slots

Name Description
encoding The character encoding of the data
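
A consumer of the `encoding` slot might first check a value against the permissible values and then try to resolve it to a decoder. The following is a rough sketch using only the Python standard library. The names `PERMISSIBLE_ENCODINGS`, `check_encoding`, and `python_codec_name` are hypothetical, and exact, case-sensitive matching is an assumption of this sketch rather than something the page states.

```python
import codecs

# Permissible values copied from the table above.
# Case-sensitive matching is an assumption of this sketch.
PERMISSIBLE_ENCODINGS = frozenset(
    {
        "ASCII", "Big5", "EUC-JP", "EUC-KR", "EUC-TW", "GB2312", "HZ-GB-2312",
        "ISO-2022-CN-EXT", "ISO-2022-CN", "ISO-2022-JP-2", "ISO-2022-JP",
        "ISO-2022-KR", "KOI8-R", "KOI8-U", "Shift_JIS",
        "UTF-16", "UTF-32", "UTF-7", "UTF-8",
    }
    # ISO-8859 parts 1-11 and 13-16 (there is no part 12 in the enum).
    | {f"ISO-8859-{n}" for n in (*range(1, 12), *range(13, 17))}
    # Windows code pages 1250-1258.
    | {f"Windows-{n}" for n in range(1250, 1259)}
)


def check_encoding(value):
    """Return the value unchanged if it is a permissible value, else raise ValueError."""
    if value not in PERMISSIBLE_ENCODINGS:
        raise ValueError(f"{value!r} is not a permissible EncodingEnum value")
    return value


def python_codec_name(value):
    """Return Python's canonical codec name for a permissible value,
    or None if this Python build has no codec registered under that name."""
    check_encoding(value)
    try:
        return codecs.lookup(value).name
    except LookupError:
        return None


print(python_codec_name("UTF-8"))         # utf-8
print(python_codec_name("Windows-1252"))  # cp1252
```

Returning `None` instead of raising keeps the two concerns separate: a value can be valid for the schema even when the local runtime ships no decoder for it, which may be the case for some of the less common entries.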

Identifier and Mapping Information

Schema Source

  • from schema: https://w3id.org/bridge2ai/data-sheets-schema

LinkML Source

name: EncodingEnum
description: Character encoding schemes for text representation in different languages
  and scripts.
from_schema: https://w3id.org/bridge2ai/data-sheets-schema
rank: 1000
permissible_values:
  ASCII:
    text: ASCII
    description: American Standard Code for Information Interchange (7-bit, English
      characters only).
  Big5:
    text: Big5
    description: Traditional Chinese character encoding (primarily Taiwan and Hong
      Kong).
  EUC-JP:
    text: EUC-JP
    description: Extended Unix Code for Japanese.
  EUC-KR:
    text: EUC-KR
    description: Extended Unix Code for Korean.
  EUC-TW:
    text: EUC-TW
    description: Extended Unix Code for Traditional Chinese.
  GB2312:
    text: GB2312
    description: Simplified Chinese character encoding standard.
  HZ-GB-2312:
    text: HZ-GB-2312
    description: 7-bit encoding for Simplified Chinese (GB2312).
  ISO-2022-CN-EXT:
    text: ISO-2022-CN-EXT
    description: Extended ISO-2022 encoding for Chinese (includes both Simplified
      and Traditional).
  ISO-2022-CN:
    text: ISO-2022-CN
    description: ISO-2022 encoding for Chinese.
  ISO-2022-JP-2:
    text: ISO-2022-JP-2
    description: Extended ISO-2022 encoding for Japanese (includes additional character
      sets).
  ISO-2022-JP:
    text: ISO-2022-JP
    description: ISO-2022 encoding for Japanese.
  ISO-2022-KR:
    text: ISO-2022-KR
    description: ISO-2022 encoding for Korean.
  ISO-8859-10:
    text: ISO-8859-10
    description: Latin-6 (Nordic languages - Danish, Norwegian, Swedish, Icelandic).
  ISO-8859-11:
    text: ISO-8859-11
    description: Latin/Thai encoding.
  ISO-8859-13:
    text: ISO-8859-13
    description: Latin-7 (Baltic Rim languages).
  ISO-8859-14:
    text: ISO-8859-14
    description: Latin-8 (Celtic languages).
  ISO-8859-15:
    text: ISO-8859-15
    description: Latin-9 (Western European with Euro sign).
  ISO-8859-16:
    text: ISO-8859-16
    description: Latin-10 (South-Eastern European languages).
  ISO-8859-1:
    text: ISO-8859-1
    description: Latin-1 (Western European languages).
  ISO-8859-2:
    text: ISO-8859-2
    description: Latin-2 (Central European languages).
  ISO-8859-3:
    text: ISO-8859-3
    description: Latin-3 (South European languages - Turkish, Maltese, Esperanto).
  ISO-8859-4:
    text: ISO-8859-4
    description: Latin-4 (North European languages).
  ISO-8859-5:
    text: ISO-8859-5
    description: Latin/Cyrillic encoding.
  ISO-8859-6:
    text: ISO-8859-6
    description: Latin/Arabic encoding.
  ISO-8859-7:
    text: ISO-8859-7
    description: Latin/Greek encoding.
  ISO-8859-8:
    text: ISO-8859-8
    description: Latin/Hebrew encoding.
  ISO-8859-9:
    text: ISO-8859-9
    description: Latin-5 (Turkish).
  KOI8-R:
    text: KOI8-R
    description: Russian character encoding (Kod Obmena Informatsiey).
  KOI8-U:
    text: KOI8-U
    description: Ukrainian character encoding.
  Shift_JIS:
    text: Shift_JIS
    description: Japanese character encoding (Microsoft and other systems).
  UTF-16:
    text: UTF-16
    description: Unicode Transformation Format 16-bit (variable-width encoding).
  UTF-32:
    text: UTF-32
    description: Unicode Transformation Format 32-bit (fixed-width encoding).
  UTF-7:
    text: UTF-7
    description: Unicode Transformation Format 7-bit (for 7-bit channels).
  UTF-8:
    text: UTF-8
    description: Unicode Transformation Format 8-bit (variable-width, most common
      Unicode encoding).
  Windows-1250:
    text: Windows-1250
    description: Windows code page for Central European languages.
  Windows-1251:
    text: Windows-1251
    description: Windows code page for Cyrillic script.
  Windows-1252:
    text: Windows-1252
    description: Windows code page for Western European languages.
  Windows-1253:
    text: Windows-1253
    description: Windows code page for Greek.
  Windows-1254:
    text: Windows-1254
    description: Windows code page for Turkish.
  Windows-1255:
    text: Windows-1255
    description: Windows code page for Hebrew.
  Windows-1256:
    text: Windows-1256
    description: Windows code page for Arabic.
  Windows-1257:
    text: Windows-1257
    description: Windows code page for Baltic languages.
  Windows-1258:
    text: Windows-1258
    description: Windows code page for Vietnamese.
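
The descriptions for UTF-8, UTF-16, and UTF-32 above distinguish variable-width from fixed-width encodings. A small sketch of what that means in bytes per character is shown below. The function name `encoded_lengths` and the choice of sample characters are assumptions made for illustration, and the little-endian codec variants are used only so that Python does not prepend a byte-order mark to the output.

```python
def encoded_lengths(ch):
    """Return the number of bytes one character occupies in each Unicode encoding form."""
    # "-le" variants avoid the byte-order mark that "utf-16" / "utf-32" would add.
    return {codec: len(ch.encode(codec)) for codec in ("utf-8", "utf-16-le", "utf-32-le")}


samples = {
    "A": "\u0041",               # Latin capital letter A
    "e-acute": "\u00e9",         # Latin small letter e with acute
    "euro": "\u20ac",            # euro sign
    "emoji": "\U0001F600",       # outside the Basic Multilingual Plane
}

for label, ch in samples.items():
    print(label, encoded_lengths(ch))
# A {'utf-8': 1, 'utf-16-le': 2, 'utf-32-le': 4}
# e-acute {'utf-8': 2, 'utf-16-le': 2, 'utf-32-le': 4}
# euro {'utf-8': 3, 'utf-16-le': 2, 'utf-32-le': 4}
# emoji {'utf-8': 4, 'utf-16-le': 4, 'utf-32-le': 4}
```

UTF-8 ranges from one to four bytes and UTF-16 from two to four, while UTF-32 always uses four bytes per code point.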