Enum: EncodingEnum
Character encoding schemes for text representation in different languages and scripts.
URI: data_sheets_schema:EncodingEnum
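To make the idea of a character encoding concrete, the sketch below encodes the same string under a few EncodingEnum values using Python's built-in `str.encode`. It assumes Python 3 and relies on these value strings being accepted as Python codec names, which holds for the values used here. The helper `encode_under` is hypothetical and is not part of the schema.

```python
from typing import Dict, List, Optional


def encode_under(text: str, encodings: List[str]) -> Dict[str, Optional[bytes]]:
    """Encode `text` under each named encoding.

    Returns None for an encoding that cannot represent the text,
    or that Python's standard library does not provide.
    """
    result: Dict[str, Optional[bytes]] = {}
    for name in encodings:
        try:
            result[name] = text.encode(name)
        except (UnicodeEncodeError, LookupError):
            result[name] = None
    return result


if __name__ == "__main__":
    # "é" becomes two bytes in UTF-8, one byte in Latin-1/Windows-1252,
    # and is unrepresentable in 7-bit ASCII (None).
    for name, data in encode_under("café", ["UTF-8", "ISO-8859-1", "Windows-1252", "ASCII"]).items():
        print(f"{name}: {data!r}")
```

The same four characters produce different byte sequences. This is why a dataset's encoding must be recorded in order to read its text back correctly.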
Permissible Values
| Value | Meaning | Description |
|---|---|---|
| ASCII | None | American Standard Code for Information Interchange (7-bit, English characters only) |
| Big5 | None | Traditional Chinese character encoding (primarily Taiwan and Hong Kong) |
| EUC-JP | None | Extended Unix Code for Japanese |
| EUC-KR | None | Extended Unix Code for Korean |
| EUC-TW | None | Extended Unix Code for Traditional Chinese |
| GB2312 | None | Simplified Chinese character encoding standard |
| HZ-GB-2312 | None | 7-bit encoding for Simplified Chinese (GB2312) |
| ISO-2022-CN-EXT | None | Extended ISO-2022 encoding for Chinese (includes both Simplified and Traditional) |
| ISO-2022-CN | None | ISO-2022 encoding for Chinese |
| ISO-2022-JP-2 | None | Extended ISO-2022 encoding for Japanese (includes additional character sets) |
| ISO-2022-JP | None | ISO-2022 encoding for Japanese |
| ISO-2022-KR | None | ISO-2022 encoding for Korean |
| ISO-8859-10 | None | Latin-6 (Nordic languages - Danish, Norwegian, Swedish, Icelandic) |
| ISO-8859-11 | None | Latin/Thai encoding |
| ISO-8859-13 | None | Latin-7 (Baltic Rim languages) |
| ISO-8859-14 | None | Latin-8 (Celtic languages) |
| ISO-8859-15 | None | Latin-9 (Western European with Euro sign) |
| ISO-8859-16 | None | Latin-10 (South-Eastern European languages) |
| ISO-8859-1 | None | Latin-1 (Western European languages) |
| ISO-8859-2 | None | Latin-2 (Central European languages) |
| ISO-8859-3 | None | Latin-3 (South European languages - Turkish, Maltese, Esperanto) |
| ISO-8859-4 | None | Latin-4 (North European languages) |
| ISO-8859-5 | None | Latin/Cyrillic encoding |
| ISO-8859-6 | None | Latin/Arabic encoding |
| ISO-8859-7 | None | Latin/Greek encoding |
| ISO-8859-8 | None | Latin/Hebrew encoding |
| ISO-8859-9 | None | Latin-5 (Turkish) |
| KOI8-R | None | Russian character encoding (Kod Obmena Informatsiey) |
| KOI8-U | None | Ukrainian character encoding |
| Shift_JIS | None | Japanese character encoding (Microsoft and other systems) |
| UTF-16 | None | Unicode Transformation Format 16-bit (variable-width encoding) |
| UTF-32 | None | Unicode Transformation Format 32-bit (fixed-width encoding) |
| UTF-7 | None | Unicode Transformation Format 7-bit (for 7-bit channels) |
| UTF-8 | None | Unicode Transformation Format 8-bit (variable-width, most common Unicode encoding) |
| Windows-1250 | None | Windows code page for Central European languages |
| Windows-1251 | None | Windows code page for Cyrillic script |
| Windows-1252 | None | Windows code page for Western European languages |
| Windows-1253 | None | Windows code page for Greek |
| Windows-1254 | None | Windows code page for Turkish |
| Windows-1255 | None | Windows code page for Hebrew |
| Windows-1256 | None | Windows code page for Arabic |
| Windows-1257 | None | Windows code page for Baltic languages |
| Windows-1258 | None | Windows code page for Vietnamese |
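The permissible values are plain strings, so a consumer often needs to map them onto an encoding library. The following is a minimal sketch that assumes Python's standard `codecs` registry is the target. `ENCODING_ENUM_VALUES`, `python_codec_for`, and `codec_table` are hypothetical names. The tuple is a hand-written copy of the values in the table above, not code generated by LinkML. Current CPython releases do not ship codecs for EUC-TW, ISO-2022-CN, or ISO-2022-CN-EXT, so those values map to None and would need a third-party decoder.

```python
import codecs
from typing import Dict, Optional

# Hypothetical hand-written mirror of EncodingEnum's permissible values
# (not the class that LinkML's Python generator would produce).
ENCODING_ENUM_VALUES = (
    "ASCII", "Big5", "EUC-JP", "EUC-KR", "EUC-TW", "GB2312", "HZ-GB-2312",
    "ISO-2022-CN-EXT", "ISO-2022-CN", "ISO-2022-JP-2", "ISO-2022-JP", "ISO-2022-KR",
    "ISO-8859-10", "ISO-8859-11", "ISO-8859-13", "ISO-8859-14", "ISO-8859-15",
    "ISO-8859-16", "ISO-8859-1", "ISO-8859-2", "ISO-8859-3", "ISO-8859-4",
    "ISO-8859-5", "ISO-8859-6", "ISO-8859-7", "ISO-8859-8", "ISO-8859-9",
    "KOI8-R", "KOI8-U", "Shift_JIS", "UTF-16", "UTF-32", "UTF-7", "UTF-8",
    "Windows-1250", "Windows-1251", "Windows-1252", "Windows-1253", "Windows-1254",
    "Windows-1255", "Windows-1256", "Windows-1257", "Windows-1258",
)


def python_codec_for(value: str) -> Optional[str]:
    """Return Python's canonical codec name for an enum value, or None
    if no codec is registered under that name."""
    try:
        return codecs.lookup(value).name
    except LookupError:
        return None


def codec_table() -> Dict[str, Optional[str]]:
    """Map every enum value to its Python codec name (or None)."""
    return {value: python_codec_for(value) for value in ENCODING_ENUM_VALUES}


if __name__ == "__main__":
    for value, codec in codec_table().items():
        print(f"{value:<16} -> {codec or '(no stdlib codec)'}")
```

Resolving through `codecs.lookup` lets Python's own alias table handle spellings such as `Windows-1252` or `Shift_JIS`. This avoids maintaining a separate mapping by hand.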
Slots
| Name | Description |
|---|---|
| encoding | The character encoding of the data |
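The Slots table shows that this enum constrains the `encoding` slot. The following is a minimal sketch of how a consumer might use that slot to decode a payload. It represents a record as a plain Python mapping and assumes exact, case-sensitive matching against the permissible values. `decode_payload` and the reduced `PERMISSIBLE` set are hypothetical and illustrative only.

```python
from typing import FrozenSet, Mapping, Optional

# Illustrative subset of EncodingEnum permissible values (hypothetical;
# a real check would include every value in the Permissible Values table).
PERMISSIBLE: FrozenSet[str] = frozenset(
    {"ASCII", "UTF-8", "UTF-16", "ISO-8859-1", "Windows-1252", "Shift_JIS"}
)


def decode_payload(
    record: Mapping[str, Optional[str]],
    payload: bytes,
    allowed: FrozenSet[str] = PERMISSIBLE,
) -> str:
    """Decode `payload` with the encoding named in the record's `encoding` slot.

    Raises ValueError if the slot is missing or holds a value outside
    `allowed` (exact, case-sensitive match is assumed).
    """
    value = record.get("encoding")
    if value is None or value not in allowed:
        raise ValueError(f"not a permissible EncodingEnum value: {value!r}")
    # Python's codec registry accepts these value strings as encoding names.
    return payload.decode(value)


if __name__ == "__main__":
    print(decode_payload({"encoding": "Windows-1252"}, b"caf\xe9"))  # café
```

Checking the slot value before decoding separates two failure modes: an invalid metadata value (ValueError here) and bytes that do not match the declared encoding (UnicodeDecodeError from `bytes.decode`).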
Identifier and Mapping Information
Schema Source
- from schema: https://w3id.org/bridge2ai/data-sheets-schema
LinkML Source
name: EncodingEnum
description: Character encoding schemes for text representation in different languages
  and scripts.
from_schema: https://w3id.org/bridge2ai/data-sheets-schema
rank: 1000
permissible_values:
  ASCII:
    text: ASCII
    description: American Standard Code for Information Interchange (7-bit, English
      characters only).
  Big5:
    text: Big5
    description: Traditional Chinese character encoding (primarily Taiwan and Hong
      Kong).
  EUC-JP:
    text: EUC-JP
    description: Extended Unix Code for Japanese.
  EUC-KR:
    text: EUC-KR
    description: Extended Unix Code for Korean.
  EUC-TW:
    text: EUC-TW
    description: Extended Unix Code for Traditional Chinese.
  GB2312:
    text: GB2312
    description: Simplified Chinese character encoding standard.
  HZ-GB-2312:
    text: HZ-GB-2312
    description: 7-bit encoding for Simplified Chinese (GB2312).
  ISO-2022-CN-EXT:
    text: ISO-2022-CN-EXT
    description: Extended ISO-2022 encoding for Chinese (includes both Simplified
      and Traditional).
  ISO-2022-CN:
    text: ISO-2022-CN
    description: ISO-2022 encoding for Chinese.
  ISO-2022-JP-2:
    text: ISO-2022-JP-2
    description: Extended ISO-2022 encoding for Japanese (includes additional character
      sets).
  ISO-2022-JP:
    text: ISO-2022-JP
    description: ISO-2022 encoding for Japanese.
  ISO-2022-KR:
    text: ISO-2022-KR
    description: ISO-2022 encoding for Korean.
  ISO-8859-10:
    text: ISO-8859-10
    description: Latin-6 (Nordic languages - Danish, Norwegian, Swedish, Icelandic).
  ISO-8859-11:
    text: ISO-8859-11
    description: Latin/Thai encoding.
  ISO-8859-13:
    text: ISO-8859-13
    description: Latin-7 (Baltic Rim languages).
  ISO-8859-14:
    text: ISO-8859-14
    description: Latin-8 (Celtic languages).
  ISO-8859-15:
    text: ISO-8859-15
    description: Latin-9 (Western European with Euro sign).
  ISO-8859-16:
    text: ISO-8859-16
    description: Latin-10 (South-Eastern European languages).
  ISO-8859-1:
    text: ISO-8859-1
    description: Latin-1 (Western European languages).
  ISO-8859-2:
    text: ISO-8859-2
    description: Latin-2 (Central European languages).
  ISO-8859-3:
    text: ISO-8859-3
    description: Latin-3 (South European languages - Turkish, Maltese, Esperanto).
  ISO-8859-4:
    text: ISO-8859-4
    description: Latin-4 (North European languages).
  ISO-8859-5:
    text: ISO-8859-5
    description: Latin/Cyrillic encoding.
  ISO-8859-6:
    text: ISO-8859-6
    description: Latin/Arabic encoding.
  ISO-8859-7:
    text: ISO-8859-7
    description: Latin/Greek encoding.
  ISO-8859-8:
    text: ISO-8859-8
    description: Latin/Hebrew encoding.
  ISO-8859-9:
    text: ISO-8859-9
    description: Latin-5 (Turkish).
  KOI8-R:
    text: KOI8-R
    description: Russian character encoding (Kod Obmena Informatsiey).
  KOI8-U:
    text: KOI8-U
    description: Ukrainian character encoding.
  Shift_JIS:
    text: Shift_JIS
    description: Japanese character encoding (Microsoft and other systems).
  UTF-16:
    text: UTF-16
    description: Unicode Transformation Format 16-bit (variable-width encoding).
  UTF-32:
    text: UTF-32
    description: Unicode Transformation Format 32-bit (fixed-width encoding).
  UTF-7:
    text: UTF-7
    description: Unicode Transformation Format 7-bit (for 7-bit channels).
  UTF-8:
    text: UTF-8
    description: Unicode Transformation Format 8-bit (variable-width, most common
      Unicode encoding).
  Windows-1250:
    text: Windows-1250
    description: Windows code page for Central European languages.
  Windows-1251:
    text: Windows-1251
    description: Windows code page for Cyrillic script.
  Windows-1252:
    text: Windows-1252
    description: Windows code page for Western European languages.
  Windows-1253:
    text: Windows-1253
    description: Windows code page for Greek.
  Windows-1254:
    text: Windows-1254
    description: Windows code page for Turkish.
  Windows-1255:
    text: Windows-1255
    description: Windows code page for Hebrew.
  Windows-1256:
    text: Windows-1256
    description: Windows code page for Arabic.
  Windows-1257:
    text: Windows-1257
    description: Windows code page for Baltic languages.
  Windows-1258:
    text: Windows-1258
    description: Windows code page for Vietnamese.
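Several descriptions above depend on code-unit width: ASCII is 7-bit, UTF-7 targets 7-bit channels, UTF-16 is variable-width, and UTF-32 is fixed-width. The sketch below checks those properties on a few sample characters with Python's built-in codecs. It uses the `utf-16-le` and `utf-32-le` codec names so that no byte-order mark is counted. These are Python codec names, not EncodingEnum values. `encoded_lengths` and `is_7bit` are hypothetical helpers.

```python
from typing import Dict


def encoded_lengths(ch: str) -> Dict[str, int]:
    """Bytes needed for `ch` in UTF-8, UTF-16 and UTF-32 (no byte-order mark)."""
    return {
        "UTF-8": len(ch.encode("utf-8")),
        "UTF-16": len(ch.encode("utf-16-le")),
        "UTF-32": len(ch.encode("utf-32-le")),
    }


def is_7bit(data: bytes) -> bool:
    """True if every byte fits in 7 bits, as ASCII and UTF-7 output should."""
    return all(b < 0x80 for b in data)


if __name__ == "__main__":
    # Code points chosen to cross the 1-, 2-, 3- and 4-byte UTF-8 boundaries.
    for ch in ("A", "é", "€", "😀"):
        print(f"U+{ord(ch):04X}", encoded_lengths(ch))
    print(is_7bit("café €".encode("utf-7")))  # True
```

UTF-32 always uses 4 bytes. UTF-16 uses 2 bytes for "€" but 4 for U+1F600, because that code point lies outside the Basic Multilingual Plane and needs a surrogate pair.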