Unicode covers almost all existing character sets. Of the Unicode encoding forms, UTF-8 is the best choice: it is compatible with ASCII, resistant to data corruption, efficient, and easy to process. But first things first.
Encoding forms
Computers handle numbers not as abstract mathematical objects but as combinations of fixed-size storage and processing units, such as bytes and 32-bit words. An encoding standard has to take this into account when it defines how characters are represented by numbers.
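In practice, computer systems represent integers in code units of a fixed size: 8 bits (1 byte), 16 bits, or 32 bits. The Unicode Standard therefore defines three encoding forms, built on 8-, 16-, and 32-bit code units. They are named UTF-8, UTF-16, and UTF-32, respectively. UTF stands for Unicode Transformation Format. All three are equally valid ways to represent Unicode characters, and each has advantages in particular environments.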
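The Unicode encoding forms are designed so that the value ranges of their code units do not overlap. Windows code page 932 is an example of an encoding where they do. Its characters take one or two bytes, and a trail byte can have the same value as a single-byte character. A search for the letter D (byte 44), for example, can also match the second byte of the two-byte character «Д» (bytes 84 44). To tell the two cases apart, a program has to look backward through the text.

In the Unicode encoding forms, the ranges of lead, trail, and single code units do not overlap, so this problem does not arise. Character boundaries can be found by examining nearby code units. In the 8-bit form, for example, every byte of the form 10xxxxxx is a trail (continuation) byte, so a program only has to step back past such bytes to reach the start of a character.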
3 . UTF-8 , – .
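UTF-32 represents every code point as a single 32-bit code unit, so it is a fixed-width form. UTF-16 uses one or two 16-bit code units per code point. UTF-8 uses from 1 to 4 bytes.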
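The difference between the three forms is easiest to see by counting code units for a few characters. This is an illustrative sketch in Python; the function name code_units and the sample characters are assumptions chosen for the example. The "-le" codec variants are used so that Python does not prepend a byte order mark.

```python
def code_units(ch: str) -> dict:
    """Count the code units needed for one character in each encoding form."""
    return {
        "UTF-8": len(ch.encode("utf-8")),            # 8-bit code units
        "UTF-16": len(ch.encode("utf-16-le")) // 2,  # 16-bit code units
        "UTF-32": len(ch.encode("utf-32-le")) // 4,  # 32-bit code units
    }


# One sample each from the 1-, 2-, 3-, and 4-byte UTF-8 ranges.
for ch in ["A", "Д", "€", "😀"]:
    print(f"U+{ord(ch):04X}", code_units(ch))
```

Only the last sample, which lies above U+FFFF, needs two UTF-16 code units; every sample needs exactly one UTF-32 code unit.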
UTF-8 was designed for byte-oriented systems built around ASCII. It is a variable-width encoding form that uses 8-bit code units and leaves ASCII text unchanged.
Compatibility with ASCII
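UTF-8 is byte-for-byte compatible with ASCII in the range 0x00-0x7F. Code points U+0000-U+007F are encoded as the single bytes 0x00-0x7F, so such text in UTF-8 is indistinguishable from ASCII. Moreover, the values 0x00-0x7F never appear in any byte of the encoding of another code point, so there is no ambiguity. Characters outside the ASCII range are encoded as multibyte sequences. Code points U+0800-U+FFFF take three bytes, and code points above U+FFFF take four.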
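These properties can be verified with a few lines of Python. The sketch below is illustrative only; the helper names utf8_length and has_ascii_byte and the sample values are assumptions made for the example.

```python
ascii_text = "Hello, UTF-8"
# Pure ASCII text produces identical bytes in both encodings.
same_bytes = ascii_text.encode("utf-8") == ascii_text.encode("ascii")


def utf8_length(code_point: int) -> int:
    """Number of bytes UTF-8 uses for the given code point."""
    return len(chr(code_point).encode("utf-8"))


def has_ascii_byte(ch: str) -> bool:
    """True if any byte of the UTF-8 encoding of ch lies in 0x00-0x7F."""
    return any(b < 0x80 for b in ch.encode("utf-8"))


# First and last code point of each length range.
samples = [0x0000, 0x007F, 0x0080, 0x07FF, 0x0800, 0xFFFF, 0x10000, 0x10FFFF]
lengths = [utf8_length(cp) for cp in samples]   # [1, 1, 2, 2, 3, 3, 4, 4]

# No byte of a non-ASCII character falls in the ASCII range.
ascii_free = not any(has_ascii_byte(ch) for ch in "Д€😀")
```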
UTF-8 is the preferred encoding for HTML documents.
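XML processors assume UTF-8 by default. For URLs that contain non-ASCII characters, both the W3C and the IETF recommend encoding those characters in UTF-8.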
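For URLs, the usual practice is to encode non-ASCII characters as UTF-8 bytes and then percent-escape each byte. The following sketch uses Python's urllib.parse, whose quote() function uses UTF-8 by default; the path itself is a made-up example.

```python
from urllib.parse import quote, unquote

path = "/wiki/Юникод"   # hypothetical path containing non-ASCII characters

# quote() encodes the text as UTF-8 and percent-escapes every non-ASCII byte;
# "/" is left as-is by default.
encoded = quote(path)

# unquote() reverses the process, decoding the bytes as UTF-8 by default.
decoded = unquote(encoded)

print(encoded)
print(decoded == path)
```

Each Cyrillic letter in the path occupies two bytes in UTF-8, so it turns into two %XX escapes in the encoded URL.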
Many text editors can work with UTF-8, including JEdit, Emacs, BBEdit, Eclipse, and Notepad on Windows.
, . UTF-8 C . , BOM XML.
, 8- , , UTF-8 :
UTF-8- . (, , , ) 3- . UTF-8- . , .
. (BOM, Byte order mark).
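A BOM is not required in UTF-8. UTF-8 is byte-oriented, so byte order is never ambiguous. A BOM can still appear at the start of a file, where it serves only as a signature marking the text as UTF-8. It consists of 3 bytes: EF BB BF (hexadecimal).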
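The three BOM bytes EF BB BF are easy to detect and remove programmatically. Below is a minimal sketch in Python using the standard codecs module; the function name strip_bom is hypothetical and used only for this illustration.

```python
import codecs

BOM = codecs.BOM_UTF8   # the three bytes EF BB BF


def strip_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM, if one is present."""
    return data[len(BOM):] if data.startswith(BOM) else data


with_bom = BOM + "text".encode("utf-8")

# The plain "utf-8" codec keeps the BOM as the character U+FEFF,
# while "utf-8-sig" drops it when decoding.
kept = with_bom.decode("utf-8")         # "\ufefftext"
dropped = with_bom.decode("utf-8-sig")  # "text"
stripped = strip_bom(with_bom)          # b"text"
```

Data without a BOM passes through strip_bom unchanged, so the function is safe to apply to any UTF-8 input.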
UTF-8
In HTML, UTF-8 is declared as follows:
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
In PHP, UTF-8 is set with the header() function:
<?php
error_reporting(-1);
// header() must be called before any output is sent
header('Content-Type: text/html; charset=utf-8');
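For a MySQL connection, UTF-8 is set as follows:
<?php
// $link is assumed to be an open mysqli connection; the old mysql_* functions were removed in PHP 7
// utf8mb4 is MySQL's full 4-byte UTF-8; plain 'utf8' stores at most 3 bytes per character
mysqli_set_charset($link, 'utf8mb4');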
In a CSS file, UTF-8 is declared as follows:
@charset "utf-8";
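Files should be saved as UTF-8 without a BOM. In Dreamweaver, open «Modify – Page Properties – Title/Encoding», choose UTF-8, and make sure that «Include Unicode Signature (BOM)» is not checked.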
UTF-8 can also be selected in Notepad on Windows. Choose «File – Save As...» and select UTF-8 as the encoding.
In Notepad++, to save a file as UTF-8, choose «Encode in UTF-8 without BOM» rather than plain UTF-8.
, , , , . . UTF-8 – , :
UTF-8 , , .