utf8_encode

(PHP 3 >= 3.0.6, PHP 4, PHP 5)

utf8_encode -- Encodes an ISO-8859-1 string to UTF-8

Description

string utf8_encode ( string data )
This function converts the string data from ISO-8859-1 to UTF-8, and returns the encoded version.

UTF-8 is a standard mechanism used by Unicode for encoding wide character values into a byte stream. UTF-8 is transparent to plain ASCII characters, is self-synchronizing (meaning it is possible for a program to figure out where in the byte stream characters start) and can be used with normal string comparison functions for sorting and such.

PHP encodes UTF-8 characters in up to four bytes, like this:
Table 1. UTF-8 encoding

| bytes | bits | representation |
|---|---|---|
| 1 | 7 | 0bbbbbbb |
| 2 | 11 | 110bbbbb 10bbbbbb |
| 3 | 16 | 1110bbbb 10bbbbbb 10bbbbbb |
| 4 | 21 | 11110bbb 10bbbbbb 10bbbbbb 10bbbbbb |

Each b represents a bit that can be
used to store character data.
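
The first two rows of the table are enough to cover every ISO-8859-1 byte, since each one carries at most 8 bits. The sketch below shows how such a conversion could be written by hand, and how the 10bbbbbb pattern of continuation bytes lets a program find character boundaries. The helper names latin1_to_utf8() and count_utf8_chars() are hypothetical and chosen for illustration; this is not how utf8_encode() itself is implemented.

```php
<?php
// Hypothetical helper, not part of PHP: converts ISO-8859-1 bytes to UTF-8
// using the 1-byte and 2-byte rows of the table.
function latin1_to_utf8($data)
{
    $out = '';
    $len = strlen($data);
    for ($i = 0; $i < $len; $i++) {
        $code = ord($data[$i]);
        if ($code < 0x80) {
            // Up to 7 bits: 0bbbbbbb, identical to ASCII.
            $out .= $data[$i];
        } else {
            // 8 bits fit into the 11-bit row: 110bbbbb 10bbbbbb.
            $out .= chr(0xC0 | ($code >> 6)) . chr(0x80 | ($code & 0x3F));
        }
    }
    return $out;
}

// Hypothetical helper: counts characters by skipping continuation bytes
// (10bbbbbb). Every other byte starts a new character.
function count_utf8_chars($utf8)
{
    $count = 0;
    $len = strlen($utf8);
    for ($i = 0; $i < $len; $i++) {
        if ((ord($utf8[$i]) & 0xC0) !== 0x80) {
            $count++;
        }
    }
    return $count;
}

$utf8 = latin1_to_utf8("caf\xE9");  // "café" in ISO-8859-1
echo bin2hex($utf8), "\n";          // 636166c3a9
echo count_utf8_chars($utf8), "\n"; // 4
```

The byte 0xE9 (é) becomes the two bytes 0xC3 0xA9, while the ASCII bytes pass through unchanged.
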
luka8088 at gmail dot com
22-Jun-2007 07:19
simple HTML to UTF-8 conversion:
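function html_to_utf8($data)
{
    // The /e modifier was removed in PHP 7, so a callback is used instead.
    return preg_replace_callback('/&#([0-9]{3,10});/', '_html_to_utf8', $data);
}

function _html_to_utf8($match)
{
    $code = (int) $match[1];
    // Leave ASCII, surrogates and out-of-range values as the original entity.
    if ($code < 128 || $code > 0x10FFFF || ($code >= 0xD800 && $code <= 0xDFFF)) {
        return $match[0];
    }
    // Pick the byte count from the code point range, then fill in the bits.
    if ($code < 0x800) {
        return chr(0xC0 | ($code >> 6))
             . chr(0x80 | ($code & 0x3F));
    }
    if ($code < 0x10000) {
        return chr(0xE0 | ($code >> 12))
             . chr(0x80 | (($code >> 6) & 0x3F))
             . chr(0x80 | ($code & 0x3F));
    }
    return chr(0xF0 | ($code >> 18))
         . chr(0x80 | (($code >> 12) & 0x3F))
         . chr(0x80 | (($code >> 6) & 0x3F))
         . chr(0x80 | ($code & 0x3F));
}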
Example:
echo html_to_utf8("a b &#269; &#263; &#382; &#12371; &#12395; &#12385; &#12431; ()[]{}!#$?* < >");
Output:
a b
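
For comparison, PHP's built-in html_entity_decode() also converts numeric character references when 'UTF-8' is passed as the target encoding. The sketch below is not part of the note above and is only a rough equivalent: unlike html_to_utf8(), it also decodes named entities such as &lt; and references below 128.

```php
<?php
// Two numeric references (U+010D and U+3053) and one named entity.
$input = "&#269; &#12371; &lt;";

// Decode all entities into UTF-8 bytes.
$decoded = html_entity_decode($input, ENT_QUOTES, 'UTF-8');

echo bin2hex($decoded), "\n"; // c48d20e38193203c
```
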