convert_cyr_string (PHP 3 >= 3.0.6, PHP 4, PHP 5) convert_cyr_string --
Converts a string from one Cyrillic character set to another
Description
string convert_cyr_string ( string str, string from, string to )
This function converts the string str from one
Cyrillic character set to another. The from
and to arguments specify the source and target character sets,
respectively, and each consists of a single character. The following
character sets are supported:
k - koi8-r
w - windows-1251
i - iso8859-5
a - x-cp866
d - x-cp866
m - x-mac-cyrillic
Note: This function is binary-safe.
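convert_cyr_string was deprecated in PHP 7.4 and removed in PHP 8.0, so on current versions the same conversion has to go through another extension. The sketch below is one possible stand-in, assuming the mbstring extension is loaded. The function name convert_cyr_sketch and the mapping of the one-letter codes to mbstring encoding names are illustrative choices, and the 'm' (x-mac-cyrillic) code is left out because this sketch does not assume mbstring handles it.

```php
<?php
// Hypothetical helper: emulates convert_cyr_string() for five of the
// one-letter codes by delegating to mb_convert_encoding() (mbstring).
function convert_cyr_sketch(string $str, string $from, string $to): string
{
    // One-letter code => mbstring encoding name (assumed mapping).
    $names = [
        'k' => 'KOI8-R',
        'w' => 'Windows-1251',
        'i' => 'ISO-8859-5',
        'a' => 'CP866',
        'd' => 'CP866',
    ];
    if (!isset($names[$from], $names[$to])) {
        return $str; // unknown code: return the input unchanged
    }
    return mb_convert_encoding($str, $names[$to], $names[$from]);
}

// "Привет" in koi8-r, written as raw bytes so the file encoding does not matter.
$koi = "\xF0\xD2\xC9\xD7\xC5\xD4";
echo bin2hex(convert_cyr_sketch($koi, 'k', 'w')), "\n";
```

The input is given as escaped bytes on purpose: a literal Cyrillic string would be stored in whatever encoding the source file uses, which is rarely koi8-r.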
User Contributed Notes
convert_cyr_string
apoc at ukr dot net
17-Sep-2007 10:43
:) What about the numero sign (№)?
function Utf8Win($str,$type="w")
{
static $conv='';
if (!is_array($conv))
{
$conv = array();
for($x=128;$x<=143;$x++)
{
$conv['u'][]=chr(209).chr($x);
$conv['w'][]=chr($x+112);
}
for($x=144;$x<=191;$x++)
{
$conv['u'][]=chr(208).chr($x);
$conv['w'][]=chr($x+48);
}
$conv['u'][]=chr(208).chr(129); // Ё
$conv['w'][]=chr(168);
$conv['u'][]=chr(209).chr(145); // ё
$conv['w'][]=chr(184);
$conv['u'][]=chr(208).chr(135); // Ї
$conv['w'][]=chr(175);
$conv['u'][]=chr(209).chr(151); // ї
$conv['w'][]=chr(191);
$conv['u'][]=chr(208).chr(134); // І
$conv['w'][]=chr(178);
$conv['u'][]=chr(209).chr(150); // і
$conv['w'][]=chr(179);
$conv['u'][]=chr(210).chr(144); // Ґ
$conv['w'][]=chr(165);
$conv['u'][]=chr(210).chr(145); // ґ
$conv['w'][]=chr(180);
$conv['u'][]=chr(208).chr(132); // Є
$conv['w'][]=chr(170);
$conv['u'][]=chr(209).chr(148); // є
$conv['w'][]=chr(186);
$conv['u'][]=chr(226).chr(132).chr(150); // №
$conv['w'][]=chr(185);
}
if ($type == 'w') { return str_replace($conv['u'],$conv['w'],$str); }
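elseif ($type == 'u') {
// strtr() maps in one pass; str_replace() would re-replace the lead bytes (208, 209) it had just inserted
return strtr($str, array_combine($conv['w'], $conv['u']));
}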
else { return $str; }
}
Vasyl Skotona
14-Sep-2007 09:28
A better function to convert a cp1251 string to UTF-8.
Works with Russian and Ukrainian text.
function unicod($str) {
$conv=array();
for($x=128;$x<=143;$x++) $conv[$x+112]=chr(209).chr($x);
for($x=144;$x<=191;$x++) $conv[$x+48]=chr(208).chr($x);
$conv[184]=chr(209).chr(145); #ё
$conv[168]=chr(208).chr(129); #Ё
$conv[179]=chr(209).chr(150); #і
$conv[178]=chr(208).chr(134); #І
$conv[191]=chr(209).chr(151); #ї
$conv[175]=chr(208).chr(135); #Ї
$conv[186]=chr(209).chr(148); #є
$conv[170]=chr(208).chr(132); #Є
$conv[180]=chr(210).chr(145); #ґ
$conv[165]=chr(210).chr(144); #Ґ
$nstr=''; // initialise the result before appending to it
$ar=str_split($str);
foreach($ar as $b) if(isset($conv[ord($b)])) $nstr.=$conv[ord($b)]; else $nstr.=$b;
return $nstr;
}
Sote Korveziroski
24-May-2006 02:53
I made a mistake; remove this test line:
echo "<p>".ord($xchr)."</p>\n";
The code should look like this:
// Modified by tapin13
// Corrected by Timuretis
// Corrected by Sote for Macedonian Cyrillic
// Convert win-1251 to utf-8
function unicode_mk_cyr($str) {
$encode = "";
for ($ii=0;$ii<strlen($str);$ii++) {
$xchr=substr($str,$ii,1);
if (ord($xchr)>191) {
$xchr=ord($xchr)+848;
$xchr="&#" . $xchr . ";";
}
if(ord($xchr) == 129) {
$xchr = "Ѓ";
}
if(ord($xchr) == 163) {
$xchr = "Ј";
}
if(ord($xchr) == 138) {
$xchr = "Љ";
}
if(ord($xchr) == 140) {
$xchr = "Њ";
}
if(ord($xchr) == 143) {
$xchr = "Џ";
}
if(ord($xchr) == 141) {
$xchr = "Ќ";
}
if(ord($xchr) == 189) {
$xchr = "Ѕ";
}
if(ord($xchr) == 188) {
$xchr = "ј";
}
if(ord($xchr) == 131) {
$xchr = "ѓ";
}
if(ord($xchr) == 190) {
$xchr = "ѕ";
}
if(ord($xchr) == 154) {
$xchr = "љ";
}
if(ord($xchr) == 156) {
$xchr = "њ";
}
if(ord($xchr) == 159) {
$xchr = "џ";
}
if(ord($xchr) == 157) {
$xchr = "ќ";
}
$encode=$encode . $xchr;
}
return $encode;
}
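Note that the function above emits HTML numeric character references rather than raw UTF-8 bytes for the range 192-255. A table-driven variant is sketched below, assuming that numeric references are wanted for the Macedonian letters as well, which keeps the result independent of the encoding the PHP file itself is saved in. The function name win1251_mk_entities and the lookup table are hypothetical; the code points are the Unicode values of the fourteen letters handled above.

```php
<?php
// Hypothetical helper: Win-1251 bytes => HTML numeric character references.
function win1251_mk_entities(string $str): string
{
    // Win-1251 byte => Unicode code point for the Macedonian-specific letters.
    static $extra = [
        129 => 0x0403, // Ѓ
        131 => 0x0453, // ѓ
        138 => 0x0409, // Љ
        140 => 0x040A, // Њ
        141 => 0x040C, // Ќ
        143 => 0x040F, // Џ
        154 => 0x0459, // љ
        156 => 0x045A, // њ
        157 => 0x045C, // ќ
        159 => 0x045F, // џ
        163 => 0x0408, // Ј
        188 => 0x0458, // ј
        189 => 0x0405, // Ѕ
        190 => 0x0455, // ѕ
    ];
    $out = '';
    for ($i = 0, $n = strlen($str); $i < $n; $i++) {
        $byte = ord($str[$i]);
        if ($byte > 191) {
            $out .= '&#' . ($byte + 848) . ';';  // bytes 192..255: fixed offset
        } elseif (isset($extra[$byte])) {
            $out .= '&#' . $extra[$byte] . ';';  // table lookup
        } else {
            $out .= $str[$i];                    // everything else unchanged
        }
    }
    return $out;
}

// "Македонија" as Win-1251 bytes.
echo win1251_mk_entities("\xCC\xE0\xEA\xE5\xE4\xEE\xED\xE8\xBC\xE0"), "\n";
```

A single lookup per byte also avoids calling ord() on a value that has already been turned into a reference string.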
Sote Korveziroski
24-May-2006 10:24
Only this code works for me when converting win-1251 to UTF-8 for Macedonian letters!
// Modified by tapin13
// Corrected by Timuretis
// Corrected by Sote for Macedonian Cyrillic
// Convert win-1251 to utf-8
function unicode_mk_cyr($str) {
$encode = "";
for ($ii=0;$ii<strlen($str);$ii++) {
$xchr=substr($str,$ii,1);
echo "<p>".ord($xchr)."</p>\n";
if (ord($xchr)>191) {
$xchr=ord($xchr)+848;
$xchr="&#" . $xchr . ";";
}
if(ord($xchr) == 129) {
$xchr = "Ѓ";
}
if(ord($xchr) == 163) {
$xchr = "Ј";
}
if(ord($xchr) == 138) {
$xchr = "Љ";
}
if(ord($xchr) == 140) {
$xchr = "Њ";
}
if(ord($xchr) == 143) {
$xchr = "Џ";
}
if(ord($xchr) == 141) {
$xchr = "Ќ";
}
if(ord($xchr) == 189) {
$xchr = "Ѕ";
}
if(ord($xchr) == 188) {
$xchr = "ј";
}
if(ord($xchr) == 131) {
$xchr = "ѓ";
}
if(ord($xchr) == 190) {
$xchr = "ѕ";
}
if(ord($xchr) == 154) {
$xchr = "љ";
}
if(ord($xchr) == 156) {
$xchr = "њ";
}
if(ord($xchr) == 159) {
$xchr = "џ";
}
if(ord($xchr) == 157) {
$xchr = "ќ";
}
$encode=$encode . $xchr;
}
return $encode;
}
zehya [at] yandex dotru
23-Mar-2006 12:15
Sorry for my previous post. The right function is array_flip, not array_reverse. The corrected function:
function Encode($str,$type='u')
{
$conv=array();
for($x=192;$x<=239;$x++)
$conv['u'][chr($x)]=chr(208).chr($x-48);
for($x=240;$x<=255;$x++)
$conv['u'][chr($x)]=chr(209).chr($x-112);
$conv['u'][chr(168)]=chr(208).chr(129); // Ё
$conv['u'][chr(184)]=chr(209).chr(145); // ё
$conv['w']=array_flip($conv['u']);
if($type=='w' || $type=='u')
return strtr($str,$conv[$type]);
else
return $str;
}
Sorry for my English ;)
zehya [at] yandex dotru
23-Mar-2006 11:58
cathody at mail dot ru (27-Jul-2005 06:41):
Your function doesn't work on my PC.
This one works:
function Encode2($str,$type)
{
$conv=array();
for($x=192;$x<=239;$x++)
$conv['u'][chr($x)]=chr(208).chr($x-48);
for($x=240;$x<=255;$x++)
$conv['u'][chr($x)]=chr(209).chr($x-112);
$conv['u'][chr(168)]=chr(208).chr(129); // Ё
$conv['u'][chr(184)]=chr(209).chr(145); // ё
// array_reverse() only reverses the element order; it does not swap keys and values
$conv['w']=array_reverse($conv['u']);
if($type=='w' || $type=='u')
return strtr($str,$conv[$type]);
else
return $str;
}
sidor <sidor at sidor dot nnov dot ru>
09-Mar-2006 09:23
Sorry for my English
A fully working function for converting a string to UTF-8. This implementation supports the main Cyrillic encodings (cp1251, koi8-r, cp866, mac). To support another code page, add it to the $recode array (the codes are UCS-4; add only the second half of the code table). For Cyrillic code pages, the second argument is the same as in convert_cyr_string ('k', 'w', 'a', 'd', 'm').
Written in accordance with RFC 2279.
Created by Andrey A Sidorenko aka sidor
http://sidor.nnov.ru/str2utf.txt
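The linked file is not reproduced here, so the following is only a sketch of the encoding step such a function needs: turning a Unicode code point into its UTF-8 byte sequence. It follows the bit layout from RFC 2279 as restricted by its successor RFC 3629 (at most four bytes) and assumes a valid code point as input. The name codepoint_to_utf8 is hypothetical and is not taken from sidor's code.

```php
<?php
// Hypothetical helper: encode one Unicode code point as UTF-8 bytes.
function codepoint_to_utf8(int $cp): string
{
    if ($cp < 0x80) {                       // 0xxxxxxx
        return chr($cp);
    }
    if ($cp < 0x800) {                      // 110xxxxx 10xxxxxx
        return chr(0xC0 | ($cp >> 6))
             . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {                    // 1110xxxx 10xxxxxx 10xxxxxx
        return chr(0xE0 | ($cp >> 12))
             . chr(0x80 | (($cp >> 6) & 0x3F))
             . chr(0x80 | ($cp & 0x3F));
    }
    // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return chr(0xF0 | ($cp >> 18))
         . chr(0x80 | (($cp >> 12) & 0x3F))
         . chr(0x80 | (($cp >> 6) & 0x3F))
         . chr(0x80 | ($cp & 0x3F));
}

echo bin2hex(codepoint_to_utf8(0x0416)), "\n"; // d096   (Ж)
echo bin2hex(codepoint_to_utf8(0x2116)), "\n"; // e28496 (№)
```

With such a helper, a per-code-page table only has to list the code points for bytes 128-255, which matches the "second half of the code table" mentioned in the note.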
Timuretis
06-Nov-2005 06:56
// Modified by tapin13
// Corrected by Timuretis
// Convert win-1251 to utf-8
function unicode_russian($str) {
$encode = "";
// 1025 = "Ё";
// 1105 = "ё";
for ($ii=0;$ii<strlen($str);$ii++) {
$xchr=substr($str,$ii,1);
if (ord($xchr)>191) {
$xchr=ord($xchr)+848;
$xchr="&#" . $xchr . ";";
}
if(ord($xchr) == 168) {
// $xchr = "Ё";
$xchr = "Ё"; //!!!!!!!!!!!!!!!!!!!!!!!
}
if(ord($xchr) == 184) {
// $xchr = "ё";
$xchr = "ё"; //!!!!!!!!!!!!!!!!!!!!!!
}
$encode=$encode . $xchr;
}
return $encode;
}
tapin13 at atilian dot co dot il
18-Oct-2005 02:20
// Modified by tapin13
// Convert win-1251 to utf-8
function unicode_russian($str) {
$encode = "";
// 1025 = "Ё";
// 1105 = "ё";
for ($ii=0;$ii<strlen($str);$ii++) {
$xchr=substr($str,$ii,1);
if (ord($xchr)>191) {
$xchr=ord($xchr)+848;
$xchr="&#" . $xchr . ";";
}
if(ord($xchr) == 168) {
$xchr = "Ё";
}
if(ord($xchr) == 184) {
$xchr = "ё";
}
$encode=$encode . $xchr;
}
return $encode;
}
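The constant 848 used above is the distance between the Win-1251 byte values 192-255 and the Unicode range U+0410 to U+044F (1040 - 192 = 848); Ё and ё fall outside that run and need their own code points, 1025 and 1105. As a hedged illustration, the references produced this way can be turned into real UTF-8 bytes with html_entity_decode. The helper name win1251_to_utf8_via_entities is hypothetical, and the sketch assumes the input holds only ASCII and Russian letters.

```php
<?php
// Hypothetical helper: Win-1251 Russian text => UTF-8, going through
// numeric character references as the functions above do.
function win1251_to_utf8_via_entities(string $str): string
{
    $refs = '';
    for ($i = 0, $n = strlen($str); $i < $n; $i++) {
        $byte = ord($str[$i]);
        if ($byte > 191) {
            $refs .= '&#' . ($byte + 848) . ';'; // bytes 192..255
        } elseif ($byte === 168) {
            $refs .= '&#1025;';                  // Ё
        } elseif ($byte === 184) {
            $refs .= '&#1105;';                  // ё
        } elseif ($byte === 38) {
            $refs .= '&amp;';                    // protect a literal "&" from decoding
        } else {
            $refs .= $str[$i];                   // assumed to be plain ASCII
        }
    }
    // Decode the references into UTF-8 bytes.
    return html_entity_decode($refs, ENT_QUOTES | ENT_HTML401, 'UTF-8');
}

// "Елка" as Win-1251 bytes.
echo bin2hex(win1251_to_utf8_via_entities("\xC5\xEB\xEA\xE0")), "\n";
```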
webmaster at chassidus dot ru
29-Aug-2005 10:51
// I've also built a Hebrew to UTF-8 converter the same way
function heb2utf($s) {
$t=''; // initialise the result before appending to it
for($i=0, $m=strlen($s); $i<$m; $i++) {
$c=ord($s[$i]);
if ($c<=127) {$t.=chr($c); continue; }
if ($c>=224 ) {$t.=chr(215).chr($c-80); continue; }
}
return $t;
}
// Simple entity encoder and decoder for Hebrew and Russian:
function unicode_hebrew($str) {
$encode="";
for ($ii=0;$ii<strlen($str);$ii++) {
$xchr=substr($str,$ii,1);
if (ord($xchr)>223) {
$xchr=ord($xchr)+1264;
$xchr="&#" . $xchr . ";";
}
$encode=$encode . $xchr;
}
return $encode;
}
function unicode_russian($str) {
$encode="";
for ($ii=0;$ii<strlen($str);$ii++) {
$xchr=substr($str,$ii,1);
if (ord($xchr)>191) {
$xchr=ord($xchr)+848;
$xchr="&#" . $xchr . ";";
}
$encode=$encode . $xchr;
}
return $encode;
}
function decode_unicoded_hebrew($str) {
$decode="";
$ar=explode("&#",$str); // split() was removed in PHP 7.0
foreach ($ar as $value ) {
$in1=strpos($value,";"); //end of code
if ($in1>0) {// unicode
$code=substr($value,0,$in1);
if ($code>=1456 and $code<=1514) { //hebrew
$code=$code-1264;
$xchr=chr($code);
} else { //other unicode
$xchr="&#" . $code . ";";
}
$xchr=$xchr . substr($value,$in1+1);
} else //not unicode
$xchr = $value;
$decode=$decode . $xchr;
}
return $decode;
}
function decode_unicoded_russian($str) {
$decode="";
$ar=explode("&#",$str); // split() was removed in PHP 7.0
foreach ($ar as $value ) {
$in1=strpos($value,";"); //end of code
if ($in1>0) {// unicode
$code=substr($value,0,$in1);
if ($code>=1040 and $code<=1103) {
$code=$code-848;
$xchr=chr($code);
} else {
$xchr="&#" . $code . ";";
}
$xchr=$xchr . substr($value,$in1+1);
} else
$xchr = $value;
$decode=$decode . $xchr;
}
return $decode;
}
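The two decoders above differ only in the code point range and the offset. A compact variant built on preg_replace_callback is sketched below, assuming PHP 7 or later. The function name decode_numeric_refs and its parameter list are hypothetical, and references outside the given range are left untouched, as in the originals.

```php
<?php
// Hypothetical helper: turn "&#N;" references whose code point lies in
// [$low, $high] back into single bytes by subtracting $offset.
function decode_numeric_refs(string $str, int $low, int $high, int $offset): string
{
    return preg_replace_callback(
        '/&#(\d+);/',
        function (array $m) use ($low, $high, $offset): string {
            $code = (int) $m[1];
            if ($code >= $low && $code <= $high) {
                return chr($code - $offset); // single-byte character
            }
            return $m[0];                    // other references stay as they are
        },
        $str
    );
}

// Russian: 1040..1103 => Win-1251 bytes 192..255
echo bin2hex(decode_numeric_refs('&#1055;&#1088; ok &#1488;', 1040, 1103, 848)), "\n";
// Hebrew: 1456..1514 => bytes 192..250 (same range as the function above)
echo bin2hex(decode_numeric_refs('&#1488;', 1456, 1514, 1264)), "\n";
```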
cathody at mail dot ru
27-Jul-2005 06:41
With due credit to the others who wrote convenient UTF-8 to Win-1251 functions: since str_replace accepts arrays as parameters, the function can be rewritten somewhat more efficiently (and the generated array can be kept in a static variable so that it is built only once):
<?php
function Encode ( $str, $type )
{
static $conv='';
if (!is_array ( $conv ))
{
$conv=array ();
for ( $x=128; $x <=143; $x++ )
{
$conv['utf'][]=chr(209).chr($x);
$conv['win'][]=chr($x+112);
}
for ( $x=144; $x <=191; $x++ )
{
$conv['utf'][]=chr(208).chr($x);
$conv['win'][]=chr($x+48);
}
$conv['utf'][]=chr(208).chr(129);
$conv['win'][]=chr(168);
$conv['utf'][]=chr(209).chr(145);
$conv['win'][]=chr(184);
}
if ( $type=='w' )
return str_replace ( $conv['utf'], $conv['win'], $str );
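elseif ( $type=='u' )
// single-pass strtr(): str_replace() would re-replace the lead bytes (208, 209) it had just inserted
return strtr ( $str, array_combine ( $conv['win'], $conv['utf'] ) );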
else
return $str;
}
?>
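One caveat with passing arrays to str_replace: the pairs are applied one after another, so a later pair can match bytes that an earlier pair has just inserted. In the Win-1251 to UTF-8 direction this matters, because the UTF-8 lead bytes 208 and 209 are themselves Win-1251 letters. The sketch below illustrates the effect on a single letter and contrasts it with strtr, which translates in one pass; the two-entry tables are a deliberately reduced assumption, not the full alphabet.

```php
<?php
// Reduced tables: Win-1251 "А" (0xC0) and "Р" (0xD0) with their UTF-8 forms.
$win = ["\xC0", "\xD0"];
$utf = ["\xD0\x90", "\xD0\xA0"];

$input = "\xC0"; // the single letter "А" in Win-1251

// Sequential replacement: "А" becomes D0 90, then the second pair
// matches the freshly inserted D0 and expands it again.
$sequential = str_replace($win, $utf, $input);

// Single-pass translation: each input byte is looked up exactly once.
$singlePass = strtr($input, array_combine($win, $utf));

echo bin2hex($sequential), "\n"; // d0a090 (not valid UTF-8)
echo bin2hex($singlePass), "\n"; // d090   (UTF-8 for "А")
```

The opposite direction, UTF-8 to Win-1251, is less exposed because its search strings are two-byte sequences, but strtr with an associative array avoids the question in both directions.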
artyomch at coolfold dot com
26-Apr-2005 01:38
I needed code that takes a UTF-8 encoded string from the DB and prints it in Win-1251 encoded HTML. The problem was that I had to print not just English and Cyrillic characters, but every character stored in the UTF-8 string (in my case the DB contained English, Russian and Hebrew characters).
After carefully reading the UTF-8 manual, I wrote the following code, which converts all non-Win-1251 characters into HTML entities (&#XXXX;).
function utf8_2_win1251 ($str_src)
{
$str_dst = "";
$i = 0;
while ($i<strlen($str_src))
{
$code_dst = 0;
$code_src1 = ord($str_src[$i]);
$i++;
if ($code_src1<=127)
{
$str_dst .= chr($code_src1);
continue;
}
else
if (($code_src1 & 0xE0) == 0xC0)
{
$code_src2 = ord($str_src[$i++]);
if (($code_src2 & 0xC0) != 0x80)
continue;
$code_dst = ( ($code_src1 & 0x1F) << 6) + ($code_src2 & 0x3F);
}
else
if (($code_src1 & 0xF0) == 0xE0)
{
$code_src2 = ord($str_src[$i++]);
if (($code_src2 & 0xC0) != 0x80)
continue;
$code_src3 = ord($str_src[$i++]);
if (($code_src3 & 0xC0) != 0x80)
continue;
$code_dst = ( ($code_src1 & 0x0F) << 12) + ( ($code_src2 & 0x3F) << 6) + ($code_src3 & 0x3F);
}
else
if (($code_src1 & 0xF8) == 0xF0)
{
$code_src2 = ord($str_src[$i++]);
if (($code_src2 & 0xC0) != 0x80)
continue;
$code_src3 = ord($str_src[$i++]);
if (($code_src3 & 0xC0) != 0x80)
continue;
$code_src4 = ord($str_src[$i++]);
if (($code_src4 & 0xC0) != 0x80)
continue;
$code_dst = ( ($code_src1 & 0x07) << 18) + ( ($code_src2 & 0x3F) << 12) + ( ($code_src3 & 0x3F) << 6) + ($code_src4 & 0x3F); // a four-byte lead byte carries only 3 payload bits
}
else
{
continue;
}
if ($code_dst)
{
if ($code_dst==0x401)
{
$str_dst .= "