soundex
(PHP 3, PHP 4, PHP 5)
soundex -- Calculate the soundex key of a string
Description
string soundex ( string str )
Returns the soundex key of the string str.
Two words that are pronounced similarly produce the same soundex key.
This property can be used, for example, to search a database when you
know how a word is pronounced but not how it is spelled. This function
returns a string of 4 characters that starts with a letter.
This implementation of soundex is the one described by Donald Knuth
in "The Art Of Computer Programming, vol. 3: Sorting And
Searching", Addison-Wesley (1973), pp. 391-392.
Example 1. soundex() examples
<?php
soundex("Euler") == soundex("Ellery"); // E460
soundex("Gauss") == soundex("Ghosh"); // G200
soundex("Hilbert") == soundex("Heilbronn"); // H416
soundex("Knuth") == soundex("Kant"); // K530
soundex("Lloyd") == soundex("Ladd"); // L300
soundex("Lukasiewicz") == soundex("Lissajous"); // L222
?>
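The database-search use mentioned in the description can be sketched roughly as follows. This is only an illustration: the $names array is a hypothetical stand-in for a database column, and the query spelling "Knute" is an assumed misspelling, not part of the manual.

```php
<?php
// Hypothetical list of surnames standing in for a database column.
$names = array("Knuth", "Kant", "Lloyd", "Ladd", "Gauss");

// Index the names by their soundex key.
$index = array();
foreach ($names as $name) {
    $index[soundex($name)][] = $name;
}

// Look up a spelling that is only known by ear.
$query = "Knute";
$key = soundex($query);
$matches = isset($index[$key]) ? $index[$key] : array();

// "Knute" has the key K530, the same as "Knuth" and "Kant".
print_r($matches);
```

Computing the key once per stored name and comparing keys keeps the lookup a plain equality test, which is why the key is often stored alongside the name.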
See also
levenshtein(),
metaphone(), and
similar_text().
shortcut
13-Nov-2006 11:24
One answer to soundex failing when only the first letter differs (klancy vs. clancy) is to always prefix words with the same letter:
aklancy will match aclancy
bklancy will match bclancy
soundex also seems to check only the first two syllables?
e.g. spectacular matches spectacle
Just a thought if you rely on soundex.
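The prefix trick described above could look roughly like this. The helper name prefixed_soundex() and the default prefix "a" are assumptions made for the illustration.

```php
<?php
// Hypothetical helper: soundex of the word with a fixed letter in front,
// so the word's real first letter is encoded as a digit like the rest.
function prefixed_soundex($word, $prefix = "a") {
    return soundex($prefix . $word);
}

var_dump(soundex("klancy") == soundex("clancy"));                   // false: K452 vs C452
var_dump(prefixed_soundex("klancy") == prefixed_soundex("clancy")); // true: both A245
```

Note the trade-off: the real first letter now takes up one of the three digits, so one consonant less from the rest of the word ends up in the key.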
k-
04-Oct-2005 12:25
Since the first letter is included in the phonetic representation in the output, it is worth pointing out that if you want a soundex key to work without the problem of klansy and clansy sounding different, you can take the substring after the first letter: the first letter is the main constant of the word, and the numerical value is that of the phonetic structure of the word.
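A minimal sketch of that idea, assuming a hypothetical helper named soundex_digits() that drops the leading letter and keeps only the three digits:

```php
<?php
// Hypothetical helper: the digits of the soundex key without the leading letter.
function soundex_digits($word) {
    return substr(soundex($word), 1);
}

var_dump(soundex("klansy") == soundex("clansy"));               // false: K452 vs C452
var_dump(soundex_digits("klansy") == soundex_digits("clansy")); // true: both "452"
```

This loosens the match considerably, since any word whose remaining consonants encode to 452 will now compare equal regardless of its first letter.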
crchafer-php at c2se dot com
13-Sep-2005 08:25
Rewritten, maybe -- but the algorithm has some obvious
optimisations which can be done, for example...
function text__soundex( $text ) {
    // one entry per letter A-Z (26 characters); a space means "no digit"
    $k = ' 123 12  22455 12623 1 2 2';
    $nl = strlen( $tN = strtoupper( $text ) );
    $p = trim( $k[ ord( $tS = $tN[0] ) - 65 ] );
    for( $n = 1; $n < $nl; ++$n )
        if( ( $l = trim( $k[ ord( $tN[ $n ] ) - 65 ] ) ) != $p )
            $tS .= ( $p = $l );
    return substr( $tS . '000', 0, 4 );
}
// Notes:
// $k is the $key, essentially $SoundKey inverted
// $tN is the uppercase of the text to be optimised
// $tS is the partially generated output
// $l is the code of the current letter, $p that of the previous one
// $n and $nl are the iteration index and its limit
// 65 is ord('A'), precalculated for speed
// non-ASCII letters and empty strings are not supported
// string offsets use [] because the {} form is no longer valid in PHP 8
(Code has suffered only basic tests, though it appears to
match the output of PHP's soundex(), speed untested --
though this should be /much/ faster than a4_perfect's
rewrite due to the removal of most loops and compares.)
C
2005-09-13
a4_perfect at mail dot ru
01-Aug-2005 06:18
Even when rewritten, the function by [administrator at zinious dot com] is approximately 30 times slower than soundex():
<?php
function MakeSoundEx($stringtomakesoundexof)
{
    $temp_Name = strtoupper($stringtomakesoundexof);
    // Groups 1-6 are the soundex digits; group 7 holds the letters that produce no digit
    $SoundKey = array(1=>"BPFV", "CSKGJQXZ", "DT", "L", "MN", "R", "AEHIOUWY");
    $temp_Last = "";
    $temp_Soundex = substr($temp_Name, 0, 1);
    // Remember the code of the first letter so an immediate repeat is skipped
    for ($x = 1; $x <= sizeof($SoundKey); $x++)
        for ($i = 0; $i < strlen($SoundKey[$x]); $i++)
            if ($temp_Soundex == substr($SoundKey[$x], $i, 1))
                $temp_Last = (string)($x==7?"":$x);
    // Encode the remaining letters (index $n, not $n-1, so the last letter is included)
    for ($n = 1; $n < strlen($temp_Name); $n++)
        if (strlen($temp_Soundex) < 4)
        {
            for ($x = 1; $x <= sizeof($SoundKey); $x++)
                for ($i = 0; $i < strlen($SoundKey[$x]); $i++)
                    if (substr($temp_Name, $n, 1)==substr($SoundKey[$x], $i, 1))
                    {
                        if($x<7 && $temp_Last!=(string)$x)
                            $temp_Soundex = $temp_Soundex.$x;
                        $temp_Last = (string)($x);
                    }
        }
    return $temp_Soundex . str_repeat("0", 4-strlen($temp_Soundex));
}
?>
justin at NO dot blukrew dot SPAM dot com
21-Sep-2004 04:18
I originally looked at soundex() because I wanted to compare how individual letters sounded, so that when pronouncing a string of generated characters it would be easy to distinguish them from each other (i.e., TGDE is hard to distinguish, whereas RFQA is easier to understand). The goal was to generate IDs that could be easily understood with a high degree of accuracy over a radio of varying quality. I quickly figured out that soundex and metaphone wouldn't do this (they work for words), so I wrote the following to help out. The ID generation function iteratively calls chrSoundAlike() to compare each new character with the preceding characters. I'd be interested in receiving any feedback on this. Thanks.
<?php
function chrSoundAlike($char1, $char2, $opts = FALSE) {
    $char1 = strtoupper($char1);
    $char2 = strtoupper($char2);
    $opts = strtoupper($opts);
    switch ($opts) {
        case 'NUMBERS':
            $sets = array(0 => array('A', 'J', 'K'),
                          1 => array('B', 'C', 'D', 'E', 'G', 'P', 'T', 'V', 'Z', '3'),
                          2 => array('F', 'S', 'X'),
                          3 => array('I', 'Y'),
                          4 => array('M', 'N'),
                          5 => array('Q', 'U', 'W'));
            break;
        case 'STRICT':
            $sets = array(0 => array('A', 'J', 'K'),
                          1 => array('B', 'C', 'D', 'E', 'G', 'P', 'T', 'V', 'Z'),
                          2 => array('F', 'S', 'X'),
                          3 => array('I', 'Y'),
                          4 => array('M', 'N'),
                          5 => array('Q', 'U', 'W'));
            break;
        case 'BOTH':
            $sets = array(0 => array('A', 'J', 'K'),
                          1 => array('B', 'C', 'D', 'E', 'G', 'P', 'T', 'V', 'Z', '3'),
                          2 => array('F', 'S', 'X'),
                          3 => array('I', 'Y'),
                          4 => array('M', 'N'),
                          5 => array('Q', 'U', 'W'));
            break;
        default:
            $sets = array(0 => array('A', 'J', 'K'),
                          1 => array('B', 'C', 'D', 'E', 'G', 'P', 'T', 'V', 'Z'),
                          2 => array('F', 'S', 'X'),
                          3 => array('I', 'Y'),
                          4 => array('M', 'N'),
                          5 => array('Q', 'U'));
            break;
    }
    $matchset = array();
    for ($i = 0; $i < count($sets); $i++) {
        if (in_array($char1, $sets[$i])) {
            $matchset = $sets[$i];
        }
    }
    if (in_array($char2, $matchset) OR $char1 == $char2) {
        return TRUE;
    } else {
        return FALSE;
    }
}
?>
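The ID generation function itself is not shown in the note. A rough sketch of how such a generator might work is given below; both soundsAlike() (a compact stand-in for chrSoundAlike() with its default sets, included only so the sketch runs on its own) and generateSpokenId() are hypothetical names, and the shuffle-and-scan approach is an assumption rather than the author's method.

```php
<?php
// Compact stand-in for chrSoundAlike() using its default sets.
// Expects uppercase single letters.
function soundsAlike($a, $b) {
    $sets = array('AJK', 'BCDEGPTVZ', 'FSX', 'IY', 'MN', 'QU');
    if ($a == $b) {
        return true;
    }
    foreach ($sets as $set) {
        if (strpos($set, $a) !== false && strpos($set, $b) !== false) {
            return true;
        }
    }
    return false;
}

// Hypothetical generator: walk a shuffled alphabet and keep each letter
// that does not sound like any letter already in the ID.
function generateSpokenId($length) {
    $letters = str_split('ABCDEFGHIJKLMNOPQRSTUVWXYZ');
    shuffle($letters);
    $id = '';
    foreach ($letters as $c) {
        if (strlen($id) >= $length) {
            break;
        }
        $ok = true;
        for ($i = 0; $i < strlen($id); $i++) {
            if (soundsAlike($c, $id[$i])) {
                $ok = false;
                break;
            }
        }
        if ($ok) {
            $id .= $c;
        }
    }
    return $id;
}

echo generateSpokenId(6), "\n";
```

With these sets only 11 letters can coexist (one from each of the six sets plus H, L, O, R and W), so longer requests return a shorter ID than asked for.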
mail at gettheeawayspam dot iaindooley dot com
10-Jul-2003 11:04
The soundex 'different letter in front' problem can be solved by using levenshtein() on the soundex codes. In my application, which searches a database of album names for entries that match a particular user-provided string, I do the following:
1. Search the database for the exact name
2. Search the database for entries where the name occurs anywhere as a substring
3. Search the database for entries where any of the words in the name (if the user has typed in more than one word) is present, except for little words (and, the, of, etc.)
4. Then, if all this fails, I go to plan B:
- calculate the levenshtein distance (levenshtein()) between the user search term and each of the entries in the database as a percentage of the length of the user search term entered
- calculate the levenshtein distance between the metaphone codes of the user search term entered and each field in the database as a percentage of the length of the metaphone code of the user search term entered
- calculate the levenshtein distance between the soundex codes of the user search term entered and each field in the database as a percentage of the length of the soundex code of the original user search term entered
If any of these percentages is less than 50 (which means that two soundex codes with different first letters will be accepted!), then the entry is accepted as a possible match.
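The "plan B" step above might be sketched as follows. The function names code_distance_percent() and is_possible_match() are hypothetical, and the database access is left out; only the three percentage checks and the threshold of 50 come from the note.

```php
<?php
// Levenshtein distance between two codes, as a percentage of the
// length of the search term's code.
function code_distance_percent($searchCode, $entryCode) {
    if ($searchCode === '') {
        return 100; // nothing to compare against
    }
    return 100 * levenshtein($searchCode, $entryCode) / strlen($searchCode);
}

// Hypothetical check: accept the entry if any of the three percentages
// (raw text, metaphone code, soundex code) is below the threshold.
function is_possible_match($search, $entry, $threshold = 50) {
    $scores = array(
        code_distance_percent($search, $entry),
        code_distance_percent(metaphone($search), metaphone($entry)),
        code_distance_percent(soundex($search), soundex($entry)),
    );
    return min($scores) < $threshold;
}

// K452 vs C452 differ in one of four characters: 25 percent.
var_dump(code_distance_percent(soundex("klancy"), soundex("clancy")));
var_dump(is_possible_match("klancy", "clancy")); // true
```

levenshtein() is case-sensitive, so the raw-text comparison may need both strings lowercased first, depending on how the data is stored.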
php.net AT djwice DoT com
25-Jun-2003 05:01
I made the Soundex in JavaScript.
http://www.vanderharg.nl/soundex.php
An explanation of the algorithm is on the above page.
It returns two values if a name has "van der" or something similar in it: one with the prefix included in the Soundex test and one without.
The use of regular expressions makes the actual soundex algorithm short. I removed two conditions of the algorithm because they are redundant in this implementation.
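For comparison, the two-value idea can be sketched in PHP. This is only an illustration under stated assumptions: the function name and the shortened prefix list are hypothetical, and PHP's built-in soundex() is used instead of the algorithm from the linked page.

```php
<?php
// Hypothetical helper: returns the soundex key of the full name and,
// if the name starts with a known prefix, also the key without it.
function soundex_with_prefix_variants($name) {
    // Shortened example list; longer prefixes first so "van der " wins over "van ".
    $prefixes = array("van der ", "van de ", "van ", "de ");
    $keys = array(soundex($name));
    $lower = strtolower($name);
    foreach ($prefixes as $prefix) {
        if (strncmp($lower, $prefix, strlen($prefix)) === 0) {
            $keys[] = soundex(substr($name, strlen($prefix)));
            break;
        }
    }
    return $keys;
}

print_r(soundex_with_prefix_variants("van der Harg"));
```

Returning both keys lets a search match the name whether or not the prefix was typed.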
<script language="javascript">
var koppelteken = ""; // hyphen separator; may also be "-".
var vv=new Array(
"de la ",
"in het ",
"in 't ",
"op den ",
"op het ",
"op de ",
"op te ",
"op 't ",
"up te",
"uit de ",
"van den ",
"van der ",
"van het ",
"van de ",
"van 't ",
"opte ",
"upte ",
"con ",
"den ",
"der ",
"ten ",
"ter ",
"van ",
"de ",
"di ",
"du ",
"la ",
"le ",
"te ",
"vd ",
"l' ",
"l'",
"'t ");
function removePrefix(name)
{
    i=0;
    var strippedresult = "";
    while ((strippedresult=="")&&(i<vv.length))
    {
        if (name.substr(0,vv[i].length)==vv[i].toUpperCase())
            strippedresult = name.substr(vv[i].length);
        i++;
    }
    return strippedresult;
}
function Soundex(name)
{
    if (name.length>1)
    {
        // convert to uppercase
        name=name.toUpperCase();
        // convert punctuation marks
re = new RegExp ('[