27 07 2010

Someone sent me recently not one but two lengthy comments on Algorithm Season which unfortunately lacked a valid email address. However, I think it is worthy to raise some of the issues that sender puts forth. He/she explains that neither solution scales well because of iterating through the string twice. For a simple string of ten letters, the effect is probably negligible. But, given a string that is one or two million characters long, using either of the suggested algorithms in the previous post would likely pose a real scalability problem. So, what to do?

Sender suggests the following as a superior solution:

```// Algorithm #3
function algo3(\$string) {
\$totals = array();
\$i = 0;
while (isset(\$string{\$i}))
{
\$totals[\$string{\$i}] += 1;
\$i++;
}
}
```

The code sets two variables initially, one an array to hold the totals for occurrence of each letter in \$string and initializes \$i. There is only one loop and it is a while-loop which tests whether for any position \$i in \$string there is a letter. The while loop dynamically creates the totals array by using the set of alphabetical letters in string as indices. If a letter occurs more than once, the corresponding index of \$totals is incremented by 1.

Sender could have had a simple one liner in the body of the loop with a post-increment operator such as:

`\$totals[ \$string{\$i++} ] += 1;`

or even

`\$totals[ \$string{\$i++} ]++;`

Sender’s style is superior to either of these one-liners since the code in the while loop body is far more legible.

When I ran sender’s code with PHP5.3, it worked and was fast, but I got notices for each index that was dynamically created, saying “Undefined index ….” I could screen them out by adjusting the error reporting, except the nagging thought “Isn’t there a better way?” caused me to google for other ideas. I discovered others who are also dealing with this issue at: http://stackoverflow.com/questions/1242184/how-to-get-rid-of-hundreds-of-php-undefined-index-notices

I took the liberty of adding to sender’s code by testing for whether the total’s index is set, as follows:

```function algo3(\$string) {
\$totals = array();
\$i = 0;
while ( isset( \$string{\$i} ) )
{
if (!isset(\$totals[ \$string{\$i}])) {
\$dex = \$string{\$i};
\$totals [ \$dex ] = 0;
}
\$totals[ \$string{\$i} ] += 1;
\$i++;
}
return \$totals;
}
```

Now the code runs without those pestiferous notices. But, one can criticize that I’m calling isset() twice each time through the loop, thereby probably creating an inefficiency, especially if we’re dealing with a string of a million or more letters. Can’t we do better? How about the following:

```function myalg(\$string) {
\$totals = array();
for (\$i=0, \$max = strlen(\$string); \$i < \$max; \$i++ )
{
// get rid of undefined index notices in PHP5.3
if (!isset(\$totals[ \$string{\$i}])) {
\$dex = \$string{\$i};
\$totals [ \$dex ] = 0;
}

\$totals[ \$dex ] += 1;

}
return \$totals;
}
```

Note that strlen() is called only once to initialize \$max in the for loop. The rest of the time isset() is called and called only once.

If you have any comments on this blog’s post or any others, I welcome your comments. If you do comment please be sure to provide a valid email address.

This work is licensed under a .

### Information

This site uses Akismet to reduce spam. Learn how your comment data is processed.