
PHP array not merging


I have code that ranks words from text files. It opens a file and outputs an array showing how many times each word in that file appears. That part works well. The second part is supposed to go through every text file in the given folder and output the total number of times each word appears across all files. The problem is that the output is not a merged total; words are repeated. For instance, I get:

the -- 2
quick -- 1
brown -- 1
fox -- 1
jumped -- 1
over -- 1
lazy -- 1
dog -- 1
dog -- 2
a -- 2
lazy -- 1
fox -- 1
cannot -- 1
catch -- 1
fast -- 1
the -- 1
may -- 1
be -- 1

Instead of:

the -- 3
dog -- 3
fox -- 2
lazy -- 2
a -- 2
quick -- 1
brown -- 1
jumped -- 1
over -- 1
very -- 1
cannot -- 1
catch -- 1
fast -- 1
may -- 1
be -- 1

This is the entire code:

<?php
echo "<h3>Word Rank From One File</h3>";
$counted = strtolower(file_get_contents("docs/one.txt"));
$wordArray = preg_split('/[^a-z]/', $counted, -1, PREG_SPLIT_NO_EMPTY);
$wordFrequencyArray = array_count_values($wordArray);

/* Sort array from higher to lower, keeping keys */
arsort($wordFrequencyArray);

/* grab Top 10, huh sorted? */
$top10words = array_slice($wordFrequencyArray,0,10);

/* display them */
foreach ($top10words as $topWord => $frequency)
    echo "$topWord --  $frequency<br/>";

echo "<h3>Total From All Files</h3>";
$path = realpath('docs');
foreach(glob($path.'/*.*') as $file) {
    $counted = strtolower(file_get_contents($file));
    $wordArray = preg_split('/[^a-z]/', $counted, -1, PREG_SPLIT_NO_EMPTY);
    $wordFrequencyArray = array_count_values($wordArray);
    $combine = array_merge($wordFrequencyArray);
    /* Sort array from higher to lower, keeping keys */
    arsort($wordFrequencyArray);

    /* grab Top 10, huh sorted? */
    $top10words = array_slice($wordFrequencyArray,0,10);

    /* display them */
    foreach ($top10words as $topWord => $frequency)
        echo "$topWord --  $frequency<br/>";
    }

?>

What am I doing wrong, or what am I missing? The two sample text files contain:

The quick brown fox jumped over the lazy dog. The dog that the fox jumped ran so fast afterwards.

and

A lazy fox cannot catch a fast dog. The dog may be very quick.

I also noticed that some words have been skipped.


Solution

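  • You must first aggregate the words from all files, and only then count their frequencies. In the original loop, `array_merge($wordFrequencyArray)` is called with a single array, so nothing is actually merged (and `$combine` is never used), while the sorting and output run once per file. That is why words are repeated.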

    $path = realpath('docs');

    /* Collect the words from every file into one flat list */
    $wordArrayTotal = [];
    foreach (glob($path.'/*.*') as $file) {
        $counted = strtolower(file_get_contents($file));
        $wordArray = preg_split('/[^a-z]/', $counted, -1, PREG_SPLIT_NO_EMPTY);
        $wordArrayTotal = array_merge($wordArrayTotal, $wordArray);
    }

    /* Count once, after all files have been read */
    $wordFrequencyArray = array_count_values($wordArrayTotal);

    /* Sort from highest to lowest count, keeping the word keys */
    arsort($wordFrequencyArray);

    /* Take the top 10 (array_slice preserves string keys) */
    $top10words = array_slice($wordFrequencyArray, 0, 10);

    /* Display them */
    foreach ($top10words as $topWord => $frequency) {
        echo "$topWord --  $frequency<br/>";
    }
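
If you would rather keep the per-file counts (for example, to show both per-file and overall rankings), the counts can also be summed key by key. Note that `array_merge` cannot do this on its own: for string keys, a later value overwrites the earlier one instead of being added to it. The following is only a sketch under some assumptions: the helper name `addCounts` is hypothetical, and the two sample texts are inlined in a `$texts` array so the snippet runs without any files. In the real script the text would come from `file_get_contents` inside the `glob` loop.

```php
<?php
// Hypothetical helper (not from the original code): adds one set of
// word counts onto a running total, summing the values of shared keys.
function addCounts(array $totals, array $counts): array
{
    foreach ($counts as $word => $n) {
        $totals[$word] = ($totals[$word] ?? 0) + $n;
    }
    return $totals;
}

// For comparison: array_merge overwrites string keys, it does not sum them.
$merged = array_merge(['the' => 4], ['the' => 1]); // 'the' ends up as 1

// Assumption: the sample texts are inlined here instead of being read from docs/.
$texts = [
    'The quick brown fox jumped over the lazy dog. The dog that the fox jumped ran so fast afterwards.',
    'A lazy fox cannot catch a fast dog. The dog may be very quick.',
];

$totals = [];
foreach ($texts as $text) {
    $words  = preg_split('/[^a-z]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    $totals = addCounts($totals, array_count_values($words));
}

// Sort from highest to lowest count, keeping the word keys.
arsort($totals);

foreach (array_slice($totals, 0, 10) as $word => $n) {
    echo "$word -- $n\n";
}
```

Both approaches give the same totals. Merging the word lists first is shorter, while summing per-file counts avoids holding every word from every file in memory at once.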