
Search an array for duplicates in PHP


It's been years since I've used PHP and I am more than a little rusty. I am trying to write a quick script that will open a large file, split it into an array, and then look for similar occurrences across the values. For example, the file consists of something like this:

Chapter 1. The Beginning 
 Art. 1.1 The story of the apple
 Art. 1.2 The story of the banana
 Art. 1.3 The story of the pear
Chapter 2. The middle
 Art. 1.1 The apple gets eaten
 Art. 1.2 The banana gets split
 Art. 1.3 Looks like the end for the pear!
Chapter 3. The End
…

I would like the script to tell me automatically that two of the values contain the string "apple", return "Art. 1.1 The story of the apple" and "Art. 1.1 The apple gets eaten", and then do the same for the banana and the pear.

I am not looking to search the array for a specific string; I just need it to count occurrences and report what they are and where they occur.

I have already got the script to open the file and split it into an array; I just can't figure out how to find similar occurrences.

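<?php
$file = fopen("./index.txt", "r");
if ($file === false) {
    die("Could not open ./index.txt\n");
}
$blah = array();
// fgets() returns false at end of file, so test its result instead of feof();
// a while (!feof()) loop can append a spurious false after the last line
while (($line = fgets($file)) !== false) {
    $blah[] = $line;
}
fclose($file);

var_dump($blah);
?>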
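For comparison, PHP's built-in file() function performs the same open, read and split in a single call. The sketch below is a minimal alternative, assuming you want trailing newlines stripped and blank lines dropped; the helper name read_lines is hypothetical.

```php
<?php
// Hypothetical helper: read a text file into an array of lines.
// FILE_IGNORE_NEW_LINES strips the trailing "\n" from each line;
// FILE_SKIP_EMPTY_LINES (only effective together with it) drops blank lines.
function read_lines(string $path): array
{
    $lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        throw new RuntimeException("Could not read $path");
    }
    return $lines;
}
```

With it, the loop in the script above reduces to $blah = read_lines('./index.txt');. Because blank lines are skipped, the array indices then count only non-blank lines and no longer match line numbers in the file.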

Any help would be appreciated.


Solution

  • This solution is not perfect, since it counts every single word in the text, so you may need to modify it to suit your needs. It does, however, give accurate statistics on how many times each word appears in the file and exactly on which rows.

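    <?php
    // Read the file into an array of lines without trailing newlines
    $blah = file('./index.txt', FILE_IGNORE_NEW_LINES);

    $stats = array();
    foreach ($blah as $key => $row) {
        // Split on runs of whitespace; PREG_SPLIT_NO_EMPTY drops the empty
        // "words" that leading spaces would otherwise produce
        $words = preg_split('/\s+/', trim($row), -1, PREG_SPLIT_NO_EMPTY);
        foreach ($words as $word) {
            if (!isset($stats[$word])) {
                $stats[$word] = array('rows' => array(), 'count' => 0);
            }
            $stats[$word]['rows'][] = $key; // 0-based index into $blah
            $stats[$word]['count']++;
        }
    }
    print_r($stats);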
    

I hope this idea helps you get going; you can polish it further to suit your needs!
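To narrow the output to what the question asks for (only words shared by two or more lines, plus the lines themselves), the per-word index can be filtered. This is a minimal sketch under assumptions that are not part of the answer above: matching is case-insensitive, a word is a run of letters only, words shorter than $minLength are ignored, and you supply your own stop-word list. find_shared_words is a hypothetical helper name, and the arrow function needs PHP 7.4 or later.

```php
<?php
// Hypothetical helper: map each word that occurs on two or more lines
// to those lines (keyed by their index in $lines, values trimmed).
function find_shared_words(array $lines, int $minLength = 4, array $stopWords = []): array
{
    $index = [];
    foreach ($lines as $lineNo => $line) {
        // Lowercase, then split on anything that is not a letter,
        // so "pear!" and "Pear" both become "pear".
        $words = preg_split('/[^a-z]+/', strtolower($line), -1, PREG_SPLIT_NO_EMPTY);
        // array_unique: a word repeated within one line counts that line once.
        foreach (array_unique($words) as $word) {
            if (strlen($word) < $minLength || in_array($word, $stopWords, true)) {
                continue;
            }
            $index[$word][$lineNo] = trim($line);
        }
    }
    // Keep only words that appear on at least two different lines.
    return array_filter($index, fn(array $hits) => count($hits) >= 2);
}
```

For the sample lines shown in the question, calling find_shared_words($blah, 4, ['chapter', 'story', 'gets']) should report apple, banana and pear, each mapped to its two matching lines. Without the stop words, frequent words such as "chapter" and "story" are reported too, which is the same noise the word-count approach above runs into.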