php, text, count

PHP: count and separate text from a text file


Let's say the text file contains the following:

<Amanda> Hi there, how are you?
<Jack> Hi, im fine 
.
.
.
.
<Jack> see you later

I want to count the words each user has said. The output should look something like this:

Amanda: 50
Jack: 40

First, I don't want to count the <Amanda> or <Jack> tags themselves. Then I want to count every word each user said and store the totals in the variables Amanda and Jack.

This is what I have tried so far:

    $usercount1 = 0;
    $usercount2 = 0;  

    //Opens a file in read mode  
    $file = fopen("logfile.txt", "r");  
    //Gets each line till end of file is reached  
    while (($line = fgets($file)) !== false) {  
        //Splits each line into words
        $words = explode(" ", $line);  
        $words = explode("<Amanda>", $line);  
        //Counts each word  
        $usercount1 = $usercount1 + count($words);  
    }

    while (($line = fgets($file)) !== false) {  
        //Splits each line into words  
        $words = explode(" ", $line);
        //Counts each word  
        $usercount2 = $usercount2 + count($words);  
    } 

Solution

  • I would take a more general approach, so that you can analyze every user in the log and use a blacklist to exclude the ones you don't want in the result.

    • First, go through all the lines and match the username and the text of each one.
    • Then rebuild the data structure by iterating over the matches and adding up the word counts, skipping blacklisted users.

    The blacklist stores the usernames as array keys, as in the code below, because looking up a key with isset() is faster than searching the values.

    $input = <<<'_TEXT'
    <Amanda> Hi there, how are you?
    <Jack> Hi, im fine
    <Jack> see you later
    <John> Hello World, my friends!
    <Daniel> Foo!
    _TEXT;
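    // Capture the username (group 1) and the message text (group 2) of every line.
    preg_match_all('/^<([^>]+)>(.*?)$/m', $input, $matches);

    // Usernames are stored as keys so that isset() can check them directly.
    $blacklist = ['Amanda' => 1, 'Jack' => 1];
    $words = [];
    foreach ($matches[2] as $index => $match) {
        $user = $matches[1][$index];
        // Skip blacklisted users.
        if (isset($blacklist[$user])) {
            continue;
        }
        // Add this line's word count to the user's running total.
        $words[$user] = ($words[$user] ?? 0) + str_word_count($match);
    }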
    print_r($words);
    
    Array
    (
        [John] => 4
        [Daniel] => 1
    )
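
The snippet above reports only the users who are not on the blacklist. To get the per-user totals the question asks for (Amanda and Jack), the same idea works without a blacklist. Here is a minimal sketch, assuming every relevant line has the form `<Name> message`. `countWordsPerUser` is a hypothetical helper name, not part of the original answer, and the sample lines stand in for the contents of logfile.txt.

```php
<?php
// Hypothetical helper (an assumption, not from the original answer):
// returns an array mapping each username to the number of words they wrote.
function countWordsPerUser(iterable $lines): array
{
    $counts = [];
    foreach ($lines as $line) {
        // Expect "<Name> message"; skip lines that do not match (for example ".").
        if (!preg_match('/^<([^>]+)>(.*)$/', rtrim($line, "\r\n"), $m)) {
            continue;
        }
        // The name sits in $m[1], so only the message text in $m[2] is counted.
        $counts[$m[1]] = ($counts[$m[1]] ?? 0) + str_word_count($m[2]);
    }
    return $counts;
}

// Sample lines; for a real log, file('logfile.txt') returns the lines as an array.
$counts = countWordsPerUser([
    "<Amanda> Hi there, how are you?\n",
    "<Jack> Hi, im fine\n",
    "<Jack> see you later\n",
]);

foreach ($counts as $user => $count) {
    echo "$user: $count\n";
}
```

Because the totals are keyed by username, this handles any number of users in a single pass over the file, instead of one counter variable and one loop per user.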