Search code examples
phpsortingsearchkeywordmeta

Using PHP to find key words in text file arsort()


I am trying to use a script to search a text file and return words that meet certain criteria:

*The word is only listed once *They are not one words in an ignore list *they are the top 10% of the longest words *they are not repeating letters *The final list would be a random ten that met the above criteria. *If any of the above were false then words reported would be null.

I've put together the following but the script dies at arsort() saying it expects an array. Can anyone suggest a change to make arsort work? Or suggest an alternative (simpler) script to find metadata?**I realize this second question may be a question better suited for another StackExchange.

<?php
  $fn="../story_link";
  $str=readfile($fn);
    function top_words($str, $limit=10, $ignore=""){
        if(!$ignore) $ignore = "the of to and a in for is The that on said with be was by"; 
        $ignore_arr = explode(" ", $ignore);
        $str = trim($str);
        $str = preg_replace("#[&].{2,7}[;]#sim", " ", $str);
        $str = preg_replace("#[()°^!\"§\$%&/{(\[)\]=}?´`,;.:\-_\#'~+*]#", " ", $str);
        $str = preg_replace("#\s+#sim", " ", $str);
        $arraw = explode(" ", $str);
        foreach($arraw as $v){
            $v = trim($v);
            if(strlen($v)<3 || in_array($v, $ignore_arr)) continue;
            $arr[$v]++;
        }
        arsort($arr);   
        return array_keys( array_slice($arr, 0, $limit) );
    }
    $meta_keywords = implode(", ", top_words( strip_tags( $html_content ) ) );
?>

Solution

  • The problem is when your loop never increments $arr[$v], which results in the possibility of $arr not becoming defined. This is the reason for your error because then arsort() is given null as its argument - not an array.

    The solution is to define $arr as an array before the loop for instances where $arr[$v]++; isn't executed.

    function top_words($str, $limit=10, $ignore=""){
        if(!$ignore) $ignore = "the of to and a in for is The that on said with be was by"; 
        $ignore_arr = explode(" ", $ignore);
        $str = trim($str);
        $str = preg_replace("#[&].{2,7}[;]#sim", " ", $str);
        $str = preg_replace("#[()°^!\"§\$%&/{(\[)\]=}?´`,;.:\-_\#'~+*]#", " ", $str);
        $str = preg_replace("#\s+#sim", " ", $str);
        $arraw = explode(" ", $str);
        $arr = array(); // Defined $arr here.
        foreach($arraw as $v){
            $v = trim($v);
            if(strlen($v)<3 || in_array($v, $ignore_arr)) continue;
            $arr[$v]++;
        }
        arsort($arr);   
        return array_keys( array_slice($arr, 0, $limit) );
    }