php arrays preg-match strpos

How would I compare two text files for matches with PHP


$domains = file('../../domains.txt');
$keywords = file('../../keywords.txt');

$domains will be in format of:

3kool4u.com,9/29/2013 12:00:00 AM,AUC
3liftdr.com,9/29/2013 12:00:00 AM,AUC
3lionmedia.com,9/29/2013 12:00:00 AM,AUC
3mdprod.com,9/29/2013 12:00:00 AM,AUC
3mdproductions.com,9/29/2013 12:00:00 AM,AUC

keywords will be in format of:

keyword1
keyword2
keyword3

I guess I would really like to build an array of keywords from a file and search each line of domains.txt for matches. I'm not sure where to start, as I'm confused about the difference between preg_match, preg_match_all, and strpos, and when to use one over the other.

Thanks ahead for the help.


Solution

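  • // Empty array to hold each line of the domains file that has a match
    $matches = array();

    // For each line of the domains file
    foreach ($domains as $domain) {

        // For each keyword
        foreach ($keywords as $keyword) {

            // file() keeps the trailing newline on every line, so trim it,
            // and skip blank lines (an empty pattern would match everything)
            $keyword = trim($keyword);
            if ($keyword === '') {
                continue;
            }

            // If the domain line contains the keyword at any position, ignoring case.
            // preg_quote() escapes characters that are special in a regex.
            if (preg_match('/' . preg_quote($keyword, '/') . '/i', $domain)) {
                // Add the domain line to the matches array and stop checking
                // further keywords so the same line is not added twice
                $matches[] = $domain;
                break;
            }
        }
    }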
    

    The $matches array now holds every line of the domains file that matches at least one keyword.
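On the preg_match / preg_match_all / strpos question: since the keywords here are plain strings rather than patterns, a regex is not strictly needed. The following is a minimal sketch, not part of the original answer, that does the same search with stripos() (the case-insensitive variant of strpos()). The helper name find_matching_lines and the sample keywords are made up for illustration. preg_match() stops at the first match and is the one to reach for when the keyword is a real pattern; preg_match_all() collects every occurrence and returns how many it found.

```php
<?php
// Hypothetical helper (not from the original answer): returns the lines of
// $domains that contain at least one keyword, ignoring case.
function find_matching_lines(array $domains, array $keywords): array
{
    $matches = [];
    foreach ($domains as $domain) {
        foreach ($keywords as $keyword) {
            $keyword = trim($keyword);
            if ($keyword === '') {
                continue; // skip blank keyword lines
            }
            // stripos() returns the offset of the first occurrence (which can
            // be 0) or false, so the result must be compared with !== false.
            if (stripos($domain, $keyword) !== false) {
                $matches[] = rtrim($domain, "\r\n");
                break; // one hit is enough for this line
            }
        }
    }
    return $matches;
}

// Domain lines taken from the question; the keywords are invented examples.
$domains = [
    "3kool4u.com,9/29/2013 12:00:00 AM,AUC\n",
    "3lionmedia.com,9/29/2013 12:00:00 AM,AUC\n",
];
$keywords = ["media\n", "kool\n"];
print_r(find_matching_lines($domains, $keywords));

// preg_match_all() returns the number of matches found in the subject.
echo preg_match_all('/3/', $domains[0]), "\n";
```

For literal substrings, stripos() avoids the escaping problem entirely; the regex functions only pay off once the keywords file is allowed to contain patterns.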

    Note that with the previous approach both files are loaded into memory in their entirety. Depending on the file sizes, the script can hit PHP's memory limit, or the OS will start using swap, which is much slower than RAM.

    Here is another, more memory-efficient approach that loads one line of each file at a time.

    <?php
    
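    // Allow automatic detection of line endings (only needed for old Mac "\r"
    // files; this setting is deprecated as of PHP 8.1 and can be dropped there)
    ini_set('auto_detect_line_endings',true);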
    
    //Array that will hold the lines that match
    $matches = array();
    
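    // Open the two files in read mode
    $domains_handle = fopen('../../domains.txt', "r");
    $keywords_handle = fopen('../../keywords.txt', "r");

    // Stop early if either file could not be opened
    if ($domains_handle === false || $keywords_handle === false) {
        exit('Could not open domains.txt or keywords.txt');
    }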
    
        // Iterate over the domains file one line at a time
        while (($domains_line = fgets($domains_handle)) !== false) {
    
            // For each line of the domains file, iterate over the keywords file one line at a time
            while (($keywords_line = fgets($keywords_handle)) !== false) {
    
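                  // Remove any whitespace or newline from both ends of the keyword
                  $trimmed_keyword = trim($keywords_line);

                  // Skip blank lines: an empty pattern would match every domain
                  if ($trimmed_keyword === '') {
                      continue;
                  }

                  // Check whether the domain line contains the keyword at any position,
                  // using a case-insensitive comparison; preg_quote() escapes regex
                  // metacharacters and the "/" delimiter
                  if (preg_match('/' . preg_quote($trimmed_keyword, '/') . '/i', trim($domains_line))) {
                      // Add the domain line and stop scanning keywords so the
                      // same line is not added twice
                      $matches[] = $domains_line;
                      break;
                  }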
            }
            // Move the file pointer back to the beginning of the keywords file for the next domain line
            rewind($keywords_handle);
        }
    
    //Release the resources
    fclose($domains_handle);
    fclose($keywords_handle);
    
    var_dump($matches);
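
Two things in the line-by-line version may be worth adjusting. Re-reading keywords.txt for every domain line costs a lot of I/O, and matching against the whole line means a keyword such as "auc" or "am" would hit every row because of the date and AUC columns. The sketch below is one possible variation, not the original answer's method: it assumes the keywords file is small enough to keep in memory, combines the keywords into a single pattern, and tests only the first comma-separated field. The helper names build_keyword_pattern and match_domains are hypothetical, and php://memory stands in for domains.txt so the example runs on its own.

```php
<?php
// Hypothetical helper: builds one case-insensitive alternation pattern from a
// keyword list, or returns null if the list contains no usable keyword.
function build_keyword_pattern(array $keywords): ?string
{
    $quoted = [];
    foreach ($keywords as $keyword) {
        $keyword = trim($keyword);
        if ($keyword !== '') {
            // Escape regex metacharacters and the "/" delimiter.
            $quoted[] = preg_quote($keyword, '/');
        }
    }
    return $quoted ? '/' . implode('|', $quoted) . '/i' : null;
}

// Hypothetical helper: reads lines from an open handle one at a time and
// returns those whose first CSV field (the domain name) matches $pattern.
function match_domains($handle, string $pattern): array
{
    $matches = [];
    while (($line = fgets($handle)) !== false) {
        $line = rtrim($line, "\r\n");
        // Keep only the text before the first comma.
        $name = explode(',', $line, 2)[0];
        if (preg_match($pattern, $name) === 1) {
            $matches[] = $line;
        }
    }
    return $matches;
}

// Demo: an in-memory stream replaces fopen('../../domains.txt', 'r').
$handle = fopen('php://memory', 'w+');
fwrite($handle, "3kool4u.com,9/29/2013 12:00:00 AM,AUC\n");
fwrite($handle, "3liftdr.com,9/29/2013 12:00:00 AM,AUC\n");
rewind($handle);

// Invented keywords; "auc" appears in every line but in no domain name.
$pattern = build_keyword_pattern(["kool\n", "auc\n"]);
if ($pattern !== null) {
    print_r(match_domains($handle, $pattern));
}
fclose($handle);
```

With real files, the keyword list could come from file('../../keywords.txt') and the handle from fopen('../../domains.txt', 'r'); only the domains file is then streamed, and it is read exactly once.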