
Searching a file for multiple strings and outputting the data


How can I search a .tsv file for multiple matches to a string and export them to a database?

I'm trying to search a large file called mdata.tsv (1.5 million lines) for each of the strings held in an array, and then output selected columns from the matching rows.

This is the code I'm stuck on:

<?php 

$file = fopen("mdata.tsv","r"); //open file
$movies = glob('./uploads/Videos/*/*/*/*.mp4', GLOB_BRACE); //Find all the movies
$movID = array(); //Array for movies IDs
//Get XML and add the IDs to $movID()
foreach ($movies as $movie){ 
    $pos = strrpos($movie, '/');
    $xml = simplexml_load_file((substr($movie, 0, $pos + 1) .'movie.xml'));
    array_push($movID, $xml->id);

}

//Loop through the TSV rows and search for the $tmdbID then print out the movies category.
foreach ($movID as $tmdbID) { 
    while(($row = fgetcsv($file, 0, "\t")) !== FALSE) {
        fseek($file,0);
        $myString = $row[0];

        $b = strstr( $myString, $tmdbID );
        //Dump out the row for the sake of clarity.
        //var_dump($row);
        $myString = $row[0];
        if ($b == $tmdbID){
            echo 'Match ' . $row[0] .' '. $row[8];
        }       // Displays movie ID and category
    }
    }

fclose($file);

?>

Example of the TSV file:

tt0043936   movie   The Lawton Story    The Lawton Story    0   1949    \N  \N  Drama,Family
tt0043937   short   The Prize Pest  The Prize Pest  0   1951    \N  7   Animation,Comedy,Family
tt0043938   movie   The Prowler The Prowler 0   1951    \N  92  Drama,Film-Noir,Thriller
tt0043939   movie   Przhevalsky Przhevalsky 0   1952    \N  \N  Biography,Drama

Solution

  • You can simplify this code by replacing the nested loops with a single pass over the file, using in_array() to check whether the ID in the current row is in the list of required IDs. The one change needed for this to work is to store strings in the $movID array: $xml->id is a SimpleXMLElement object rather than a string, so cast it with (string) before storing it.

    $file = fopen("mdata.tsv","r"); //open file
    $movies = glob('./uploads/Videos/*/*/*/*.mp4', GLOB_BRACE); //Find all the movies
    $movID = array(); //Array for movies IDs
    //Get XML and add the IDs to $movID()
    foreach ($movies as $movie){
        $pos = strrpos($movie, '/');
        $xml = simplexml_load_file((substr($movie, 0, $pos + 1) .'movie.xml'));
        // Store ID as string
        $movID[] = (string) $xml->id;
    }
    
    // Single pass over the TSV: check each row's ID against the list.
    while(($row = fgetcsv($file, 0, "\t")) !== FALSE) {
        if ( in_array($row[0], $movID) ){
            echo 'Match ' . $row[0] .' '. $row[8]; // Displays movie ID and category
        }
    }

    fclose($file);
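
With 1.5 million rows, in_array() scans the whole ID list once per row, which may become slow if the list is long. The following is a minimal sketch of one possible variation, not part of the original answer: it flips the ID list into array keys so that each row needs only a keyed lookup. The function name findMatchingRows() is hypothetical, and the sketch assumes the IDs have already been cast to strings as described above.

```php
<?php
// Hypothetical helper (an illustration, not from the original answer):
// returns the rows of a TSV file whose first column is one of the wanted IDs.
function findMatchingRows(string $tsvPath, array $wantedIds): array
{
    // array_flip() turns the IDs into array keys, so isset() can test
    // membership with a keyed lookup instead of scanning the list per row.
    // It only accepts string or integer values, hence the (string) cast.
    $lookup = array_flip($wantedIds);

    $file = fopen($tsvPath, 'r');
    if ($file === false) {
        throw new RuntimeException("Cannot open $tsvPath");
    }

    $matches = [];
    // Enclosure and escape are passed explicitly so the call does not
    // rely on default argument values.
    while (($row = fgetcsv($file, 0, "\t", '"', '\\')) !== false) {
        // fgetcsv() returns [null] for a blank line, so check $row[0] first.
        if (isset($row[0], $lookup[$row[0]])) {
            $matches[] = $row;
        }
    }
    fclose($file);

    return $matches;
}
```

Calling findMatchingRows('mdata.tsv', $movID) would return the matching rows, from which $row[0] and $row[8] can be printed as in the loop above. Whether this is worth the change depends on how many IDs are in the list; for a handful of movies, in_array() is likely fine.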