How can I search a .tsv file for multiple matches to a string and export them to a database?
I'm trying to search a large file called mdata.tsv
(about 1.5 million lines) for each ID string held in an array, and then output selected columns from the matching rows.
This is the code I'm stuck on:
<?php
$file = fopen("mdata.tsv","r"); //open file
$movies = glob('./uploads/Videos/*/*/*/*.mp4', GLOB_BRACE); //Find all the movies
$movID = array(); //Array for movies IDs
//Get XML and add the IDs to $movID()
foreach ($movies as $movie){
$pos = strrpos($movie, '/');
$xml = simplexml_load_file((substr($movie, 0, $pos + 1) .'movie.xml'));
array_push($movID, $xml->id);
}
//Loop through the TSV rows and search for the $tmdbID then print out the movies category.
foreach ($movID as $tmdbID) {
while(($row = fgetcsv($file, 0, "\t")) !== FALSE) {
fseek($file,0);
$myString = $row[0];
$b = strstr( $myString, $tmdbID );
//Dump out the row for the sake of clarity.
//var_dump($row);
$myString = $row[0];
if ($b == $tmdbID){
echo 'Match ' . $row[0] .' '. $row[8];
} // Displays movie ID and category
}
}
fclose($file);
?>
Example of tsv file:
tt0043936 movie The Lawton Story The Lawton Story 0 1949 \N \N Drama,Family
tt0043937 short The Prize Pest The Prize Pest 0 1951 \N 7 Animation,Comedy,Family
tt0043938 movie The Prowler The Prowler 0 1951 \N 92 Drama,Film-Noir,Thriller
tt0043939 movie Przhevalsky Przhevalsky 0 1952 \N \N Biography,Drama
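It looks as though you can simplify this code by using in_array()
instead of the nested loops to check whether the ID on the current line is in the list of required IDs, so the file only has to be read once. In the original, the fseek($file,0) inside the while loop also rewinds the file after every row, so the loop never gets past the first line. The one change needed to make this work is to store strings in the $movID
array: $xml->id is a SimpleXMLElement object, so it has to be cast to a string first.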
$file = fopen("mdata.tsv","r"); //open file
$movies = glob('./uploads/Videos/*/*/*/*.mp4', GLOB_BRACE); //Find all the movies
$movID = array(); //Array for movies IDs
//Get XML and add the IDs to $movID()
foreach ($movies as $movie){
    $pos = strrpos($movie, '/');
    $xml = simplexml_load_file((substr($movie, 0, $pos + 1) .'movie.xml'));
    // Store ID as string
    $movID[] = (string) $xml->id;
}
//Read the TSV once, row by row, and check each ID against the list
while(($row = fgetcsv($file, 0, "\t")) !== FALSE) {
    if ( in_array($row[0], $movID) ){
        echo 'Match ' . $row[0] .' '. $row[8];
    } // Displays movie ID and category
}
fclose($file);
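If the list of IDs grows large, in_array() has to scan the whole list for each of the 1.5 million rows. A possible variation, sketched below, is to flip the ID list into array keys and test with isset() instead. This is not part of the code above: findMatches() is a hypothetical helper name, and the sketch assumes the ID is in column 0 and the genres in column 8, as in the sample rows. It uses an in-memory stream in place of mdata.tsv so it can be run on its own.

```php
<?php
// Hypothetical helper (an assumption for illustration): returns an array of
// ID => genres for every TSV row whose first column is one of $ids.
function findMatches($handle, array $ids): array
{
    // Flip the list so the IDs become array keys; isset() on a key is a
    // hash lookup rather than a scan through the whole list.
    $wanted = array_flip($ids);
    $matches = [];

    while (($row = fgetcsv($handle, 0, "\t", '"', '\\')) !== false) {
        // Skip blank lines and rows too short to hold the genre column.
        if (!isset($row[0], $row[8])) {
            continue;
        }
        if (isset($wanted[$row[0]])) {
            $matches[$row[0]] = $row[8];
        }
    }
    return $matches;
}

// Stand-in for mdata.tsv: two of the sample rows written to memory.
$handle = fopen('php://memory', 'w+');
fwrite($handle, "tt0043936\tmovie\tThe Lawton Story\tThe Lawton Story\t0\t1949\t\\N\t\\N\tDrama,Family\n");
fwrite($handle, "tt0043938\tmovie\tThe Prowler\tThe Prowler\t0\t1951\t\\N\t92\tDrama,Film-Noir,Thriller\n");
rewind($handle);

// Stand-in for movie.xml; the cast to string is still required here,
// because array_flip() only accepts string and integer values.
$xml = simplexml_load_string('<movie><id>tt0043938</id></movie>');
$ids = [(string) $xml->id];

$found = findMatches($handle, $ids);
fclose($handle);

foreach ($found as $id => $genres) {
    echo 'Match ' . $id . ' ' . $genres . PHP_EOL;
}
```

With only a handful of movies the difference from in_array() is unlikely to matter; the keyed lookup mainly helps when $movID holds thousands of IDs.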