I am working in a project where it is required to download a big file from a remote server which contains pipe separated data for about 5 million records.
After downloading is finished it's needed to load data into a database. Currently I have been working with MySQL database so I have not considered other options.
I use object oriented MySQLi and I call a LOAD DATA LOCAL INFILE
query.
It takes between an hour and half and 2 hours to finish and I require a way to display progress and the only option I found was set_local_infile_handler
method. Seems this method's purpose is to allow programmer to change the formatting of data before feeding it to the query, but being the only option I had found I want to use it for my progress purpose.
However all I get is:
PHP Fatal error: Uncaught Error: Call to undefined method mysqli::set_local_infile_handler() in C:\Repositories\project\tools\loaddata.php:65
I am asking for:
mysqli::set_local_infile_handler
in my codeI have tried a few ideas (only with a 100 thousand records):
new mysqli()
to mysqli_init()
and mysqli::real_connect()
but the only reason is because all examples with mysqli::set_local_infile_handler
use the latter form.mysqli.allow_local_infile = On
at php.ini
file, however I had no trouble running the code with the local infile query before that, but I expected that might make the troubling method visible. I stopped and started the server by the way.I called a second query
SHOW STATUS WHERE Variable_name in ('bytes_received','bytes_sent','innodb_buffer_pool_pages_data','innodb_buffer_pool_bytes_data','innodb_buffer_pool_pages_flushed','innodb_buffer_pool_read_requests','innodb_buffer_pool_reads','innodb_buffer_pool_write_requests','innodb_data_read','innodb_data_reads','innodb_data_writes','innodb_data_written','innodb_rows_inserted')
.
I kept all those variables because I haven't looked after each one exact meaning, I noticed only those variables changed meaningfully, but seems the last one might be enough. However I do this with separate files, I don't know yet if I can have two connections running in the same code and two threads to have the slowest running in background. In fact I was browsing for php mysqli callback function when I found mysqli::set_local_infile_handler
reference.
I have been with this the whole day so I must had tried some other stuff but either I don't remember them or they are irrelevant for the moment. Thanks in advance.
I dont see bugs in code, it works without local infile methods, but here it is:
require_once("connectvars.php");
$filepath = $_SERVER["argv"][1];
$bloqIdx=0;
$conn = mysqli_init();
$conn->real_connect($mysvr,$myusr,$mypwd,$mydb);
// $conn = new mysqli($mysvr,$myusr,$mypwd,$mydb);
if ($conn->connect_error) {
trigger_error("SQL".$conn->connect_error,E_USER_ERROR);
die("Connect Error");
}
function countData($stream, &$buffer, $buflen, &$errmsg) {
global $bloqIdx;
$len = strlen($buffer);
if ($bloqIdx%1000==0) echo ".";
return $len;
}
function getRowsInserted() {
global $conn;
$result = $conn->query("show status where Variable_name='innodb_rows_inserted')";
$rowsInserted=0;
if ($result && $conn->affected_rows>0 && $row = $result->fetch_assoc()) $rowsInserted=$row["Value"];
$result->close();
return $rowsInserted;
}
$conn->query("truncate mytable");
$riStart = getRowsInserted();
$start = time();
$query = "LOAD DATA LOCAL INFILE '$filepath' INTO TABLE mytable FIELDS TERMINATED by '|' LINES TERMINATED BY '\n'";
$conn->set_local_infile_handler("countData");
$conn->query($query);
$conn->set_local_infile_default();
$minutes = abs(time() - $start)/60;
$riEnd = getRowsInserted()-$riStart;
echo "Finished Loading $riEnd rows from $filepath for ".round($minutes,2)." minutes\n";
$conn->close();
By commenting lines:
$conn->set_local_infile_handler("countData");
and
$conn->set_local_infile_default();
code works but there is no progress info.
At http://php.net/manual/en/mysqli.set-local-infile-handler.php valid versions apparently are (PHP 5, PHP 7)
If you really need to test the code the text file as argument can be something like:
1|one|alpha|C|2012-10-21 17:44:18
2|two|beta|C|2013-02-05 12:23:57
3|three|gamma|C|2012-12-10 07:18:09
4|four|delta|X|2012-11-27 11:51:32
5|five|phi|C|2013-01-07 14:03:29
And table script:
create table `mytable` (
`id` INT NOT NULL,
`num` CHAR(10) NULL,
`code` CHAR(13) NULL,
`status` CHAR(1) NULL,
`registered` DATETIME NULL,
INDEX `mycode` (`code` ASC, `registered` ASC),
PRIMARY KEY (`id`))
ENGINE = InnoDB
DEFAULT CHARACTER SET = utf8
COLLATE = utf8_bin;
You could just separate these into 2 different processes.
The first process is run every minute by a cron job and looks for the presence of some sort of trigger (like a tmp file). When it finds the trigger, it deletes the trigger (tmp file), downloads the large file, renames the file to a format that contains the total number of rows to be inserted, then begins inserting the rows into the database.
The second process is the gui. It provides the user with a button that creates the trigger (tmp file), then periodically checks for the existence of the download file. Once the download file is renamed to the given format by the first process, it can parse the filename for the total number of rows, then periodically query the database to determine the existing number of rows and present that to the user as a percentage complete.