I have an export of Customer Records that needs to be split into chunks of 500 records. I grab each chunk through a REST request and save it to my server:
public function createImportFile($json)
{
    $filePath = storage_path().'/import/'.$this->getImportFileName($this->import->chunkNumber);
    $importFile = fopen($filePath, 'w');
    $array = json_decode($json);
    fwrite($importFile, $json);
    fclose($importFile);

    return $filePath;
}
Then after grabbing all of the chunks, I import all of the records. I'm wondering what the best way would be to find the Nth record among all the chunks?
Currently, I divide the record number I'm looking for by the total number of chunks to find which chunk the record will be in. Then I get the total record count of the previous chunks and subtract that from the record number to get the record's position within the chunk.
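For example, with 500 records per chunk, the arithmetic I'm aiming for looks like this (the record number is hypothetical, and everything is 0-based):

```php
<?php
// Worked illustration of the chunk arithmetic described above,
// assuming 500 records per chunk and 0-based indexes throughout.
$recordsPerChunk = 500;
$recordNumber = 1203; // hypothetical record we want

$chunkNumber = intdiv($recordNumber, $recordsPerChunk);   // which chunk file
$countInPrevChunks = $chunkNumber * $recordsPerChunk;     // records before this chunk
$positionInChunk = $recordNumber - $countInPrevChunks;    // index inside the chunk

echo "$chunkNumber $positionInChunk\n"; // prints "2 203"
```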
while ($this->recordNumber <= $this->totalRecords) {
    $item = $this->getRecord($this->recordNumber);
    if (empty($item)) {
        $this->recordNumber++;
        continue;
    }
    $results = $this->translateItem($item);
    $this->recordNumber++;
}
public function getRecord($recordNumber)
{
    if ($this->import->isChunkedImport()) {
        $chunkNumber = (integer) $this->returnChunkFromRecordNumber($recordNumber);
        $countInPrevChunks = intval($this->returnRecordCountForPrevChunks($chunkNumber));
        $chunkPosition = intval($this->getChunkPosition($recordNumber, $countInPrevChunks));
        $jsonObj = $this->getJsonObjectForChunkNumber($chunkNumber);

        return $jsonObj[$chunkPosition];
    } else {
        $chunkPosition = $this->getChunkPosition($recordNumber, 0);
        $filePath = storage_path().'/import/'.$this->getImportFileName();

        return (array) json_decode(file_get_contents($filePath))[$chunkPosition];
    }
}

private function &getJsonObjectForChunkNumber($chunkNumber)
{
    if ($this->currentFileArray == null || ($chunkNumber != $this->lastChunkNumber)) {
        $filePath = storage_path().'/import/'.$this->getImportFileName($chunkNumber);
        $this->currentFileArray = json_decode(file_get_contents($filePath), true);
        $this->lastChunkNumber = $chunkNumber;
    }

    return $this->currentFileArray;
}
public function getChunkCount()
{
    $filePath = storage_path().'/import/'.$this->getImportFileName();

    return count(json_decode(file_get_contents($filePath)));
}
public function returnChunkFromRecordNumber($recordNumber)
{
    if ($recordNumber >= $this->getChunkCount()) {
        if (is_int($recordNumber / $this->getChunkCount())) {
            if (($recordNumber / $this->getChunkCount()) == 1) {
                return intval(1);
            }

            return intval(($recordNumber / $this->getChunkCount()) - 1);
        } else {
            return intval($recordNumber / $this->getChunkCount());
        }
    } else {
        return intval(0);
    }
}
public function getChunkPosition($recordNumber, $countInPrevChunks)
{
    $positionInChunk = $recordNumber - $countInPrevChunks;

    if ($positionInChunk == 0) {
        return $positionInChunk;
    }

    return $positionInChunk - 1;
}
public function returnRecordCountForPrevChunks($chunkNumber)
{
    if ($chunkNumber == 0) {
        return 0;
    }

    return $this->getChunkCount() * $chunkNumber;
}

I try to account for the first key being 0 for both the chunks and the records within them, but I'm still missing the last record of the import. It also seems like I might be making this more complicated than it needs to be. Does anyone have advice or a simpler way to grab the Nth record? I thought about numbering the records as I bring them in with the REST request; then I could find the chunk containing the record number as an array key and return that record:
public function createImportFile($json)
{
    $filePath = storage_path().'/import/'.$this->getImportFileName($this->import->chunkNumber);
    $importFile = fopen($filePath, 'w');

    if ($this->import->chunkNumber == 0 && $this->recordNumber == 0) {
        $this->recordNumber = 1;
    }

    $array = json_decode($json);
    $ordered_array = [];
    foreach ($array as $record) {
        $ordered_array[$this->recordNumber] = $record;
        $this->recordNumber++;
    }

    fwrite($importFile, json_encode($ordered_array));
    fclose($importFile);

    return $filePath;
}
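If I went that route, the lookup side might look something like this (just a sketch; the file naming here is a stand-in for getImportFileName(), and the chunk count would come from the import):

```php
<?php
// Sketch of the lookup for the numbered-key approach above: scan the
// chunk files and return the record whose key matches. The file-name
// pattern is hypothetical; it stands in for getImportFileName().
function findRecord(int $recordNumber, int $chunkCount, string $dir): ?array
{
    for ($chunk = 0; $chunk < $chunkCount; $chunk++) {
        $records = json_decode(
            file_get_contents("$dir/import_chunk_$chunk.json"),
            true
        );
        if (isset($records[$recordNumber])) {
            return $records[$recordNumber];
        }
    }

    return null; // record number not present in any chunk
}
```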
But I wasn't sure if that was the best approach.
With a lot of records, you could use a database table. MySQL would easily handle tens of thousands of records. You wouldn't even need to store the whole records. Perhaps just:
record_no | chunk_no | position_in_chunk

record_no: Primary key. Unique identifier for this record.
chunk_no: Which chunk contains the record.
position_in_chunk: Where within the chunk the record is located.

Put a UNIQUE(chunk_no, position_in_chunk) index on the table.
Then as you pull records, assign them a number, build up the DB table, and save the table as you write records to disk. In the future, to get a specific record, all you'll need is its number.
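A minimal sketch of that lookup table, using SQLite via PDO (the table and column names follow the schema above; the chunk size and record counts are illustrative):

```php
<?php
// Sketch of the lookup table described above, using in-memory SQLite
// via PDO. Table/column names follow the schema in the answer;
// everything else (chunk size, record counts) is illustrative.
$db = new PDO('sqlite::memory:');
$db->exec('CREATE TABLE record_index (
    record_no INTEGER PRIMARY KEY,
    chunk_no INTEGER NOT NULL,
    position_in_chunk INTEGER NOT NULL,
    UNIQUE (chunk_no, position_in_chunk)
)');

// As you pull records and write chunks to disk, record where each one landed.
$insert = $db->prepare(
    'INSERT INTO record_index (record_no, chunk_no, position_in_chunk)
     VALUES (?, ?, ?)'
);
$recordsPerChunk = 500;
for ($recordNo = 0; $recordNo < 1200; $recordNo++) {
    $insert->execute([
        $recordNo,
        intdiv($recordNo, $recordsPerChunk),
        $recordNo % $recordsPerChunk,
    ]);
}

// Later: finding the Nth record is a single indexed lookup.
$lookup = $db->prepare(
    'SELECT chunk_no, position_in_chunk FROM record_index WHERE record_no = ?'
);
$lookup->execute([1103]);
$row = $lookup->fetch(PDO::FETCH_ASSOC);
echo "{$row['chunk_no']} {$row['position_in_chunk']}\n"; // prints "2 103"
```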
If you don't want to use a database, you can also store this data as a JSON file, though retrieval performance will suffer from having to open and parse a big JSON file each time.
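For the JSON variant, the same record_no-to-location map can be persisted as one index file; the catch, as noted, is that the whole file has to be read and parsed before any lookup (paths and chunk size here are illustrative):

```php
<?php
// Sketch of the JSON alternative: the same record_no => location map,
// saved as a single index file. Path and chunk size are illustrative.
$indexPath = sys_get_temp_dir().'/import_index.json';
$recordsPerChunk = 500;

// Build the index as chunks are written.
$index = [];
for ($recordNo = 0; $recordNo < 1200; $recordNo++) {
    $index[$recordNo] = [
        'chunk_no' => intdiv($recordNo, $recordsPerChunk),
        'position_in_chunk' => $recordNo % $recordsPerChunk,
    ];
}
file_put_contents($indexPath, json_encode($index));

// Later: the entire file must be decoded before any single lookup,
// which is the performance cost mentioned above.
$loaded = json_decode(file_get_contents($indexPath), true);
$entry = $loaded[1103];
echo "{$entry['chunk_no']} {$entry['position_in_chunk']}\n"; // prints "2 103"
```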