I currently have the problem that I get "Allowed memory size of 134217728 bytes exhausted" when writing a CSV file with PHP. Before throwing more hardware at it, I would rather check with you whether what I am doing is the best way to do it.
What I am doing is fetching data chunks from the DB with offset and limit and writing the results into a file with fputcsv().
This is what I am doing:
Checking whether the target folder exists and checking via a lockfile whether the export is already running. If those checks pass, I open the file handle with:
$this->handle = @fopen($filePath, 'wb+');
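For context, the lockfile check can be done with flock(); here is a minimal sketch of such a check (the property names $exportDir and $lockFile are placeholders, not my actual code):

// Hypothetical sketch of the pre-flight checks described above.
if (!is_dir($this->exportDir) && !mkdir($this->exportDir, 0775, true)) {
    throw new \RuntimeException('Cannot create export directory.');
}

$lockHandle = fopen($this->lockFile, 'c');
if (false === $lockHandle || !flock($lockHandle, LOCK_EX | LOCK_NB)) {
    // another export is already running, so bail out
    return;
}

After that, the export loop runs: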
$resultData = $this->getDatasets($offset);
while ($resultData) {
    foreach ($resultData as $dataArray) {
        $this->writer->addRow($dataArray);
        ++$offset;
    }
    // fetch the next chunk; the loop ends once no rows are left
    $resultData = $this->getDatasets($offset);
}
$this->writer->close();
getDatasets() always fetches 100 datasets from the DB. addRow() uses fputcsv():
public function addRow(array $row): bool|int
{
    if (false !== $this->handle) {
        return fputcsv($this->handle, $row, ';', '"', '\\', "\n");
    }

    return false;
}

public function close(): bool
{
    if ($this->isOpen()) {
        return fclose($this->handle);
    }

    return false;
}
I thought that with this setup I should never have more than 100 datasets in memory, and since each line is written straight to the file, that should not be a problem either. But maybe the opened file is kept completely in memory?
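(One way to verify this is to log the memory usage per chunk — a quick diagnostic sketch, not part of the actual export. If the numbers climb steadily across chunks, something is retaining references; a flat curve would mean the file writing itself is not the problem.)

$resultData = $this->getDatasets($offset);
while ($resultData) {
    foreach ($resultData as $dataArray) {
        $this->writer->addRow($dataArray);
        ++$offset;
    }
    // log memory growth per chunk to see where it climbs
    error_log(sprintf(
        'offset %d: %.1f MB used, %.1f MB peak',
        $offset,
        memory_get_usage(true) / 1048576,
        memory_get_peak_usage(true) / 1048576
    ));
    $resultData = $this->getDatasets($offset);
}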
The file is written to the file system because it should only be generated once per day, not on every request.
What can I improve to avoid the memory problem?
We are talking about 40,000 datasets. So yes, that is a lot of data, but not millions of rows.
Setup: PHP 8.2, nginx server, PHP memory: 1024M
The data is fetched in a Symfony project using Doctrine:
public function getDatasets(int $offset = 0, int $chunkSize = 100): array
{
    $foos = $this->fooRepo->findBy(
        ['active' => true],
        ['familyName' => 'ASC'],
        $chunkSize,
        $offset
    );

    $resultData = [];
    /** @var Foo $foo */
    foreach ($foos as $foo) {
        $resultData[] = $foo->__toArray();
    }

    return $resultData;
}
A colleague helped me to identify the problem. The code as posted above is totally fine. The problem was that Doctrine keeps the entity data in memory: every entity loaded through the EntityManager stays referenced in its identity map, so memory grows with each chunk even though the result arrays go out of scope.
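(This is easy to verify: Doctrine's UnitOfWork tracks every managed entity, and its size() method reports how many are currently held. A quick check after each chunk — assuming the EntityManager is available there — shows the count growing by 100 per iteration:)

// number of entities Doctrine currently keeps in memory
error_log('managed entities: ' . $this->entityManager->getUnitOfWork()->size());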
So here is the improved getDatasets() method:
public function getDatasets(int $offset = 0, int $chunkSize = self::CHUNK_SIZE): array
{
    $foos = $this->fooRepo->findBy(
        ['active' => true],
        ['familyName' => 'ASC'],
        $chunkSize,
        $offset
    );

    $resultData = [];
    /** @var Foo $foo */
    foreach ($foos as $foo) {
        $resultData[] = $foo->__toArray();
        // detach() removes the entity from Doctrine's identity map,
        // so it can be garbage-collected after use
        $this->entityManager->detach($foo);
        // to be sure, I also unset the local reference via PHP
        unset($foo);
    }

    return $resultData;
}
With these adaptations my code ran without problems. So yes, the code in my question is fine, and with these data-handling improvements it runs as desired!
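For anyone with much larger exports: instead of paginating with findBy() and detaching by hand, Doctrine can also stream the results. A sketch of that alternative (assuming the Foo entity from above and a standard QueryBuilder setup; note that clear() wipes the whole identity map, so only use it when no other managed entities are needed):

public function streamDatasets(callable $writeRow): void
{
    $query = $this->entityManager->createQueryBuilder()
        ->select('f')
        ->from(Foo::class, 'f')
        ->where('f.active = :active')
        ->setParameter('active', true)
        ->orderBy('f.familyName', 'ASC')
        ->getQuery();

    $i = 0;
    // toIterable() hydrates one entity at a time instead of the whole result set
    foreach ($query->toIterable() as $foo) {
        $writeRow($foo->__toArray());
        // periodically detach everything so the identity map stays small
        if (0 === ++$i % 100) {
            $this->entityManager->clear();
        }
    }
    $this->entityManager->clear();
}

With toIterable() only the current row is hydrated, so memory stays flat regardless of how many datasets the table holds.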