I need to extract a tar.gz file in PHP. The archive contains many JSON files, nested tar.gz and zip archives, and subdirectories. I want to move only the JSON files to the directory ./Dataset/processing, and keep extracting the nested tar.gz and zip archives so that the JSON files inside them are collected as well. Those nested archives can themselves contain subdirectories.
The structure is like the following:
origin.tar.gz
├───sub1.tar.gz
│   ├───sub2.tar.gz
│   │   ├───a.json
│   │   └───... (unknown depth)
│   ├───b.json
│   └───c.json
├───sub3.zip
│   ├───sub4.tar.gz
│   │   ├───d.json
│   │   └───... (unknown depth)
│   ├───e.json
│   └───f.json
├───subdirectory
│   └───g.json
├───h.json
├───i.json
└───... (many more)
Once everything is extracted, ./Dataset/processing should look like this:
Dataset/processing
├───a.json
├───b.json
├───c.json
├───d.json
├───e.json
├───f.json
├───g.json
├───h.json
├───i.json
└───... (many more)
I know how to extract a tar.gz using PharData in PHP, but that only handles a single level. I was wondering whether some kind of recursion could make this work for archives nested to an arbitrary depth.
$phar = new PharData('origin.tar.gz');
$phar->extractTo('/full/path'); // extract all files in the tar.gz
I have refined my code a bit and tried the following. It works for nested archives, but it fails when there is a directory (or nested directories) that also contains JSON files. Can someone help me extract those as well?
<?php
$path = './';

// Extraction of compressed file
function fun($path) {
    $array = scandir($path);
    for ($i = 0; $i < count($array); $i++) {
        if ($i == 0 OR $i == 1) { continue; }
        else {
            $item = $array[$i];
            $fileExt = explode('.', $item);
            // Getting the extension of the file
            $fileActualExt = strtolower(end($fileExt));
            if (($fileActualExt == 'gz') or ($fileActualExt == 'zip')) {
                $pathnew = $path.$item; // Dataset ./data1.tar.gz
                $phar = new PharData($pathnew);
                // Moving the files
                $phar->extractTo($path);
                // Del the files
                unlink($pathnew);
                $i = 0;
            }
        }
        $array = scandir($path);
    }
}
fun($path);
// Move only the json to ./dataset(I will add it later)
?>
Thanks in advance.
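I solved it after a bit of research. The solution below works for me.
It uses three functions: recursiveScanProtected() extracts every archive it finds (descending into directories and into the freshly extracted content), scanJSON() moves all JSON files into the processing directory, and delete_files() removes everything that is left over.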
<?php
// Root directory
$path = './';
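// Directory where I want to extract the JSON files
$path_json = $path.'processing/';
// Create it if it does not exist yet, otherwise rename() in scanJSON() fails
if (!is_dir($path_json)) {
    mkdir($path_json, 0777, true);
}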
// Function to extract all the compressed files
function recursiveScanProtected($dir) {
    if ($dir != '') {
        $dir = rtrim($dir, '/');
        $tree = glob($dir . '/*');
        if (is_array($tree)) {
            for ($i = 0; $i < count($tree); $i++) {
                $file = $tree[$i];
                if (is_dir($file)) {
                    recursiveScanProtected($file); // Recursive call if directory
                } elseif (is_file($file)) {
                    // Getting the extension of the file
                    $fileActualExt = strtolower(pathinfo($file, PATHINFO_EXTENSION));
                    // Check if the file is a zip or a tar.gz
                    if (($fileActualExt == 'gz') or ($fileActualExt == 'zip')) {
                        // Extract into a subdirectory named after the loop index, overwriting existing files
                        $target = $dir . '/' . $i;
                        $phar = new PharData($file);
                        $phar->extractTo($target . '/', null, true);
                        // Release the archive handle, then delete the compressed file
                        unset($phar);
                        unlink($file);
                        recursiveScanProtected($target); // Recursive call on the extracted content
                    }
                }
            }
        }
    }
}
recursiveScanProtected($path);
// Move the JSON files to processing
function scanJSON($dir, $path_json) {
    if ($dir != '') {
        $tree = glob(rtrim($dir, '/') . '/*');
        if (is_array($tree)) {
            foreach ($tree as $file) {
                if (is_dir($file)) {
                    // Do not scan processing recursively, but all other directories should be scanned
                    if ($file != './processing') {
                        scanJSON($file, $path_json);
                    }
                } elseif (is_file($file)) {
                    // pathinfo() omits the 'extension' key for files without one, so ask for it directly
                    $ext = strtolower(pathinfo($file, PATHINFO_EXTENSION));
                    if ($ext == 'json') {
                        // Move the JSON file to processing (files sharing a basename end up at the same target path)
                        rename($file, $path_json . basename($file));
                    }
                }
            }
        }
    }
}
scanJSON($path, $path_json);
/*
 * PHP delete function that deals with directories recursively.
 * It deletes everything except ./processing/ and ./index.php
 */
function delete_files($target) {
    if (is_dir($target)) {
        // GLOB_MARK adds a slash to directories returned; '*' does not match dotfiles
        $files = glob($target . '*', GLOB_MARK);
        foreach ($files as $file) {
            if ($file == './processing/' || $file == './index.php') {
                continue;
            } else {
                delete_files($file);
            }
        }
        if ($target != './') {
            rmdir($target);
        }
    } elseif (is_file($target)) {
        unlink($target);
    }
}
delete_files($path);
?>
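For anyone who wants a more compact variant, here is a rough sketch of the same idea (extract, descend into nested archives, move the JSON files) using a single recursive function. It is not the code I actually run: the names isArchive(), uniqueTarget() and collectJson() are made up for illustration, and it assumes the Phar extension is enabled and that archives carry a recognizable suffix (.tar.gz, .tgz, .tar or .zip). Unlike the code above, it extracts into a separate working directory and renames files that share a basename instead of letting one replace the other.

```php
<?php
// Hypothetical helpers for illustration; not part of the original solution.

/** True when the file name ends in a suffix PharData is expected to open. */
function isArchive(string $file): bool
{
    return (bool) preg_match('/\.(tar\.gz|tgz|tar|zip)$/i', $file);
}

/** Return a path inside $destDir that does not collide with an existing file. */
function uniqueTarget(string $destDir, string $basename): string
{
    $info = pathinfo($basename);
    $ext = isset($info['extension']) ? '.' . $info['extension'] : '';
    $candidate = $destDir . '/' . $basename;
    for ($n = 1; file_exists($candidate); $n++) {
        $candidate = $destDir . '/' . $info['filename'] . '_' . $n . $ext;
    }
    return $candidate;
}

/**
 * Extract $archive into a fresh subdirectory of $workDir, recurse into any
 * nested archives, and move every .json file into $destDir.
 */
function collectJson(string $archive, string $workDir, string $destDir): void
{
    static $counter = 0;
    if (!is_dir($destDir)) {
        mkdir($destDir, 0777, true);
    }
    $target = $workDir . '/extract_' . (++$counter);
    mkdir($target, 0777, true);
    (new PharData($archive))->extractTo($target, null, true);

    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($target, FilesystemIterator::SKIP_DOTS)
    );
    // Snapshot the file list first so that moving files does not disturb the iterator.
    $files = [];
    foreach ($iterator as $entry) {
        if ($entry->isFile()) {
            $files[] = $entry->getPathname();
        }
    }

    foreach ($files as $file) {
        if (isArchive($file)) {
            collectJson($file, $workDir, $destDir); // nested archive
        } elseif (strtolower(pathinfo($file, PATHINFO_EXTENSION)) === 'json') {
            rename($file, uniqueTarget($destDir, basename($file)));
        }
    }
}
```

It would be called as collectJson('origin.tar.gz', './work', './Dataset/processing'); afterwards the working directory only contains leftovers and can be removed, for example with a recursive delete like delete_files() above.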