Extracting an archive created via Java RandomAccessFile with PHP

I'm trying to recreate a long lost PHP website. One of the pages of this website allowed employees to upload archive files that were created by a local script they executed. The webserver would then extract the contents into separate files to be stored in different folders for other purposes.

Thankfully I have the script that created the archives, but it is in Java. I imagine it can be reversed though? The script they used would basically just call the addFile method below for each of several file paths.

public class Archive {
    static void create(File f) throws IOException {
        BufferedOutputStream w = new BufferedOutputStream(new FileOutputStream(f));
        w.write(new byte[]{1, 3, 3, 7});
        w.write(new byte[4]);
    }

    static int addFile(File archive, File add, String name) throws IOException {
        if (!add.exists()) {
            throw new IOException("File to be added does not exist!");
        }
        if (add.isDirectory()) {
            throw new IOException("Cannot add directories!");
        }
        if (!archive.exists()) {
            // ...
        }
        if (archive.isDirectory()) {
            throw new IOException("Archive is no valid archive!");
        }
        RandomAccessFile r = new RandomAccessFile(archive, "rw");
        int code = r.readInt();
        if (code != 16974599) {
            throw new IOException("Archive is no valid archive!");
        }
        int fileCount = r.readInt();
        r.writeInt(fileCount + 1);
        RandomAccessFile bi = new RandomAccessFile(add, "r");
        byte[] swap = new byte[(int)bi.length()];
        // ...
        return fileCount + 1;
    }

    public static void main(String[] args) throws IOException {
        // ...
    }
}


I have written a function using fread(), but it runs out of memory after the first file, even with the memory limit temporarily raised to 512 MB. Is there an alternative?


  • According to the Java code, the file format is as follows:

    • 0x01030307 (that is 16974599 in decimal representation)
    • 32-bit file count, big endian (the byte order Java's readInt()/writeInt() always use)
    • File 1 32-bit length, big endian
    • File 1 name followed by 0x00
    • File 1 bytes
    • ...
    • File N 32-bit length, big endian
    • File N name followed by 0x00
    • File N bytes
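
Java's RandomAccessFile.readInt() and writeInt() always use big-endian byte order, so in PHP these integers correspond to the pack()/unpack() format code 'N' rather than 'V'. A minimal sketch of the difference; the 300-byte length is an arbitrary illustration, not a value from the question:

```php
<?php
// The four magic bytes written by create(): 1, 3, 3, 7.
$magic = "\x01\x03\x03\x07";

// 'N' = unsigned 32-bit, big endian (the byte order Java's writeInt() uses).
$bigEndian = unpack('N', $magic)[1];
// 'V' = unsigned 32-bit, little endian.
$littleEndian = unpack('V', $magic)[1];

var_dump($bigEndian === 16974599);      // bool(true): matches the check in addFile
var_dump($littleEndian === 0x07030301); // bool(true): a different number entirely

// Length field of a hypothetical 300-byte file, as Java would write it.
$lengthField = pack('N', 300);          // "\x00\x00\x01\x2C"
$misread = unpack('V', $lengthField)[1];
echo $misread, "\n";                    // 738263040
```

Read with the wrong byte order, a 300-byte length turns into roughly 704 MiB. A mix-up like that might be one reason an fread() based extractor exhausts its memory on the first file.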

    It is not an archive format but a simple concatenation of files with some metadata.
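
To try an extractor without the original Java tool, a small fixture in the same layout can be assembled by hand. This is only a sketch: it assumes the entry layout listed above (length, NUL-terminated name, contents) and big-endian integers, and buildTestArchive() is a hypothetical helper name:

```php
<?php
// Hypothetical helper: builds archive bytes from a [name => contents] map.
function buildTestArchive(array $files): string
{
    // Magic bytes 1, 3, 3, 7 followed by the big-endian file count.
    $data = "\x01\x03\x03\x07" . pack('N', count($files));
    foreach ($files as $name => $contents) {
        $data .= pack('N', strlen($contents)); // length of the file contents
        $data .= $name . "\0";                 // NUL-terminated file name
        $data .= $contents;                    // raw file bytes
    }
    return $data;
}

$bytes = buildTestArchive(['a.txt' => 'hello', 'b.txt' => '']);
echo bin2hex(substr($bytes, 0, 8)), "\n"; // 0103030700000002
```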

    To extract the files from such an 'archive', we can use PHP code like this:

    class MyArchiveHeader {
        public function __construct(
            private int $typeCode,
            private int $fileCount
        ) {}

        public function getTypeCode(): int
        {
            return $this->typeCode;
        }

        public function getFileCount(): int
        {
            return $this->fileCount;
        }
    }

    class MyArchiveFile {
        public function __construct(
            private string $filename,
            private string $contents
        ) {}

        public function getFilename(): string
        {
            return $this->filename;
        }

        public function getContents(): string
        {
            return $this->contents;
        }
    }

    class MyArchive {
        public function __construct(private string $filename) {}

        public function extractFiles(string $outputDirectory): void
        {
            if (!is_dir($outputDirectory)) {
                throw new \InvalidArgumentException('Output directory does not exist');
            }
            $file = new \SplFileObject($this->filename, 'rb');
            $header = $this->parseHeader($file);
            $fileCount = $header->getFileCount();
            for ($i = 0; $i < $fileCount; $i++) {
                $parsedFile = $this->parseFile($file);
                $outputFilename = $outputDirectory . DIRECTORY_SEPARATOR . $parsedFile->getFilename();
                file_put_contents($outputFilename, $parsedFile->getContents());
            }
        }

        private function parseHeader(\SplFileObject $file): MyArchiveHeader
        {
            $typeCodeBytes = $file->fread(4);
            if ($typeCodeBytes === false || strlen($typeCodeBytes) !== 4) {
                throw new \RuntimeException('Could not read file type code');
            }
            // Java's writeInt() stores the high byte first, so use 'N' (big endian), not 'V'
            $typeCode = unpack('N', $typeCodeBytes)[1]; // Unpack 4 bytes as unsigned integer
            if ($typeCode !== 0x01030307) {
                throw new \RuntimeException('Invalid file type code');
            }
            $fileCountBytes = $file->fread(4);
            if ($fileCountBytes === false || strlen($fileCountBytes) !== 4) {
                throw new \RuntimeException('Could not read file count');
            }
            $fileCount = unpack('N', $fileCountBytes)[1]; // Unpack 4 bytes as unsigned integer
            return new MyArchiveHeader($typeCode, $fileCount);
        }

        private function parseFile(\SplFileObject $file): MyArchiveFile
        {
            $fileLengthBytes = $file->fread(4);
            if ($fileLengthBytes === false || strlen($fileLengthBytes) !== 4) {
                throw new \RuntimeException('Could not read file length');
            }
            $fileLength = unpack('N', $fileLengthBytes)[1]; // Unpack 4 bytes as unsigned integer
            $filename = "";
            // Read the name one byte at a time up to the terminating 0x00
            while (!$file->eof()) {
                $char = $file->fread(1);
                if ($char === "\0") {
                    break;
                }
                $filename .= $char;
            }
            // TODO Might need to convert $filename to UTF-8, for instance.
            // fread() rejects a length of 0, so empty files are handled separately
            $contents = $fileLength > 0 ? $file->fread($fileLength) : '';
            if ($contents === false) {
                throw new \RuntimeException('Could not read file contents');
            }
            return new MyArchiveFile($filename, $contents);
        }
    }

    I haven't tested the code, but it should give you a good starting point. You can use it like this:

    $archiveFilename = 'archive';
    $outputDir = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'extracted';
    echo "Extracting archive to $outputDir\n";
    $archive = new MyArchive($archiveFilename);
    $archive->extractFiles($outputDir);
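
Since the question mentions running out of memory, the contents do not need to be held in a string at all. Here is a minimal sketch of a streaming variant that copies each entry in small chunks. It assumes big-endian integers (as written by Java's writeInt()) and the entry order length, NUL-terminated name, contents. The names extractArchiveStreaming() and readUint32BE(), the 8192-byte chunk size and the basename() call are my own choices, not part of the original code:

```php
<?php
declare(strict_types=1);

// Hypothetical helper: reads one big-endian unsigned 32-bit integer.
function readUint32BE($stream): int
{
    $bytes = fread($stream, 4);
    if ($bytes === false || strlen($bytes) !== 4) {
        throw new RuntimeException('Unexpected end of archive');
    }
    return unpack('N', $bytes)[1];
}

// Hypothetical function: extracts all entries and returns the paths written.
function extractArchiveStreaming(string $archivePath, string $outputDir): array
{
    $in = fopen($archivePath, 'rb');
    if ($in === false) {
        throw new RuntimeException("Cannot open $archivePath");
    }
    try {
        if (readUint32BE($in) !== 0x01030307) {
            throw new RuntimeException('Invalid file type code');
        }
        $fileCount = readUint32BE($in);
        $written = [];
        for ($i = 0; $i < $fileCount; $i++) {
            $remaining = readUint32BE($in);
            // Read the NUL-terminated name byte by byte.
            $name = '';
            while (($char = fgetc($in)) !== false && $char !== "\0") {
                $name .= $char;
            }
            // Assumption: names are flat, so basename() only strips stray directory parts.
            $target = $outputDir . DIRECTORY_SEPARATOR . basename($name);
            $out = fopen($target, 'wb');
            if ($out === false) {
                throw new RuntimeException("Cannot write $target");
            }
            // Copy the contents in chunks so memory use stays small.
            while ($remaining > 0) {
                $chunk = fread($in, min(8192, $remaining));
                if ($chunk === false || $chunk === '') {
                    fclose($out);
                    throw new RuntimeException("Archive truncated in $name");
                }
                fwrite($out, $chunk);
                $remaining -= strlen($chunk);
            }
            fclose($out);
            $written[] = $target;
        }
        return $written;
    } finally {
        fclose($in);
    }
}
```

It would be called as extractArchiveStreaming('archive', $outputDir), with the output directory created beforehand.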