I'm facing an issue with ensuring consistent cipher output between Node.js and PHP when reading large files by chunks. The output differs even though the same values are being read and processed. Below is the code for both PHP and Node.js implementations.
<?php
require 'vendor/autoload.php';
define('PLAINTEXT_DATA_KEY', 'poSENHhkGVG/4fEHvhRO6j9W3goETWZAg+ZgTWxhw34=');
define('IV', "X1bIRjgIoDn/BDFhHIbg7g==");
define('ALGORITHM', 'aes-256-cbc');
define('CHUNK_SIZE', 16 * 1024);
class Cipher
{
    private function pkcs7_pad(string $data, int $blockSize)
    {
        $padLength = $blockSize - (strlen($data) % $blockSize);
        return $data . str_repeat(chr($padLength), $padLength);
    }
    public function encrypt($source, $destination)
    {
        $inputFile = fopen($source, 'rb');
        $outputFile = fopen($destination, 'wb');
        try {
            fwrite($outputFile, base64_decode(IV));
            while (!feof($inputFile)) {
                $buffer = fread($inputFile, CHUNK_SIZE);
                // Pad the last chunk if it is not the block size
                if (feof($inputFile)) {
                    $buffer = $this->pkcs7_pad($buffer, 16);
                }
                $cipherText = openssl_encrypt($buffer, ALGORITHM, PLAINTEXT_DATA_KEY, OPENSSL_NO_PADDING, base64_decode(IV));
                fwrite($outputFile, $cipherText);
            }
        } catch (Exception $e) {
            throw $e;
        } finally {
            fclose($inputFile);
            fclose($outputFile);
        }
    }
}
?>
const PADDING_BLOCK_SIZE = 16;
const ALGORITHM = "aes-256-cbc";
const PLAINTEXT_DATA_KEY = "poSENHhkGVG/4fEHvhRO6j9W3goETWZAg+ZgTWxhw34=";
const IV = "X1bIRjgIoDn/BDFhHIbg7g=="; // randombytes(16) converted to base64
const CHUNK_SIZE = 16 * 1024;
class Cipher {
  private pkcs7Pad(buffer: Buffer, blockSize: number = PADDING_BLOCK_SIZE): Buffer {
    const padding = blockSize - (buffer.length % blockSize);
    const padBuffer = Buffer.alloc(padding, padding);
    return Buffer.concat([buffer, padBuffer]);
  }
  async encrypt(source: string, dest: string) {
    return new Promise(async (res, rej) => {
      const iv = base64ToBuffer(IV);
      const cipher = createCipheriv(ALGORITHM, base64ToUint8Array(PLAINTEXT_DATA_KEY), iv);
      cipher.setAutoPadding(false);
      const readStream = createReadStream(source, { highWaterMark: CHUNK_SIZE });
      const writeStream = createWriteStream(dest, { highWaterMark: CHUNK_SIZE });
      writeStream.write(iv);
      let tempChunkStorage = Buffer.alloc(0); // Buffer to store remaining data
      readStream.on(DATA_EVENT, (chunk) => {
        if (typeof chunk === "string") {
          chunk = Buffer.from(chunk);
        }
        // Append the new chunk to the temp storage
        tempChunkStorage = Buffer.concat([tempChunkStorage, chunk]);
        while (tempChunkStorage.length >= CHUNK_SIZE) {
          const block = tempChunkStorage.subarray(0, CHUNK_SIZE);
          const encryptedBuffer = cipher.update(block);
          writeStream.write(encryptedBuffer);
          tempChunkStorage = tempChunkStorage.subarray(CHUNK_SIZE);
        }
      });
      readStream.on("end", () => {
        if (tempChunkStorage.length > 0) {
          const encryptedBuffer = cipher.update(this.pkcs7Pad(tempChunkStorage)); // Add padding
          writeStream.write(encryptedBuffer);
          cipher.final();
        }
        writeStream.end();
        res(true);
      });
      readStream.on("error", (err) => {
        writeStream.close();
        rej(err);
      });
    });
  }
}
First 50 characters of the cipher (base64) in PHP: 0tCb9xtx5KpG+56ukYvcQDoNKCdoPtAFUrFDRc4TiqQrQocQRK
First 50 characters of the cipher (base64) in Node: sUUI4nXHwhKNdRs+Brqc5neKuKb3fx4qqBohlDSn/7FVrYo46/
There are several problems in both codes.
The Base64 decoding of the key is missing in the PHP code.
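The effect of the missing key decoding can be illustrated on the Node.js side. The following is a minimal sketch, not the original code: the helper `encryptWith` is hypothetical, and the line that mimics PHP rests on the assumption that `openssl_encrypt()` silently truncates an overlong key to the 32 bytes AES-256 needs. Under that assumption, PHP effectively encrypts with the first 32 characters of the Base64 string, not with the decoded key.

```javascript
const crypto = require('crypto');

const PLAINTEXT_DATA_KEY = 'poSENHhkGVG/4fEHvhRO6j9W3goETWZAg+ZgTWxhw34=';
const iv = Buffer.from('X1bIRjgIoDn/BDFhHIbg7g==', 'base64');

// Hypothetical helper: AES-256-CBC with the default PKCS#7 padding.
function encryptWith(key, data) {
  const cipher = crypto.createCipheriv('aes-256-cbc', key, iv);
  return Buffer.concat([cipher.update(data), cipher.final()]);
}

// Correct: the Base64-decoded key has the 32 bytes AES-256 expects.
const decodedKey = Buffer.from(PLAINTEXT_DATA_KEY, 'base64');

// Assumption: this mimics PHP receiving the undecoded 44-character string
// and using only its first 32 bytes.
const truncatedRawKey = Buffer.from(PLAINTEXT_DATA_KEY, 'latin1').subarray(0, 32);

const data = Buffer.from('example plaintext');
const withDecodedKey = encryptWith(decodedKey, data);
const withRawKey = encryptWith(truncatedRawKey, data);

console.log(decodedKey.length, Buffer.byteLength(PLAINTEXT_DATA_KEY)); // 32 44
console.log(withDecodedKey.equals(withRawKey)); // the two ciphertexts differ
```

Note that Node.js rejects a 44-byte key outright with an invalid key length error, which is why the mismatch only goes unnoticed on the PHP side.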
You should also change the encryption in the PHP code so that the same ciphertext is produced as with an encryption that encrypts the entire plaintext at once. With this change, any chunk size can be used (as long as it is an integer multiple of the block size).
This not only leads to a decoupling of encryption and decryption, but also simplifies the NodeJS implementation, as will be explained later.
To achieve this, the following changes are required for CBC/PKCS#7 padding:
- Padding must be applied to the last chunk only; all other chunks are encrypted without padding.
- The last block (16 bytes) of the n-th ciphertext chunk must be used as the IV of the (n+1)-th ciphertext chunk.
In addition, inconsistencies and inefficiencies should be eliminated (which also makes it easier to implement the above changes):
- Use the built-in PKCS#7 padding of openssl_encrypt() for the last chunk instead of a custom implementation.
- Avoid the OPENSSL_NO_PADDING flag (value: 3) in the PHP code, which is actually intended for asymmetric encryption. Its value equals the bitwise OR of the flags OPENSSL_RAW_DATA (value: 1) and OPENSSL_ZERO_PADDING (value: 2), which are intended for symmetric encryption, so it disables both the default Base64 encoding and the default PKCS#7 padding.
Overall, the described changes in the PHP code can be implemented as follows:
...
$key = base64_decode(PLAINTEXT_DATA_KEY); // Base64 decode key
$iv = base64_decode(IV);
$inputFile = fopen($source, 'rb');
$outputFile = fopen($dest, 'wb');
fwrite($outputFile, $iv); // write initial IV
$options = OPENSSL_RAW_DATA | OPENSSL_ZERO_PADDING; // disable Base64 encoding and padding
while (!feof($inputFile)) {
    $buffer = fread($inputFile, CHUNK_SIZE); // CHUNK_SIZE must be an integer multiple of blocksize (16 bytes for AES)
    if (feof($inputFile)) {
        $options = OPENSSL_RAW_DATA; // enable padding for the last chunk
    }
    $cipherText = openssl_encrypt($buffer, ALGORITHM, $key, $options, $iv);
    $iv = substr($cipherText, -16); // determine IV for the next chunk
    fwrite($outputFile, $cipherText); // write ciphertext chunk
}
fclose($inputFile);
fclose($outputFile);
...
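The chaining rule used in the PHP loop above can be checked independently in Node.js. The sketch below is an illustration under stated assumptions, not part of the original code: `encryptChunked` and `encryptAtOnce` are hypothetical helpers, the key and plaintext are random test data, and a fresh cipher object per chunk stands in for the separate `openssl_encrypt()` calls.

```javascript
const crypto = require('crypto');

const BLOCK_SIZE = 16; // AES block size in bytes

// Hypothetical helper: encrypt chunk by chunk with a fresh cipher per chunk.
// Padding is enabled only for the last chunk, and the last ciphertext block
// of each chunk becomes the IV of the next one.
function encryptChunked(plaintext, key, iv, chunkSize) {
  if (chunkSize % BLOCK_SIZE !== 0) {
    throw new Error('chunkSize must be a multiple of the block size');
  }
  const parts = [];
  let offset = 0;
  let currentIv = iv;
  let isLast = false;
  do {
    const chunk = plaintext.subarray(offset, offset + chunkSize);
    offset += chunkSize;
    // A short (possibly empty) chunk marks the end of the data.
    isLast = chunk.length < chunkSize;
    const cipher = crypto.createCipheriv('aes-256-cbc', key, currentIv);
    cipher.setAutoPadding(isLast);
    const cipherText = Buffer.concat([cipher.update(chunk), cipher.final()]);
    parts.push(cipherText);
    currentIv = cipherText.subarray(cipherText.length - BLOCK_SIZE);
  } while (!isLast);
  return Buffer.concat(parts);
}

// Hypothetical helper: encrypt the whole plaintext in one call.
function encryptAtOnce(plaintext, key, iv) {
  const cipher = crypto.createCipheriv('aes-256-cbc', key, iv);
  return Buffer.concat([cipher.update(plaintext), cipher.final()]);
}

const key = crypto.randomBytes(32);
const iv = crypto.randomBytes(16);
const plaintext = crypto.randomBytes(40000);
const reference = encryptAtOnce(plaintext, key, iv);

const results = [16, 64, 16 * 1024].map((chunkSize) =>
  encryptChunked(plaintext, key, iv, chunkSize).equals(reference)
);
console.log(results);
```

If the chaining is implemented correctly, every chunk size should yield the same ciphertext as the one-shot encryption, including the case where the plaintext length is an exact multiple of the chunk size and the final chunk is empty.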
As already mentioned above, one advantage of the changes made is the independence of the ciphertext from the chunk size.
This allows the chunk size to be handled internally on the NodeJS side, which significantly shortens the encryption code:
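...
// Assumes Node's built-in modules are loaded, e.g. const fs = require('fs'); const crypto = require('crypto');
const key = Buffer.from(PLAINTEXT_DATA_KEY, 'base64'); // Base64 decode key
const iv = Buffer.from(IV, 'base64');
const readStream = fs.createReadStream(pathPlaintextFile);
const writeStream = fs.createWriteStream(pathCiphertextFile);
writeStream.write(iv); // write IV
const cipher = crypto.createCipheriv('aes-256-cbc', key, iv); // PKCS#7 padding is enabled by default
readStream.pipe(cipher).pipe(writeStream); // encrypt the stream and write the ciphertext
...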
With these changes, both sides produce identical ciphertexts (for identical input data).
Security:
If the static IV is used for anything other than testing, note that reusing a key/IV pair is a vulnerability.
For a fixed key, therefore, do not use a static IV; generate a random IV for each encryption instead.
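A random IV per encryption could look roughly like the following sketch. It works on in-memory buffers for brevity; the function names `encryptWithRandomIv` and `decryptWithPrependedIv` are hypothetical, and the key is random test data. As in the code above, the IV is not secret and is written in front of the ciphertext so that the decrypting side can read it back.

```javascript
const crypto = require('crypto');

const IV_LENGTH = 16; // AES block size in bytes

// Hypothetical helper: generate a fresh IV for every call and prepend it.
function encryptWithRandomIv(plaintext, key) {
  const iv = crypto.randomBytes(IV_LENGTH);
  const cipher = crypto.createCipheriv('aes-256-cbc', key, iv);
  return Buffer.concat([iv, cipher.update(plaintext), cipher.final()]);
}

// Hypothetical helper: read the IV from the first 16 bytes, then decrypt the rest.
function decryptWithPrependedIv(message, key) {
  const iv = message.subarray(0, IV_LENGTH);
  const decipher = crypto.createDecipheriv('aes-256-cbc', key, iv);
  return Buffer.concat([decipher.update(message.subarray(IV_LENGTH)), decipher.final()]);
}

const key = crypto.randomBytes(32); // test key for this sketch only
const plaintext = Buffer.from('the same input, encrypted twice');

const first = encryptWithRandomIv(plaintext, key);
const second = encryptWithRandomIv(plaintext, key);

console.log(first.equals(second)); // identical input, different ciphertexts
console.log(decryptWithPrependedIv(first, key).toString());
```

With a random IV, ciphertexts from PHP and Node.js can no longer be compared byte for byte; interoperability is then verified by decrypting on the other side.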