How can I reproduce Matlab file reading with Python and Numpy for decoding a .dat file?

I have a Matlab script that reads an encoded .dat file, decodes it and saves it. I was trying to translate this into Python using numpy. I've found that for the same file, I get different output results ( the python numbers make no sense). The code initially ran as part of a script reading from a serial port, hence the structure of the data.

I first thought bit shifting was the issue, due to indexing differences and since Numpy separates right and left shift into different functions, but that does not seem to be the case. I also tried to separate the bit shift and the dot product operations to confirm they were done in the correct order, but that was also not the issue. I then replaced the python .read() with np.fromfile() , thinking that not specifying the correct uint8 format at read time might corrupt the data. At this time I noticed while debugging that packet_data contains different values from what I get in Matlab. I assumed reshaping the data shuffled it, but in fact the matrix contains different values altogether. Of course, that means also packet_data_bytes is completely wrong. I am not sure why, but the same file gives different values when read in Python, as opposed to Matlab. I'm not sure what the differences in the read functions are, or if it has to do with how I open the file in the scripts.

Here is the code in both matlab and python.

Matlab:

'matlab'

% Define input and output file paths
inputFilePath = 'C:/Users/x/Downloads/Raw.dat'
outputFilePath = 'C:/Users/x/Downloads/decoded.dat'


% Open the input file
fileID = fopen(inputFilePath, 'r');

% Open the output file
outputFileID = fopen(outputFilePath, 'w');

% Check if files are successfully opened
if fileID == -1
    error('Cannot open the input file.');
end

if outputFileID == -1
    error('Cannot open the output file.');
end

% Constants
packetNumberBytesLength = 4;
packetDataBytesRows = 12;
packetDataBytesCols = 1024;
packetDataMasks = [1,2,4,8,16,32];
numSamples = 6;
numChannels = 1024;

% Read and process each packet
while ~feof(fileID)
    % Read packet number bytes
    packetNumberBytes = fread(fileID, packetNumberBytesLength, 'uint8');
    if numel(packetNumberBytes) < packetNumberBytesLength
        break;
    end
    
    % Read packet data bytes
    packetDataBytes = fread(fileID, [packetDataBytesRows, packetDataBytesCols], 'uint8');
    if numel(packetDataBytes) < packetDataBytesRows * packetDataBytesCols
        break;
    end
    
    % Decode packet number
    packetNumber = [1677216, 65536, 256, 1] * packetNumberBytes;
    
    % Decoding packet data
    Samples = zeros(numSamples, numChannels);
    for n = 1:numSamples
        Samples(n, :) = [2048,1024,512,256,128,64,32,16,8,4,2,1] * bitshift(bitand(packetDataBytes, packetDataMasks(n)), 1-n);
        Samples(n, numChannels) = 0; % Invalid sample in case of Single Cell scan mode
    end

    % Get current date and time
    currentDateTime = datestr(now, 'dd-mmm-yyyy HH:MM:SS.FFF');
    
    % Write decoded data to output file
    writematrix(currentDateTime, outputFilePath, 'Delimiter', 'tab', 'WriteMode', 'append');
    writematrix([packetNumber, 1], outputFilePath, 'Delimiter', 'tab', 'WriteMode', 'append');
    writematrix(Samples(1, :), outputFilePath, 'Delimiter', 'tab', 'WriteMode', 'append');
    writematrix([3], outputFilePath, 'Delimiter', 'tab', 'WriteMode', 'append');
    writematrix(Samples(3, :), outputFilePath, 'Delimiter', 'tab', 'WriteMode', 'append');
    writematrix([5], outputFilePath, 'Delimiter', 'tab', 'WriteMode', 'append');
    writematrix(Samples(5, :), outputFilePath, 'Delimiter', 'tab', 'WriteMode', 'append');
end

% Close the input and output files
fclose(fileID);
fclose(outputFileID);

Python:

'python'

import numpy as np
import datetime

# Define input and output file paths
input_file_path = 'C:/Users/x/Downloads/Raw.dat'
output_file_path = 'C:/Users/x/Downloads/decoded.dat'

# Constants
packet_number_bytes_length = 4
packet_data_bytes_rows = 12
packet_data_bytes_cols = 1024
num_samples = 6
num_channels = 1024
packet_data_masks = [1, 2, 4, 8, 16, 32]

# Open the input and output files
try:
    with open(input_file_path, 'rb') as input_file, open(output_file_path, 'w') as output_file:

        # Read and process each packet
        while True:
            # Read packet number bytes
            packet_number_bytes = np.fromfile(input_file, dtype=np.uint8, count=packet_number_bytes_length)
            if len(packet_number_bytes) < packet_number_bytes_length:
                break

            # Read packet data bytes
            packet_data_bytes = np.fromfile(input_file, dtype=np.uint8,
                                            count=packet_data_bytes_rows * packet_data_bytes_cols)
            if len(packet_data_bytes) < packet_data_bytes_rows * packet_data_bytes_cols:
                break

            # Reshape the packet data into a 2D array
            packet_data = packet_data_bytes.reshape((packet_data_bytes_rows, packet_data_bytes_cols))

            # Decode packet number
            packet_number = np.dot([1677216, 65536, 256, 1], np.frombuffer(packet_number_bytes, dtype=np.uint8))

            # Decode packet data
            Samples = np.zeros((num_samples, num_channels), dtype=int)
            for n in range(num_samples):
                Samples[n, :] = np.dot(
                    [2048, 1024, 512, 256, 128, 64, 32, 16, 8, 4, 2, 1],
                    np.right_shift(np.bitwise_and(packet_data, packet_data_masks[n]), n)
                )
            Samples[:, num_channels - 1] = 0  # Invalid sample in case of Single Cell scan mode

            # Get current date and time
            current_date_time = datetime.datetime.now().strftime('%d-%b-%Y %H:%M:%S.%f')[:-3]

            # Write decoded data to output file
            output_file.write(current_date_time + '\n')
            output_file.write(f"{packet_number}\t1\n")
            np.savetxt(output_file, Samples[0, :].reshape(1, -1), delimiter='\t', fmt='%d')
            output_file.write('3\n')
            np.savetxt(output_file, Samples[2, :].reshape(1, -1), delimiter='\t', fmt='%d')
            output_file.write('5\n')
            np.savetxt(output_file, Samples[4, :].reshape(1, -1), delimiter='\t', fmt='%d')

except FileNotFoundError as e:
    print(f"Error: {e}")

Solution

It turns out, the numpy.reshape() was indeed the culprit. Initially, I got confused by the numpy.fromfile() output because the values looked so different, however I was dealing with a large 1D array and simply failed to recognize it in its 2D shape in Matlab.

Matlab's array read order versus Numpy's

Matlab's fread populates an array column-wise, also known as "Fortran style", whereas Numpy prefers to do it row-wise, also known as "C-style". After finding this, it was as simple as using the order attribute for the function in Python, making numpy.reshape() construct the matrix in the same manner as Matlab:

# Reshape the packet data into a 2D array
packet_data = packet_data_bytes.reshape((packet_data_bytes_rows, packet_data_bytes_cols), order='F')