
How do I write multiple NumPy arrays and scalars to a single txt file, and read them back in again?


I have two NumPy arrays of the same length, let's call them A and B, and two scalar values named C and D. I want to store all of these values in a single txt file. I thought of the following structure:

[Image: the proposed layout, with A, B, C and D written side by side as columns]

It doesn't have to have this format; I just thought it's convenient and clear. I know how to write the NumPy arrays to a txt file and read them back, but I'm struggling with how to write a txt file that combines arrays and scalar values, and how to read it back from txt into NumPy.

    import numpy as np

    A = np.array([1, 2, 3, 4, 5])
    B = np.array([5, 4, 3, 2, 1])
    C = [6]
    D = [7]
    # Each array is written as one row of the file
    np.savetxt('file.txt', (A, B))
    A_B_load = np.loadtxt('file.txt')
    A_load = A_B_load[0, :]
    B_load = A_B_load[1, :]

This doesn't give me the column structure I proposed (it stores each array as a row instead), but that doesn't really matter.

I found one solution, but it's a bit unhandy: I have to pad the scalar values with zeros so they are the same length as the arrays A and B. There must be a smarter solution.

    import numpy as np

    A = np.array([1, 2, 3, 4, 5])
    B = np.array([5, 4, 3, 2, 1])
    C = [6]
    D = [7]
    # Pad each scalar with zeros so every row has len(A) entries
    fill = np.zeros(len(A) - 1)
    C = np.concatenate((C, fill))
    D = np.concatenate((D, fill))
    np.savetxt('file.txt', (A, B, C, D))
    A_B_load = np.loadtxt('file.txt')
    A_load = A_B_load[0, :]
    B_load = A_B_load[1, :]
    # Each scalar is the first entry of its padded row
    C_load = A_B_load[2, 0]
    D_load = A_B_load[3, 0]

Solution

    In [123]: A = np.array([1, 2, 3, 4, 5])
         ...: B = np.array([5, 4, 3, 2, 1])
         ...: C = [6]
         ...: D = [7]
    

savetxt is designed to write a 2D array in a consistent CSV-like form: a neat table with the same number of columns in every row.

    In [124]: arr = np.stack((A,B), axis=1)
    In [125]: arr
    Out[125]: 
    array([[1, 5],
           [2, 4],
           [3, 3],
           [4, 2],
           [5, 1]])
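The column layout from the question can be round-tripped with this 2D array alone. Here is a minimal sketch that assumes integer data and a scratch file path (the temporary directory is only for the example). loadtxt's unpack=True transposes the loaded table, so each column comes back as its own array.

```python
import os
import tempfile

import numpy as np

A = np.array([1, 2, 3, 4, 5])
B = np.array([5, 4, 3, 2, 1])

# Scratch location for the example; any writable path works.
path = os.path.join(tempfile.mkdtemp(), "foo.txt")

# np.stack(..., axis=1) gives a (5, 2) table: one row per element, one column per array.
arr = np.stack((A, B), axis=1)
np.savetxt(path, arr, fmt="%d", delimiter=",")

# unpack=True transposes the loaded table, so each column comes back as its own array.
A_load, B_load = np.loadtxt(path, dtype=int, delimiter=",", unpack=True)
```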
    

Here's one possible write format:

    In [126]: np.savetxt('foo.txt', arr, fmt='%d', header=f'{C} {D}', delimiter=',')
         ...: 
    In [127]: cat foo.txt
    # [6] [7]
    1,5
    2,4
    3,3
    4,2
    5,1
    

I put the scalars in a header line, since they don't fit the table shape of the arrays.

loadtxt can recreate arr from the file:

    In [129]: data = np.loadtxt('foo.txt', dtype=int, skiprows=1, delimiter=',')
    In [130]: data
    Out[130]: 
    array([[1, 5],
           [2, 4],
           [3, 3],
           [4, 2],
           [5, 1]])
    

The header line can be read with:

    In [138]: with open('foo.txt') as f:
         ...:     header = f.readline().strip()
         ...:     line = header[1:]
         ...: 
    In [139]: line
    Out[139]: ' [6] [7]'
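The bracketed header can still be turned back into numbers with plain string handling. This is a small sketch that assumes each scalar was written as a one-element list, as C and D are in this example.

```python
# The header text as read back above: "# [6] [7]" with the leading "#" removed.
line = ' [6] [7]'

# Split on whitespace, strip the list brackets, and convert each token to int.
C_load, D_load = (int(tok.strip('[]')) for tok in line.split())
```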
    

I should have saved it as something that's simpler to parse, like '# 6,7'.
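A sketch of that simpler variant follows. It assumes C and D are stored as plain Python scalars rather than one-element lists, and it uses a scratch file path. savetxt prefixes the header with its default comment marker "# ". loadtxt skips lines starting with "#" by default, so the table can be read without skiprows.

```python
import os
import tempfile

import numpy as np

A = np.array([1, 2, 3, 4, 5])
B = np.array([5, 4, 3, 2, 1])
C, D = 6, 7  # assumption: plain scalars instead of one-element lists

path = os.path.join(tempfile.mkdtemp(), "foo.txt")

# savetxt writes the header as "# 6,7" on the first line.
np.savetxt(path, np.column_stack((A, B)), fmt="%d", delimiter=",", header=f"{C},{D}")

# Read the scalars back from the first line, dropping the "#" marker.
with open(path) as f:
    header = f.readline().lstrip("#").strip()
C_load, D_load = (int(x) for x in header.split(","))

# The "#" header line is skipped automatically by loadtxt.
A_load, B_load = np.loadtxt(path, dtype=int, delimiter=",", unpack=True)
```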

Your accepted answer creates a DataFrame with NaN values, which are written as blanks in the file:

    In [143]: import pandas as pd
    In [144]: df = pd.concat([pd.DataFrame(arr) for arr in [A,B,C,D]], axis=1)
         ...: df.to_csv("test.txt", na_rep="", sep=" ", header=False, index=False)
    In [145]: df
    Out[145]: 
       0  0    0    0
    0  1  5  6.0  7.0
    1  2  4  NaN  NaN
    2  3  3  NaN  NaN
    3  4  2  NaN  NaN
    4  5  1  NaN  NaN
    In [146]: cat test.txt
    1 5 6.0 7.0
    2 4  
    3 3  
    4 2  
    5 1 
    

Note that np.nan is a float, so the columns that contain it (C and D here) become float. loadtxt can't handle those blank fields; np.genfromtxt is better at that, but it needs a delimiter such as ',' to mark where the empty fields are.
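Here is a sketch of the genfromtxt route. It assumes the file uses ',' as the delimiter (rather than the space in test.txt), so the empty fields are unambiguous. The file contents are written by hand to keep the example free of pandas. genfromtxt treats empty fields as missing and fills them with nan in its default float dtype.

```python
import os
import tempfile

import numpy as np

path = os.path.join(tempfile.mkdtemp(), "test.txt")

# Same layout as test.txt, but comma-delimited so the empty fields are explicit.
with open(path, "w") as f:
    f.write("1,5,6,7\n2,4,,\n3,3,,\n4,2,,\n5,1,,\n")

# Empty fields are treated as missing and filled with nan (float dtype).
data = np.genfromtxt(path, delimiter=",")

A_load = data[:, 0].astype(int)
B_load = data[:, 1].astype(int)
C_load = data[0, 2]  # the scalars live in the first row only
D_load = data[0, 3]
```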

Writing and reading full-length arrays is easy, but mixing arrays with scalars gets messy.

Here's a format that would be easier to write and read:

    In [149]: arr = np.zeros((5,4),int)
         ...: for i,var in enumerate([A,B,C,D]):
         ...:     arr[:,i] = var
         ...: 
    In [150]: arr
    Out[150]: 
    array([[1, 5, 6, 7],
           [2, 4, 6, 7],
           [3, 3, 6, 7],
           [4, 2, 6, 7],
           [5, 1, 6, 7]])
    In [151]: np.savetxt('foo.txt', arr, fmt='%d', delimiter=',')
    In [152]: cat foo.txt
    1,5,6,7
    2,4,6,7
    3,3,6,7
    4,2,6,7
    5,1,6,7
    In [153]: np.loadtxt('foo.txt', delimiter=',', dtype=int)
    Out[153]: 
    array([[1, 5, 6, 7],
           [2, 4, 6, 7],
           [3, 3, 6, 7],
           [4, 2, 6, 7],
           [5, 1, 6, 7]])
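Reading that table back into the four original variables only takes loadtxt's unpack=True plus picking the first element of each repeated column. This sketch swaps the fill loop above for np.column_stack with np.full_like, which builds the same 5x4 integer array. It assumes C and D are plain scalars and uses a scratch file path.

```python
import os
import tempfile

import numpy as np

A = np.array([1, 2, 3, 4, 5])
B = np.array([5, 4, 3, 2, 1])
C, D = 6, 7  # assumption: plain scalars instead of one-element lists

path = os.path.join(tempfile.mkdtemp(), "foo.txt")

# np.full_like repeats each scalar down a column with the same length and dtype as A.
arr = np.column_stack((A, B, np.full_like(A, C), np.full_like(A, D)))
np.savetxt(path, arr, fmt="%d", delimiter=",")

# unpack=True returns one array per column; the scalars come from the first row.
A_load, B_load, C_col, D_col = np.loadtxt(path, delimiter=",", dtype=int, unpack=True)
C_load, D_load = C_col[0], D_col[0]
```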