Prevent the newline rule from applying to the header in np.savetxt


I've got a .csv file that I want to save as .txt. Here's my original data: Org-data

I save this file as .txt with the following rules for newlines, comments, etc.:

np.savetxt(r'/test/text.txt', df, newline=',\n', comments='',fmt='%f', header=''.join(f'{col}\t' for col in df.columns)[:-1])

Result-textfile

The problem is that I need every line to have a "," at the end except for the first line. But with the code above, the newline rule I specified is applied to all the lines, including the header!

Is there any way to prevent this from happening?

Or do you know another way to create the desired text file?

Example:

Consider this as the original data:

df = pd.DataFrame({'NumberOfPages:float': {0: 96.0, 1: 96.0, 2: 144.0},
 'bid:token': {0: 3, 1: 3, 2: 5}})

The output should look like this:

bid:token   NumberOfPages:float
3.000000 96.000000,
3.000000 96.000000,
5.000000 144.000000,

But I get this:

bid:token   NumberOfPages:float,
3.000000 96.000000,
3.000000 96.000000,
5.000000 144.000000,

*Note the "," symbol after float in the first line.


Solution

  • You could remove the comma from the header line after the file has been written:

    import pandas as pd
    import numpy as np
    
    df = pd.DataFrame({'NumberOfPages:float': {0: 96.0, 1: 96.0, 2: 144.0},
     'bid:token': {0: 3, 1: 3, 2: 5}})
    
    np.savetxt('test.txt', df, newline=',\n', comments='',fmt='%f', header=''.join(f'{col}\t' for col in df.columns)[:-1])
    
    with open("test.txt", 'r+') as f:
        lines = f.readlines()
        lines[0] = lines[0].replace(',', '')  # Only modify the header
        f.seek(0)
        f.writelines(lines)
        f.truncate()  # The new content is one character shorter, so cut off the leftover tail
    

    Output:

    NumberOfPages:float bid:token
    96.000000 3.000000,
    96.000000 3.000000,
    144.000000 5.000000,
    

    Note that this can be slow for very large files, since f.readlines() reads every line of the file into memory. If it is acceptable to overwrite the comma with a space, you can use the following instead. It reads only the first line and writes back a header of the same length, so the rest of the file is left untouched:

    import pandas as pd
    import numpy as np
    
    df = pd.DataFrame({'NumberOfPages:float': {0: 96.0, 1: 96.0, 2: 144.0},
     'bid:token': {0: 3, 1: 3, 2: 5}})
    
    np.savetxt('test.txt', df, newline=',\n', comments='',fmt='%f', header=''.join(f'{col}\t' for col in df.columns)[:-1])
    
    with open("test.txt", 'r+') as f:
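        header = f.readline()  # Read only the header line, e.g. "...bid:token,\n"
        f.seek(0)
        # Drop the trailing ",\n" and write a space where the comma was;
        # the original line break that follows is not overwritten
        f.write(f"{header[:-2]} ")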
    

    Output:

    NumberOfPages:float bid:token <--- beware this space
    96.000000 3.000000,
    96.000000 3.000000,
    144.000000 5.000000,
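
As for another way to create the file: np.savetxt terminates the header with the same newline string as the data rows, which is why the comma shows up on the first line. A possible alternative, sketched below, is to write the header yourself and then hand the open file to np.savetxt, which accepts a file handle as well as a filename. The helper name save_with_plain_header is hypothetical and only used for illustration; the header is tab-separated and the rows space-separated, as in the question.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def save_with_plain_header(path, df):
    """Write a header line without a comma, then comma-terminated data rows.

    Hypothetical helper for illustration, not part of NumPy or pandas.
    """
    with open(path, "w") as f:
        # Write the header manually so the ',\n' rule never touches it
        f.write("\t".join(df.columns) + "\n")
        # No header= argument here, so savetxt only writes the data rows
        np.savetxt(f, df, newline=",\n", fmt="%f")


df = pd.DataFrame({'NumberOfPages:float': {0: 96.0, 1: 96.0, 2: 144.0},
                   'bid:token': {0: 3, 1: 3, 2: 5}})

# A temporary directory is used here only to keep the example self-contained
out_path = os.path.join(tempfile.mkdtemp(), "test.txt")
save_with_plain_header(out_path, df)
```

This writes the file in a single pass, so nothing has to be read back or patched afterwards, and the header ends up with neither a comma nor a trailing space.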