Tags: python, arrays, for-loop, text-files, nested-loops

How to save individual columns from an input text file to individual output text files in Python


I've just started using Python (Anaconda3) and I can't figure out the problem below, which should be really simple... I have searched all over the internet for a solution but I can't find one.

Goal: I want my script to write individual columns (indexed via --column) from an input text file into their respective output text files. The user can select any number of columns (with a matching number of output files).

Example: python septc.py --infile infile.txt --column 0 2 3 --outfile out1.txt out2.txt out3.txt

My questions:

  1. How can I save the individual columns of the input file, as defined by the --column vector, in their respective output files?
  2. The column index given by the user will probably be off by 1, as users start counting columns at 1 while Python starts at 0, so choosing the last column would be out of bounds... though I could say in the help text of the script that counting starts at 0.

The script below is supposed to write the 1st, 3rd, and 4th columns of the infile, which it does, but it writes all three columns into each output file instead of the 1st column into out1.txt, the 3rd column into out2.txt, and the 4th column into out3.txt. This is because the inner loop is carried out for every iteration of the outer loop. Similarly, changing the loop order writes only the 4th column into each output file, which isn't what I want. I've tried other ways (e.g., for c in np.nditer(col)) but to no avail.

I suspect that this for loop approach isn't appropriate here. It should be something like "for c in col, write c into its associated text file"... but how do I link a column with its output file?!

I'd be really grateful for your help!

Thank you very much in advance,

Nic

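######### Example input #########
import numpy as np
cols = [0, 2, 3]
# Build a 5x4 test array and save it as the input file
data = np.arange(20).reshape(5, 4)
np.savetxt('infile.txt', data, delimiter='  ', fmt='%1.0f')
f = np.loadtxt('infile.txt')
# f now holds:
# array([[  0.,   1.,   2.,   3.],
#        [  4.,   5.,   6.,   7.],
#        [  8.,   9.,  10.,  11.],
#        [ 12.,  13.,  14.,  15.],
#        [ 16.,  17.,  18.,  19.]])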

######### Script (shorter version) #########
#!/usr/bin/env python
import numpy as np
import sys
import argparse
# Parse cmd line arguments
p = argparse.ArgumentParser()
p.add_argument('--infile', nargs='?', action="store", default=sys.stdin)
p.add_argument('--column', nargs='+', action="store", type=int)
p.add_argument('--outfile', nargs='+', action="store", default=sys.stdout)
nargs = p.parse_args()
# Assign cmd line arguments to variables
col = nargs.column
outfile = nargs.outfile
infile = nargs.infile
with open(infile) as infile:
    data = np.loadtxt(infile)
# This is supposed to save each col into its respective output file ... supposed to ...
for out in outfile:
    with open(out, 'wb') as f:
        for c in col:
            y = data[:,c]
            np.savetxt(f, y, fmt='%1.0f')

Solution

  • You are iterating through all columns for each outfile. Instead, pair each column with its outfile, for example with zip, which yields one (outfile, column) pair per iteration. Then save each column to its own file.

    See the Python documentation on the built-in function zip for details.

    for out, c in zip(outfile,col):
        with open(out, 'wb') as f:
            y = data[:,c]            
            np.savetxt(f, y, fmt='%1.0f')
    

    Hope this helps.

    Result:

    $ python col2files.py  --infile infile.txt --column 0 2 3 --outfile out1.txt out2.txt out3.txt
    
    $ cat out1.txt
    0
    4
    8
    12
    16
    
    $ cat out2.txt
    2
    6
    10
    14
    18
    
    $ cat out3.txt
    3
    7
    11
    15
    19
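
The question's second point (users counting columns from 1) is not covered above. As a rough sketch only, the zip pairing could be wrapped in a helper that also checks that the two lists have equal length, since zip stops silently at the shorter one. The function name write_columns and its one_based flag are hypothetical and not part of the original script:

```python
import numpy as np


def write_columns(data, columns, outfiles, one_based=False):
    """Write each selected column of `data` to its paired output file.

    Hypothetical helper: `one_based=True` treats the given column
    numbers as starting at 1 instead of Python's 0.
    """
    if len(columns) != len(outfiles):
        # zip() would silently stop at the shorter list, so check first.
        raise ValueError("need exactly one output file per column")
    offset = 1 if one_based else 0
    for out, c in zip(outfiles, columns):
        idx = c - offset
        if not 0 <= idx < data.shape[1]:
            raise IndexError("column %d is out of range" % c)
        # np.savetxt accepts a filename and writes one value per line
        # for a 1-D array.
        np.savetxt(out, data[:, idx], fmt='%1.0f')
```

With the 5x4 example array from the question, calling write_columns(data, [1, 3, 4], ['out1.txt', 'out2.txt', 'out3.txt'], one_based=True) should produce the same three files as shown in the result above. Whether to expose 1-based counting or simply document 0-based counting in the --column help text is a design choice left to the script's author.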