I've just started using Python (Anaconda3) and I can't figure out the problem below, which should be really simple. I have searched all over the internet for a solution but can't find one.
Goal: I want my script to write individual columns (indexed via --column) from an input text file into respective output text files. The user can select any number of columns (with matching number of output files).
Example: python septc.py --infile infile.txt --column 0 2 3 --outfile out1.txt out2.txt out3.txt
My questions:
The script below is supposed to write the 1st, 3rd, and 4th columns of the infile. It does, but it writes all three columns into each output file instead of the 1st column into out1.txt, the 3rd into out2.txt, and the 4th into out3.txt. This is because the inner loop runs in full for every iteration of the outer loop. Similarly, swapping the loop order writes only the 4th column into each output file, which isn't what I want either. I've tried other ways (e.g., for c in np.nditer(col)) but to no avail.
I suspect that this for loop approach isn't appropriate here. It should be something like for c in col write c into associated text file...but how to link a col with its output file?!
I'd be really grateful for your help!
Thank you very much in advance,
Nic
cols = [0,2,3]
data = np.arange(20).reshape(5,4)
np.savetxt('infile.txt', data, delimiter=' ', fmt='%1.0f')
f = np.loadtxt('infile.txt')
array([[  0.,   1.,   2.,   3.],
       [  4.,   5.,   6.,   7.],
       [  8.,   9.,  10.,  11.],
       [ 12.,  13.,  14.,  15.],
       [ 16.,  17.,  18.,  19.]])
######### Script (shorter version) #########
#!/usr/bin/env python
import numpy as np
import sys
import argparse
# Parse cmd line arguments
p = argparse.ArgumentParser()
p.add_argument('--infile', nargs='?', action="store", default=sys.stdin)
p.add_argument('--column', nargs='+', action="store", type=int)
p.add_argument('--outfile', nargs='+', action="store", default=sys.stdout)
nargs = p.parse_args()
# Assign cmd line arguments to variables
col = nargs.column
outfile = nargs.outfile
infile = nargs.infile
with open(infile) as infile:
    data = np.loadtxt(infile)
# This is supposed to save each col into its respective output file ... supposed to ...
for out in outfile:
    with open(out, 'wb') as f:
        for c in col:
            y = data[:,c]
            np.savetxt(f, y, fmt='%1.0f')
You are iterating through all columns for each outfile, so every file receives every column. Instead, pair each column with its output file, for example with the built-in zip, which yields (outfile, column) pairs in order. Then save each column to its corresponding file.
See the Python documentation on the built-in function zip for more.
for out, c in zip(outfile,col):
    with open(out, 'wb') as f:
        y = data[:,c]
        np.savetxt(f, y, fmt='%1.0f')
Hope this helps.
Result:
$ python col2files.py --infile infile.txt --column 0 2 3 --outfile out1.txt out2.txt out3.txt
$ cat out1.txt
0
4
8
12
16
$ cat out2.txt
2
6
10
14
18
$ cat out3.txt
3
7
11
15
19
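One caveat worth keeping in mind: zip stops at the shorter of its inputs, so if the number of --column values and --outfile names differ, the extra entries are silently dropped. Below is a minimal sketch of the same pairing wrapped in a function with a length check. The name write_columns is hypothetical (it is not part of the original script), and the sketch passes the filename straight to np.savetxt instead of opening the file first, which is a small variation on the code above.

```python
import os
import tempfile

import numpy as np


def write_columns(data, cols, outfiles):
    """Write data[:, c] to the matching file, one column per file.

    write_columns is a hypothetical helper name used for illustration.
    """
    # zip() stops at the shorter input, so check the lengths up front.
    if len(cols) != len(outfiles):
        raise ValueError(
            "got %d columns but %d output files" % (len(cols), len(outfiles))
        )
    for out, c in zip(outfiles, cols):
        # np.savetxt accepts a filename and handles opening and closing it.
        np.savetxt(out, data[:, c], fmt='%1.0f')


if __name__ == "__main__":
    # Same 5x4 sample data as in the question.
    data = np.arange(20).reshape(5, 4)
    tmpdir = tempfile.mkdtemp()
    names = ("out1.txt", "out2.txt", "out3.txt")
    outfiles = [os.path.join(tmpdir, name) for name in names]
    write_columns(data, [0, 2, 3], outfiles)
    for path in outfiles:
        with open(path) as f:
            print(os.path.basename(path), f.read().split())
```

In the original script this would be called as write_columns(data, col, outfile) after loading the data, so a mismatch between the two argument lists fails loudly instead of producing fewer files than expected.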