I've been working on a Python script that combines some information into "bed" format, which means I'm working with features on a genome. The first column is the scaffold name (string), the second is the start position on that scaffold (integer), and the third is the stop position (integer). The remaining columns contain other information that is not relevant to my question. My issue is that my output is unsorted.
Now I know I can sort my files using this bash command:
$ sort -k1,1 -k2,2n -k3,3n infile > outfile
But in the interest of efficiency I'd like to know if there's a way to do this in Python. So far I've only seen list-based sorts that handle either a lexicographical or a numerical sort, not a combination of the two. So, do you guys have any ideas?
Snippet of my data (I want to sort by column 1, 2 and 3 (in that order)):
Scf_3R 8599253 8621866 FBgn0000014 FBgn0191744 -0.097558026153
Scf_3R 8497493 8503049 FBgn0000015 FBgn0025043 0.437973284047
Scf_3L 16209309 16236428 FBgn0000017 FBgn0184183 -1.19105585707
Scf_2L 10630469 10632308 FBgn0000018 FBgn0193617 0.073153454539
Scf_3R 12087670 12124207 FBgn0000024 FBgn0022516 -0.023946795475
Scf_X 14395665 14422243 FBgn0000028 FBgn0187465 0.00300558969397
Scf_3R 25163062 25165316 FBgn0000032 FBgn0189058 0.530118698187
Scf_3R 19757441 19808894 FBgn0000036 FBgn0189822 -0.282508464261
Load the data, sort it with sorted(), and write it to a new file.
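import csv

# Load data: split each line into its whitespace-separated fields
lists = list()
with open(filename, 'r') as f:
    for line in f:
        lists.append(line.rstrip().split())

# Sort data: scaffold name as a string, then start and stop as integers
results = sorted(lists, key=lambda x: (x[0], int(x[1]), int(x[2])))

# Write to a file (opening the same filename with 'w' overwrites the input)
with open(filename, 'w') as f:
    # lineterminator='\n' avoids the csv module's default '\r\n' line endings
    writer = csv.writer(f, delimiter='\t', lineterminator='\n')
    writer.writerows(results)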
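To see why the tuple key gives the mixed lexicographic/numeric ordering, here is a minimal in-memory sketch using a few rows from the question. The helper name bed_sort_key is just an illustrative choice, not something from a library. Python compares tuples element by element, so the scaffold name is compared as text first, and the integer start and stop positions only break ties.

```python
def bed_sort_key(fields):
    """Hypothetical helper: scaffold as text, then start and stop as integers."""
    return (fields[0], int(fields[1]), int(fields[2]))

# A few rows taken from the sample data in the question
rows = [
    "Scf_3R 8599253 8621866 FBgn0000014 FBgn0191744 -0.097558026153",
    "Scf_3R 8497493 8503049 FBgn0000015 FBgn0025043 0.437973284047",
    "Scf_3L 16209309 16236428 FBgn0000017 FBgn0184183 -1.19105585707",
    "Scf_2L 10630469 10632308 FBgn0000018 FBgn0193617 0.073153454539",
    "Scf_X 14395665 14422243 FBgn0000028 FBgn0187465 0.00300558969397",
]

# Split each row into fields, then sort by (scaffold, start, stop)
sorted_rows = sorted((row.split() for row in rows), key=bed_sort_key)

for fields in sorted_rows:
    print("\t".join(fields))
```

Without the int() conversions the positions would be compared as strings, so for example "9" would sort after "10". Note also that Python compares strings by code point, which can differ from what the sort command does under some locale settings.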