I have an input file with data in two columns. I need to merge both columns into one and remove the duplicates. Any suggestions on how to start? Thanks!
Input file
5045 2317
5045 1670
5045 2156
5045 1509
5045 3833
5045 1013
5045 3491
5045 32
5045 1482
5045 2495
5045 4280
5045 1380
5045 3998
Expected output
5045
2317
1670
2156
1509
3833
1013
3491
32
1482
2495
4280
1380
3998
To keep the order:
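from itertools import chain
# Flatten both columns into a single list of tokens, in file order
with open("in.txt") as f:
    lines = list(chain.from_iterable(x.split() for x in f))
# Rewrite the file, skipping any token that already appeared earlier
with open("in.txt", "w") as f1:
    for ind, line in enumerate(lines, 1):
        if line not in lines[:ind - 1]:
            f1.write(line + "\n")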
Output:
5045
2317
1670
2156
1509
3833
1013
3491
32
1482
2495
4280
1380
3998
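Checking each token against a slice of the list rescans the earlier tokens every time, which can get slow on large files. A minimal sketch of the same order-preserving merge that tracks already-seen tokens in a set instead; the helper name `unique_tokens` is made up for illustration and is not part of the answer above:

```python
def unique_tokens(lines):
    """Yield each whitespace-separated token once, in first-seen order."""
    seen = set()
    for line in lines:
        for token in line.split():
            if token not in seen:
                seen.add(token)
                yield token


# Example with in-memory lines instead of a file
sample = ["5045 2317\n", "5045 1670\n", "5045 2156\n"]
print(list(unique_tokens(sample)))  # ['5045', '2317', '1670', '2156']
```

When the input and output are the same file, collect the result first, e.g. `tokens = list(unique_tokens(f))`, and only then reopen the path for writing.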
If order does not matter:
from itertools import chain
# A set keeps one copy of each token from both columns
with open("in.txt") as f:
    lines = set(chain.from_iterable(x.split() for x in f))
with open("in.txt", "w") as f1:
    f1.write("\n".join(lines) + "\n")  # one token per line
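A set has no defined iteration order, so the tokens can come out in a different order from one run to the next. If a reproducible order is wanted, one option is to sort the set before writing. A small sketch, assuming every token is an integer; `sorted_unique` is a hypothetical helper name:

```python
from itertools import chain


def sorted_unique(lines):
    """Return the distinct tokens from all columns, sorted by integer value."""
    tokens = set(chain.from_iterable(line.split() for line in lines))
    return sorted(tokens, key=int)


print(sorted_unique(["5045 2317\n", "5045 32\n", "5045 1670\n"]))
# ['32', '1670', '2317', '5045']
```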
If the first column contains the same number on every line:
with open("in.txt") as f:
    first = next(f).split()  # first row: [first-column number, second-column number]
    lines = set(x.split()[1] for x in f)  # second-column numbers from the remaining rows
    lines.update(first)  # add the first-column number and the first row's second column
with open("in.txt", "w") as f1:
    f1.write("\n".join(lines) + "\n")