I have a file containing lots of lines looking like this:
one two three four
one three four five
one one three four
one two three four
I have written a function that needs the first and last words of each line as input, but only for lines where the second word is "two". So in the best of worlds, I would extract just those lines, drop the middle words, and get this:
one four
one four
Since the word two can occur in other columns, I can't just search for the word and extract that line to a new file. Should I perhaps convert it to a csv somehow, and then work from there?
At the moment my script only removes the first 4 columns of a text file:
f = open("blah.txt", "r")
g = open("datafile_fixed.txt", "w")
for line in f:
    if line.strip():
        g.write(" ".join(line.split()[4:]) + "\n")
f.close()
g.close()
So I already remove part of the original file. Could I strip it down even further and drop the lines that I don't want? The most important thing is to keep only the lines I need; after that I could easily remove the second and third columns.
You just need to add a condition that checks whether the second word is "two":
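with open('blah.txt', mode='r') as f, open('datafile_fixed.txt', mode='w') as g:
    for line in f:
        # Assumes every line has exactly four whitespace-separated words
        w1, w2, _, w4 = line.split()
        if w2 == 'two':
            # split() discarded the newline, so add it back
            g.write(w1 + ' ' + w4 + '\n')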
The key line here is w1, w2, _, w4 = line.split(). split returns a list of strings obtained by breaking the given string at the specified separator. No separator is given here, so it splits on any run of whitespace and drops the trailing newline. For the first line it therefore returns a list with 4 elements: ["one", "two", "three", "four"].
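To see what split does with and without a separator, here is a small sketch. The sample strings are made up for illustration and are not taken from the question's file.

```python
# A line with irregular spacing, a tab and a trailing newline (made-up sample)
line = "one two   three\tfour\n"

# No argument: split on any run of whitespace, ignoring leading/trailing whitespace
words = line.split()
print(words)  # ['one', 'two', 'three', 'four']

# Explicit separator: every occurrence splits, so empty strings can appear
csv_like = "a,b,,c".split(",")
print(csv_like)  # ['a', 'b', '', 'c']
```

Because the default form already copes with repeated spaces and tabs, converting the file to CSV first should not be necessary for this task.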
Python lets you unpack this list by assigning it to multiple variables (as many as the list has elements), so by doing w1, w2, w3, w4 = ["one", "two", "three", "four"] you are assigning "one" to w1, "two" to w2 and so on. In the code above the third word is assigned to _, which by convention marks a value that is not needed.
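A short sketch of unpacking, including what happens when the number of names does not match the number of words. The variable names and the longer sample line are only illustrative.

```python
words = ["one", "two", "three", "four"]

# Exactly four names for four elements
w1, w2, _, w4 = words
print(w1, w2, w4)  # one two four

# A starred name collects everything in the middle, whatever the line length
first, second, *middle, last = "one two three four five six".split()
print(first, second, last)  # one two six
print(middle)               # ['three', 'four', 'five']

# Too few words for the names on the left raises ValueError
try:
    a, b, c, d = "one two three".split()
except ValueError as exc:
    error_message = str(exc)
    print("could not unpack:", error_message)
```

The starred form is one way to keep the unpacking style when lines have more than four columns, though it still needs at least three words per line.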
Then we just need to check whether the second word is "two". If so, we write the first and last words to the new file. Otherwise we do nothing with this line and move on to the next one in the loop.
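The version above raises an error on blank lines or on lines that do not have exactly four words. Since the question's own script suggests the real file has more columns, here is a minimal sketch that indexes the word list instead of unpacking it. The function name first_and_last_if_second and the sample data are hypothetical, chosen only for this example.

```python
def first_and_last_if_second(lines, wanted="two"):
    """Yield 'first last' for each line whose second word equals `wanted`."""
    for line in lines:
        words = line.split()
        # Skip blank lines and lines too short to have a second word
        if len(words) < 2:
            continue
        if words[1] == wanted:
            # words[-1] is the last word, however many columns the line has
            yield words[0] + " " + words[-1]


# Made-up sample based on the lines in the question, plus one blank line
sample = [
    "one two three four\n",
    "one three four five\n",
    "one one three four\n",
    "\n",
    "one two three four\n",
]

result = list(first_and_last_if_second(sample))
print(result)  # ['one four', 'one four']
```

An open file can be passed in place of the sample list, since iterating over a file yields its lines; each yielded string would then be written to the output file followed by "\n".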